Inherent space leak in upsweep
Consider compiling a 10 000 module project.
Each time we compile a module, we produce a HomeModInfo which contains a ModIface and a ModDetails.
- The ModIface is a flat, forced representation of the interface. It contains neither thunks nor loops.
- The ModDetails is hydrated from the ModIface and contains a reference to the current HscEnv.
The HscEnv contains a reference to the HomePackageTable (HPT) as it stood at the point the module was compiled.
This is the source of the leak, as each time we compile a module, a new dependency is added to the HomePackageTable, which causes part of it to be reallocated. Note that the leak is due to the structure of the HomePackageTable rather than the values it contains.
Therefore each module retains a reference to a unique HomePackageTable.
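A minimal sketch of the retention pattern described above, using a toy model rather than the real GHC types. `ToyHPT`, `ToyDetails`, `upsweep` and `retainedEntries` are hypothetical names invented for illustration, and a plain `Data.Map` stands in for the HomePackageTable. The count is of entries reachable through each retained snapshot, which is an upper bound; how much heap is really held depends on how much structure successive versions of the map share.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for the real GHC types, for illustration only.
type ModName = String

-- A toy HPT: module name to some payload.
type ToyHPT = Map.Map ModName Int

-- A toy ModDetails that keeps alive the table it was created against.
data ToyDetails = ToyDetails
  { tdName :: ModName
  , tdHpt  :: ToyHPT   -- the snapshot retained by this module
  }

-- Compile modules one by one. Each result captures the table as it stood
-- when that module was compiled; the table is then extended with the module.
upsweep :: [ModName] -> [ToyDetails]
upsweep = go Map.empty
  where
    go _   []       = []
    go hpt (m : ms) =
      let details = ToyDetails m hpt
          hpt'    = Map.insert m (Map.size hpt) hpt
      in  details : go hpt' ms

-- Total number of entries reachable through all retained snapshots.
retainedEntries :: [ToyDetails] -> Int
retainedEntries = sum . map (Map.size . tdHpt)

main :: IO ()
main = do
  let ds = upsweep [ "M" ++ show i | i <- [1 .. 1000 :: Int] ]
  print (length ds)
  print (retainedEntries ds)
```

Every element of the result holds a different version of the table, so none of the intermediate versions can be collected while the corresponding `ToyDetails` is alive.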
For 10 000 modules we can estimate the total size of the maps to be at most 1 + 2 + ... + 10 000 = 50 005 000. Each Bin constructor is 64 bytes, which amounts to 3.2GB of allocations just from the structure of the maps. In practice we have seen this account for about 1.8GB.
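The estimate can be reproduced with a few lines of arithmetic. This is only a restatement of the figures quoted above; `totalNodes` and `binBytes` are names made up for the sketch, and the 64-byte figure is taken from the text rather than measured.

```haskell
-- Upper bound on map nodes retained across n snapshots: 1 + 2 + ... + n.
totalNodes :: Integer -> Integer
totalNodes n = n * (n + 1) `div` 2

-- Assumed size of one Bin constructor in bytes, as quoted in the text.
binBytes :: Integer
binBytes = 64

main :: IO ()
main = do
  let nodes = totalNodes 10000
  print nodes                -- 50005000
  print (nodes * binBytes)   -- 3200320000 bytes, roughly 3.2GB
```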
Two possible solutions:
- We need to avoid each ModDetails retaining a reference to a unique HscEnv. At some point we should rehydrate the earlier interfaces with the same HPT as later interfaces. This can be achieved by doing the rehydration every 100 modules or so (but the cost is still obviously non-linear, even if rehydration is not expensive).
- Maintaining the HPT as an immutable structure leads to this catastrophic loss of sharing. If the HPT was mutable, and its fields read using IO operations, then we could maintain a single global shared map which each ModDetails could consult when needed. This would structure the HPT in a similar manner to the EPS.
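To make the second option more concrete, here is a rough sketch of a single shared table read through IO. It is a toy model, not a proposal for the actual GHC data structures: `SharedHPT`, `ToyDetails`, `newSharedHPT`, `addModule` and `lookupModule` are all hypothetical names, and an `IORef` holding a `Data.Map` is just one assumed way of getting a mutable table.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import qualified Data.Map.Strict as Map

type ModName = String

-- Hypothetical shared table: one mutable cell holding the current map.
newtype SharedHPT = SharedHPT (IORef (Map.Map ModName Int))

-- A toy ModDetails holds only the shared handle, never a snapshot.
data ToyDetails = ToyDetails
  { tdName   :: ModName
  , tdShared :: SharedHPT
  }

newSharedHPT :: IO SharedHPT
newSharedHPT = SharedHPT <$> newIORef Map.empty

-- Register a compiled module. Older versions of the map become
-- unreachable as soon as nothing else refers to them.
addModule :: SharedHPT -> ModName -> Int -> IO ToyDetails
addModule shared@(SharedHPT ref) name payload = do
  modifyIORef' ref (Map.insert name payload)
  pure (ToyDetails name shared)

-- Lookups go through IO and always see the latest table.
lookupModule :: ToyDetails -> ModName -> IO (Maybe Int)
lookupModule (ToyDetails _ (SharedHPT ref)) name =
  Map.lookup name <$> readIORef ref

main :: IO ()
main = do
  shared <- newSharedHPT
  a <- addModule shared "A" 1
  _ <- addModule shared "B" 2
  -- "A" was registered before "B" but can still look it up.
  lookupModule a "B" >>= print
```

In this model every `ToyDetails` points at the same cell, so only one version of the map is live at a time. The trade-off is that lookups are no longer pure.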
Anyone with any thoughts about what is best to do?