JS: linker uses too much memory
I'm investigating the memory consumption of the JavaScript linker.
If we have a large result, for example something that uses the `ghc` package, the linker uses over 10GB of memory. The testsuite has various test cases that run into this. I'm using the `PartialDownsweep` test to investigate. Unfortunately my laptop only has 16GB of RAM, so when running with profiling I have to stop the process a bit early to keep it from swapping and running forever (I hardcoded a limit of 450 modules to link before exiting). I'll have a faster machine with 32GB next week.
The linking procedure:
- The linker loads all archives and object files. It loads the dependency sections of each object file and finds all blocks that are reachable from the roots (`main` and some things used directly by the RTS). The information about what exactly to link is stored in the `LinkPlan`.
- The linker loads all archives and extracts the blocks that are in the `LinkPlan` to JS AST, in addition to the data. Then it applies a link-time optimization pass, followed by rendering the data structures to JavaScript and writing them out to a file.
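The first step is essentially a reachability computation over block dependencies. A minimal sketch of that idea, assuming a simplified dependency map; `BlockId`, `DepGraph` and `reachable` are hypothetical names for illustration and not the types the linker actually uses:

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical stand-in for a block reference: (module name, block index).
type BlockId = (String, Int)

-- Hypothetical stand-in for the dependency sections of all object files:
-- each block maps to the blocks it depends on.
type DepGraph = M.Map BlockId [BlockId]

-- | All blocks reachable from the given roots (worklist traversal).
-- The resulting set plays the role of the link plan in this sketch.
reachable :: DepGraph -> [BlockId] -> S.Set BlockId
reachable deps = go S.empty
  where
    go seen [] = seen
    go seen (b : rest)
      | b `S.member` seen = go seen rest
      | otherwise =
          go (S.insert b seen) (M.findWithDefault [] b deps ++ rest)

-- Tiny example graph: block ("B", 0) is never reached from the root.
exampleDeps :: DepGraph
exampleDeps =
  M.fromList
    [ (("Main", 0), [("A", 0)])
    , (("A", 0), [("A", 1)])
    , (("B", 0), [("A", 0)])
    ]

examplePlan :: S.Set BlockId
examplePlan = reachable exampleDeps [("Main", 0)]
```

Only the blocks in the resulting set need to be extracted in the second step, so in this model the plan itself stays small compared to the archives it refers to.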
Found so far:
- It's the second step that runs out of memory. Just extracting all the modules seems to be enough to run out of memory, even without rendering anything at all (having `extractBlocks` return an empty `ModuleCode`).
- Forcing the `LinkPlan` with `deepseq` still gets the same result, with only the second stage of linking running out of memory.
- The first step allocates almost all of the `FastString`s. A large amount of memory is consumed by the `FastString` table, but it's not the reason for 10GB+ of memory.
- Disabling the `ArchiveState` cache (loading each archive again when an object inside it is accessed) doesn't solve the problem, and doesn't make it worse either. It's probably not a good idea to keep the archives from all packages loaded while linking, so I'll make some changes to the caching strategy, but this is not what causes the 10GB+ of memory usage.
- Type profiling reveals a lot of `ARR_WORDS`, `ByteString`, `FastString` being retained.
- Infotable profiling reveals a lot of data being retained related to the deserializer.
- There is about 3.2GB of live data at the end of the run, and the heap is about 12GB. Toward the end of the run there is a growing difference between the data in "blocks" (~8GB) and the full heap size (~12GB). The difference between the live data and blocks is mostly explained by the copying garbage collector, but I don't know what accounts for the rest.
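For tracking the gap between live data and total heap without a full profiling run, the RTS statistics from `GHC.Stats` can be sampled from inside the program. A minimal sketch, assuming the process is started with `+RTS -T` so that statistics are collected; `HeapSnapshot`, `overheadBytes` and `snapshotHeap` are hypothetical helper names, not something the linker currently has:

```haskell
import Data.Word (Word64)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical record holding the two numbers compared above.
data HeapSnapshot = HeapSnapshot
  { liveBytes :: Word64      -- live data reported for the most recent GC
  , memInUseBytes :: Word64  -- total memory in use by the RTS at that GC
  }
  deriving (Show, Eq)

-- | Memory held by the RTS beyond the live data (0 if the numbers are inverted).
overheadBytes :: HeapSnapshot -> Word64
overheadBytes s
  | memInUseBytes s >= liveBytes s = memInUseBytes s - liveBytes s
  | otherwise = 0

-- | Sample the RTS statistics; returns Nothing unless the program runs with +RTS -T.
snapshotHeap :: IO (Maybe HeapSnapshot)
snapshotHeap = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then pure Nothing
    else do
      stats <- getRTSStats
      let details = gc stats -- details of the most recent GC
      pure
        ( Just
            HeapSnapshot
              { liveBytes = gcdetails_live_bytes details
              , memInUseBytes = gcdetails_mem_in_use_bytes details
              }
        )
```

Calling `snapshotHeap` every N modules during extraction would show whether the overhead grows steadily or jumps at particular modules. The numbers reflect the most recent GC, so they lag behind the current allocation state.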
Some suspects that I'm investigating:
- `ByteString`s that are a small substring of a large backing buffer, keeping the whole buffer in memory
- Unevaluated deserialized data that keeps byte arrays alive
- The `lazyGet`/`lazyPut` in `StgToJS.Object.getObjBlock`/`StgToJS.Object.putObjBlock` respectively.
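To illustrate the first suspect: a strict `ByteString` slice shares the backing buffer of the string it was cut from, and `Data.ByteString.copy` gives the slice its own storage. A minimal sketch of the difference; `sliceShared` and `sliceCopied` are hypothetical helpers, and whether this pattern actually occurs in the deserializer is still unconfirmed:

```haskell
import qualified Data.ByteString as BS

-- | Slice that shares the backing buffer of the input. As long as the
-- result is alive, the whole original buffer stays alive too.
sliceShared :: Int -> Int -> BS.ByteString -> BS.ByteString
sliceShared off len = BS.take len . BS.drop off

-- | Same slice, but copied into its own buffer, so the original buffer
-- can be collected once nothing else refers to it.
sliceCopied :: Int -> Int -> BS.ByteString -> BS.ByteString
sliceCopied off len = BS.copy . sliceShared off len
```

The copy costs an extra allocation per slice, so it only pays off when the slice is small relative to the buffer and outlives it. The result also has to be forced: an unevaluated `sliceCopied` thunk still references the original buffer, which ties into the second suspect.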