Skip to content

Fix leaks in --make mode when there are module loops

Matthew Pickering requested to merge wip/module-loops into master

Before (benchmark compiling ghc library with --make mode):






This patch fixes quite a tricky leak where we would end up retaining stale ModDetails due to rehydrating modules against non-finalised interfaces.

Loops with multiple boot files

It is possible for a module graph to have a loop (SCC, when ignoring boot files) which requires multiple boot files to break. In this case we must perform the necessary hydration steps before and after compiling modules which have boot files which are described above for corectness but also perform an additional hydration step at the end of the SCC to remove space leaks.

Consider the following example:

┌───────┐   ┌───────┐
│       │   │       │
│   A   │   │   B   │
│       │   │       │
└─────┬─┘   └───┬───┘
      │         │
 │                 │
 │        C        │
      │         │
 ┌────▼──┐  ┌───▼───┐
 │       │  │       │
 │ A-boot│  │ B-boot│
 │       │  │       │
 └───────┘  └───────┘

A, B and C live together in a SCC. Say we compile the modules in order A-boot, B-boot, C, A, B then when we compile A we will perform the hydration steps (because A has a boot file). Therefore C will be hydrated relative to A, and the ModDetails for A will reference C/A. Then when B is compiled C will be rehydrated again, and so B will reference C/A,B, its interface will be hydrated relative to both A and B. Now there is a space leak because say C is a very big module, there are now two different copies of ModDetails kept alive by modules A and B.

The way to avoid this space leak is to rehydrate an entire SCC together at the end of compilation so that all the ModDetails point to interfaces for .hs files. In this example, when we hydrate A, B and C together then both A and B will refer to C/A,B.

See #21900 (closed) for some more discussion.

In addition to this simple case, there is also the potential for a leak during parallel upsweep which is also fixed by this patch. Transcibed is Note [ModuleNameSet, efficiency and space leaks]

Note [ModuleNameSet, efficiency and space leaks]

During unsweep the results of compiling modules are placed into a MVar, to find the environment the module needs to compile itself in the MVar is consulted and the HomeUnitGraph is set accordingly. The reason we do this is that precisely tracking module dependencies and recreating the HUG from scratch each time is very expensive.

In serial mode (-j1), this all works out fine because a module can only be compiled after its dependencies have finished compiling and not interleaved with compiling module loops. Therefore when we create the finalised or no loop interfaces, the HUG only contains finalised interfaces.

In parallel mode, we have to be more careful because the HUG variable can contain non-finalised interfaces which have been started by another thread. In order to avoid a space leak where a finalised interface is compiled against a HPT which contains a non-finalised interface we have to restrict the HUG to only the visible modules.

The visible modules is recording in the ModuleNameSet, this is propagated upwards whilst compiling and explains which transitive modules are visible from a certain point. This set is then used to restrict the HUG before the module is compiled to only the visible modules and thus avoiding this tricky space leak.

Efficiency of the ModuleNameSet is of utmost importance because a union occurs for each edge in the module graph. Therefore the set is represented directly as an IntSet which provides suitable performance, even using a UniqSet (which is backed by an IntMap) is too slow. The crucial test of performance here is the time taken to a do a no-op build in --make mode.

See test "jspace" for an example which used to trigger this problem.

Fixes #21900 (closed)

Edited by Matthew Pickering

Merge request reports