Improve sharing in `ModIface` during compilation
A generated ModIface can contain many duplicated values, as demonstrated in #24540, especially when -fwrite-if-simplified-core is used, which embeds core expressions into the interface file.
To improve the sharing of ModIface, and thus reduce the memory footprint during compilation, we can use the serialisation mechanism of ModIface to achieve the same memory footprint improvement as in !12371 (closed), but for .hi file generation (again, assuming -fwrite-if-simplified-core is used).
Improve sharing of duplicated values in ModIface
As a ModIface often contains duplicated values that are not
necessarily shared, we improve sharing by serialising the ModIface
to an in-memory byte array. Serialisation uses deduplication tables, and
deserialisation implicitly shares duplicated values.
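
As a rough illustration of why a round trip through a deduplication table restores sharing, here is a minimal sketch. It is not GHC's binary format: the `Encoded` type and the `encode`/`decode` functions are hypothetical names made up for this example, and plain strings stand in for values such as Name or FastString.

```haskell
module Main where

import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- | Hypothetical serialised form (assumption, not GHC's layout): a table of
-- unique strings plus, for every original value, an index into that table.
data Encoded = Encoded
  { encTable   :: [String]  -- unique values, in first-seen order
  , encIndices :: [Int]     -- one table index per original value
  } deriving (Show, Eq)

-- | Serialise with a deduplication table: each distinct value is stored once.
encode :: [String] -> Encoded
encode xs = Encoded (reverse revTable) (reverse revIxs)
  where
    (_, revTable, revIxs) = foldl' step (Map.empty, [], []) xs
    step (seen, table, ixs) x =
      case Map.lookup x seen of
        Just i  -> (seen, table, i : ixs)           -- already in the table
        Nothing ->
          let i = Map.size seen                     -- next free table slot
          in (Map.insert x i seen, x : table, i : ixs)

-- | Deserialise: every occurrence of an index is looked up in the same table,
-- so duplicated values come back as references to one table entry.
decode :: Encoded -> [String]
decode (Encoded table ixs) = map (tableMap Map.!) ixs
  where
    tableMap = Map.fromList (zip [0 ..] table)

main :: IO ()
main = do
  let original = ["GHC.Base", "map", "GHC.Base", "map", "GHC.Base"]
      encoded  = encode original
  print (encTable encoded)
  print (encIndices encoded)
  print (decode encoded == original)
```

The input list may hold five separately allocated strings, while the decoded list refers to only two table entries, which is the effect the commit relies on.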
This helps reduce the peak memory usage while compiling in
--make mode. The peak memory usage is especially reduced when
generating interface files with core expressions
(-fwrite-if-simplified-core).
On agda, this reduces the peak memory usage:
- 2.2 GB to 1.9 GB for a ghci session.
On lib:Cabal, we report:
- 570 MB to 500 MB for a ghci session
- 790 MB to 667 MB for compiling lib:Cabal with ghc
There is a small impact on execution time, around 2% on the agda code base.
However, avoiding reserialisation mitigates this to a run-time difference of only 1%.
Avoid unnecessarily re-serialising the ModIface
To reduce memory usage of ModIface, we serialise ModIface to an
in-memory byte array, which implicitly shares duplicated values.
This serialised byte array can be reused to avoid work when we actually
write the ModIface to disk.
We introduce a new field to ModIface which allows us to save the byte
array, and write it to disk if the ModIface wasn't changed after the
initial serialisation.
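
The caching idea can be sketched as follows. This is a simplified model under stated assumptions, not the actual ModIface definition: the `Iface` record, its field names, and the toy `serialise` function are all hypothetical. The point it illustrates is that every setter has to drop the cached bytes, so that stale bytes are never written to disk.

```haskell
module Main where

import qualified Data.ByteString.Char8 as BS

-- | Hypothetical stand-in for an interface: one payload field plus a cache.
data Iface = Iface
  { ifaceDecls  :: [String]
  , ifaceCached :: Maybe BS.ByteString  -- bytes from the initial serialisation
  }

-- | Toy serialiser (assumption: the real one is GHC's binary encoding).
serialise :: [String] -> BS.ByteString
serialise = BS.pack . unlines

-- | Serialise once and remember the resulting bytes.
withCache :: [String] -> Iface
withCache decls = Iface decls (Just (serialise decls))

-- | Any modification drops the cache, since it no longer matches the value.
setDecls :: [String] -> Iface -> Iface
setDecls decls iface = iface { ifaceDecls = decls, ifaceCached = Nothing }

-- | True when writing to disk can skip serialisation entirely.
canReuse :: Iface -> Bool
canReuse iface = case ifaceCached iface of
  Just _  -> True
  Nothing -> False

-- | Bytes to write to disk: reuse the cache if present, else reserialise.
bytesToWrite :: Iface -> BS.ByteString
bytesToWrite iface = case ifaceCached iface of
  Just bytes -> bytes
  Nothing    -> serialise (ifaceDecls iface)

main :: IO ()
main = do
  let fresh   = withCache ["f :: Int", "g :: Bool"]
      changed = setDecls ["f :: Int"] fresh
  print (canReuse fresh)
  print (canReuse changed)
  BS.putStr (bytesToWrite changed)
```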
This requires us to replace absolute offsets, for example those used to
jump to the deduplication table for Name or FastString, with relative
offsets, as the deduplication byte array doesn't contain header
information, such as fingerprints.
To allow us to dump the binary blob to disk, we need to replace all
absolute offsets with relative ones.
This leads to new primitives for ModIface, which help to construct
relative offsets.
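
To see why relative offsets matter here, consider a toy layout in which the payload starts with one byte holding the position of its table. The layout, the function names, and the header contents below are assumptions made up for this sketch and do not reflect the real interface file format. An offset measured from the start of the payload stays valid when a header is written in front of it, whereas an absolute offset computed without the header does not.

```haskell
module Main where

import qualified Data.ByteString.Char8 as BS

-- | Hypothetical payload layout: one byte holding the table offset (measured
-- from the start of the payload), followed by the body and then the table.
mkPayload :: String -> String -> BS.ByteString
mkPayload body table =
  BS.concat [BS.singleton (toEnum off), BS.pack body, BS.pack table]
  where
    off = 1 + length body  -- assumes the offset fits in a single byte

-- | Read the table, treating the stored offset as relative to the payload start.
readTableRelative :: Int -> BS.ByteString -> BS.ByteString
readTableRelative start file = BS.drop (start + off) file
  where
    off = fromEnum (BS.index file start)

-- | Read the table, treating the stored offset as an absolute file position.
readTableAbsolute :: Int -> BS.ByteString -> BS.ByteString
readTableAbsolute start file = BS.drop off file
  where
    off = fromEnum (BS.index file start)

main :: IO ()
main = do
  let payload = mkPayload "decls" "TABLE"
      header  = BS.pack "HDR+fingerprint"  -- made-up header contents
      file    = header <> payload
      start   = BS.length header
  -- In memory, without a header, both interpretations agree.
  print (readTableRelative 0 payload == readTableAbsolute 0 payload)
  -- On disk, behind the header, only the relative interpretation still works.
  print (readTableRelative start file == BS.pack "TABLE")
  print (readTableAbsolute start file == BS.pack "TABLE")
```

With relative offsets the in-memory blob can be appended to the header unchanged, which is what makes dumping it straight to disk possible.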
Benchmarks
We ran some more benchmarks on the agda codebase. There are two scenarios:
- normal, simply load a full `ghci` session. Only included to show there isn't a regression.
  `ghci -fforce-recomp +RTS -i0.5`
- cold, load a full `ghci` session with `-fwrite-if-simplified-core` and no existing interface files.
  `ghci -fforce-recomp -fwrite-if-simplified-core +RTS -i0.5`
Each scenario was repeated 5 times and we report the minimal numbers.
| branch | mode | time | max live bytes | peak memory |
|---|---|---|---|---|
| head | cold | 46.8 s | 1.05 GB | 2211 MB |
| PR | cold | 47.4 s | 0.668 GB | 1718 MB |
| PR without reuse | cold | 48.3 s | 0.674 GB | 1538 MB |
We show there is no performance regression for the most common case:
| branch | mode | time | max live bytes | peak memory |
|---|---|---|---|---|
| head | normal | 42.2 s | 0.464 GB | 1288 MB |
| PR | normal | 42.9 s | 0.438 GB | 1250 MB |
| PR without reuse | normal | 42.4 s | 0.472 GB | 1285 MB |
Closes #24723 (closed)
This PR is currently stacked on top of !12371 (closed), as it requires its changes, so it is rather difficult to review in isolation.