Improve sharing in `ModIface` during compilation

A generated ModIface can contain many duplicated values, as demonstrated in #24540, especially when -fwrite-if-simplified-core is used, which embeds core expressions into the interface file.

To improve sharing in ModIface, and thus reduce the memory footprint during compilation, we can use the serialisation mechanism of ModIface to achieve the same memory footprint improvement as in !12371 (closed), but for .hi file generation (again, assuming -fwrite-if-simplified-core is used).

Avoid unnecessarily re-serialising the ModIface

To reduce memory usage of ModIface, we serialise ModIface to an in-memory byte array, which implicitly shares duplicated values.

This serialised byte array can be reused to avoid work when we actually write the ModIface to disk. We introduce a new field in ModIface that stores the byte array, and we write it to disk directly if the ModIface wasn't changed after the initial serialisation.
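The reuse idea can be sketched with a toy model. Everything below is a hypothetical stand-in and not GHC's actual code: `Iface` plays the role of ModIface, `ifaceCached` plays the role of the new field, and `encodeDecls` is a deliberately naive encoder. The sketch only illustrates the invariant that any modification must drop the cached bytes.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical stand-in for ModIface: some payload plus an optional
-- cached encoding of that payload.
data Iface = Iface
  { ifaceDecls  :: [String]
  , ifaceCached :: Maybe [Word8]  -- hypothetical cache field
  }

-- Toy encoder: one length byte followed by the characters of each string.
-- Lengths and characters are truncated to 8 bits; this is only a sketch.
encodeDecls :: [String] -> [Word8]
encodeDecls = concatMap encodeOne
  where
    encodeOne s = fromIntegral (length s) : map (fromIntegral . ord) s

-- Serialise once and remember the resulting bytes.
withCache :: Iface -> Iface
withCache i = i { ifaceCached = Just (encodeDecls (ifaceDecls i)) }

-- Any modification invalidates the cache.
addDecl :: String -> Iface -> Iface
addDecl d i = i { ifaceDecls = d : ifaceDecls i, ifaceCached = Nothing }

-- Bytes to write to disk, and whether the cached bytes were reused.
bytesToWrite :: Iface -> ([Word8], Bool)
bytesToWrite i = case ifaceCached i of
  Just bs -> (bs, True)
  Nothing -> (encodeDecls (ifaceDecls i), False)
```

In this model, `bytesToWrite (withCache i)` returns the cached bytes without encoding again, while `bytesToWrite (addDecl "x" (withCache i))` falls back to a fresh encoding.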

Reusing the byte array requires replacing absolute offsets, such as the jump to the deduplication table for Name or FastString, with relative offsets. The in-memory byte array doesn't contain header information, such as fingerprints, so an absolute offset computed during the in-memory serialisation would no longer be valid once the header is written in front of it. To be able to dump the binary blob to disk unchanged, all absolute offsets must become relative ones.

This leads to new primitives for ModIface, which help to construct relative offsets.
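A minimal sketch of why relative offsets survive prepending a header, using a list of `Int` cells as a toy buffer. The names `mkBody`, `readTableAbs` and `readTableRel` are hypothetical and are not the new primitives themselves; for simplicity the table is assumed to run to the end of the buffer.

```haskell
-- Toy buffer: a list of Int cells instead of raw bytes.
type Buffer = [Int]

-- Body layout: [pointer cell] ++ payload ++ table.
-- The pointer cell stores the distance from itself to the table.
mkBody :: [Int] -> [Int] -> Buffer
mkBody payload table = (1 + length payload) : payload ++ table

-- Absolute interpretation: the pointer value is an index from the
-- start of the whole buffer.
readTableAbs :: Int -> Buffer -> [Int]
readTableAbs ptrPos buf = drop (buf !! ptrPos) buf

-- Relative interpretation: the pointer value is added to the position
-- of the pointer cell itself.
readTableRel :: Int -> Buffer -> [Int]
readTableRel ptrPos buf = drop (ptrPos + buf !! ptrPos) buf
```

With `body = mkBody [10, 20] [7, 8, 9]`, both readers find the table when the body starts at position 0. After prepending a two-cell header, the pointer cell sits at position 2: `readTableRel 2 (header ++ body)` still finds the table, while `readTableAbs 2 (header ++ body)` lands inside the payload.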

Improve sharing of duplicated values in ModIface

As a ModIface often contains duplicated values that are not necessarily shared, we improve sharing by serialising the ModIface to an in-memory byte array. Serialisation uses deduplication tables, and deserialisation implicitly shares duplicated values.

Sharing duplicated values helps reduce the peak memory usage while compiling in --make mode. The peak memory usage is especially reduced when generating interface files with core expressions (-fwrite-if-simplified-core).
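The effect of a deduplication table can be illustrated with a small model. This is a sketch under simplifying assumptions, not GHC's serialiser: values are plain strings, the "byte array" is a pair of a table and a list of indices, and `serialise`/`deserialise` are hypothetical names.

```haskell
import Data.List (elemIndex)

-- Replace every value by an index into a table of unique values.
-- Quadratic in the input size, which is fine for a sketch.
serialise :: [String] -> ([String], [Int])
serialise = foldl step ([], [])
  where
    step (table, ixs) s = case elemIndex s table of
      Just i  -> (table, ixs ++ [i])
      Nothing -> (table ++ [s], ixs ++ [length table])

-- Rebuild the values by table lookup. Equal values now come from the
-- same table entry, so they can be shared on the heap instead of being
-- separate copies.
deserialise :: ([String], [Int]) -> [String]
deserialise (table, ixs) = map (table !!) ixs
```

For the input `["Int", "Bool", "Int", "Int"]`, the table holds only `["Int", "Bool"]`, and the round trip `deserialise (serialise xs)` yields a list equal to `xs` in which all occurrences of `"Int"` refer to the single table entry.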

On agda, this reduces the peak memory usage:

  • 2.2 GB to 1.9 GB for a ghci session.

On lib:Cabal, we report:

  • 570 MB to 500 MB for a ghci session
  • 790 MB to 667 MB for compiling lib:Cabal with ghc

There is a small impact on execution time, around 2% on the agda code base.

However, avoiding reserialisation reduces this overhead to a run-time difference of only about 1%.

Benchmarks

We ran some more benchmarks on the agda codebase. There are two scenarios:

  • normal, simply load a full ghci session
    • Only included to show there isn't a regression.
    • ghci -fforce-recomp +RTS -i0.5
  • cold, load a full ghci session with -fwrite-if-simplified-core with no existing interface files.
    • ghci -fforce-recomp -fwrite-if-simplified-core +RTS -i0.5

Each scenario was repeated 5 times and we report the minimal numbers.

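| branch           | mode | time   | max live bytes (GB) | peak (MB) |
|------------------|------|--------|---------------------|-----------|
| head             | cold | 46.8 s | 1.05                | 2211      |
| PR               | cold | 47.4 s | 0.668               | 1718      |
| PR without reuse | cold | 48.3 s | 0.674               | 1538      |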

We show there is no performance regression for the most common case:

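| branch           | mode   | time   | max live bytes (GB) | peak (MB) |
|------------------|--------|--------|---------------------|-----------|
| head             | normal | 42.2 s | 0.464               | 1288      |
| PR               | normal | 42.9 s | 0.438               | 1250      |
| PR without reuse | normal | 42.4 s | 0.472               | 1285      |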

Closes #24723

This PR is currently stacked on top of !12371 (closed), as it requires its changes. This makes it rather difficult to review in isolation for now.