!12582: Improve sharing in `ModIface` during compilation · Merge requests · Glasgow Haskell Compiler / GHC

Hannes Siebenhandl requested to merge wip/fendor/ghc-iface-sharing-avoid-reserialisation into master May 06, 2024

Generating a ModIface can produce many duplicated values, as demonstrated in #24540, especially when -fwrite-if-simplified-core is used, which embeds core expressions into the interface file.

To improve sharing within ModIface, and thus reduce the memory footprint during compilation, we can use the serialisation mechanism of ModIface to achieve the same memory footprint improvement as in !12371 (closed), but for .hi file generation (again, assuming -fwrite-if-simplified-core is used).

Avoid unnecessarily re-serialising the `ModIface`

To reduce memory usage of ModIface, we serialise ModIface to an in-memory byte array, which implicitly shares duplicated values.

This serialised byte array can be reused to avoid work when we actually write the ModIface to disk. We introduce a new field to ModIface which allows us to save the byte array, and we write it to disk as-is if the ModIface wasn't changed after the initial serialisation.
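The caching idea can be illustrated with a minimal sketch. Everything here is hypothetical and much simpler than GHC's actual `ModIface`: the record `Iface`, its fields, and the `String` stand-in for the byte array are assumptions made for illustration only. The point is that any modification drops the cached buffer, so a stale buffer is never written to disk.

```haskell
-- Hypothetical, simplified stand-in for an interface with a cached buffer.
data Iface = Iface
  { ifaceDecls  :: [String]      -- the "real" contents of the interface
  , ifaceCached :: Maybe String  -- serialised form, if still valid
  }

-- Toy serialiser; a String stands in for the in-memory byte array.
serialise :: [String] -> String
serialise = unlines

-- Building an interface serialises it once and remembers the result.
mkIface :: [String] -> Iface
mkIface decls = Iface decls (Just (serialise decls))

-- Every update must invalidate the cached buffer.
addDecl :: String -> Iface -> Iface
addDecl d i = i { ifaceDecls = d : ifaceDecls i, ifaceCached = Nothing }

-- Returns the bytes to write and whether the cached buffer was reused.
bytesForDisk :: Iface -> (String, Bool)
bytesForDisk i = case ifaceCached i of
  Just bs -> (bs, True)
  Nothing -> (serialise (ifaceDecls i), False)
```

In this sketch, `bytesForDisk (mkIface ds)` reuses the buffer, while `bytesForDisk (addDecl d (mkIface ds))` falls back to serialising again.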

Reusing the byte array requires us to replace absolute offsets, for example the ones used to jump to the deduplication tables for Name or FastString, with relative offsets. The in-memory byte array doesn't contain header information, such as fingerprints, so absolute offsets computed for it are no longer valid once the header is written in front of it. To allow us to dump the binary blob to disk unchanged, all absolute offsets have to become relative ones.

This leads to new primitives for ModIface, which help to construct relative offsets.
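Why relative offsets matter can be seen in a toy model. This is only a sketch under simplifying assumptions: a blob is a list of `Int` words rather than bytes, and the names `mkBody`, `readTableRelative` and `readTableAbsolute` are invented for illustration and are not the primitives added by this MR.

```haskell
-- A blob is modelled as a list of words instead of raw bytes.
type Blob = [Int]

-- Lay out a body as: [pointer] ++ payload ++ table.
-- The pointer stores the distance from the pointer field to the table.
mkBody :: [Int] -> [Int] -> Blob
mkBody payload tbl = (1 + length payload) : payload ++ tbl

-- Relative interpretation: table position = pointer position + stored value.
readTableRelative :: Int -> Blob -> [Int]
readTableRelative ptrPos blob = drop (ptrPos + blob !! ptrPos) blob

-- Absolute interpretation: stored value is a position from the blob start.
readTableAbsolute :: Int -> Blob -> [Int]
readTableAbsolute ptrPos blob = drop (blob !! ptrPos) blob

-- Writing to disk prepends a header (e.g. fingerprints) to the body.
withHeader :: [Int] -> Blob -> Blob
withHeader hdr body = hdr ++ body
```

With `body = mkBody [10, 20] [7, 8, 9]`, both readers find `[7, 8, 9]` at pointer position 0. After `withHeader [99, 98] body` the pointer sits at position 2: the relative reader still returns `[7, 8, 9]`, whereas the absolute reader lands inside the payload.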

Improve sharing of duplicated values in `ModIface`

As a ModIface often contains duplicated values that are not necessarily shared, we improve sharing by serialising the ModIface to an in-memory byte array. Serialisation uses deduplication tables, and deserialisation implicitly shares duplicated values.
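The effect of a deduplication table can be sketched in a few lines. This is a toy model, not GHC's binary format: the `Encoded` type and the `encode`/`decode` functions are hypothetical, and plain `String`s stand in for values such as Name or FastString. Each distinct value is stored once, and decoding looks every occurrence up in the same table, so equal values end up pointing at one heap object.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Toy serialised form: a table of distinct values plus indices into it.
data Encoded = Encoded
  { table   :: [String]  -- each distinct value, stored exactly once
  , indices :: [Int]     -- one table index per original occurrence
  } deriving (Show, Eq)

-- Assign each distinct value the next free index, in order of first use.
encode :: [String] -> Encoded
encode xs = Encoded (reverse revTable) (reverse revIxs)
  where
    (_, revTable, revIxs) = foldl' step (Map.empty, [], []) xs
    step (seen, tbl, ixs) x =
      case Map.lookup x seen of
        Just i  -> (seen, tbl, i : ixs)
        Nothing ->
          let i = Map.size seen
          in (Map.insert x i seen, x : tbl, i : ixs)

-- Every occurrence is looked up in the same table, so repeated values
-- share a single heap object after decoding.
decode :: Encoded -> [String]
decode (Encoded tbl ixs) = map (tblMap Map.!) ixs
  where
    tblMap = Map.fromList (zip [0 ..] tbl)
```

For example, `encode ["Int", "Maybe", "Int"]` stores the table `["Int", "Maybe"]` with indices `[0, 1, 0]`, and `decode` restores the original list.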

This helps reduce the peak memory usage while compiling in --make mode. The reduction is largest when generating interface files with core expressions (-fwrite-if-simplified-core).

On agda, this reduces the peak memory usage:

- 2.2 GB to 1.9 GB for a ghci session

On lib:Cabal, we report:

- 570 MB to 500 MB for a ghci session
- 790 MB to 667 MB for compiling lib:Cabal with ghc

There is a small impact on execution time, around 2% on the agda code base. Avoiding reserialisation reduces this to a run-time difference of only about 1%.

Benchmarks

We ran some more benchmarks on the agda codebase. There are two scenarios:

- normal: simply load a full ghci session.
  - Only included to show there isn't a regression.
  - `ghci -fforce-recomp +RTS -i0.5`
- cold: load a full ghci session with -fwrite-if-simplified-core and no existing interface files.
  - `ghci -fforce-recomp -fwrite-if-simplified-core +RTS -i0.5`

Each scenario was repeated 5 times and we report the minimum numbers.

branch	mode	time (s)	max live bytes (GB)	peak (MB)
head	cold	46.8	1.05	2211
PR	cold	47.4	0.668	1718
PR without reuse	cold	48.3	0.674	1538

We show there is no performance regression for the most common case:

branch	mode	time (s)	max live bytes (GB)	peak (MB)
head	normal	42.2	0.464	1288
PR	normal	42.9	0.438	1250
PR without reuse	normal	42.4	0.472	1285

Closes #24723

This PR is currently stacked on top of !12371 (closed), as it requires its changes, which makes it rather difficult to review in isolation.
