Improve sharing in `ModIface` during compilation
A generated `ModIface` can contain many duplicates, as demonstrated in #24540, especially when `-fwrite-if-simplified-core` is used, which embeds core expressions into the interface file.
To improve sharing within the `ModIface`, and thus reduce the memory footprint during compilation, we can use the serialisation mechanism of `ModIface` to achieve the same memory footprint improvement as in !12371 (closed), but for `.hi` file generation (again, assuming `-fwrite-if-simplified-core` is used).
Improve sharing of duplicated values in `ModIface`
As a `ModIface` often contains duplicated values that are not
necessarily shared, we improve sharing by serialising the `ModIface`
to an in-memory byte array. Serialisation uses deduplication tables, and
deserialisation implicitly shares duplicated values.
This helps reduce the peak memory usage while compiling in `--make`
mode. The peak memory usage is especially reduced when
generating interface files with core expressions
(`-fwrite-if-simplified-core`).
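The idea of a deduplication table can be illustrated with a minimal sketch. This is not GHC's binary serialisation code: the `Encoded` type and the `encode` and `decode` functions are hypothetical names invented for illustration, and plain `String` values stand in for things like `Name` or `FastString`. Encoding replaces each value by an index into a table of unique values, and decoding looks every index up in that one table, so equal values end up pointing at the same heap object.

```haskell
import qualified Data.Map.Strict as Map

-- | Hypothetical serialised form: a table of unique values plus one index
-- per original value.
data Encoded = Encoded
  { encTable   :: [String] -- unique values, in order of first occurrence
  , encIndices :: [Int]    -- position in the table for each original value
  } deriving (Eq, Show)

-- | Serialise: each value is written once into the table, and every
-- occurrence is replaced by the table index.
encode :: [String] -> Encoded
encode = go Map.empty [] []
  where
    go _ tbl ixs [] = Encoded (reverse tbl) (reverse ixs)
    go seen tbl ixs (x : xs) =
      case Map.lookup x seen of
        Just i  -> go seen tbl (i : ixs) xs
        Nothing ->
          let i = Map.size seen
          in go (Map.insert x i seen) (x : tbl) (i : ixs) xs

-- | Deserialise: all occurrences of an index are resolved against the same
-- table entry, so duplicates are shared rather than allocated again.
decode :: Encoded -> [String]
decode (Encoded tbl ixs) = map (byIndex Map.!) ixs
  where
    byIndex = Map.fromList (zip [0 ..] tbl)
```

Loaded in GHCi, `encode ["a", "b", "a"]` yields a table `["a", "b"]` with indices `[0, 1, 0]`, and `decode . encode` returns the original list. The round trip is what buys the sharing: the input list may hold two separately allocated copies of `"a"`, while the decoded list refers to a single one.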
On agda, this reduces the peak memory usage:
- `2.2 GB` to `1.9 GB` for a ghci session.

On `lib:Cabal`, we report:
- `570 MB` to `500 MB` for a ghci session
- `790 MB` to `667 MB` for compiling `lib:Cabal` with ghc
There is a small impact on execution time, around 2% on the agda code base.
However, avoiding reserialisation reduces this to a run-time difference of only about 1%.
Avoid unnecessarily re-serialising the `ModIface`
To reduce memory usage of `ModIface`, we serialise `ModIface` to an
in-memory byte array, which implicitly shares duplicated values.
This serialised byte array can be reused to avoid work when we actually
write the `ModIface` to disk.
We introduce a new field to `ModIface` which allows us to save the byte
array, and write it to disk if the `ModIface` wasn't changed after the
initial serialisation.
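A minimal sketch of this caching scheme, under stated assumptions, might look as follows. The `Iface` record, its fields, and the functions `mkIface`, `addDecl` and `bytesToWrite` are hypothetical and are not the actual `ModIface` API; a `String` stands in for the byte array so the example needs only `base`. The assumption made here is that any update to the interface clears the cached bytes, so a stale buffer can never be written out.

```haskell
import Data.Maybe (fromMaybe)

-- | Hypothetical stand-in for an interface: its contents plus an optional
-- cache of the serialised form.
data Iface = Iface
  { ifaceDecls  :: [String]
  , ifaceCached :: Maybe String -- serialised form, if still valid
  } deriving (Eq, Show)

-- | Toy serialiser; the real one would produce a byte array.
serialise :: [String] -> String
serialise = unlines

-- | Serialise once at construction time and remember the result.
mkIface :: [String] -> Iface
mkIface ds = Iface ds (Just (serialise ds))

-- | Any modification invalidates the cached bytes (an assumption of this
-- sketch).
addDecl :: String -> Iface -> Iface
addDecl d (Iface ds _) = Iface (d : ds) Nothing

-- | Bytes to write to disk: reuse the cache when the interface is
-- unchanged, otherwise serialise again.
bytesToWrite :: Iface -> String
bytesToWrite (Iface ds cached) = fromMaybe (serialise ds) cached
```

Routing every update through a function that resets the cache keeps the invariant local: `bytesToWrite` never has to compare the interface against the buffer to decide whether the buffer is still valid.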
This requires us to replace absolute offsets, for example those used to
jump to the deduplication table for `Name` or `FastString`, with relative
offsets, as the deduplicated byte array doesn't contain header information,
such as fingerprints.
To allow us to dump the binary blob to disk, we need to replace all
absolute offsets with relative ones.
This leads to new primitives for `ModIface`, which help to construct
relative offsets.
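Why relative offsets survive a prepended header can be seen in a toy model. The sketch below is an illustration only: a list of `Int` stands in for the byte buffer, and `mkPayload`, `readTableRelative` and `readTableAbsolute` are hypothetical names, not the new primitives introduced by this change. The first word of the payload stores the distance from that word to the start of the table, which here simply runs to the end of the buffer.

```haskell
-- | Toy buffer of words standing in for a byte array.
type Buffer = [Int]

-- | Build a payload: a pointer word, then the body, then the table.
-- The pointer holds the distance from itself to the first table word.
mkPayload :: [Int] -> [Int] -> Buffer
mkPayload body tbl = (1 + length body) : body ++ tbl

-- | Relative interpretation: the stored value is added to the position of
-- the pointer word, so it stays valid wherever the payload is placed.
readTableRelative :: Int -> Buffer -> [Int]
readTableRelative ptrPos buf = drop (ptrPos + buf !! ptrPos) buf

-- | Absolute interpretation: the stored value is taken as a position from
-- the start of the buffer, which is only right if nothing precedes the
-- payload.
readTableAbsolute :: Int -> Buffer -> [Int]
readTableAbsolute ptrPos buf = drop (buf !! ptrPos) buf
```

For `mkPayload [10, 20] [7, 8, 9]` both readers find the table at pointer position 0. Once a two-word header is prepended and the pointer sits at position 2, only `readTableRelative` still returns `[7, 8, 9]`; the absolute reader lands inside the body. This is the property that lets the same in-memory blob be written after a header without rewriting its offsets.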
Benchmarks
We ran some more benchmarks on the agda codebase. There are two scenarios:
- `normal`, simply load a full `ghci` session. Only included to show there isn't a regression.
  `ghci -fforce-recomp +RTS -i0.5`
- `cold`, load a full `ghci` session with `-fwrite-if-simplified-core` with no existing interface files.
  `ghci -fforce-recomp -fwrite-if-simplified-core +RTS -i0.5`
Each scenario was repeated 5 times and we report the minimal numbers.
| branch | mode | time | max live bytes | peak |
|---|---|---|---|---|
| head | cold | 46.8 s | 1.05 GB | 2211 MB |
| PR | cold | 47.4 s | 0.668 GB | 1718 MB |
| PR without reuse | cold | 48.3 s | 0.674 GB | 1538 MB |
We show there is no performance regression for the most common case:
| branch | mode | time | max live bytes | peak |
|---|---|---|---|---|
| head | normal | 42.2 s | 0.464 GB | 1288 MB |
| PR | normal | 42.9 s | 0.438 GB | 1250 MB |
| PR without reuse | normal | 42.4 s | 0.472 GB | 1285 MB |
Closes #24723 (closed)
Currently, this PR is stacked on top of !12371 (closed), as it requires its changes. This makes it rather difficult to review in isolation for now.