
Deserialising `mi_extra_decls` introduces a lot of duplication

Summary

When we deserialise mi_extra_decls from disk (for example, after compiling with -fbyte-code-and-object-code), we introduce many duplicates of the same IfaceTyCon. A two-level census with ghc-debug on the agda code base shows 8_654_045 live instances of IfaceTyCon, or roughly 8.7 million.

However, when we look at the actual IfaceTyCons, we can see that a lot of them are duplicates:

#ConstrName,#NamePtr,#Element,#UniqueNumberOfElements,#`IfaceTyConInfo`-witnesses
IfaceTyCon,0x77a81438bcb0,351211,1,[0x77a8170abd98]
IfaceTyCon,0x77a81462b308,406823,2,[0x77a8170abd98,0x77a8170abed8]
IfaceTyCon,0x77a814388458,553151,1,[0x77a8170abed8]
IfaceTyCon,0x77a814629180,1685764,1,[0x77a8170abed8]

This essentially means that 1685764 instances of the form IfaceTyCon 0x77a814629180 0x77a8170abed8 are alive on the heap.

In fact, there seem to be at most 6000 unique IfaceTyCons (see the attached data).

We know that we have roughly 200MB of IfaceTyCons alive:

ghc-9.9-inplace:GHC.Iface.Type:IfaceTyCon[ghc-9.9-inplace:GHC.Types.Name:Name,THUNK_1_0]:207697080:8654045:24:24.0
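As a rough sanity check of these numbers, the arithmetic can be written out as below. This is only a sketch: it assumes the trailing census fields are total size in bytes, instance count, maximum size and average size, and it ignores any overhead of a deduplication table. The names `liveInstances`, `bytesPerClosure` and `uniqueUpperBound` are made up for illustration.

```haskell
-- Figures copied from the census row and the "at most 6000 unique" estimate.
liveInstances, bytesPerClosure, uniqueUpperBound :: Integer
liveInstances    = 8654045  -- instance count from the census row
bytesPerClosure  = 24       -- size of one IfaceTyCon closure (assumed from the row)
uniqueUpperBound = 6000     -- upper bound on distinct IfaceTyCons

-- Heap currently occupied by IfaceTyCon closures: 207697080 bytes,
-- which matches the total size reported by the census.
currentBytes :: Integer
currentBytes = liveInstances * bytesPerClosure

-- Heap needed if every distinct IfaceTyCon were allocated exactly once:
-- 144000 bytes.
sharedBytes :: Integer
sharedBytes = uniqueUpperBound * bytesPerClosure

-- Potential saving under perfect sharing: 207553080 bytes.
potentialSaving :: Integer
potentialSaving = currentBytes - sharedBytes
```

Under these assumptions nearly all of the roughly 200MB is attributable to duplicates.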

Full data: uniqueIfaceTyCon_sorted.txt

Solutions

Reducing the duplication can be achieved in a number of ways.

One idea is to add the deduplication logic to the serialisation of the ModIface itself. Similarly to how we deduplicate Name and FastString, we could introduce an IfaceTyCon deduplication table. As we don't know yet what data we might want to deduplicate in the future, it would make sense to refactor the serialisation logic of ModIface to remove the special-case logic for Name and FastString deduplication and replace it with a generic deduplication mechanism. This would allow us to change what we deduplicate without having to rewrite everything whenever we come up with something else to deduplicate. As a positive side effect, this should also noticeably reduce the size of the ModIface on disk.
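To make the idea concrete, here is a minimal sketch of a generic deduplication table. It is not GHC's serialisation code: `TyCon`, `DedupTable`, `intern`, `encode` and `decode` are hypothetical names, and lists stand in for the binary format. The point is only that the payload stores indices, the table stores each distinct value once, and decoding hands out the same table entry for every occurrence.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL, sortOn)

-- Hypothetical stand-in for IfaceTyCon: a name plus an info field.
data TyCon = TyCon { tyConName :: String, tyConInfo :: Int }
  deriving (Eq, Ord, Show)

-- Generic table: every distinct value is assigned exactly one index.
data DedupTable a = DedupTable
  { nextIndex :: !Int
  , indices   :: !(Map.Map a Int)
  }

emptyTable :: DedupTable a
emptyTable = DedupTable 0 Map.empty

-- Look up the index of a value, allocating a fresh index on first sight.
intern :: Ord a => DedupTable a -> a -> (DedupTable a, Int)
intern tbl@(DedupTable n m) x =
  case Map.lookup x m of
    Just i  -> (tbl, i)
    Nothing -> (DedupTable (n + 1) (Map.insert x n m), n)

-- "Serialise": return the unique values in index order plus one index
-- per occurrence in the input.
encode :: Ord a => [a] -> ([a], [Int])
encode xs = (uniques, ixs)
  where
    (tbl, ixs) = mapAccumL intern emptyTable xs
    uniques    = map fst (sortOn snd (Map.toList (indices tbl)))

-- "Deserialise": each index is resolved against the table, so repeated
-- indices refer to the same table entry instead of a fresh copy.
decode :: [a] -> [Int] -> [a]
decode uniques = map (table Map.!)
  where
    table = Map.fromList (zip [0 ..] uniques)
```

For example, encoding `[TyCon "Int" 0, TyCon "Maybe" 1, TyCon "Int" 0, TyCon "Int" 0]` yields two unique entries and the index list `[0,1,0,0]`, and `decode` restores the original list. Because the table is parameterised over `a`, the same mechanism could in principle cover Name, FastString and IfaceTyCon, which is the refactoring suggested above.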

Edited by Hannes Siebenhandl