Thunks in `IfaceTyCon` retain a large amount of memory
Summary
In heap analysis, the `IfaceTyCon` constructor in `mi_extra_decls` retains a thunk to `IfaceTyConInfo`. This added indirection seems to defeat the sharing of the two most common instances of `IfaceTyConInfo`, which is intended to reduce memory usage considerably.
All numbers and pictures have been obtained by running `ghci -fbyte-code-and-object-code`, using `ghc-debug` and `-hT` profiling. GHC itself has been compiled with the flavour `perf+ipe`.
In a two-level census heap traversal, we have the following lines (just an excerpt):
key;total;count;max;avg
ghc-9.9-inplace:GHC.Iface.Type:IfaceTyConInfo[ghc-9.9-inplace:Language.Haskell.Syntax.Type:NotPromoted,ghc-9.9-inplace:GHC.Iface.Type:IfaceNormalTyCon];141466584;5894441;24;24.0
ghc-9.9-inplace:GHC.Iface.Type:IfaceTyConInfo[ghc-9.9-inplace:Language.Haskell.Syntax.Type:IsPromoted,ghc-9.9-inplace:GHC.Iface.Type:IfaceNormalTyCon];64949568;2706232;24;24.0
This suggests that we have 5894441 occurrences of `IfaceTyConInfo NotPromoted IfaceNormalTyCon` and 2706232 occurrences of `IfaceTyConInfo IsPromoted IfaceNormalTyCon` on the heap.
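As a rough cross-check of the census excerpt, the following sketch redoes the arithmetic on the two rows. The names (`censusRows`, `closureBytes`, and so on) are made up for illustration, and the figures are copied from the excerpt above. It only counts the `IfaceTyConInfo` closures themselves, not the thunks that point to them.

```haskell
-- One row of the census excerpt: (description, total bytes, object count).
censusRows :: [(String, Integer, Integer)]
censusRows =
  [ ("IfaceTyConInfo NotPromoted IfaceNormalTyCon", 141466584, 5894441)
  , ("IfaceTyConInfo IsPromoted IfaceNormalTyCon", 64949568, 2706232)
  ]

-- Size of a single closure in bytes, taken from the max/avg columns.
closureBytes :: Integer
closureBytes = 24

-- The total column should equal count * closure size for every row.
rowsConsistent :: Bool
rowsConsistent =
  all (\(_, total, count) -> total == count * closureBytes) censusRows

-- Bytes spent on all copies, bytes needed if each row were a single
-- shared closure, and the difference between the two.
totalBytes, sharedBytes, avoidableBytes :: Integer
totalBytes = sum [total | (_, total, _) <- censusRows]
sharedBytes = closureBytes * fromIntegral (length censusRows)
avoidableBytes = totalBytes - sharedBytes
```

Loaded into GHCi, `totalBytes` evaluates to 206416152, so the duplicated closures alone account for roughly 197 MiB in this run.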
Further, we can observe the thunk in `ghc-debug-brick`.
For reference, this is the space behaviour of loading Agda into ghci, where the bytecode has been serialised into the interface file, as reported by `ghci +RTS -s -i0.1`:
5,839,944,688 bytes allocated in the heap
7,203,898,048 bytes copied during GC
1,963,423,488 bytes maximum residency (12 sample(s))
14,932,224 bytes maximum slop
3786 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 275 colls, 0 par 3.503s 3.514s 0.0128s 0.2590s
Gen 1 12 colls, 0 par 4.976s 4.995s 0.4162s 2.4095s
TASKS: 6 (1 bound, 5 peak workers (5 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.009s ( 0.009s elapsed)
MUT time 3.265s ( 5.421s elapsed)
GC time 8.479s ( 8.509s elapsed)
EXIT time 0.001s ( 0.000s elapsed)
Total time 11.754s ( 13.940s elapsed)
Alloc rate 1,788,578,842 bytes per MUT second
Productivity 27.8% of total user, 38.9% of total elapsed
While it is a little unclear why only this field is a thunk, adding the bangs to `IfaceTyCon`
data IfaceTyCon = IfaceTyCon { ifaceTyConName :: !IfExtName
, ifaceTyConInfo :: !IfaceTyConInfo }
gives the following results for the two-level census:
key;total;count;max;avg
ghc-9.9-inplace:GHC.Iface.Type:IfaceTyConInfo[ghc-9.9-inplace:Language.Haskell.Syntax.Type:IsPromoted,ghc-9.9-inplace:GHC.Iface.Type:IfaceNormalTyCon];24;1;24;24.0
ghc-9.9-inplace:GHC.Iface.Type:IfaceTyConInfo[ghc-9.9-inplace:Language.Haskell.Syntax.Type:NotPromoted,ghc-9.9-inplace:GHC.Iface.Type:IfaceNormalTyCon];24;1;24;24.0
This indicates that we now have exactly one instance of each of the two common cases.
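To illustrate what the bangs change semantically, here is a minimal sketch with simplified stand-in types. None of these definitions are GHC's actual ones: `Promotion`, `Sort`, `Info`, `mkInfo`, `TyConLazy` and `TyConStrict` are hypothetical names, and the smart constructor only models the idea of returning one shared value for each common case. The sketch shows that a lazy field can hold an unevaluated computation, while a strict field is forced when the constructor is built. It does not measure heap sharing; the census is the evidence for that.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Simplified stand-ins for the promotion flag, the tycon sort and the info record.
data Promotion = NotPromoted | IsPromoted deriving (Eq, Show)
data Sort = NormalTyCon | OtherTyCon deriving (Eq, Show)
data Info = Info Promotion Sort deriving (Eq, Show)

-- One shared top-level value per common case (assumed design, for illustration).
sharedNotPromoted, sharedPromoted :: Info
sharedNotPromoted = Info NotPromoted NormalTyCon
sharedPromoted = Info IsPromoted NormalTyCon

-- Hypothetical smart constructor: common cases reuse the shared values.
mkInfo :: Promotion -> Sort -> Info
mkInfo NotPromoted NormalTyCon = sharedNotPromoted
mkInfo IsPromoted NormalTyCon = sharedPromoted
mkInfo p s = Info p s

-- Same record twice: once with lazy fields, once with strict fields.
data TyConLazy = TyConLazy { lazyName :: String, lazyInfo :: Info }
data TyConStrict = TyConStrict { strictName :: !String, strictInfo :: !Info }

-- True if building the lazy record to WHNF leaves the info field unevaluated.
lazyDefersInfo :: IO Bool
lazyDefersInfo = do
  r <- try (evaluate (TyConLazy "T" (error "info forced")))
         :: IO (Either ErrorCall TyConLazy)
  pure (either (const False) (const True) r)

-- True if building the strict record to WHNF evaluates the info field.
strictForcesInfo :: IO Bool
strictForcesInfo = do
  r <- try (evaluate (TyConStrict "T" (error "info forced")))
         :: IO (Either ErrorCall TyConStrict)
  pure (either (const True) (const False) r)
```

In GHCi, both `lazyDefersInfo` and `strictForcesInfo` should return `True`: the lazy record is built without touching its info field, whereas the strict record cannot be built without evaluating it, so no thunk can be stored in that field.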
More importantly, the output of `ghci +RTS -s -i0.1` is now:
6,114,619,032 bytes allocated in the heap
6,458,988,440 bytes copied during GC
1,330,672,456 bytes maximum residency (12 sample(s))
10,910,904 bytes maximum slop
2689 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 275 colls, 0 par 3.933s 3.945s 0.0143s 0.2844s
Gen 1 12 colls, 0 par 3.840s 3.854s 0.3212s 1.6629s
TASKS: 6 (1 bound, 5 peak workers (5 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.010s ( 0.010s elapsed)
MUT time 3.235s ( 3.728s elapsed)
GC time 7.773s ( 7.799s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 11.018s ( 11.537s elapsed)
Alloc rate 1,890,406,998 bytes per MUT second
Productivity 29.4% of total user, 32.3% of total elapsed
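To put the two `+RTS -s` reports side by side, this sketch redoes the arithmetic on the maximum residency and total memory figures quoted in them. The record and function names are made up for illustration, and a single run of each configuration is not a benchmark.

```haskell
-- Figures copied from the two RTS reports (without and with the bangs).
data RtsStats = RtsStats
  { maxResidencyBytes :: Integer
  , totalMemoryMiB :: Integer
  }

withoutBangs, withBangs :: RtsStats
withoutBangs = RtsStats 1963423488 3786
withBangs = RtsStats 1330672456 2689

-- Absolute drop in maximum residency, in bytes.
residencySaved :: Integer
residencySaved = maxResidencyBytes withoutBangs - maxResidencyBytes withBangs

-- Relative reduction from an old value to a new value, in percent.
percentReduction :: Integer -> Integer -> Double
percentReduction old new = 100 * fromIntegral (old - new) / fromIntegral old

residencyReduction, totalMemoryReduction :: Double
residencyReduction =
  percentReduction (maxResidencyBytes withoutBangs) (maxResidencyBytes withBangs)
totalMemoryReduction =
  percentReduction (totalMemoryMiB withoutBangs) (totalMemoryMiB withBangs)
```

In GHCi, `residencySaved` evaluates to 632751032 bytes, which is about a third of the original maximum residency.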
Environment
- GHC version used: HEAD
Optional:
- Operating System: Arch Linux
- System Architecture: x86_64