Thunks in `IfaceTyCon` retain a large amount of memory

Summary

Heap analysis shows that the IfaceTyCon constructor in mi_extra_decls retains a thunk to IfaceTyConInfo. This extra indirection seems to defeat the sharing of the two most common IfaceTyConInfo values, which is meant to reduce memory usage considerably.

All numbers and pictures were obtained by running ghci -fbyte-code-and-object-code, using ghc-debug and -hT profiling. GHC itself was compiled with the flavour perf+ipe.

In a two-level census heap traversal, we have the following lines (just an excerpt):

key;total;count;max;avg
ghc-9.9-inplace:GHC.Iface.Type:IfaceTyConInfo[ghc-9.9-inplace:Language.Haskell.Syntax.Type:NotPromoted,ghc-9.9-inplace:GHC.Iface.Type:IfaceNormalTyCon];141466584;5894441;24;24.0
ghc-9.9-inplace:GHC.Iface.Type:IfaceTyConInfo[ghc-9.9-inplace:Language.Haskell.Syntax.Type:IsPromoted,ghc-9.9-inplace:GHC.Iface.Type:IfaceNormalTyCon];64949568;2706232;24;24.0

This suggests that there are 5894441 occurrences of IfaceTyConInfo NotPromoted IfaceNormalTyCon and 2706232 occurrences of IfaceTyConInfo IsPromoted IfaceNormalTyCon on the heap.
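As a quick sanity check on the census figures: each total equals the count times 24 bytes, which is what a constructor with two pointer fields occupies on a 64-bit platform (one header word plus two field words). A minimal sketch of the arithmetic, with all names invented here for illustration:

```haskell
-- Figures copied from the census excerpt above.
-- The names below are illustrative and do not come from GHC or ghc-debug.

-- Size of one IfaceTyConInfo closure on a 64-bit platform:
-- one header word plus two field pointers.
closureBytes :: Integer
closureBytes = 24

notPromotedCount, isPromotedCount :: Integer
notPromotedCount = 5894441
isPromotedCount  = 2706232

-- Bytes occupied by n closures of this constructor.
totalBytes :: Integer -> Integer
totalBytes n = n * closureBytes

-- Combined footprint of the two duplicated values, in bytes.
duplicatedBytes :: Integer
duplicatedBytes = totalBytes notPromotedCount + totalBytes isPromotedCount
```

Together the two duplicated values account for 206416152 bytes (about 197 MiB) of heap in this run.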

Furthermore, the thunk itself can be observed in ghc-debug-brick:

image

For reference, this is the space behaviour of loading agda into ghci, where the bytecode has been serialised into the interface file:

image

and the output of ghci +RTS -s -i0.1:


   5,839,944,688 bytes allocated in the heap
   7,203,898,048 bytes copied during GC
   1,963,423,488 bytes maximum residency (12 sample(s))
      14,932,224 bytes maximum slop
            3786 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       275 colls,     0 par    3.503s   3.514s     0.0128s    0.2590s
  Gen  1        12 colls,     0 par    4.976s   4.995s     0.4162s    2.4095s

  TASKS: 6 (1 bound, 5 peak workers (5 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.009s  (  0.009s elapsed)
  MUT     time    3.265s  (  5.421s elapsed)
  GC      time    8.479s  (  8.509s elapsed)
  EXIT    time    0.001s  (  0.000s elapsed)
  Total   time   11.754s  ( 13.940s elapsed)

  Alloc rate    1,788,578,842 bytes per MUT second

  Productivity  27.8% of total user, 38.9% of total elapsed

While it is somewhat unclear why only this field is a thunk, adding bangs to the fields of IfaceTyCon

data IfaceTyCon = IfaceTyCon { ifaceTyConName :: !IfExtName
                             , ifaceTyConInfo :: !IfaceTyConInfo }
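The effect of the bangs can be illustrated with a small stand-alone sketch. The types below are simplified stand-ins, not GHC's definitions, and the helper names are hypothetical. The sketch only shows the general mechanism: a lazy field can hold an unevaluated thunk, whereas a strict field is forced when the constructor is applied, so no thunk can be stored in it.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Simplified stand-in for IfaceTyConInfo (NOT GHC's actual definition).
data Info = Info Bool Bool

-- Lazy fields: the constructor may store an unevaluated thunk.
data LazyTyCon = LazyTyCon
  { lazyName :: String
  , lazyInfo :: Info
  }

-- Strict fields: each field is evaluated to WHNF on construction.
data StrictTyCon = StrictTyCon
  { strictName :: !String
  , strictInfo :: !Info
  }

-- Evaluate a value to WHNF and report whether that succeeded
-- without raising an exception.
reachesWhnf :: a -> IO Bool
reachesWhnf x = do
  r <- try (evaluate x)
  pure (either (\e -> const False (e :: SomeException)) (const True) r)

-- 'undefined' plays the role of an arbitrary unevaluated thunk.
-- The lazy constructor stores it untouched, so construction succeeds.
lazyKeepsThunk :: IO Bool
lazyKeepsThunk = reachesWhnf (LazyTyCon "T" undefined)

-- The strict constructor forces the field, so the thunk is evaluated
-- (and here fails) at construction time instead of being retained.
strictKeepsThunk :: IO Bool
strictKeepsThunk = reachesWhnf (StrictTyCon "T" undefined)
```

In GHCi, lazyKeepsThunk returns True and strictKeepsThunk returns False. In the real data type the forced field evaluates to an ordinary IfaceTyConInfo value, so the constructor points directly at it rather than at a thunk.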

gives the following results for the two-level census:

key;total;count;max;avg
ghc-9.9-inplace:GHC.Iface.Type:IfaceTyConInfo[ghc-9.9-inplace:Language.Haskell.Syntax.Type:IsPromoted,ghc-9.9-inplace:GHC.Iface.Type:IfaceNormalTyCon];24;1;24;24.0
ghc-9.9-inplace:GHC.Iface.Type:IfaceTyConInfo[ghc-9.9-inplace:Language.Haskell.Syntax.Type:NotPromoted,ghc-9.9-inplace:GHC.Iface.Type:IfaceNormalTyCon];24;1;24;24.0

This indicates that there is now exactly one instance of each of the two common cases.

More importantly, the heap profile improves as well:

image

ghci +RTS -s -i0.1 output:

   6,114,619,032 bytes allocated in the heap
   6,458,988,440 bytes copied during GC
   1,330,672,456 bytes maximum residency (12 sample(s))
      10,910,904 bytes maximum slop
            2689 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       275 colls,     0 par    3.933s   3.945s     0.0143s    0.2844s
  Gen  1        12 colls,     0 par    3.840s   3.854s     0.3212s    1.6629s

  TASKS: 6 (1 bound, 5 peak workers (5 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.010s  (  0.010s elapsed)
  MUT     time    3.235s  (  3.728s elapsed)
  GC      time    7.773s  (  7.799s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   11.018s  ( 11.537s elapsed)

  Alloc rate    1,890,406,998 bytes per MUT second

  Productivity  29.4% of total user, 32.3% of total elapsed
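
To put the two RTS reports side by side, the following sketch computes the change in maximum residency and in total memory in use. The figures are copied from the two outputs above; the names are made up for this illustration.

```haskell
-- Maximum residency in bytes, before and after adding the bangs.
residencyBefore, residencyAfter :: Integer
residencyBefore = 1963423488
residencyAfter  = 1330672456

-- Total memory in use in MiB, before and after adding the bangs.
totalMiBBefore, totalMiBAfter :: Integer
totalMiBBefore = 3786
totalMiBAfter  = 2689

-- Absolute saving between two measurements.
saved :: Integer -> Integer -> Integer
saved before after = before - after

-- Relative saving as a percentage of the "before" value.
savedPercent :: Integer -> Integer -> Double
savedPercent before after =
  100 * fromIntegral (saved before after) / fromIntegral before
```

For example, saved residencyBefore residencyAfter is 632751032 bytes, which savedPercent reports as roughly 32% of the original maximum residency, and saved totalMiBBefore totalMiBAfter is 1097 MiB.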

Environment

  • GHC version used: HEAD

Optional:

  • Operating System: Arch Linux
  • System Architecture: x86_64