Improve memory efficiency of `UnlinkedBCO`

There are quite a number of UnlinkedBCOs alive during a GHCi session. For example, loading agda into GHCi, there are around 360050 instances of UnlinkedBCO.

Currently UnlinkedBCO is defined like this:

data UnlinkedBCO
   = UnlinkedBCO {
        unlinkedBCOName   :: !Name,
        unlinkedBCOArity  :: {-# UNPACK #-} !Int,
        unlinkedBCOInstrs :: !(UArray Int Word16),      -- insns
        unlinkedBCOBitmap :: !(UArray Int Word64),      -- bitmap
        unlinkedBCOLits   :: !(SizedSeq BCONPtr),       -- non-ptrs
        unlinkedBCOPtrs   :: !(SizedSeq BCOPtr)         -- ptrs
   }

Saving a couple of Words per UnlinkedBCO can quickly accumulate to tens of megabytes.

This issue discusses two improvements that are relatively non-invasive.

Unpack `UArray Int`

The PR !12142 (closed) implements this improvement

The UArray of unlinkedBCOInstrs and unlinkedBCOBitmap is never used, only its packing of Word16's and Word64's is required. Thus, we can replace UArray Int Word* with a wrapper around the underlying ByteArray# allowing us to save 6 Ints per UnlinkedBCO. In the example of agda, this saves us roughly 360050 * 6 Ints ~= 17282400 ~ 17MB. I am unsure whether an UnliftedNewtype removes even one more indirection, which would reduce the memory 7MB.

Flatten `SizedSeq`

The PR !12170 (closed) implements this improvement.

SizedSeq a is a simple wrapper around [a] that keeps track of the number of elements and supports efficient appending (by keeping the list in reverse). However, as such, the size is only needed once, and the trivial case of SizedSeq 0 [] isn't shared. To add insult to injury, lists are quite inefficient, and the unlinkedBCOLits and unlinkedBCOPtrs are only ever traversed and not filtered or iteratively consumed / constructed.

We show the size distributions of all SizedSeq's, which we obtained using ghc-debug scripts:

      #  Length of SizedSeq
 283147  0 
 198237  1 
 126478  2 
  69233  3 
  26596  4 
   5587  5 
   3903  6 
   2943  7

The majority of cases is the empty list, singleton and tuple, so this lack of sharing is noticeable. We cut off the end of the table, there are rarely larger lists. I am including the full list for completeness: unlinkedbco-sseq

Note, not all SizedSeq's are utilised by UnlinkedBCO, but most of them:

ghc-9.9-inplace:GHC.ByteCode.Types:UnlinkedBCO[ghc-9.9-inplace:GHC.Types.Name:Name,ARR_WORDS,ARR_WORDS,ghc-boot-9.9-inplace:GHC.Data.SizedSeq:SizedSeq,ghc-boot-9.9-inplace:GHC.Data.SizedSeq:SizedSeq]:25923600:360050:72:72.0
ghc-boot-9.9-inplace:GHC.Data.SizedSeq:SizedSeq[ghc-prim:GHC.Types::]:17478120:436953:40:40.0
ghc-boot-9.9-inplace:GHC.Data.SizedSeq:SizedSeq[ghc-prim:GHC.Types:[]]:11325920:283148:40:40.0

This says that there are 436953 + 283148 = 720101 SizedSeqs, and 360050 * 2 = 720100 are retained by UnlinkedBCO.

Thus, we propose to introduce a new data type that is specialised for this payload, called FlatBag which efficiently stores elements and where the empty case can be shared.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information

Improve memory efficiency of `UnlinkedBCO`

Unpack UArray Int

Flatten SizedSeq

Unpack `UArray Int`

Flatten `SizedSeq`