Improve memory efficiency of `UnlinkedBCO`
There are quite a number of UnlinkedBCO
s alive during a GHCi session. For example, loading agda into GHCi, there are around 360050
instances of UnlinkedBCO
.
Currently UnlinkedBCO
is defined like this:
data UnlinkedBCO
= UnlinkedBCO {
unlinkedBCOName :: !Name,
unlinkedBCOArity :: {-# UNPACK #-} !Int,
unlinkedBCOInstrs :: !(UArray Int Word16), -- insns
unlinkedBCOBitmap :: !(UArray Int Word64), -- bitmap
unlinkedBCOLits :: !(SizedSeq BCONPtr), -- non-ptrs
unlinkedBCOPtrs :: !(SizedSeq BCOPtr) -- ptrs
}
Saving a couple of Words per UnlinkedBCO
can quickly accumulate to tens of megabytes.
This issue discusses two improvements that are relatively non-invasive.
UArray Int
Unpack The PR !12142 (closed) implements this improvement
The UArray
of unlinkedBCOInstrs
and unlinkedBCOBitmap
is never used, only its packing of Word16
's and Word64
's is required.
Thus, we can replace UArray Int Word*
with a wrapper around the underlying ByteArray#
allowing us to save 6 Ints per UnlinkedBCO
.
In the example of agda, this saves us roughly 360050 * 6 Ints ~= 17282400 ~ 17MB
. I am unsure whether an UnliftedNewtype
removes even one more indirection, which would reduce the memory 7MB
.
SizedSeq
Flatten The PR !12170 (closed) implements this improvement.
SizedSeq a
is a simple wrapper around [a]
that keeps track of the number of elements and supports efficient appending (by keeping the list in reverse).
However, as such, the size is only needed once, and the trivial case of SizedSeq 0 []
isn't shared. To add insult to injury, lists are quite inefficient, and the unlinkedBCOLits
and unlinkedBCOPtrs
are only ever traversed and not filtered or iteratively consumed / constructed.
We show the size distributions of all SizedSeq
's, which we obtained using ghc-debug
scripts:
# Length of SizedSeq
283147 0
198237 1
126478 2
69233 3
26596 4
5587 5
3903 6
2943 7
The majority of cases is the empty list, singleton and tuple, so this lack of sharing is noticeable. We cut off the end of the table, there are rarely larger lists. I am including the full list for completeness: unlinkedbco-sseq
Note, not all SizedSeq
's are utilised by UnlinkedBCO
, but most of them:
ghc-9.9-inplace:GHC.ByteCode.Types:UnlinkedBCO[ghc-9.9-inplace:GHC.Types.Name:Name,ARR_WORDS,ARR_WORDS,ghc-boot-9.9-inplace:GHC.Data.SizedSeq:SizedSeq,ghc-boot-9.9-inplace:GHC.Data.SizedSeq:SizedSeq]:25923600:360050:72:72.0
ghc-boot-9.9-inplace:GHC.Data.SizedSeq:SizedSeq[ghc-prim:GHC.Types::]:17478120:436953:40:40.0
ghc-boot-9.9-inplace:GHC.Data.SizedSeq:SizedSeq[ghc-prim:GHC.Types:[]]:11325920:283148:40:40.0
This says that there are 436953 + 283148 = 720101
SizedSeqs, and 360050 * 2 = 720100
are retained by UnlinkedBCO
.
Thus, we propose to introduce a new data type that is specialised for this payload, called FlatBag
which efficiently stores elements and where the empty case can be shared.