Draft: ghc-prim: Unpack strictly in chunks of 32 characters
Laziness is expensive, particularly for computationally light operations like unpackCString#. Currently, unpacking a single 8-bit character requires the allocation of one two-word closure, an indirect jump, and the allocation of a two-word C# constructor (and perhaps some cache misses if it has been a while since we last unpacked from the Addr# in question). This is pretty awful for something that is so frequently used.
This patch reworks unpackCString# and unpackAppendCString# to instead unpack in strict chunks (currently of 32 characters). This amortises the expensive lazy call across many characters. Moreover, allocation is cheap, so even if the characters end up not being needed, the cost is negligible.
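To make the idea concrete, here is a minimal sketch of chunked unpacking. It is not the code in this patch: the names unpackLazy and unpackChunked are hypothetical, the chunk size is hard-coded to 32, and the strict fill is written as plain recursion for readability. The first function is the one-thunk-per-character baseline; the second builds up to 32 cons cells eagerly and leaves a single lazy thunk for the rest of the string.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts
  ( Addr#, Char (C#), Int#, eqChar#, indexCharOffAddr#, isTrue#
  , (+#), (-#), (==#) )

-- Baseline (hypothetical name): the tail after every character is a
-- lazy thunk, so each character costs one thunk plus one entry.
unpackLazy :: Addr# -> [Char]
unpackLazy addr = go 0#
  where
    go :: Int# -> [Char]
    go off
      | isTrue# (ch `eqChar#` '\0'#) = []
      | otherwise                    = C# ch : go (off +# 1#)
      where
        !ch = indexCharOffAddr# addr off

-- Chunked sketch (hypothetical name): within a chunk the tail is forced
-- before the cons cell is built, so only every 32nd tail is a thunk.
unpackChunked :: Addr# -> [Char]
unpackChunked addr = chunk 0#
  where
    -- Entered lazily, once per chunk. 32# is the assumed chunk size.
    chunk :: Int# -> [Char]
    chunk off = fill off 32#

    -- Strictly unpack up to 'left' characters starting at 'off'.
    fill :: Int# -> Int# -> [Char]
    fill off left
      | isTrue# (ch `eqChar#` '\0'#) = []
      -- Last slot of the chunk: the tail stays a lazy thunk.
      | isTrue# (left ==# 1#)        = C# ch : chunk (off +# 1#)
      | otherwise                    =
          let !rest = fill (off +# 1#) (left -# 1#)
          in C# ch : rest
      where
        !ch = indexCharOffAddr# addr off

main :: IO ()
main = do
  -- Both strategies should agree, including on inputs longer than one chunk.
  print (unpackChunked "hello"# == "hello")
  print (unpackChunked "The quick brown fox jumps over the lazy dog, again and again."#
           == unpackLazy "The quick brown fox jumps over the lazy dog, again and again."#)
  -- Demanding a prefix forces only the first chunk.
  print (take 3 (unpackChunked "chunked unpacking"#))
```

The trade-off is visible in fill: demanding the first character of a chunk does the work for up to 31 more, which is wasted if they are never consumed, but that work is a handful of small allocations rather than a thunk entry per character.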
This improves compilation times of Cabal by 0.5%. Nofib tests are still running.
I have a set of benchmarks for characterising various unpacking strategies here.