Do not make CAFs from literal strings

Currently (as I discovered in #15038 (closed)), we get the following code for Control.Exception.Base.patError:

lvl2_r3y3 :: [Char]
[GblId]
lvl2_r3y3 = unpackCString# lvl1_r3y2

-- RHS size: {terms: 7, types: 6, coercions: 2, joins: 0/0}
patError :: forall a. Addr# -> a
[GblId, Arity=1, Str=<B,U>x, Unf=OtherCon []]
patError
  = \ (@ a_a2kh) (s_a1Pi :: Addr#) ->
      raise#
        @ SomeException
        @ 'LiftedRep
        @ a_a2kh
        (Control.Exception.Base.$fExceptionPatternMatchFail_$ctoException
           ((untangle s_a1Pi lvl2_r3y3)
            `cast` (Sym (Control.Exception.Base.N:PatternMatchFail[0])
                    :: (String :: *) ~R# (PatternMatchFail :: *))))

That stupid lvl2_r3y3 :: String is a CAF, and hence patError has CAF-refs, and hence so does any function that calls patError, and any function that calls them.

That's bad! Lots more CAF entries in SRTs, lots more work traversing those SRTs in the garbage collector. And for what? To share the work of unpacking a C string! This is nuts.
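
To make the propagation concrete, here is a rough source-level sketch of the same shape. The names lvl, failAt and firstOrFail are hypothetical stand-ins (for lvl2_r3y3, patError and an arbitrary caller), and it assumes GHC.CString from ghc-prim is importable. Whether GHC keeps lvl as a separate top-level binding depends on the flags and compiler version.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.CString (unpackCString#)

-- Hypothetical stand-in for lvl2_r3y3: a top-level binding with no
-- arguments, i.e. a CAF. It is evaluated once and updated in place.
lvl :: String
lvl = unpackCString# "pattern match failure"#
{-# NOINLINE lvl #-}

-- Hypothetical stand-in for patError: it mentions lvl, so it has a
-- CAF reference.
failAt :: String -> a
failAt loc = error (loc ++ ": " ++ lvl)

-- Never mentions lvl itself, but inherits the CAF reference through
-- its call to failAt. The same goes for anything that calls it.
firstOrFail :: [a] -> a
firstOrFail (x : _) = x
firstOrFail []      = failAt "firstOrFail"

main :: IO ()
main = print (firstOrFail [1 :: Int, 2, 3])
```

Every function in that chain ends up with an SRT entry leading to lvl, even though only the failure path ever touches it.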

What to do?

  1. Somehow refrain from floating unpackCString# lit to top level, even if you could otherwise do so. But that seems very ad-hoc, and it makes the function bigger and less inlinable.

  2. Treat a top level definition

    x :: [Char]
    x = unpackCString# y

    as NOT a CAF, and make it single-entry so that the thunk is not updated. Then every use of x will unpack the string afresh, which is probably a good idea anyhow.

    I like this more. It would be implemented somewhere in the code generator.
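
As a rough model of what option 2 would mean for evaluation (not how the code generator would implement it), the sketch below contrasts a shared top-level string with one that is rebuilt on each use. sharedMsg and freshMsg are hypothetical names, and the unit argument is only a source-level trick to avoid a shared thunk. With optimisation on, full laziness may float the body of freshMsg back to top level, so this assumes -O0 or -fno-full-laziness.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.CString (unpackCString#)

-- Today's behaviour: an updatable top-level thunk (a CAF). The string
-- is unpacked once and the result is retained for later uses.
sharedMsg :: String
sharedMsg = unpackCString# "pattern match failure"#

-- Model of the proposal: nothing is updated, so each use unpacks the
-- C string again. The () argument is an assumption of this sketch.
freshMsg :: () -> String
freshMsg () = unpackCString# "pattern match failure"#
{-# NOINLINE freshMsg #-}

main :: IO ()
main = do
  -- Both produce the same list of characters; they differ only in
  -- whether the unpacked result is kept alive between uses.
  print (sharedMsg == freshMsg ())
  putStrLn (freshMsg ())
```

The cost is repeated unpacking, which should be cheap next to the SRT and GC work saved, especially for strings used only on error paths.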

Edited by Ömer Sinan Ağacan