StgToByteCode does not support external references to top-level strings, which causes linker errors with -O -fprefer-byte-code
Given the following three modules, running ghc --make -O1 -fwrite-if-simplfied-core -fbyte-code-and-object-code -fprefer-byte-code A.hs
triggers a linker error:
{-# LANGUAGE TemplateHaskell #-}
module A where
import B
$(foo `seq` pure [])
module B where
import C
foo :: String
foo = f ""
{-# NOINLINE foo #-}
module C where
f :: String -> String
f = ("a" ++)
error: GHC.ByteCode.Linker.lookupCE
During interactive linking, GHCi couldn't find the following symbol:
C_f1_closure
No such symbol exists. Compiling with -ddump-simpl
reveals that C.f1
is a top-level Addr#
literal:
C.f1 :: GHC.Prim.Addr#
C.f1 = "a"#
f :: String -> String
f = GHC.CString.unpackAppendCString# C.f1
In StgToCmm, this becomes a symbol named C_f1_bytes
that points to a static data block. In StgToByteCode, top-level Addr#
literals do not result in distinct linkables at all—pointers into a malloc
-allocated region of memory containing all of the top-level Addr#
literals in the module are embedded directly the instruction stream.
The strategy used by StgToByteCode works fine when compiling module C
to bytecode, but it creates trouble when compiling module B
, because GHC is happy to inline C.f
into B.foo
. This results in STG code that references C.f1
directly, which StgToByteCode cannot cope with. It assumes all external references are references to lifted values (and indeed, the only exception is top-level Addr#
bindings, as mentioned by Note [Core top-level string literals]
in GHC.Core
), which is naturally why it tries to link against a non-existent C_f1_closure
symbol. Note that there are really two distinct problems here:
-
The most immediate issue is that StgToByteCode assumes all external references are to closures. Since the simplifier can produce these references, StgToByteCode really ought to handle external references to
Addr#
specially, just as StgToCmm does here. -
Unfortunately, generating a reference to a top-level
Addr#
binding that lives in another module that has itself been compiled to bytecode is currently impossible. Loaded BCOs are recorded in aClosureEnv
, and as the name suggests, it only contains closures. The addresses ofAddr#
literals are not actually recorded anywhere in fully-assembled BCOs, so they cannot be linked against.
Historically, this has not been an issue, since top-level Addr#
literals do not exist at the Haskell level, only in Core, so code compiled with -O0
does not include any of these references and therefore does not have this problem. But the addition of -fbyte-code-and-object-code
means we now sometimes run StgToByteCode on optimized Core, so bytecode ought to support this properly.