Skip to content
GitLab
Projects Groups Topics Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Register
  • Sign in
  • GHC GHC
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributor statistics
    • Graph
    • Compare revisions
    • Locked files
  • Issues 5.6k
    • Issues 5.6k
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
  • Merge requests 663
    • Merge requests 663
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Artifacts
    • Schedules
    • Test cases
  • Deployments
    • Deployments
    • Releases
  • Packages and registries
    • Packages and registries
    • Model experiments
  • Analytics
    • Analytics
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • Glasgow Haskell CompilerGlasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #22376

StgToByteCode does not support external references to top-level strings, which causes linker errors with -O -fprefer-byte-code

Given the following three modules, running ghc --make -O1 -fwrite-if-simplfied-core -fbyte-code-and-object-code -fprefer-byte-code A.hs triggers a linker error:

{-# LANGUAGE TemplateHaskell #-}
module A where
import B

$(foo `seq` pure [])
module B where
import C

foo :: String
foo = f ""
{-# NOINLINE foo #-}
module C where

f :: String -> String
f = ("a" ++)
error: GHC.ByteCode.Linker.lookupCE
During interactive linking, GHCi couldn't find the following symbol:
  C_f1_closure

No such symbol exists. Compiling with -ddump-simpl reveals that C.f1 is a top-level Addr# literal:

C.f1 :: GHC.Prim.Addr#
C.f1 = "a"#

f :: String -> String
f = GHC.CString.unpackAppendCString# C.f1

In StgToCmm, this becomes a symbol named C_f1_bytes that points to a static data block. In StgToByteCode, top-level Addr# literals do not result in distinct linkables at all—pointers into a malloc-allocated region of memory containing all of the top-level Addr# literals in the module are embedded directly the instruction stream.

The strategy used by StgToByteCode works fine when compiling module C to bytecode, but it creates trouble when compiling module B, because GHC is happy to inline C.f into B.foo. This results in STG code that references C.f1 directly, which StgToByteCode cannot cope with. It assumes all external references are references to lifted values (and indeed, the only exception is top-level Addr# bindings, as mentioned by Note [Core top-level string literals] in GHC.Core), which is naturally why it tries to link against a non-existent C_f1_closure symbol. Note that there are really two distinct problems here:

  1. The most immediate issue is that StgToByteCode assumes all external references are to closures. Since the simplifier can produce these references, StgToByteCode really ought to handle external references to Addr# specially, just as StgToCmm does here.

  2. Unfortunately, generating a reference to a top-level Addr# binding that lives in another module that has itself been compiled to bytecode is currently impossible. Loaded BCOs are recorded in a ClosureEnv, and as the name suggests, it only contains closures. The addresses of Addr# literals are not actually recorded anywhere in fully-assembled BCOs, so they cannot be linked against.

Historically, this has not been an issue, since top-level Addr# literals do not exist at the Haskell level, only in Core, so code compiled with -O0 does not include any of these references and therefore does not have this problem. But the addition of -fbyte-code-and-object-code means we now sometimes run StgToByteCode on optimized Core, so bytecode ought to support this properly.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Time tracking