Concurrent GHC sessions clobber RTS linker state
The following example program (derived from T3372
) illustrates how concurrent GHC sessions can clobber each other’s RTS linker state:
{-# LANGUAGE MagicHash #-}
module Main where
import Data.Foldable
import System.Environment
import GHC (Ghc)
import qualified GHC
import qualified GHC.Driver.Monad as GHC
import qualified GHC.Driver.Session as GHC
import qualified GHC.Platform.Ways as GHC
import qualified GHC.Exts
main :: IO ()
main = do
let test1 = "M1.hs"
let test2 = "M2.hs"
writeFile test1 "module M where x = 1"
writeFile test2 "module M where x = 2"
ghc_1 <- newGhcSession
ghc_2 <- newGhcSession
line "1" $ runInSession ghc_1 $ load (test1, "M")
line "2" $ runInSession ghc_2 $ load (test2, "M")
line "3" $ runInSession ghc_1 $ eval "x"
line "4" $ runInSession ghc_2 $ eval "x"
line "5" $ runInSession ghc_1 $ eval "x"
where
line n a = putStr (n ++ ": ") >> a
type ModuleName = String
newGhcSession :: IO GHC.Session
newGhcSession = do
(libdir:_) <- getArgs
session <- GHC.runGhc (Just libdir) (GHC.reifyGhc pure)
runInSession session $ do
df <- GHC.getSessionDynFlags
let platform = GHC.targetPlatform df
GHC.setSessionDynFlags $
foldl' (flip GHC.setGeneralFlag')
df{GHC.ghcMode = GHC.CompManager,
GHC.ghcLink = GHC.LinkInMemory,
GHC.targetWays_ = GHC.hostFullWays,
GHC.verbosity = 0}
(concatMap (GHC.wayGeneralFlags platform) GHC.hostFullWays)
pure session
runInSession :: GHC.Session -> Ghc a -> IO a
runInSession = flip GHC.reflectGhc
load :: (FilePath, ModuleName) -> Ghc ()
load (f, mn) = do
target <- GHC.guessTarget f Nothing Nothing
GHC.setTargets [target]
res <- GHC.load GHC.LoadAllTargets
GHC.liftIO $ putStrLn ("Load " ++ showSuccessFlag res)
GHC.setContext
[ GHC.IIDecl $ GHC.simpleImportDecl $ GHC.mkModuleName "Prelude"
, GHC.IIDecl $ GHC.simpleImportDecl $ GHC.mkModuleName mn ]
where
showSuccessFlag GHC.Succeeded = "succeeded"
showSuccessFlag GHC.Failed = "failed"
eval :: String -> Ghc ()
eval e = do
show_e <- GHC.compileExpr $ "(show ("++ e ++")) :: String"
GHC.liftIO $ putStrLn (GHC.Exts.unsafeCoerce# show_e)
Compiling and running this program yields the following output:
$ ghc -dynamic -package ghc Main
[1 of 2] Compiling Main ( Main.hs, Main.o )
[2 of 2] Linking Main [Objects changed]
$ ./Main "$(ghc --print-libdir)"
1: Load succeeded
2: Load succeeded
3: 1
4: 2
5: 2
The final line of output is the bug; it should print 1
, but instead it prints 2
.
Explanation
The above program creates two concurrent GHC sessions and loads a module into each one. The modules share the same name and export the same symbol, but the symbol has a different value in each module.
Since the backend is the NCG and the program is dynamically-linked, each GHC session compiles the loaded module to a .o
file, then links the .o
file into a .so
file on demand. The first time the two sessions each evaluate a reference to the loaded symbol, the results are 1
and 2
, respectively, which is correct. However, when the symbol is evaluated in the first session a second time, the 2
value from the second session is printed, instead. In effect, the module loaded in the second session has overwritten the module loaded in the first session, which is quite surprising.
This symbol clobbering only affects symbols loaded from shared objects, and it only affects the symbols resolved from an interpreted context. Even if symbols are “overwritten” in this way, references from native code will still refer to the original, correct symbols. However, any references in newly-created bytecode will be incorrect, and references from existing bytecode objects will become incorrect if the bytecode objects are relinked. Note also that the clobbering is truly on a symbol-by-symbol basis: if the second module were to export some, but not all, of the first module’s symbols, the values of those symbols would be selectively clobbered, leaving the first session in an inconsistent state. If the replaced symbols’ types differ from their replacements, memory corruption or segfaults are likely to occur.
Cause
Currently, the state of the RTS linker is process-global. This includes the RTS symbol table, the list of loaded code objects, and the list of loaded shared libraries. GHC generally expects that it is the exclusive client of the RTS linker, and this assumption is essentially always correct, but concurrent GHC sessions break this assumption, which can result in various misbehaviors.
When the RTS linker is used to load static .o
files or .a
archives, it does all the work of loading the objects itself, including maintaining its own symbol table. For this reason, if the above example is compiled without the -dynamic
option, the outcome is quite different:
$ ghc -package ghc Main
[1 of 2] Compiling Main ( Main.hs, Main.o )
[2 of 2] Linking Main [Objects changed]
$ ./Main "$(ghc --print-libdir)"
1: Load succeeded
2: Load succeeded
3: 1
GHC runtime linker: fatal error: I found a duplicate definition for symbol
M_x_closure
whilst processing object file
/tmp/haskell/M2.o
The symbol was previously defined in
/tmp/haskell/M1.o
This could be caused by:
* Loading two different object files which export the same symbol
* Specifying the same object file twice on the GHCi command line
* An incorrect `package.conf' entry, causing some object to be
loaded twice.
4: Main: loadObj "/tmp/haskell/M2.o": failed
Since the RTS maintains its own symbol table in this configuration, it can detect the symbol collision and report the error. This behavior is arguably still wrong—there is no fundamental reason that the two sessions must share a symbol table, and other session state is kept separate—but it’s at least less mysterious.
In contrast, when the RTS loads a shared library, it defers the work to the system dynamic linker (via dlopen
on Linux and macOS). System dynamic linkers do not provide APIs to obtain the full list of symbols provided by a shared library, so the RTS cannot possibly determine whether two libraries provide conflicting symbols. When a symbol is looked up, each library is tried in order until the symbol is found, starting from the library that was most recently loaded. Within a single interpreter session, this strategy provides the generally-desirable property that new symbols shadow old ones (since most symbol conflicts arise from loading a new version of the same code), but it is much less defensible if multiple sessions exist in the same process.