The profiled RTS has a concurrency bug due to missing mutex protection around refreshProfilingCCSs. This resolves issue #24423.
It would be nice to have a test case, but this change makes sense to me, and the bug seems difficult to reproduce reliably. It would also be nice if we had some way to assert that a mutex is held (or not held) in the -debug RTS, both for debugging and for readability. That’s probably outside the scope of this patch, though, so I’m happy to approve this.
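Such a mutex-held assertion could be sketched along these lines. This is a hypothetical debug-only helper, not an existing RTS API; all names (debug_mutex_t, DEBUG_ASSERT_HELD) are illustrative:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* A mutex wrapper that records its current owner, so debug builds can
 * assert that the calling thread holds (or does not hold) the lock. */
typedef struct {
    pthread_mutex_t lock;
    pthread_t       owner;
    bool            held;
} debug_mutex_t;

static void debug_mutex_acquire(debug_mutex_t *m) {
    pthread_mutex_lock(&m->lock);
    /* Only the owner writes these fields, and only while holding the lock. */
    m->owner = pthread_self();
    m->held  = true;
}

static void debug_mutex_release(debug_mutex_t *m) {
    m->held = false;
    pthread_mutex_unlock(&m->lock);
}

/* Assert that the calling thread currently holds the mutex. */
#define DEBUG_ASSERT_HELD(m) \
    assert((m)->held && pthread_equal((m)->owner, pthread_self()))
```

A function like refreshProfilingCCSs could then start with DEBUG_ASSERT_HELD(&linker_mutex) in the -debug RTS, making the locking protocol both checked and self-documenting.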
If I had to guess, I’d say that it seems likely related to the use of -rdynamic. That flag is documented in the User’s Guide, which says this:

This instructs the linker to add all symbols, not only used ones, to the dynamic symbol table. Currently Linux and Windows/MinGW32 only. This is equivalent to using -optl -rdynamic on Linux, and -optl -export-all-symbols on Windows.

Emphasis mine. The reason that flag is necessary is that this test case loads code dynamically using the RTS linker API, so GHC needs to keep all the symbols in base around so that the loaded code can link against them. If the test case were compiled using -dynamic, this would all work out automatically, because then we’d be using true shared objects, and the symbols would definitely be available in libHSbase-xxx.so. However, -prof is currently incompatible with -dynamic, and this test case is fundamentally about -prof.
Somewhat confusingly, it doesn’t seem that -rdynamic is actually an option accepted by ld. Rather, it is an option accepted by gcc that causes it to pass --export-dynamic to ld. This suggests to me that the User’s Guide text is not entirely correct, unless I misunderstand what -optl does and it actually passes flags to the C compiler in linking mode (I have not yet bothered to check). Regardless, it seems highly likely to me that OpenBSD’s ld also supports --export-dynamic. One way to make the test case pass would therefore be to extend GHC’s -rdynamic flag to support OpenBSD in the same way it currently handles Linux.
This appears to have been fixed by !11656 (closed).
I have discovered a minimal program that causes GHCi to segfault when GHC itself has been compiled with profiling (more specifically, using the perf+profiled_ghc+no_dynamic_ghc flavour). Here is the code:
-- src/A.hs
module A where

foo :: () -> ()
foo = {-# SCC "foo/expr" #-} (\x -> x)
{-# SCC foo #-}

-- src/B.hs
module B where

import A

bar :: ()
bar = foo ()
Unfortunately, I’ve only managed to trigger the segfault if the code is loaded as a .a archive, which requires a little setup, since the archive must be registered as a package. This package configuration will suffice:
name: example
version: 0.0.0
visibility: public
id: example
key: example
abi: inplace
exposed: True
exposed-modules: A B
import-dirs: build
library-dirs: build
library-dirs-static: build
dynamic-library-dirs: build
hs-libraries: HSexample
depends: base-4.18.0.0-inplace
Then we can run the following commands to build the package and trigger the segfault:
$ ghc --make -i -isrc A B -prof -osuf p_o -hisuf p_hi -outputdir build -this-unit-id example
[1 of 2] Compiling A ( src/A.hs, build/A.p_o )
[2 of 2] Compiling B ( src/B.hs, build/B.p_o )
$ ar -r build/libHSexample_p.a build/A.p_o build/B.p_o
ar: creating build/libHSexample_p.a
$ ghc-pkg recache --package-db package.db
$ ghc --interactive -package-db package.db -package-id example
GHCi, version 9.7.20230512: https://www.haskell.org/ghc/ :? for help
ghci> import B
ghci> bar
fish: Job 1, '/home/alexis/code/haskell/ghc/m…' terminated by signal SIGSEGV (Address boundary error)
This seems likely to be a bug in the RTS linker, since the issue goes away if the code is loaded from the .p_o files directly.
To make it easier for other people to reproduce this issue, I’ve created an archive containing everything necessary to trigger the problem: standalone.zip. Extract the archive, enter the root directory, and run the following command:
env GHC=path/to/your/profiled/ghc ./build.sh
This should reliably trigger the segfault on GHC HEAD when compiled with the perf+profiled_ghc+no_dynamic_ghc flavour.
Alexis King (0ade483c) at 25 Jan 17:35
wip: use loadNativeObj to implement addDLL
I’ve encountered ghc: internal error: LDV_recordDead: Failed to find counter for closure when profiling ghci on a large project.
Minimized reproducer (reproduced on current master dd88a260):

1. Build GHC with ./hadrian/build -j --flavour=perf+profiled_ghc+no_dynamic_ghc
2. Unpack profiling_bug.zip.
3. Run cabal repl bug -w <PATH_TO_PROFILED_GHC> --repl-options='+RTS -hm -hbdrag -RTS' < /dev/null

It is also possible to get a segfault instead, by uncommenting code in A.hs.
The obvious fix for this particular bug is to make the RTS linker state less global. This sounds difficult, but I think it probably isn’t actually that bad:
The RTS linker is essentially only ever used by GHC and GHCi, which already manage a great deal of per-session state. It would be easy enough to provide a linker API that returns a handle to a fresh linking context that maintains its own symbol table and list of loaded shared libraries.
Other parts of the RTS do not generally rely on the linker state being global. The one exception to this is the garbage collector, which uses a global table of mapped code addresses to trace references to live code objects. This table can easily be maintained even if the rest of the linker state is not global.
Though this would require a completely new API to the RTS linker, it would be possible to maintain backwards compatibility with whatever non-GHC users exist (of which there are probably not many) by providing a global linking context that is implicitly used by the legacy API.
The actual mapping of code objects in memory could be shared process-wide using a reference counting scheme to reduce memory usage. (In fact, this is precisely what system dynamic linkers do.)
However, my impression is that running multiple GHC sessions within a process is currently mostly a curiosity, not something anyone is actually clamoring for. So this doesn’t seem very important to do anytime soon.
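The handle-based design described above could look roughly like this. Everything here is an illustrative sketch, not GHC’s actual interface: each LinkerContext owns its own symbol table, while a legacy global context keeps the old process-global entry points working unchanged:

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SYMBOLS 64

/* Per-session linker state: a private symbol table. A real version would
 * also track loaded objects and shared libraries per context. */
typedef struct {
    const char *names[MAX_SYMBOLS];
    void       *addrs[MAX_SYMBOLS];
    int         count;
} LinkerContext;

LinkerContext *newLinkerContext(void) {
    return calloc(1, sizeof(LinkerContext));
}

int ctxInsertSymbol(LinkerContext *ctx, const char *name, void *addr) {
    if (ctx->count == MAX_SYMBOLS) return 0;
    ctx->names[ctx->count] = name;
    ctx->addrs[ctx->count] = addr;
    ctx->count++;
    return 1;
}

void *ctxLookupSymbol(LinkerContext *ctx, const char *name) {
    for (int i = 0; i < ctx->count; i++)
        if (strcmp(ctx->names[i], name) == 0)
            return ctx->addrs[i];
    return NULL;
}

/* Legacy API for non-GHC clients: delegates to one implicitly created
 * global context, preserving backwards compatibility. */
static LinkerContext *globalCtx;

void *lookupSymbol(const char *name) {
    if (globalCtx == NULL) globalCtx = newLinkerContext();
    return ctxLookupSymbol(globalCtx, name);
}
```

With this shape, two GHC sessions holding distinct contexts could each define M_x_closure without clobbering one another, while old callers of lookupSymbol keep their process-global view.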
The following example program (derived from T3372) illustrates how concurrent GHC sessions can clobber each other’s RTS linker state:
{-# LANGUAGE MagicHash #-}
module Main where

import Data.Foldable
import System.Environment

import GHC (Ghc)
import qualified GHC
import qualified GHC.Driver.Monad as GHC
import qualified GHC.Driver.Session as GHC
import qualified GHC.Platform.Ways as GHC
import qualified GHC.Exts

main :: IO ()
main = do
    let test1 = "M1.hs"
    let test2 = "M2.hs"
    writeFile test1 "module M where x = 1"
    writeFile test2 "module M where x = 2"

    ghc_1 <- newGhcSession
    ghc_2 <- newGhcSession

    line "1" $ runInSession ghc_1 $ load (test1, "M")
    line "2" $ runInSession ghc_2 $ load (test2, "M")
    line "3" $ runInSession ghc_1 $ eval "x"
    line "4" $ runInSession ghc_2 $ eval "x"
    line "5" $ runInSession ghc_1 $ eval "x"
  where
    line n a = putStr (n ++ ": ") >> a

type ModuleName = String

newGhcSession :: IO GHC.Session
newGhcSession = do
    (libdir:_) <- getArgs
    session <- GHC.runGhc (Just libdir) (GHC.reifyGhc pure)
    runInSession session $ do
        df <- GHC.getSessionDynFlags
        let platform = GHC.targetPlatform df
        GHC.setSessionDynFlags $
            foldl' (flip GHC.setGeneralFlag')
                df { GHC.ghcMode = GHC.CompManager
                   , GHC.ghcLink = GHC.LinkInMemory
                   , GHC.targetWays_ = GHC.hostFullWays
                   , GHC.verbosity = 0 }
                (concatMap (GHC.wayGeneralFlags platform) GHC.hostFullWays)
    pure session

runInSession :: GHC.Session -> Ghc a -> IO a
runInSession = flip GHC.reflectGhc

load :: (FilePath, ModuleName) -> Ghc ()
load (f, mn) = do
    target <- GHC.guessTarget f Nothing Nothing
    GHC.setTargets [target]
    res <- GHC.load GHC.LoadAllTargets
    GHC.liftIO $ putStrLn ("Load " ++ showSuccessFlag res)
    GHC.setContext
        [ GHC.IIDecl $ GHC.simpleImportDecl $ GHC.mkModuleName "Prelude"
        , GHC.IIDecl $ GHC.simpleImportDecl $ GHC.mkModuleName mn ]
  where
    showSuccessFlag GHC.Succeeded = "succeeded"
    showSuccessFlag GHC.Failed    = "failed"

eval :: String -> Ghc ()
eval e = do
    show_e <- GHC.compileExpr $ "(show (" ++ e ++ ")) :: String"
    GHC.liftIO $ putStrLn (GHC.Exts.unsafeCoerce# show_e)
Compiling and running this program yields the following output:
$ ghc -dynamic -package ghc Main
[1 of 2] Compiling Main ( Main.hs, Main.o )
[2 of 2] Linking Main [Objects changed]
$ ./Main "$(ghc --print-libdir)"
1: Load succeeded
2: Load succeeded
3: 1
4: 2
5: 2
The final line of output is the bug; it should print 1, but instead it prints 2.
The above program creates two concurrent GHC sessions and loads a module into each one. The modules share the same name and export the same symbol, but the symbol has a different value in each module.
Since the backend is the NCG and the program is dynamically linked, each GHC session compiles the loaded module to a .o file, then links the .o file into a .so file on demand. The first time the two sessions each evaluate a reference to the loaded symbol, the results are 1 and 2, respectively, which is correct. However, when the symbol is evaluated in the first session a second time, the 2 value from the second session is printed instead. In effect, the module loaded in the second session has overwritten the module loaded in the first session, which is quite surprising.
This symbol clobbering only affects symbols loaded from shared objects, and it only affects the symbols resolved from an interpreted context. Even if symbols are “overwritten” in this way, references from native code will still refer to the original, correct symbols. However, any references in newly-created bytecode will be incorrect, and references from existing bytecode objects will become incorrect if the bytecode objects are relinked. Note also that the clobbering is truly on a symbol-by-symbol basis: if the second module were to export some, but not all, of the first module’s symbols, the values of those symbols would be selectively clobbered, leaving the first session in an inconsistent state. If the replaced symbols’ types differ from their replacements, memory corruption or segfaults are likely to occur.
Currently, the state of the RTS linker is process-global. This includes the RTS symbol table, the list of loaded code objects, and the list of loaded shared libraries. GHC generally expects that it is the exclusive client of the RTS linker, and this assumption is essentially always correct, but concurrent GHC sessions break this assumption, which can result in various misbehaviors.
When the RTS linker is used to load static .o files or .a archives, it does all the work of loading the objects itself, including maintaining its own symbol table. For this reason, if the above example is compiled without the -dynamic option, the outcome is quite different:
$ ghc -package ghc Main
[1 of 2] Compiling Main ( Main.hs, Main.o )
[2 of 2] Linking Main [Objects changed]
$ ./Main "$(ghc --print-libdir)"
1: Load succeeded
2: Load succeeded
3: 1
GHC runtime linker: fatal error: I found a duplicate definition for symbol
M_x_closure
whilst processing object file
/tmp/haskell/M2.o
The symbol was previously defined in
/tmp/haskell/M1.o
This could be caused by:
* Loading two different object files which export the same symbol
* Specifying the same object file twice on the GHCi command line
* An incorrect `package.conf' entry, causing some object to be
loaded twice.
4: Main: loadObj "/tmp/haskell/M2.o": failed
Since the RTS maintains its own symbol table in this configuration, it can detect the symbol collision and report the error. This behavior is arguably still wrong—there is no fundamental reason that the two sessions must share a symbol table, and other session state is kept separate—but it’s at least less mysterious.
In contrast, when the RTS loads a shared library, it defers the work to the system dynamic linker (via dlopen on Linux and macOS). System dynamic linkers do not provide APIs to obtain the full list of symbols provided by a shared library, so the RTS cannot possibly determine whether two libraries provide conflicting symbols. When a symbol is looked up, each library is tried in order until the symbol is found, starting from the library that was most recently loaded. Within a single interpreter session, this strategy provides the generally desirable property that new symbols shadow old ones (since most symbol conflicts arise from loading a new version of the same code), but it is much less defensible if multiple sessions exist in the same process.
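The most-recently-loaded-first lookup strategy can be sketched as a walk over a list of dlopen handles. The list type here is illustrative; only dlopen/dlsym are real APIs:

```c
#include <dlfcn.h>
#include <stddef.h>

/* One entry per loaded shared library. New libraries are pushed at the
 * head of the list, so iteration naturally visits newest-first. */
typedef struct LoadedSO {
    void *handle;            /* handle returned by dlopen */
    struct LoadedSO *next;   /* older libraries */
} LoadedSO;

/* Try each loaded library in turn, most recently loaded first, and
 * return the first definition found. */
void *lookupInLoadedSOs(LoadedSO *head, const char *name) {
    for (LoadedSO *so = head; so != NULL; so = so->next) {
        void *sym = dlsym(so->handle, name);
        if (sym != NULL)
            return sym;      /* a newer library shadows older ones */
    }
    return NULL;
}
```

This makes the shadowing behavior explicit: whichever library sits at the head of the list wins every conflict, regardless of which session loaded it.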
Alexis King (84d578b1) at 18 Jan 16:58
wip: make addDLL wrapper around loadNativeObj
Alexis King (a5fa213e) at 18 Jan 15:37
Always refresh profiling CCSes after running pending initializers
Alexis King (dec3a47a) at 10 Jan 17:48
wip: test that the .hp contains the right output
This fixes #24171 in the way described in #24171 (comment 536230). Writing a test case for this is a bit tricky, so the current test case uses the RTS API to load an archive directly, which avoids the need for a profiled GHC(i).
I just investigated this issue. The root cause is a failure to properly call refreshProfilingCCSs after loading new Haskell code. The sequence of events is as follows:

1. An archive is loaded using loadArchive. This causes its constituent object files to be loaded one at a time using loadOc.

2. Normally, loadOc sets the status of each loaded object to OBJECT_NEEDED, which will cause resolveObjs to eagerly resolve, relocate, and initialize them. However, when an object file is loaded from an archive, loadOc sets its status to OBJECT_LOADED instead, which defers resolution until the object’s symbols are needed.

3. Some time later, a call to lookupSymbol triggers on-demand linking of the relevant object. lookupSymbol sets the object’s status to OBJECT_NEEDED and calls ocTryLoad, then calls refreshProfilingCCSs. However, this is sadly too early: ocTryLoad merely sets the object’s status to OBJECT_RESOLVED; it does not run the object’s initializers.

4. Eventually, control returns to lookupSymbol, which calls runPendingInitializers. This initializes any newly-loaded objects using ocRunInit, which in this case causes several new cost centres to be registered.

5. After runPendingInitializers returns, lookupSymbol fails to call refreshProfilingCCSs. This means the new cost centres have been registered but not yet initialized, so their selected field remains its default value of 0.

6. At this point, the program has entered an invalid state. Template Haskell evaluation triggers allocation of a closure that belongs to one of these not-yet-initialized cost centres, and a heap census is taken. Since the cost centre’s selected field is 0, it is not included in the census.

7. Some time later, GHC calls resolveObjs after potentially loading more code. This has the side effect of calling refreshProfilingCCSs and initializing the cost centres registered earlier. This sets their selected field to 1, and if LDV profiling were disabled, this would return the program to a valid state (albeit with slightly incorrect profiling information).

8. However, if LDV profiling is enabled, the error mentioned in this issue is triggered when any of the objects allocated in step 6 die. LDV_recordDead looks up the counter associated with the cost centre from the era in which the object was created, but no such counter exists, since the cost centre was not enabled at that time. Therefore, the RTS panics.
Fortunately, the fix to this issue seems relatively simple: runPendingInitializers should always call refreshProfilingCCSs to ensure that any newly-registered cost centres are properly initialized. I will open an MR with this change.
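The shape of this fix can be modeled in a few lines of C. CostCentre, registerCC, and runInitializers below are illustrative stand-ins for the RTS’s real machinery; only the names runPendingInitializers and refreshProfilingCCSs mirror the functions discussed above:

```c
/* A cost centre registered by an object initializer starts with
 * selected == 0, making it invisible to the heap census (and leaving LDV
 * without a counter for it) until a refresh sets selected = 1. */
typedef struct { int selected; } CostCentre;

static CostCentre ccs[8];
static int n_ccs = 0;

/* Object initializers (ocRunInit in the RTS) may register new
 * cost centres as a side effect. */
static void registerCC(void)     { ccs[n_ccs++].selected = 0; }
static void runInitializers(void) { registerCC(); }

static void refreshProfilingCCSs(void) {
    for (int i = 0; i < n_ccs; i++)
        ccs[i].selected = 1;
}

/* The fix: unconditionally refresh CCSs after running initializers, so a
 * heap census taken immediately afterwards sees every new cost centre. */
void runPendingInitializers(void) {
    runInitializers();
    refreshProfilingCCSs();
}
```

With the refresh moved inside runPendingInitializers, there is no window in which a closure can be allocated against a registered-but-unselected cost centre, which is exactly the window that produced the LDV_recordDead panic.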