Memory corruption crash: Evaluated a CAF that was GC'd
In the test suite of my Haskell binding to the lz4 compression library I discovered an apparent GHC bug that leads to memory corruption (segfaults and arbitrary other errors induced by the wrong memory values).
With the -debug runtime I get this reliable error message:
    lz4-frame-conduit-test: internal error: Evaluated a CAF (0xe966e0) that was GC'd!
        (GHC version 8.10.2 for x86_64_unknown_linux)
        Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
The printed memory address (0xe966e0 here) does not change across runs.
Steps to reproduce
- Clone https://github.com/nh2/lz4-frame-conduit
- git checkout 7dd7b90
- Install the lz4 binary, because the test suite uses it (e.g. from a distribution package)
- stack clean && stack test
- while (.stack-work/dist/x86_64-linux/Cabal-3.2.0.0/build/lz4-frame-conduit-test/lz4-frame-conduit-test); do sleep 0.01; done
The while bash loop executes the binary repeatedly, because it does not crash on every run.
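For anyone who wants the attempt number and exit code of the first failing run, the same loop can be sketched in Haskell. This is only an illustration: `runUntilFailure`, the attempt budget of 100000 and the program name `repro-loop` are hypothetical and not part of lz4-frame-conduit. It assumes the `process` package is available.

```haskell
module Main (main) where

import Control.Concurrent (threadDelay)
import System.Environment (getArgs)
import System.Exit (ExitCode (..), exitFailure)
import System.Process (rawSystem)

-- | Hypothetical helper: run an action repeatedly until it reports a
-- failure or the attempt budget is used up. Returns the 1-based attempt
-- number and the exit code of the first failure, if there was one.
runUntilFailure :: Int -> IO ExitCode -> IO (Maybe (Int, ExitCode))
runUntilFailure maxAttempts run = go 1
  where
    go n
      | n > maxAttempts = pure Nothing
      | otherwise = do
          code <- run
          case code of
            -- Sleep 0.01 s between successful runs, like the bash loop.
            ExitSuccess -> threadDelay 10000 >> go (n + 1)
            failure     -> pure (Just (n, failure))

main :: IO ()
main = do
  args <- getArgs
  case args of
    [binary] -> do
      -- The budget of 100000 attempts is an arbitrary assumption.
      result <- runUntilFailure 100000 (rawSystem binary [])
      case result of
        Nothing -> putStrLn "no failure observed"
        Just (n, code) -> do
          putStrLn ("failed on attempt " ++ show n ++ " with " ++ show code)
          exitFailure
    _ -> putStrLn "usage: repro-loop PATH_TO_TEST_BINARY"
```

Pass the path of the built test binary as the only argument. Taking the run as an `IO ExitCode` parameter keeps the loop itself independent of how the binary is started.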
On my machine the error reproduces within 1 minute. It reproduces within a second when I use my own source-built GHC from the ghc-8.10 branch (exact commit is 35c7451, which has a cherry-pick to fix a build error that is currently being backported in !4032) with these build.mk settings:

    BuildFlavour = perf
    GhcLibHcOpts += -g3
    GhcRtsHcOpts += -g3
I have not yet managed to make the repro smaller, or independent of lz4. Help on that would be appreciated.
But I have some suspicions on what may be going on. My test suite does this:
    let prepare :: [BSL.ByteString] -> [ByteString]
        prepare strings = BSL.toChunks $ BSL.concat $ intersperse " " $ ["BEGIN"] ++ strings ++ ["END"]

    describe "reproducing memory error" $ do
      it "compresses 1000 strings" $ do
        let strings = prepare $ replicate 1000 "hello" -- cannot reproduce if `!`ing this
        actual <- runCompressToLZ4 (CL.sourceList strings)
        actual `shouldBe` (BS.concat strings)

    describe "more reproducing" $ do
      it "decompresses 10000 strings" $ do
        let strings = prepare $ replicate 10000 "hello"
        actual <- runLZ4ToDecompress (CL.sourceList strings)
        actual `shouldBe` (BS.concat strings)
I suspect that the CAF that's being GC'd here is related to the string constant "hello":
- I use replicate 10000 "hello" in the 2 places shown above. If I change one of them (e.g. to "hella") the issue seems to disappear.
- The crash is always in the second code block (decompressing), when accessing the input.
Maybe GHC is doing something illegal with that String CAF?
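To make the suspicion concrete: a CAF is a top-level value that takes no arguments, and with optimisations GHC may float a literal such as "hello" out of both test bodies into a single shared top-level binding. The sketch below writes that sharing out by hand, using only base and bytestring. The names `helloCAF` and `payloadLength` are hypothetical, and this program is not known to reproduce the crash; it only shows the shape of code in which a CAF is used, a major GC runs, and the same CAF is used again.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import Data.List (intersperse)
import System.Mem (performGC)

-- Hypothetical name. A top-level binding without arguments is a CAF:
-- it is evaluated at most once, and the GC has to keep it alive for as
-- long as code that can still run refers to it.
helloCAF :: BSL.ByteString
helloCAF = "hello"

-- Same helper as in the test suite above.
prepare :: [BSL.ByteString] -> [BS.ByteString]
prepare strings =
  BSL.toChunks $ BSL.concat $ intersperse " " $ ["BEGIN"] ++ strings ++ ["END"]

-- Hypothetical helper: total size of the prepared input for n copies.
payloadLength :: Int -> Int
payloadLength n = BS.length (BS.concat (prepare (replicate n helloCAF)))

main :: IO ()
main = do
  print (payloadLength 1000)   -- first use of the CAF
  performGC                    -- a major GC between the two uses
  print (payloadLength 10000)  -- second use; the CAF must have survived
```

If the garbage collector wrongly concluded after the first use that nothing refers to the CAF any more, the second use would touch freed memory. That is the situation the -debug runtime reports as "Evaluated a CAF that was GC'd".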
There's some more info useful for reproducing:
In stack.yaml I'm pinning the library conduit-extra-1.3.2 because the newer version 1.3.3 magically makes the bug disappear.
But the differences between those versions are very benign (I was involved in writing them). The only thing the newer version does is add some calls to hSetBuffering h NoBuffering.
I believe the bug disappears simply because I use this library to communicate with the lz4 binary used in the test, and the change of buffering affects GC behaviour.
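For readers unfamiliar with that call, here is a minimal sketch of talking to an external process over unbuffered handles. It uses System.Process directly and `cat` as a stand-in for the lz4 binary; it is not the conduit-extra code, and `pipeThrough` is a hypothetical helper.

```haskell
module Main (main) where

import System.IO
  ( BufferMode (NoBuffering), hClose, hGetContents, hPutStr, hSetBuffering )
import System.Process
  ( CreateProcess (std_in, std_out), StdStream (CreatePipe)
  , createProcess, proc, waitForProcess )

-- | Hypothetical helper: send a string to an external command's stdin
-- and return everything it writes to stdout, with buffering disabled on
-- both pipe handles. Only suitable for small inputs, because all input
-- is written before any output is read.
pipeThrough :: FilePath -> [String] -> String -> IO String
pipeThrough cmd args input = do
  (Just hin, Just hout, _, ph) <-
    createProcess (proc cmd args) { std_in = CreatePipe, std_out = CreatePipe }
  -- Without buffering, every write and read goes straight to the OS.
  hSetBuffering hin NoBuffering
  hSetBuffering hout NoBuffering
  hPutStr hin input
  hClose hin
  output <- hGetContents hout
  -- Force the lazily read output before waiting for the process to exit.
  _ <- length output `seq` waitForProcess ph
  pure output

main :: IO ()
main = pipeThrough "cat" [] "hello\n" >>= putStr
```

Switching a handle to NoBuffering changes how and when data moves between the program and the pipe, so it is plausible that it shifts allocation patterns and therefore when collections happen.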
- GHC version used: GHC 8.6 to 8.10.2
- Operating System: Ubuntu 18.04 and NixOS 20.03
- System Architecture: x86_64