Memory corruption crash: Evaluated a CAF that was GC'd
Summary
In the test suite of my Haskell binding to the lz4 compression library I discovered an apparent GHC bug that leads to memory corruption (segfaults and arbitrary other errors induced by the wrong memory values).
With the -debug runtime I get this reliable error message:
lz4-frame-conduit-test: internal error: Evaluated a CAF (0xe966e0) that was GC'd!
(GHC version 8.10.2 for x86_64_unknown_linux)
Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug
The printed memory address (0xe966e0 here) does not change across runs.
Steps to reproduce
- Clone https://github.com/nh2/lz4-frame-conduit
- git checkout 7dd7b90 (github link)
- Install the lz4 binary, because the test suite uses it (e.g. package liblz4-tool on Debian/Ubuntu)
- stack clean && stack test
- while (.stack-work/dist/x86_64-linux/Cabal-3.2.0.0/build/lz4-frame-conduit-test/lz4-frame-conduit-test); do sleep 0.01; done
The while bash loop executes the binary repeatedly, because it does not crash on every run.
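For anyone who wants the run count and exit code of the first failing run, the same loop can be sketched in Haskell. This is only an illustration and not part of the repository: the names runUntilFailure and repro-loop are hypothetical, and it assumes the process package is available.

```haskell
module Main (main, runUntilFailure) where

import Control.Concurrent (threadDelay)
import System.Environment (getArgs)
import System.Exit (ExitCode (..))
import System.Process (rawSystem)

-- Run the action repeatedly until it reports a non-zero exit code.
-- Returns the 1-based index of the failing run and its exit code.
-- (Hypothetical helper, mirroring the bash loop above.)
runUntilFailure :: IO ExitCode -> IO (Int, ExitCode)
runUntilFailure action = go 1
  where
    go n = do
      code <- action
      case code of
        ExitSuccess -> do
          threadDelay 10000 -- 0.01 s, same pause as `sleep 0.01`
          go (n + 1)
        failure -> pure (n, failure)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [binary] -> do
      (runs, code) <- runUntilFailure (rawSystem binary [])
      putStrLn ("failed on run " ++ show runs ++ " with " ++ show code)
    _ -> putStrLn "usage: repro-loop PATH_TO_TEST_BINARY"
```

Like the bash loop, this never terminates if the binary never fails, so it is only meant for interactive use.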
On my machine the error reproduces within 1 minute. It reproduces within a second when I use my own source-built GHC from the ghc-8.10 branch (exact commit is 35c7451, which has a cherry-pick to fix a build error, see here, currently being backported in !4032 (merged)) with these make settings:
BuildFlavour = perf
GhcLibHcOpts += -g3
GhcRtsHcOpts += -g3
I have not yet managed to make the repro smaller, or independent of lz4. Help on that would be appreciated.
But I have some suspicions on what may be going on. My test suite does this:
let prepare :: [BSL.ByteString] -> [ByteString]
    prepare strings = BSL.toChunks $ BSL.concat $ intersperse " " $ ["BEGIN"] ++ strings ++ ["END"]

describe "reproducing memory error" $ do
  it "compresses 1000 strings" $ do
    let strings = prepare $ replicate 1000 "hello" -- cannot reproduce if `!`ing this
    actual <- runCompressToLZ4 (CL.sourceList strings)
    actual `shouldBe` (BS.concat strings)

describe "more reproducing" $ do
  it "decompresses 10000 strings" $ do
    let strings = prepare $ replicate 10000 "hello"
    actual <- runLZ4ToDecompress (CL.sourceList strings)
    actual `shouldBe` (BS.concat strings)
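To make the shape of the test input concrete without lz4, the prepare helper can be pulled out into a standalone program. This is a sketch only: it assumes the bytestring package and OverloadedStrings (so that the "hello" literals become lazy ByteStrings, as in the snippet above), and it is not known to reproduce the crash.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main, prepare) where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import Data.List (intersperse)

-- Same definition as in the test suite: join the inputs with spaces,
-- wrap them in BEGIN/END markers, and return the strict chunks.
prepare :: [BSL.ByteString] -> [BS.ByteString]
prepare strings =
  BSL.toChunks $ BSL.concat $ intersperse " " $ ["BEGIN"] ++ strings ++ ["END"]

main :: IO ()
main = do
  -- The same "hello" literal is used in two places, as in the two tests.
  let first = prepare $ replicate 1000 "hello"
      second = prepare $ replicate 10000 "hello"
  print (BS.length (BS.concat first))  -- prints 6009
  print (BS.length (BS.concat second)) -- prints 60009
```

The lengths follow from 5 bytes per "hello", 5 for "BEGIN", 3 for "END", and one space between each pair of adjacent elements.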
Suspicions
I suspect that the CAF that's being GC'd here is related to the string constant "hello".
Key observations:
- I use replicate 10000 "hello" in the 2 places shown above. If I change one of them (e.g. hello -> hella) the issue seems to disappear.
- The crash is always in the second code block (decompressing), when accessing the input ByteString here.
Maybe GHC is doing something illegal with that String CAF?
Additional info
There's some more info useful for reproducing:
- In stack.yaml I'm pinning the library conduit-extra-1.3.2 because the newer version 1.3.3 magically makes the bug disappear.
  But the differences in those are very benign (I was involved in writing them). The only thing it does is to add some calls to hSetBuffering h NoBuffering.
  I believe that this is simply because I use this library to communicate with the lz4 binary used in the test, and the change of buffering affects GC behaviour.
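For readers unfamiliar with the call, here is roughly what hSetBuffering h NoBuffering looks like on the pipes of a child process. This is not conduit-extra's code: roundTrip is a hypothetical helper, and cat stands in for the lz4 binary purely for illustration (assuming the process package and a Unix-like system).

```haskell
module Main (main, roundTrip) where

import Control.Exception (evaluate)
import System.IO
import System.Process

-- Send a small input through `cat` over unbuffered pipes and read it back.
-- Only suitable for small inputs: everything is written before reading.
roundTrip :: String -> IO String
roundTrip input = do
  (Just hin, Just hout, _, ph) <-
    createProcess (proc "cat" []) { std_in = CreatePipe, std_out = CreatePipe }
  -- The kind of call the report refers to: disable Handle-level buffering.
  hSetBuffering hin NoBuffering
  hSetBuffering hout NoBuffering
  hPutStr hin input
  hClose hin
  output <- hGetContents hout
  _ <- evaluate (length output) -- force the lazy read before waiting
  _ <- waitForProcess ph
  pure output

main :: IO ()
main = roundTrip "hello\n" >>= putStr
```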
Environment
- GHC version used: GHC 8.6 to 8.10.2
Optional:
- Operating System: Ubuntu 18.04 and NixOS 20.03
- System Architecture: x86_64