Segfaults when using dynamic wrappers and concurrency
I had a largish program that sometimes segfaulted. The segfault seemed to come from the code that gets a C pointer from a Haskell function.
After much sweat I've managed to produce a self-contained program that exhibits the same behavior:
bitonic@clay /tmp/ptr-crash % uname -a
Linux clay 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
bitonic@clay /tmp/ptr-crash % cabal configure --disable-library-profiling -w ghc-7.11.20150411
Resolving dependencies...
Configuring ptr-crash-0...
bitonic@clay /tmp/ptr-crash % cabal build
Building ptr-crash-0...
Preprocessing executable 'ptr-crash' for ptr-crash-0...
[1 of 1] Compiling Main ( Main.hs, dist/build/ptr-crash/ptr-crash-tmp/Main.o )
Linking dist/build/ptr-crash/ptr-crash ...
bitonic@clay /tmp/ptr-crash % strace -f -r -o strace-out ./dist/build/ptr-crash/ptr-crash +RTS -N2 -RTS
26612 segmentation fault (core dumped) strace -f -r -o strace-out ./dist/build/ptr-crash/ptr-crash +RTS -N2 -RTS
I'm running GHC HEAD on a 64-bit Linux machine. In the larger program, I'm pretty sure the segfaults happened on GHC 7.8.4 too, but currently I can reproduce them only on 7.10 and later.
More details (thanks to Sergei Trofimovich on #ghc for helping me investigate this):
- The segfault only happens when using
- Curiously, the segfault seems to happen much more often when compiling the program with
- The segfault doesn't happen every time; I get it roughly half the time on my machine.
- straceing the program when segfaulting shows that all the threads crash together right after some calls to mremap. I've attached the end of the output of
- gdbing the program and breaking on mremap shows that all the calls to getStablePtr. I've attached a run of gdb that shows this pattern.
- The segfault only happens with repeated calls to the dynamic wrapper and with certain timings, which explains the weird nature of the example (I more or less mimicked the behaviour of a C function we were calling from a proprietary C library). Note that the call to sum_arr is not really important; it's there just so that some time is spent in the callback -- the example works equally well if we convert the pointer to a Haskell vector and sum it from Haskell.
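For readers unfamiliar with the pattern, here is a minimal sketch of what "repeated calls to the dynamic wrapper" from several threads can look like. It is not the attached reproduction, and I have not checked whether it triggers the crash. The names Callback, mkCallback, callCallback and worker, as well as the thread and iteration counts, are made up for illustration.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Foreign.C.Types (CInt (..))
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- Hypothetical callback type, chosen only for this sketch.
type Callback = CInt -> IO CInt

-- "wrapper" import: turns a Haskell function into a C function pointer.
foreign import ccall "wrapper"
  mkCallback :: Callback -> IO (FunPtr Callback)

-- "dynamic" import: calls through a C function pointer.
-- Here it stands in for the C code that would invoke the callback.
foreign import ccall "dynamic"
  callCallback :: FunPtr Callback -> Callback

-- Repeatedly create a wrapper, call back into Haskell through it,
-- and release it again.
worker :: Int -> IO ()
worker n = replicateM_ n $ do
  fp <- mkCallback (\x -> return (x + 1))
  r  <- callCallback fp 41
  freeHaskellFunPtr fp
  if r == 42 then return () else error "unexpected callback result"

main :: IO ()
main = do
  let threads = 4 -- arbitrary value for illustration
  done <- newEmptyMVar
  -- Run several workers concurrently so wrapper creation overlaps.
  forM_ [1 .. threads] $ \_ -> forkIO (worker 10000 >> putMVar done ())
  replicateM_ threads (takeMVar done)
  putStrLn "all workers finished"
```

To exercise it on more than one capability, build with ghc -threaded and run with +RTS -N2 -RTS, as in the transcript above.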
Sergei had a hunch that this had to do with thread-unsafe calls to