GHC 7.8—8.10, putMVar segfault

Summary

Intermittent segfaults were observed in a multi-threaded program (likely via BoundedChan) with GHC 8.10.2.

TL;DR: A missing dirty_MVAR call in putMVar left an MVar incorrectly marked clean even when it held the last reference to a TSO queue head in a younger generation. The GC subsequently moved the queue head but did not update the MVar's pointer. As a result, the MVar's TSO queue was corrupted with a dangling pointer to an unexpected object or to random memory content.
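The failure mode can be illustrated with a toy model of a generational collector. This is only a sketch under stated assumptions, not GHC's RTS code: `Heap`, `OldObj`, `writeField`, `minorGC` and `dangling` are hypothetical names, the `remembered` list stands in for the bookkeeping that dirty_MVAR performs, and "moving" an object is modelled as adding 1000 to its address.

```haskell
module Main where

import Data.List (nub)

-- | Toy address of an object in the young generation.
type Addr = Int

-- | An old-generation object with one mutable pointer field (think: MVar queue head).
data OldObj = OldObj { objName :: String, field :: Maybe Addr }
  deriving (Eq, Show)

data Heap = Heap
  { oldGen     :: [OldObj]  -- objects living in the old generation
  , young      :: [Addr]    -- addresses of live young-generation objects
  , remembered :: [String]  -- old objects marked dirty since the last minor GC
  } deriving (Eq, Show)

-- | Initial state: a clean old object "mvar" and one young object at address 1.
h0 :: Heap
h0 = Heap { oldGen = [OldObj "mvar" Nothing], young = [1], remembered = [] }

-- | Store a young address into an old object. With the write barrier enabled
-- the object is also recorded as dirty; without it the object stays clean.
writeField :: Bool -> String -> Addr -> Heap -> Heap
writeField barrier name addr h = h
  { oldGen     = [ if objName o == name then o { field = Just addr } else o
                 | o <- oldGen h ]
  , remembered = if barrier then nub (name : remembered h) else remembered h
  }

-- | Minor GC: every young object moves, but only the fields of remembered
-- (dirty) old objects are updated to the new addresses.
minorGC :: Heap -> Heap
minorGC h = Heap
  { oldGen     = [ if objName o `elem` remembered h
                     then o { field = fmap move (field o) }
                     else o
                 | o <- oldGen h ]
  , young      = map move (young h)
  , remembered = []
  }
  where
    move = (+ 1000)

-- | Old objects whose field no longer points at a live young object.
dangling :: Heap -> [String]
dangling h = [ objName o | o <- oldGen h, Just a <- [field o], a `notElem` young h ]

main :: IO ()
main = do
  print (dangling (minorGC (writeField True  "mvar" 1 h0)))  -- prints []
  print (dangling (minorGC (writeField False "mvar" 1 h0)))  -- prints ["mvar"]
```

In the second case the old object keeps the pre-GC address, which is the dangling-pointer situation described in the TL;DR.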

Steps to reproduce

The crash occurs many tens of minutes into a run that processes tens of GB of data, so it is by no means immediate or easy to reproduce. It has happened a few times now. Reproduction is very sensitive to scheduler timing and workload. Things that made it more likely were:

- Limiting the depth of the BoundedChan to 1, thus increasing inter-thread contention.
- Limiting the allocation area size with -A128k, making GC more frequent.
- Running on a bare-metal 16-core/32-thread machine, to get more effective concurrency.
- A multi-layer pipeline of BoundedChans between source and sink:
  - HTTPS or stdin
  - gunzip
  - group into chunks of 1k lines
  - parallel JSON parser/filter
  - output
- A large dataset from an internet-wide IP survey, compressed to multiple GB.
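The shape of the pipeline above can be sketched with base-only primitives. This is a minimal illustration, not the program that crashed. It makes several assumptions: a plain MVar stands in for a BoundedChan of depth 1, the HTTPS/gunzip stages are replaced by an in-memory list, and the "JSON parser/filter" is a hypothetical stand-in that keeps all-digit lines. The names `source`, `worker`, `sink` and `runPipeline` are made up for this example, and it is not expected to reproduce the segfault on its own.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.Char (isDigit)

-- | Split a list into chunks of at most n elements (n is clamped to >= 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | null xs   = []
  | otherwise = let (h, t) = splitAt (max 1 n) xs in h : chunksOf n t

-- | Source stage: emit chunks, then one end-of-stream marker per worker.
source :: Int -> Int -> [String] -> MVar (Maybe [String]) -> IO ()
source workers chunkSize ls out = do
  forM_ (chunksOf chunkSize ls) (putMVar out . Just)
  replicateM_ workers (putMVar out Nothing)

-- | Worker stage: filter each chunk; forward the end marker and stop.
worker :: MVar (Maybe [String]) -> MVar (Maybe [String]) -> IO ()
worker inp out = do
  msg <- takeMVar inp
  case msg of
    Nothing    -> putMVar out Nothing
    Just chunk -> do
      putMVar out (Just (filter keep chunk))
      worker inp out
  where
    -- Stand-in for the real parser/filter: keep non-empty all-digit lines.
    keep l = not (null l) && all isDigit l

-- | Sink stage: count surviving lines until every worker has finished.
sink :: Int -> MVar (Maybe [String]) -> IO Int
sink workers inp = go workers 0
  where
    go :: Int -> Int -> IO Int
    go 0 acc = return acc
    go n acc = do
      msg <- takeMVar inp
      case msg of
        Nothing    -> go (n - 1) acc
        Just chunk -> let acc' = acc + length chunk
                      in acc' `seq` go n acc'

-- | Wire the stages together with two depth-1 channels.
runPipeline :: Int -> Int -> [String] -> IO Int
runPipeline workers chunkSize ls = do
  c1 <- newEmptyMVar
  c2 <- newEmptyMVar
  _ <- forkIO (source workers chunkSize ls c1)
  replicateM_ workers (forkIO (worker c1 c2))
  sink workers c2

main :: IO ()
main = do
  let input = concatMap (\i -> [show i, "x" ++ show i]) [1 .. 5000 :: Int]
  n <- runPipeline 4 1000 input
  print n  -- prints 5000
```

To approximate the conditions listed above, a program of this shape would be built with -threaded and run with +RTS -N -A128k on a machine with many cores.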

Expected behavior

No segfault. This appears to be resolved via !4457 (closed), !4458 (closed), !4459 (merged) and !4460 (closed). The issue was introduced in 5d9e686c.

Environment

GHC version used: GHC 8.10.2 (but the issue applies to all releases from 7.8 onward; MRs were filed for 8.8, 8.10, 9.0 and master).

Optional:

Operating System: Fedora 31
System Architecture: x86_64

Edited Jan 21, 2021 by vdukhovni
