Skip to content

Use proper atomic operations

Ben Gamari requested to merge wip/tsan-ci into master

This is a large-scale refactoring of our use of atomic memory operations (or, rather, fixing the lack therefore) in the runtime system.

Background

In short, prior to C11 the C abstract machine had no notion and threads and therefore guaranteed no particular semantics for shared-memory access. Platform-specific libraries such of pthreads offered various safe synchronization primitives, but there was no standard mechanism for achieving lock-free concurrent access. Lacking such a mechanism, programs like GHC were forced to use a variety of tricks (e.g. ubiquitous use of volatile) to try to ensure that the compiler would produce code with the desired operational behavior.

However, these tricks abuse language features unrelated to concurrency (e.g. volatile was intended for interaction with memory-mapped hardware) and guarantee no particular semantics; data races are undefined behavior, volatile or otherwise.

Thankfully, this changed with the appearance of C11, which introduced proper standard atomic operations with support for relaxed acquire/release memory ordering semantics. This allows us to explicitly tell the compiler the ordering that we expect, shifting the onus of proving correctness from us (a responsibility which we have historically shirked) to the compiler. Moreover, using standard atomic operations allows us to benefit from the many tools that have been written around this interface (e.g. ThreadSanitizer).

Changes

This patch makes a few related changes:

  • Use C11-style atomic operations instead of volatile and prayer to ensure correct ordering between shared-memory operations. Note that we do not use C11 atomic variables but rather only the atomic operations themselves. This makes it far easier to support older, non-C11 compilers at the expense of less type safety.
  • Refactor GHC's existing atomic primitives (e.g. cas, atomic_inc) to rather use the __atomic_* family of builtins as the former as deprecated.
  • Introduce support for running the ThreadSanitizer memory-concurrency validator which can identify data races and other concurrency issues.
  • Fix the issues that ThreadSanitizer identifies

To avoid directly depending upon the atomic builtins I define a set of macros exposing load and store operations with various orderings:

  • {SEQ_CST,ACQUIRE,RELEASE}_LOAD
  • {SEQ_CST,ACQUIRE,RELEASE}_STORE
  • SEQ_CST_ADD
  • {SEQ_CST,RELEASE}_FENCE

The __atomic_* builtins that these macros wrap are available in all GCC version since 4.7, released in 2012. To avoid having to emulate these operations on GCC 4.6 (the first GCC version we previously supported), I have bumped the configure check to mandate gcc >= 4.7. Previously we claimed compatibility with GCC >= 4.6, which was released in 2012 In addition, TSAN itself provides a few interfaces for annotating "benign" races, as well as the rare cases in which the checker cannot infer a "happens-before" relation.

ThreadSanitizer support

This branch adds a new Hadrian build flavour, thread-sanitizer, as well as a new nightly CI job, nightly-x86_64-linux-deb9-tsan, which validates in this flavour.

We also add a suppression file to ignore two races which I found were either benign or

All of this is summarised in Note [ThreadSanitizer] in includes/rts/TSANUtils.h.

Concurrency issues fixed

On the whole I was quite impressed by both the specificity and accuracy of the errors reported by ThreadSanitizer; this will be an invaluable tool moving forward. Among the bugs that it caught are:

  • a race in the STM implementation which I believe results in #17757 (closed)
  • a race in setNumCapabilities and itimer which could result in use-after-free (#17289)
  • several thread leaks in the hs_try_putmvar* tests
  • several data races in the tracking of runtime statistics
  • a data race in the resizing of the stable pointer table

Closes #17090 (closed).

Todo

  • Look into suppression of #17289; is it fixed?
  • [ ]
Edited by Ben Gamari

Merge request reports