
genSym is not thread safe with respect to setNumCapabilities

In a large proprietary application using the GHC API, we observe really weird errors (e.g. overlapping instances for Eq Foo and Eq Bar, where Foo and Bar are completely unrelated, and come from different modules). The pattern we follow is:

* Running with the threaded RTS, 1 initial thread

* Create a new unique supply with mkSplitUniqSupply and put it in an MVar.

* Repeating many times:

  • Set the thread count higher (e.g. 8) using setNumCapabilities

  • On many threads in parallel:

    * Take the supply out of the MVar, split it with splitUniqSupply, keep one half, and put the other half back in the MVar

    * Use that unique supply to interact with the GHC API

  • Set the thread count back to 1

Our observations of the errors are best explained by the unique names not being nearly as unique as they might be expected to be. Reading the code for genSym:

    if (n_capabilities == 1)
    {
        GenSymCounter = (GenSymCounter + GenSymInc) & UNIQUE_MASK;
        checkUniqueRange(GenSymCounter);
        return GenSymCounter;
    }
    else
    {
        HsInt n = atomic_inc((StgWord *)&GenSymCounter, GenSymInc) & UNIQUE_MASK;
        checkUniqueRange(n);
        return n;
    }

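It only does an atomic_inc if n_capabilities != 1, falling back to a plain read-modify-write when n_capabilities == 1, and it doesn't read n_capabilities atomically. Is it suffering a race when the capability count changes while genSym is in use?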
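To make the suspected interleaving concrete, here is a minimal single-threaded sketch that replays it deterministically. It is not the RTS code: `demo_counter`, `DEMO_INC`, `DEMO_MASK`, and all function names are hypothetical stand-ins, the indivisible step is only simulated (no real atomics or threads), and it assumes atomic_inc returns the new value. It shows one way two callers could receive the same value if one takes the plain path while another takes the atomic path; whether this is what actually happens in our application is unconfirmed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for GenSymCounter, GenSymInc and UNIQUE_MASK.
   The values are illustrative only and are not taken from the RTS. */
#define DEMO_INC  1u
#define DEMO_MASK 0x00FFFFFFu
static uint64_t demo_counter = 0;

/* The plain (n_capabilities == 1) path, split into its read and its write
   so that another caller can be replayed in between the two halves. */
uint64_t plain_read(void) { return demo_counter; }
uint64_t plain_write(uint64_t seen) {
    demo_counter = (seen + DEMO_INC) & DEMO_MASK;
    return demo_counter;
}

/* Simulated indivisible increment: in this single-threaded replay nothing
   can run between its read and its write. Assumed to return the new value. */
uint64_t indivisible_inc(void) {
    demo_counter += DEMO_INC;
    return demo_counter & DEMO_MASK;
}

/* Caller A still believes there is one capability; caller B already sees
   more than one. Returns 1 if both end up with the same value. */
int replay_mixed_paths(void) {
    demo_counter = 0;
    uint64_t seen_by_a = plain_read();       /* A reads 0                  */
    uint64_t b = indivisible_inc();          /* B bumps counter to 1       */
    uint64_t a = plain_write(seen_by_a);     /* A overwrites with 0 + 1    */
    return a == b;
}

/* Control: both callers take the indivisible path. */
int replay_all_indivisible(void) {
    demo_counter = 0;
    uint64_t a = indivisible_inc();
    uint64_t b = indivisible_inc();
    return a == b;
}

/* Checks both replays: the mixed one collides, the control does not. */
void check_replays(void) {
    assert(replay_mixed_paths() == 1);
    assert(replay_all_indivisible() == 0);
}
```

Calling `check_replays()` from a `main` exercises both replays. The point of the control case is that the collision comes from mixing the two paths, not from the increment itself.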

Our workaround was to set the thread count once, up front, before any interaction with the GHC API, which seems to solve the problem. Alas, we don't have a reproducible test case: we were unable to reproduce it anywhere but our Linux CI, and even there only non-deterministically. The problem does not currently affect us (the workaround is robust), but it seemed worth sharing.

Trac metadata

| Trac field | Value |
| --- | --- |
| Version | 8.6.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |