genSym is not thread safe with respect to setNumCapabilities
In a large proprietary application using the GHC API, we observe very strange errors, e.g. overlapping instances for `Eq Foo` and `Eq Bar`, where `Foo` and `Bar` are completely unrelated and come from different modules. The pattern we follow is:
* Run with the threaded RTS, with 1 initial thread.
* Create a new unique supply with `mkSplitUniqSupply` and put it in an `MVar`.
* Repeat many times:
  - Set the thread count higher (e.g. 8) using `setNumCapabilities`.
  - On many threads in parallel:
    * Obtain a new unique supply by splitting the one in the `MVar` with `splitUniqSupply` (protected by the `MVar`), and store the other half back in the `MVar`.
    * Use that unique supply to interact with the GHC API.
  - Set the thread count back to 1.
The errors we observe are best explained by the uniques not being nearly as unique as expected, i.e. the same unique being handed out more than once. Reading the code for `genSym`:
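```c
if (n_capabilities == 1)
{
    // Single capability: plain, non-atomic read-modify-write of the counter
    GenSymCounter = (GenSymCounter + GenSymInc) & UNIQUE_MASK;
    checkUniqueRange(GenSymCounter);
    return GenSymCounter;
}
else
{
    // Several capabilities: atomic increment of the shared counter
    HsInt n = atomic_inc((StgWord *)&GenSymCounter, GenSymInc) & UNIQUE_MASK;
    checkUniqueRange(n);
    return n;
}
```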
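It only does an `atomic_inc` if `n_capabilities != 1`, but it doesn't read `n_capabilities` atomically, so is it suffering a race? For instance, a thread that still sees a stale `n_capabilities == 1` would do a non-atomic update of `GenSymCounter` while other capabilities increment it atomically, and two callers could then receive the same value.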
Our workaround is to set the thread count once, up front, before any interaction with the GHC API, which seems to solve the problem. Alas, we don't have a reproducible test case: we were unable to reproduce the problem anywhere but our Linux CI, and even there only non-deterministically. The problem does not currently affect us (the workaround is robust), but it seemed worth reporting.
Trac metadata
| Trac field | Value |
|---|---|
| Version | 8.6.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |