Calling hs_try_putmvar from an unsafe foreign call can cause the RTS to hang
An unsafe foreign call which calls hs_try_putmvar can cause the RTS to hang, preventing any Haskell threads from making progress. When the program is built with -debug, it instead fails an assertion in the scheduler:
    internal error: ASSERTION FAILED: file rts/Schedule.c, line 510
    (GHC version 8.4.3 for x86_64_apple_darwin)
Here is a minimal test case which reproduces the assertion. It needs to be built with -debug -threaded and run with +RTS -N2 or higher.
    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar)
    import Control.Monad (forever)
    import Foreign.C.Types (CInt(..))
    import Foreign.StablePtr (StablePtr)
    import GHC.Conc (PrimMVar, newStablePtrPrimMVar)

    foreign import ccall unsafe hs_try_putmvar :: CInt -> StablePtr PrimMVar -> IO ()

    main = do
      mvar <- newEmptyMVar
      forkIO $ forever $ do
        takeMVar mvar
      forkIO $ forever $ do
        sp <- newStablePtrPrimMVar mvar
        hs_try_putmvar (-1) sp
        threadDelay 1
      -- Let it spin a few times to trigger the bug
      threadDelay 500
I actually checked out GHC and added this as a test case and did some debugging. The specific assertion that fails is ASSERT(task->cap == cap). This seems to happen because of this code in hs_try_putmvar:
    Task *task = getTask();
    // ...
    ACQUIRE_LOCK(&cap->lock);
    // If the capability is free, we can perform the tryPutMVar immediately
    if (cap->running_task == NULL) {
        cap->running_task = task;
        task->cap = cap;
        RELEASE_LOCK(&cap->lock);
        // ...
        releaseCapability(cap);
    } else {
        // ...
    }
This code assumes that the calling thread's task does not already own a capability. During an unsafe foreign call it does, so the code takes a second capability and then releases it without restoring the previous value of task->cap.
Modifying the code to restore the value of task->cap after releasing the capability fixes the assertion. But I don't know enough about the RTS to be sure I'm not missing something here. In particular, is there a problem with the task basically holding two capabilities for a short time?
My other thought is that maybe it should check if its task is currently running a capability, and in that case do something else. But I'm not sure what.
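To make the first idea concrete, here is a minimal sketch of the save-and-restore step in isolation. It is a toy model, not RTS code: the Task and Capability structs are cut down to the two fields involved, try_put_fast_path and saved_cap are hypothetical names, locking is left out, and clearing running_task merely stands in for releaseCapability.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the RTS types; the real Task and Capability
   carry far more state than these two fields. */
typedef struct Capability Capability;
typedef struct Task { Capability *cap; } Task;
struct Capability { Task *running_task; };

/* Hypothetical model of the fast path: borrow a free capability, then
   put task->cap back the way it was. Returns 1 if the fast path ran,
   0 if the capability was busy. Locking is omitted in this sketch. */
static int try_put_fast_path(Task *task, Capability *cap)
{
    if (cap->running_task != NULL) {
        return 0;  /* busy: the real code takes its else branch here */
    }

    Capability *saved_cap = task->cap;  /* capability already owned, if any */

    cap->running_task = task;
    task->cap = cap;

    /* ... the tryPutMVar itself would happen here ... */

    cap->running_task = NULL;  /* stands in for releaseCapability(cap) */
    task->cap = saved_cap;     /* the restore step under discussion */
    return 1;
}
```

In this model, a task that owns one capability and borrows another ends the call with task->cap pointing at the capability it started with, which is the condition ASSERT(task->cap == cap) checks. The sketch says nothing about whether briefly holding two capabilities is safe in the real RTS.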
Trac metadata
| Trac field | Value |
|---|---|
| Version | 8.4.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |