High CPU when an asynchronous exception races with unblocking of retry on a TVar
Detail: https://github.com/nshimaza/race-tmvar-async-exception
The runtime falls into high CPU usage when an asynchronous exception races with the unblocking of a retry on a TVar.
- Reproduces with +RTS -Nx where x > 1
- Does NOT reproduce with +RTS -N1
- Program stalls at killThread
- High CPU usage, depending on the given -Nx
- CPU usage won't reach 100% if x is smaller than the number of hardware threads available on your platform.
- Does NOT reproduce if TVar/retry is replaced by MVar
- Reproduced with GHC 8.4.2 (macOS High Sierra (10.13.4))
- Reproduced with GHC 8.4.2 (Docker for Mac Version 18.03.1-ce-mac65)
- Reproduced with ghc-8.5.20180506 (Docker for Mac Version 18.03.1-ce-mac65)
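Since reproduction depends on the threaded runtime and on the -Nx value, it may help to confirm what the runtime actually sees before running the reproducer. The following is a minimal sketch, not part of the original report; `describeRuntime` and its messages are hypothetical names chosen for illustration, and the CPU-load wording only restates the observations in the list above.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)
import GHC.Conc (getNumProcessors)

-- | Summarise whether the current run can exercise the parallel race.
-- Hypothetical helper: arguments are (threaded RTS?, capabilities, processors).
describeRuntime :: Bool -> Int -> Int -> String
describeRuntime threaded caps procs
    | not threaded = "non-threaded RTS: rebuild with -threaded"
    | caps <= 1    = "1 capability: same as -N1, not expected to reproduce"
    | caps < procs = show caps ++ " of " ++ show procs
                     ++ " hardware threads in use: CPU may stay below 100%"
    | otherwise    = show caps ++ " capabilities: all hardware threads in use"

main :: IO ()
main = do
    -- Number of capabilities, i.e. the x in +RTS -Nx.
    caps  <- getNumCapabilities
    -- Number of hardware threads reported by the platform.
    procs <- getNumProcessors
    -- rtsSupportsBoundThreads is True only when linked with -threaded.
    putStrLn $ describeRuntime rtsSupportsBoundThreads caps procs
```

Build with `ghc -threaded -rtsopts` and run with `+RTS -N2` (or higher) so that more than one capability is reported.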
Minimal reproducing code follows. (A more verbose version is available in the GitHub repository above.)
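    import Control.Concurrent     (forkIO, killThread, threadDelay)
    import Control.Concurrent.STM (atomically, newTVarIO, readTVar, retry, writeTVar)
    import Control.Monad          (forM_, replicateM, void)
    import System.IO              (hFlush, stdout)

    -- Output helpers. Assumed definitions; the versions in the repo may differ.
    putStrFlush :: String -> IO ()
    putStrFlush s = putStr s *> hFlush stdout

    putCharFlush :: Char -> IO ()
    putCharFlush c = putChar c *> hFlush stdout

    main :: IO ()
    main = do
        let volume = 1000
        forM_ [1..1000] $ \i -> do
            putStrFlush $ show i ++ " "
            -- Spawn massive number of threads.
            threads <- replicateM volume $ do
                trigger <- newTVarIO False
                tid <- forkIO $ void $ atomically $ do
                    t <- readTVar trigger
                    if t then pure t else retry
                pure (trigger, tid)
            -- Make sure all threads are spawned.
            threadDelay 30000
            -- Let threads start to exit normally.
            void $ forkIO $ forM_ threads $ \(trigger, _) ->
                threadDelay 1 *> atomically (writeTVar trigger True)
            -- Concurrently kill threads in order to create race.
            -- TVar operation and asynchronous exception can hit same thread simultaneously.
            -- Adjust threadDelay if you don't reproduce very well.
            threadDelay 1000
            forM_ threads $ \(_, tid) -> do
                putCharFlush 'A'
                killThread tid  -- When the issue reproduced, this killThread doesn't return.
                putCharFlush '\b'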
This program intentionally creates a race between an asynchronous exception and the unblocking of a retry on a TVar. On one side, an external thread performs writeTVar trigger True while the target thread is blocked at retry on the same TVar. On the other side, yet another external thread throws the asynchronous exception ThreadKilled to the same target thread.
In other words, it attempts to kill a thread that is about to unblock.
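The race described above can be distilled to a single target thread. The sketch below is an illustration under assumptions, not the author's code: `raceOnce` is a hypothetical name, and it uses `forkFinally` with an `MVar` to report the outcome, which the original reproducer does not do. On an affected GHC with more than one capability, the `killThread` call may fail to return.

```haskell
import Control.Concurrent (forkFinally, forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (atomically, newTVarIO, readTVar, retry, writeTVar)
import Control.Exception (SomeException)
import Control.Monad (replicateM, void)

-- | Run one writer-versus-killer race against a single blocked thread.
-- Returns True if the target exited normally, False if it was killed.
raceOnce :: IO Bool
raceOnce = do
    trigger <- newTVarIO False
    done    <- newEmptyMVar
    let outcome :: Either SomeException () -> Bool
        outcome = either (const False) (const True)
    -- Target: blocks in retry until trigger becomes True.
    -- The finalizer runs masked, so the outcome is always reported.
    target <- forkFinally
        (atomically $ readTVar trigger >>= \t -> if t then pure () else retry)
        (putMVar done . outcome)
    -- Give the target time to block on the TVar.
    threadDelay 1000
    -- Side 1: another thread unblocks the target.
    void $ forkIO $ atomically (writeTVar trigger True)
    -- Side 2: this thread throws ThreadKilled at the same target.
    killThread target
    takeMVar done

main :: IO ()
main = do
    results <- replicateM 100 raceOnce
    putStrLn $ "exited normally: " ++ show (length (filter id results))
            ++ ", killed: " ++ show (length (filter not results))
```

Which side wins each round is timing dependent, so the two counts vary from run to run.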
My guess is that when these two operations hit the same thread at the same time, in parallel on an SMP system, the GHC runtime falls into high CPU usage.
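For comparison, the report states that the issue does not reproduce when TVar/retry is replaced by MVar. The exact substitution is not shown in this report, so the following is only a sketch of what it might look like: `spawnMVarWorker` and `mvarRound` are hypothetical names, and blocking with `readMVar` is an assumption.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Monad (forM_, replicateM, void)

-- | Spawn a worker that blocks on an empty MVar instead of TVar/retry.
spawnMVarWorker :: IO (MVar (), ThreadId)
spawnMVarWorker = do
    trigger <- newEmptyMVar
    tid <- forkIO $ readMVar trigger
    pure (trigger, tid)

-- | One round: unblock workers via putMVar while killing them concurrently.
mvarRound :: Int -> IO ()
mvarRound volume = do
    workers <- replicateM volume spawnMVarWorker
    -- Make sure all workers are spawned and blocked.
    threadDelay 30000
    -- Let workers start to exit normally.
    void $ forkIO $ forM_ workers $ \(trigger, _) ->
        threadDelay 1 *> putMVar trigger ()
    -- Concurrently kill the same workers.
    threadDelay 1000
    forM_ workers $ \(_, tid) -> killThread tid

main :: IO ()
main = do
    forM_ [1 .. 10 :: Int] $ \_ -> mvarRound 1000
    putStrLn "done"
```

The timing constants mirror the TVar reproducer so that the two variants differ only in the blocking primitive.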
Trac metadata
| Trac field | Value |
|---|---|
| Version | 8.4.2 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | highest |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |