High CPU when an asynchronous exception races with the unblocking of retry on a TVar

Detail: https://github.com/nshimaza/race-tmvar-async-exception

The runtime goes into high CPU usage when an asynchronous exception races with the unblocking of a thread that is blocked in retry on a TVar.

  • Reproduces with +RTS -Nx where x > 1
  • Does NOT reproduce with +RTS -N1
  • Program stalls at killThread
  • CPU usage is high and scales with the given -Nx
  • CPU usage won't reach 100% if x is smaller than the number of hardware threads available on your platform
  • Does NOT reproduce if TVar/retry is replaced by MVar
  • Reproduced with GHC 8.4.2 (macOS High Sierra (10.13.4))
  • Reproduced with GHC 8.4.2 (Docker for Mac Version 18.03.1-ce-mac65)
  • Reproduced with ghc-8.5.20180506 (Docker for Mac Version 18.03.1-ce-mac65)

Minimal reproducing code follows. (A more verbose version is in the GitHub repository linked above.)

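import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.STM (atomically, newTVarIO, readTVar, retry, writeTVar)
import Control.Monad (forM_, replicateM, void)
import System.IO (hFlush, stdout)

-- Helpers not shown in the original snippet; assumed to write to stdout and flush immediately.
putStrFlush :: String -> IO ()
putStrFlush s = putStr s *> hFlush stdout

putCharFlush :: Char -> IO ()
putCharFlush c = putChar c *> hFlush stdout

main :: IO ()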
main = do
    let volume = 1000
    forM_ [1..1000] $ \i -> do
        putStrFlush $ show i ++ " "

        -- Spawn massive number of threads.
        threads <- replicateM volume $ do
            trigger <- newTVarIO False
            tid <- forkIO $ void $ atomically $ do
                t <- readTVar trigger
                if t then pure t else retry
            pure (trigger, tid)

        -- Make sure all threads are spawned.
        threadDelay 30000

        -- Let threads start to exit normally.
        forkIO $ forM_ threads $ \(trigger, _) -> threadDelay 1 *> atomically (writeTVar trigger True)

        -- Concurrently kill the threads in order to create the race.
        -- The TVar write and the asynchronous exception can hit the same thread simultaneously.
        -- Adjust threadDelay if the issue does not reproduce reliably.
        threadDelay 1000
        forM_ threads $ \(_, tid) -> do
            putCharFlush 'A'
            killThread tid      -- When the issue reproduces, this killThread does not return.
            putCharFlush '\b'

This program intentionally creates a race condition between an asynchronous exception and the unblocking of a retry on a TVar. On one side, an external thread performs writeTVar trigger True while the target thread is blocked in retry on the same TVar. On the other side, yet another external thread throws the asynchronous exception ThreadKilled to the same target thread.

In other words, it attempts to kill a thread that is about to unblock.
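The race on a single target thread can be sketched in isolation as follows. This is only an illustration, not code from the linked repository: the name raceOnce, the done MVar used to observe the outcome, and the use of forkFinally are assumptions introduced here. It reports whether the target saw the trigger or was killed first; on an affected runtime the killThread call may never return.

```haskell
import Control.Concurrent (forkFinally, forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (atomically, newTVarIO, readTVar, retry, writeTVar)
import Control.Monad (replicateM, void)

-- | Hypothetical helper: race one writeTVar against one killThread on a
-- single thread blocked in retry. Returns True if the target observed the
-- trigger and exited normally, False if it was killed first.
raceOnce :: IO Bool
raceOnce = do
    trigger <- newTVarIO False
    done    <- newEmptyMVar
    -- Target thread: blocks in retry until trigger becomes True.
    -- forkFinally runs the final handler with async exceptions masked,
    -- so the outcome is always reported.
    tid <- forkFinally
        (atomically $ readTVar trigger >>= \t -> if t then pure () else retry)
        (putMVar done . either (const False) (const True))
    -- Give the target time to block in retry.
    threadDelay 1000
    -- Side one: another thread unblocks the target.
    void $ forkIO $ atomically (writeTVar trigger True)
    -- Side two: this thread kills the target at about the same time.
    killThread tid
    takeMVar done

main :: IO ()
main = do
    results <- replicateM 100 raceOnce
    putStrLn $ "exited normally: " ++ show (length (filter id results))
    putStrLn $ "killed first:    " ++ show (length (filter not results))
```

The counts vary from run to run. Seeing both outcomes suggests the timing window is being hit; as with the full program, it needs to be built with -threaded and run with +RTS -Nx where x > 1 for the two sides to execute in parallel.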

My guess is that when the writeTVar wakeup and the ThreadKilled exception hit the same thread at the same time, in parallel on an SMP system, the GHC runtime goes into high CPU usage.
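For comparison, the MVar variant mentioned in the list of observations, which did not reproduce the issue, might look roughly like the sketch below. This is a guess at the shape of that replacement rather than the code in the linked repository; the name mvarRound and the round count in main are assumptions made for illustration.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM, void)

-- | Hypothetical MVar counterpart of one round of the reproducer:
-- each thread blocks in takeMVar instead of retry on a TVar.
mvarRound :: Int -> IO ()
mvarRound volume = do
    -- Spawn threads that block until their MVar is filled.
    threads <- replicateM volume $ do
        trigger <- newEmptyMVar
        tid <- forkIO $ takeMVar trigger
        pure (trigger, tid)

    -- Make sure all threads are spawned.
    threadDelay 30000

    -- Let threads start to exit normally. putMVar on an empty MVar does
    -- not block, even if the waiting thread has already been killed.
    void $ forkIO $ forM_ threads $ \(trigger, _) ->
        threadDelay 1 *> putMVar trigger ()

    -- Concurrently kill the same threads.
    threadDelay 1000
    forM_ threads $ \(_, tid) -> killThread tid

main :: IO ()
main = forM_ [1 .. 10 :: Int] $ \i -> do
    mvarRound 1000
    putStrLn $ "round " ++ show i ++ " finished"
```

The structure mirrors the TVar program so that the only difference is the blocking primitive, which helps narrow the problem to the interaction between retry wakeups and asynchronous exceptions.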

Trac metadata
Trac field Value
Version 8.4.2
Type Bug
TypeOfFailure OtherFailure
Priority highest
Resolution Unresolved
Component Runtime System
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture