STM starvation when reading a TVar concurrently from two threads and no writes.
Summary
Reading a number of (nested) TVars concurrently from two threads, with no writes at all, results in starvation when the program runs on more than one capability, but works fine when executed on a single capability.
Steps to reproduce
Project files can be found here.
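For reference, the non-boot dependencies used by the reproducer below are `async`, `list-t`, and `stm-containers`. The exact versions are in the linked project files; a minimal guess at the relevant `.cabal` fields (version bounds omitted, `-threaded -rtsopts` is needed so `+RTS -N` is accepted):

```
build-depends: base, async, list-t, stm, stm-containers
ghc-options:   -threaded -rtsopts
```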
```haskell
{-# LANGUAGE NumericUnderscores #-}

import Control.Concurrent (ThreadId, myThreadId)
import qualified Control.Concurrent.Async as Async
import Control.Concurrent.STM (STM, atomically)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import GHC.IO (unsafePerformIO)
import qualified ListT
import qualified StmContainers.Map as STMMap
import System.IO (hFlush, stdout)

main :: IO ()
main = do
  let l = [(x, x) | x <- [1 :: Integer .. 100_000]]
  stmmap <- STMMap.newIO
  atomically $ mapM_ (\(x, y) -> STMMap.insert x y stmmap) l
  print "Initialized the STM Map."
  hFlush stdout
  -- Show that a read from a single thread does succeed.
  readMap stmmap
  -- Read the STM map from two threads concurrently.
  foldr1 Async.concurrently_ [readMap stmmap, readMap stmmap]

-- | Irrevocable counter to keep track of transaction restarts.
restartCounter :: ThreadId -> IORef Integer -> STM ()
restartCounter threadId counter = unsafePerformIO $ do
  r <- readIORef counter
  print (threadId, r)
  hFlush stdout
  modifyIORef' counter (+ 1)
  return (return ())

readMap :: (Show a, Show b) => STMMap.Map a b -> IO ()
readMap stmmap = do
  threadId <- myThreadId
  counter <- newIORef (0 :: Integer)
  _ <- atomically $ do
    restartCounter threadId counter
    ListT.toList (STMMap.listT stmmap)
  return ()
```
`restartCounter` can be left out, as it's not required to reproduce the bug. It is only there to observe restarts without having to enable STM debug logging (`+RTS -Dm`), since that logging slows down the program a lot.
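As a side note, `-Dm` (like the other RTS `-D` debug flags) is only accepted when the program is linked against the debug RTS, so reproducing the log output below requires building with `-debug` as well; roughly:

```
# Link against the debug RTS so that +RTS -Dm is accepted.
ghc -threaded -rtsopts -debug Main.hs -o map
./map +RTS -Dm -N
```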
Output of `cabal run map -- +RTS -N1`:

```
"Initialized the STM Map."
(ThreadId 4,0)
(ThreadId 5,0)
(ThreadId 6,0)
```
Output of `cabal run map -- +RTS -N`:

```
"Initialized the STM Map."
(ThreadId 19,0)
(ThreadId 20,0)
(ThreadId 21,0)
(ThreadId 21,1)
(ThreadId 20,1)
(ThreadId 21,2)
(ThreadId 20,2)
(ThreadId 21,3)
(ThreadId 20,3)
(ThreadId 21,4)
(ThreadId 20,4)
(ThreadId 21,5)
(ThreadId 20,5)
(ThreadId 21,6)
(ThreadId 20,6)
(ThreadId 21,7)
(ThreadId 20,7)
(ThreadId 21,8)
(ThreadId 20,8)
(ThreadId 21,9)
(ThreadId 20,9)
(ThreadId 21,10)
(ThreadId 20,10)
(ThreadId 21,11)
(ThreadId 20,11)
... continues ...
```
When logging STM operations (`+RTS -Dm -N`) while running the reproducer with multiple capabilities, two things stand out:
- a transaction successfully acquires a lock and immediately afterwards fails to acquire the same lock;
- transactions are repeatedly reported as "found wasting its time".
```
7f183f627fc0: STM: 0x42041e9b28 : cond_lock_tvar(0x4204524fb8, 0x4204990001)
7f183f627fc0: STM: 0x42041e9b28 : success
7f183f627fc0: STM: 0x42041e9b28 : failed to acquire 0x4204524fb8
... bunch of unlock_tvar ...
7f183f627fc0: STM: 0x42041e9b28 : unlock_tvar(0x4204524f48, 0x4204524f69)
7f183f627fc0: STM: 0x42041e9b28 : unlock_tvar(0x4204524f80, 0x4204524fa1)
7f183f627fc0: STM: 0x42041e9b28 : unlock_stm()
7f183f627fc0: STM: 0x42041e9b28 : stmValidateNestOfTransactions()=0
7f183f627fc0: trec 0x42041d1000 found wasting its time
7f183f627fc0: STM: 0x42041e9b28 : stmCondemnTransaction
7f183f627fc0: STM: 0x42041e9b28 : lock_stm()
```
I did not see this when running with one capability.
The program does terminate with multiple capabilities if we use `10_000` entries instead of `100_000`, so the starvation depends on both the number of capabilities and the size of the container.
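Since stm-containers involves a fair amount of internal machinery, a stripped-down variant might help narrow down whether plain nested read-only TVar reads are enough to trigger this. Below is a sketch using only `base` and `stm`: it builds a linked chain of TVars and sums it from two threads inside a single read-only transaction each. The `Chain` type and the helper names are mine, and I have not verified whether this variant also reproduces the starvation.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE NumericUnderscores #-}

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (TVar, atomically, newTVarIO, readTVar)
import Control.Monad (replicateM_)

-- A linked list of TVars, mimicking the nested reads that
-- STMMap.listT performs internally.
data Chain = Nil | Cons Integer (TVar Chain)

-- Build a chain holding 1..n.
buildChain :: Integer -> IO (TVar Chain)
buildChain n = go n =<< newTVarIO Nil
  where
    go 0 tv = return tv
    go k tv = go (k - 1) =<< newTVarIO (Cons k tv)

-- Sum the whole chain in one read-only transaction.
sumChain :: TVar Chain -> IO Integer
sumChain root = atomically (go 0 root)
  where
    go !acc tv = do
      c <- readTVar tv
      case c of
        Nil         -> return acc
        Cons x next -> go (acc + x) next

main :: IO ()
main = do
  root <- buildChain 100_000
  done <- newEmptyMVar
  -- Two concurrent read-only transactions over the same chain.
  replicateM_ 2 (forkIO (sumChain root >>= putMVar done))
  s1 <- takeMVar done
  s2 <- takeMVar done
  print (s1, s2)  -- both sums should be 5_000_050_000
```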
Expected behavior
What I would expect is that reading a TVar concurrently from two threads never results in starvation, regardless of the number of capabilities or the number of entries in the map.
Environment
The reproducer has been tested with GHC 9.9-20231022 and we have observed this behaviour in our application on both GHC 9.4.4 and 9.6.2.