A `threadDelay` appears to require a lot of memory
Summary
A `threadDelay` appears to require a lot of memory relative to other ways of blocking. Context: https://www.reddit.com/r/haskell/comments/13ofj4o/why_threaddelay_is_so_memopry_expensive/
Steps to reproduce
Compile with `ghc-9.4.5 -threaded -O2 -Wall Main.hs` and run with `./Main +RTS -s`:
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad
import System.Mem

main :: IO ()
main = do
    let n = 100_000
    -- make sure to keep all threads alive:
    v <- newEmptyMVar
    t <- newTVarIO 0
    replicateM_ n $ do
        void . forkIO $ do
            atomically $ modifyTVar' t (+1)
            -- !!! Uncomment this to trigger bad behavior !!!
            -- threadDelay 10_000_000
            takeMVar v -- block forever
    atomically $ do
        r <- readTVar t
        when (r < n) retry
    putStrLn "All threads launched and blocking. Pausing"
    threadDelay 10_000_000
    putStrLn "Doing GC"
    performMajorGC -- update statistics. Necessary?
    putStrLn "Done GC. Pause"
    -- did residency change?
    threadDelay 10_000_000
    -- this wakes one thread, but the point is to not have threads die with
    -- BlockedIndefinitely until here
    putMVar v ()
Expected behavior
Running the above, I see:
104,839,952 bytes maximum residency (8 sample(s))
which corresponds to the expected ~1k of heap per thread. OS-reported RES is ~250MB, which seems to be in the expected range for a copying collector (from-space plus to-space).
After uncommenting the `threadDelay` line, however, I see:
737,888,008 bytes maximum residency (11 sample(s))
and ~700MB of OS-reported residency. That's roughly an extra 4.5K per timer (about 450MB more OS-reported residency across 100,000 threads), which seems like too much.
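For reference, the per-thread figures can be reproduced from the numbers quoted above. This is only a rough sketch of the arithmetic: the names (`perThread`, `baselineMaxResidency`, and so on) are made up for illustration, and the OS-reported values are the approximate ~250MB and ~700MB readings, taken here as 250,000,000 and 700,000,000 bytes by assumption.

```haskell
-- Rough per-thread arithmetic from the figures quoted in this report.
-- All names here are illustrative; nothing in this block measures anything.

threads :: Integer
threads = 100000

-- "maximum residency" as printed by +RTS -s, in bytes.
baselineMaxResidency, delayMaxResidency :: Integer
baselineMaxResidency = 104839952  -- threadDelay commented out
delayMaxResidency    = 737888008  -- threadDelay uncommented

-- OS-reported RES, approximate (assumed 1MB = 1,000,000 bytes).
baselineRes, delayRes :: Integer
baselineRes = 250000000
delayRes    = 700000000

-- Integer division is precise enough for a ballpark figure.
perThread :: Integer -> Integer
perThread total = total `div` threads

main :: IO ()
main = do
  putStrLn $ "baseline heap bytes/thread:        " ++ show (perThread baselineMaxResidency)
  putStrLn $ "extra heap bytes/thread (RTS):     " ++ show (perThread (delayMaxResidency - baselineMaxResidency))
  putStrLn $ "extra resident bytes/thread (OS):  " ++ show (perThread (delayRes - baselineRes))
```

Going by OS-reported residency this gives the ~4.5K per thread mentioned above; going by the RTS maximum-residency figures instead, the difference works out to roughly 6.3K per thread.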
All I can get from heap profiling is that the memory is attributed to "STACK".
Environment
- GHC version used: 9.4.5
Optional:
- Operating System: linux
- System Architecture: x86-64