
A `threadDelay` appears to require a lot of memory

Summary

A threadDelay appears to require a lot of memory, relative to other ways of blocking. Context: https://www.reddit.com/r/haskell/comments/13ofj4o/why_threaddelay_is_so_memopry_expensive/

Steps to reproduce

Compile with `ghc-9.4.5 -threaded -O2 -Wall Main.hs` and run with `./Main +RTS -s`:

import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad
import System.Mem

main :: IO ()
main = do
  let n = 100_000
  -- make sure to keep all threads alive:
  v <- newEmptyMVar
  t <- newTVarIO 0
  replicateM_ n $ do
    void . forkIO $ do
      atomically $ modifyTVar' t (+1)
      -- !!! Uncomment this to trigger bad behavior !!!
      -- threadDelay 10_000_000
      takeMVar v -- block forever

  atomically $ do
    r <- readTVar t
    when (r < n) retry

  putStrLn "All threads launched and blocking. Pausing"
  threadDelay 10_000_000
  putStrLn "Doing GC"
  performMajorGC -- update statistics. Necessary?
  putStrLn "Done GC. Pause"
  -- did residency change?
  threadDelay 10_000_000
  -- this wakes one thread, but the point is to not have threads die with
  -- BlockedIndefinitely until here
  putMVar v ()

Expected behavior

Running the program above as-is (with the threadDelay line commented out), I see:

     104,839,952 bytes maximum residency (8 sample(s))

which corresponds to the expected ~1K of heap per thread. OS-reported RES is ~250MB, which seems to be in the expected range for a copying collector (from-space plus to-space).

Uncommenting the threadDelay line, however, I see:

     737,888,008 bytes maximum residency (11 sample(s))

and ~700MB OS-reported residency. So that's an extra ~4.5K per timer going by the OS-reported figures (about 6.3K going by maximum residency), which seems like too much.
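As a rough check of the per-thread figures, the arithmetic can be sketched as below. The residency numbers are the ones quoted in this report; the names (`extraPerThread`, `baselineRss`, and so on) are made up for illustration, and the OS-reported values are the approximate ~250MB and ~700MB readings, assumed here to mean 10^6 bytes per MB.

```haskell
-- Back-of-the-envelope per-thread overhead from the numbers in this report.
-- All names here are illustrative, not part of any library.

-- Number of threads forked by the reproducer.
threads :: Integer
threads = 100000

-- "maximum residency" in bytes, as printed by +RTS -s.
baselineResidency, delayResidency :: Integer
baselineResidency = 104839952 -- threads blocked on takeMVar only
delayResidency    = 737888008 -- threads also calling threadDelay

-- Approximate OS-reported RES (assumption: 1MB = 10^6 bytes).
baselineRss, delayRss :: Integer
baselineRss = 250 * 1000 * 1000
delayRss    = 700 * 1000 * 1000

-- Extra bytes per thread when going from the baseline to the threadDelay run.
extraPerThread :: Integer -> Integer -> Integer
extraPerThread base withDelay = (withDelay - base) `div` threads

main :: IO ()
main = do
  -- prints 1048
  print (baselineResidency `div` threads)
  -- prints 6330
  print (extraPerThread baselineResidency delayResidency)
  -- prints 4500
  print (extraPerThread baselineRss delayRss)
```

So the ~4.5K figure comes from the OS-reported numbers, while the GC's own maximum residency suggests a somewhat larger per-thread cost.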

All I can get from heap profiling is that the memory is attributed to "STACK".
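To compare residency at specific points in the program rather than only reading the final `+RTS -s` summary, something like the following sketch could be dropped into the reproducer. It uses `GHC.Stats` from base; `reportResidency` is a hypothetical helper name, and the statistics are only collected when the program runs with `+RTS -T` (or `-s`), which may in turn require linking with `-rtsopts` or `-with-rtsopts=-T`.

```haskell
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performMajorGC)

-- Hypothetical helper: force a major GC, then print what the RTS reports.
reportResidency :: String -> IO ()
reportResidency label = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn (label ++ ": RTS stats not enabled (run with +RTS -T)")
    else do
      -- A major GC refreshes the live-bytes figure.
      performMajorGC
      stats <- getRTSStats
      let lastGc = gc stats
      putStrLn $ label ++ ": live bytes after last GC = "
        ++ show (gcdetails_live_bytes lastGc)
      putStrLn $ label ++ ": memory in use by RTS   = "
        ++ show (gcdetails_mem_in_use_bytes lastGc)
      putStrLn $ label ++ ": max live bytes so far  = "
        ++ show (max_live_bytes stats)

main :: IO ()
main = reportResidency "startup"
```

Calling `reportResidency` once after all threads are blocked, in both the commented and uncommented variants, should give numbers that can be compared directly.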

Environment

  • GHC version used: 9.4.5

Optional:

  • Operating System: linux
  • System Architecture: x86-64
Edited by jberryman