rts: Threads/caps affinity
Currently GHC supports two kinds of threads with respect to thread migration - pinned to a specific capability and those it can migrate between any capabilities. For purposes of achieving lower latency in Haskell applications it would be nice to have something in between - threads GHC rts can migrate but within a certain subset of capabilities only.
I'm developing a program that contains several kinds of threads - those that do little work and sensitive to latency and those that can spend more CPU time and less latency sensitive. I looked into several cases of increased latency in those sensitive threads (using GHC eventlog) and in all cases sensitive threads were waiting for non-sensitive threads to finish working. I was able to reduce worst case latency by factor of 10 by pinning all the threads in the program to specific capability but manually distributing threads (60+ of them) between capabilities (several different machines with different numbers of cores available) seems very fragile. World stopping GC is still a problem but at least in my case is much less frequently so.
I have a patch for rts that implements this proposal
{- | 'setThreadAffinity' limits RTS ability to migrate thread to
capabilities with numbers that matches set bits of affinity mask, thus
mask of `0b101` (5) will allow RTS to migrate this thread to caps
0 (64, 128, ..) and 3 (64 + 3 = 67, 128 + 3 = 131, ...).
Setting all bits to 0 or 1 will disable the restriction.
-}
setThreadAffinity :: ThreadId -> Int -> IO ()
This allows to define up to 64 distinct groups and allow user to break down their threads into bigger number of potentially intersecting groups by specifying things like capability 0 does latency sensitive things, caps 1..5 - less sensitive things, caps 5-7 bulk things.
Sample program using this API
{-# LANGUAGE LambdaCase #-}
import Data.Time
import Control.Monad
import Control.Concurrent
import System.Environment (getArgs)
import GHC.Conc
wastetime :: Bool -> IO ()
wastetime affine = do
tid <- forkIO $ do
myThreadId >>= \tid -> labelThread tid "timewaster"
forever $ do
when (sum [1..1000000] < (0 :: Integer)) $
print "impossible"
threadDelay 100
yield
when affine $ setThreadAffinity tid (255 - 2)
client :: Bool -> IO ()
client affine = do
myThreadId >>= \tid -> labelThread tid "client"
when affine $ myThreadId >>= \tid -> setThreadAffinity tid 2
before <- getCurrentTime
replicateM_ 10 $ do
threadDelay 10000
after <- getCurrentTime
print $ after `diffUTCTime` before
startClient :: Bool -> IO ()
startClient = {- replicateM_ 10 . -} client
main :: IO ()
main = do
getArgs >>= \case
[wno's, aff's] -> do
let wno = read wno's
aff = read aff's
putStrLn $ unwords ["Affinity:", show aff, "Timewasters:", show wno]
replicateM_ wno (wastetime aff)
startClient aff
_ -> putStrLn "Usage: <progname> <number of time wasters> <enable affinity: True/False>"
Compiled with -threaded and running with rts -N8 on 6 core (12 threads) machine.
Results are noisy but repeatable
Affinity: False Timewasters: 24
0.42482036s
Affinity: True Timewasters: 24
0.111743474s
Trac metadata
Trac field | Value |
---|---|
Version | 8.3 |
Type | FeatureRequest |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Runtime System |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |