Skip to content

rts: Threads/caps affinity

Currently GHC supports two kinds of threads with respect to thread migration - pinned to a specific capability and those it can migrate between any capabilities. For purposes of achieving lower latency in Haskell applications it would be nice to have something in between - threads GHC rts can migrate but within a certain subset of capabilities only.

I'm developing a program that contains several kinds of threads - those that do little work and sensitive to latency and those that can spend more CPU time and less latency sensitive. I looked into several cases of increased latency in those sensitive threads (using GHC eventlog) and in all cases sensitive threads were waiting for non-sensitive threads to finish working. I was able to reduce worst case latency by factor of 10 by pinning all the threads in the program to specific capability but manually distributing threads (60+ of them) between capabilities (several different machines with different numbers of cores available) seems very fragile. World stopping GC is still a problem but at least in my case is much less frequently so.

I have a patch for rts that implements this proposal

{- | 'setThreadAffinity' limits RTS ability to migrate thread to
capabilities with numbers that matches set bits of affinity mask, thus
mask of `0b101` (5) will allow RTS to migrate this thread to caps
0 (64, 128, ..) and 3 (64 + 3 = 67, 128 + 3 = 131, ...).

Setting all bits to 0 or 1 will disable the restriction.
-}
setThreadAffinity :: ThreadId -> Int -> IO ()

This allows to define up to 64 distinct groups and allow user to break down their threads into bigger number of potentially intersecting groups by specifying things like capability 0 does latency sensitive things, caps 1..5 - less sensitive things, caps 5-7 bulk things.

Sample program using this API

{-# LANGUAGE LambdaCase #-}

import Data.Time
import Control.Monad
import Control.Concurrent
import System.Environment (getArgs)
import GHC.Conc


wastetime :: Bool -> IO ()
wastetime affine = do
    tid <- forkIO $ do

        myThreadId >>= \tid -> labelThread tid "timewaster"

        forever $ do
            when (sum [1..1000000] < (0 :: Integer)) $
                print "impossible"
            threadDelay 100
            yield
    when affine $ setThreadAffinity tid (255 - 2)

client :: Bool -> IO ()
client affine = do
    myThreadId >>= \tid -> labelThread tid "client"
    when affine $ myThreadId >>= \tid -> setThreadAffinity tid 2
    before <- getCurrentTime
    replicateM_ 10 $ do
        threadDelay 10000
    after <- getCurrentTime
    print $ after `diffUTCTime` before

startClient :: Bool -> IO ()
startClient = {- replicateM_ 10 . -} client


main :: IO ()
main = do

    getArgs >>= \case
        [wno's, aff's] -> do
            let wno = read wno's
                aff = read aff's
            putStrLn $ unwords ["Affinity:", show aff, "Timewasters:", show wno]
            replicateM_ wno (wastetime aff)
            startClient aff
        _ -> putStrLn "Usage: <progname> <number of time wasters> <enable affinity: True/False>"

Compiled with -threaded and running with rts -N8 on 6 core (12 threads) machine.

Results are noisy but repeatable

Affinity: False Timewasters: 24
0.42482036s
Affinity: True Timewasters: 24
0.111743474s
Trac metadata
Trac field Value
Version 8.3
Type FeatureRequest
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Runtime System
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information