
WIP: thread-protected automatic trace id propagation to child threads

Daneel S. Yaitskov requested to merge wip/thread-protected into master

The feature targets tracing libraries, though thread-protected values could be used elsewhere. The current implementation is not friendly to sharing among multiple libraries, because the shared value type is a free-format byte array.

The feature allows associating a value with the current thread (similar to thread-local storage), with one additional behaviour: a value defined in a thread propagates automatically to all its child threads.

I was working on an OpenTelemetry ticket and realized that GHC eventlogs allow recording a trace id and reconstructing it later, but there is no built-in mechanism for automatically propagating a trace id across threads. A typical application is built from many libraries that know nothing about tracing. The OpenTelemetry API is pretty simple, and there is no dedicated Effect to wrap the trace id.

So the application developer has to pass trace id variables by hand to deliver them to the code that makes HTTP calls or launches OS processes.
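To illustrate the manual plumbing described above, here is a minimal sketch using only `base`. The names `TraceId`, `fetchUser` and `handleRequest` are hypothetical and stand in for application and library code; they are not part of this merge request.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hypothetical trace id type, for illustration only.
newtype TraceId = TraceId String

-- A library-style function: it can only tag its work with the trace id
-- if the caller passes the id in explicitly.
fetchUser :: TraceId -> Int -> IO String
fetchUser (TraceId t) uid =
  pure ("[trace " ++ t ++ "] fetched user " ++ show uid)

-- The application has to thread the id through every call and every fork.
handleRequest :: TraceId -> IO String
handleRequest tid = do
  done <- newEmptyMVar
  _ <- forkIO (fetchUser tid 42 >>= putMVar done)
  takeMVar done

main :: IO ()
main = handleRequest (TraceId "abc123") >>= putStrLn
```

Any intermediate library that forks a thread without accepting a `TraceId` argument breaks the chain, which is the gap automatic propagation is meant to close.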

The OpenTelemetry standard requires reserving 24 bytes (16 bytes for the trace id and 8 bytes for the span id). At first I thought about storing these 24 bytes by value directly in the TSO, but later figured out a way to store a pointer to a byte array of arbitrary length. I guess trace info could take more than 24 bytes and carry some vendor-specific fields.
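The 24-byte layout could be packed into a `ByteArray` roughly as follows. This is a sketch under assumptions: the helpers `mkTraceContext`, `traceIdOf` and `spanIdOf` are hypothetical, and placing the trace id first followed by the span id is an illustrative choice, not something this merge request prescribes.

```haskell
import Data.Primitive.ByteArray
  (ByteArray, byteArrayFromList, indexByteArray, sizeofByteArray)
import Data.Word (Word8)

-- Hypothetical helper: pack a 16-byte trace id and an 8-byte span id into
-- one 24-byte array. Returns Nothing if either part has the wrong length.
mkTraceContext :: [Word8] -> [Word8] -> Maybe ByteArray
mkTraceContext traceId spanId
  | length traceId == 16 && length spanId == 8 =
      Just (byteArrayFromList (traceId ++ spanId))
  | otherwise = Nothing

-- Bytes 0..15 hold the trace id (assumed layout).
traceIdOf :: ByteArray -> [Word8]
traceIdOf ba = [indexByteArray ba i | i <- [0 .. 15]]

-- Bytes 16..23 hold the span id (assumed layout).
spanIdOf :: ByteArray -> [Word8]
spanIdOf ba = [indexByteArray ba i | i <- [16 .. 23]]

main :: IO ()
main =
  case mkTraceContext (replicate 16 0xAB) (replicate 8 0xCD) of
    Nothing  -> putStrLn "bad id lengths"
    Just ctx -> do
      print (sizeofByteArray ctx)
      print (traceIdOf ctx == replicate 16 0xAB)
      print (spanIdOf ctx == replicate 8 0xCD)
```

Because the stored value is an array of arbitrary length, vendor-specific fields could be appended after byte 23 without changing these accessors.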

The alternative, a small intermediate trace id inside the process plus an additional global map, would complicate cleaning up dead ids and introduce thread contention.
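For comparison, the global-map alternative might look like the sketch below. It is an illustration only: `Registry`, `setTrace`, `getTrace` and `forkInheriting` are hypothetical names, the map is keyed by `ThreadId` purely for simplicity, and it relies on the `containers` package.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- One process-wide map shared by every thread (hypothetical design).
type Registry = IORef (Map.Map ThreadId String)

setTrace :: Registry -> String -> IO ()
setTrace reg t = do
  me <- myThreadId
  -- Every writer goes through the same IORef: a point of contention.
  atomicModifyIORef' reg (\m -> (Map.insert me t m, ()))

getTrace :: Registry -> IO (Maybe String)
getTrace reg = do
  me <- myThreadId
  Map.lookup me <$> readIORef reg

-- Fork a child that copies the parent's entry and removes its own entry
-- on exit. Dead ids are only cleaned up if every fork goes through here.
forkInheriting :: Registry -> IO () -> IO ThreadId
forkInheriting reg act = do
  inherited <- getTrace reg
  forkIO $ do
    me <- myThreadId
    (maybe (pure ()) (setTrace reg) inherited >> act)
      `finally` atomicModifyIORef' reg (\m -> (Map.delete me m, ()))

main :: IO ()
main = do
  reg <- newIORef Map.empty
  setTrace reg "abc123"
  done <- newEmptyMVar
  _ <- forkInheriting reg (getTrace reg >>= putMVar done)
  takeMVar done >>= print
```

A thread created with plain `forkIO` by a library unaware of the registry neither inherits an entry nor cleans one up, so this approach depends on every fork site cooperating.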

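-- Demo: a trace id set in one thread is visible in its descendants.
import Control.Concurrent (forkIO, threadDelay)
import Data.Primitive
import GHC.Conc.Sync (getAdamTraceId, setAdamTraceId, AdamTraceId(..))
import System.Mem (performGC, performMajorGC)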

ba2Id :: ByteArray -> AdamTraceId
ba2Id (ByteArray ba) = AdamTraceId ba
id2Ba :: AdamTraceId -> ByteArray
id2Ba (AdamTraceId ba) = ByteArray ba

main :: IO ()
main = do
  _ <- forkIO $ do
         ba1 <- newByteArray 77 >>= unsafeFreezeByteArray
         setAdamTraceId (ba2Id ba1)
         performGC
         performMajorGC
         _ <- forkIO $ do
                  performGC
                  performMajorGC
                  ati <- getAdamTraceId
                  putStrLn $ "sub     child trace " ++ show (sizeofByteArray $ id2Ba ati)
                  _ <- forkIO $ do
                    performGC
                    performMajorGC
                    ati1 <- getAdamTraceId
                    putStrLn $ "sub sub child trace " ++ show (sizeofByteArray $ id2Ba ati1)
                  return ()
         return ()
  threadDelay 1000000
  performGC
  performMajorGC
  ati0 <- getAdamTraceId
  putStrLn $ "main          trace " ++ show (sizeofByteArray $ id2Ba ati0)
