Execution time of TVar reads/writes during a transaction grows with the number of TVars used in that transaction.
Summary
Reading a list of TVars takes O(n²) as for each read we traverse the linked list of all transaction entries.
Steps to reproduce
This program showcases the issue:
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE NumericUnderscores #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Control.Concurrent.STM (STM, TVar, atomically, newTVar, readTVar)
import Control.Monad (forM_)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

main :: IO ()
main = do
  let nums =
        [ 10_000 :: Int
        , 50_000
        , 100_000
        , 200_000
        , 300_000
        , 400_000
        ]
  forM_ nums $ \tvar_count -> do
    putStrLn $ "Measuring: " ++ show tvar_count
    (tvars :: [TVar Int]) <- atomically $ mapM newTVar [1 .. tvar_count]
    start_time <- getCurrentTime
    s <- atomically $ do
      values <- mapM readTVar tvars :: STM [Int]
      let !s = sum values
      return s
    putStrLn $ "done, " ++ show s
    done_time <- getCurrentTime
    print $ done_time `diffUTCTime` start_time
Expected behavior
While some overhead is to be expected, I would have expected the cost of using additional TVars in a transaction to be roughly linear.
The issue is that the STM transaction log is a linked list of all TVars used in the transaction together with their expected values.
When reading from a TVar, we traverse the log to check whether the value originates from an earlier write during the same transaction. When writing, we traverse the log to either update the cached value or add an entry with the new value.
This is far from ideal and causes STM transactions to be much slower than necessary when they use large numbers of TVars.
One potential solution would be to use a HashMap rather than a linked list. I believe this would be sound since we only need to know:
- The last value written to a TVar during a transaction.
- The expected value for each TVar at the end of the transaction.
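For illustration, the behaviour described above can be modelled in plain Haskell. This is only a sketch, not the RTS code; the Int identity per TVar is a hypothetical stand-in for its heap address:

```haskell
import Data.List (find)

-- Hypothetical log entry: a TVar stand-in identified by an Int,
-- with the value we expect it to hold and the value we would write.
data Entry v = Entry { entryId :: Int, expected :: v, newValue :: v }

-- The transaction log: a plain linked list, as in the RTS today.
type TLog v = [Entry v]

-- Every read scans the entire log looking for a prior entry,
-- which is O(length log) per read ...
lookupEntry :: Int -> TLog v -> Maybe (Entry v)
lookupEntry tid = find ((== tid) . entryId)

-- ... so reading n distinct TVars in one transaction performs
-- 0 + 1 + ... + (n - 1) comparisons in total: O(n^2).
```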
Potential Improvements
Replace the linked list with a HashMap altogether.
This would give very fast lookups, but since the GC can move TVars (and the map would be keyed on their addresses), we would have to rebuild the HashMap after GCs.
Use a HashMap merely as a cache in addition to the linked list.
This could be done by rebuilding it when it is first needed after a GC, or by adding TVars to it as they are used.
Either way this would mean slightly less predictable performance and slightly more memory use, but much faster lookups overall.
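A rough sketch of the cache variant, assuming each TVar can be given a stable Int key (in reality TVars are heap pointers that the GC may move, which is exactly why the cache would need rebuilding):

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical entry: expected value and pending write for one TVar.
data Entry v = Entry { expected :: v, newValue :: v }

-- The linked list stays authoritative (it is what commit traverses);
-- the map is only a lookup cache keyed by a stand-in Int identity.
data TLog v = TLog
  { logEntries :: [(Int, Entry v)]       -- newest entry first
  , logCache   :: Map.Map Int (Entry v)  -- fast lookup; rebuilt after GC
  }

readEntry :: Int -> TLog v -> Maybe (Entry v)
readEntry tid = Map.lookup tid . logCache

writeEntry :: Int -> Entry v -> TLog v -> TLog v
writeEntry tid e (TLog es cache) =
  TLog ((tid, e) : es) (Map.insert tid e cache)

-- After a GC invalidates the cache, rebuild it from the list.
-- Reversing first makes the newest entry win for duplicate keys.
rebuildCache :: TLog v -> TLog v
rebuildCache (TLog es _) = TLog es (Map.fromList (reverse es))
```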
Further, we could allocate TVars in a non-moving part of the heap, though I'm sure that has its own disadvantages.