Improve GHC.Event.IntTable performance
The performance of
GHC.Event.IntTable can be improved. I've managed to get some nice increases across the board. Benchmarking using
function, % faster than current impl.
There is one strange thing I noted. In
updateWith, there is an inner loop that looks like this:
data Bucket a = Empty | Bucket Int a (Bucket a) go _ Empty = (False, Nothing, Empty) go cont (Bucket key val next) | key == k = case f val of Nothing -> (True, Just val, cont next) Just v -> (False, Just val, cont (Bucket key v next)) | otherwise = go (\x -> cont (Bucket key val x)) next
which returns a tuple that is immediately consumed like so:
(delete_occurred, old_val, new_bkt) <- go id ... when (isJust old_val) $ do <updateIntTable> when delete_occurred <decIntTableSize> return old_val
I expected that inlining the
<decIntTableSize> code blocks directly into
go would result in better code than creating a tuple and then pattern matching on it afterwards. ie.
go _ Empty = return Nothing go cont (Bucket key val next) | key == k = do case f val of Nothing -> <updateIntTable> (cont next) >> <decIntTableSize> Just v -> <updateIntTable> (cont (Bucket key v next) return (Just val) | otherwise = go (\x -> cont (Bucket key val x)) next
which has the exact same semantics. To my suprise, this code is almost 2x slower! The core generated in both cases is exactly what I'd expect; if anything, the second version seems tighter. I'm not sure why the first version is faster, but perhaps the original author, Bryan O'Sullivan, can shed some light as he used the tupled method in the first version.
I'll attach my patch,
criterion's html output for the benchmarks as well as the benchmarking code, and the core for the oddity I discussed above.