Poor performance of higher-order functions with unboxing
I was testing out some code to see how GHC handled some unboxed higher-order functions, and was suprised by the results I found. Here is some sample code:
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MagicHash #-}
import GHC.Exts
import System.Environment
rel# :: Int# -> Int# -> Int# -> Bool
rel# i# j# k# = (i# +# j# +# k#) ># 100000000#
rel :: Int -> Int -> Int -> Bool
rel (I# i#) (I# j#) (I# k#) = rel# i# j# k#
manual :: (Int# -> Int# -> Int# -> Bool) -> (Int, Int, Int)
manual r# = go 0# 0# 0#
where
go i# j# k# | r# i# j# k# = (I# i#, I# j#, I# k#)
| otherwise = go j# k# (i# +# 1#)
{-# NOINLINE manual #-}
auto :: (Int -> Int -> Int -> Bool) -> (Int, Int, Int)
auto r = go 0 0 0
where
go !i !j !k | r i j k = (i, j, k)
| otherwise = go j k (i+1)
{-# NOINLINE auto #-}
main = getArgs >>= \case
"manual" : _ -> print $ manual rel# -- This case is significantly slower.
"auto" : _ -> print $ auto rel -- Why?
A loop that has to box its loop parameters to call a predicate turns out to be significantly faster than one that uses a predicate that takes unboxed values directly.
The answer turns out to be (I believe) in ghc/utils/genapply/GenApply.hs. applyTypes has an entry [P,P,P], but only [N]. This means that the manual loop has to use a slower calling convention than the boxed loop.
I'm not sure whether this should be 'fixed,' as my encounter with it was experimental in nature, and I don't have a real use case. The comment on applyTypes says its cases cover 99% of uses, and mine is not a real use. This ticket may serve as documentation at least, though.
Trac metadata
Trac field | Value |
---|---|
Version | 7.6.3 |
Type | Task |
TypeOfFailure | OtherFailure |
Priority | low |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |