GHC does not inline cheap inner loop when used in two places
When I made a Criterion benchmark of Neil Mitchell's "Tight Inner Loop" blog post (http://neilmitchell.blogspot.co.uk/2014/01/optimising-haskell-for-tight-inner-loop.html), I noticed that GHC 7.6.3 does not inline the performance-critical function when it is called from two different places (inside the same module).
```haskell
break0_2pleaseinline :: (Char -> Bool) -> ByteString0 -> (ByteString, ByteString0)
break0_2pleaseinline f (BS0 bs) = (BS.unsafeTake i bs, BS0 $ BS.unsafeDrop i bs)
  where
    -- Offset of the first character that is '\0' or satisfies f
    i = Internal.inlinePerformIO $ BS.unsafeUseAsCString bs $ \ptr -> do
      let start = castPtr ptr :: Ptr Word8
      let end = go start
      return $! Ptr end `minusPtr` start
    -- Tight inner loop: advance one byte at a time until the stop condition
    go s@(Ptr a) | c == '\0' || f c = a
                 | otherwise        = go $ inc s
      where c = chr s

versionInl1 :: ByteString0 -> (ByteString, ByteString0)
versionInl1 str = break0_2pleaseinline test str
  where
    test x = x <= '$' && (x == ' ' || x == '\r' || x == '\n' || x == '$')

versionInl2 :: ByteString0 -> (ByteString, ByteString0)
versionInl2 str = break0_2pleaseinline test str
  where
    test x = x <= '$' && (x == ' ' || x == '\r' || x == '\n' || x == '$')
```
Full code here: https://github.com/nh2/inner-loop-benchmarks/blob/6715d1e9946d6b5e6d9bb53203982ed3d2ed32ff/Bench.hs#L166.
`break0_2pleaseinline` does not get inlined, which makes `versionInl1` and `versionInl2` more than 4 times slower than when it is inlined. The inlining doesn't happen, not even with `-O2` or `-O3`; only an `INLINE` pragma makes GHC do it.
If I were a compiler, I would so inline that function!
I am surprised that GHC doesn't decide to do so.
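A minimal sketch of the `INLINE` workaround, assuming a simplified, hypothetical `breakAt` that scans a plain `Data.ByteString.Char8.ByteString` by index instead of the report's NUL-terminated `ByteString0`. The names `breakAt`, `isDelim`, `versionA` and `versionB` are illustrative only. Note that GHC only inlines an `INLINE` function at saturated calls (all arguments on the definition's left-hand side supplied), which is why both versions keep their `str` argument rather than being written point-free.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical, simplified stand-in for break0_2pleaseinline:
-- split a plain ByteString (no '\0' sentinel) at the first character
-- satisfying the predicate, or at the end if none does.
breakAt :: (Char -> Bool) -> BC.ByteString -> (BC.ByteString, BC.ByteString)
breakAt f bs = BC.splitAt (go 0) bs
  where
    n = BC.length bs
    -- Tight inner loop; once breakAt is inlined at a call site,
    -- f is a known function there and can be inlined into the loop.
    go :: Int -> Int
    go i
      | i >= n            = n
      | f (BC.index bs i) = i
      | otherwise         = go (i + 1)
{-# INLINE breakAt #-}

-- The predicate used by versionInl1 and versionInl2 in the report.
isDelim :: Char -> Bool
isDelim x = x <= '$' && (x == ' ' || x == '\r' || x == '\n' || x == '$')

-- Two call sites in the same module; both calls are saturated,
-- so the INLINE pragma can take effect at each of them.
versionA :: BC.ByteString -> (BC.ByteString, BC.ByteString)
versionA str = breakAt isDelim str

versionB :: BC.ByteString -> (BC.ByteString, BC.ByteString)
versionB str = breakAt isDelim str
```

For example, `versionA (BC.pack "hello world")` splits into `"hello"` and `" world"`. No timings are claimed for this sketch; whether it reproduces the speedup would need a benchmark like the Criterion setup linked above.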
Trac metadata
Trac field | Value |
---|---|
Version | 7.6.3 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | mail@nh2.me |
Operating system | |
Architecture | |