Early inlining causes potential join points to be missed
While trying to make stream fusion work with recursive step functions I noticed that the following filter implementation did not fuse nicely.
data Stream s a = Stream (s -> Step s a) s
data Step s a = Done | Yield a s
sfilter :: (a -> Bool) -> Stream s a -> Stream s a
sfilter pred (Stream step s0) = Stream filterStep s0 where
filterStep s = case step s of
Done -> Done
Yield x ns
| pred x -> Yield x ns
| otherwise -> filterStep ns
fromTo :: Int -> Int -> Stream Int Int
{-# INLINE fromTo #-}
fromTo from to = Stream step from where
step i
| i > to = Done
| otherwise = Yield i (i + 1)
sfoldl :: (b -> a -> b) -> b -> Stream s a -> b
{-# INLINE sfoldl #-}
sfoldl acc z (Stream !step s0) = oneShot go z s0 where
go !y s = case step s of
Done -> y
Yield x ns -> go (acc y x) ns
ssum :: (Num a) => Stream s a -> a
ssum = sfoldl (+) 0
filterTest :: Int
filterTest = ssum $ sfilter even (fromTo 1 101)
For this code to work nicely, GHC should detect that filterStep is a join point. However, in the definition of sfilter it is not because not all references are tail-called & saturated.
After inlining of sfilter and some trivial case-of-case transformations filterStep should become a join point. But it seems like the simplifier never gets the change to do this because float-out optimization makes filterStep a top level binding. With -fno-full-laziness filterStep does become a join point at the call site, but of course this is not really a solution.
Then I found that the following also works:
sfilter :: (a -> Bool) -> Stream s a -> Stream s a
sfilter pred (Stream step s0) = Stream filterStep s0 where
{-# INLINE [2] filterStep #-}
filterStep s = case step s of
Done -> Done
Yield x ns
| pred x -> Yield x ns
| otherwise -> filterStep ns
Simply adding an INLINE [2] pragma disables the inlining in the early run of the simplifier. Therefore, the float out pass does not get the change to float-out before the filterStep is recognized as a joint point. Or at least that is my interpretation of what is going on.
What surprises me about this issue is that the gentle run seems to perform inlining while the wiki mentions that inlining is not performed in this stage: https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Core2CorePipeline
Intuitively, I would think that floating-out is sub-optimal when the simplifier did not use all its tricks yet, because inlining typically opens up possibilities for simplification while floating-out typically reducing these possibilities.
Trac metadata
Trac field | Value |
---|---|
Version | 8.2.1 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |