Improve join point inlining
This ticket relates to the paper "Optimising generics is easy" http://www.dreixel.net/research/pdf/ogie.pdf. Jose writes (about a case where inlining doesn't work well):
I put a minimal source code for this test and resulting Core
(with GHC 6.13.20091115) in https://subversion.cs.uu.nl/repos/staff.jpm.public/Inline/.
UpdateInt
behaves great: in UpdateInt.core.O1.hs
, there are no traces of
generic representations in testOne, testTwo, testThree and testFour.
UpdateString
, however, does not behave so well. In UpdateString.core.O1.hs
,
Test.testLogic_updateString
still has loads of generic representations.
It's easy to see what is happening. Compile UpdateString
(which I've attached to this ticket) with -ddump-simpl
, and look at the core. You see stuff like
Rec { $j1_rx32 = \x. <big nested case expression>
; f = \y. ....($j1_rx32 <big constructor expression>)
---($j1_rx32 <big constructor expression)....
}
So here the $j
(which is a "join point") isn't inlined because it's big, although if it were inlined there would be much goodness because the case expressions would cancel with the explicit constructors.
Why did this happen despite lots of INLINE pragmas? I have not followed all the details, but I'm guessing that if we have, say
{-# INLINE from #-}
from = \x. case x of from_alts
{-# INLINE to #-}
to = \x. case x of to_alts
and we try to optimize this call:
from (mapT f (to x))
then after inlining from
, mapT
, and to
we'll get
case (case (case x of to_alts) of map_alts) of from_alts
And now the case-of-case transform happens, which creates the join points to avoid duplicating map_alts, from_alts into every branch of to_alts. You may say that we've already said that it's ok to duplicate from (and hence from_alts) but we duplicated it once when we inlined it, and then we forget the origin of the code. And indeed, in the worse case you could get a quadratic blow up; and there are only two functions involved. So I'm not unhappy with that.
However, it does make me wonder whether we could not do a better job on the above Rec {..}. Two thoughts occur.
-
We could beef up
SpecConstr
. It doesn't fire at the moment for some reason, even with -O2 -
If we have
f = \x. case x of { C1 x11..x1n -> e1; ... Ck xk1 ... xkm -> ek }
maybe we should worker-wrapper it to
f1 x1 .. x1n = e1 ... fn xk1 .. xkm = en f = \x of pi -> fi xi
and now inline f. The net effect is very similar to the way join points work right now, but it would make it multi-level. In fact, doing this might simplify and generalise the way that join points are currently done, where (rather arbitrarily) we duplicate the outer layer of a single case.
Simon
Trac metadata
Trac field | Value |
---|---|
Version | 6.12.1 |
Type | Task |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |