Join point that is not inlined ruins performance
Summary
While looking into the regressions caused by implementing idea number 4 of #25170, I found a program whose runtime allocations regress by 67%. The "bad" code and the "good" code differ in only one function.
The bad code has an additional join point, which both prevents the recursive call from being a tail call and causes re-boxing in the loop:
    Main = \sc1 eta2 ->
      case eta2 `cast`... of
        I# i ->
          join $j a b = (a, b) `cast` … `cast` …
          in case sc1 of wild_1
               __DEFAULT ->
                 case Main (+# wild_1 1#) (I# (+# i wild_1)) `cast` … of
                   (x, s'') -> jump $j x s''   -- AWFUL! No tail call + unnecessary rebox
               100# -> jump $j () (I# (+# i 100#)) `cast`...
What we previously produced is exactly what we want:
    Main = \sc1 eta2 ->
      case eta2 `cast`... of
        I# i -> case sc1 of wild_1
          __DEFAULT ->
            Main (+# wild_1 1#) (I# (+# i wild_1)) `cast` …
          100# ->
            ((), (I# (+# i 100#))) `cast` …
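To make the difference concrete at the source level, here is a hypothetical Haskell transliteration of the two Core shapes above. It is not the T16473 test: `goodLoop`, `badLoop` and `j` are made-up names, the counter `n` stands in for `sc1`, the accumulator `s` for the Int inside `eta2`, and the newtype casts are dropped. Both functions compute the same result, but `badLoop` scrutinises the result of its recursive call so that it can pass the pair on to the shared continuation `j`, which is what stops the call from being a tail call. GHC will most likely inline a source-level `j` like this one itself, so the sketch only illustrates the shape; it is not a reproducer.

```haskell
-- Hypothetical transliteration of the Core above; newtype casts omitted.

-- "Good" shape: the recursive call is in tail position, so the loop
-- runs in constant stack and never rebuilds the result pair.
goodLoop :: Int -> Int -> ((), Int)
goodLoop 100 s = ((), s + 100)
goodLoop n   s = goodLoop (n + 1) (s + n)

-- "Bad" shape: both branches jump to a shared continuation j (standing in
-- for the join point $j), so every level must take apart the pair returned
-- by the recursive call and rebuild it before returning.
badLoop :: Int -> Int -> ((), Int)
badLoop n s =
  let j a b = (a, b)
  in if n == 100
       then j () (s + 100)
       else case badLoop (n + 1) (s + n) of
              (x, s') -> j x s'
```

Starting from `0 0`, both return `((), 5050)`: the accumulator collects 0 + 1 + … + 99 = 4950 and the final branch adds 100.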
My observation is that, regardless of that patch, we should prevent the first program from being generated.
Matthew recently found another program that is bad because of a join point like this one. Note [Duplicating join points] explains why we create join points even for trivial things, but I think we may want to revisit that policy to some extent.
Steps to reproduce
With the compiler from !13884, compare the simplifier output for test T16473 against the output of a HEAD GHC.
Notes
@simonpj and I discussed this example.
After analyzing Note [Duplicating join points], we think we can be less conservative and also allow join points to be unconditionally inlined (see uncondInlineJoin) if their right-hand side is a simple application to trivial expressions WITHOUT free variables.
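A minimal sketch of the proposed criterion over a toy expression type, assuming that "WITHOUT free variables" means the right-hand side mentions no local variables other than the join point's own parameters (global names and data constructors are not counted). `Expr`, `okToInlineJoin` and the other names are hypothetical; this is neither GHC's `uncondInlineJoin` nor its Core representation, and looking through casts is a simplifying assumption.

```haskell
-- Toy expression language (hypothetical, not GHC Core).
data Expr
  = Var String        -- local variable
  | Global String     -- top-level name or data constructor; never counted as free
  | Lit Integer
  | App Expr [Expr]   -- head applied to arguments
  | Lam [String] Expr
  | Cast Expr         -- a cast; the coercion itself is elided
  deriving Show

-- Trivial expressions: variables, globals and literals, looking through casts.
isTrivial :: Expr -> Bool
isTrivial (Var _)    = True
isTrivial (Global _) = True
isTrivial (Lit _)    = True
isTrivial (Cast e)   = isTrivial e
isTrivial _          = False

stripCasts :: Expr -> Expr
stripCasts (Cast e) = stripCasts e
stripCasts e        = e

-- Local variables occurring free in an expression (duplicates are harmless here).
freeVars :: Expr -> [String]
freeVars (Var v)      = [v]
freeVars (Global _)   = []
freeVars (Lit _)      = []
freeVars (App f args) = concatMap freeVars (f : args)
freeVars (Lam bs e)   = filter (`notElem` bs) (freeVars e)
freeVars (Cast e)     = freeVars e

-- Proposed (hypothetical) test: the RHS, ignoring casts, is an application
-- whose head and arguments are all trivial, and every local variable it
-- mentions is one of the join point's parameters.
okToInlineJoin :: [String] -> Expr -> Bool
okToInlineJoin params rhs =
  simpleApp (stripCasts rhs) && all (`elem` params) (freeVars rhs)
  where
    simpleApp (App f args) = isTrivial f && all isTrivial args
    simpleApp _            = False
```

For the join point $j a b = (a, b) from the bad program, encoded as `Cast (Cast (App (Global "(,)") [Var "a", Var "b"]))` with parameters `["a", "b"]`, the check succeeds, so under this reading $j would be inlined at both jump sites. A right-hand side that mentions some other local variable, or that applies a function to a non-trivial argument, is rejected.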