Don't float out join points to top-level with an interesting demand
Due to a complicated interaction with FloatOut in !7599 (merged), I'm seeing roughly the following sequence of steps:
==> after DmdAnal
f = ... joinrec go [Dmd=SCS(C1(P(1L,A))),Str=<S><L>] x y =
... case <e> of x' { __DEFAULT -> go x' y' }
in ... go ...
==> FloatOut + FloatIn strict arg `e`
go [Str=<S><L>]
go = ... go <e> y' ... -- NB: Dropped the eval
f = ... go ...
==> DmdAnal, 2nd run
go [Str=<L><L>] -- Urgh, now lazy!
go = ... go <e> y' ...
f = ... go ...
This is what happens:
- The first run of DmdAnal records a signature
<S><L>
for the join pointgo
, assuming a demand ofP(1L,A)
on the join body. NB: DmdAnal currently computes better signatures for join points than for regular functions because their evaluation context is known. - The fact that
go
is strict in its first arg allows us to drop the eval one
in Core and we getgo e y'
in the recursive call becausego
, which is easier to handle in Core2Core passes. - In the meantime,
go
has been simplified enough for the second FloatOut pass to see that it can float to the top-level. Off it goes - But on the top-level it's just a regular function where DmdAnal isn't smart enough (yet?) to figure out an accurate evaluation context. So our strictness signature gets worse,
<L><L>
. And now we dropped the eval in (2) but can't recover it! We made the program lazier; in CorePrep we'll let-bind instead of case-binde
.
In !7599 (merged), we made inlined versions of addListToUniqDSet
lazier in GHC.Linker.Loader
, to detrimental effect on MultiLayerModulesTH_OneShot
, which increased by 4.8%.
In a multi-day effort I could come up with a reproducer, but it only reproduces with !7599 (merged) (because t
has demand MP(1L,A)
which would be MP(ML,A)
today. I couldn't reproduce with a demand of 1P(1L,A)
because then t
will just be turned into a case
and pushed into the join point):
module Lib (f) where
import Data.List
import Data.Ord
newtype Unique = U { unU :: Int }
class Uniquable u where getKey :: u -> Unique
instance Uniquable Int where getKey = U
data UMap a = UMap { unS :: ![(Unique,a)], unI :: !Int }
insertBy' f v !xs = insertBy f v xs
{-# NOINLINE insertBy' #-}
addOne :: Uniquable u => UMap a -> u -> a -> UMap a
addOne (UMap set n) x v = UMap (insertBy' (comparing (unU . fst)) (getKey x,v) set) (n+1)
newtype USet u = USet (UMap u)
insertOne :: Uniquable u => USet u -> u -> USet u
insertOne (USet s) x = USet (addOne s x x)
insertMany :: Uniquable u => USet u -> [u] -> USet u
insertMany s vs = foldl' insertOne s (reverse (reverse vs))
seq' = seq
{-# NOINLINE seq' #-}
blah s@(USet m) = unS m `seq'` s
{-# OPAQUE blah #-}
end (USet m) = unS m
{-# NOINLINE end #-}
f :: USet Int -> [Int] -> [(Unique,Int)]
f !xs ys
| length ys == 13 = end $ blah t
| length ys == 23 = reverse $ end $ blah t
| otherwise = []
where
t = insertMany xs (reverse $ reverse $ reverse $ reverse ys)
Welp.
My suggestion for a fix: In SetLevels.destLevel
, only float join points (to the top-level) if their demand is not "interesting", e.g. not just nCn(C1(L))
. Alternatively, zap their demand signatures when floating. Perhaps a combination of both.
Perhaps a strange artifact is that go
's first arg is actually unlifted (in terms of StrictWorkerId
) and perhaps will be eval'd in Stg2Cmm. But the thunk will be allocated nonetheless which is a waste of ressources. Maybe we could case-bind in CorePrep expressions that go in Unlifted positions. But that isn't strictly the cause of the issue here.