DmdAnal: A lazy Demand's SubDemand should be relative to at least one evaluation
This is take 2 on #18885 (closed), which I'll close in favor of this new summary with the insights we gained since then.
Motivation 1: Nested product strictness
Consider

```haskell
module Lib (g) where

f :: (Bool, Bool) -> (Bool, Bool)
f pr = (case pr of (a,b) -> a /= b, True)
{-# NOINLINE f #-}

g :: Int -> (Bool, Bool)
g x = let y = let z = odd x in (z,z) in f y
```
What we would like here is to case-bind `z` instead of let-binding it, simply because we know that if `y` is passed to `f` as formal parameter `pr` and then evaluated in the body of `f`, then `f` will also evaluate both components of `pr`.

Alas, note that `f` is lazy in `pr`, with signature `<ML>`, because `pr` is only evaluated when the first component of the result is forced. So demand analysis lacks the expressive power to propagate that `z` is actually used strictly (assuming an evaluation of its binding context within `y`).
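Both halves of that claim can be checked at runtime. Here is a small self-contained harness (my own example, not from the ticket): `f` is lazy in `pr` overall, yet forcing the first component of its result forces both components of `pr`.

```haskell
import Control.Exception (SomeException, evaluate, try)

f :: (Bool, Bool) -> (Bool, Bool)
f pr = (case pr of (a, b) -> a /= b, True)
{-# NOINLINE f #-}

main :: IO ()
main = do
  -- f is lazy in pr: the second result component never touches pr.
  print (snd (f undefined))  -- True
  -- But forcing the first result component forces both components of pr,
  -- because (/=) on Bool is strict in both arguments:
  r <- try (evaluate (fst (f (True, undefined)))) :: IO (Either SomeException Bool)
  print (either (const "second pair component was forced") show r)
```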
Proposed solution
We could give `f` a demand signature `<MP(1P,1P)>` that says "`f` uses its argument `pr` lazily but at most once (`M`), but assuming at least one such evaluation, it will evaluate both components of the pair strictly." Then we can put a sub-demand of `P(1P,1P)` on the body of `y`, seeing that `z` is indeed used strictly.
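In source Haskell, the transformation we would then be entitled to looks as follows. This is a hand-written sketch of the desired outcome, not compiler output; note the bang pattern, since a source-level `case` with a variable pattern is lazy, unlike Core's `case`.

```haskell
{-# LANGUAGE BangPatterns #-}

f :: (Bool, Bool) -> (Bool, Bool)
f pr = (case pr of (a, b) -> a /= b, True)
{-# NOINLINE f #-}

g, g' :: Int -> (Bool, Bool)
-- today: z is let-bound, i.e. allocated as a thunk
g  x = let y = let z = odd x in (z, z) in f y
-- desired: z is case-bound, evaluated as soon as y's contents are
g' x = let y = case odd x of !z -> (z, z) in f y

main :: IO ()
main = print (g 3 == g' 3 && g 4 == g' 4)
```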
Motivation 2: Eta reduction
In #20040 (closed), we are trying to assess whether it's sound to eta-reduce an argument `ifoldl' (\x -> f x) ...` to `ifoldl' f`. That depends on how `ifoldl'` uses its argument. Here's a simpler version of the particular definition of `ifoldl'`:

```haskell
myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl f z [] = z
myfoldl f !z (x:xs) = myfoldl (\x -> f x) (f z x) xs
```

Today, GHC won't manage to eta-reduce `\x -> f x` in the recursive call to `myfoldl`.
A sufficient condition to enable eta-reduction would be if `myfoldl` put demand `SCS(C1(L))` on `f`, meaning that "`f` is evaluated strictly (possibly multiple times) and is also called strictly with two arguments". Unfortunately, `f` is not used strictly, because of the base case. Thus, the demand signature of `myfoldl` says `LCL(C1(L))`, and it's unsound to eta-reduce based on this signature. The following example demonstrates that:

```haskell
blah :: (Int -> Int -> Int) -> Int -> Int
blah f 0 = 0
blah f 1 = f `seq` 1
blah f x = f (x+1) (x+2) + f (x+3) (x+4)
{-# NOINLINE blah #-}

main = print $ blah (\x -> undefined x) 1
```

`blah` puts demand `LCL(C1(L))` on `f`. But if we eta-reduce `\x -> undefined x` at `blah`'s call site, the program will crash instead of printing `1`.
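The observable difference boils down to `seq`: a lambda such as `\x -> undefined x` is already a value (in WHNF), while its eta-reduced form `undefined` is not. A minimal runnable illustration of just that point (my own example, using only `base`):

```haskell
import Control.Exception (SomeException, evaluate, try)

main :: IO ()
main = do
  -- The lambda is in WHNF, so seq succeeds:
  print ((\x -> undefined x) `seq` (1 :: Int))
  -- The eta-reduced version bottoms under seq:
  r <- try (evaluate ((undefined :: Int -> Int) `seq` (1 :: Int)))
         :: IO (Either SomeException Int)
  print (either (const "crashed") show r)
```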
Proposed solution
The idea is similar to what I propose for nested product demands: differentiate between `myfoldl` and `blah` by

- Giving `f` demand `LCS(C1(L))` in `myfoldl`, meaning "`f` is evaluated 0, 1 or many times (`L`), but if it is evaluated, then it is also called (possibly multiple times, so `S`), and every time it is called in such a way, it is immediately applied to another argument (`1`)."
- Giving `f` demand `LCL(C1(L))` in `blah` (as before), meaning "`f` is evaluated 0, 1 or many times (`L`), but if it is evaluated, then it might be called 0, 1 or many times (`L`), and every time it is called in such a way, it is immediately applied to another argument (`1`)."

NB: Assuming at least one evaluation, `f` might not be called at all, 0 times, in the second case. That is the important distinction that we are trying to make! At the moment, we leave this expressiveness on the table.
General idea
The general idea for the change discussed here is that whenever we have a `Demand` `n :* sd`, and the `Card`inality `n` has a lower bound of 0 (e.g., the whole demand is a lazy demand), then we can assume that the `SubDemand` `sd` actually approximates over (the subset of) all strict evaluation contexts.

Today, a lazy outer demand "pulls down" the immediately nested lower cardinality bounds in `sd` to 0 ("lazifying" them), needlessly destroying information.
Note that when the lower bound is 1, `sd` still applies absolutely, lest we risk regressions. See "Regression tests" below for examples. That makes for a complicated implementation.
Relative vs. Absolute
In #18885 (closed), we say that `sd` applies relative to the assumed strict demand if and only if `n` is lazy.

But the mess in #18885 (closed) also shows that usage demands (e.g., upper bounds) still apply absolutely, independently of lower bounds. So an upper bound of `1` somewhere means that the thing is evaluated at most once wrt. the whole demand. (The exception is `SubDemand`s inside call demands, which always apply relative to exactly one call, corresponding to the notion that lambda bodies are freshly instantiated on each call.) I think the relative/absolute tension will pose quite a few challenges for the implementation.
This ticket is for figuring out the "why" (in fact, I hope it just did) and the "how", in particular without losing expressive power in other scenarios and thus risking regressions.
Regression tests
#18885 (closed) painfully brought up examples that showcase what doesn't work.
Requirement: upper bounds in `sd` in call demands `Cn(sd)` apply relatively

We already have relative call demands; e.g., `SCS(C1(L))` means "eval'd multiple times and called multiple times, and always with two arguments", denoting a use of `f` in an expression like `f 1 2 + f 3 4`. We can use this information to eta-expand `f` to arity 2, for example.

Note that this doesn't imply anything for whether the lower bound (strictness) of `sd` should apply relatively or absolutely.
Requirement: Absolute product (usage) demands
Why are product demands absolute in the first place? The reason is single-entry information. Consider

```haskell
h p 0 = 0
h p y = fst p + snd p + y

blurg x =
  let a = sum [0..x]
      b = sum [1..x]
  in h (a,b) x
```

Today, `h` has demand signature `<LP(ML,ML)><1A>`. This is enough for us to employ call-by-name for the single-entry thunks `a` and `b` at `h`'s call site!
If we had relative product demands (NB: not only strictness, also usage), we'd get the same signature, but exploiting it in the same way would be unsound! Example:
```haskell
h2 p 0 = 0
h2 p y = fst p + fst p + snd p + snd p

blurg2 x =
  let a = sum [0..x]
      b = sum [1..x]
  in h2 (a,b) x
```
Assuming relative product demands, we'd give `h2` the same signature `<LP(ML,ML)><1A>`, because each eval of `p` will evaluate each component at most once. But `a` and `b` are actually evaluated (at most) twice each! So the signature actually means something different, and we thus can't assume call-by-name for `a` and `b`. The `M`/`L` distinction in nested product usage suddenly has become worthless.

Conclusion: We want to stick to absolute product usage.
`plusDmd` must `lub` in lazy demands
Consider
```haskell
i :: Bool -> Bool -> (Int, Int) -> Int
i b b' x = (if b then fst x else 3) + (if b' then snd x else 4)
```
`i` must not get a signature that puts demand `LP(1L,1L)` on the pair `x`, because not all evals of `x` will also use both components strictly; a counter-example is the call site `i True False (1,undefined)`. Although it will evaluate the pair and `1`, it won't evaluate `undefined`. `LP(ML,ML)` is the sound choice here. That implies the following equations for `plusDmd`:

- `MP(1L,A) + MP(A,1L) = LP(ML,ML)` (so we `plusCard` the outer demands and then have to `lub` the lower bounds and `plus` the upper bounds on the nested ones)

whereas if the outer `M`s were strict demands `1`, we still want

- `1P(1L,A) + 1P(A,1L) = SP(1L,1L)` (so `plus` all the way, because both nested demands describe all traces)
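The two equations can be made concrete with a toy executable model of this `plusDmd`. This is my own simplified representation, not GHC's `GHC.Types.Demand`: cardinalities are intervals of bounds, a product demand is one level deep with the components' sub-demands dropped, and the nested combination switches between "`lub` the lower bounds, `plus` the upper bounds" and "`plus` all the way" depending on whether the outer cards are lazy.

```haskell
-- Toy model of the proposed plusDmd, one product level deep.
data Bound = B0 | B1 | BN deriving (Eq, Ord, Show)  -- 0, 1, many

data Card = Card { lo :: Bound, hi :: Bound } deriving (Eq, Show)

cA, c1, cM, cL, cS :: Card
cA = Card B0 B0  -- A: absent
c1 = Card B1 B1  -- 1: exactly once
cM = Card B0 B1  -- M: at most once (lazy)
cL = Card B0 BN  -- L: 0, 1 or many times (lazy)
cS = Card B1 BN  -- S: at least once (strict)

plusBound :: Bound -> Bound -> Bound  -- saturating addition
plusBound B0 b  = b
plusBound b  B0 = b
plusBound _  _  = BN

plusCard :: Card -> Card -> Card      -- both uses happen, one after another
plusCard a b = Card (min B1 (plusBound (lo a) (lo b)))  -- lower bounds cap at 1
                    (plusBound (hi a) (hi b))

isLazy :: Card -> Bool
isLazy c = lo c == B0

-- A product demand: outer card plus cards for the two components.
type PairDmd = (Card, (Card, Card))

plusPairDmd :: PairDmd -> PairDmd -> PairDmd
plusPairDmd (n1, (a1, b1)) (n2, (a2, b2)) =
  (plusCard n1 n2, (comb a1 a2, comb b1 b2))
  where
    comb x y
      | isLazy n1 || isLazy n2
      -- lazy outer card: lub the nested lower bounds, plus the upper bounds
      = Card (min (lo x) (lo y)) (plusBound (hi x) (hi y))
      | otherwise
      = plusCard x y  -- strict outer cards: plus all the way

main :: IO ()
main = do
  -- MP(1L,A) + MP(A,1L) = LP(ML,ML)
  print (plusPairDmd (cM, (c1, cA)) (cM, (cA, c1)) == (cL, (cM, cM)))
  -- 1P(1L,A) + 1P(A,1L) = SP(1L,1L)
  print (plusPairDmd (c1, (c1, cA)) (c1, (cA, c1)) == (cS, (c1, c1)))
```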
"General idea" above says that we want relative treatment of a lazy `Demand`'s nested `SubDemand`s. Can we relax the "lazy" qualifier, thus simplifying our idea? The difference in `plusDmd` hints at the answer: no. Example:
```haskell
fst' :: (a,b) -> a
fst' (x,_) = x
{-# NOINLINE fst' #-}

j p = p `seq` fst' p
```
Today, `j` puts demand `SP(1L,A)` on `p`. We want to see that `j` is strict in the first component of `p`. If we gave the involved strict demands a relative treatment, we'd have to say `SP(ML,A)`, because not every eval of `p` will evaluate the first component.
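A quick runnable check of what `SP(1L,A)` claims (my own harness, with a self-contained `j`): the second component is never touched, but forcing `j`'s result demands the first one.

```haskell
import Control.Exception (SomeException, evaluate, try)

j :: (Int, b) -> Int
j p = p `seq` fst p

main :: IO ()
main = do
  -- A on the second component: undefined there is never demanded.
  print (j (1, undefined))  -- 1
  -- 1L on the first component: it is demanded whenever j's result is forced.
  r <- try (evaluate (j (undefined, 2))) :: IO (Either SomeException Int)
  print (either (const "first component was demanded") show r)
```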