Demand Analysis scales quadratically in the size of the program
While investigating #14816, I realised that Demand analysis takes quadratic time and memory for the following kind of program:
```haskell
{-# LANGUAGE NoImplicitPrelude #-}
module GHC.Ix where

import GHC.Base

class (Ord a) => Ix a where
  range :: (a,a) -> [a]

instance (Ix a1, Ix a2, Ix a3, Ix a4, Ix a5, Ix a6, Ix a7, Ix a8, Ix a9,
          Ix aA, Ix aB, Ix aC, Ix aD, Ix aE, Ix aF) =>
         Ix (a1,a2,a3,a4,a5,a6,a7,a8,a9,aA,aB,aC,aD,aE,aF) where
  range ((l1,l2,l3,l4,l5,l6,l7,l8,l9,lA,lB,lC,lD,lE,lF),
         (u1,u2,u3,u4,u5,u6,u7,u8,u9,uA,uB,uC,uD,uE,uF)) =
    [(i1,i2,i3,i4,i5,i6,i7,i8,i9,iA,iB,iC,iD,iE,iF) | i1 <- range (l1,u1),
                                                      i2 <- range (l2,u2),
                                                      i3 <- range (l3,u3),
                                                      i4 <- range (l4,u4),
                                                      i5 <- range (l5,u5),
                                                      i6 <- range (l6,u6),
                                                      i7 <- range (l7,u7),
                                                      i8 <- range (l8,u8),
                                                      i9 <- range (l9,u9),
                                                      iA <- range (lA,uA),
                                                      iB <- range (lB,uB),
                                                      iC <- range (lC,uC),
                                                      iD <- range (lD,uD),
                                                      iE <- range (lE,uE),
                                                      iF <- range (lF,uF)]
```
Fortunately, this is already the largest instance in `GHC.Ix`. But these are the numbers for the 13-, 14-, 15- and 16-tuple instances, respectively:
| n  | DmdAnal alloc (MB) |
|----|--------------------|
| 13 | 46.8 |
| 14 | 52.5 |
| 15 | 58.5 |
| 16 | 65.1 |
Here's a quadratic regression: https://www.desmos.com/calculator/jxnxfp6hte
As you can see, if we had an instance for a 100-tuple, we'd see over 2GB of allocation. OK, still not that bad, but diagnosing and possibly fixing the issue might lead to improved DmdAnal performance for large programs.
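As a rough sanity check on that extrapolation (my own arithmetic, not the Desmos fit, which regresses over all four points): fitting a quadratic through three of the measured points with Newton's divided differences already lands well above 2000 MB at n = 100. The function name `dmdAnalAllocMB` is made up for illustration.

```haskell
-- Quadratic through the measured points (13, 46.8), (15, 58.5), (16, 65.1)
-- via Newton's divided differences; a rough stand-in for the Desmos fit.
dmdAnalAllocMB :: Double -> Double
dmdAnalAllocMB n = 46.8 + d1 * (n - 13) + d2 * (n - 13) * (n - 15)
  where
    d1 = (58.5 - 46.8) / (15 - 13)                      -- first divided difference
    d2 = ((65.1 - 58.5) / (16 - 15) - d1) / (16 - 13)   -- second divided difference
```

`dmdAnalAllocMB 100` comes out around 2.4 GB, consistent with the "over 2GB" estimate above.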
Unfortunately, I think the quadratic behavior is quite fundamental due to the nature of fixed-point iteration. For a triple, I see generated code like
```
$crange_aKj
  = \ @a1_aK8 @a2_aK9 @a3_aKa $dIx_aKb $dIx_aKc $dIx_aKd ds_dN4 ->
      case ds_dN4 of { (ds_dNg, ds_dNh) ->
      case ds_dNg of { (l1_aji, l2_ajj, l3_ajk) ->
      case ds_dNh of { (u1_ajl, u2_ajm, u3_ajn) ->
      let { lvl_sP4 = range $dIx_aKc (l2_ajj, u2_ajm) } in
      let { lvl_sP3 = range $dIx_aKd (l3_ajk, u3_ajn) } in
      letrec {
        go1_aNT
          = \ ds_aNU ->
              case ds_aNU of {
                [] -> [];
                : y_aNX ys_aNY ->
                  let { z_X4 = go1_aNT ys_aNY } in
                  letrec {
                    go1_X5
                      = \ ds_X6 ->
                          case ds_X6 of {
                            [] -> z_X4;
                            : y_X8 ys_X9 ->
                              let { z_Xa = go1_X5 ys_X9 } in
                              letrec {
                                go1_Xb
                                  = \ ds_Xc ->
                                      case ds_Xc of {
                                        [] -> z_Xa;
                                        : y_Xe ys_Xf -> : (y_aNX, y_X8, y_Xe) (go1_Xb ys_Xf)
                                      }; } in
                              go1_Xb lvl_sP3
                          }; } in
                  go1_X5 lvl_sP4
              }; } in
      go1_aNT (range $dIx_aKb (l1_aji, u1_ajl))
      }
      }
      }
```
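For reference, here's a hand-written source-level rendering of that Core (the name `range3` is mine, not GHC's): each comprehension generator becomes a nested `go` loop, and each loop captures the rest of its *enclosing* loop as a thunk, playing the role of `z_X4`/`z_Xa` above.

```haskell
-- Source-level sketch of the Core above, for
-- [(i1,i2,i3) | i1 <- xs1, i2 <- xs2, i3 <- xs3].
range3 :: [a] -> [b] -> [c] -> [(a, b, c)]
range3 xs1 xs2 xs3 = go1 xs1
  where
    go1 []       = []
    go1 (y1:ys1) = go2 xs2
      where
        z1 = go1 ys1                 -- thunk: rest of the outermost loop
        go2 []       = z1
        go2 (y2:ys2) = go3 xs3
          where
            z2 = go2 ys2             -- thunk: rest of the middle loop
            go3 []       = z2
            go3 (y3:ys3) = (y1, y2, y3) : go3 ys3
```

Note that `go3` recurses only into itself; its only references to the outer loops go through the thunks `z1`/`z2`.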
This is the "call stack" in which we'll analyse the RHSs of the `go1_*` bindings for signatures:

- `aNT` first
  - `X5` first
    - `Xb` first
    - `Xb` second: stable
  - `X5` second ..
    - `Xb`: still stable
  - .. `X5`: stable
- `aNT` second ...
  - `X5` ..
    - `Xb`: still stable
  - .. `X5`: still stable
- ... `aNT`: stable
The innermost binding `Xb` is analysed 4 times. In general, for an n-tuple, the innermost binding will be analysed n+1 times, the next outer binding n times, and so on.
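Summing those per-binding analysis counts (my own back-of-the-envelope arithmetic, not GHC code) gives (n² + 3n)/2 RHS analyses in total for the n nested loops:

```haskell
-- Total RHS analyses for n nested go loops: the innermost is analysed
-- (n+1) times, the next outer one n times, ..., the outermost twice.
totalAnalyses :: Int -> Int
totalAnalyses n = sum [2 .. n + 1]   -- closed form: (n^2 + 3*n) `div` 2
```

For the triple above this gives 9 analyses; for the 16-tuple instance, 152.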
Note that the "cunning plan" in `Note [Initialising Strictness]`, which reuses signatures between iterations, saves us from scaling exponentially here. Still, we scale quadratically.
While I doubt we can do better in general without conservative approximation, note that in this example, `Xb` doesn't actually call `X5` (or `aNT`)! It should not need to be re-analysed once it's stable. The recursive occurrence is in the thunks `z_*`, which are then referenced from the nested loops. We use LetUp for thunks, so there is no way that the RHS of `z_Xa` can influence the demand signature of `go1_Xb`. If we had FV annotations as suggested in #19538, we could save re-analysing `Xb` simply by knowing that the unstable outer bindings `aNT` and `X5` don't occur free in `Xb`.
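To make that idea concrete, here is a heavily simplified, hypothetical worklist fixpoint; every name in it (`fixpointFV`, the `sig` placeholder, the toy environment) is made up for illustration and is not DmdAnal's actual machinery. A binding is re-enqueued only when the signature of something that occurs free in its RHS changes, so a stable `Xb` that doesn't mention `aNT` or `X5` is never re-analysed:

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical FV-directed fixpoint: re-analyse a binding only when one
-- of its free variables' signatures changed. For termination, `analyse`
-- must be monotone over a finite lattice of signatures.
fixpointFV
  :: (Ord k, Eq sig)
  => M.Map k (S.Set k)          -- free variables of each binding's RHS
  -> (M.Map k sig -> k -> sig)  -- analyse one RHS under an environment
  -> M.Map k sig                -- initial (optimistic) signatures
  -> M.Map k sig
fixpointFV fvs analyse = go (M.keys fvs)
  where
    go []       env = env
    go (k:work) env
      | new == env M.! k = go work env                       -- k is stable
      | otherwise        = go (work ++ deps) (M.insert k new env)
      where
        new  = analyse env k
        -- only bindings that mention k free can be destabilised by k
        deps = [ d | (d, fv) <- M.toList fvs, k `S.member` fv ]
```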
I'm a bit doubtful that we can make that plan happen easily. Ultimately, what we want is a finer-grained dependency graph for fixed-point iteration, like the one I used in my master's thesis: https://pp.ipd.kit.edu/uploads/publikationen/graf17masterarbeit.pdf#section.4.8, but that approach was VERY complicated and unfortunately has huge constants.