I had hoped to be able to solve this by just looking at the code, because I have only made a few small changes. But it seems I really need to do some debugging now.
@simonpj I'm getting a panic when I try that:
Command line: _build/stage0/bin/ghc -Wall -Wcompat -dynamic-too -hisuf hi -osuf o -hcsuf hc -static -hide-all-packages -no-user-package-db '-package-env -' '-package-db _build/stage1/inplace/package.conf.d' '-this-unit-id ghc-9.9-inplace' '-this-package-name ghc' '-package-id array-0.5.6.0-inplace' '-package-id base-4.19.0.0-inplace' '-package-id binary-0.8.9.1-inplace' '-package-id bytestring-0.12.1.0-inplace' '-package-id containers-0.7-inplace' '-package-id deepseq-1.5.0.0-inplace' '-package-id directory-1.3.8.3-inplace' '-package-id exceptions-0.10.7-inplace' '-package-id filepath-1.5.0.0-inplace' '-package-id ghc-boot-9.9-inplace' '-package-id ghc-heap-9.9-inplace' '-package-id ghci-9.9-inplace' '-package-id hpc-0.7.0.1-inplace' '-package-id process-1.6.18.0-inplace' '-package-id semaphore-compat-1.0.0-inplace' '-package-id stm-2.5.1.0-inplace' '-package-id template-haskell-2.21.0.0-inplace' '-package-id time-1.12.2-inplace' '-package-id transformers-0.6.1.0-inplace' '-package-id unix-2.8.5.0-inplace' -i -i/home/user/haskell/ghc/_build/stage1/compiler/build -i/home/user/haskell/ghc/_build/stage1/compiler/build/autogen -i/home/user/haskell/ghc/compiler -Irts/include -I_build/stage1/compiler/build -I_build/stage1/compiler/build/. -Icompiler/. -I/home/user/haskell/ghc/libraries/process/include -I/home/user/haskell/ghc/_build/stage1/libraries/process/build/include -I/home/user/haskell/ghc/libraries/directory -I/home/user/haskell/ghc/_build/stage1/libraries/directory/build -I/home/user/haskell/ghc/libraries/unix/include -I/home/user/haskell/ghc/_build/stage1/libraries/unix/build/include -I/home/user/haskell/ghc/libraries/time/lib/include -I/home/user/haskell/ghc/_build/stage1/libraries/time/build/lib/include -I/home/user/haskell/ghc/libraries/containers/containers/include -I/home/user/haskell/ghc/_build/stage1/libraries/containers/containers/build/include -I/home/user/haskell/ghc/libraries/bytestring/include -I/home/user/haskell/ghc/_build/stage1/libraries/bytestring/build/include -I/home/user/haskell/ghc/libraries/ghc-internal/include -I/home/user/haskell/ghc/_build/stage1/libraries/ghc-internal/build/include -I/nix/store/k35vkrrjwaz4fz2nlp6x3maahjdhjqs9-gmp-with-cxx-6.2.1-dev/include -I/home/user/haskell/ghc/libraries/ghc-bignum/include/ -I/home/user/haskell/ghc/_build/stage1/libraries/ghc-bignum/build/include/ -I/nix/store/5g383djp9x6rfqxpmi277aciagaf2i08-elfutils-0.189-dev/include -I/nix/store/0zwvjs9ijzma1bw2mlzsc4n1x8wqkyas-numactl-2.0.16/include -I/home/user/haskell/ghc/rts/include -I/home/user/haskell/ghc/_build/stage1/rts/build/include -optP-include -optP_build/stage1/compiler/build/autogen/cabal_macros.h -optP-DHAVE_INTERNAL_INTERPRETER -optP-DCAN_LOAD_DLL -outputdir _build/stage1/compiler/build -fdiagnostics-color=always -Wall -Wno-name-shadowing -Wnoncanonical-monad-instances -Wnoncanonical-monoid-instances -XGHC2021 -XNoImplicitPrelude -XMonoLocalBinds -no-global-package-db -package-db=/home/user/haskell/ghc/_build/stage1/inplace/package.conf.d -ghcversion-file=rts/include/ghcversion.h -ghcversion-file=rts/include/ghcversion.h -Wnoncanonical-monad-instances -optc-Wno-error=inline -c compiler/GHC/Core/Unfold/Make.hs -o _build/stage1/compiler/build/GHC/Core/Unfold/Make.o -O2 -H32m -dlint -haddock -Winvalid-haddock -Wno-deprecated-flags -Wcpp-undef
===> Command failed with error code: 1
ghc: panic! (the 'impossible' happened)
GHC version 9.9.20240304:
refineFromInScope
InScope {wild_00 wild_X1 wild_X2 wild_X3 opts_a3G0 spec_bndrs_a3G1
spec_app_a3G2 rule_lhs_args_a3G3 df_a3G4 wild_a597 x_a598 ds_d4ka
ds_d4kb ds_d4kc ds_d4kg ds_d4kh ds_d4ki ds_d4kj ds_d4kk
certainlyWillInline mkCompulsoryUnfolding mkCompulsoryUnfolding'
mkCoreUnfolding mkDFunUnfolding mkDataConUnfolding mkFinalUnfolding
mkFinalUnfolding' mkInlinableUnfolding mkInlineUnfoldingNoArity
mkInlineUnfoldingWithArity mkSimpleUnfolding mkUnfolding
mkWorkerUnfolding mkWrapperUnfolding specUnfolding $trModule
$trModule_s4IC $trModule_s4ID $trModule_s4IE $trModule_s4IF
expr_s5tH cache_s5tJ cache_s5tL cache_s5tN cache_s5tP lvl_s5u8
lvl_s5u9 lvl_s5ua lvl_s5ub lvl_s5uc go2_s5ue go2_s5uh lvl_s5uj
lvl_s5uo}
$wgo6_s5uf
Call stack:
CallStack (from HasCallStack):
callStackDoc, called at compiler/GHC/Utils/Panic.hs:186:37 in ghc:GHC.Utils.Panic
pprPanic, called at compiler/GHC/Core/Opt/Simplify/Env.hs:931:30 in ghc:GHC.Core.Opt.Simplify.Env
CallStack (from HasCallStack):
panic, called at compiler/GHC/Utils/Error.hs:507:29 in ghc:GHC.Utils.Error
If I disable tryLoopify
by always returning nothing then I don't get panics, so this is not due other merge issues.
Here's the code that I have now by the way:
tryLoopify :: OccEnv -> TopLevelFlag -> Id -> [Id] -> UsageDetails -> [NodeDetails]
-> Maybe ([CoreBind] -> CoreBind, UsageDetails)
-- See Note [Join point loopification]
tryLoopify env lvl loop_entry bndrs body_uds details_s
| not (isJoinId loop_entry)
, NoTailCallInfo <- lookupTailCallInfo body_uds loop_entry
-- otherwise turn it into a joinrec rightaway
, let unadj_uds = foldr (andUDs . nd_rhs_uds) emptyDetails details_s
, decideRecJoinPointHood NotTopLevel unadj_uds bndrs
-- Main test is in tagRecBinders. Keep in sync
-- NotTopLevel: After loopification, bndrs would be a local binding,
-- regardless of whether it came from the top-level or not
, AlwaysTailCalled arity <- lookupTailCallInfo unadj_uds loop_entry
, Just loop_nds <- find ((== loop_entry) . nd_bndr) details_s
, (!lam_bndrs,_) <- collectNBinders arity (nd_rhs_exp loop_nds)
, let !tail_call_uds = mkOneOcc env loop_entry IsInteresting arity
, (loop_entry',_) <- tagNonRecBinder lvl (lookupLetOccInfo body_uds loop_entry) loop_entry
-- NB: Here we pass lvl, because the NonRec will stay at the same level
= Just (wrap_entry loop_entry' lam_bndrs, tail_call_uds)
| otherwise
= Nothing
where
nd_rhs_uds x = let (WTUD (TUD _ uds) _) = nd_rhs x in uds
nd_rhs_exp x = let (WTUD _ e) = nd_rhs x in e
wrap_entry f xs binds =
NonRec f (mkLams xs (mkLets binds (mkVarApps (Var f) xs)))
I'm taking a look.
I currently have this:
tryLoopify :: OccEnv -> TopLevelFlag -> Id -> [Id] -> UsageDetails -> [NodeDetails]
-> Maybe ([CoreBind] -> CoreBind, UsageDetails, UsageDetails)
-- See Note [Join point loopification]
tryLoopify env lvl loop_entry bndrs body_uds details_s
| not (isJoinId loop_entry)
, NoTailCallInfo <- lookupTailCallInfo body_uds loop_entry
-- otherwise turn it into a joinrec rightaway
, let unadj_uds = foldr (andUDs . (\(WTUD (TUD _ uds) _) -> uds) . nd_rhs) emptyDetails details_s
, decideRecJoinPointHood NotTopLevel unadj_uds bndrs
-- Main test is in tagRecBinders. Keep in sync
-- NotTopLevel: After loopification, bndrs would be a local binding,
-- regardless of whether it came from the top-level or not
, AlwaysTailCalled arity <- lookupTailCallInfo unadj_uds loop_entry
, Just loop_nds <- find ((== loop_entry) . nd_bndr) details_s
, (!lam_bndrs,_) <- collectNBinders arity ((\(WTUD _ e) -> e) (nd_rhs loop_nds))
, let !tail_call_uds = mkOneOcc env loop_entry IsInteresting arity
, (loop_entry',joinPointHood) <- tagNonRecBinder lvl (lookupLetOccInfo body_uds loop_entry) loop_entry
-- NB: Here we pass lvl, because the NonRec will stay at the same level
= Just (wrap_entry loop_entry' lam_bndrs, _body_uds', tail_call_uds)
| otherwise
= Nothing
where
wrap_entry f xs binds =
NonRec f (mkLams xs (mkLets binds (mkVarApps (Var f) xs)))
So really perhaps the main remaining todo is figuring out how to adapt the final body usage details, i.e. what should go in the place of _body_uds'
.
I'm trying to fast forward this, but got a bit stuck on tryLoopify
. Some functions have changed in a way I can't easily migrate, namely:
decideJoinPointHood
seems to have been renamed to decideRecJoinPointHood
, but I'm not sure if its essential semantics has also changed.mkOneOcc
now requires an OccEnv
env
as an argument to tryLoopify
)tagNonRecBinder
now requires OccInfo
and doesn't return something wrapped in WithUsageDetails
It seems to be related to #23942 (closed). It seems the fix for now is to add import GHC.Base ()
to compiler/GHC/Utils/Containers/Internal/StrictPair.hs
This MR addresses #24471
More than one significant change.
I've recently ran into a similar missed fusion opportunity again which involves join points and stream fusion. Here's a minimal reproducer:
data List_ a b = Nil_ | Cons_ a !b
data ListCoCh a = forall s . ListCoCh (s -> List_ a s) !s
betweenCoCh :: (Int, Int) -> ListCoCh Int
betweenCoCh s0 = ListCoCh between_step s0 where
between_step (!x,!y) = if x > y then Nil_ else Cons_ x (x+1, y)
filterCoCh :: (a -> Bool) -> ListCoCh a -> ListCoCh a
filterCoCh p (ListCoCh step s0) = ListCoCh filter_step s0 where
filter_step !s = case step s of
Nil_ -> Nil_
Cons_ x xs -> if p x then Cons_ x xs else filter_step xs
mapCoCh :: (a -> b) -> ListCoCh a -> ListCoCh b
mapCoCh f (ListCoCh step s0) = ListCoCh map_step s0 where
map_step !s = case step s of
Nil_ -> Nil_
Cons_ x xs -> Cons_ (f x) xs
sumCoCh :: ListCoCh Int -> Int
sumCoCh (ListCoCh step s) = sum_go s 0 where
sum_go !s' !acc = case step s' of
Nil_ -> acc
Cons_ x xs -> sum_go xs (x + acc)
pipeline :: (Int, Int) -> Int
pipeline = sumCoCh . mapCoCh (+2) . filterCoCh even . betweenCoCh
main :: IO ()
main = let !_ = pipeline (1,1000) in pure ()
It produces core that looks like this:
Rec {
$wfilter_step :: Int -> Int -> List_ Int (Int, Int)
$wfilter_step
= \ (ww :: Int) (ww1 :: Int) ->
case ww of x { I# ipv ->
case ww1 of y { I# ipv1 ->
case ># ipv ipv1 of {
__DEFAULT ->
case remInt# ipv 2# of {
__DEFAULT -> $wfilter_step (I# (+# ipv 1#)) y;
0# -> Cons_ x (I# (+# ipv 1#), y)
};
1# -> Nil_
}
}
}
end Rec }
Rec {
$wsum_go :: Int -> Int -> Int# -> Int#
$wsum_go
= \ (ww :: Int) (ww1 :: Int) (ww2 :: Int#) ->
case $wfilter_step ww ww1 of {
Nil_ -> ww2;
Cons_ x xs ->
case xs of { (ww3, ww4) ->
case x of { I# x1 -> $wsum_go ww3 ww4 (+# 2# (+# ww2 x1)) }
}
}
end Rec }
But filter_step
started off as a join point, so we would have wanted that case "crocodile" in $wsum_go
to have been pushed into the filter_step
function and that way it could have removed all allocation in the pipeline.
With both GHC 9.8.1 and 9.4.8 that doesn't happen.
It doesn't even help if I inline every part of the pipeline:
pipeline :: (Int, Int) -> Int
pipeline range = sum_go range 0
where
between_step (!x,!y) = if x > y then Nil_ else Cons_ x (x+1, y)
filter_step !s = case between_step s of
Nil_ -> Nil_
Cons_ x xs -> if even x then Cons_ x xs else filter_step xs
map_step !s = case filter_step s of
Nil_ -> Nil_
Cons_ x xs -> Cons_ (x + 2) xs
sum_go !s !acc = case map_step s of
Nil_ -> acc
Cons_ x xs -> sum_go xs (x + acc)
What we want GHC to do is float in all those join points into eachother:
pipeline :: (Int, Int) -> Int
pipeline range = sum_go range 0
where
sum_go !s !acc = case
let map_step !s = case
let filter_step !s =
case
let between_step (!x,!y) = if x > y then Nil_ else Cons_ x (x+1, y)
in between_step s
of
Nil_ -> Nil_
Cons_ x xs -> if even x then Cons_ x xs else filter_step xs
in filter_step s
of
Nil_ -> Nil_
Cons_ x xs -> Cons_ (x + 2) xs
in map_step s
of
Nil_ -> acc
Cons_ x xs -> sum_go xs (x + acc)
Note that even at this point GHC isn't able to optimise this, see below* for more on that.
And then it should do commuting conversions:
pipeline :: (Int, Int) -> Int
pipeline range = sum_go range 0
where
sum_go !s !acc = case
let map_step !s = case
let filter_step (!x,!y) =
if x > y then Nil_ else
if even x then Cons_ x (x+1, y) else filter_step (x+1, y)
in filter_step s
of
Nil_ -> Nil_
Cons_ x xs -> Cons_ (x + 2) xs
in map_step s
of
Nil_ -> acc
Cons_ x xs -> sum_go xs (x + acc)
And another:
pipeline :: (Int, Int) -> Int
pipeline range = sum_go range 0
where
sum_go !s !acc = case
let filter_step (!x,!y) =
if x > y then Nil_ else
if even x then Cons_ (x + 2) (x+1, y) else filter_step (x+1, y)
in filter_step s
of
Nil_ -> acc
Cons_ x xs -> sum_go xs (x + acc)
And finally:
pipeline :: (Int, Int) -> Int
pipeline range = sum_go range 0
where
sum_go !s !acc =
let filter_step (!x,!y) =
if x > y then acc else
if even x then sum_go (x+1, y) (x + 2 + acc) else filter_step (x+1, y)
in filter_step s
* Here's what the core looks like after the first simplification pass:
main :: State# RealWorld -> (# State# RealWorld, () #)
main
= \ (eta :: State# RealWorld) ->
case joinrec {
sum_go :: (Int, Int) -> Int -> Int
sum_go (s :: (Int, Int)) (acc :: Int)
= case s of s { (ipv, ipv) ->
case acc of acc { I# ipv ->
case case joinrec {
filter_step :: (Int, Int) -> List_ Int (Int, Int)
filter_step (s :: (Int, Int))
= case s of { (ipv, ipv) ->
case case ipv of x { I# ipv ->
case ipv of y { I# ipv ->
case gtInt x y of {
False -> $WCons_ x (I# (+# ipv 1#), y);
True -> Nil_
}
}
}
of {
Nil_ -> Nil_;
Cons_ x xs ->
case even $fIntegralInt x of {
False -> jump filter_step xs;
True -> $WCons_ x xs
}
}
}; } in
jump filter_step s
of {
Nil_ -> Nil_;
Cons_ x xs -> $WCons_ (case x of { I# x -> I# (+# x 2#) }) xs
}
of {
Nil_ -> acc;
Cons_ x xs -> jump sum_go xs (case x of { I# x -> I# (+# x ipv) })
}
}
}; } in
jump sum_go (I# 1#, I# 1000#) (I# 0#)
of
{ I# ipv ->
(# eta, () #)
}
So even inside the body of the filter_step
join point there is a potential for case-of-case that is not happening.
The rest of the changes look good.
This "FloatNone ==> not HNF ==> can't eta-expand" reasoning goes a bit too fast for me to follow. The second implication I can understand because we only want to eta-expand partial applications and those are HNF by definition. But I don't see how the FloatNone decision implies that the arg cannot be HNF.
This was fixed in !11092 (closed)
The documentation of role annotations uses this example:
type role Ptr representational
data Ptr a = Ptr Addr#
And it says that this is the disired role of Ptr
, but the source code of Ptr
tells us:
-- The role of Ptr's parameter is phantom, as there is no relation between
-- the Haskell representation and whatever the user puts at the end of the
-- pointer. And phantom is useful to implement castPtr (see #9163)
-- redundant role annotation checks that this doesn't change
type role Ptr phantom
data Ptr a = Ptr Addr#
deriving ( Eq -- ^ @since 2.01
, Ord -- ^ @since 2.01
)
I propose to find another example or just remove this example and skip straight to the nominal Set
example.
(By the way, I noticed some people mistyping my handle here, so I've changed it to "jaro" for simplicity.)
Yes, the lvlMFE
change looks good but I'll have to look a bit more carefully at the other two changes. I'll do that tomorrow.
I was thinking it should be possible to generalise freeVars
to generate monoidal summaries instead of the full free variable sets. I'm sure there are other places in the compiler that could then share the same machinery.
However, this requires more structure than just a monoid, because each variable binding needs to remove information from the summaries.
Perhaps something like this would work:
-- freeVars = freeVarsSummary (MkSummariseFVs emptyDVarSet unionFVs aFreeVar delBinderFV unionFVss)
data SummariseFVs a = MkSummariseFVs
{ emptyDVarSetSummary :: a
, unionFVsSummary :: a -> a -> a
, aFreeVarSummary :: Var -> a
, delBinderFVSummary :: Var -> a -> a
, unionFVssSummary :: [a] -> a
}
freeVarsBindSummary :: forall a. SummariseFVs a
-> CoreBind
-> a -- Free vars of scope of binding
-> (CoreBindWith a, a) -- Return free vars of binding + scope
freeVarsBindSummary summariser@MkSummariseFVs
{ emptyDVarSetSummary = emptyDVarSet
, unionFVsSummary = unionFVs
, aFreeVarSummary = aFreeVar
, delBinderFVSummary = delBinderFV
}
(NonRec binder rhs) body_fvs
= ( AnnNonRec binder rhs2
, freeVarsOf rhs2 `unionFVs` body_fvs2
`unionFVs` bndrRuleAndUnfoldingVarsDSet binder )
where
mkDVarSet = foldr (\x xs -> aFreeVar x `unionFVs` xs) emptyDVarSet
bndrRuleAndUnfoldingVarsDSet id = mkDVarSet . fvVarList $ bndrRuleAndUnfoldingFVs id
freeVarsOf (x, _) = x
freeVars = freeVarsSummary summariser
rhs2 = freeVars rhs
body_fvs2 = binder `delBinderFV` body_fvs
freeVarsBindSummary summariser@MkSummariseFVs
{ emptyDVarSetSummary = emptyDVarSet
, unionFVsSummary = unionFVs
, aFreeVarSummary = aFreeVar
, delBinderFVSummary = delBinderFV
}
(Rec binds) body_fvs
= ( AnnRec (binders `zip` rhss2)
, delBindersFV binders all_fvs )
where
mkDVarSet = foldr (\x xs -> aFreeVar x `unionFVs` xs) emptyDVarSet
freeVarsOf (x, _) = x
delBindersFV bs fvs = foldr delBinderFV fvs bs
freeVars = freeVarsSummary summariser
(binders, rhss) = unzip binds
rhss2 = map freeVars rhss
rhs_body_fvs = foldr (unionFVs . freeVarsOf) body_fvs rhss2
binders_fvs = mkDVarSet . fvVarList $ mapUnionFV bndrRuleAndUnfoldingFVs binders
-- See Note [The FVAnn invariant]
all_fvs = rhs_body_fvs `unionFVs` binders_fvs
-- The "delBinderFV" happens after adding the idSpecVars,
-- since the latter may add some of the binders as fvs
freeVarsSummary :: forall a. SummariseFVs a -> CoreExpr -> CoreExprWith a
-- ^ Annotate a 'CoreExpr' with a monoidal summary of its (non-global) free type
-- and value variables at every tree node.
freeVarsSummary summariser@MkSummariseFVs
{ emptyDVarSetSummary = emptyDVarSet
, unionFVsSummary = unionFVs
, aFreeVarSummary = aFreeVar
, delBinderFVSummary = delBinderFV
, unionFVssSummary = unionFVss
}
= go
where
mkDVarSet = foldr (\x xs -> aFreeVar x `unionFVs` xs) emptyDVarSet
tyCoVarsOfTypeDSet ty = mkDVarSet (fvVarList (tyCoFVsOfType ty))
dVarTypeTyCoVars v = mkDVarSet (fvVarList (varTypeTyCoFVs v))
tyCoVarsOfCoDSet co = mkDVarSet (fvVarList (tyCoFVsOfCo co))
freeVarsOf (x, _) = x
delBindersFV bs fvs = foldr delBinderFV fvs bs
freeVarsBind = freeVarsBindSummary summariser
go :: CoreExpr -> CoreExprWith a
go (Var v)
| isLocalVar v = (aFreeVar v `unionFVs` ty_fvs `unionFVs` mult_vars, AnnVar v)
| otherwise = (emptyDVarSet, AnnVar v)
where
mult_vars = tyCoVarsOfTypeDSet (varMult v)
ty_fvs = dVarTypeTyCoVars v
-- See Note [The FVAnn invariant]
go (Lit lit) = (emptyDVarSet, AnnLit lit)
go (Lam b body)
= ( b_fvs `unionFVs` (b `delBinderFV` body_fvs)
, AnnLam b body' )
where
body'@(body_fvs, _) = go body
b_ty = idType b
b_fvs = tyCoVarsOfTypeDSet b_ty
-- See Note [The FVAnn invariant]
go (App fun arg)
= ( freeVarsOf fun' `unionFVs` freeVarsOf arg'
, AnnApp fun' arg' )
where
fun' = go fun
arg' = go arg
go (Case scrut bndr ty alts)
= ( (bndr `delBinderFV` alts_fvs)
`unionFVs` freeVarsOf scrut2
`unionFVs` tyCoVarsOfTypeDSet ty
-- Don't need to look at (idType bndr)
-- because that's redundant with scrut
, AnnCase scrut2 bndr ty alts2 )
where
scrut2 = go scrut
(alts_fvs_s, alts2) = mapAndUnzip fv_alt alts
alts_fvs = unionFVss alts_fvs_s
fv_alt (Alt con args rhs) = (delBindersFV args (freeVarsOf rhs2),
(AnnAlt con args rhs2))
where
rhs2 = go rhs
go (Let bind body)
= (bind_fvs, AnnLet bind2 body2)
where
(bind2, bind_fvs) = freeVarsBind bind (freeVarsOf body2)
body2 = go body
go (Cast expr co)
= ( freeVarsOf expr2 `unionFVs` cfvs
, AnnCast expr2 (cfvs, co) )
where
expr2 = go expr
cfvs = tyCoVarsOfCoDSet co
go (Tick tickish expr)
= ( tickishFVs tickish `unionFVs` freeVarsOf expr2
, AnnTick tickish expr2 )
where
expr2 = go expr
tickishFVs (Breakpoint _ _ ids _) = mkDVarSet ids
tickishFVs _ = emptyDVarSet
go (Type ty) = (tyCoVarsOfTypeDSet ty, AnnType ty)
go (Coercion co) = (tyCoVarsOfCoDSet co, AnnCoercion co)
My inner perfectionist is telling me to investigate the algorithm more deeply. Eliminating top-level variables from the free variable sets is a nice special case to solve, but I think we should be able to solve the whole problem more generally.
Theoretically, I believe it should be possible to reuse work from lower layers in the expression tree so that we only really have to consider the new free variables that are introduced in each layer (i.e. the difference between the free variable set of the current expression and the free variable sets of its subexpressions).
But maybe the payoffs are not worth the work that such an approach would require.
Yeah, that helps a lot.
The remaining bottleneck seems to be destLevel
and maxIn
:
COST CENTRE MODULE SRC %time %alloc
$wdestLevel GHC.Core.Opt.SetLevels compiler/GHC/Core/Opt/SetLevels.hs:1444:1-9 26.5 0.0
$wmaxIn GHC.Core.Opt.SetLevels compiler/GHC/Core/Opt/SetLevels.hs:1652:1-5 14.3 35.2
If I change that abstractVars
call to just always return an empty list then compilation speeds up a lot, but I still see the quadratic time with these main culprits:
COST CENTRE MODULE SRC %time %alloc
lvlMFE GHC.Core.Opt.SetLevels <no location info> 30.8 0.2
$wdestLevel GHC.Core.Opt.SetLevels compiler/GHC/Core/Opt/SetLevels.hs:1444:1-9 23.8 0.0
$wmaxIn GHC.Core.Opt.SetLevels compiler/GHC/Core/Opt/SetLevels.hs:1652:1-5 19.5 68.5
Ah, so what I think happens is that the first float out pass floats out all the numbers in the list. Then in the second float all the nested expressions have a large set of free variables. And this abstractVars
function does some linear amount of work in the number of free variables at every level of the expression.
Profiling GHC points me to quite a clear culprit:
individual inherited
COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
lvlMFE GHC.Core.Opt.SetLevels <no location info> 3938 30008 6.0 0.1 69.3 85.2
$wmaxIn GHC.Core.Opt.SetLevels compiler/GHC/Core/Opt/SetLevels.hs:1652:1-5 3943 9030006 1.2 2.6 1.2 2.6
$w$wgo6 GHC.Core.Opt.SetLevels <no location info> 3952 21006 0.0 0.0 0.0 0.0
go1 GHC.Core.Opt.SetLevels compiler/GHC/Core/Opt/SetLevels.hs:1049:5-6 3945 18008 0.0 0.0 0.0 0.0
$wabstractVars GHC.Core.Opt.SetLevels compiler/GHC/Core/Opt/SetLevels.hs:1684:1-12 3946 18004 55.7 80.0 55.8 80.0
So more than half the time is spent in the abstractVars
call in lvlMFE
.