Non-existant scoping semantics for IMP-RULEs makes Simplifier build ill-scoped Core programs
Motivation
While making CoreLint more picky in scope-checking the telescope of top-level bindings in Core programs when working on #19516, I noticed that we indeed generate ill-scoped programs after Simplification (#19516 (comment 337567)). Basically, we can end up in the situation where we have
(Non)Rec { foo = ... $sf ... }
NonRec { $sf = ... }
@simonpj said this was due to Note [Glomming]
and how it affects the Simplifier. The gist is this:
module A (f) where
f :: C a => ...
f = ...
{-# INLINEABLE f #-}
g :: C a => ...
g = /\a. ... f @a ...
{-# INLINEABLE g #-}
module B where
import A
foo = ... A.g @Int ...
...
$sf = ...
RULE: "spec" A.f @Int = $sf
RULE "spec" is an IMP-RULE: A rewrite rule in the current module B
with an imported function A.f
in its head.
OccAnal has trouble to track the dependency edges foo -(1)-> A.g -(2)-> A.f -(3)-> $sf
here. It is well-equipped to track (1) and (3), but (2) is lost after finishing compilation of A. It could be recovered if we made OccAnal look at unfoldings of exported functions, but that seems quite complicated.
The Simplifier, on the other hand, will happily inline A.g
into foo
, thus making (2) visible in B
. The Simplifier will immediately rewrite with "spec" to $sf
and we get
module B where
import A
NonRec { foo = ... $sf ... }
...
NonRec { $sf = ... }
RULE: "spec" A.f @Int = $sf
Which is ill-scoped. The next run of OccAnal will see the weird scoping and "glomm" all top-level bindings into a big Rec, thus restoring well-scopedness. But that doesn't help if there is no more run of OccAnal after the Simplifier.
Proposal
The bug is caused by
- OccAnal not seeing edge (2) because it doesn't look into the unfolding of imports
- The Simplifier immediately rewriting with "spec" after having inlined
A.g
Idea: Fix (1)
Rectifying (1) by making OccAnal look into unfoldings sounds unpractical. On the plus side, I believe it would also fix the need for "glomming", because we won't suddenly discover another back edge wrt. condensation graph. We could pre-compute the free vars of an imported unfolding and persist it in the interface file. Still, the overhead can probably grow above and beyond in some cases.
I'd rather not try it; glomming isn't so bad.
Idea: Fix (2)
Plan A
The challenge is: How to decide whether "spec" is active or not? My idea: Give IMP-RULEs scoping semantics like every other top-level binding (type CoreProgram = [Either CoreBind ImpRule]
). In the example above, "spec" should only be active in bindings "further down". Because "spec" appears after foo
, the Simplifier will be unable to rewrite with "spec" after inlining A.g
. After the first run of the Simplifier, we'll get
module B where
import A
NonRec {foo = ... A.f @Int ...}
...
NonRec {$sf = ...}
RULE: "spec" A.f @Int = $sf
The next run of OccAnal sees foo -(1)-> A.f -(3)-> $sf
with the back/cross edge (3). It reports "glomming", re-analyses the program (stretch-goal optimisation: take a sharp look at which SCCs might be affected by the back edge and only re-analyse those) and arrives at
module B where
import A
NonRec {$sf = ...}
RULE: "spec" A.f @Int = $sf
...
NonRec {foo = ... A.f @Int ...}
All without ever creating an ill-scoped Core program in the process.
Unbreaking scoping and simplifying a lot of code are good arguments to pursue this approach.
Plan B
Simon had some concerns and outlined an alternative plan that controls whether an IMP-RULE is active: By making the Simplifier test if all the free-variables of the RHS of the IMP-RULE are in scope.
While it's arguably much simpler than worrying about IMP-RULEs as top-level entities, it needs OccAnal to compute the free vars of the RULE. We were worried that it involves taking the transitive closure:
RULE "f" A.f (p,q) = A.g p
RULE "g" A.g _ = r
E.g., fvs("f") = {p,r}
, not just {p}
. But I don't think that's necessary! Suppose only p
was in scope, but r
was not and we had an expression A.f (p,q)
. Then the Simplifier is free to rewrite with "f", but may not rewrite with "g". Pretty simple, and no transitive closure necessary. Nice!
Note that feasibility of this plan hinges on there being not only no shadowing, but also that top-level names are unique. Otherwise it's easy to trick the Simplifier in believing that e.g. r
is in scope just because there is a local variable with the same name. I don't know if that is an assumption we can make? e.g.
p = ...
q = ...
foo = let r = ... in
A.f (p, q) -- DO NOT REWRITE WITH "g"!
r = ...
RULE "f" A.f (p,q) = A.g p
RULE "g" A.g _ = r
Note that "g" means the r
that is visible at the declaration site of the RULE (wherever that is?!). Very tricky.
(I wonder if we can't apply the same idea to RULEs in general.)
Summary
So while Plan A is the nicer model in which we should think about IMP-RULEs as bindings, it also pollutes the CoreProgram
synonym with the cruft of a seldomly interesting data type. Plan B is a pretty simple implementation after that model that works without the overhead of extending Core etc., provided we get the scoping part right...