Non-existant scoping semantics for IMP-RULEs makes Simplifier build ill-scoped Core programs

Motivation

While making CoreLint more picky in scope-checking the telescope of top-level bindings in Core programs when working on #19516, I noticed that we indeed generate ill-scoped programs after Simplification (#19516 (comment 337567)). Basically, we can end up in the situation where we have

(Non)Rec { foo = ... $sf ... }
NonRec   { $sf = ... }

@simonpj said this was due to Note [Glomming] and how it affects the Simplifier. The gist is this:

module A (f) where
  f :: C a => ...
  f = ...
  {-# INLINEABLE f #-}
  g :: C a => ...
  g = /\a. ... f @a ...
  {-# INLINEABLE g #-} 
module B where
  import A
  foo = ... A.g @Int ...
  ...
  $sf = ...
  RULE: "spec" A.f @Int = $sf

RULE "spec" is an IMP-RULE: A rewrite rule in the current module B with an imported function A.f in its head.

OccAnal has trouble to track the dependency edges foo -(1)-> A.g -(2)-> A.f -(3)-> $sf here. It is well-equipped to track (1) and (3), but (2) is lost after finishing compilation of A. It could be recovered if we made OccAnal look at unfoldings of exported functions, but that seems quite complicated.

The Simplifier, on the other hand, will happily inline A.g into foo, thus making (2) visible in B. The Simplifier will immediately rewrite with "spec" to $sf and we get

module B where
  import A
  NonRec { foo = ... $sf ... }
  ...
  NonRec { $sf = ... }
  RULE: "spec" A.f @Int = $sf

Which is ill-scoped. The next run of OccAnal will see the weird scoping and "glomm" all top-level bindings into a big Rec, thus restoring well-scopedness. But that doesn't help if there is no more run of OccAnal after the Simplifier.

Proposal

The bug is caused by

OccAnal not seeing edge (2) because it doesn't look into the unfolding of imports
The Simplifier immediately rewriting with "spec" after having inlined A.g

Idea: Fix (1)

Rectifying (1) by making OccAnal look into unfoldings sounds unpractical. On the plus side, I believe it would also fix the need for "glomming", because we won't suddenly discover another back edge wrt. condensation graph. We could pre-compute the free vars of an imported unfolding and persist it in the interface file. Still, the overhead can probably grow above and beyond in some cases.

I'd rather not try it; glomming isn't so bad.

Idea: Fix (2)

Plan A

The challenge is: How to decide whether "spec" is active or not? My idea: Give IMP-RULEs scoping semantics like every other top-level binding (type CoreProgram = [Either CoreBind ImpRule]). In the example above, "spec" should only be active in bindings "further down". Because "spec" appears after foo, the Simplifier will be unable to rewrite with "spec" after inlining A.g. After the first run of the Simplifier, we'll get

module B where
  import A
  NonRec {foo = ... A.f @Int ...}
  ...
  NonRec {$sf = ...}
  RULE: "spec" A.f @Int = $sf

The next run of OccAnal sees foo -(1)-> A.f -(3)-> $sf with the back/cross edge (3). It reports "glomming", re-analyses the program (stretch-goal optimisation: take a sharp look at which SCCs might be affected by the back edge and only re-analyse those) and arrives at

module B where
  import A
  NonRec {$sf = ...}
  RULE: "spec" A.f @Int = $sf
  ...
  NonRec {foo = ... A.f @Int ...}

All without ever creating an ill-scoped Core program in the process.

Unbreaking scoping and simplifying a lot of code are good arguments to pursue this approach.

Plan B

Simon had some concerns and outlined an alternative plan that controls whether an IMP-RULE is active: By making the Simplifier test if all the free-variables of the RHS of the IMP-RULE are in scope.

While it's arguably much simpler than worrying about IMP-RULEs as top-level entities, it needs OccAnal to compute the free vars of the RULE. We were worried that it involves taking the transitive closure:

RULE "f" A.f (p,q) = A.g p
RULE "g" A.g _     = r

E.g., fvs("f") = {p,r}, not just {p}. But I don't think that's necessary! Suppose only p was in scope, but r was not and we had an expression A.f (p,q). Then the Simplifier is free to rewrite with "f", but may not rewrite with "g". Pretty simple, and no transitive closure necessary. Nice!

Note that feasibility of this plan hinges on there being not only no shadowing, but also that top-level names are unique. Otherwise it's easy to trick the Simplifier in believing that e.g. r is in scope just because there is a local variable with the same name. I don't know if that is an assumption we can make? e.g.

p = ...
q = ...
foo = let r = ... in
      A.f (p, q) -- DO NOT REWRITE WITH "g"!
r = ...
RULE "f" A.f (p,q) = A.g p
RULE "g" A.g _     = r

Note that "g" means the r that is visible at the declaration site of the RULE (wherever that is?!). Very tricky.

(I wonder if we can't apply the same idea to RULEs in general.)

Summary

So while Plan A is the nicer model in which we should think about IMP-RULEs as bindings, it also pollutes the CoreProgram synonym with the cruft of a seldomly interesting data type. Plan B is a pretty simple implementation after that model that works without the overhead of extending Core etc., provided we get the scoping part right...

Edited Mar 12, 2021 by Sebastian Graf

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information