Idea: Annotating Core to avoid unnecessary traversal of large subexpressions
Very often when working with Core ASTs we need to compute and test against free variable sets. Currently we must compute such free variable sets from scratch, always traversing the entirety of the expression in question. Similarly, substitution (another very common operation) must always traverse the entirety of the expression being substituted into. This seems potentially wasteful: we might rebuild an entire large expression only to find that it did not mention any variables mentioned in the substitution.
This suggests an (potentially hair-brained) idea: Perhaps we can avoid traversing a subexpression entirely by introducing a means of annotating expressions (and types, coercions) with their free variable set. e.g.,
data Expr b
| Var Id
| Lit Literal
| ...
| Coercion Coercion
-- new:
| FVAnn VarSet (Expr b)
-- TODO: It seems wrong for this to be part of `Expr`;
-- perhaps this should rather be a TTG-style extension?
Note that FVAnn
nodes here would have not change program semantics. Rather, they are merely a book-keeping measure which serve as a cheap way to determine whether, e.g., substitution needs to walk into the wrapped subexpression. As such they can be dropped and introduced freely.
There are a few places where these FVAnn
things would be useful:
- When, e.g., substitution encounters an
FVAnn
node it could check whether there is any overlap between the free variable set and the domain of the substitution; if not it can avoid looking at the wrapped subexpression entirely. I suspect that this might in particular benefit the simplifier as it will substitute into the program on every simplifier iteration, even if most of the program has already "converged". - Computing the free variable set of an
FVAnn
node would need to do no further work.
Given that computing and maintaining these lists is expensive, we almost certainly wouldn't want to wrap every every (sub)expression of a program in an FVAnn
. Rather you would want to "sprinkle" such sets around the program occasionally as an opportunistic optimization.
There are many of open questions here:
- how frequently in the AST when would these be introduced (e.g. every /n/ levels of depth)?
- should substituting into an expression preserve
FVAnn
s or drop them? The former of course would come at a cost while the latter would greatly reduce the impact of the optimisation - when during the compilation pipeline do such annotations get added (e.g. when desugaring?)