E-Graphs for representation of normalised refinement types in the pattern-match checker
I realised today that the quite complicated data structure+invariants (Nabla
's TmState
) we use while pattern-match coverage checking fits "e-graphs" quite well.
I realised so while reading the POPL 2021 paper egg
: Fast and Extensible Equality Saturation. The similarity is striking. I think the inhabitation test could probably be expressed as an e-graph analysis.
To be a bit more concrete, an e-graph is basically a union-find data structure on terms. If we start with the disjoint equality classes {Just x}, {Just y}, {x}, {y}
and later find out that x ~ y
, we'd merge
the equality classes {x}
and {y}
. The e-graph data structure then restores congruence, putting Just x
and Just y
into the same equality class as well: {Just x, Just y}, {x, y}
. Note how this also puts the "CSE" pass envisioned in #17208 on firm ground. There we have
safeLast3 :: [a] -> Maybe a
safeLast3 xs
| [] <- reverse xs = Nothing
safeLast3 xs'
| (x:_) <- reverse xs' = Just x
To see that the match is exhaustive, we have to see that reverse xs ~ reverse xs'
from xs ~ xs'
. This is exactly the kind of congruence that e-graphs maintain: Replace Just
by reverse
in the example above. (It does so via a technique called "upward merge" that we'd have to implement in an ad-hoc manner to fix #17208).
The e-graph representation is quite compact and trivially supports the ∆(x) lookup from the paper (it's just the find
operation), and extends it to expressions, ∆(e), via canonicalize
.
While I think this is an important and useful observation, I'm not so keen about implementing it; it's a bit unclear to me if we can retain the same (mostly sufficient) efficiency that our checker has today, and #17208 alone is not compelling enough of a use case for me to pursue.
Maybe that data structure would be useful in other parts of GHC as well. I imagine that Type Family rewrites fit into the same framework. It was conceived for theorem provers, after all. Maybe it's also a useful representation for the Simplifier, although implementing it would be a huge change with possibly daunting repercussions. Still, a good data structure to have up one's sleeve.