Skip to content

E-Graphs for representation of normalised refinement types in the pattern-match checker

I realised today that the quite complicated data structure+invariants (Nabla's TmState) we use while pattern-match coverage checking fits "e-graphs" quite well.

I realised so while reading the POPL 2021 paper egg: Fast and Extensible Equality Saturation. The similarity is striking. I think the inhabitation test could probably be expressed as an e-graph analysis.

To be a bit more concrete, an e-graph is basically a union-find data structure on terms. If we start with the disjoint equality classes {Just x}, {Just y}, {x}, {y} and later find out that x ~ y, we'd merge the equality classes {x} and {y}. The e-graph data structure then restores congruence, putting Just x and Just y into the same equality class as well: {Just x, Just y}, {x, y}. Note how this also puts the "CSE" pass envisioned in #17208 on firm ground. There we have

safeLast3 :: [a] -> Maybe a
safeLast3 xs
  | []    <- reverse xs = Nothing
safeLast3 xs'
  | (x:_) <- reverse xs' = Just x

To see that the match is exhaustive, we have to see that reverse xs ~ reverse xs' from xs ~ xs'. This is exactly the kind of congruence that e-graphs maintain: Replace Just by reverse in the example above. (It does so via a technique called "upward merge" that we'd have to implement in an ad-hoc manner to fix #17208).

The e-graph representation is quite compact and trivially supports the ∆(x) lookup from the paper (it's just the find operation), and extends it to expressions, ∆(e), via canonicalize.

While I think this is an important and useful observation, I'm not so keen about implementing it; it's a bit unclear to me if we can retain the same (mostly sufficient) efficiency that our checker has today, and #17208 alone is not compelling enough of a use case for me to pursue.

Maybe that data structure would be useful in other parts of GHC as well. I imagine that Type Family rewrites fit into the same framework. It was conceived for theorem provers, after all. Maybe it's also a useful representation for the Simplifier, although implementing it would be a huge change with possibly daunting repercussions. Still, a good data structure to have up one's sleeve.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information