## E-Graphs for representation of normalised refinement types in the pattern-match checker

I realised today that the quite complicated data structure+invariants (`Nabla`

's `TmState`

) we use while pattern-match coverage checking fits "e-graphs" quite well.

I realised so while reading the POPL 2021 paper * egg: Fast and Extensible Equality Saturation*. The similarity is striking. I think the inhabitation test could probably be expressed as an e-graph analysis.

To be a bit more concrete, an e-graph is basically a union-find data structure on terms. If we start with the disjoint equality classes `{Just x}, {Just y}, {x}, {y}`

and later find out that `x ~ y`

, we'd `merge`

the equality classes `{x}`

and `{y}`

. The e-graph data structure then restores congruence, putting `Just x`

and `Just y`

into the same equality class as well: `{Just x, Just y}, {x, y}`

. Note how this also puts the "CSE" pass envisioned in #17208 on firm ground. There we have

```
safeLast3 :: [a] -> Maybe a
safeLast3 xs
| [] <- reverse xs = Nothing
safeLast3 xs'
| (x:_) <- reverse xs' = Just x
```

To see that the match is exhaustive, we have to see that `reverse xs ~ reverse xs'`

from `xs ~ xs'`

. This is exactly the kind of congruence that e-graphs maintain: Replace `Just`

by `reverse`

in the example above. (It does so via a technique called "upward merge" that we'd have to implement in an ad-hoc manner to fix #17208).

The e-graph representation is quite compact and trivially supports the ∆(x) lookup from the paper (it's just the `find`

operation), and extends it to expressions, ∆(e), via `canonicalize`

.

While I think this is an important and useful observation, I'm not so keen about implementing it; it's a bit unclear to me if we can retain the same (mostly sufficient) efficiency that our checker has today, and #17208 alone is not compelling enough of a use case for me to pursue.

Maybe that data structure would be useful in other parts of GHC as well. I imagine that Type Family rewrites fit into the same framework. It was conceived for theorem provers, after all. Maybe it's also a useful representation for the Simplifier, although implementing it would be a huge change with possibly daunting repercussions. Still, a good data structure to have up one's sleeve.