Consider floating let and single-branch case inwards when matching rules

Motivation

I'm trying to implement stream fusion for concatMap using the new higher order patterns in rewrite rules. I'm now running into other limitations of the rule matcher. In particular, let and (single-branch) case expressions can inhibit rules from firing.

As a quick reminder, here's the rewrite rule for concatMap that uses higher order patterns:

{-# RULES
"concatMap" forall next f. concatMap (\x -> Stream next (f x)) = concatMap' next f
#-}

There is already a mechanism in place that tries to float out let expressions when matching (See Note [Matching lets]). However, when local binders are used in the let then floating outwards is not possible. For example if the matcher encounters an expression like this:

concatMap (\x -> let s = ... x ... in Stream next s)

Then the s binding cannot be floated out.

I thought I had a good common example, but I'm not sure any more

One particularly common example is due to strictness of the stream state. For optimization purposes, we want to keep the stream state strict and thus define Stream like this:

data Stream a = forall s. Stream (s -> Step s a) !s

That strict field means there will often be a case expression around the Stream which forces that field, e.g. something like this:

concatMap (\x -> case x of n { I# _ -> Stream next n })

(I should perhaps find a real example and not write this off the top of my head. Please prod me if this example is unclear.)

Lessons from the HERMIT in the stream

This had already been observed in the HERMIT in the stream paper, which aims to solve the same problem via an external rewrite engine. They say:

2b. Float a let inwards. The inner stream will often be wrapped in let bindings, especially if we have unfolded a stream combinator. These bindings will scope over the entire Stream constructor, and may or may not depend on the value of the outer stream, so we cannot reliably float them outwards. We can observe, however, that they will never capture the Stream constructor itself, so we can reliably float them inwards past the constructor.
let b = e in Stream g s =⇒ Stream (let b = e in g) (let b = e in s)
Floating inwards necessarily duplicates let bindings. This loss of sharing could result in duplicated allocation and computation. In practice, it appears rare that a let binding is used in both the generator and the state (the two arguments to the Stream constructor), so at least one will usually be eliminated by the next gentle simplification step, which, according to Algorithm 1, will occur directly after this transformation. In addition to applications, lets are floated into case expressions and lambdas.

And they argue similarly for floating in single-branch case expressions:

2e. Float a case inwards. Strictness annotations result in case expressions with a single, always matching default alternative. The right-hand side of the alternative may refer to the bound result of the evaluated scrutinee. For the same reasons we cannot always float lets outward, our only option is to eliminate the case or float it inwards. We know that we cannot eliminate it wholesale, or step 2c would have done so. Eliminating it involves either changing the case to a let, which would make the program more lazy, or substituting the scrutinee into the right-hand side of the alternative, which could duplicate work. We instead choose to float it inwards.
case e of         =⇒  Stream (case e of v → g)
  v → Stream g s             (case e of v → s)
In most situations, one of these cases is eliminated, avoiding duplicated work, though we have not explicitly quantified how often this happens in practice. Within the context of the overall transformation, strictness remains unaltered. Nominally, if e is ⊥, it is now possible to force the entire expression to the Stream constructor, where before it would have diverged. However, forcing just the Stream constructor without forcing one of its arguments provides no useful information, so only a pathological stream combinator would do so. The other situation which requires this rule is the desugaring of pattern binders for list comprehensions (Section 5.3). The desugaring results in a case with a single alternative to bind the components of the pattern. Again, we float the case inwards, possibly duplicating computation.

Proposal

I propose we implement similar transformations when matching rules. Specifically, whenever we are matching a template against an expression and that expression is a let or a single-branch case (and the template is not a let or case) then

I have a crude prototype which seems to be sufficient for my purposes (!12723):

tryFloatIn :: CoreExpr -> Maybe CoreExpr
tryFloatIn = go emptyVarSet False id where
  go vs _ c (Let bind e) = go (extendVarSetList vs (bindersOf bind)) True (c . Let bind) e
  go vs _ c (Case scrut case_bndr ty [Alt con alt_bndrs rhs]) = go (extendVarSetList vs alt_bndrs) True (c . (\x -> Case scrut case_bndr (exprType x) [Alt con alt_bndrs x])) rhs
  go vs True c (App e1 e2) = App <$> go vs True c e1 <*> pure (c e2)
  go vs True c e@(Var v) | not (v `elemVarSet` vs) = Just e
  go vs True _ e@Type{} = Just e
  go vs True _ e@Lit{} = Just e
  go _ _ _ _ = Nothing

-- ... and add a new penultimate case to the `match` function:

-- try floating in let and (single-branch) case expressions
match renv subst e1 e2 mco
  | Just e2' <- tryFloatIn e2
  = match renv subst e1 e2' mco

I think the main open question is: under which circumstances should we do this floating?

Should we always apply them?
Should programmers get the ability to control which rules are allowed to do this floating?
Should we implement heuristics to determine when this floating in is not very harmful?

Edited May 24, 2024 by Jaro Reinders

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information