Allow rules to fire on workers / WW can break rule matching.

mentioned in issue #20359 (closed)

This seems like the standard sort of inlining vs rules issue, is there anything special about WW apart from the fact we perform it ourselves? Perhaps you could you argument extending to any inlining decision and rewriting rules based on inlinings.

This is the kind of area that equality saturation is supposed to solve.

I agree. It's the same (unsatifying) answer as always: If you want your function to take part in rewrite rules (constant folding, list fusion), then mark it INLINE or INLINEABLE. Otherwise, all bets are off.

Edit: Also agreed on equality saturation, but that's still quite a hard thing to do efficiently. Also it's not completely clear what the fitness function is that enables you to pick the best candidate.

This seems like the standard sort of inlining vs rules issue, is there anything special about WW apart from the fact we perform it ourselves? Perhaps you could you argument extending to any inlining decision and rewriting rules based on inlinings.

Yes there is. We apply WW to functions marked NOINLINE so their rules can trivially break if:

We WW the function foo (which is marked NOINLINE)
We inline the wrapper into bar giving bar = ... $wfoo .... (bar not having any pragma)
We now inline bar into a context which would make a rule for foo trigger. But we only see the application of $wfoo so it doesn't.

mentioned in issue #20360 (closed)

As sebastian alluded to this can be solved by giving every function in which foo is used a INLINEABLE or a INLINE[n] pragma. For cases where the function we apply WW on is marked NOINLINE this is a pretty steep price to pay.

But not sure if fully implementing something like this would be fast enough in regards to compile time to be a real alternative. Someone would have to try.

It's the same (unsatifying) answer as always: If you want your function to take part in rewrite rules (constant folding, list fusion), then mark it INLINE or INLINEABLE. Otherwise, all bets are off.

This is true, but (a) it's nothing to do with w/w and (b) it's hard to see what else we could possibly do. In the Description, if you don't put a NOINLINE on plus you have no reason to suppose that GHC won't inline plus early (quite independently of w/w). If you want a RULE to fire, add a NOINLINE pragma.

Returning to w/w.

If there is no INLINE/NOINLINE pragma, the, for the reasons above, the the programmer has no reason to expect that the function will not be inlined, and has no reason to expect that RULEs involving that function will fire.
If there is a NOINLINE pragma, then GHC honours it, even through w/w.
If there is an INLINE pragma, GHC doesn't do w/w

See

Note [Worker/wrapper for INLINABLE functions]
Note [Worker/wrapper for NOINLINE functions]
Note [Wrapper activation]

I'm reasonably content with the status quo. Maybe I'm missing something.

changed the description

This is true, but (a) it's nothing to do with w/w and (b) it's hard to see what else we could possibly do. In the Description, if you don't put a NOINLINE on plus you have no reason to suppose that GHC won't inline plus early (quite independently of w/w). If you want a RULE to fire, add a NOINLINE pragma.

All that I wrote also happens for NOINLINE functions. Sorry if that was unclear.

If there is a NOINLINE pragma, then GHC honours it, even through w/w.

I'm unsure what exactly you mean here. We "honour" it in the sense that we don't inline the resulting worker. We still inline the wrapper breaking rules as a consequence.

You can try it yourself plus does indeed get a INLINE[final] unfolding.

Step1: We WW the function plus (which is marked NOINLINE)


module A where

{-# NOINLINE plus #-}
plus :: (Int, Int) -> Int
plus (x,y) = x + y :: Int

--This produces a wrapper that will get inlined:

-- RHS size: {terms: 17, types: 13, coercions: 0, joins: 0/0}
plus [InlPrag=[final]] :: (Int, Int) -> Int
[GblId,
 Arity=1,
 Str=<1P(1P(L),1P(L))>,
 Cpr=1,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
         Tmpl= \ (ds_sER [Occ=Once1!] :: (Int, Int)) ->
                 case ds_sER of { (ww_sET [Occ=Once1!], ww1_sEX [Occ=Once1!]) ->
                 case ww_sET of { GHC.Types.I# ww2_sEV [Occ=Once1] ->
                 case ww1_sEX of { GHC.Types.I# ww3_sEZ [Occ=Once1] ->
                 case A.$wplus ww2_sEV ww3_sEZ of ww4_sF5 [Occ=Once1] { __DEFAULT ->
                 GHC.Types.I# ww4_sF5
                 }
                 }
                 }
                 }}]
...

Step2: We inline the wrapper into a function bar giving bar = ... $wfoo .... (bar not having any pragma)

module B where

import A

bar :: (Int, Int) -> Int
bar (x,y)= plus (x,y+1)

-------------

-- RHS size: {terms: 19, types: 13, coercions: 0, joins: 0/0}
bar :: (Int, Int) -> Int
[GblId,
 Arity=1,
 Str=<1P(1P(L),1P(L))>,
 Cpr=1,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
         Tmpl= \ (ds_dJg [Occ=Once1!] :: (Int, Int)) ->
                 case ds_dJg of { (x_aIX [Occ=Once1!], y_aIY [Occ=Once1!]) ->
                 case x_aIX of { GHC.Types.I# ww2_iJn [Occ=Once1] ->
                 case y_aIY of { GHC.Types.I# x1_aEQ [Occ=Once1] ->
                 case A.$wplus ww2_iJn (GHC.Prim.+# x1_aEQ 1#)
                 of ww4_iJs [Occ=Once1]
                 { __DEFAULT ->
                 GHC.Types.I# ww4_iJs
                 }
                 }
                 }
                 }}]
bar
  = \ (ds_dJg :: (Int, Int)) -> ...

Step 3:

We have a (important) rule matching on plus, we inline bar into a context where the rule would fire on the wrapper. But it fails to fire because we only ever see the worker:

module C where

import B
import A

f = bar (0,-1) -- bar = plus (x,y+1) so this should end up being `plus (0,0)` and be rewritten to 0.

{-# RULES
"plus/zero" forall. plus(0,0) = 0
#-}

-- The rule fails to fire so we get a function call for essentially plus(0,0)

-- RHS size: {terms: 7, types: 1, coercions: 0, joins: 0/0}
f :: Int
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 50 10}]
f = case A.$wplus 0# 0# of ww4_iL9 { __DEFAULT ->
    GHC.Types.I# ww4_iL9
    }

We have a (important) rule matching on plus, we inline bar into a context where the rule would fire on the wrapper

But if the rule is important, then you are implicitly relying on bar being inlined. But if you don't have an INLINE pragma on bar that is an extremely delicate property.

Moreover, this is nothing to do with w/w. It'll happen even if plus is not w/w'd, but is small, then it may still inline into bar before bar's unfolding is created.

So my answer is: if you think it's important, put an INLINE pragma on bar.

To add to what Simon said,

I'm unsure what exactly you mean here. We "honour" it in the sense that we don't inline the resulting worker. We still inline the wrapper breaking rules as a consequence.

We honour the NOINLINE in WW just fine by allocating a wrapper that inlines very late, in the final phase. At that point all rewrite rules had ample chance to fire. The problems you are observing are cross-module, when we compile module C after the wrappers had been inlined in module B. There's no other way than attaching an INLINE/INLINABLE pragma to bar (which could just as well have been defined in module A here), indicating that you want to compile it "simultaneously" with any client modules.

Anything else is frankly a performance bug in the making. It's unfortunate that we only discover them by doing more WW, but it's not the fault of WW per sé. It's actually pretty interesting, another example for why doing more WW might regress things (such as list fusion).

Maybe we should stabilise unfoldings of exported bindings just before WW? But then why not do it earlier? Not satisfactory at all.

So the bottom line is that for any function that is suitable for W/W all transitive use sites up to where a rule might fire needs to be marked INLINE[ABLE].

The problem of course is that for a user writing code it might not be at all apparent that his function is such a transitive use site that needs to get a pragma. What get's a worker and what doesn't similarly isn't stable.

Overall I don't find this satisfactory. Perhaps #19276 would help. At least then users can make a deliberate choice besides disabling W/W for a whole module if they want all use sites to be matchable by rules.

The problem of course is that for a user writing code it might not be at all apparent that his function is such a transitive use site that needs to get a pragma.

Exactly, I also think that's a huge problem of rewrite rules in general and foldr/build fusion in particular.

It actually has nothing to do with WW; WW is but one program transformation that might change calling convention around a function. Inlining is a much more prominent one (although that is taken care off with one NOINLINE at the definition site), Specialisation (type class or constructor) being another (or maybe we don't specialise NOINLINE functions, not sure). Oh, and delicate rewrite rule frameworks won't work across modules, either (without INLINEABLE) as the following example shows for foldr/build fusion:

module A where
myMap f xs = [ f x | x <- xs ]

module B where
import A
foo :: [Int]
foo = myMap (*2) [1,2,3]
bar :: [Int]
bar = map (*2) [1,2,3]

If you look at the Core for bar you'll see that it will become a constant [2,4,6]. Not so for foo, which inlines myMap (which is SAT'd, so at least we get to specialise for (*2)) and then we have a local go loop that is unable to fuse with the list producer [1,2,3]. That's because fusion is supposed to happen in phase 1 and phase 1 for A happens at a different point in time than phase 1 for B. Crucially, phase 0 rewrites to the local go loop because that's more efficient to execute, but that is unable to fuse in B's phase 1.

As I said: If the user wants rewrite rules to apply to their functions' RHS, they have to mark it INLINABLE. If they don't, then all bets are off. It might work, but often times it doesn't. If you mark myMap with INLINABLE, foo and bar optimise to the same constant (and are even identified by CSE).

Now if WW really was the only pass that changes calling convention, we could try and stabilise all unfoldings prior to WW. Although that's challenging because we still want to inline wrapper unfoldings within the same module. In effect, we'd need multiple different unfoldings per function. And since it really is not the only pass that meddles with calling convention (foldr/build fusion being another), there's no chance at automating the solution. (I don't consider equality saturation a solution because I'm pretty sure it's too slow.)

So the bottom line is that for any function that is suitable for W/W all transitive use sites up to where a rule might fire needs to be marked INLINE[ABLE].

Not quite. If

f calls g, and
g has a rule that you really want to fire at calls of f

then you must give an INLINE pragma for f. And so on transitively. If you don't put that INLINE pragma, you have no reason to be sure that f will inline, and hence no reason to to sure that g's rules will fire for that call of f. I see no way to avoid this, and to me it does not seem unreasonable. It's nothing to do with w/w.

then you must give an INLINE pragma for f. And so on transitively. If you don't put that INLINE pragma, you have no reason to be sure that f will inline, and hence no reason to to sure that g's rules will fire for that call of f. I see no way to avoid this, and to me it does not seem unreasonable. It's nothing to do with w/w.

So in general I agree with the fact that if you want to guarantee good rule behaviour this is required. And expecting different is on the users. I think that is something that's very clear.

However programmers being humans seem to often write a "trivial" function f and just rely on it being inlined via heuristics and g not being W/Wed. We do so in base at various places.

If f stops being inlined it breaks. And if g gets W/Wed it breaks.

So what W/W has to do with this issue is that code depending on rules that worked fine without W/W breaks when W/W fires on a specific function. Perhaps that's expected but that's what this all has to do with W/W.

I didn't expect that sort of breakage, and even in hindsight I still think this kind of behavior is a bit surprising.

Now from there forward there are two options:

We say users should learn to expect this, here is how you avoid it, and close this ticket. It's the price to pay if you (or your libraries) depend on rules.
I don't think this is great. Mostly because it means for every imported function that has any inline related pragma (including NOINLINE) I have to check if it has associated rules. But maybe we accept that.

Something as innocent as asMyInt x = fromIntegral x :: MyInt then requires the right INLINE pragma. I find this pretty frustrating.

Or we consider this behaviour as undesireable. This doesn't mean we give guarantees, but that we least consider changes that would make it "better". If we can do better without making compiler complexity/compile times suffer is another question though.

added Tfeature request rules runtime perf worker/wrapper transformation labels and removed needs triage label

marked this issue as related to #19276

@bgamari mentioned that we could also generate rules that fire on the worker rather than trying to fit wrapper-based rules on occurences of the worker.

Having thought about it a bit but I don't think this is as simple a solution either. For a rule mentioning N bindings with a worker, we would need to generate 2^N rules to cover all wrapper/worker combinations.

marked this issue as related to #8354

mentioned in issue #8354

mentioned in merge request !5614 (merged)

Allow rules to fire on workers / WW can break rule matching.

The problem

A basic example

This could in theory be solved.

Child items ...

Activity