# GHC issues
https://gitlab.haskell.org/ghc/ghc/-/issues

# Issue #24229: Order-sensitivity in SpecConstr
2023-12-21 · Simon Peyton Jones · https://gitlab.haskell.org/ghc/ghc/-/issues/24229

Consider this code
```haskell
foo :: Int -> (a,a) -> Maybe (a,a)
foo 0 p = Just p
foo n (x,y) = foo (n-1) (y,x)
wombat1 = foo 20 ("yes", "no")
wombat2 xs ys = foo 3 (xs, ys)
```
Here `foo` is lazy in its second argument, but it does decompose it, so SpecConstr
should catch it. We have two calls, one of which (in `wombat2`) is polymorphic.
Ideally we would like to see
```
RULE forall a. forall n::Int, x::a, y::a. foo n (x,y) = $sfoo n x y
```
which fires on all three calls to `foo`. And that is what happens *if the declaration of
`wombat2` appears before `wombat1`*. But as written, we get this specialisation (only):
```
RULES: "SC:$wfoo0" [2]
forall (sc_sVo :: [Char])
(sc_sVp :: [Char])
(sc_sVn :: GHC.Prim.Int#).
$wfoo_sV6 @String sc_sVn (sc_sVo, sc_sVp)
= $s$wfoo_sVt sc_sVo sc_sVp sc_sVn]
```
and indeed the call in `wombat2` is never specialised. Boo!
## Diagnosis
We discover two calls but in `callsToNewPats` we see
```haskell
-- Remove duplicates
non_dups = nubBy samePat new_pats
```
Moreover `samePat` **treats both calls (one polymorphic and one at type String) as the "same"**.
So the `nubBy` drops one of them, and which is dropped is order-dependent.
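The first-wins behaviour of `nubBy` is easy to demonstrate in isolation. A small standalone example (plain `Data.List`, with a deliberately coarse equivalence standing in for `samePat`):

```haskell
import Data.List (nubBy)

-- A coarse equivalence relation, standing in for samePat treating a
-- polymorphic and a monomorphic pattern as "the same".
sameMod10 :: Int -> Int -> Bool
sameMod10 a b = a `mod` 10 == b `mod` 10

main :: IO ()
main = do
  print (nubBy sameMod10 [12, 2, 3])  -- keeps the first of each class: [12,3]
  print (nubBy sameMod10 [2, 12, 3])  -- reordering changes the survivor: [2,3]
```

So whichever of the two call patterns `callsToNewPats` happens to see first survives, which is exactly the order-sensitivity observed above.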
Why does it treat them the same? Because of `Note [Ignore type
differences]` which points out that we don't want lots of identical
specialisations, differing only in their type. Good point, but the consequences are bad.
## Cure
We need some form of pattern-generality comparison, to make the polymorphic pattern
"beat" the monomorphic one.
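A minimal sketch of such a subsumption check, on a toy pattern type (hypothetical; GHC's real call patterns carry binders, types, and more, and this ignores repeated pattern variables):

```haskell
-- Toy call patterns: a pattern variable matches anything; a
-- constructor application must match the head and all arguments.
data Pat = PVar String | PCon String [Pat]
  deriving (Eq, Show)

-- p `subsumes` q: every call matched by q is also matched by p.
subsumes :: Pat -> Pat -> Bool
subsumes (PVar _) _ = True
subsumes (PCon c ps) (PCon c' qs) =
  c == c' && length ps == length qs && and (zipWith subsumes ps qs)
subsumes _ _ = False

-- Instead of nubBy's arbitrary pick, keep only the most general
-- patterns: drop any pattern strictly subsumed by a *different*
-- pattern in the list (exact duplicates are left for nubBy).
keepMostGeneral :: [Pat] -> [Pat]
keepMostGeneral ps =
  [ p | p <- ps, not (any (\q -> q /= p && q `subsumes` p) ps) ]
```

Here the polymorphic pattern `PCon "(,)" [PVar "x", PVar "y"]` subsumes the `String` instantiation, so it survives regardless of input order.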
This pattern-subsumption approach would then be vulnerable to generating multiple specialisations for calls
```haskell
foo 10 ("foo", "baz") -- Called at String
foo 10 (True, False) -- Called at Bool
```
Ideally we'd like to generalise to a specialisation that works for all types, not just `String` and `Bool`.
And that must be possible, because if the function scrutinises its argument, it can't be
polymorphic in it. Using `foo`'s polymorphic type, we ought to be able to generalise from a single call
```haskell
foo @Int 10 (3,4)
```
to the more general form
```haskell
foo @a n (x::a, y::a)
```
That looks do-able, but not trivial.
Meanwhile, a simple subsumption check would avoid discarding the wrong pattern. It risks generating lots of identical specialisations, where we currently arbitrarily pick one; but we have other (crude) ways of throttling excessive specialisation.

# Issue #22902: Optimization opportunity: hidden identity functions
2023-03-01 · Jaro Reinders · https://gitlab.haskell.org/ghc/ghc/-/issues/22902
## Motivation
See [my discourse thread for the background story](https://discourse.haskell.org/t/a-solved-benchmarking-mystery/5738?u=jaror).
I am investigating whether stream fusion could replace foldr/build fusion in base. During the investigation I've encountered bad benchmark results for the streamed `drop` function: it is up to **20x slower** than `Prelude.drop`. I've looked at the Core and found that the streamed `drop` boils down to a function like this:
```haskell
drop :: Int -> [a] -> [a]
drop 0 [] = []
drop 0 (x:xs) = x : drop 0 xs -- [1]
drop n [] = []
drop n (x:xs) = drop (n - 1) xs
```
The main difference with `Prelude.drop` is that this function walks over the remainder of the list that it does not drop (see the recursive call at [1]). That is extremely inefficient. In my benchmarks it is up to 20x slower than `Prelude.drop` because of the extra allocation and subsequent copying of the remainder of the list after dropping.
## Proposal
I think this function could be optimized further. First of all, SpecConstr could split it into two:
```haskell
drop :: Int -> [a] -> [a]
drop 0 [] = []
drop 0 (x:xs) = x : drop0 xs
drop n [] = []
drop n (x:xs) = drop (n - 1) xs
drop0 :: [a] -> [a]
drop0 [] = []
drop0 (x:xs) = x : drop0 xs
```
SpecConstr doesn't do this optimization yet, but #22781 tracks that issue.
Then a special optimization pass could recognize that `drop0` is an identity function and optimize it further to:
```haskell
drop :: Int -> [a] -> [a]
drop 0 [] = []
drop 0 (x:xs) = x : drop0 xs
drop n [] = []
drop n (x:xs) = drop (n - 1) xs
drop0 :: [a] -> [a]
drop0 xs = xs
```
This optimization could simply look for the pattern:
```
<f> xs =
case xs of
[] -> []
x : xs -> x : <f> xs
```
in Core, and replace that with `<f> xs = xs`.
It also shouldn't be hard to generalize that for any ADT.
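On a toy Core-like AST (entirely hypothetical, much simpler than GHC's Core) the check might look like this; the recursive call is coinductively assumed to be the identity, which is sound for this syntactic pattern:

```haskell
-- A tiny Core-like fragment: variables, saturated constructor
-- applications, and calls of a named function applied to a variable.
data Expr = Var String | Con String [Expr] | Call String String

-- One case alternative: constructor name, field binders, right-hand side.
data Alt = Alt String [String] Expr

-- An alternative is identity-like if its rhs rebuilds the scrutinised
-- constructor, with each field either returned unchanged or passed
-- through a recursive call to f (which we assume is the identity).
altIsIdentity :: String -> Alt -> Bool
altIsIdentity f (Alt con binders (Con con' args)) =
  con == con' && length binders == length args
    && and (zipWith ok binders args)
  where
    ok b (Var v)    = v == b
    ok b (Call g v) = g == f && v == b
    ok _ _          = False
altIsIdentity _ _ = False

-- f is a hidden identity if every alternative is identity-like.
isIdentity :: String -> [Alt] -> Bool
isIdentity f = all (altIsIdentity f)
```

For the `drop0` example, `isIdentity "drop0" [Alt "[]" [] (Con "[]" []), Alt ":" ["x","xs"] (Con ":" [Var "x", Call "drop0" "xs"])]` holds, while any alternative that shuffles or drops a field fails the check.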
Then finally the result of that optimization already simplifies to (modulo unboxing):
```haskell
drop :: Int -> [a] -> [a]
drop 0 xs = xs
drop n [] = []
drop n (x:xs) = drop (n - 1) xs
```
Which is what we want.

# Issue #22787: SpecConstr and Specialise should be combined into a single pass
2024-02-27 · Andreas Klebinger · https://gitlab.haskell.org/ghc/ghc/-/issues/22787

Structurally they do very much the same thing:
* Traverse the AST looking for function calls we can specialize for.
* Specialize the RHS of the called function.
* Look at the specialized RHS for additional calls to specialize.
The main difference is that one specializes for dictionary arguments and the other for value arguments. But to me it seems odd to separate these two concerns so strictly.
I (perhaps naively) assume we could get compile time and maintenance benefits from this in the long run.
* We no longer need to traverse the AST twice.
* We would produce less code in some situations. Currently if we specialize `elem @(Eq Int) (I# 1#) xs` we will first specialise for the dictionary, and then do *another* specialization for the term argument later on, when we could do both in one go.
* We can avoid re-implementing cross-cutting concerns like https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8666 twice for both passes.
That being said, such a refactor seems like no small feat, and I have no plans to tackle it in the near future. But it seems to me like it would be the right thing to do.

# Issue #22786: SpecConstr should fire rules and run the simple optimizer much in the same way we do for type class specialization
2023-01-31 · Andreas Klebinger · https://gitlab.haskell.org/ghc/ghc/-/issues/22786

Consider this (somewhat silly) example:
```haskell
{-# OPTIONS_GHC -fspec-constr-keen -fspec-constr-count=99 -fspec-constr-threshold=90000 -fspec-constr-recursive=500 #-}
{-# LANGUAGE MagicHash #-}
module M(baz) where
import GHC.Exts
import Data.Coerce
{-# NOINLINE baz #-}
baz = I# (goz 0# 4#)
where
goz :: Int# -> Int# -> Int#
goz 0# 1# = 1#
goz 0# x = goz 0# (f x)
goz _ (0#) = 3#
goz n x = 6#
f x = case x of
_
| isTrue# (x ># 20#) -> flarge (x -# -20#)
| otherwise -> flarge (x +# 20#)
where
flarge 1# = 2#
flarge 2# = 3#
flarge 3# = 4#
flarge 4# = 5#
flarge 5# = 6#
flarge 6# = 7#
flarge 7# = 8#
flarge 8# = 9#
flarge 9# = 10#
flarge 10# = 11#
flarge 11# = 12#
flarge 12# = 13#
flarge 13# = 14#
flarge 14# = 15#
flarge 15# = 16#
flarge x = x
```
This will eventually reduce to a single number. Naively (with the given spec-constr flags) this function should optimize down to a statically known number at *compile time*.
One blocker for this to happen is https://gitlab.haskell.org/ghc/ghc/-/issues/22781 which I have a fix for already. Applying this fix we run into another issue.
The problem is we specialize `goz` for `goz 0# 4#`.
This gives us the following specialized RHS:
```
-- RHS size: {terms: 83, types: 6, coercions: 0, joins: 1/1}
$sgoz_szB :: (# #) -> Int#
[LclId[StrictWorker([])], Arity=1, Str=<L>]
$sgoz_szB
= \ (void_0E :: (# #)) ->
join {
flarge_szr :: Int# -> Int#
[LclId[JoinId(1)(Nothing)], Arity=1, Str=<SL>]
flarge_szr (eta_B1 [Dmd=SL, OS=OneShot] :: Int#)
= case eta_B1 of ds_X2 {
__DEFAULT -> goz_aiQ 0# ds_X2;
1# -> goz_aiQ 0# 2#;
2# -> goz_aiQ 0# 3#;
3# -> goz_aiQ 0# 4#;
4# -> goz_aiQ 0# 5#;
5# -> goz_aiQ 0# 6#;
6# -> goz_aiQ 0# 7#;
7# -> goz_aiQ 0# 8#;
8# -> goz_aiQ 0# 9#;
9# -> goz_aiQ 0# 10#;
10# -> goz_aiQ 0# 11#;
11# -> goz_aiQ 0# 12#;
12# -> goz_aiQ 0# 13#;
13# -> goz_aiQ 0# 14#;
14# -> goz_aiQ 0# 15#;
15# -> goz_aiQ 0# 16#
} } in
case ># 4# 20# of {
__DEFAULT -> jump flarge_szr (+# 4# 20#);
1# -> jump flarge_szr (-# 4# -20#)
}
```
But that's pretty silly, as it will incur specializations for `goz_aiQ 0# 1#`, `goz_aiQ 0# 2#`, and so on.
If we were to run the simple optimizer on the specialized rhs instead I imagine we would do a lot better.
The case should constant fold away:
```haskell
case ># 4# 20# of {
__DEFAULT -> jump flarge_szr (+# 4# 20#);
1# -> jump flarge_szr (-# 4# -20#)
=>
jump flarge_szr (+# 4# 20#);
```
With now just a single occurrence of the join point it should inline, allowing more case-of-case and resulting in just a constant integer.
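The kind of simplification wanted here is ordinary constant folding plus case-of-known-literal. A toy version over a made-up mini-language (nothing here is GHC's actual simple optimizer):

```haskell
-- A made-up unboxed-int mini-language: literals, two primops, and a
-- case with literal alternatives plus a default branch.
data E = Lit Int | Add E E | Gt E E | Case E [(Int, E)] E
  deriving (Eq, Show)

fold :: E -> E
fold (Lit n) = Lit n
fold (Add a b) = case (fold a, fold b) of
  (Lit x, Lit y) -> Lit (x + y)
  (a', b')       -> Add a' b'
fold (Gt a b) = case (fold a, fold b) of
  (Lit x, Lit y) -> Lit (if x > y then 1 else 0)
  (a', b')       -> Gt a' b'
fold (Case s alts dflt) = case fold s of
  Lit n -> fold (maybe dflt id (lookup n alts))  -- case-of-known-literal
  s'    -> Case s' [(k, fold e) | (k, e) <- alts] (fold dflt)
```

On a shape like the case above, `Case (Gt (Lit 4) (Lit 20)) [(1, Lit 24)] (Add (Lit 4) (Lit 20))` folds straight through: the scrutinee becomes a known literal, only the default branch survives, and its argument folds to `Lit 24`.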
I think we should just do the same as we do for the type class specializer. As I understand it, there, when we specialize a RHS we run the simple optimizer on it in order to discover additional opportunities for specialization.

# Issue #22781: SpecConstr does not specialise for literal arguments
2023-09-07 · Andreas Klebinger · https://gitlab.haskell.org/ghc/ghc/-/issues/22781

I would expect this code to specialize:
```
{-# NOINLINE bar #-}
bar = goo 0 1
where
goo :: Int -> Int -> Int
goo 0 1 = 1
goo _ (0) = 3
goo n x = goo n (x-1)
```
But after worker-wrapper the `goo` function takes literal unboxed ints, for which SpecConstr currently doesn't create specializations.

Assignee: Andreas Klebinger

# Issue #22657: SpecConstr reboxes inside loops
2023-09-07 · Sebastian Graf · https://gitlab.haskell.org/ghc/ghc/-/issues/22657

Compile the following program with `-O2 -fno-liberate-case`:
```hs
f :: (Int, Int) -> Int -> Int
f _ 0 = 0
f p n = length (go n) + f p (n-1)
where
go 0 = []
go n = case p of (x,y) -> (p,x+y) : go (n-1)
main = print $ f (1,2) 10000
```
For me, it takes about 6.8GB of memory to run the resulting binary. If I also pass `-fno-spec-constr`, it only takes 5.6GB and is a bit faster.
(Disabling liberate-case is material; otherwise the `case` would just be floated out.)
Here's the simplified Core with SpecConstr:
```
Rec {
-- RHS size: {terms: 44, types: 36, coercions: 0, joins: 0/1}
Main.main_$s$wf [Occ=LoopBreaker]
:: Int -> Int -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=3, Str=<L><L><1L>, Unf=OtherCon []]
Main.main_$s$wf
= \ (sc_s2dV :: Int)
(sc1_s2dW :: Int)
(ww_s2dz :: GHC.Prim.Int#) ->
case ww_s2dz of ds_X1 {
__DEFAULT ->
letrec {
$wgo_s2du [InlPrag=[2], Occ=LoopBreaker, Dmd=SC(S,L)]
:: GHC.Prim.Int# -> [((Int, Int), Int)]
[LclId, Arity=1, Str=<1L>, Unf=OtherCon []]
$wgo_s2du
= \ (ww1_s2dr :: GHC.Prim.Int#) ->
case ww1_s2dr of wild_X1F {
__DEFAULT ->
GHC.Types.:
@((Int, Int), Int)
((sc_s2dV, sc1_s2dW), GHC.Num.$fNumInt_$c+ sc_s2dV sc1_s2dW)
($wgo_s2du (GHC.Prim.-# wild_X1F 1#));
0# -> GHC.Types.[] @((Int, Int), Int)
}; } in
case GHC.List.$wlenAcc @((Int, Int), Int) ($wgo_s2du ds_X1) 0#
of ww1_a2dn
{ __DEFAULT ->
case Main.main_$s$wf sc_s2dV sc1_s2dW (GHC.Prim.-# ds_X1 1#)
of ww2_s2dH
{ __DEFAULT ->
GHC.Prim.+# ww1_a2dn ww2_s2dH
}
};
0# -> 0#
}
end Rec }
```
Note the reboxing of `(sc_s2dV, sc1_s2dW)` inside the loop `$wgo`. Bad bad bad!
It's all due to SpecConstr *substituting* `(sc_s2dV, sc1_s2dW)` for the pair `p` instead of merely let-binding it in `$s$wf`.
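In source-level terms the difference is roughly the following (hypothetical names; GHC's own float-out may rescue the first form in simple cases, but the Core above is the case where it did not):

```haskell
-- Substituted: the pair is syntactically rebuilt on every iteration,
-- so each cons cell can get a fresh box.
goSubst :: Int -> Int -> Int -> [((Int, Int), Int)]
goSubst _ _ 0 = []
goSubst x y n = ((x, y), x + y) : goSubst x y (n - 1)

-- Let-bound: the pair is allocated once per outer call and shared by
-- every cons cell the loop produces.
goLet :: Int -> Int -> Int -> [((Int, Int), Int)]
goLet x y = go
  where
    p = (x, y)  -- one shared box
    go 0 = []
    go n = (p, x + y) : go (n - 1)
```

Both compute the same list; only the allocation behaviour differs.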
It would be reasonable for SpecConstr to do the latter. It can still do case-of-known-con via its `sc_val` env, I think.

Milestone: 9.10.1

# Issue #22277: SpecConstr: Denest local, non-recursive lets
2022-10-17 · Sebastian Graf · https://gitlab.haskell.org/ghc/ghc/-/issues/22277

Suppose we see (local or at top-level)
```
f xs = let binds in g as;
rest
```
where
* `binds` binds `g`
* `xs` don't occur in `binds` and
* `as` do not mention `binds`.
In this situation it might be interesting to specialise `f` and `g` for call patterns in `rest`,
but it is difficult to do it in this nested form, because
1. We only get to see `ScrutOcc`s on `g`, in its RHS
2. The interesting call patterns in `rest` apply only to `f`
3. Specialising `f` and `g` for those call patterns duplicates `binds` twice:
We keep one copy of `bind` in the original `f`, one copy of `bind` in `$sf`
and another specialised copy `$sbind` (containing `$sg`) in `$sf`.
So for SpecConstr, we should float out `binds` (removing potential join-point-ness)
binds;
rest[f:=\xs -> g as]
Because now all call patterns of `f` directly apply to `g` and might match up
with one of the `ScrutOcc`s in its RHS, while only needing a single duplicate of
`bind`.
The idea works around some of the badness in #14951 and makes fixing #14844 less important.

# Issue #21938: Making inlining heuristics work for functions matching on nested data structures like generics
2024-01-29 · Andreas Klebinger · https://gitlab.haskell.org/ghc/ghc/-/issues/21938

Currently GHC does a terrible job at inlining functions which take apart or branch on a known data structure in order to produce a small result.
## The problem
Consider one of the simplest nested data structures, inductively defined numbers:
```
data Nat = Zero | Succ Nat
natToSmallInt :: Nat -> Int
natToSmallInt Zero = 0
natToSmallInt (Succ Zero) = 1
natToSmallInt (Succ (Succ Zero)) = 2
natToSmallInt (Succ (Succ (Succ Zero))) = 3
natToSmallInt (Succ (Succ (Succ (Succ Zero)))) = 4
natToSmallInt (Succ (Succ (Succ (Succ (Succ Zero))))) = 5
natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ Zero)))))) = 6
natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero))))))) = 7
natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero)))))))) = 8
natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero))))))))) = 9
natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero)))))))))) = 10
natToSmallInt _ = error "Not Small"
foo :: Int
foo = natToSmallInt (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero)))))))))) -- 10 as argument
bar :: Int
bar = natToSmallInt Zero
```
We, as humans, can easily see that this function applied to any statically known argument will produce a small expression once we apply case of known constructor. However we don't inline `natToSmallInt` as the nested cases make it fairly large and we only apply a discount based on the argument once.
In a sense the issue is that GHC ignores how the function uses the components of its argument, and does not consider the structure of the argument when computing the discount.
The end result is that we get:
```
-- RHS size: {terms: 12, types: 0, coercions: 0, joins: 0/0}
foo :: Int
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 120 0}]
foo
= natToSmallInt
(A.Succ
(A.Succ
(A.Succ
(A.Succ
(A.Succ (A.Succ (A.Succ (A.Succ (A.Succ (A.Succ A.Zero))))))))))
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
bar :: Int
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 20 0}]
bar = natToSmallInt A.Zero
```
Where we would have expected `foo = 10` and `bar = 0`.
In practice this pattern arises (at least) with generics like described here: https://gitlab.haskell.org/ghc/ghc/-/issues/21457#note_445884 where we have a largish function taking apart a largish generic representation to select between a number of small rhs expressions.
### Option A: Improve inlining to account for nested data structures.
The only way I can (currently) think of to make this general pattern work well would be to have discounts based on the structure of arguments if they are known.
That is for natToSmallInt instead of generating this unfolding info:
```
-- RHS size: {terms: 71, types: 25, coercions: 4, joins: 0/0}
natToSmallInt :: Nat -> Int
[Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [40] 420 110}]
```
We generate argument guidance based on the shape of the argument. So rather than just using `[40]` we could generate something like:
```
[(Default:10,
Zero:190 [],
Succ:40
[ (Default:10
, Zero:140
, Succ:..)]
)]
```
Numbers are subject to change, but in general the argument guidance could be computed by something along the lines of:
* Default: The discount we apply simply for knowing it's an evaluated argument so just a small discount.
* Zero: A known constructor which is a terminal for our discrimination.
+ Apply a decent discount to account for the branch being eliminated.
+ Apply an additional discount based on the size difference of the rhs if we apply case-of-known-constructor
using this constructor. This allows e.g. `natToSmallInt Zero` to inline.
* Succ: Similarly to the Zero case.
+ But additionally compute a discount for the constructors arguments given the alt-rhs we chose for `Succ`.
When we then see an application like `natToSmallInt (Succ Zero)` we compute the discount by:
* Apply the `Default` discount since the argument is evaluated.
* Apply the relevant constructor discount if any.
* Apply the relevant discounts for the constructor arguments recursively.
* Add up all these discounts as usual and make a decision.
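A sketch of that recursive discount computation, with a toy shape type and made-up constants (none of this is GHC's actual `UnfoldingGuidance`; the numbers mirror the illustrative ones above):

```haskell
import qualified Data.Map as M

-- What we statically know about an argument at a call site.
data ArgShape = Unknown | Evaluated | ConApp String [ArgShape]

-- Guidance for one argument position: a base discount for a
-- known-evaluated argument, plus per-constructor discounts carrying
-- nested guidance for the constructor's fields.
data ArgGuidance = ArgGuidance
  { evalDiscount :: Int
  , conDiscounts :: M.Map String (Int, [ArgGuidance])
  }

discount :: ArgGuidance -> ArgShape -> Int
discount _ Unknown   = 0
discount g Evaluated = evalDiscount g
discount g (ConApp c fields) =
  evalDiscount g
    + case M.lookup c (conDiscounts g) of
        Nothing       -> 0
        Just (d, fgs) -> d + sum (zipWith discount fgs fields)

-- Guidance for the Nat argument of natToSmallInt: Succ's field is
-- again a Nat, so the guidance is (lazily) recursive.
natGuidance :: ArgGuidance
natGuidance = ArgGuidance 10 (M.fromList
  [ ("Zero", (190, []))
  , ("Succ", (40, [natGuidance])) ])
```

With these constants, `discount natGuidance (ConApp "Zero" [])` is 200, and each extra known `Succ` layer adds another 50, so fully known arguments earn large, structure-proportional discounts.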
I think this should work. But there is of course concern for impact on compile time. In particular we likely don't want to compute individual argument guidance on cases with >100 alternatives, or types with >100 arguments.
But having a (somewhat arbitrary) cutoff to achieve this seems reasonable.
### Option B: Solve this by making SpecConstr apply more liberally.
SpecConstr is actually **very** well equipped to handle cases like these already. Although it has a number of issues which prevent it from being a direct solution.
* It's not always applied e.g. top level non-recursive definitions like `natToSmallInt` don't get any benefit currently (see https://gitlab.haskell.org/ghc/ghc/-/issues/21457).
* It's not enabled by default at `-O` since it can destroy sharing.
* It can destroy sharing.
* It is by default limited to **3** specializations of a function. For `natToSmallInt` we would want 10 at least. For more complex types like generic instances this number increases rapidly.
* SpecConstr only considers the size of the *input* when deciding if something is worth specializing for. Although perhaps that can/should be fixed by applying a discount in a fashion similar to what I describe above for inlining.
* It's only run once while we apply inlining often and very opportunistically.
* **It doesn't work across modules**
So imo in order for SpecConstr to even be an option for addressing this we need to:
* Make it work across module boundaries.
* Have it work for arbitrarily large input bindings if we can tell that they will optimize to a small rhs when specialized. (E.g apply discounts similar to described above for inlining).
* Remove the limit on the number of specializations. However this likely also requires us to implement some sort of heuristic/cost model that prevents GHC from generating an unmanageable number of relatively useless specializations.
* Improve the issue of it destroying sharing at least enough to make it safe to enable for -O
Overall this seems less doable than improving inlining.

# Issue #21688: SpecConstr: Instead of filtering out in-scope free variables after the fact, filter the free variables as we construct the patterns
2022-06-06 · Andreas Klebinger · https://gitlab.haskell.org/ghc/ghc/-/issues/21688

Currently in SpecConstr we:
* Transform the arguments into potential call patterns
* Gather the free variables of all call patterns to construct the set of potential arguments for the specialized version
* Remove any argument from the set which is in scope.
It seems to me we could just pass the in-scope set to `argToPat` and have it do this work.
That is, instead of returning `(Bool, CoreArg)`, `argToPat` would return `(Bool, CoreArg, [Var])`.
The list of vars is the list of variables not in scope in the patterns, and we save ourselves the trouble of computing the free variables of the patterns.
Likely not a huge win, so I'm not planning to do this right now. If anyone feels like tackling this, feel free.

# Issue #21608: Need a bit more case-of-case in Initial Phase
2022-08-19 · Simon Peyton Jones · https://gitlab.haskell.org/ghc/ghc/-/issues/21608

[This comment](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7997#note_427680) in !7997 describes an infelicity in floating.
In the nofib `simple` benchmark we have
```
polynomial degree ... = ....(\xyz. foldr k z [1..degree])....
```
By the time we get to FloatOut we have (roughly)
```
polynomial degree ... = ....(\xyz. foldr k z (case degree of I# d# -> build (enum 1# d#))...
```
Do we float that `[1..degree]` out of the `\xyz`? Doing so kills fusion; but it shares the `[1..degree]` which in general may be a big win.
General principle: trust the programmer: if fusion can happen, do it, even at the cost of floating.
In HEAD
* Fusion does not happen in `InitialPhase` because the `case` gets in between the `foldr` and the `build`.
* The `case` does not get pulled out of the strict argument because `sm_case_case` is off in `InitialPhase`.
* We don't float because of the bizarre `Case [MFEs]` in SetLevels (c.f. #19001). It's a strict argument position, headed by a `case` so we don't float. I'm trying to get rid of this strange special case.
* So in the subsequent Phase 2 simplification (with `sm_case_case` on) fusion can happen, quite coincidentally.
So in HEAD we get fusion, but only by a fluke; and !7997 removes some of the bizarreness, so the fluke no longer happens.
As a result, `simple` allocates 28% more.
----------------------------
I think the right solution is to make more fusion happen in `InitialPhase`. Maybe we can make `sm_case_case` switch off case-of-case *only for multi-alternative cases*. I *think* those are the ones that are the true target of `sm_case_case=False` (a sadly undocumented design).

# Issue #21562: Make use of boxity analysis in SpecConstr
2024-02-27 · Andreas Klebinger · https://gitlab.haskell.org/ghc/ghc/-/issues/21562

It seems we already put a lot of work into figuring out whether unboxing an argument is safe to do or not.
We should make use of that in SpecConstr, which currently will always take apart boxed arguments if it can.
The main difference for SpecConstr is that we *also* want to apply this logic to lazy arguments. I'm not sure how hard that would be to achieve.
But even just not taking apart strict arguments when boxity indicates we shouldn't would be an improvement.

# Issue #21458: SpecConstr: Should NOINLINE functions be allowed to specialize?
2022-08-29 · Andreas Klebinger · https://gitlab.haskell.org/ghc/ghc/-/issues/21458

Updated Ticket:
Currently we don't specialize functions marked NOINLINE and there is the Note \[Transfer activation\] explaining some things around this decision.
But we changed wrapper activation in https://gitlab.haskell.org/ghc/ghc/-/commit/f748988bbea1442b898ba107653216543a293b4d so it might be worth reevaluating this.

# Issue #21457: SpecConstr should treat local and top level non-recursive bindings the same
2023-11-10 · Andreas Klebinger · https://gitlab.haskell.org/ghc/ghc/-/issues/21457

SpecConstr currently specializes local non-recursive let bindings, but doesn't specialize top level non-recursive let bindings.
This means that if we change the local binding in a way which allows it to float to the top, performance can suddenly tank, as we no longer get specialization for it.
--------------
This is however not as trivial as it sounds. In order to make this work coherently in the current design we would need to analyze the whole module for call patterns before we generate a specialization.
But we analyze the whole module for call patterns already! So maybe we could get away with:
* Do what we do now but keep around all call patterns for non-recursive top level bindings.
* Once we have collected all the call patterns, specialize all the recursive bindings.
* Then do *one* pass over the non-recursive bindings generating specializations for their call patterns as well.
Seems plausible at least.

# Issue #21456: SpecConstr: -fspec-constr-threshold has likely some undesired behaviour
2024-01-29 · Andreas Klebinger · https://gitlab.haskell.org/ghc/ghc/-/issues/21456

As the user guide helpfully points out, `-fspec-constr-threshold` does "Set the size threshold for the SpecConstr transformation."
Currently, best I can tell, what it does is: if any RHS of a recursive group is larger than `threshold`, we avoid specializing any of them, presumably to avoid code blowup. So far so good.
----
But I think we still traverse the AST looking for local bindings to specialize. We don't check the size of local bindings when deciding if we should specialise. So if the top level binding is just a thin (recursive) function over a huge local binding we would still end up specializing the local binding anyway.
As a result, using `-fspec-constr-threshold` to avoid code size blowups is hit and miss.
We also make the decision for the whole recursive group. But if we have some group:
```
Rec {
f_huge = <huge>
f_2 = <smallish>
f_3 = <smallish>
}
```
We should still specialize `f_2` and `f_3` for all call patterns inside the whole group. There is really no good reason not to!

# Issue #20321: SpecConstr increasing allocation, maybe by triggering reboxing
2022-05-17 · Andreas Klebinger · https://gitlab.haskell.org/ghc/ghc/-/issues/20321

I noticed that nofib/real/ben-raytracer seems to increase allocations quite significantly with `-fspec-constr` compared to without.
In nofib/real/ben-raytracer we have this function:
```haskell
hitTestMany :: [Figure] -> Ray3 -> Interval -> Maybe Hit
hitTestMany figs ray = go_hit figs Nothing
where
go_hit :: [Figure] -> Maybe Hit -> Interval -> Maybe Hit
go_hit [] accum !_ = accum
go_hit (fig:rest) accum int =
case hitTest fig ray int of
Nothing -> go_hit rest accum int
Just hit ->
case accum of
Nothing
-> go_hit rest (Just hit) (int {iUpper=hitDistance hit})
Just oldHit
| hitDistance hit < hitDistance oldHit
-> go_hit rest (Just hit) (int {iUpper=hitDistance hit})
| otherwise
-> go_hit rest accum int
```
Without SpecConstr we get somewhat straightforward Core:
```haskell
joinrec {
$wgo_hit_s2L4 [InlPrag=[2],
Occ=LoopBreaker,
Dmd=SCS(C1(C1(C1(L))))]
:: [Figure]
-> Maybe Hit -> GHC.Prim.Double# -> GHC.Prim.Double# -> Maybe Hit
[LclId[JoinId(4)], Arity=4, Str=<1L><1L><L><L>, Unf=OtherCon []]
$wgo_hit_s2L4 (ds_s2KW :: [Figure])
(accum_s2KX :: Maybe Hit)
(ww9_X2 :: GHC.Prim.Double#)
(ww10_X3 :: GHC.Prim.Double#)
= case ds_s2KW of {
[] -> accum_s2KX;
: fig_aSB rest_aSC ->
case fig_aSB of { Figure ds1_d27m ds2_d27n ->
case ds2_d27n ray_aSy (Interval.Interval ww9_X2 ww10_X3)
of wild4_X6 {
Nothing -> jump $wgo_hit_s2L4 rest_aSC accum_s2KX ww9_X2 ww10_X3;
Just hit_aSF ->
case accum_s2KX of wild5_X7 {
Nothing ->
case hit_aSF of
{ Hit bx_d2f3 ds3_d27x ds4_d27y ds5_d27z ds6_d27A ->
jump $wgo_hit_s2L4 rest_aSC wild4_X6 ww9_X2 bx_d2f3
};
Just oldHit_aSG ->
case hit_aSF of
{ Hit bx_d2f3 ds3_d27x ds4_d27y ds5_d27z ds6_d27A ->
case oldHit_aSG of { Hit bx1_Xa ds7_Xb ds8_Xc ds9_Xd ds10_Xe ->
case GHC.Prim.<## bx_d2f3 bx1_Xa of {
__DEFAULT -> jump $wgo_hit_s2L4 rest_aSC wild5_X7 ww9_X2 ww10_X3;
1# -> jump $wgo_hit_s2L4 rest_aSC wild4_X6 ww9_X2 bx_d2f3
}
}
}
}
}
}
}; } in
```
Note that we case on the accumulator `accum_s2KX` but pass it along boxed and as-is.
However if we enable SpecConstr instead then we get the code hidden below the spoiler tag:
<details>
```haskell
joinrec {
$s$wgo_hit_s2Pr [Occ=LoopBreaker,
Dmd=LCL(C1(C1(C1(C1(C1(C1(C1(L))))))))]
:: GHC.Prim.Double#
-> GHC.Prim.Double#
-> GHC.Prim.Double#
-> Pt Vec3
-> Vec3
-> Pt Vec2
-> Material
-> [Figure]
-> Maybe Hit
[LclId[JoinId(8)],
Arity=8,
Str=<L><L><L><L><L><L><L><1L>,
Unf=OtherCon []]
$s$wgo_hit_s2Pr (sc_s2Pp :: GHC.Prim.Double#)
(sc1_s2Po :: GHC.Prim.Double#)
(sc2_s2Pj :: GHC.Prim.Double#)
(sc3_s2Pk :: Pt Vec3)
(sc4_s2Pl :: Vec3)
(sc5_s2Pm :: Pt Vec2)
(sc6_s2Pn :: Material)
(sc7_s2Pi :: [Figure])
= case sc7_s2Pi of {
[] ->
GHC.Maybe.Just
@Hit (Figure.Hit sc2_s2Pj sc3_s2Pk sc4_s2Pl sc5_s2Pm sc6_s2Pn);
: fig_aSC rest_aSD ->
case fig_aSC of { Figure ds_d27n ds1_d27o ->
case ds1_d27o ray_aSz (Interval.Interval sc1_s2Po sc_s2Pp) of {
Nothing ->
jump $s$wgo_hit_s2Pr
sc_s2Pp
sc1_s2Po
sc2_s2Pj
sc3_s2Pk
sc4_s2Pl
sc5_s2Pm
sc6_s2Pn
rest_aSD;
Just hit_aSG ->
case hit_aSG of
{ Hit bx_d2f4 ds2_d27y ds3_d27z ds4_d27A ds5_d27B ->
case GHC.Prim.<## bx_d2f4 sc2_s2Pj of {
__DEFAULT ->
jump $s$wgo_hit_s2Pr
sc_s2Pp
sc1_s2Po
sc2_s2Pj
sc3_s2Pk
sc4_s2Pl
sc5_s2Pm
sc6_s2Pn
rest_aSD;
1# ->
jump $s$wgo_hit_s2Pr
bx_d2f4
sc1_s2Po
bx_d2f4
ds2_d27y
ds3_d27z
ds4_d27A
ds5_d27B
rest_aSD
}
}
}
}
}; } in
joinrec {
$s$wgo_hit1_s2Pq [Occ=LoopBreaker, Dmd=LCL(C1(C1(L)))]
:: GHC.Prim.Double# -> GHC.Prim.Double# -> [Figure] -> Maybe Hit
[LclId[JoinId(3)], Arity=3, Str=<L><L><1L>, Unf=OtherCon []]
$s$wgo_hit1_s2Pq (sc_s2Ph :: GHC.Prim.Double#)
(sc1_s2Pg :: GHC.Prim.Double#)
(sc2_s2Pf :: [Figure])
= case sc2_s2Pf of {
[] -> GHC.Maybe.Nothing @Hit;
: fig_aSC rest_aSD ->
case fig_aSC of { Figure ds_d27n ds1_d27o ->
case ds1_d27o ray_aSz (Interval.Interval sc1_s2Pg sc_s2Ph) of {
Nothing -> jump $s$wgo_hit1_s2Pq sc_s2Ph sc1_s2Pg rest_aSD;
Just hit_aSG ->
case hit_aSG of
{ Hit bx_d2f4 ds2_d27y ds3_d27z ds4_d27A ds5_d27B ->
jump $s$wgo_hit_s2Pr
bx_d2f4
sc1_s2Pg
bx_d2f4
ds2_d27y
ds3_d27z
ds4_d27A
ds5_d27B
rest_aSD
}
}
}
}; } in
case figs_s2L7 of { :| a1_a28j as_a28k ->
case a1_a28j of { Figure ds_d27n ds1_d27o ->
case ds1_d27o ray_aSz wild1_X1 of {
Nothing -> jump $s$wgo_hit1_s2Pq ww8_s2L2 ww7_s2L1 figs1_s2CL;
Just hit_aSG ->
case hit_aSG of
{ Hit bx_d2f4 ds2_d27y ds3_d27z ds4_d27A ds5_d27B ->
jump $s$wgo_hit_s2Pr
bx_d2f4
ww7_s2L1
bx_d2f4
ds2_d27y
ds3_d27z
ds4_d27A
ds5_d27B
figs1_s2CL
}
}
```
</details>
SpecConstr specialises on the accumulator and, in the `Just` case, also unboxes the `Hit` constructor.
However, we eventually want to return the boxed `Just (Hit ...)` constructor just the way we got it, so we end up reboxing it (and the `Just` it is contained in).
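To see the reboxing in miniature, here is a hedged sketch (hypothetical names, not the actual raytracer code) of the pattern: the specialisation threads the payload unboxed through the loop, but on exit it has to rebuild the very `Just (Hit ...)` it took apart.

```haskell
data Hit' = Hit' !Double deriving (Eq, Show)

-- Before SpecConstr: the accumulator is passed and returned boxed.
go :: [Double] -> Maybe Hit' -> Maybe Hit'
go []     acc = acc
go (d:ds) _   = go ds (Just (Hit' d))

-- After (imagined) specialisation on the `Just (Hit' _)` call pattern:
-- the payload travels as a bare Double, but the exit case must rebox it
-- into `Just (Hit' d)` again, just to return what we started with.
goSpec :: [Double] -> Double -> Maybe Hit'
goSpec []     d = Just (Hit' d)   -- reboxing happens here
goSpec (d:ds) _ = goSpec ds d
```

Both loops compute the same result; the point is only that the specialised version must reconstruct the constructor it specialised away.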
This is quite annoying. I guess this could eventually be tied into the boxity analysis @sgraf812 mentions in #19871.

Label: Research needed

https://gitlab.haskell.org/ghc/ghc/-/issues/20320
Argument is marked as absent when it likely should not be.
2022-05-24T14:32:37Z
Andreas Klebinger

This function from nofib/real/ben-raytrace carries an absent demand on the first argument. But neither does it seem absent (it appears in the body) nor should workers take an absent demand to begin with.
I'm a bit puzzled. This is from a recent build on master.
Here is the core function in full:
<details>
```
Rec {
-- RHS size: {terms: 298, types: 320, coercions: 50, joins: 2/6}
Sampler.sampleImagePixel_$s$wsample [Occ=LoopBreaker]
:: forall {s}.
BoundingBox.BoundingBox
-> (Ray3 -> Interval -> Maybe Hit)
-> GHC.Prim.Int#
-> (Ray3 -> Colour)
-> Pt Vec3
-> Vec3
-> Random.GenState
-> GHC.Prim.State# s
-> (# GHC.Prim.State# s, (Colour, Random.GenState) #)
[GblId,
Arity=8,
Str= <A>
<LCL(C1(L))>
<1L>
<MCM(L)>
<L>
<L>
<L>
<L>,
Unf=OtherCon []]
Sampler.sampleImagePixel_$s$wsample
= \ (@s_s2m1)
(sc_s2pI :: BoundingBox.BoundingBox)
(sc1_s2pJ :: Ray3 -> Interval -> Maybe Hit)
(sc2_s2pH :: GHC.Prim.Int#)
(sc3_s2pG :: Ray3 -> Colour)
(ww_s2ma
:: Pt Vec3
Unf=OtherCon [])
(ww1_s2mb
:: Vec3
Unf=OtherCon [])
(eta_s2md [OS=OneShot] :: Random.GenState)
(eta1_s2me [OS=OneShot] :: GHC.Prim.State# s_s2m1)
->
case sc2_s2pH of ds_X2 {
__DEFAULT ->
let {
ray_X3 :: Ray3
[LclId, Unf=OtherCon []]
ray_X3 = Ray.Ray3 ww_s2ma ww1_s2mb } in
case sc1_s2pJ ray_X3 lvl3_r2rg of {
Nothing -> (# eta1_s2me, (sc3_s2pG ray_X3, eta_s2md) #);
Just hit_aRy ->
case hit_aRy of wild1_a1W5
{ Hit bx_a1Wc ds1_a1Wd ds2_a1We ds3_a1Wf ds4_a1Wg ->
case ds4_a1Wg of { Material ds7_s2of ds8_s2og ->
case ((((ds7_s2of @s_s2m1 ray_X3 wild1_a1W5)
`cast` (SamplerMonad.Naive.N:SamplerM[0] <s_s2m1>_N <(Vec3,
Maybe Ray3)>_N
; Control.Monad.Trans.State.Strict.N:StateT[0]
<System.Random.Internal.StdGen>_N
<GHC.ST.ST s_s2m1>_R
<(Vec3, Maybe Ray3)>_N
:: SamplerM s_s2m1 (Vec3, Maybe Ray3)
~R# (System.Random.Internal.StdGen
-> GHC.ST.ST
s_s2m1
((Vec3, Maybe Ray3), System.Random.Internal.StdGen))))
eta_s2md)
`cast` (GHC.ST.N:ST[0]
<s_s2m1>_N <((Vec3, Maybe Ray3), System.Random.Internal.StdGen)>_R
:: GHC.ST.ST
s_s2m1 ((Vec3, Maybe Ray3), System.Random.Internal.StdGen)
~R# GHC.ST.STRep
s_s2m1 ((Vec3, Maybe Ray3), System.Random.Internal.StdGen)))
eta1_s2me
of
{ (# ipv_a29P, ipv1_a29Q #) ->
case ipv1_a29Q of { (a1_a29T, s'_a29U) ->
case a1_a29T of { (atten_aRA, mb_scattered_aRB) ->
case mb_scattered_aRB of {
Nothing ->
case ds8_s2og of {
Figure.NoEmission ->
(# ipv_a29P,
(Colour.black1
`cast` (Sym (Colour.N:Colour[0]) :: Vec3 ~R# Colour),
s'_a29U) #);
Figure.ConstEmission x_a1Vb -> (# ipv_a29P, (x_a1Vb, s'_a29U) #)
};
Just scattered_aRC ->
case GHC.Prim.<# ds_X2 45# of {
__DEFAULT ->
case scattered_aRC of { Ray3 ww2_X9 ww3_Xa ->
case Sampler.sampleImagePixel_$s$wsample
@s_s2m1
sc_s2pI
sc1_s2pJ
(GHC.Prim.-# ds_X2 1#)
sc3_s2pG
ww2_X9
ww3_Xa
s'_a29U
ipv_a29P
of
{ (# ipv2_Xc, ipv3_Xd #) ->
case ipv3_Xd of { (a2_Xf, s'1_Xg) ->
case ds8_s2og of {
Figure.NoEmission ->
case atten_aRA of { Vec3 bx1_a2aF bx2_a2aG bx3_a2aH ->
case a2_Xf `cast` (Colour.N:Colour[0] :: Colour ~R# Vec3) of
{ Vec3 bx4_a2aK bx5_a2aL bx6_a2aM ->
(# ipv2_Xc,
((Vector.Vec3
(GHC.Prim.*## bx1_a2aF bx4_a2aK)
(GHC.Prim.*## bx2_a2aG bx5_a2aL)
(GHC.Prim.*## bx3_a2aH bx6_a2aM))
`cast` (Sym (Colour.N:Colour[0]) :: Vec3 ~R# Colour),
s'1_Xg) #)
}
};
Figure.ConstEmission x_a1Vb ->
case x_a1Vb `cast` (Colour.N:Colour[0] :: Colour ~R# Vec3) of
{ Vec3 bx3_s2oj bx4_s2ok bx5_s2ol ->
case atten_aRA of { Vec3 bx1_a2aF bx2_a2aG bx6_a2aH ->
case a2_Xf `cast` (Colour.N:Colour[0] :: Colour ~R# Vec3) of
{ Vec3 bx7_a2aK bx8_a2aL bx9_a2aM ->
(# ipv2_Xc,
((Vector.Vec3
(GHC.Prim.+## (GHC.Prim.*## bx1_a2aF bx7_a2aK) bx3_s2oj)
(GHC.Prim.+## (GHC.Prim.*## bx2_a2aG bx8_a2aL) bx4_s2ok)
(GHC.Prim.+## (GHC.Prim.*## bx6_a2aH bx9_a2aM) bx5_s2ol))
`cast` (Sym (Colour.N:Colour[0]) :: Vec3 ~R# Colour),
s'1_Xg) #)
}
}
}
}
}
}
};
1# ->
case s'_a29U
`cast` (System.Random.Internal.N:StdGen[0]
:: System.Random.Internal.StdGen
~R# System.Random.SplitMix.SMGen)
of
{ System.Random.SplitMix.SMGen bx1_a2jl bx2_a2jm ->
case atten_aRA of { Vec3 bx3_a1VB bx4_a1VC bx5_a1VD ->
join {
$w$j_s2lZ [InlPrag=[2], Dmd=1C1(L)]
:: GHC.Prim.Double#
-> (# GHC.Prim.State# s_s2m1, (Colour, Random.GenState) #)
[LclId[JoinId(1)], Arity=1, Str=<L>, Unf=OtherCon []]
$w$j_s2lZ (x1_s2lW [OS=OneShot] :: GHC.Prim.Double#)
= join {
$j_s2jH [Dmd=1C1(L)]
:: GHC.Prim.Double#
-> (# GHC.Prim.State# s_s2m1, (Colour, Random.GenState) #)
[LclId[JoinId(1)], Arity=1, Str=<L>, Unf=OtherCon []]
$j_s2jH (x_a2b7 [OS=OneShot] :: GHC.Prim.Double#)
= let {
seed'_a2jk :: GHC.Prim.Word#
[LclId]
seed'_a2jk = GHC.Prim.plusWord# bx1_a2jl bx2_a2jm } in
let {
x#_a2jo :: GHC.Prim.Word#
[LclId]
x#_a2jo
= GHC.Prim.timesWord#
(GHC.Prim.xor#
seed'_a2jk (GHC.Prim.uncheckedShiftRL# seed'_a2jk 33#))
18397679294719823053## } in
let {
x#1_a2jp :: GHC.Prim.Word#
[LclId]
x#1_a2jp
= GHC.Prim.timesWord#
(GHC.Prim.xor#
x#_a2jo (GHC.Prim.uncheckedShiftRL# x#_a2jo 33#))
14181476777654086739## } in
case GHC.Prim.>##
x_a2b7
(GHC.Prim./##
(GHC.Prim.word2Double#
(GHC.Prim.xor#
x#1_a2jp
(GHC.Prim.uncheckedShiftRL# x#1_a2jp 33#)))
1.8446744073709552e19##)
of {
__DEFAULT ->
case ds8_s2og of {
Figure.NoEmission ->
(# ipv_a29P,
(Colour.black1
`cast` (Sym (Colour.N:Colour[0]) :: Vec3 ~R# Colour),
(System.Random.SplitMix.SMGen seed'_a2jk bx2_a2jm)
`cast` (Sym (System.Random.Internal.N:StdGen[0])
:: System.Random.SplitMix.SMGen
~R# System.Random.Internal.StdGen)) #);
Figure.ConstEmission x2_a1Vb ->
(# ipv_a29P,
(x2_a1Vb,
(System.Random.SplitMix.SMGen seed'_a2jk bx2_a2jm)
`cast` (Sym (System.Random.Internal.N:StdGen[0])
:: System.Random.SplitMix.SMGen
~R# System.Random.Internal.StdGen)) #)
};
1# ->
case scattered_aRC of { Ray3 ww2_X9 ww3_Xa ->
case Sampler.sampleImagePixel_$s$wsample
@s_s2m1
sc_s2pI
sc1_s2pJ
(GHC.Prim.-# ds_X2 1#)
sc3_s2pG
ww2_X9
ww3_Xa
((System.Random.SplitMix.SMGen seed'_a2jk bx2_a2jm)
`cast` (Sym (System.Random.Internal.N:StdGen[0])
:: System.Random.SplitMix.SMGen
~R# System.Random.Internal.StdGen))
ipv_a29P
of
{ (# ipv2_Xk, ipv3_Xl #) ->
case ipv3_Xl of { (a2_Xn, s'1_Xo) ->
case ds8_s2og of {
Figure.NoEmission ->
case a2_Xn `cast` (Colour.N:Colour[0] :: Colour ~R# Vec3)
of
{ Vec3 bx6_a2aK bx7_a2aL bx8_a2aM ->
(# ipv2_Xk,
((Vector.Vec3
(GHC.Prim.*## bx3_a1VB bx6_a2aK)
(GHC.Prim.*## bx4_a1VC bx7_a2aL)
(GHC.Prim.*## bx5_a1VD bx8_a2aM))
`cast` (Sym (Colour.N:Colour[0]) :: Vec3 ~R# Colour),
s'1_Xo) #)
};
Figure.ConstEmission x2_a1Vb ->
case x2_a1Vb
`cast` (Colour.N:Colour[0] :: Colour ~R# Vec3)
of
{ Vec3 bx6_s2oo bx7_s2op bx8_s2oq ->
case a2_Xn `cast` (Colour.N:Colour[0] :: Colour ~R# Vec3)
of
{ Vec3 bx9_a2aK bx10_a2aL bx11_a2aM ->
(# ipv2_Xk,
((Vector.Vec3
(GHC.Prim.+##
(GHC.Prim.*## bx3_a1VB bx9_a2aK) bx6_s2oo)
(GHC.Prim.+##
(GHC.Prim.*## bx4_a1VC bx10_a2aL) bx7_s2op)
(GHC.Prim.+##
(GHC.Prim.*## bx5_a1VD bx11_a2aM) bx8_s2oq))
`cast` (Sym (Colour.N:Colour[0]) :: Vec3 ~R# Colour),
s'1_Xo) #)
}
}
}
}
}
}
} } in
case GHC.Prim.<=## x1_s2lW bx5_a1VD of {
__DEFAULT -> jump $j_s2jH x1_s2lW;
1# -> jump $j_s2jH bx5_a1VD
} } in
case GHC.Prim.<=## bx3_a1VB bx4_a1VC of {
__DEFAULT -> jump $w$j_s2lZ bx3_a1VB;
1# -> jump $w$j_s2lZ bx4_a1VC
}
}
}
}
}
}
}
}
}
}
};
-1# ->
(# eta1_s2me,
(Colour.black1
`cast` (Sym (Colour.N:Colour[0]) :: Vec3 ~R# Colour),
eta_s2md) #)
}
end Rec }
```
</details>
@sgraf812 Maybe you have an idea?

https://gitlab.haskell.org/ghc/ghc/-/issues/19796
SpecConstr passes unused parameters
2021-09-08T13:55:13Z
Simon Peyton Jones

Consider
```
f x y = case x of
a : as -> f as y
[] -> True
boo p ps = f (p:ps) 'v'
```
Notice that `y` is unused, and so is `a` in the pattern match.
Compile with -O2, and we get (as the output of SpecConstr):
```
Rec {
-- RHS size: {terms: 5, types: 7, coercions: 0, joins: 0/0}
Foo.f_$s$wf [Occ=LoopBreaker] :: forall a. a -> [a] -> Bool
[GblId, Arity=2, Caf=NoCafRefs, Str=<L,A><S,1*U>, Unf=OtherCon []]
Foo.f_$s$wf
= \ (@ a_aud) _ [Occ=Dead] (sc1_sv3 :: [a_aud]) ->
Foo.$wf @ a_aud @ Char sc1_sv3
-- RHS size: {terms: 10, types: 13, coercions: 0, joins: 0/0}
Foo.$wf [InlPrag=NOUSERINLINE[2], Occ=LoopBreaker]
:: forall a t. [a] -> Bool
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,1*U>, Unf=OtherCon []]
Foo.$wf
= \ (@ a_suO) (@ t_suP) (w_suQ :: [a_suO]) ->
case w_suQ of {
[] -> GHC.Types.True;
: a1_agf as_agg -> Foo.$wf @ a_suO @ t_suP as_agg
}
end Rec }
-- RHS size: {terms: 6, types: 9, coercions: 0, joins: 0/0}
f [InlPrag=NOUSERINLINE[2]] :: forall a t. [a] -> t -> Bool
[GblId,
Arity=2,
Caf=NoCafRefs,
Str=<S,1*U><L,A>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=True)
Tmpl= \ (@ a_suO)
(@ t_suP)
(w_suQ [Occ=Once] :: [a_suO])
_ [Occ=Dead] ->
Foo.$wf @ a_suO @ t_suP w_suQ}]
f = \ (@ a_suO) (@ t_suP) (w_suQ :: [a_suO]) _ [Occ=Dead] ->
Foo.$wf @ a_suO @ t_suP w_suQ
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
Foo.boo1 :: Char
[GblId,
Caf=NoCafRefs,
Str=m,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}]
Foo.boo1 = GHC.Types.C# 'v'#
-- RHS size: {terms: 6, types: 6, coercions: 0, joins: 0/0}
boo :: forall a. a -> [a] -> Bool
[GblId,
Arity=2,
Caf=NoCafRefs,
Str=<L,A><S,1*U>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
Tmpl= \ (@ a_aud)
(p_atA [Occ=Once] :: a_aud)
(ps_atB [Occ=Once] :: [a_aud]) ->
f @ a_aud @ Char (GHC.Types.: @ a_aud p_atA ps_atB) Foo.boo1}]
boo
= \ (@ a_aud) (p_atA :: a_aud) (ps_atB :: [a_aud]) ->
Foo.f_$s$wf @ a_aud p_atA ps_atB
------ Local rules for imported ids --------
"SC:$wf0" [2]
forall (@ a_aud) (sc_sv2 :: a_aud) (sc1_sv3 :: [a_aud]).
Foo.$wf @ a_aud @ Char (GHC.Types.: @ a_aud sc_sv2 sc1_sv3)
= Foo.f_$s$wf @ a_aud sc_sv2 sc1_sv3
```
The worker `$wf` drops `y`, as it should, as a result of demand analysis. But `a` is unused too, and lo! `sc_sv2` is passed by the RULE to `f_$s$wf`, which indeed does not use it.
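To make the missed opportunity concrete, here is a hedged sketch (hypothetical names, source-level rather than Core) of the specialisation as emitted today versus one with the dead argument dropped:

```haskell
-- Today: the specialisation still takes the (dead) head element,
-- mirroring how `sc_sv2` is passed to `f_$s$wf` although unused.
fSpecToday :: a -> [a] -> Bool
fSpecToday _unused xs = fSpecBetter xs

-- With the unused-binder information exploited: the absent argument
-- is dropped from the specialisation's argument list altogether.
fSpecBetter :: [a] -> Bool
fSpecBetter []     = True
fSpecBetter (_:as) = fSpecBetter as
```

The two are observationally identical; the second simply never materialises or passes the dead argument.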
This is fixed by `-flate-dmd-anal`, but there is a missed opportunity. SpecConstr actually gathers `NoOcc` info, and could exploit it to pass fewer arguments to the specialised function.

https://gitlab.haskell.org/ghc/ghc/-/issues/19695
Exponential memory use when using lots of <*>
2022-12-21T16:26:50Z
maxigit

## Summary
Some code with the shape `F <$> a1 <*> a2 <*> a3 <*> ...` takes forever, uses lots of memory (up to 50 GB), or ends up in a stack overflow (depending on how long I wait) when optimisation is enabled.
## Steps to reproduce
I've created a single-file project which reproduces the issue. It can be found on [github](https://github.com/maxigit/ghc-bug).
Clone the project.
run `stack build` and wait ...
The issue seems to be related to the use of `CountryCode` (the `shCountry` field), which is an enum with around a hundred constructors.
- Disabling the optimisation fixes the problem
- Removing a few fields from the form also fixes the problem
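For reference, a hedged, minimal version of the shape involved (constructor and field names hypothetical; the real `CountryCode` has around a hundred constructors and the real form many more fields):

```haskell
-- Stands in for the ~100-constructor CountryCode enum.
data Country = AD | AE | AF deriving (Eq, Show)

data Shipping = Shipping Int Int Country Int deriving (Eq, Show)

-- The applicative chain whose length (together with the large enum
-- field) triggers the blow-up under optimisation.
mkShipping :: Maybe Int -> Maybe Int -> Maybe Country -> Maybe Int
           -> Maybe Shipping
mkShipping a b c d = Shipping <$> a <*> b <*> c <*> d
```

At this size the chain is harmless; the reported blow-up appears only with many more fields and the full-size enum.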
## Expected behavior
The compilation should finish in a few seconds, not crash after hours.
## Environment
* GHC version used:
8.10.4 8.10.3 8.8.4 8.4.4
Optional:
* Operating System:
Linux
* System Architecture:

https://gitlab.haskell.org/ghc/ghc/-/issues/18837
Improve interaction of exitification with spec constr
2020-11-09T14:01:11Z
harendra

## Summary
The exitification pass happens before the spec-constr pass. It moves exit-path code out of a recursive function. However, this impacts the efficacy of the later spec-constr pass.
See https://github.com/composewell/streamly/issues/703 for details.
## Steps to reproduce
1. `git clone https://github.com/composewell/streamly.git`
2. `git checkout ghc-spec-constr-keen`
3. `cabal build streamly`
4. `ghc -ddump-simpl -ddump-to-file -dsuppress-all -O2 -fmax-worker-args=16 -fspec-constr-recursive=16 -funfolding-use-threshold=600 fold-bench.hs`
5. `time ./fold-bench +RTS -s`
6. Examine `fold-bench.dump-simpl` and look for `W8#`.
We can see a W8# being passed to the join point $wstep3_s5F2:
```
jump $wstep3_s5F2
sc9_s5W3
sc8_s5W4
(PlainPtr sc7_s5W5)
sc6_s5W6
()
(W8# ipv8_a4F5)
sc_s5Wc
```
This join point then passes it to an exit point:
```
1# ->
jump exit1_X19
ww_s5EQ ww1_s5EV ww2_s5EW ww3_s5EX ww4_s5F0 w_s5EG w1_s5EH
```
The exit point examines this W8#:
```
exit1_X19 ww_s5EQ
ww1_s5EV
ww2_s5EW
ww3_s5EX
ww4_s5F0
w_s5EG
w1_s5EH
= case w_s5EG of { W8# x_a4tc ->
```
The `W8#` constructor is not removed by the spec-constr pass because it is not scrutinised by `$wstep3_s5F2` itself. The code examining the `W8#` has been moved to the exit point by the exitification pass, which runs before spec-constr, so the box is opaque to spec-constr and is not removed.
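A hedged, source-level sketch of the interaction (hypothetical names, not the actual streamly code): after exitification, the only place the boxed accumulator is examined is the floated-out exit function, so the loop itself gives SpecConstr no constructor pattern to specialise away.

```haskell
import Data.Word (Word8)

-- `exitPoint` stands in for the floated-out exit join point: it is the
-- only place the accumulator is scrutinised.
exitPoint :: Word8 -> Int
exitPoint w = fromIntegral w * 2

-- The loop merely threads `w` through without scrutinising it, so from
-- SpecConstr's point of view there is nothing to specialise on.
loopAfterExitify :: Int -> Word8 -> Int
loopAfterExitify 0 w = exitPoint w
loopAfterExitify n w = loopAfterExitify (n - 1) w
```

With the scrutinising `case` still inside the loop (as before exitification), SpecConstr would see the call pattern and could pass the payload unboxed.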
We can use the `-fspec-constr-keen` option in the compilation step above and see that the `W8#` is then removed. However, that option causes regressions in other benchmarks; it is not always a win.
Another option is to run the exitification pass after the spec-constr pass. See [this note](https://gitlab.haskell.org/ghc/ghc/-/blob/a9ce159ba58ca7e8946b46e19b1361588b677a26/compiler/GHC/Core/Opt/Exitify.hs#L451). We would not have this issue if we used option D suggested in that note, i.e. put exitification just before the final simplifier pass. However, this is also not the best possible option, because exitify-before-spec-constr plus `-fspec-constr-keen` generates much better code and has 2x better performance.
We can checkout the branch `ghc-spec-constr-before-exitify` and repeat the steps 3-6 above to see that the W8# constructor is gone when exitify is done after spec-constr.
## Expected behavior
The point of this issue is to explore whether we can keep exitification before spec-constr yet get what `-fspec-constr-keen` does automatically, applied not globally but surgically in cases like this. Can we make spec-constr keen through exit join points only?
Removing the `W8#` in the example above yields a 4x improvement, because it is in the fast path, executed for every element of the stream. Depending on the use case we can get very significant gains if spec-constr is applied smartly.
## Environment
* GHC version used:
8.10.2 branch
Optional:
* Operating System: macOS
* System Architecture: x86_64

https://gitlab.haskell.org/ghc/ghc/-/issues/17592
SpecConstr rule base woes
2020-01-15T20:36:10Z
Sebastian Graf

Consider the following stream fusion approach harnessing the type class specialiser (and also constructor-pattern specialisation for subsequently specialising `loop`):
```hs
{-# LANGUAGE TypeFamilies, FlexibleContexts, BangPatterns #-}
data Step iter
= Yield !iter !(Item iter)
| Skip !iter
| Done
class Iterator iter where
type Item iter
next :: iter -> Step iter
data Singleton a = Empty' | Singleton !a
singletonS :: a -> Singleton a
singletonS = Singleton
instance Iterator (Singleton a) where
type Item (Singleton a) = a
next Empty' = Done
next (Singleton a) = Yield Empty' a
data EnumFromTo a = EnumFromTo !a !a
enumFromToS :: a -> a -> EnumFromTo a
enumFromToS = EnumFromTo
instance (Ord a, Num a) => Iterator (EnumFromTo a) where
type Item (EnumFromTo a) = a
next (EnumFromTo i high)
| i > high = Done
| otherwise = Yield (EnumFromTo (i + 1) high) i
sumS :: (Num (Item iter), Iterator iter) => iter -> Item iter
sumS = loop 0
where
loop !total iter1 =
case next iter1 of
Done -> total
Skip iter2 -> loop total iter2
Yield iter2 x -> loop (total + x) iter2
-- {-# NOINLINE[99] loop #-} -- comment in for 10x speedup
{-# INLINE sumS #-}
data ConcatMap i o = ConcatMap !(Maybe o) !(Item i -> o) !i
concatMapS :: (Item i -> o) -> i -> ConcatMap i o
concatMapS = ConcatMap Nothing
instance (Iterator i, Iterator o) => Iterator (ConcatMap i o) where
type Item (ConcatMap i o) = Item o
next (ConcatMap Nothing f i) = case next i of
Skip i' -> Skip (ConcatMap Nothing f i')
Yield i' x -> Skip (ConcatMap (Just (f x)) f i')
Done -> Done
next (ConcatMap (Just o) f i) = case next o of
Skip o' -> Skip (ConcatMap (Just o') f i)
Yield o' x -> Yield (ConcatMap (Just o') f i) x
Done -> Skip (ConcatMap Nothing f i)
ex1 :: Int -> Int
ex1
= sumS
. concatMapS (\a -> singletonS (a*a))
. enumFromToS 1
{-# NOINLINE ex1 #-}
main = print (ex1 99999999)
```
If the `{-# NOINLINE[99] loop #-}` pragma is commented in, `ex1` will compile down to this perfect bit of code:
```
Main.main_$s$wloop
= \ (sc_s4GJ :: GHC.Prim.Int#)
(sc1_s4GI :: GHC.Prim.Int#)
(sc2_s4GH :: GHC.Prim.Int#) ->
case GHC.Prim.># sc1_s4GI sc_s4GJ of {
__DEFAULT ->
Main.main_$s$wloop
sc_s4GJ
(GHC.Prim.+# sc1_s4GI 1#)
(GHC.Prim.+# sc2_s4GH (GHC.Prim.*# sc1_s4GI sc1_s4GI));
1# -> sc2_s4GH
}
```
Beautiful!
Without the `NOINLINE` pragma, not so much, though. Here's the Core:
```
Main.main_$s$wloop [Occ=LoopBreaker]
:: GHC.Prim.Int#
-> GHC.Prim.Int#
-> (Item (EnumFromTo Int) -> Singleton Int)
-> Singleton Int
-> GHC.Prim.Int#
-> GHC.Prim.Int#
[GblId,
Arity=5,
Caf=NoCafRefs,
Str=<S,U><S,U><L,C(U)><S,1*U><S,U>,
Unf=OtherCon []]
Main.main_$s$wloop
= \ (sc_s4FQ :: GHC.Prim.Int#)
(sc1_s4FP :: GHC.Prim.Int#)
(sc2_s4FO :: Item (EnumFromTo Int) -> Singleton Int)
(sc3_s4FN :: Singleton Int)
(sc4_s4FM :: GHC.Prim.Int#) ->
case sc3_s4FN of {
Empty' ->
case GHC.Prim.># sc1_s4FP sc_s4FQ of {
__DEFAULT ->
Main.main_$s$wloop
sc_s4FQ
(GHC.Prim.+# sc1_s4FP 1#)
sc2_s4FO
(sc2_s4FO
((GHC.Types.I# sc1_s4FP)
`cast` (Sub (Sym (Main.D:R:ItemEnumFromTo[0] <Int>_N))
:: Int ~R# Item (EnumFromTo Int))))
sc4_s4FM;
1# -> sc4_s4FM
};
Singleton a_a1z4 ->
case a_a1z4 of { GHC.Types.I# y_s4Dg ->
Main.main_$s$wloop
sc_s4FQ
sc1_s4FP
sc2_s4FO
(Main.Empty' @ Int)
(GHC.Prim.+# sc4_s4FM y_s4Dg)
}
}
```
Long story short, constructor specialisation failed to specialise `$wloop` for some higher-order arguments (the function of type `Item (EnumFromTo Int) -> Singleton Int` in particular). That results in a slow-down of a factor of 10 and it goes from constant space to allocating huge amounts of memory (still constant residency, though).
# Why?
**Edit: Don't bother reading this section yet, it's incoherent and inaccurate. Suffice it to say, the conclusion is accurate: Even though we specialise for the stronger call pattern, we don't rewrite to it because the rewrite rule is not attached to the strictly weaker specialisation.**
Looking at the difference in `-dverbose-core2core`, the first real divergence is in the output from desugaring after optimisation, where the inner `loop` is specialised for the type arguments and dictionaries of the outer `sumS` function in the version without `NOINLINE`.
Then after the initial simplifier phase we get the first divergence, because `loop` was specialised for the particular types it was called at, whereas it wasn't in the `NOINLINE` version (which makes sense).
Now, the critical moment is when `loop` becomes too large to inline into `ex1` in the version *without* `NOINLINE`. But `ex1` calls `loop` with quite interesting arguments, ones we want to specialise for (the `ConcatMap` datum in particular)! In code:
```
f :: Int -> Singleton Int -- same function for both
f ...
inner :: Int -> EnumFromToInt Int -- the input stream from which ConcatMap draws elements
inner ...
-- without NOINLINE
loop (ConcatMap x f inner) = HUGE and recursive
ex1 n = loop (ConcatMap Nothing f (inner n))
-- with NOINLINE[99], loop gets inlined after the initial phase
ex1 n = joinrec loop' (ConcatMap x f (inner n)) = <HUGE and recursive> in loop' (ConcatMap Nothing f (inner n))
```
Then WW kicks in. `loop` (and `loop'`) is deeply strict in the `ConcatMap` we pass it, but it refuses to 'unpack' (i.e. specialise) for `ConcatMap`'s `f`. Note also that it will WW `loop`, `ex1` and `loop'`:
```
-- without NOINLINE
$wloop x f <unpacked components from inner> = HUGE and recursive
loop (ConcatMap x f (.. unpack components from inner ...)) = $wloop x f <unpacked components from inner>
ex1 n = $wloop (ConcatMap Nothing f (inner n))
-- with NOINLINE
$wex1 n = joinrec $wloop' (ConcatMap x f inner) = <HUGE and recursive> in $wloop' (ConcatMap Nothing f (inner n))
ex1 (I# n) = $wex1 n
```
Note how in the NOINLINE case this separated the definition of `f` from its usage by a lambda binder, so the subsequent SpecConstr pass is unable to figure out what to specialise for. It can't look through `$fw`. I think this is the same or at least a similar effect that @nomeata experienced in #14068 and accounted for in #14844.
Or at least that's my working hypothesis; I'll edit this post accordingly tomorrow once I've had time to investigate.