GHC issueshttps://gitlab.haskell.org/ghc/ghc/-/issues2022-08-02T12:26:32Zhttps://gitlab.haskell.org/ghc/ghc/-/issues/21685CSE doesn't correctly handle shadowing.2022-08-02T12:26:32ZAndreas KlebingerCSE doesn't correctly handle shadowing.Here is the relevant core input:
```haskell
join {
exit_X2 :: Int# -> Array Int ()
exit_X2 (wild_X1 :: Int#)
= case $windexError
showSignedInt l_a12J u_a12K (I# wild_X1) ...Here is the relevant core input:
```haskell
join {
exit_X2 :: Int# -> Array Int ()
exit_X2 (wild_X1 :: Int#)
= case $windexError
showSignedInt l_a12J u_a12K (I# wild_X1) (unpackCString# "Int"#)
of wild_00 {
} } in
joinrec {
go3_a1fA :: Int# -> State# RealWorld -> Array Int ()
go3_a1fA (x_a1fB :: Int#) (eta_B0 :: State# RealWorld)
= case x_a1fB of wild_X1 {
__DEFAULT ->
join {
$j_X2 :: Array Int ()
$j_X2 = jump exit_X2 wild_X1 } in
case <=# 1# wild_X1 of { ... };
1# -> jump go3_a1fA 2# eta_B0
}; } in
```
The tricky part here is that the first X2 `exit_X2` and the second X2 `$j_X2` have the same unique! It gets even tricker as the first `exit_X2` is mentioned in the body of `$j_X2` however `X2` in this context refers to `exit_X2` not `$j_X2`.
While is this a very rare combination it's valid core.
CSE get's this wrong however.
After CSE we end up with this:
```
join {
exit_X2 :: Int# -> Array Int ()
exit_X2 (wild_X1 :: Int#)
= ... } in
joinrec {
go3_a1fA (x_a1fB :: Int#) (eta_B0 :: State# RealWorld)
= case x_a1fB of wild_X1 {
__DEFAULT ->
join {
$j_X6 :: Array Int ()
$j_X6 = jump $j_X6 x_a1fB }
in
....
```
However `$j_X6 = jump $j_X6 x_a1fB` is neither correct nor does it compile as `$j_X6` isn't in scope in `$j_X6`s rhs. (Thankfully!)
This is because CSE is subtly broken. We have:
```
cseBind :: TopLevelFlag -> CSEnv -> CoreBind -> (CSEnv, CoreBind)
cseBind toplevel env (NonRec b e)
= (env2, NonRec b2 e2)
where
(env1, b1) = addBinder env b
(env2, (b2, e2)) = cse_bind toplevel env1 (b,e) b1
```
`addBinder` is essentially `substBndr`, we rename `$j_X2` to `$j_X6` and then in `cse_bind` we process the rhs:
```
cse_bind :: TopLevelFlag -> CSEnv -> (InId, InExpr) -> OutId -> (CSEnv, (OutId, OutExpr))
cse_bind toplevel env (in_id, in_rhs) out_id
...
| Just arity <- isJoinId_maybe in_id
-- See Note [Look inside join-point binders]
= let (params, in_body) = collectNBinders arity in_rhs
(env', params') = addBinders env params
out_body = tryForCSE env' in_body
in (env, (out_id, mkLams params' out_body))
...
```
The problem is that we process the rhs with the substitution `X2 -> $j_X6` active.
So in the rhs of `$j_X2 = jump exit_X2 wild_X1` we end up replacing `exit_X2` with `$j_X6` because the code assumes it refers to `$j_X2`. And we end up with `$j_X6 = jump $j_X6 wild_X1`. In short it's a mess.
I think the solution is that we need to use `addBinder` *after* we processed the rhs instead of before. Unless it's a recursive binder. This shouldn't be hard to fix I will write a patch.https://gitlab.haskell.org/ghc/ghc/-/issues/13331Worker/wrapper can lead to sharing failure2022-05-12T08:49:43ZDavid FeuerWorker/wrapper can lead to sharing failure`nofib` turned up a serious performance bug in the implementation of `insert` in `containers-0.5.10.1`. The function was defined thus:
```hs
origInsert :: Ord k => k -> a -> Map k a -> Map k a
origInsert = go
where
go :: Ord k => ...`nofib` turned up a serious performance bug in the implementation of `insert` in `containers-0.5.10.1`. The function was defined thus:
```hs
origInsert :: Ord k => k -> a -> Map k a -> Map k a
origInsert = go
where
go :: Ord k => k -> a -> Map k a -> Map k a
go !kx x Tip = singleton kx x
go !kx x t@(Bin sz ky y l r) =
case compare kx ky of
LT | l' `ptrEq` l -> t
| otherwise -> balanceL ky y l' r
where !l' = go kx x l
GT | r' `ptrEq` r -> t
| otherwise -> balanceR ky y l r'
where !r' = go kx x r
EQ | kx `ptrEq` ky && x `ptrEq` y -> t
| otherwise -> Bin sz kx x l r
{-# INLINABLE origInsert #-}
```
When this specializes to `Int` keys (or any other "unboxable" ones, including tuples), worker/wrapper botches the job:
```
Rec {
-- RHS size: {terms: 102, types: 65, coercions: 0}
$w$sgo
:: forall a_a7M6.
Int# -> a_a7M6 -> Map Int a_a7M6 -> Map Int a_a7M6
$w$sgo =
\ (@ a_a7M6)
(ww_s8oI :: Int#)
(w_s8oE :: a_a7M6)
(w1_s8oF :: Map Int a_a7M6) ->
let {
kx_X7KQ :: Int
kx_X7KQ = I# ww_s8oI } in
case w1_s8oF of wild_Xg {
[...]
origInsertInt_$sgo
:: forall a_a7M6. Int -> a_a7M6 -> Map Int a_a7M6 -> Map Int a_a7M6
origInsertInt_$sgo =
\ (@ a_a7M6)
(w_s8oD :: Int)
(w1_s8oE :: a_a7M6)
(w2_s8oF :: Map Int a_a7M6) ->
case w_s8oD of _ { I# ww1_s8oI -> $w$sgo ww1_s8oI w1_s8oE w2_s8oF }
```
The wrapper opens the box, throws it away, and passes the contents to the worker, which immediately builds a *new* box with exactly the same contents. This prevents the pointer equality tests from succeeding for these types, and it also turns out to cause quite a lot of extra allocation for some types (leading to the severe nofib regression).
One could reasonably argue that the code above is a bit complicated, and that GHC could be forgiven for failing to realize that the box should be saved. Unfortunately, a straightforward change that would seem to make this clear does not in fact convince GHC:
```hs
myInsert :: Ord k => k -> a -> Map k a -> Map k a
myInsert kx0 = go kx0
where
go !kx x Tip = singleton kx0 x
go !kx x t@(Bin sz ky y l r) =
case compare kx ky of
LT | l' `ptrEq` l -> t
| otherwise -> balanceL ky y l' r
where !l' = go kx x l
GT | r' `ptrEq` r -> t
| otherwise -> balanceR ky y l r'
where !r' = go kx x r
EQ | kx0 `ptrEq` ky && x `ptrEq` y -> t
| otherwise -> Bin sz kx0 x l r
{-# INLINABLE myInsert #-}
```
does exactly the same thing. The only *simple* way I found to avoid that is to remove the bang patterns, which really ''shouldn't'' work, but does. This, however, is prohibited by the desired semantics—I believe we want to be strict in the key even if comparison is not. In any case, that really shouldn't be causing trouble and it is. The only fix I've found thus far is truly disgusting, and seems to work at least partly by mistake:
```hs
insert :: Ord k => k -> a -> Map k a -> Map k a
insert kx0 = go kx0 kx0
where
go :: Ord k => k -> k -> a -> Map k a -> Map k a
go orig !kx x Tip = singleton (lazy orig) x
go orig !kx x t@(Bin sz ky y l r) =
case compare kx ky of
LT | l' `ptrEq` l -> t
| otherwise -> balanceL ky y l' r
where !l' = go orig kx x l
GT | r' `ptrEq` r -> t
| otherwise -> balanceR ky y l r'
where !r' = go orig kx x r
EQ | x `ptrEq` y && (lazy orig `seq` (orig `ptrEq` ky)) -> t
| otherwise -> Bin sz (lazy orig) x l r
{-# INLINABLE insert #-}
```
We would also like to be able to experiment with an implementation that uses CPS (recursive join points today!) rather than pointer equality tests for the internal nodes, leaving pointer equality to the leaves. But I have not found any way whatsoever to avoid this W/W problem in that version.https://gitlab.haskell.org/ghc/ghc/-/issues/14186CSE fails to CSE two identical large top-level functions2021-08-17T11:32:01ZJoachim Breitnermail@joachim-breitner.deCSE fails to CSE two identical large top-level functionsConsider this code:
```hs
module CSEBug where
data Succs a = Succs a [a]
instance Functor Succs where
fmap f (Succs o s) = Succs (f o) (map f s)
foo, bar :: (a -> b) -> (b -> c) -> Succs a -> Succs c
foo f g x = fmap (g . f) x
ba...Consider this code:
```hs
module CSEBug where
data Succs a = Succs a [a]
instance Functor Succs where
fmap f (Succs o s) = Succs (f o) (map f s)
foo, bar :: (a -> b) -> (b -> c) -> Succs a -> Succs c
foo f g x = fmap (g . f) x
bar f g x = fmap (g . f) x
```
If I compile this with `-O`, `foo` and `bar` are not CSEd, which can be seen with `-ddump-cse`.
Removing either the first or the second argument of `Succs` makes CSE work.
Is there a size limit on CSE?https://gitlab.haskell.org/ghc/ghc/-/issues/2940Do CSE after CorePrep2019-07-07T19:06:12ZSimon Peyton JonesDo CSE after CorePrepCommon sub-expression analysis is deliberately conservative, but it's really *too* conservative: we are missing obvious opportunities. Consider
```
{-# OPTIONS_GHC -XBangPatterns -XMagicHash #-}
module Foo where
import GHC.Base
-- Co...Common sub-expression analysis is deliberately conservative, but it's really *too* conservative: we are missing obvious opportunities. Consider
```
{-# OPTIONS_GHC -XBangPatterns -XMagicHash #-}
module Foo where
import GHC.Base
-- CorePrep evaluates (reverse xs) twice
f xs = let !v1 = reverse (reverse xs)
!v2 = filter id (reverse xs)
in (v1, v2)
-- Even CSE inside CorePrep would not get this right;
-- the strict evaluation of (reverse xs) doesn't scope
-- over the non-strict version
g xs = reverse (reverse xs) ++ filter id (reverse xs)
-- Duplicate evaluation of (x +# 1#)
h :: Int# -> ( Int, Int )
h x = ( I# (x +# 1#), I# ((x +# 1#) *# 2#) )
```
If you compile this you'll see that there are obvious missed CSE opportunities in `f`, `g` and `h`; but they only show up after `CorePrep`.
I guess the right thing is to CSE after `CorePrep`, but then CSE would have to maintain the `CorePrep` invariants, which isn't trivial. Something to think about.
Simon
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 6.10.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"Do CSE after CorePrep","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"OwnedBy","contents":"simonpj"},"version":"6.10.1","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"Common sub-expression analysis is deliberately conservative, but it's really ''too'' conservative: we are missing obvious opportunities. Consider\r\n{{{\r\n{-# OPTIONS_GHC -XBangPatterns -XMagicHash #-}\r\n\r\nmodule Foo where\r\n\r\nimport GHC.Base\r\n\r\n-- CorePrep evaluates (reverse xs) twice\r\nf xs = let !v1 = reverse (reverse xs)\r\n \t !v2 = filter id (reverse xs)\r\n in (v1, v2)\r\n\r\n-- Even CSE inside CorePrep would not get this right;\r\n-- the strict evaluation of (reverse xs) doesn't scope\r\n-- over the non-strict version\r\ng xs = reverse (reverse xs) ++ filter id (reverse xs)\r\n\r\n\r\n-- Duplicate evaluation of (x +# 1#)\r\nh :: Int# -> ( Int, Int )\r\nh x = ( I# (x +# 1#), I# ((x +# 1#) *# 2#) )\r\n}}}\r\nIf you compile this you'll see that there are obvious missed CSE opportunities in `f`, `g` and `h`; but they only show up after `CorePrep`. \r\n\r\nI guess the right thing is to CSE after `CorePrep`, but then CSE would have to maintain the `CorePrep` invariants, which isn't trivial. Something to think about.\r\n\r\nSimon","type_of_failure":"OtherFailure","blocking":[]} -->8.0.1Simon Peyton JonesSimon Peyton Joneshttps://gitlab.haskell.org/ghc/ghc/-/issues/9441CSE should deal with letrec2019-07-07T18:40:23ZDavid FeuerCSE should deal with letrecIf I redefine
```hs
{-# INLINE reverse #-}
reverse :: [a] -> [a]
reverse xs = build $ \c n -> foldl (\a x -> x `c` a) n xs
```
and then write a couple test cases:
```hs
appRev xs ys = xs ++ reverse ys
revAppRev xs ys = reverse xs ++ r...If I redefine
```hs
{-# INLINE reverse #-}
reverse :: [a] -> [a]
reverse xs = build $ \c n -> foldl (\a x -> x `c` a) n xs
```
and then write a couple test cases:
```hs
appRev xs ys = xs ++ reverse ys
revAppRev xs ys = reverse xs ++ reverse ys
```
I end up getting some rather annoying code duplication (lots of stuff omitted from the following):
```hs
Rec {
poly_go_r2v3
poly_go_r2v3 =
\ @ a_a2nF ds_a2zc eta_Xl ->
case ds_a2zc of _ {
[] -> eta_Xl;
: y_a2zh ys_a2zi -> poly_go_r2v3 ys_a2zi (: y_a2zh eta_Xl)
}
end Rec }
reverse
reverse = \ @ a_a2nF eta_B1 -> poly_go_r2v3 eta_B1 ([])
Rec {
revAppRev2
revAppRev2 =
\ @ a_a2y7 ds_a2zc eta_B1 ->
case ds_a2zc of _ {
[] -> eta_B1;
: y_a2zh ys_a2zi -> revAppRev2 ys_a2zi (: y_a2zh eta_B1)
}
end Rec }
Rec {
revAppRev1
revAppRev1 =
\ @ a_a2y7 ds_a2zc eta_B1 ->
case ds_a2zc of _ {
[] -> eta_B1;
: y_a2zh ys_a2zi -> revAppRev1 ys_a2zi (: y_a2zh eta_B1)
}
end Rec }
Rec {
appRev1
appRev1 =
\ @ a_a2xE ds_a2zc eta_B1 ->
case ds_a2zc of _ {
[] -> eta_B1;
: y_a2zh ys_a2zi -> appRev1 ys_a2zi (: y_a2zh eta_B1)
}
end Rec }
```
The `reverse` function was inlined three times. In each case, there was no fusion, so `build` was inlined and the resulting copy of the `reverse` worker lifted to the top level. It would seem to me that once simplification is complete, it should be safe to merge all these copies into one. They are all `Rec {\ ... -> ...}` forms, so I don't think this has any potential to introduce undesirable sharing.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.8.2 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"Merge identical top-level expressions following simplification when it is safe to do so","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"7.8.2","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"FeatureRequest","description":"If I redefine\r\n\r\n{{{#!hs\r\n{-# INLINE reverse #-}\r\nreverse :: [a] -> [a]\r\nreverse xs = build $ \\c n -> foldl (\\a x -> x `c` a) n xs\r\n}}}\r\n\r\nand then write a couple test cases:\r\n\r\n{{{#!hs\r\nappRev xs ys = xs ++ reverse ys\r\nrevAppRev xs ys = reverse xs ++ reverse ys\r\n}}}\r\n\r\nI end up getting some rather annoying code duplication (lots of stuff omitted from the following):\r\n\r\n{{{#!hs\r\nRec {\r\npoly_go_r2v3\r\npoly_go_r2v3 =\r\n \\ @ a_a2nF ds_a2zc eta_Xl ->\r\n case ds_a2zc of _ {\r\n [] -> eta_Xl;\r\n : y_a2zh ys_a2zi -> poly_go_r2v3 ys_a2zi (: y_a2zh eta_Xl)\r\n }\r\nend Rec }\r\n\r\nreverse\r\nreverse = \\ @ a_a2nF eta_B1 -> poly_go_r2v3 eta_B1 ([])\r\n\r\nRec {\r\nrevAppRev2\r\nrevAppRev2 =\r\n \\ @ a_a2y7 ds_a2zc eta_B1 ->\r\n case ds_a2zc of _ {\r\n [] -> eta_B1;\r\n : y_a2zh ys_a2zi -> revAppRev2 ys_a2zi (: y_a2zh eta_B1)\r\n }\r\nend Rec }\r\n\r\nRec {\r\nrevAppRev1\r\nrevAppRev1 =\r\n \\ @ a_a2y7 ds_a2zc eta_B1 ->\r\n case ds_a2zc of _ {\r\n [] -> eta_B1;\r\n : y_a2zh ys_a2zi -> revAppRev1 ys_a2zi (: y_a2zh eta_B1)\r\n }\r\nend Rec }\r\n\r\nRec {\r\nappRev1\r\nappRev1 =\r\n \\ @ a_a2xE ds_a2zc eta_B1 ->\r\n case ds_a2zc of _ {\r\n [] -> eta_B1;\r\n : y_a2zh ys_a2zi -> appRev1 ys_a2zi (: y_a2zh eta_B1)\r\n }\r\nend Rec }\r\n}}}\r\n\r\nThe `reverse` function was inlined three times. In each case, there was no fusion, so `build` was inlined and the resulting copy of the `reverse` worker lifted to the top level. It would seem to me that once simplification is complete, it should be safe to merge all these copies into one. They are all `Rec {\\ ... -> ...}` forms, so I don't think this has any potential to introduce undesirable sharing.","type_of_failure":"OtherFailure","blocking":[]} -->Roland SennRoland Senn