GHC issueshttps://gitlab.haskell.org/ghc/ghc/-/issues2020-02-06T09:36:39Zhttps://gitlab.haskell.org/ghc/ghc/-/issues/7596Opportunity to improve CSE2020-02-06T09:36:39ZSimon Peyton JonesOpportunity to improve CSEIn `nofib/spectral/mandel2`, the function `check_perim` calls `point_colour` on various arguments. After inlining `point_colour` there is the opportunity to CSE among the sub-expressions the inlinings create.
In GHC 7.6, the join point ...In `nofib/spectral/mandel2`, the function `check_perim` calls `point_colour` on various arguments. After inlining `point_colour` there is the opportunity to CSE among the sub-expressions the inlinings create.
In GHC 7.6, the join point didn't have a "one-shot" flag, so the full laziness pass floated these sub-expressions out, and they got CSEd.
As part of Ilya's new demand analyser changes, we get the "one-shot" flag right, so don't MFE those sub-expressions, so they aren't CSEd. As a result, allocation on `mandel2` increases slightly (4.2%).
The solution is, I think, to do something a bit like like `CorePrep` before CSE. At the moment if we have
```
case f (I# y) of { (a,b) ->
case g (I# y) of { (p,q) -> ... }}
```
we stuipdly don't CSE that `(I# y)` even though it is manifestly sharable. Somehow we should.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 7.6.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"Opportunity to improve CSE","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"7.6.1","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"In `nofib/spectral/mandel2`, the function `check_perim` calls `point_colour` on various arguments. After inlining `point_colour` there is the opportunity to CSE among the sub-expressions the inlinings create.\r\n\r\nIn GHC 7.6, the join point didn't have a \"one-shot\" flag, so the full laziness pass floated these sub-expressions out, and they got CSEd.\r\n\r\nAs part of Ilya's new demand analyser changes, we get the \"one-shot\" flag right, so don't MFE those sub-expressions, so they aren't CSEd. As a result, allocation on `mandel2` increases slightly (4.2%).\r\n\r\nThe solution is, I think, to do something a bit like like `CorePrep` before CSE. At the moment if we have\r\n{{{\r\ncase f (I# y) of { (a,b) ->\r\ncase g (I# y) of { (p,q) -> ... }}\r\n}}}\r\nwe stuipdly don't CSE that `(I# y)` even though it is manifestly sharable. Somehow we should.\r\n","type_of_failure":"OtherFailure","blocking":[]} -->8.0.1Simon Peyton JonesSimon Peyton Joneshttps://gitlab.haskell.org/ghc/ghc/-/issues/20681Interesting CSE/Code duplication issue.2021-11-16T18:10:00ZAndreas KlebingerInteresting CSE/Code duplication issue.While spelunking in GHC generate core after turning inlining up somewhat I came across this beast:
```
...
VanillaReg dt_a7lB [Dmd=<S,U>]
ds1_a7lC ->
case dt_a7lB
of ds2_a7lE [Dmd=<L,A>] {
__DEFAULT ->
...While spelunking in GHC generate core after turning inlining up somewhat I came across this beast:
```
...
VanillaReg dt_a7lB [Dmd=<S,U>]
ds1_a7lC ->
case dt_a7lB
of ds2_a7lE [Dmd=<L,A>] {
__DEFAULT ->
jump go1_X7 ys_Xc eta_X9;
3# ->
jump go1_X7
ys_Xc
($sgo5_s6Vp y_Xb y_Xb eta_X9);
...
6# ->
jump go1_X7
ys_Xc
($sgo5_s6Vp y_Xb y_Xb eta_X9)
};
FloatReg dt_Xe [Dmd=<S,U>] ->
case dt_Xe of ds1_Xf [Dmd=<L,A>] {
__DEFAULT ->
jump go1_X7 ys_Xc eta_X9;
1# ->
jump go1_X7
ys_Xc
($sgo5_s6Vp y_Xb y_Xb eta_X9);
...
6# ->
jump go1_X7
ys_Xc
($sgo5_s6Vp y_Xb y_Xb eta_X9)
};
```
The jump being duplicated is not a problem. It's simply compiled to a goto. However the call to `($sgo5_s6Vp y_Xb y_Xb eta_X9)` is duplicated a few times. It appears ~1500 times in the *simplified* output.
Since `go1_X7` is strict in it's argument we float it out and end up with an infinity of alternatives like this:
```
3# ->
case $sgo5_s6Vp y_Xb y_Xb eta_X9 of
sat_sfwa [Occ=Once1]
{
__DEFAULT ->
go1_X7
ys1_sfw4
sat_sfwa;
};
```
This seems quite similar to https://gitlab.haskell.org/ghc/ghc/-/issues/14684 however the main difference is that there legitimately are more default than non-default causes which is why I'm putting this into a separate ticket.
I'm not really sure how to get there. But what seems logical to do is generate join points for the common cases and then simply jump to them. That is do a transformation like this:
```haskell
case x of
_Default -> e_default;
C1 b1 -> e_1[b1];
...
Cn bn -> e_n[bn];
===> if e_1 == e_n then transform this into
join common_alt n = e[n]
case x of
_Default -> e_default;
C1 b1 -> jump common_alt b1;
...
Cn bn -> jump common_alt bn;
```
In the case I noticed there are no constructor binders. But even when there are we could common up alternatives who's binders match up in their runtimerep. When common_alt has no binding arguments we would then float it out and CSE could remove duplicate definitions.
The join point would just be a single jump so this should be beneficial even for smallish expressions (depending on how common they are). There is a risk that we would end up with a (pull alts out <=> inline alts) loop though.
Maybe this could be worked into FloatOut somehow? Not sure.https://gitlab.haskell.org/ghc/ghc/-/issues/20301Should we CSE simple primop applications?2021-09-04T00:40:56ZAndreas KlebingerShould we CSE simple primop applications?Sebastian commited this wonderful bit-fiddling code to GHC recently:
```haskell
-- | Denotes '+' on lower and upper bounds of 'Card'.
plusCard :: Card -> Card -> Card
-- See Note [Algebraic specification for plusCard and multCard]
plusC...Sebastian commited this wonderful bit-fiddling code to GHC recently:
```haskell
-- | Denotes '+' on lower and upper bounds of 'Card'.
plusCard :: Card -> Card -> Card
-- See Note [Algebraic specification for plusCard and multCard]
plusCard (Card a) (Card b)
= Card (bit0 .|. bit1 .|. bitN)
where
bit0 = (a .&. b) .&. 0b001
bit1 = (a .|. b) .&. 0b010
bitN = ((a .|. b) .|. shiftL (a .&. b) 1) .&. 0b100
```
Curious as I am I couldn't resist but look at what it compiles to:
```
GHC.Types.Demand.plusCard
:: GHC.Types.Demand.Card
-> GHC.Types.Demand.Card -> GHC.Types.Demand.Card
[GblId, Arity=2, Str=<1P(L)><1P(L)>, Cpr=1, Unf=OtherCon []] =
\r [ds_s7DU ds1_s7DV]
case ds_s7DU of {
GHC.Types.I# x#_s7DX ->
case ds1_s7DV of {
GHC.Types.I# y#_s7DZ ->
case andI# [x#_s7DX y#_s7DZ] of sat_s7E6 [Occ=Once1] {
__DEFAULT ->
case uncheckedIShiftL# [sat_s7E6 1#] of sat_s7E7 [Occ=Once1] {
__DEFAULT ->
case orI# [x#_s7DX y#_s7DZ] of sat_s7E5 [Occ=Once1] {
__DEFAULT ->
case orI# [sat_s7E5 sat_s7E7] of sat_s7E8 [Occ=Once1] {
__DEFAULT ->
case andI# [sat_s7E8 4#] of sat_s7E9 [Occ=Once1] {
__DEFAULT ->
case orI# [x#_s7DX y#_s7DZ] of sat_s7E2 [Occ=Once1] {
__DEFAULT ->
case andI# [sat_s7E2 2#] of sat_s7E3 [Occ=Once1] {
__DEFAULT ->
case andI# [x#_s7DX y#_s7DZ] of sat_s7E0 [Occ=Once1] {
__DEFAULT ->
case andI# [sat_s7E0 1#] of sat_s7E1 [Occ=Once1] {
__DEFAULT ->
case orI# [sat_s7E1 sat_s7E3] of sat_s7E4 [Occ=Once1] {
__DEFAULT ->
case orI# [sat_s7E4 sat_s7E9] of sat_s7Ea [Occ=Once1] {
__DEFAULT -> GHC.Types.I# [sat_s7Ea];
```
It's good in that it compiles to nice straight line code.
But I noticed that `andI# [x#_s7DX y#_s7DZ]` and `orI# [x#_s7DX y#_s7DZ]` are both computed twice. Which was surprising to me.
Now *sometimes* it can be beneficial to avoid doing those sorts of CSE applications, because keeping the result
in a register can cause spilling which is worse then repeating a cheap computation. But when we have enough registers around then keeping the computed value around is better as we save on compute.
It's my understanding that solving this in general is a hard problem. But I wonder if we currently avoid commoning up these expressions by design or by accident. And if the later perhaps we should try to common them up! At least then we would know for sure.https://gitlab.haskell.org/ghc/ghc/-/issues/19841Debug.Trace.trace optimized away when throwing exceptions, starting with GHC 8.82023-07-20T07:17:29ZAndreas AbelDebug.Trace.trace optimized away when throwing exceptions, starting with GHC 8.8## Summary
`Debug.Trace.trace` messages printed with `-O0` are optimized away (e.g. with `-O1`) in the presence of `Control.Exception.throw`, starting with GHC 8.8.
## Steps to reproduce
Compile and run this with either `-O0` or `-O1`...## Summary
`Debug.Trace.trace` messages printed with `-O0` are optimized away (e.g. with `-O1`) in the presence of `Control.Exception.throw`, starting with GHC 8.8.
## Steps to reproduce
Compile and run this with either `-O0` or `-O1`:
```haskell
import Control.Exception
import Debug.Trace
data MyException = MyException deriving Show
instance Exception MyException
main :: IO ()
main = handle (\ MyException -> putStrLn "Exception") $ do
trace "Debugging" 5
`seq` throw MyException
`seq` putStrLn "Survived Exception"
```
## Expected behavior
I expect the output
```
Debugging
Exception
```
which is delivered with `-O0` and, up to GHC 8.6, also with `-O1`.
Starting with GHC 8.8 at `-O1`, it prints only
```
Exception
```
## Further research
My mental model for `Debug.Trace.trace :: String -> a -> a` is one of
```haskell
trace msg a = unsafePerformIO (putStrLn msg) `seq` a
trace msg a = unsafePerformIO $ do
putStrLn msg
return a
```
Both of these actually work, meaning that the debug message is **not** optimized away for these simple implementations.
Trivium: I found this bug (as I would call it) when researching https://github.com/agda/agda/issues/5379.
## Environment
* GHC versions used: 8.0.2, 8.2.2, 8.4.4, 8.6.4, 8.8.4, 8.10.4, 9.0.1
Optional:
* Operating System: macOS Mojave 10.14.6
* System Architecture: Macky McMacFacehttps://gitlab.haskell.org/ghc/ghc/-/issues/18564Common subexpression ellimination fails to detect common suexpression2021-09-04T00:40:57ZNeoCommon subexpression ellimination fails to detect common suexpressionFor the following program:
```
{-# LANGUAGE MagicHash #-}
{-# Language UnboxedTuples #-}
module Magic (sq, sumSq) where
import GHC.Exts
sq :: Int# -> (# Int#, Int# #)
sq x = (# x *# x, x *# x #)
sumSq :: Int# -> Int# ->...For the following program:
```
{-# LANGUAGE MagicHash #-}
{-# Language UnboxedTuples #-}
module Magic (sq, sumSq) where
import GHC.Exts
sq :: Int# -> (# Int#, Int# #)
sq x = (# x *# x, x *# x #)
sumSq :: Int# -> Int# -> Int#
sumSq x y =
case sq x of
(# a1, a2 #) ->
case sq y of
(# b1, b2 #) ->
a1 +# a2 +# b1 +# b2
```
ghc (version 8.8.4, with -O2) produces the following Core for function `sumSq`:
```
sumSq
= \ (x_awb :: Int#) (y_awc :: Int#) ->
+# (+# (*# 2# (*# x_awb x_awb)) (*# y_awc y_awc)) (*# y_awc y_awc)
```
Here expression `(*# y_awc y_awc)` is evaluated twice. Surprisingly ghc is able to share the calculation of `(*# x_awb x_awb)`. We can see the issue all the way to assembly language (see below) where there are 3 `imul` instead of the optimal 2.
```
.globl Magic_sumSq_info
.type Magic_sumSq_info, @function
Magic_sumSq_info:
.Lc1fC:
movq %rsi,%rax
imulq %rsi,%rax
movq %rsi,%rbx
imulq %rsi,%rbx
addq %rax,%rbx
movq %rbx,%rax
movq %r14,%rbx
imulq %r14,%rbx
shlq $1,%rbx
addq %rax,%rbx
jmp *(%rbp)
```https://gitlab.haskell.org/ghc/ghc/-/issues/14684combineIdenticalAlts is only partially implemented2022-07-18T05:47:40ZMatthew PickeringcombineIdenticalAlts is only partially implementedcombineIdenticalAlts commons up branches of case expressions which have the same RHS.
However, it is not fully implemented so opportunities to common up branches are missed. For very big case expressions in might impact compilation ti...combineIdenticalAlts commons up branches of case expressions which have the same RHS.
However, it is not fully implemented so opportunities to common up branches are missed. For very big case expressions in might impact compilation time but it could be something which is enabled by `-O2`.
For example, the `C` and `D` case for `foo` are not commened up but the `A` and `B` case in `foo2` are.
```
module Foo where
data T = A | B | C | D
foo x = case x of
A -> 0
B -> 1
C -> 2
D -> 2
foo2 x = case x of
A -> 2
B -> 2
C -> 0
D -> 1
```
**See also #18377** for another version of this same issue.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.2.2 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"combineIdenticalAlts is only partially implemented","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"8.2.2","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"combineIdenticalAlts commons up branches of case expressions which have the same RHS.\r\n\r\nHowever, it is not fully implemented so opportunities to common up branches are missed. For very big case expressions in might impact compilation time but it could be something which is enabled by `-O2`.\r\n\r\nFor example, the `C` and `D` case for `foo` are not commened up but the `A` and `B` case in `foo2` are.\r\n\r\n{{{\r\nmodule Foo where\r\n\r\ndata T = A | B | C | D\r\n\r\n\r\nfoo x = case x of\r\n A -> 0\r\n B -> 1\r\n C -> 2\r\n D -> 2\r\n\r\nfoo2 x = case x of\r\n A -> 2\r\n B -> 2\r\n C -> 0\r\n D -> 1\r\n}}}","type_of_failure":"OtherFailure","blocking":[]} -->https://gitlab.haskell.org/ghc/ghc/-/issues/14222Simple text fusion example results in rather duplicative code2019-12-24T19:37:27ZBen GamariSimple text fusion example results in rather duplicative codeConsider this program,
```hs
module T14221 where
import Data.Text as T
isNumeric :: Text -> Bool
isNumeric t =
T.all isNumeric' t && T.any isNumber t
where
isNumber c = '0' <= c && c <= '9'
isNumeric' c = isNumber c
...Consider this program,
```hs
module T14221 where
import Data.Text as T
isNumeric :: Text -> Bool
isNumeric t =
T.all isNumeric' t && T.any isNumber t
where
isNumber c = '0' <= c && c <= '9'
isNumeric' c = isNumber c
|| c == 'e'
|| c == 'E'
|| c == '.'
|| c == '-'
|| c == '+'
```
Compiling with `-O` results in over 1 kilobyte of assembler. After looking at the simplified Core the issue becomes apparent: the case analysis of `isNumeric'` is duplicated six times,
```hs
case ww4_X2zB of {
__DEFAULT -> GHC.Types.False;
'+'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#);
'-'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#);
'.'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#);
'E'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#);
'e'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#)
};
```
It seems to me that we would ideally try to share this bit. Of course, this may be quite tricky to do in practice.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.2.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"Simple text fusion example results in rather duplicative code","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"8.2.1","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"Consider this program,\r\n{{{#!hs\r\nmodule T14221 where\r\n\r\nimport Data.Text as T\r\n\r\nisNumeric :: Text -> Bool\r\nisNumeric t =\r\n T.all isNumeric' t && T.any isNumber t\r\n where\r\n isNumber c = '0' <= c && c <= '9'\r\n isNumeric' c = isNumber c\r\n || c == 'e'\r\n || c == 'E'\r\n || c == '.'\r\n || c == '-'\r\n || c == '+'\r\n}}}\r\n\r\nCompiling with `-O` results in over 1 kilobyte of assembler. After looking at the simplified Core the issue becomes apparent: the case analysis of `isNumeric'` is duplicated six times,\r\n{{{#!hs\r\ncase ww4_X2zB of {\r\n __DEFAULT -> GHC.Types.False;\r\n '+'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#);\r\n '-'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#);\r\n '.'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#);\r\n 'E'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#);\r\n 'e'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#)\r\n};\r\n}}}\r\n\r\nIt seems to me that we would ideally try to share this bit. Of course, this may be quite tricky to do in practice.","type_of_failure":"OtherFailure","blocking":[]} -->https://gitlab.haskell.org/ghc/ghc/-/issues/13694CSE runs before SpecConstr2019-07-07T18:20:34ZMatthew PickeringCSE runs before SpecConstrSpecConstr can lead to more CSE opportunities. Currently CSE is run only once after the full laziness transformation but then not again later in the pipeline. Running it again after the final clean-up simplification might lead to smaller...SpecConstr can lead to more CSE opportunities. Currently CSE is run only once after the full laziness transformation but then not again later in the pipeline. Running it again after the final clean-up simplification might lead to smaller programs.
One example with `-fspec-constr-keen`.
```
module Foo where
main :: [Int]
main = drop 1 [1,2]
mainMODULE :: [Int]
mainMODULE = mydrop 1 [1,2]
mydrop :: Int -> [a] -> [a]
mydrop 0 xs = xs
mydrop n (x:xs) = mydrop (n-1) xs
```
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.0.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"CSE runs before SpecConstr","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"8.0.1","keywords":["SpecConstr,","cse"],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"SpecConstr can lead to more CSE opportunities. Currently CSE is run only once after the full laziness transformation but then not again later in the pipeline. Running it again after the final clean-up simplification might lead to smaller programs.\r\n\r\nOne example with `-fspec-constr-keen`.\r\n\r\n{{{\r\nmodule Foo where\r\n\r\nmain :: [Int]\r\nmain = drop 1 [1,2]\r\n\r\nmainMODULE :: [Int]\r\nmainMODULE = mydrop 1 [1,2]\r\n\r\nmydrop :: Int -> [a] -> [a]\r\nmydrop 0 xs = xs\r\nmydrop n (x:xs) = mydrop (n-1) xs\r\n}}}","type_of_failure":"OtherFailure","blocking":[]} -->https://gitlab.haskell.org/ghc/ghc/-/issues/13589Possible inconsistency in CSE's treatment of NOINLINE2019-07-07T18:21:08ZBen GamariPossible inconsistency in CSE's treatment of NOINLINEWhile debugging #13535 I noticed the following inconsistency in CSE:
`Note [CSE for INLINE and NOINLINE]` states that we need to take care when adding expressions bound to binders with inline pragmas to the `CSEnv`. To see why, consider...While debugging #13535 I noticed the following inconsistency in CSE:
`Note [CSE for INLINE and NOINLINE]` states that we need to take care when adding expressions bound to binders with inline pragmas to the `CSEnv`. To see why, consider the following,
```hs
{-# NOINLINE bar #-}
bar = <rhs> -- Same rhs as foo
foo = <rhs>
```
Given this program, we need to avoid producing `foo = bar` since doing so would mean that we would lose the ability to inline `foo`'s original RHS.
The note then goes on to give the following rule,
> We should not add
>
> `<rhs> :-> bar`
>
> to the CSEnv if `bar` has any constraints on when it can inline;
> that is, if its 'activation' not always active. Otherwise we
> might replace `<rhs>` by `bar`, and then later be unable to see that it
> really was `<rhs>`.
This rule is implemented in `noCSE` with,
```hs
not (isAlwaysActive (idInlineActivation id))
```
However, it's quite unclear to me that this rule avoids the issue we set out to solve. Afterall, `NOINLINE bar` is always active, but it still means that rewriting `foo` to `foo=bar` would lose us the ability to see `foo`'s original RHS.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.0.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"Possible inconsistency in CSE's treatment of NOINLINE","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"8.0.1","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"While debugging #13535 I noticed the following inconsistency in CSE:\r\n\r\n`Note [CSE for INLINE and NOINLINE]` states that we need to take care when adding expressions bound to binders with inline pragmas to the `CSEnv`. To see why, consider the following,\r\n{{{#!hs\r\n{-# NOINLINE bar #-}\r\nbar = <rhs> -- Same rhs as foo\r\n\r\nfoo = <rhs>\r\n}}}\r\nGiven this program, we need to avoid producing `foo = bar` since doing so would mean that we would lose the ability to inline `foo`'s original RHS.\r\n\r\nThe note then goes on to give the following rule,\r\n> We should not add\r\n>\r\n> {{{<rhs> :-> bar}}}\r\n>\r\n> to the CSEnv if `bar` has any constraints on when it can inline;\r\n> that is, if its 'activation' not always active. Otherwise we\r\n> might replace `<rhs>` by `bar`, and then later be unable to see that it\r\n> really was `<rhs>`.\r\nThis rule is implemented in `noCSE` with,\r\n{{{#!hs\r\n not (isAlwaysActive (idInlineActivation id))\r\n}}}\r\n\r\nHowever, it's quite unclear to me that this rule avoids the issue we set out to solve. Afterall, `NOINLINE bar` is always active, but it still means that rewriting `foo` to `foo=bar` would lose us the ability to see `foo`'s original RHS.\r\n\r\n","type_of_failure":"OtherFailure","blocking":[]} -->https://gitlab.haskell.org/ghc/ghc/-/issues/13219CSE for join points2021-12-08T09:52:41ZlukemaurerCSE for join pointsJoin points are hazardous for CSE:
```
join j = e in ... f e ...
```
The temptation is of course to rewrite to
```
join j = e in ... f (jump j) ...
```
which doesn't typecheck because you can't have a jump in an argument.
Similarly:...Join points are hazardous for CSE:
```
join j = e in ... f e ...
```
The temptation is of course to rewrite to
```
join j = e in ... f (jump j) ...
```
which doesn't typecheck because you can't have a jump in an argument.
Similarly:
```
join j x = e in
let y = (... join j2 x = e in ...) in ...
```
can't be rewritten to
```
join j x = e in
let y = (... join j2 x = jump j x ...) in ...
```
because you can't jump out of a value binding.
This doesn't seem to come up very often (took a long time for me to trip over it). The simple solution is simply to ignore CSE for join points. That's what we do at the moment. But this is saddening in some cases:
```
join j x = e in
join j2 x = e in ...
```
can perfectly well become
```
join j x = e in
join j2 x = jump j x in ...
```
We just need to track which join points are still currently valid.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"CSE for join points","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"OwnedBy","contents":"lukemaurer"},"version":"8.1","keywords":["JoinPoints"],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"Join points are hazardous for CSE:\r\n\r\n{{{\r\njoin j = e in ... f e ...\r\n}}}\r\nThe temptation is of course to rewrite to\r\n{{{\r\njoin j = e in ... f (jump j) ...\r\n}}}\r\nwhich doesn't typecheck because you can't have a jump in an argument.\r\n\r\nSimilarly:\r\n{{{\r\njoin j x = e in\r\nlet y = (... join j2 x = e in ...) in ...\r\n}}}\r\ncan't be rewritten to\r\n{{{\r\njoin j x = e in\r\nlet y = (... join j2 x = jump j x ...) in ...\r\n}}}\r\nbecause you can't jump out of a value binding.\r\n\r\nThis doesn't seem to come up very often (took a long time for me to trip over it). The simple solution is simply to ignore CSE for join points. That's what we do at the moment. But this is saddening in some cases:\r\n\r\n{{{\r\njoin j x = e in\r\njoin j2 x = e in ...\r\n}}}\r\ncan perfectly well become\r\n{{{\r\njoin j x = e in\r\njoin j2 x = jump j x in ...\r\n}}}\r\n\r\nWe just need to track which join points are still currently valid.","type_of_failure":"OtherFailure","blocking":[]} -->lukemaurerlukemaurerhttps://gitlab.haskell.org/ghc/ghc/-/issues/12620Allow the user to prevent floating and CSE2021-09-27T11:45:37ZJoachim Breitnermail@joachim-breitner.deAllow the user to prevent floating and CSEThis is a write-up of a rough idea that Andres Löh and me had at ICFP 2016 in order to address some Real World problems Andres noticed and that are currently hard to avoid.
The goal is to give the user more control about expressions tha...This is a write-up of a rough idea that Andres Löh and me had at ICFP 2016 in order to address some Real World problems Andres noticed and that are currently hard to avoid.
The goal is to give the user more control about expressions that the compiler would like to float out (or CSE), but the programmer knows better. Example (assume no list fusion exists):
```
enum xs = zip [1..] xs
```
This leads to a horrible space leak, as GHC will float out `[1..]` to the top.
Our idea is to have a magic function `nofloat :: a -> a` (magic in the same sense as `inline` and `lazy`) that the programmer would use here:
```
enum xs = zip (nofloat [1..]) xs
```
With these effects:
- Sub expressions are not floated out of a `nofloat`.
- An expression of the form `nofloat e` would not be floated beyond the innermost enclosing lambda.
- Two expressions of the form `nofloat e` would not be commoned up by CSE.
This way, unwanted sharing is prevented.
In contrast to a hypothetical `veryCheap` function, it does *not* mean that the compiler should float it into lambda (no unwanted duplication either).
Two open questions (among many others, I am sure:)
- Likely, rule matching should look through `nofloat`. At least in this example (and similar ones like `map (nofloat [1..])`, the rules in question will avoid the spaceleaks).
- Possibly, nothing should be floated (inlined) *into* a `nofloat`. Rationale: Assume the library is changed so that
`
[n..] = nofloat (realEnumFrom n)
{-# INLINE [n..] #-}
`
> Then `zip [fib 1000..]` would be rewritten by the inliner to `zip (let x = fib 1000 in (nofloat [x..]))`. Moving the `fib 1000` into the `nofloat` would change the behaviour in a possibly surprising way.https://gitlab.haskell.org/ghc/ghc/-/issues/9688Improve the interaction between CSE and the join point transformation2019-07-07T18:39:28ZDavid FeuerImprove the interaction between CSE and the join point transformationIt appears that the join point transformation sometimes interferes with CSE when CSE would be much better. Two examples:
### digitToIntMaybe
Suppose we define
```hs
isHexDigit :: Char -> Bool
isHexDigit c = (f...It appears that the join point transformation sometimes interferes with CSE when CSE would be much better. Two examples:
### digitToIntMaybe
Suppose we define
```hs
isHexDigit :: Char -> Bool
isHexDigit c = (fromIntegral (ord c - ord '0')::Word) <= 9 ||
(fromIntegral (ord c - ord 'a')::Word) <= 5 ||
(fromIntegral (ord c - ord 'A')::Word) <= 5
digitToInt c
| (fromIntegral dec::Word) <= 9 = dec
| (fromIntegral hexl::Word) <= 5 = hexl + 10
| (fromIntegral hexu::Word) <= 5 = hexu + 10
| otherwise = error ("Char.digitToInt: not a digit " ++ show c) -- sigh
where
dec = ord c - ord '0'
hexl = ord c - ord 'a'
hexu = ord c - ord 'A'
-- We could also expand this out in cases manually, but it makes no
-- difference as far as I can tell.
```
Suppose we then write a naive `digitToIntMaybe` function:
```hs
digitToIntMaybe c
| isHexDigit c = Just (digitToInt c)
| otherwise = Nothing
```
What I would want this to do is "zip" the nested cases and give Core like this:
```
$wdigitToIntMaybe
$wdigitToIntMaybe =
\ ww_s2Ag ->
let {
x#_a2yy
x#_a2yy = -# (ord# ww_s2Ag) 48 } in
case tagToEnum# (leWord# (int2Word# x#_a2yy) (__word 9)) of _ {
False ->
let {
x#1_X2z7
x#1_X2z7 = -# (ord# ww_s2Ag) 97 } in
case tagToEnum# (leWord# (int2Word# x#1_X2z7) (__word 5)) of _ {
False ->
let {
x#2_X2zh
x#2_X2zh = -# (ord# ww_s2Ag) 65 } in
case tagToEnum# (leWord# (int2Word# x#2_X2zh) (__word 5)) of _ {
False -> Nothing;
True -> Just (I# (+# x#2_X2zh 10))
};
True -> Just (I# (+# x#1_X2z7 10))
};
True -> Just (I# x#_a2yy)
}
digitToIntMaybe
digitToIntMaybe =
\ w_s2Ad ->
case w_s2Ad of _ { C# ww1_s2Ag -> $wdigitToIntMaybe ww1_s2Ag }
```
But instead, the join point transformation triggers, and we get this:
```
digitToIntMaybe1
digitToIntMaybe1 =
\ ww_s2Cp ->
error
(unpackAppendCString#
"Char.digitToInt: not a digit "# ($w$cshowsPrec15 ww_s2Cp ([])))
$wdigitToIntMaybe
$wdigitToIntMaybe =
\ ww_s2Cp ->
let {
$j_s2Bc
$j_s2Bc =
\ _ ->
Just
(let {
a_s2B5
a_s2B5 = int2Word# (-# (ord# ww_s2Cp) 48) } in
case tagToEnum# (leWord# a_s2B5 (__word 9)) of _ {
False ->
let {
a1_s2B7
a1_s2B7 = int2Word# (-# (ord# ww_s2Cp) 97) } in
case tagToEnum# (leWord# a1_s2B7 (__word 5)) of _ {
False ->
let {
a2_s2B9
a2_s2B9 = int2Word# (-# (ord# ww_s2Cp) 65) } in
case tagToEnum# (leWord# a2_s2B9 (__word 5)) of _ {
False -> digitToIntMaybe1 ww_s2Cp;
True -> I# (+# (word2Int# a2_s2B9) 10)
};
True -> I# (+# (word2Int# a1_s2B7) 10)
};
True -> I# (word2Int# a_s2B5)
}) } in
case tagToEnum#
(leWord# (int2Word# (-# (ord# ww_s2Cp) 48)) (__word 9))
of _ {
False ->
case tagToEnum#
(leWord# (int2Word# (-# (ord# ww_s2Cp) 97)) (__word 5))
of _ {
False ->
case tagToEnum#
(leWord# (int2Word# (-# (ord# ww_s2Cp) 65)) (__word 5))
of _ {
False -> Nothing;
True -> $j_s2Bc void#
};
True -> $j_s2Bc void#
};
True -> $j_s2Bc void#
}
digitToIntMaybe
digitToIntMaybe =
\ w_s2Cm ->
case w_s2Cm of _ { C# ww1_s2Cp -> $wdigitToIntMaybe ww1_s2Cp }
```
We perform the same three tests twice each, and test for an error condition that obviously can't happen.
### `quotRem` and `divMod`
If we define
```hs
x `quot` y = fst (x `quotRem` y)
x `rem` y = snd (x `quotRem` y)
```
and then write something like
```hs
f x y | x `rem` y == 0 = x `quot` y
| otherwise = 17
```
then CSE works some magic and we only calculate `quotRem x y` once.
Unfortunately, if we do this:
```hs
whatever x y = if x `myRem` y == 0 then (x `myQuot` y) + 14 else x `myQuot` y
```
then the join point transformation fires, collecting the `myQuot x y` expressions in the case branches and preventing CSE from recognizing the much better opportunity to eliminate those calculations altogether.
The situation with `divMod` is much worse. The join point transformation applied to the cases defining `divMod` prevents CSE from working magic on it in even simple situations, unless one of the arguments is known, making this definition unusable (the resulting Core is too horrifyingly long to paste here). It would probably be possible to improve the `divMod` situation to something close to the `quotRem` one by making `divMod` `NOINLINE` and adding special `divModLit` rules, but I'd much rather see a general solution.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.9 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"Improve the interaction between CSE and the join point transformation","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"7.9","keywords":["CSE"],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"FeatureRequest","description":"It appears that the join point transformation sometimes interferes with CSE when CSE would be much better. Two examples:\r\n\r\n=== digitToIntMaybe ===\r\nSuppose we define\r\n\r\n{{{#!hs\r\nisHexDigit :: Char -> Bool\r\nisHexDigit c = (fromIntegral (ord c - ord '0')::Word) <= 9 ||\r\n (fromIntegral (ord c - ord 'a')::Word) <= 5 ||\r\n (fromIntegral (ord c - ord 'A')::Word) <= 5\r\n\r\ndigitToInt c\r\n | (fromIntegral dec::Word) <= 9 = dec\r\n | (fromIntegral hexl::Word) <= 5 = hexl + 10\r\n | (fromIntegral hexu::Word) <= 5 = hexu + 10\r\n | otherwise = error (\"Char.digitToInt: not a digit \" ++ show c) -- sigh\r\n where\r\n dec = ord c - ord '0'\r\n hexl = ord c - ord 'a'\r\n hexu = ord c - ord 'A'\r\n-- We could also expand this out in cases manually, but it makes no\r\n-- difference as far as I can tell.\r\n}}}\r\n\r\nSuppose we then write a naive `digitToIntMaybe` function:\r\n\r\n{{{#!hs\r\ndigitToIntMaybe c\r\n | isHexDigit c = Just (digitToInt c)\r\n | otherwise = Nothing\r\n}}}\r\n\r\nWhat I would want this to do is \"zip\" the nested cases and give Core like this:\r\n\r\n{{{\r\n$wdigitToIntMaybe\r\n$wdigitToIntMaybe =\r\n \\ ww_s2Ag ->\r\n let {\r\n x#_a2yy\r\n x#_a2yy = -# (ord# ww_s2Ag) 48 } in\r\n case tagToEnum# (leWord# (int2Word# x#_a2yy) (__word 9)) of _ {\r\n False ->\r\n let {\r\n x#1_X2z7\r\n x#1_X2z7 = -# (ord# ww_s2Ag) 97 } in\r\n case tagToEnum# (leWord# (int2Word# x#1_X2z7) (__word 5)) of _ {\r\n False ->\r\n let {\r\n x#2_X2zh\r\n x#2_X2zh = -# (ord# ww_s2Ag) 65 } in\r\n case tagToEnum# (leWord# (int2Word# x#2_X2zh) (__word 5)) of _ {\r\n False -> Nothing;\r\n True -> Just (I# (+# x#2_X2zh 10))\r\n };\r\n True -> Just (I# (+# x#1_X2z7 10))\r\n };\r\n True -> Just (I# x#_a2yy)\r\n }\r\n\r\ndigitToIntMaybe\r\ndigitToIntMaybe =\r\n \\ w_s2Ad ->\r\n case w_s2Ad of _ { C# ww1_s2Ag -> $wdigitToIntMaybe ww1_s2Ag }\r\n}}}\r\n\r\nBut instead, the join point transformation triggers, and we get this:\r\n\r\n{{{\r\ndigitToIntMaybe1\r\ndigitToIntMaybe1 =\r\n \\ ww_s2Cp ->\r\n error\r\n (unpackAppendCString#\r\n \"Char.digitToInt: not a digit \"# ($w$cshowsPrec15 ww_s2Cp ([])))\r\n\r\n$wdigitToIntMaybe\r\n$wdigitToIntMaybe =\r\n \\ ww_s2Cp ->\r\n let {\r\n $j_s2Bc\r\n $j_s2Bc =\r\n \\ _ ->\r\n Just\r\n (let {\r\n a_s2B5\r\n a_s2B5 = int2Word# (-# (ord# ww_s2Cp) 48) } in\r\n case tagToEnum# (leWord# a_s2B5 (__word 9)) of _ {\r\n False ->\r\n let {\r\n a1_s2B7\r\n a1_s2B7 = int2Word# (-# (ord# ww_s2Cp) 97) } in\r\n case tagToEnum# (leWord# a1_s2B7 (__word 5)) of _ {\r\n False ->\r\n let {\r\n a2_s2B9\r\n a2_s2B9 = int2Word# (-# (ord# ww_s2Cp) 65) } in\r\n case tagToEnum# (leWord# a2_s2B9 (__word 5)) of _ {\r\n False -> digitToIntMaybe1 ww_s2Cp;\r\n True -> I# (+# (word2Int# a2_s2B9) 10)\r\n };\r\n True -> I# (+# (word2Int# a1_s2B7) 10)\r\n };\r\n True -> I# (word2Int# a_s2B5)\r\n }) } in\r\n case tagToEnum#\r\n (leWord# (int2Word# (-# (ord# ww_s2Cp) 48)) (__word 9))\r\n of _ {\r\n False ->\r\n case tagToEnum#\r\n (leWord# (int2Word# (-# (ord# ww_s2Cp) 97)) (__word 5))\r\n of _ {\r\n False ->\r\n case tagToEnum#\r\n (leWord# (int2Word# (-# (ord# ww_s2Cp) 65)) (__word 5))\r\n of _ {\r\n False -> Nothing;\r\n True -> $j_s2Bc void#\r\n };\r\n True -> $j_s2Bc void#\r\n };\r\n True -> $j_s2Bc void#\r\n }\r\n\r\ndigitToIntMaybe\r\ndigitToIntMaybe =\r\n \\ w_s2Cm ->\r\n case w_s2Cm of _ { C# ww1_s2Cp -> $wdigitToIntMaybe ww1_s2Cp }\r\n}}}\r\n\r\nWe perform the same three tests twice each, and test for an error condition that obviously can't happen.\r\n\r\n\r\n=== `quotRem` and `divMod` ===\r\n\r\nIf we define\r\n\r\n{{{#!hs\r\nx `quot` y = fst (x `quotRem` y)\r\nx `rem` y = snd (x `quotRem` y)\r\n}}}\r\n\r\nand then write something like\r\n\r\n{{{#!hs\r\nf x y | x `rem` y == 0 = x `quot` y\r\n | otherwise = 17\r\n}}}\r\n\r\nthen CSE works some magic and we only calculate `quotRem x y` once.\r\n\r\nUnfortunately, if we do this:\r\n\r\n{{{#!hs\r\nwhatever x y = if x `myRem` y == 0 then (x `myQuot` y) + 14 else x `myQuot` y\r\n}}}\r\n\r\nthen the join point transformation fires, collecting the `myQuot x y` expressions in the case branches and preventing CSE from recognizing the much better opportunity to eliminate those calculations altogether.\r\n\r\nThe situation with `divMod` is much worse. The join point transformation applied to the cases defining `divMod` prevents CSE from working magic on it in even simple situations, unless one of the arguments is known, making this definition unusable (the resulting Core is too horrifyingly long to paste here). It would probably be possible to improve the `divMod` situation to something close to the `quotRem` one by making `divMod` `NOINLINE` and adding special `divModLit` rules, but I'd much rather see a general solution.","type_of_failure":"OtherFailure","blocking":[]} -->https://gitlab.haskell.org/ghc/ghc/-/issues/947ghc -O space leak: CSE between different CAFs2019-07-07T19:16:10ZBertram Felgenhauerghc -O space leak: CSE between different CAFsConsider the following program for generating the 1,000,000th prime:
```
module Main (main) where
sieve0 (n:ns) (p:ps)
| p*p == n = sieve0 (filter (\n -> n`mod`p /= 0) ns) ps
| otherwise = n:sieve0 ns (p:ps)
primes0 :: [Int]
...Consider the following program for generating the 1,000,000th prime:
```
module Main (main) where
sieve0 (n:ns) (p:ps)
| p*p == n = sieve0 (filter (\n -> n`mod`p /= 0) ns) ps
| otherwise = n:sieve0 ns (p:ps)
primes0 :: [Int]
primes0 = 3:sieve0 [5,7..] primes0
primes :: [Int]
primes = 2:3:sieve0 [5,7..] primes0
main = print $ primes !! 1000000
```
The intention of the separate primes0 function is to limit the number of primes that need to be held in memory. Unfortunately, ghc -O combines the common subexpressions in primes0 and primes so this effort is wasted. The resulting program needs noticably more memory than required, as can be seen by replacing the definition of `primes` by
```
primes = 2:3:sieve0 (iterate (2+) 5) primes0
```
which prevents the CSE from happening.
Excerpt of `+RTS -s` output, original version:
```
12,099,221,160 bytes allocated in the heap
279,720,116 bytes copied during GC
15,834,912 bytes maximum residency (6 sample(s))
```
modified version that prevents CSE:
```
127,736,408 bytes copied during GC (scavenged)
233,388 bytes copied during GC (not scavenged)
30,624 bytes maximum residency (1 sample(s))
```
Tested with ghc 6.4.2 and a current (as of 2006-10-17) darcs ghc 6.5.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 6.5 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"ghc -O space leak: CSE between different CAFs","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"6.5","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"Consider the following program for generating the 1,000,000th prime:\r\n\r\n{{{\r\nmodule Main (main) where\r\n\r\nsieve0 (n:ns) (p:ps)\r\n | p*p == n = sieve0 (filter (\\n -> n`mod`p /= 0) ns) ps\r\n | otherwise = n:sieve0 ns (p:ps)\r\n\r\nprimes0 :: [Int]\r\nprimes0 = 3:sieve0 [5,7..] primes0\r\n\r\nprimes :: [Int]\r\nprimes = 2:3:sieve0 [5,7..] primes0\r\n\r\nmain = print $ primes !! 1000000\r\n}}}\r\n\r\nThe intention of the separate primes0 function is to limit the number of primes that need to be held in memory. Unfortunately, ghc -O combines the common subexpressions in primes0 and primes so this effort is wasted. The resulting program needs noticably more memory than required, as can be seen by replacing the definition of {{{primes}}} by\r\n\r\n{{{\r\nprimes = 2:3:sieve0 (iterate (2+) 5) primes0\r\n}}}\r\n\r\nwhich prevents the CSE from happening.\r\n\r\nExcerpt of {{{+RTS -s}}} output, original version:\r\n{{{\r\n12,099,221,160 bytes allocated in the heap\r\n279,720,116 bytes copied during GC\r\n 15,834,912 bytes maximum residency (6 sample(s))\r\n}}}\r\nmodified version that prevents CSE:\r\n{{{\r\n127,736,408 bytes copied during GC (scavenged)\r\n 233,388 bytes copied during GC (not scavenged)\r\n 30,624 bytes maximum residency (1 sample(s))\r\n}}}\r\n\r\nTested with ghc 6.4.2 and a current (as of 2006-10-17) darcs ghc 6.5.","type_of_failure":"OtherFailure","blocking":[]} -->https://gitlab.haskell.org/ghc/ghc/-/issues/701Better CSE optimisation2020-08-12T09:37:52ZSimon MarlowBetter CSE optimisationGHC's CSE optimisation is pretty weedy. It looks for expressions like this:
```
let x = e1 in e2
```
and replaces all occurrences of e1 in e2 with x. This doesn't do full CSE, but it does catch some cases. There have been case where...GHC's CSE optimisation is pretty weedy. It looks for expressions like this:
```
let x = e1 in e2
```
and replaces all occurrences of e1 in e2 with x. This doesn't do full CSE, but it does catch some cases. There have been case where full CSE would be significantly beneficial, though.
One possible way forward is to have a separate CSE pass that transformed expressions containing common subexpressions into the let-form above, and let the existing CSE pass do the final replacement.
We must be cautious, though: increasing sharing can introduce space leaks. Sometimes we can prove that this cannot happen, for example when the shared object is primitive, or has a bounded size.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 6.4.1 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | Unknown |
| Architecture | Unknown |
</details>
<!-- {"blocked_by":[],"summary":"Better CSE optimisation","status":"New","operating_system":"Unknown","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"6.4.1","keywords":[],"differentials":[],"test_case":"","architecture":"Unknown","cc":[""],"type":"Task","description":"GHC's CSE optimisation is pretty weedy. It looks for expressions like this: \r\n{{{\r\n let x = e1 in e2\r\n}}}\r\nand replaces all occurrences of e1 in e2 with x. This doesn't do full CSE, but it does catch some cases. There have been case where full CSE would be significantly beneficial, though.\r\n\r\nOne possible way forward is to have a separate CSE pass that transformed expressions containing common subexpressions into the let-form above, and let the existing CSE pass do the final replacement.\r\n\r\nWe must be cautious, though: increasing sharing can introduce space leaks. Sometimes we can prove that this cannot happen, for example when the shared object is primitive, or has a bounded size.","type_of_failure":"OtherFailure","blocking":[]} -->Michal TerepetaMichal Terepetahttps://gitlab.haskell.org/ghc/ghc/-/issues/149missed CSE opportunity2023-04-18T11:16:02Znobodymissed CSE opportunityDon't know if this is a bug, but it was at least
_surprising_ to find that
```
playerMostOccur [a] = a
playerMostOccur (x:xs)
| numOccur x (x:xs) > numOccur (playerMostOccur xs) xs = x
| otherwise = playerMostOccur xs
```
was exponen...Don't know if this is a bug, but it was at least
_surprising_ to find that
```
playerMostOccur [a] = a
playerMostOccur (x:xs)
| numOccur x (x:xs) > numOccur (playerMostOccur xs) xs = x
| otherwise = playerMostOccur xs
```
was exponentially slower when compiled with ghc-5.04.2
-O than:
```
playerMostOccur [a] = a
playerMostOccur (x:xs)
| numOccur x (x:xs) > numOccur pmo xs = x
| otherwise = pmo
where pmo = playerMostOccur xs
```
Although the student responsible for the code couldn't
spot the
obvious optimisation, I was expecting that GHC's
optimiser would. :)
If it's not a bug, could you explain it to me?
-Greg(gregm.at.cs.uwa.edu.au)https://gitlab.haskell.org/ghc/ghc/-/issues/20138Missed optimization opportunity with redundant case expression.2023-06-15T15:50:47ZbenjaminMissed optimization opportunity with redundant case expression.## Motivation
Consider the following function:
```haskell
f :: Int -> Int
f n = case n of
2 -> n
n -> n
```
GHC is of course smart enough to know that it does not have to evaluate the case expression.
The resulting STG looks l...## Motivation
Consider the following function:
```haskell
f :: Int -> Int
f n = case n of
2 -> n
n -> n
```
GHC is of course smart enough to know that it does not have to evaluate the case expression.
The resulting STG looks like this.
```
Test.f :: GHC.Types.Int -> GHC.Types.Int
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,1*U(U)>m, Unf=OtherCon []] =
\r [n_s1pK] n_s1pK;
```
However, you could also write the function like this and the semantics should be the same.
```haskell
g :: Int -> Int
g n = case n of
2 -> 2
n -> n
```
But GHC does not eliminate the case-expression in the STG.
```
Test.g1 :: GHC.Types.Int
[GblId, Caf=NoCafRefs, Str=m, Unf=OtherCon []] =
CCS_DONT_CARE GHC.Types.I#! [2#];
Test.g :: GHC.Types.Int -> GHC.Types.Int
[GblId,
Arity=1,
Caf=NoCafRefs,
Str=<S(S),1*U(U)>m,
Unf=OtherCon []] =
\r [n_s1pL]
case n_s1pL of wild_s1pM [Occ=Once] {
GHC.Types.I# ds_s1pN [Occ=Once!] ->
case ds_s1pN of {
__DEFAULT -> wild_s1pM;
2# -> Test.g1;
};
};
```
GHC seems to be able to perform this optimization, because when using unboxed types, the STG looks fine.
```haskell
h :: Int# -> Int#
h n = case n of
2# -> 2#
n -> n
```
```
Test.h :: GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,1*U>, Unf=OtherCon []] =
\r [n_s1pP] n_s1pP;
```
The STG output was produced by GHC 9.0.1 with -O2. Source is attached.
## Proposal
`f` and `g` should produce the same code.
[test.hs](/uploads/e240e66e0d7886ad84fdb067df99d259/test.hs)https://gitlab.haskell.org/ghc/ghc/-/issues/17795Optimize repeated usage of instance methods.2020-02-18T16:18:30ZAndreas KlebingerOptimize repeated usage of instance methods.## Motivation
Consider this code:
```haskell
class C a where
op1 :: a -> a
op2 :: a -> a -> a
foo :: C a => a -> a -> a
foo x y =
op1 x `op2` op1 y
```
In the absence of specialization I expected `op1` and `op2` to be ext...## Motivation
Consider this code:
```haskell
class C a where
op1 :: a -> a
op2 :: a -> a -> a
foo :: C a => a -> a -> a
foo x y =
op1 x `op2` op1 y
```
In the absence of specialization I expected `op1` and `op2` to be extracted once from the constraint/dictionary and to be reused for each use.
Instead we currently extract the methods once per use:
```haskell
foo :: forall a. C a => a -> a -> a
[GblId,
Arity=3,
Caf=NoCafRefs,
Str=<S,U><L,U><L,U>, Unf=<irrelevant>
foo
= \ (@a_aAq)
($dC_aAs :: C a_aAq)
(x_agS :: a_aAq)
(y_agT :: a_aAq) ->
op2
@a_aAq
$dC_aAs
((op1 @a_aAq $dC_aAs) x_agS)
((op1 @a_aAq $dC_aAs) y_agT)
```
This isn't cheap as it causes a bunch of unknown calls and jumping back and forth. (On the plus side it doesn't allocate at least).
Even if we manually bind op1 to a name the simplifier will happily inline it again, duplicating the work performed.
## Proposal
Ideally we would translate `op1 x `op2` op1 y` from above into
```haskell
let o = op1 @a_aAq $dC_aAs
in o x `op2` o y
```
The problems preventing this are:
* We desugar into the form `op1 x `op2` op1 y` quite early.
* Even if we don't the simplifier will inline `o` as `op1` is considered cheap.
* Instances currently generate rules which will match on `op1 @a_aAq $knownDict x_agS`. If the dictionary is known, the rule will replace it with `op1_instance @a_aAq $knownDict x_agS`. This would break under the form above.
* While the let for `o` can usually be turned into a (non-allocating) case this is not guaranteed. Allocating might be worse than the duplicated work.
I don't think this is a big problem in practice.
If performance really matters we would always want the above snippet to specialize so the overhead can disappear completely.
If performance doesn't really matter then neither does it if we do a little extra work.
So while I don't intend to work on this anytime soon it seemed reasonable to document this behaviour.⊥https://gitlab.haskell.org/ghc/ghc/-/issues/5344CSE should look through coercions2019-07-07T18:55:48ZreinerpCSE should look through coercionsThis is probably a known limitation of the CSE pass, but there doesn't appear to be a ticket so I'll make one.
Consider the module:
```
module M where
newtype Id a = Id a
f (a, b) = (Id a, b)
g (a, b) = (a, b)
```
Compiling this wi...This is probably a known limitation of the CSE pass, but there doesn't appear to be a ticket so I'll make one.
Consider the module:
```
module M where
newtype Id a = Id a
f (a, b) = (Id a, b)
g (a, b) = (a, b)
```
Compiling this with `ghc -ddump-simpl -dsuppress-all -O2`, we get
```
g = \ (@ t_adf) (@ t1_adg) (ds_ddl :: (t_adf, t1_adg)) -> ds_ddl
f =
\ (@ t_adi) (@ a_adj) (ds_ddn :: (a_adj, t_adi)) ->
case ds_ddn of _ { (a_aaW, b_aaX) -> (a_aaW `cast` ..., b_aaX) }
```
We see that `g` shares its argument tuple, but `f` allocates a new copy of it. Ideally `f` would also share its argument tuple, and would look like this:
```
f = \ (@ t_adi) (@ a_adj) (ds_ddn :: (a_adj, t_adi)) -> ds_ddn `cast` ...
```
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.0.3 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"CSE should look through coercions","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"7.0.3","keywords":["cse"],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"FeatureRequest","description":"This is probably a known limitation of the CSE pass, but there doesn't appear to be a ticket so I'll make one.\r\n\r\nConsider the module:\r\n{{{\r\nmodule M where\r\n\r\nnewtype Id a = Id a\r\n\r\nf (a, b) = (Id a, b)\r\n\r\ng (a, b) = (a, b)\r\n}}}\r\n\r\nCompiling this with {{{ghc -ddump-simpl -dsuppress-all -O2}}}, we get\r\n\r\n{{{\r\ng = \\ (@ t_adf) (@ t1_adg) (ds_ddl :: (t_adf, t1_adg)) -> ds_ddl\r\n\r\nf =\r\n \\ (@ t_adi) (@ a_adj) (ds_ddn :: (a_adj, t_adi)) ->\r\n case ds_ddn of _ { (a_aaW, b_aaX) -> (a_aaW `cast` ..., b_aaX) }\r\n}}}\r\n\r\nWe see that {{{g}}} shares its argument tuple, but {{{f}}} allocates a new copy of it. Ideally {{{f}}} would also share its argument tuple, and would look like this:\r\n\r\n{{{\r\nf = \\ (@ t_adi) (@ a_adj) (ds_ddn :: (a_adj, t_adi)) -> ds_ddn `cast` ...\r\n}}}","type_of_failure":"OtherFailure","blocking":[]} -->Simon Peyton JonesSimon Peyton Joneshttps://gitlab.haskell.org/ghc/ghc/-/issues/18377Better combining for identical case alternatives2021-02-02T10:32:29ZSimon Peyton JonesBetter combining for identical case alternativesSometimes (e.g. [here](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3566#note_283571)) the user wants to write
```
f (IfLFCon n) = unitNameSet n
f (IfLFReEntrant _) = emptyNameSet
f (IfLFThunk _ _) = emptyNameSet
f (IfLFUn...Sometimes (e.g. [here](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3566#note_283571)) the user wants to write
```
f (IfLFCon n) = unitNameSet n
f (IfLFReEntrant _) = emptyNameSet
f (IfLFThunk _ _) = emptyNameSet
f (IfLFUnknown _) = emptyNameSet
f IfLFUnlifted = emptyNameSet
```
enumerating all the constructors for the sake of explicitness. But we don't want to do all those tests at runtime! We'd prefer to execute
```
f (IfLFCon n) = unitNameSet n
f _ = emptyNameSet
```
Some bits of GHC already combine identical alternatives:
* The Simplifier uses `combineIdenticalAlts`
* CSE uses `combineAlts`, which tries a bit harder
But neither does the full job:
* CSE doesn't use occurrence analysis to look for dead binders, so it only looks for nullary constructors
* The Simplifier uses only `cheapEqExpr`, and only combines the first alternative.
What we want is to
* Look at alternatives whose binders are all dead
* Among them pick the most popular RHS
* Combine them
Not really hard. Probably appropriate to do this in CSE.
Then we could *guarantee* that we wouldn't make redundant runtime tests.
**See also #14684** for an earlier version of the same issue.