GHC issues: https://gitlab.haskell.org/ghc/ghc/-/issues

# Issue #18231: FloatOut should only eta-expand a dead-ending RHS when arity will increase
Reported by Sebastian Graf, 2020-07-14. https://gitlab.haskell.org/ghc/ghc/-/issues/18231

Eta-expansion can turn trivial RHSs into non-trivial ones, which leads to more top-level bindings after the next Simplifier run, and hence more churn. We should therefore only eta-expand (at least trivial RHSs) when we actually need to, and that is really only the case for CorePrep.
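As a hypothetical illustration (names invented, not from the reproducer below), eta-expansion turns a trivial RHS into a non-trivial one:

```hs
f :: Int -> Int
f = succ              -- trivial RHS: a bare variable

fEta :: Int -> Int
fEta = \x -> succ x   -- after eta-expansion: no longer trivial, so the next
                      -- Simplifier run may float pieces of it to fresh
                      -- top-level bindings
```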
FloatOut is currently a bit careless in that regard. Consider
```hs
module Lib where
import Control.Monad (forever)
import Control.Monad.Trans.State.Strict
inc :: State Int ()
inc = modify' (+1)
m :: State Int ()
m = forever inc
```
After the second FloatOut (after demand analysis), we have
```
...
m :: State Int ()
m = ...
Rec {
-- RHS size: {terms: 6, types: 1, coercions: 0, joins: 0/0}
lvl_sOv
:: GHC.Prim.Int# -> Data.Functor.Identity.Identity ((), Int)
[LclId, Arity=1, Str=<L,U>b, Cpr=b]
lvl_sOv
= \ (x_aNg :: GHC.Prim.Int#) ->
a'_sO0 (GHC.Types.I# (GHC.Prim.+# x_aNg 1#))
-- RHS size: {terms: 6, types: 3, coercions: 0, joins: 0/0}
a'_sO0 [Occ=LoopBreaker]
:: Int -> Data.Functor.Identity.Identity ((), Int)
[LclId, Arity=1, Str=<L,U>b, Cpr=b]
a'_sO0
= \ (s1_aNP :: Int) ->
case s1_aNP of { GHC.Types.I# x_aNg [Dmd=<B,A>] -> lvl_sOv x_aNg }
end Rec }
-- RHS size: {terms: 3, types: 1, coercions: 5, joins: 0/0}
a :: State Int ()
[LclIdX,
Arity=1,
Str=<B,1*H>b,
Cpr=b,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=True)
Tmpl= a'_sO0
`cast` (Sym (Control.Monad.Trans.State.Strict.N:StateT[0]
<Int>_N <Data.Functor.Identity.Identity>_R <()>_N)
:: (Int -> Data.Functor.Identity.Identity ((), Int))
~R# StateT Int Data.Functor.Identity.Identity ())}]
a = (\ (eta_B1 :: Int) -> a'_sO0 eta_B1)
`cast` (Sym (Control.Monad.Trans.State.Strict.N:StateT[0]
<Int>_N <Data.Functor.Identity.Identity>_R <()>_N)
:: (Int -> Data.Functor.Identity.Identity ((), Int))
~R# StateT Int Data.Functor.Identity.Identity ())
```
Both `lvl` and the eta-expanded RHS of `a` (which will lead to yet another top-level binding after the next simplifier run) are unnecessary.
Also we end up allocating an `I#` box in each loop iteration (in `lvl`). But that is probably due to the Simplifier refraining from inlining dead-ending functions. Or SetLevels. I don't know. I think we should inline that, but don't know on what grounds and if we would punish code using `error` by that. I have to understand `Note [Bottoming floats]`. Or maybe it's because we don't do strictness WW for dead-ending functions. Anyway, the allocation isn't too concerning, but the extra binding is annoying.Eta-expansion can turn trivial RHSs into non-trivial RHSs. That leads to more top-level bindings after the next Simplifier run, meaning more churn. We should therefore only eta-expand (at least trivial RHSs) if we actually need to (OTOH that's only the case for CorePrep).
Assignee: Sebastian Graf

# Issue #18056: Float out prevents RULES from firing
Reported by Andreas Klebinger, 2020-04-17. https://gitlab.haskell.org/ghc/ghc/-/issues/18056

## Motivation
Consider this function and rule:
```haskell
foo :: Int -> Bool
foo x = L.elem x [1,2::Int]
{-# RULES
"elem/build" [2] forall x (g :: forall b . (Int -> b -> b) -> b -> b)
. L.elem x (build g) = g (\ y r -> (x == y) || r) False
#-}
```
If compiled with `-O -fno-full-laziness` this fires as expected and produces:
```
foo
= \ (x_ay4 :: Int) ->
case x_ay4 of { I# x1_a1rE ->
case x1_a1rE of {
__DEFAULT -> GHC.Types.False;
1# -> GHC.Types.True;
2# -> GHC.Types.True
}
}
```
which is what we want.
If we compile with `-O`, after FloatOut during the InitialPhase we get:
```
lvl_s1vK :: forall b. (Int -> b -> b) -> b -> b
lvl_s1vK
= \ f xs ->
f 1 (f 2 xs)
list :: [Int]
list = build lvl_s1vK
foo :: Int -> Bool
foo = \ x -> L.elem x list
```
However, the rule never matches `L.elem x list`, even though it would match if the referenced bindings were inlined.
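One way to observe this (assuming the example above lives in a file `Foo.hs`; `-ddump-rule-firings` is a standard GHC flag that lists each rule as it fires):

```
ghc -O -fno-full-laziness -fforce-recomp -ddump-rule-firings Foo.hs
    # expect a "Rule fired: elem/build" line in the output
ghc -O -fforce-recomp -ddump-rule-firings Foo.hs
    # no "elem/build" among the fired rules
```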
## Proposal
I'm not sure how this would be best addressed.
Labels: Research needed

# Issue #16039: `GHC.Magic.noinline <var>` should not float out
Reported by Gabor Greif, 2020-01-23. https://gitlab.haskell.org/ghc/ghc/-/issues/16039

While working on #15155, I discovered that the magic function `GHC.Magic.noinline` tricked the float-out transformation into performing undesirable operations:
Looking at the Core produced:
```hs
{ ghc-prim-0.5.3:GHC.Types.I# x_azMX [Dmd=<S,U>] ->
case x_azMX of {
__DEFAULT -> jump $j_sMly;
3674937295934324920# ->
src<compiler/typecheck/TcTyClsDecls.hs:3561:9-35>
check_ty_roles_sLb3
(src<compiler/typecheck/TcTyClsDecls.hs:3561:24-26> env_aqfK)
(src<compiler/typecheck/TcTyClsDecls.hs:3561:28-31> role_aqfL)
(ghc-prim-0.5.3:GHC.Magic.noinline @ Kind liftedTypeKind)
}
}
```
After simplification we end up with a top-level variable that basically redirects to an imported variable.
```hs
-- RHS size: {terms: 2, types: 1, coercions: 0, joins: 0/0}
lvl330_r12qn :: Kind
[GblId]
lvl330_r12qn
= ghc-prim-0.5.3:GHC.Magic.noinline @ Kind liftedTypeKind
```
This would cause `IND_STATIC`s when eventually emitted as assembly.
IIRC `GHC.Magic.noinline` was introduced with GHC v8.2. This probably regressed the float-out optimisation.
I have a patch to reclassify `noinline <some-var>` as cheap, so that it won't be floated out.
N.B.: for more details, see: https://github.com/ghc/ghc/pull/224
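For reference, a minimal use of the primitive in question (example invented here): `noinline :: a -> a` is semantically the identity; it only blocks inlining.

```hs
import GHC.Magic (noinline)

-- 'noinline f' behaves exactly like 'f', but GHC will not inline it.
-- The bug: an application like 'noinline liftedTypeKind' was also treated
-- as expensive enough for FloatOut to hoist it to a top-level binding,
-- producing the redirection binding shown above.
answer :: Int
answer = noinline negate 42   -- still just 'negate 42'
```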
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.2.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Assignee: Gabor Greif

# Issue #15606: Don't float out lets in between lambdas
Reported by Simon Peyton Jones, 2019-07-07. https://gitlab.haskell.org/ghc/ghc/-/issues/15606

Consider
```
f = \x. let y = <blah>
in \z. let v = h x y in <stuff>
```
The full laziness pass will float out that v-binding thus
```
f = \x. let y = <blah>
v = h x y
in \z. <stuff>
```
And now (if `h` is, say, imported) it'll stay like that.
But suppose `<blah>` simplifies to `Just x`. Then we allow ourselves to eta-expand thus
```
f = \x z. let y = <blah>
v = h x y
in <stuff>
```
Now (an early design choice in the let-floater) we never float the v-binding in between the `\x` and `\z`.
This is very non-confluent: a small change in exactly how rapidly `<blah>` simplifies can
have a big, irreversible effect on the code for `f`.
IDEA: extend the let-floater's design choice to not float out between two lambdas, even if
they are separated by lets/cases etc. One way to say this is to ask when a lambda gets
a new level number compared to its immediately enclosing lambda.
Examples where `y` gets the same level number as `x`
- `\x.\y. blah`
- `\x. let binds in \y`
- `\x. case scrut of pi -> \y.blah`
Examples where `y` gets a level number one bigger than `x`
- `\x. let v = \y.rhs in blah`
- `\x. f (\y.rhs)`
This probably won't make a lot of difference, but it'd be worth trying.
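The numbering rule above can be sketched as follows (a toy model only, not GHC's SetLevels; the AST and the `tail`-position encoding are invented for illustration):

```python
from dataclasses import dataclass

# Toy Core-like AST, just enough to express the examples above.
@dataclass
class Var:
    name: str

@dataclass
class Lam:
    name: str
    body: object

@dataclass
class Let:
    rhs: object    # let _ = rhs in body
    body: object

@dataclass
class Case:
    scrut: object  # case scrut of _ -> alt
    alt: object

@dataclass
class App:
    fun: object
    arg: object

def lambda_levels(expr, level=0, tail=True, out=None):
    """Assign a level number to each lambda binder.

    'tail' stays True while we remain in the body position of the enclosing
    lambda, looking through let bodies and case alternatives. A lambda met
    there keeps the enclosing level (the proposal above); one met in a let
    RHS or in an argument gets a level one bigger.
    """
    if out is None:
        out = {}
    if isinstance(expr, Lam):
        lvl = level if tail else level + 1
        out[expr.name] = lvl
        lambda_levels(expr.body, lvl, True, out)
    elif isinstance(expr, Let):
        lambda_levels(expr.rhs, level, False, out)
        lambda_levels(expr.body, level, tail, out)
    elif isinstance(expr, Case):
        lambda_levels(expr.scrut, level, False, out)
        lambda_levels(expr.alt, level, tail, out)
    elif isinstance(expr, App):
        lambda_levels(expr.fun, level, False, out)
        lambda_levels(expr.arg, level, False, out)
    return out

# \x. let binds in \y. blah  -- y keeps x's level
same = lambda_levels(Lam("x", Let(Var("binds"), Lam("y", Var("blah")))))
# \x. let v = \y. rhs in blah -- y is one level deeper
deeper = lambda_levels(Lam("x", Let(Lam("y", Var("rhs")), Var("blah"))))
```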
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.4.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.6.1

# Issue #8457: -ffull-laziness does more harm than good
Reported by errge, 2020-07-09. https://gitlab.haskell.org/ghc/ghc/-/issues/8457

In this bug report I'd like to argue that `-ffull-laziness` shouldn't
be turned on automatically with either `-O` or `-O2`, because it's
dangerous and can cause serious memory leaks which are hard to debug
or prevent. I'll also try to show that its optimization benefits are
negligible. Actually, my benchmarks show that it's beneficial to turn
it off even in the cases where we don't hit a space leak.
We've met this issue last week, but it had been reported several times
before: e.g. #917 and #5262.
A typical example is the following:
```haskell
main :: IO ()
main = task () >> task ()
task :: () -> IO ()
task () = printvalues [1..1000000 :: Int]
printvalues :: [Int] -> IO ()
printvalues (x:xs) = print x >> printvalues xs
printvalues [] = return ()
```
We succeed with `-O0`, but fail with `-O`:
```
errge@curry:~/tmp $ ~/tmp/ghc/inplace/bin/ghc-stage2 -v0 -O0 -fforce-recomp lazy && ./lazy +RTS -t >/dev/null
<<ghc: 1620098744 bytes, 3117 GCs, 32265/42580 avg/max bytes residency (3 samples), 2M in use, 0.00 INIT (0.00 elapsed), 1.28 MUT (1.28 elapsed), 0.02 GC (0.02 elapsed) :ghc>>
errge@curry:~/tmp $ ~/tmp/ghc/inplace/bin/ghc-stage2 -v0 -O -fforce-recomp lazy && ./lazy +RTS -t >/dev/null
<<ghc: 1444098612 bytes, 2761 GCs, 3812497/13044272 avg/max bytes residency (7 samples), 28M in use, 0.00 INIT (0.00 elapsed), 1.02 MUT (1.03 elapsed), 0.12 GC (0.12 elapsed) :ghc>>
```
28M? What the leak!? Well, it's `-ffull-laziness`:
```
errge@curry:~/tmp $ ~/tmp/ghc/inplace/bin/ghc-stage2 -v0 -O -fno-full-laziness -fforce-recomp lazy && ./lazy +RTS -t >/dev/null
<<ghc: 1484098612 bytes, 2835 GCs, 34812/42580 avg/max bytes residency (2 samples), 1M in use, 0.00 INIT (0.00 elapsed), 1.04 MUT (1.04 elapsed), 0.02 GC (0.02 elapsed) :ghc>>
```
We get constant space and the fastest run-time too, since we spare
some cycles on GC.
Note that in this instance we are explicitly trying to disable sharing
by using `()` as a fake argument to the function. Also note that such a
function may easily be a utility function in a larger code base
or in a library, so it's impractical to say that you shouldn't
use it twice "too close together".
Quoting from the GHC user guide:
```
-O2:
Means: “Apply every non-dangerous optimisation, even if it means
significantly longer compile times.”
The avoided “dangerous” optimisations are those that can make
runtime or space worse if you're unlucky. They are normally turned
on or off individually.
At the moment, -O2 is unlikely to produce better code than -O.
```
This seems to be false at the moment.
We decided to make a broader investigation into this issue and wanted
to know if we can disable this optimization without too much pain.
We came up with this benchmark plan:

- benchmark GHC itself,
- compile all stages with `-O`, but hack the stage1 compiler to emit `-t` statistics for every file compiled,
- gather these statistics while compiling the libraries and the stage2 compiler.
On the second run we compile the stage1 compiler with
`-O -fno-full-laziness`, but leave everything else unchanged in the
environment.
When we have both results of the compilation of ~1600 files, we match
them up and compute the (logarithmic) ratio of CPU and memory
difference between compilations, the final results for our benchmark.
The results and the raw data can be found at
https://github.com/errge/notlazy.
The overall compilation time dropped from 26:20 to 25:12, which is a
4% improvement. Investigating the full matching shows that this
overall result comes from small improvements spread all over the place.
The results plotted:
- https://github.com/errge/notlazy/blob/master/cpu.png
- https://github.com/errge/notlazy/blob/master/mem.png
The graphs show the logarithmic (`100 * log10(new/orig)`) ratio of change
in cpu and memory consumption. Therefore negative results mean that
the new compilation method is faster.
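For concreteness, the metric can be computed as follows (a sketch; the numbers are the overall wall-clock times quoted above, in seconds):

```python
import math

def log_ratio(new: float, orig: float) -> float:
    """100 * log10(new/orig): negative means the new run is cheaper."""
    return 100 * math.log10(new / orig)

# Overall compile time: 26:20 originally vs 25:12 without full laziness.
overall = log_ratio(25 * 60 + 12, 26 * 60 + 20)   # about -1.9, an improvement
```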
As can be seen on the CPU graph, in most cases the difference
is negligible (actually smaller than what can be measured on small
files, which is why we have the spike at 0). Overall we see a small
improvement in CPU, and there are some outliers in both directions,
but there are more drastic improvements than drastic regressions.
On the memory graph the situation is much closer to zero. There
is one big positive memory outlier: `DsListComp.lhs`. It uses 69M
originally and now uses 103M. But it compiles in 2 seconds either way, and
there are files in the source tree which require 400M to compile, so
this is not an issue.
After all this, I'd like to hear other opinions about just disabling
this optimization in `-O` and `-O2` and leaving it as an option that
can be turned on when needed, my reasons once more:
- it's unsafe,
- it's hard to debug when you hit its issues,
- the optimization doesn't seem to be very productive,
- it's always easy to force sharing, but it's not easy to force copying.
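To make the last point concrete (a sketch; the dummy-argument idiom is the same one used in the reproducer above):

```hs
-- Forcing sharing is easy: give the value a (top-level) name.
shared :: [Int]
shared = [1..1000000]

-- Forcing re-computation is exactly what full laziness defeats: the list
-- below does not mention the () argument, so the optimiser may float it
-- out and share it across calls anyway.
copyEachTime :: () -> [Int]
copyEachTime () = [1..1000000]
```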
Apparently a Haskell programmer should be lazy, but never fully lazy.
Research done by Gergely Risko \<errge\> and Mihaly Barasz \<klao\>,
confirmed on two different machines with no other running processes.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 7.7 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | high |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"-ffull-laziness does more harm than good","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"7.8.1","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"7.7","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"In this bug report I'd like to argue that `-ffull-laziness` shouldn't\r\nbe turned on automatically with either `-O` nor `-O2`, because it's\r\ndangerous and can cause serious memory leaks which are hard to debug\r\nor prevent. I'll also try to show that its optimization benefits are\r\nnegligible. Actually, my benchmarks show that it's beneficial to turn\r\nit off even in the cases where we don't hit a space leak.\r\n\r\nWe've met this issue last week, but it had been reported several times\r\nbefore: e.g. #917 and #5262.\r\n\r\nA typical example is the following:\r\n{{{\r\n#!haskell\r\nmain :: IO ()\r\nmain = task () >> task ()\r\n\r\ntask :: () -> IO ()\r\ntask () = printvalues [1..1000000 :: Int]\r\n\r\nprintvalues :: [Int] -> IO ()\r\nprintvalues (x:xs) = print x >> printvalues xs\r\nprintvalues [] = return ()\r\n}}}\r\n\r\nWe succeed with `-O0`, but fail with `-O`:\r\n{{{\r\nerrge@curry:~/tmp $ ~/tmp/ghc/inplace/bin/ghc-stage2 -v0 -O0 -fforce-recomp lazy && ./lazy +RTS -t >/dev/null\r\n<<ghc: 1620098744 bytes, 3117 GCs, 32265/42580 avg/max bytes residency (3 samples), 2M in use, 0.00 INIT (0.00 elapsed), 1.28 MUT (1.28 elapsed), 0.02 GC (0.02 elapsed) :ghc>>\r\nerrge@curry:~/tmp $ ~/tmp/ghc/inplace/bin/ghc-stage2 -v0 -O -fforce-recomp lazy && ./lazy +RTS -t >/dev/null\r\n<<ghc: 1444098612 bytes, 2761 GCs, 3812497/13044272 avg/max bytes residency (7 samples), 28M in use, 0.00 INIT (0.00 elapsed), 1.02 MUT (1.03 elapsed), 0.12 GC (0.12 elapsed) :ghc>>\r\n}}}\r\n\r\n28M? What the leak!? 
Well, it's `-ffull-laziness`:\r\n{{{\r\nerrge@curry:~/tmp $ ~/tmp/ghc/inplace/bin/ghc-stage2 -v0 -O -fno-full-laziness -fforce-recomp lazy && ./lazy +RTS -t >/dev/null\r\n<<ghc: 1484098612 bytes, 2835 GCs, 34812/42580 avg/max bytes residency (2 samples), 1M in use, 0.00 INIT (0.00 elapsed), 1.04 MUT (1.04 elapsed), 0.02 GC (0.02 elapsed) :ghc>>\r\n}}}\r\n\r\nWe get constant space and the fastest run-time too, since we spare\r\nsome cycles on GC.\r\n\r\nNote, that in this instance we are trying to explicity disable sharing\r\nby using `()` as a fake argument for the function. Also note, that\r\nthis function may easily be a utility function in a larger code base\r\nor in a library, therefore it's impractical to say that you shouldn't\r\nuse it twice \"too close together\".\r\n\r\nQuoting from the GHC user guide:\r\n{{{\r\n -O2:\r\n\r\n Means: “Apply every non-dangerous optimisation, even if it means\r\n significantly longer compile times.”\r\n\r\n The avoided “dangerous” optimisations are those that can make\r\n runtime or space worse if you're unlucky. 
They are normally turned\r\n on or off individually.\r\n\r\n At the moment, -O2 is unlikely to produce better code than -O.\r\n}}}\r\n\r\nThis seems to be false at the moment.\r\n\r\nWe decided to make a broader investigation into this issue and wanted\r\nto know if we can disable this optimization without too much pain.\r\nCame up with this benchmark plan:\r\n\r\n - let's benchmark GHC,\r\n\r\n - compile all stages with -O, but hack the stage1 compiler to\r\n emit `-t` statistics for every file compiled,\r\n\r\n - gather these statistics while compiling the libraries and the\r\n stage2 compiler.\r\n\r\nOn the second run we compile the stage1 compiler with\r\n`-O -fno-full-laziness`, but leave everything else unchanged in the\r\nenvironment.\r\n\r\nWhen we have both results of the compilation of ~1600 files, we match\r\nthem up and compute the (logarithmic) ratio of CPU and memory\r\ndifference between compilations, the final results for our benchmark.\r\n\r\nThe results and the raw data can be found at\r\nhttps://github.com/errge/notlazy.\r\n\r\nThe overall compilation time dropped from 26:20 to 25:12, which is a\r\n4% improvement. Investigating the full matching shows that this\r\noverall result is from small improvements all around the place.\r\n\r\nThe results plotted:\r\n - https://github.com/errge/notlazy/blob/master/cpu.png\r\n - https://github.com/errge/notlazy/blob/master/mem.png\r\n\r\nThe graphs show the logarithmic (100*log_10(new/orig)) ratio of change\r\nin cpu and memory consumption. Therefore negative results mean that\r\nthe new compilation method is faster.\r\n\r\nAs can be seen on the CPU graph, in most of the cases the difference\r\nis negligible (actually smaller than what can be measured on small\r\nfiles, this is why we have the spike at 0). 
In overall we see a small\r\nimprovement in CPU, and there are some outliers in both directions,\r\nbut there are more drastic improvement cases than drastic regressions.\r\n\r\nOn the memory graph the situation is much more close to zero. There\r\nis one big positive memory outlier: `DsListComp.lhs`. It uses 69M\r\noriginally and now uses 103M. But compiles in 2 seconds both ways and\r\nthere are files in the source tree which requires 400M to compile, so\r\nthis is not an issue.\r\n\r\nAfter all this, I'd like to hear other opinions about just disabling\r\nthis optimization in `-O` and `-O2` and leaving it as an option that\r\ncan be turned on when needed, my reasons once more:\r\n - it's unsafe,\r\n - it's hard to debug when you hit its issues,\r\n - the optimization doesn't seem to be very productive,\r\n - it's always easy to force sharing, but it's not easy to force\r\n copying.\r\n\r\nApparently a Haskell programmer should be lazy, but never fully lazy. \r\n\r\nResearch done by Gergely Risko <errge> and Mihaly Barasz <klao>,\r\nconfirmed on two different machines with no other running processes.\r\n","type_of_failure":"OtherFailure","blocking":[]} -->In this bug report I'd like to argue that `-ffull-laziness` shouldn't
be turned on automatically with either `-O` or `-O2`, because it's
dangerous and can cause serious memory leaks which are hard to debug
or prevent. I'll also try to show that its optimization benefits are
negligible. Actually, my benchmarks show that it's beneficial to turn
it off even in the cases where we don't hit a space leak.
We ran into this issue last week, but it had been reported several times
before: e.g. #917 and #5262.
A typical example is the following:
```haskell
main :: IO ()
main = task () >> task ()
task :: () -> IO ()
task () = printvalues [1..1000000 :: Int]
printvalues :: [Int] -> IO ()
printvalues (x:xs) = print x >> printvalues xs
printvalues [] = return ()
```
We succeed with `-O0`, but fail with `-O`:
```
errge@curry:~/tmp $ ~/tmp/ghc/inplace/bin/ghc-stage2 -v0 -O0 -fforce-recomp lazy && ./lazy +RTS -t >/dev/null
<<ghc: 1620098744 bytes, 3117 GCs, 32265/42580 avg/max bytes residency (3 samples), 2M in use, 0.00 INIT (0.00 elapsed), 1.28 MUT (1.28 elapsed), 0.02 GC (0.02 elapsed) :ghc>>
errge@curry:~/tmp $ ~/tmp/ghc/inplace/bin/ghc-stage2 -v0 -O -fforce-recomp lazy && ./lazy +RTS -t >/dev/null
<<ghc: 1444098612 bytes, 2761 GCs, 3812497/13044272 avg/max bytes residency (7 samples), 28M in use, 0.00 INIT (0.00 elapsed), 1.02 MUT (1.03 elapsed), 0.12 GC (0.12 elapsed) :ghc>>
```
28M? What the leak!? Well, it's `-ffull-laziness`:
```
errge@curry:~/tmp $ ~/tmp/ghc/inplace/bin/ghc-stage2 -v0 -O -fno-full-laziness -fforce-recomp lazy && ./lazy +RTS -t >/dev/null
<<ghc: 1484098612 bytes, 2835 GCs, 34812/42580 avg/max bytes residency (2 samples), 1M in use, 0.00 INIT (0.00 elapsed), 1.04 MUT (1.04 elapsed), 0.02 GC (0.02 elapsed) :ghc>>
```
We get constant space, and the fastest run time too, since we save
some cycles on GC.
Note that in this instance we are explicitly trying to disable sharing
by using `()` as a dummy argument to the function. Also note that
this function may easily be a utility function in a larger code base
or in a library, so it's impractical to say that you shouldn't
use it twice "too close together".
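For intuition, here is roughly what the full-laziness transformation does to the example above (a hand-written sketch of the effect, not actual GHC Core): the list `[1..1000000]` does not depend on the `()` argument, so it is floated out of `task` to the top level, where it becomes a CAF that is shared, and therefore fully retained, between the two calls.

```haskell
module Main where

-- Sketch of the full-laziness effect (not actual GHC output): the list
-- no longer depends on the () argument, so it floats to the top level
-- and becomes a CAF shared by both calls to task. The first call forces
-- the whole list; the CAF keeps it alive for the second call, hence the
-- ~28M residency instead of constant space.
lvl :: [Int]
lvl = [1 .. 1000000]

task :: () -> IO ()
task () = printvalues lvl

printvalues :: [Int] -> IO ()
printvalues (x:xs) = print x >> printvalues xs
printvalues []     = return ()

main :: IO ()
main = task () >> task ()
```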
Quoting from the GHC user guide:
```
-O2:
Means: “Apply every non-dangerous optimisation, even if it means
significantly longer compile times.”
The avoided “dangerous” optimisations are those that can make
runtime or space worse if you're unlucky. They are normally turned
on or off individually.
At the moment, -O2 is unlikely to produce better code than -O.
```
This seems to be false at the moment.
We decided to investigate this issue more broadly and wanted
to know whether we could disable this optimization without too much pain.
We came up with this benchmark plan:
- let's benchmark GHC,
- compile all stages with -O, but hack the stage1 compiler to
  emit `-t` statistics for every file compiled,
- gather these statistics while compiling the libraries and the
  stage2 compiler.
On the second run we compile the stage1 compiler with
`-O -fno-full-laziness`, but leave everything else unchanged in the
environment.
When we have both results for the compilation of ~1600 files, we match
them up and compute the (logarithmic) ratio of the CPU and memory
difference between compilations; these ratios are the final results of our benchmark.
The results and the raw data can be found at
https://github.com/errge/notlazy.
The overall compilation time dropped from 26:20 to 25:12, which is a
4% improvement. Inspecting the full matching shows that this
overall result comes from small improvements all over the place.
The results plotted:
- https://github.com/errge/notlazy/blob/master/cpu.png
- https://github.com/errge/notlazy/blob/master/mem.png
The graphs show the logarithmic ratio of change, 100*log10(new/orig),
in CPU and memory consumption. Negative results therefore mean that
the new compilation method is faster.
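As a sanity check on the metric (my own worked example, not taken from the report), the per-file score can be computed as:

```haskell
-- The benchmark's score for one file: 100 * log10(new/orig).
-- Negative means the -fno-full-laziness build did better on that file.
score :: Double -> Double -> Double
score orig new = 100 * logBase 10 (new / orig)
```

For example, a file whose compile time drops from 2.0s to 1.6s scores about -9.7, and an unchanged file scores 0, which is where the spike in the CPU graph sits.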
As can be seen on the CPU graph, in most cases the difference
is negligible (actually smaller than what can be measured on small
files, which is why we have the spike at 0). Overall we see a small
improvement in CPU, and there are some outliers in both directions,
but there are more drastic improvements than drastic regressions.
On the memory graph the situation is much closer to zero. There
is one big positive memory outlier: `DsListComp.lhs`. It uses 69M
originally and now uses 103M. But it compiles in 2 seconds either way, and
there are files in the source tree which require 400M to compile, so
this is not an issue.
After all this, I'd like to hear other opinions about simply disabling
this optimization in `-O` and `-O2` and leaving it as an option that
can be turned on when needed. My reasons once more:
- it's unsafe,
- it's hard to debug when you hit its issues,
- the optimization doesn't seem to be very productive,
- it's always easy to force sharing, but it's not easy to force
  copying.
Apparently a Haskell programmer should be lazy, but never fully lazy.
Research done by Gergely Risko \<errge\> and Mihaly Barasz \<klao\>,
confirmed on two different machines with no other running processes.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 7.7 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | high |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.0.1

https://gitlab.haskell.org/ghc/ghc/-/issues/7367
float-out causes extra allocation (wurmli, 2019-07-07T18:50:04Z)

The Haskell fannkuchredux (included in nofib as fannkuch-redux) contribution of Louis Wasserman to "The Computer Language Benchmarks Game" at shootout.alioth.debian.org times out on the amd64 machines, but not on the i386. I can reproduce it on my Debian amd64 machine.
It turns out that compiling without optimisation, or with plain -O, produces a fast program, but one with an enormously large amount of heap allocated (10G, compared with 67k on a virtual i386 machine) and also more garbage collector activity.
The source is below (because I can't find a way to attach the file). At the end of the source I copied my make command line, run command line and the output produced with option -sstderr.
---------------------
```
{- The Computer Language Benchmarks Game
   http://shootout.alioth.debian.org/
   contributed by Louis Wasserman

   This should be compiled with:
       -threaded -O2 -fexcess-precision -fasm
   and run with:
       +RTS -N<number of cores> -RTS <input>
-}

import Control.Concurrent
import Control.Monad
import System.Environment
import Foreign hiding (rotate)
import Data.Monoid

type Perm = Ptr Word8

data F = F {-# UNPACK #-} !Int {-# UNPACK #-} !Int

instance Monoid F where
  mempty = F 0 0
  F s1 m1 `mappend` F s2 m2 = F (s1 + s2) (max m1 m2)

incPtr = (`advancePtr` 1)
decPtr = (`advancePtr` (-1))

flop :: Int -> Perm -> IO ()
flop k xs = flopp xs (xs `advancePtr` k)
  where flopp i j = when (i < j) $ swap i j >> flopp (incPtr i) (decPtr j)
        swap i j = do
          a <- peek i
          b <- peek j
          poke j a
          poke i b

flopS :: Perm -> (Int -> IO a) -> IO a
flopS !xs f = do
  let go !acc = do
        k <- peekElemOff xs 0
        if k == 0 then f acc else flop (fromIntegral k) xs >> go (acc+1)
  go 0

increment :: Ptr Word8 -> Ptr Word8 -> IO ()
increment !p !ct = do
  first <- peekElemOff p 1
  pokeElemOff p 1 =<< peekElemOff p 0
  pokeElemOff p 0 first
  let go !i !first = do
        ci <- peekElemOff ct i
        if fromIntegral ci < i then pokeElemOff ct i (ci+1) else do
          pokeElemOff ct i 0
          let !i' = i + 1
          moveArray p (incPtr p) i'
          pokeElemOff p i' first
          go i' =<< peekElemOff p 0
  go 1 first

genPermutations :: Int -> Int -> Int -> Ptr Word8 -> Ptr Word8 -> IO F
genPermutations !n !l !r !perm !count = allocaArray n $ \ destF -> do
    let upd j !f run = do
          p0 <- peekElemOff perm 0
          if p0 == 0 then increment perm count >> run f else do
            copyArray destF perm n
            increment perm count
            flopS destF $ \ flops ->
              run (f `mappend` F (checksum j flops) flops)
    let go j !f = if j >= r then return f else upd j f (go (j+1))
    go l mempty
  where checksum i f = if i .&. 1 == 0 then f else -f

facts :: [Int]
facts = scanl (*) 1 [1..12]

unrank :: Int -> Int -> (Ptr Word8 -> Ptr Word8 -> IO a) -> IO a
unrank !idx !n f = allocaArray n $ \ p -> allocaArray n $ \ count ->
  allocaArray n $ \ pp -> do
    mapM_ (\ i -> pokeElemOff p i (fromIntegral i)) [0..n-1]
    let go i !idx = when (i >= 0) $ do
          let fi = facts !! i
          let (q, r) = idx `quotRem` fi
          pokeElemOff count i (fromIntegral q)
          copyArray pp p (i+1)
          let go' j = when (j <= i) $ do
                let jq = j + q
                pokeElemOff p j =<< peekElemOff pp (if jq <= i then jq else jq - i - 1)
                go' (j+1)
          go' 0
          go (i-1) r
    go (n-1) idx
    f p count

main = do
  n <- fmap (read.head) getArgs
  let fact = product [1..n]
  let bk = fact `quot` 4
  vars <- forM [0,bk..fact-1] $ \ ix -> do
    var <- newEmptyMVar
    forkIO (unrank ix n $ \ p -> genPermutations n ix (min fact (ix + bk)) p >=> putMVar var)
    return var
  F chksm mflops <- liftM mconcat (mapM takeMVar vars)
  putStrLn $ (show chksm) ++ "\nPfannkuchen(" ++ (show n) ++ ") = " ++ (show $ mflops)
{-
wurmli@noah-nofen:~/hpw/haskell/work/fannkuch$ ghc --make
-XBangPatterns -O -threaded -fllvm -rtsopts fannkuchredux.ghc-3.hs
[1 of 1] Compiling Main ( fannkuchredux.ghc-3.hs, fannkuchredux.ghc-3.o )
Linking fannkuchredux.ghc-3 ...
wurmli@noah-nofen:~/hpw/haskell/work/fannkuch$ ./fannkuchredux.ghc-3 +RTS -N4 -sstderr -RTS 12
3968050
Pfannkuchen(12) = 65
10,538,122,952 bytes allocated in the heap
359,512 bytes copied during GC
47,184 bytes maximum residency (2 sample(s))
51,120 bytes maximum slop
4 MB total memory in use (1 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 6053 colls, 6053 par 0.16s 0.04s 0.0000s 0.0001s
Gen 1 2 colls, 1 par 0.00s 0.00s 0.0001s 0.0001s
Parallel GC work balance: 40.82% (serial 0%, perfect 100%)
TASKS: 6 (1 bound, 5 peak workers (5 total), using -N4)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 44.73s ( 11.51s elapsed)
GC time 0.16s ( 0.04s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 44.89s ( 11.55s elapsed)
Alloc rate 235,589,887 bytes per MUT second
Productivity 99.6% of total user, 387.3% of total elapsed
gc_alloc_block_sync: 31024
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
-}
```

Label: Research needed

https://gitlab.haskell.org/ghc/ghc/-/issues/7206
Implement cheap build (Simon Peyton Jones, 2020-04-16T16:54:41Z)

We sometimes see stuff like this:
```
f n ps = let ys = [1..x]
         in map (\zs. ys ++ zs) ps
```
You might think the `(++)` would fuse with the `[1..x]`, via foldr/build fusion, but it doesn't. Why not? Because it would be WRONG to do so in this case:
```
f ns ps = let ys = map expensive ns
          in map (\zs. ys ++ zs) ps
```
If we fused the `(++)` with the `map` we might call `expensive` once for each element of `ps`.
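To make the duplication concrete (my own illustration, with a stand-in `expensive`): both versions compute the same lists, but in the fused form the producer is rebuilt inside the lambda, so `expensive` is re-run once per element of `ps`.

```haskell
-- Stand-in for a costly function; the report leaves it abstract.
expensive :: Int -> Int
expensive x = x * x

-- Shared: `expensive` runs (length ns) times in total; ys is reused.
fShared :: [Int] -> [[Int]] -> [[Int]]
fShared ns ps = let ys = map expensive ns
                in map (\zs -> ys ++ zs) ps

-- Fused (what foldr/build fusion of ++ would produce here): the producer
-- is duplicated into the lambda, so `expensive` runs (length ns) times
-- for EACH element of ps -- same result, more work.
fFused :: [Int] -> [[Int]] -> [[Int]]
fFused ns ps = map (\zs -> foldr (:) zs (map expensive ns)) ps
```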
This is fairly easy to fix. The point is that `[1..x]` is cheap; we'd prefer to fuse it even if doing so involves computing 1, 1+1, 2+1, etc multiple times. Suppose we express this fact thusly:
```
enumFromTo lo hi = cheapBuild (\cn. ....lo...hi...)
map f xs = build (\cn. ...f...xs...)
```
Now we want the `foldr/cheapBuild` rule to fire even if that would involve duplicating the call to `cheapBuild`. And we already have a way to do that: we make `cheapBuild` into a `CONLIKE` function.
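The rewrite the proposed `foldr/cheapBuild` rule performs can be modelled in plain Haskell (a sketch: `cheapBuildModel` and `eft` are my own stand-ins for `cheapBuild` and an `eftIntFB`-style producer, and the CONLIKE aspect is only described in comments since it is a compiler-side annotation):

```haskell
{-# LANGUAGE RankNTypes #-}

-- Same definition as build; in the proposal the only difference is the
-- CONLIKE annotation, which licenses duplicating calls to it.
cheapBuildModel :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
cheapBuildModel g = g (:) []

-- enumFromTo in producer form: the "cheap" generator
eft :: Int -> Int -> (Int -> b -> b) -> b -> b
eft lo hi c n = go lo
  where go i | i > hi    = n
             | otherwise = i `c` go (i + 1)

-- The "fold/cheapBuild" rule rewrites
--     foldr k z (cheapBuildModel (eft lo hi))
-- to  eft lo hi k z, eliminating the intermediate list entirely:
unfused, fused :: Int -> Int -> Int
unfused lo hi = foldr (+) 0 (cheapBuildModel (eft lo hi))
fused   lo hi = eft lo hi (+) 0
```

Both compute the same sum; the point of CONLIKE is that GHC may apply this rewrite even when it duplicates the producer, because re-running `eft`'s arithmetic is cheaper than keeping the list shared.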
Happily it's almost all simply a change to the libraries, not the compiler itself.
I just need to execute on this, but I keep failing to get round to it. Below is the beginning. One missing piece is that I need to replace the hack for `build` in the occurrence analyser, so that it works for `cheapBuild` too. (At least until we have Ilya's cardinality analyser.)
Simon
```
diff --git a/GHC/Base.lhs b/GHC/Base.lhs
index 6a36eb5..b78edf5 100644
--- a/GHC/Base.lhs
+++ b/GHC/Base.lhs
@@ -304,6 +304,12 @@ build :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
+cheapBuild :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
+{-# INLINE CONLIKE [1] cheapBuild #-}
+-- cheapBuild is just like build, except that it is CONLIKE
+-- See Note [cheapBuild]
+cheapBuild g = g (:) []
+
-- | A list producer that can be fused with 'foldr'.
-- This function is merely
--
@@ -320,6 +326,8 @@ augment g xs = g (:) xs
{-# RULES
"fold/build" forall k z (g::forall b. (a->b->b) -> b -> b) .
foldr k z (build g) = g k z
+"fold/cheapBuild" forall k z (g::forall b. (a->b->b) -> b -> b) .
+ foldr k z (cheapBuild g) = g k z
"foldr/augment" forall k z xs (g::forall b. (a->b->b) -> b -> b) .
foldr k z (augment g xs) = g k (foldr k z xs)
@@ -343,6 +351,12 @@ augment g xs = g (:) xs
"augment/build" forall (g::forall b. (a->b->b) -> b -> b)
(h::forall b. (a->b->b) -> b -> b) .
augment g (build h) = build (\c n -> g c (h c n))
+
+"augment/cheapBuild" forall (g::forall b. (a->b->b) -> b -> b)
+ (h::forall b. (a->b->b) -> b -> b) .
+ augment g (cheapBuild h) = build (\c n -> g c (h c n))
+ -- 'augment' doesn't necessarily have a cheap argument, so we revert to 'build'
+
"augment/nil" forall (g::forall b. (a->b->b) -> b -> b) .
augment g [] = build g
#-}
@@ -351,6 +365,20 @@ augment g xs = g (:) xs
-- augment g (augment h t) = augment (\cn -> g c (h c n)) t
\end{code}
+Note [cheapBuild]
+~~~~~~~~~~~~~~~~~
+cheapBuild is just like build, except that it is CONLIKE
+
+It is used in situations where fusion is more important than sharing,
+i.e. in situations where its argument function 'g' in (cheapBuild g) is
+cheap.
+
+Main example: enumerations of one kind or another:
+ f x = let xs = [x..]
+ go = \y. ....go y'....(map (h y) xs)...
+ in ...
+Here we would like to fuse the map with the [x..]
+
----------------------------------------------
-- map
@@ -831,7 +859,7 @@ a `iShiftRL#` b | b >=# WORD_SIZE_IN_BITS# = 0#
-- Rules for C strings (the functions themselves are now in GHC.CString)
{-# RULES
-"unpack" [~1] forall a . unpackCString# a = build (unpackFoldrCString# a)
+"unpack" [~1] forall a . unpackCString# a = cheapBuild (unpackFoldrCString# a)
"unpack-list" [1] forall a . unpackFoldrCString# a (:) [] = unpackCString# a
"unpack-append" forall a n . unpackFoldrCString# a (:) n = unpackAppendCString# a n
diff --git a/GHC/Enum.lhs b/GHC/Enum.lhs
index cea3ced..561a995 100644
--- a/GHC/Enum.lhs
+++ b/GHC/Enum.lhs
@@ -376,9 +376,9 @@ instance Enum Char where
enumFromThenTo (C# x1) (C# x2) (C# y) = efdtChar (ord# x1) (ord# x2) (ord# y)
{-# RULES
-"eftChar" [~1] forall x y. eftChar x y = build (\c n -> eftCharFB c n x y)
-"efdChar" [~1] forall x1 x2. efdChar x1 x2 = build (\ c n -> efdCharFB c n x1 x2)
-"efdtChar" [~1] forall x1 x2 l. efdtChar x1 x2 l = build (\ c n -> efdtCharFB c n x1 x2 l)
+"eftChar" [~1] forall x y. eftChar x y = cheapBuild (\c n -> eftCharFB c n x y)
+"efdChar" [~1] forall x1 x2. efdChar x1 x2 = cheapBuild (\ c n -> efdCharFB c n x1 x2)
+"efdtChar" [~1] forall x1 x2 l. efdtChar x1 x2 l = cheapBuild (\ c n -> efdtCharFB c n x1 x2 l)
"eftCharList" [1] eftCharFB (:) [] = eftChar
"efdCharList" [1] efdCharFB (:) [] = efdChar
"efdtCharList" [1] efdtCharFB (:) [] = efdtChar
@@ -510,7 +510,7 @@ instance Enum Int where
-- In particular, we have rules for deforestation
{-# RULES
-"eftInt" [~1] forall x y. eftInt x y = build (\ c n -> eftIntFB c n x y)
+"eftInt" [~1] forall x y. eftInt x y = cheapBuild (\ c n -> eftIntFB c n x y)
"eftIntList" [1] eftIntFB (:) [] = eftInt
#-}
@@ -539,7 +539,7 @@ eftIntFB c n x0 y | x0 ># y = n
{-# RULES
"efdtInt" [~1] forall x1 x2 y.
- efdtInt x1 x2 y = build (\ c n -> efdtIntFB c n x1 x2 y)
+ efdtInt x1 x2 y = cheapBuild (\ c n -> efdtIntFB c n x1 x2 y)
"efdtIntUpList" [1] efdtIntFB (:) [] = efdtInt
#-}
@@ -646,8 +646,8 @@ instance Enum Integer where
enumFromThenTo x y lim = enumDeltaToInteger x (y-x) lim
{-# RULES
-"enumDeltaInteger" [~1] forall x y. enumDeltaInteger x y = build (\c _ -> enumDeltaIntegerFB c x y)
-"efdtInteger" [~1] forall x y l.enumDeltaToInteger x y l = build (\c n -> enumDeltaToIntegerFB c n x y l)
+"enumDeltaInteger" [~1] forall x y. enumDeltaInteger x y = cheapBuild (\c _ -> enumDeltaIntegerFB c x y)
+"efdtInteger" [~1] forall x y l.enumDeltaToInteger x y l = cheapBuild (\c n -> enumDeltaToIntegerFB c n x y l)
"enumDeltaInteger" [1] enumDeltaIntegerFB (:) = enumDeltaInteger
"enumDeltaToInteger" [1] enumDeltaToIntegerFB (:) [] = enumDeltaToInteger
#-}
```
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 7.4.2 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
<!-- {"blocked_by":[],"summary":"Implement cheap build","status":"New","operating_system":"","component":"Compiler","related":[],"milestone":"","resolution":"Unresolved","owner":{"tag":"Unowned"},"version":"7.4.2","keywords":[],"differentials":[],"test_case":"","architecture":"","cc":[""],"type":"Bug","description":"We sometimes see stuff like this:\r\n{{{\r\nf n ps = let ys = [1..x]\r\n in map (\\zs. ys ++ zs) ps\r\n}}}\r\nYou might think the `(++)` would fuse with the `[1..x]`, via foldr/build fusion, but it doesn't. Why not? Because it would be WRONG to do so in this case:\r\n{{{\r\nf ns ps = let ys = map expensive ns\r\n in map (\\zs. ys ++ zs) ps\r\n}}}\r\nIf we fused the `(++)` with the `map` we might call `expensive` once for each element of `ps`.\r\n\r\nThis is fairly easy to fix. The point is that `[1..x]` is cheap; we'd prefer to fuse it even if doing so involves computing 1, 1+1, 2+1, etc multiple times. Suppose we express this fact thusly:\r\n{{{\r\nenumFromTo lo hi = cheapBuild (\\cn. ....lo...hi...)\r\nmap f xs = build (\\cn. ...f...xs...)\r\n}}}\r\nNow we want the `foldr/cheapBuild` rule to fire even if that would involve duplicating the call to `cheapBuild`. And we already have a way to do that: we make `cheapBuild` into a `CONLIKE` function.\r\n\r\nHappily it's almost all simply a change to the libraries, not the compiler itself.\r\n\r\nI just need to execute on this, but I keep failing to get round to it. Below is the beginning. One missing piece is that I need to replace the hack for `build` in the occurrence analyser, so that it works for `cheapBuild` too. (At least until we have Ilya's cardinality analyser.)\r\n\r\nSimon\r\n\r\n{{{\r\ndiff --git a/GHC/Base.lhs b/GHC/Base.lhs\r\nindex 6a36eb5..b78edf5 100644\r\n--- a/GHC/Base.lhs\r\n+++ b/GHC/Base.lhs\r\n@@ -304,6 +304,12 @@ build :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]\r\n\r\n build g = g (:) []\r\n\r\n+cheapBuild :: forall a. (forall b. 
-->We sometimes see stuff like this:
```
f x ps = let ys = [1..x]
         in map (\zs -> ys ++ zs) ps
```
You might think the `(++)` would fuse with the `[1..x]`, via foldr/build fusion, but it doesn't. Why not? Because it would be WRONG to do so in this case:
```
f ns ps = let ys = map expensive ns
          in map (\zs -> ys ++ zs) ps
```
If we fused the `(++)` with the `map`, we might call `expensive` once for each element of `ps`, recomputing `ys` instead of sharing it.
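To see where the fusion opportunity comes from: `(++)` consumes its first argument with `foldr`, so when that argument is a `build` the `fold/build` rule can rewrite the pair away. A minimal self-contained sketch; the names `build'`, `append`, and `eftFB` are illustrative stand-ins, not the library's:
```
{-# LANGUAGE RankNTypes #-}

-- Stand-in for GHC.Base.build: hand the producer (:) and [].
build' :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build' g = g (:) []

-- (++) written as a foldr consumer of its first argument.
append :: [a] -> [a] -> [a]
append xs ys = foldr (:) ys xs

-- An enumeration in producer form, roughly what an enumFromTo rule yields.
eftFB :: Int -> Int -> (Int -> b -> b) -> b -> b
eftFB lo hi c n = go lo
  where go i | i > hi    = n
             | otherwise = c i (go (i + 1))

-- The fold/build rule: foldr c n (build' g) rewrites to g c n,
-- eliminating the intermediate list.
unfused, fused :: [Int]
unfused = append (build' (eftFB 1 3)) [10]
fused   = eftFB 1 3 (:) [10]

main :: IO ()
main = print (unfused, fused)
```
Both expressions evaluate to `[1,2,3,10]`; the question in this ticket is only whether the rewrite may fire when the producer is shared.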
This is fairly easy to fix. The point is that `[1..x]` is cheap; we'd prefer to fuse it even if doing so involves computing 1, 1+1, 2+1, etc. multiple times. Suppose we express this fact thus:
```
enumFromTo lo hi = cheapBuild (\c n -> ....lo...hi...)
map f xs = build (\c n -> ...f...xs...)
```
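For the `map` half, the same mechanism applies: written in producer form, a `foldr` consumer fuses with it by the `fold/build` rule regardless of whether the producer is tagged cheap. A sketch with hypothetical standalone names (`mapFB'`, `build'`, not the library's):
```
{-# LANGUAGE RankNTypes #-}

build' :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build' g = g (:) []

-- map in producer form: mapFB' f xs c n assembles (map f xs) with c and n.
mapFB' :: (a -> b) -> [a] -> (b -> r -> r) -> r -> r
mapFB' f xs c n = foldr (c . f) n xs

-- Fusing sum (map (*2) [1..4]): foldr (+) 0 (build' g) rewrites to g (+) 0.
unfused, fused :: Int
unfused = foldr (+) 0 (build' (mapFB' (* 2) [1, 2, 3, 4]))
fused   = mapFB' (* 2) [1, 2, 3, 4] (+) 0

main :: IO ()
main = print (unfused, fused)
```
Both sides come out to 20; no intermediate list is built on the fused side.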
Now we want the `foldr/cheapBuild` rule to fire even if that would involve duplicating the call to `cheapBuild`. And we already have a way to do that: we make `cheapBuild` into a `CONLIKE` function.
Happily, it's almost entirely a change to the libraries, not to the compiler itself.
I just need to execute on this, but I keep failing to get round to it. Below is the beginning. One missing piece is that I need to replace the hack for `build` in the occurrence analyser, so that it works for `cheapBuild` too. (At least until we have Ilya's cardinality analyser.)
Simon
```
diff --git a/GHC/Base.lhs b/GHC/Base.lhs
index 6a36eb5..b78edf5 100644
--- a/GHC/Base.lhs
+++ b/GHC/Base.lhs
@@ -304,6 +304,12 @@ build :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
+cheapBuild :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
+{-# INLINE CONLIKE [1] cheapBuild #-}
+-- cheapBuild is just like build, except that it is CONLIKE
+-- See Note [cheapBuild]
+cheapBuild g = g (:) []
+
-- | A list producer that can be fused with 'foldr'.
-- This function is merely
--
@@ -320,6 +326,8 @@ augment g xs = g (:) xs
{-# RULES
"fold/build" forall k z (g::forall b. (a->b->b) -> b -> b) .
foldr k z (build g) = g k z
+"fold/cheapBuild" forall k z (g::forall b. (a->b->b) -> b -> b) .
+ foldr k z (cheapBuild g) = g k z
"foldr/augment" forall k z xs (g::forall b. (a->b->b) -> b -> b) .
foldr k z (augment g xs) = g k (foldr k z xs)
@@ -343,6 +351,12 @@ augment g xs = g (:) xs
"augment/build" forall (g::forall b. (a->b->b) -> b -> b)
(h::forall b. (a->b->b) -> b -> b) .
augment g (build h) = build (\c n -> g c (h c n))
+
+"augment/cheapBuild" forall (g::forall b. (a->b->b) -> b -> b)
+ (h::forall b. (a->b->b) -> b -> b) .
+ augment g (cheapBuild h) = build (\c n -> g c (h c n))
+ -- 'augment' doesn't necessarily have a cheap argument, so we revert to 'build'
+
"augment/nil" forall (g::forall b. (a->b->b) -> b -> b) .
augment g [] = build g
#-}
@@ -351,6 +365,20 @@ augment g xs = g (:) xs
-- augment g (augment h t) = augment (\cn -> g c (h c n)) t
\end{code}
+Note [cheapBuild]
+~~~~~~~~~~~~~~~~~
+cheapBuild is just like build, except that it is CONLIKE
+
+It is used in situations where fusion is more important than sharing,
+i.e. in situations where its argument function 'g' in (cheapBuild g) is
+cheap.
+
+Main example: enumerations of one kind or another:
+ f x = let xs = [x..]
+ go = \y. ....go y'....(map (h y) xs)...
+ in ...
+Here we would like to fuse the map with the [x..]
+
----------------------------------------------
-- map
@@ -831,7 +859,7 @@ a `iShiftRL#` b | b >=# WORD_SIZE_IN_BITS# = 0#
-- Rules for C strings (the functions themselves are now in GHC.CString)
{-# RULES
-"unpack" [~1] forall a . unpackCString# a = build (unpackFoldrCString# a)
+"unpack" [~1] forall a . unpackCString# a = cheapBuild (unpackFoldrCString# a)
"unpack-list" [1] forall a . unpackFoldrCString# a (:) [] = unpackCString# a
"unpack-append" forall a n . unpackFoldrCString# a (:) n = unpackAppendCString# a n
diff --git a/GHC/Enum.lhs b/GHC/Enum.lhs
index cea3ced..561a995 100644
--- a/GHC/Enum.lhs
+++ b/GHC/Enum.lhs
@@ -376,9 +376,9 @@ instance Enum Char where
enumFromThenTo (C# x1) (C# x2) (C# y) = efdtChar (ord# x1) (ord# x2) (ord# y)
{-# RULES
-"eftChar" [~1] forall x y. eftChar x y = build (\c n -> eftCharFB c n x y)
-"efdChar" [~1] forall x1 x2. efdChar x1 x2 = build (\ c n -> efdCharFB c n x1 x2)
-"efdtChar" [~1] forall x1 x2 l. efdtChar x1 x2 l = build (\ c n -> efdtCharFB c n x1 x2 l)
+"eftChar" [~1] forall x y. eftChar x y = cheapBuild (\c n -> eftCharFB c n x y)
+"efdChar" [~1] forall x1 x2. efdChar x1 x2 = cheapBuild (\ c n -> efdCharFB c n x1 x2)
+"efdtChar" [~1] forall x1 x2 l. efdtChar x1 x2 l = cheapBuild (\ c n -> efdtCharFB c n x1 x2 l)
"eftCharList" [1] eftCharFB (:) [] = eftChar
"efdCharList" [1] efdCharFB (:) [] = efdChar
"efdtCharList" [1] efdtCharFB (:) [] = efdtChar
@@ -510,7 +510,7 @@ instance Enum Int where
-- In particular, we have rules for deforestation
{-# RULES
-"eftInt" [~1] forall x y. eftInt x y = build (\ c n -> eftIntFB c n x y)
+"eftInt" [~1] forall x y. eftInt x y = cheapBuild (\ c n -> eftIntFB c n x y)
"eftIntList" [1] eftIntFB (:) [] = eftInt
#-}
@@ -539,7 +539,7 @@ eftIntFB c n x0 y | x0 ># y = n
{-# RULES
"efdtInt" [~1] forall x1 x2 y.
- efdtInt x1 x2 y = build (\ c n -> efdtIntFB c n x1 x2 y)
+ efdtInt x1 x2 y = cheapBuild (\ c n -> efdtIntFB c n x1 x2 y)
"efdtIntUpList" [1] efdtIntFB (:) [] = efdtInt
#-}
@@ -646,8 +646,8 @@ instance Enum Integer where
enumFromThenTo x y lim = enumDeltaToInteger x (y-x) lim
{-# RULES
-"enumDeltaInteger" [~1] forall x y. enumDeltaInteger x y = build (\c _ -> enumDeltaIntegerFB c x y)
-"efdtInteger" [~1] forall x y l.enumDeltaToInteger x y l = build (\c n -> enumDeltaToIntegerFB c n x y l)
+"enumDeltaInteger" [~1] forall x y. enumDeltaInteger x y = cheapBuild (\c _ -> enumDeltaIntegerFB c x y)
+"efdtInteger" [~1] forall x y l.enumDeltaToInteger x y l = cheapBuild (\c n -> enumDeltaToIntegerFB c n x y l)
"enumDeltaInteger" [1] enumDeltaIntegerFB (:) = enumDeltaInteger
"enumDeltaToInteger" [1] enumDeltaToIntegerFB (:) [] = enumDeltaToInteger
#-}
```
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 7.4.2 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Simon Peyton Jones
https://gitlab.haskell.org/ghc/ghc/-/issues/917 -O introduces space leak 2019-07-07T19:16:18Z claus.reinke@talk21.com

consider the following variant of a popular space problem
```
initlast :: (()->[a]) -> ([a], a)
initlast xs = (init (xs ()), last (xs ()))
main = print $ case initlast (\()->[0..1000000000]) of
         (init, last) -> (length init, last)
```
The long list has been wrapped in an abstraction to avoid
sharing, gaining constant-space processing rather than
the space leak in the original code; see
http://www.haskell.org/pipermail/haskell-cafe/2006-September/018447.html
http://www.haskell.org/pipermail/haskell-cafe/2006-September/018464.html
This works nicely if compiled without -O, but unfortunately
-O reintroduces the space leak (apparently lifting out and sharing
the common, space-expensive subexpression).
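For reference, a runnable miniature of the program above (the bound is shrunk from 10^9 to 10^5 so it finishes instantly); the wrapped and unwrapped variants agree in value, and the difference -O makes is purely in heap residency:
```
-- Miniature of the ticket's example. Passing the list as a function of ()
-- is meant to stop GHC sharing one fully materialised list between 'init'
-- and 'last'; the report is that -O floats the two applications of
-- 'xs ()' back together, restoring the shared (leaky) list.
initlast :: (() -> [a]) -> ([a], a)
initlast xs = (init (xs ()), last (xs ()))

main :: IO ()
main = print $ case initlast (\() -> [0 .. 100000 :: Int]) of
                 (ini, lst) -> (length ini, lst)
```
This prints `(100000,100000)` whether or not the sharing is defeated; only the heap profile differs.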
1. I didn't see a ticket for this issue, so I wanted to record the example
2. avoiding sharing isn't always possible, so it would be nice if one could
"bypass" sharing in such cases. In the old G-machine, that might have
been as simple as introducing a pragma telling the compiler to omit
the update after the eval in the code for some subexpression (so the
original application node would not be overwritten by the space-expensive
result, trading time for space). Is there a chance for a similar trick
in GHC? Then one might write code like this (handwaving :-):
```
initlast :: [a] -> ([a], a)
initlast xs = (init ({-# COPY #-} xs), last ({-# COPY #-} xs))
main = print $ case initlast [0..1000000000] of
         (init, last) -> (length init, last)
```
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 6.5 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | Unknown |
| Architecture | Unknown |
</details>