GHC issues (https://gitlab.haskell.org/ghc/ghc/-/issues)

# Issue #24251: `filter (const False)` leaks with -O2
Reporter: akegalj (2024-03-27)

See also
* #15631
* [This discussion on !10987](https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10987#note_538951)
* #21741
* #24334
## Summary
Program:
```haskell
main = print $ filter (const False) [0..]
```
leaks when compiled with `ghc -O2`. It doesn't leak when compiled with `ghc -O0`.
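Since the predicate rejects every element, the result over any finite prefix is empty, and the full program should simply diverge while retaining nothing. A finite-prefix check of the same pipeline (for reference only; this is not the leaking program itself):

```haskell
-- Same pipeline over a finite prefix: no element survives the filter,
-- so the result is [] and nothing from the input list needs retaining.
emptied :: [Int]
emptied = filter (const False) (take 1000000 [0 ..])

main :: IO ()
main = print emptied
```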
## Steps to reproduce
```bash
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 9.8.1
$ ghc -O2 Test.hs
$ ./Test
```
## Expected behavior
When compiled with `-O2` and run, it should loop in constant space.
## Environment
* GHC version used: 9.8.1, 9.4.8
Optional:
* Operating System: NixOS 23.11.750.7c4c20509c43 (Tapir)
* System Architecture: x86_64 Intel(R) Core(TM) i5-2520M

# Issue #24282: SpecConstr regression in NoFib's `spectral/ansi`
Reporter: Sebastian Graf (2024-02-05)

With a recent master GHC, I observe a perf regression in NoFib's [spectral/ansi](https://gitlab.haskell.org/ghc/nofib/-/blob/0f330c9686ba1a96cc9db010f061a84e748057e5/spectral/ansi/Main.hs) in the presence of `-fspec-constr`.
```bash
$ _build/stage1/bin/ghc -O Main.hs
$ ./Main 400 +RTS -t < ansi.stdout > /dev/null
<<ghc: 24321415416 bytes, 5880 GCs, 258273/364656 avg/max bytes residency (28 samples), 7M in use, 0.000 INIT (0.000 elapsed), 2.297 MUT (2.290 elapsed), 0.167 GC (0.171 elapsed) :ghc>>
$ _build/stage1/bin/ghc -O -fspec-constr Main.hs
$ ./Main 400 +RTS -t < ansi.stdout > /dev/null
<<ghc: 32211087200 bytes, 6471 GCs, 253717/401024 avg/max bytes residency (30 samples), 8M in use, 0.000 INIT (0.000 elapsed), 2.836 MUT (2.827 elapsed), 0.211 GC (0.215 elapsed) :ghc>>
```
(An aside: perhaps my piping ansi.**stdout** in as the benchmark's input is an incorrect run, but we shouldn't regress either way.)
Note that the second run allocates 33% more. This is due to SpecConstr introducing reboxing.
Having a hunch, I reverted !11689 and got the following results:
```bash
$ _build/stage1/bin/ghc -O -fspec-constr Main.hs
$ ./Main 400 +RTS -t < ansi.stdout > /dev/null
<<ghc: 22238093736 bytes, 5407 GCs, 258627/350776 avg/max bytes residency (22 samples), 7M in use, 0.000 INIT (0.000 elapsed), 2.127 MUT (2.120 elapsed), 0.125 GC (0.129 elapsed) :ghc>>
```
So that improved; hence !11689 introduces a 40% allocation regression for `spectral/ansi`. Perhaps we should re-evaluate that patch, or find a way for it not to regress.
---
I began to diagnose.
Here's a diff of the specialisations we do according to `-ddump-spec-constr`, with !11689 reverted (i.e., OLD) first:
```
loop [Occ=LoopBreaker] :: Int -> [Char] -> Interact
[LclId,
Arity=2,
Str=<L><L>,
Unf=Unf{Src=<vanilla>, TopLvl=True,
Value=True, ConLike=True, WorkFree=True, Expandable=True,
Guidance=IF_ARGS [60 30] 585 60},
RULES: "SC:loop0"
forall (sc :: GHC.Prim.Int#).
loop (GHC.Types.I# sc) (GHC.Types.[] @Char)
= $sloop sc
"SC:loop1"
forall (sc :: Char) (sc :: [Char]) (sc :: Int).
loop sc (GHC.Types.: @Char sc sc)
= $sloop sc sc sc
"SC:loop2"
forall (sc :: GHC.Prim.Int#). loop (GHC.Types.I# sc) = $sloop sc]
```
and now with !11689 (i.e., NEW):
```
loop [Occ=LoopBreaker] :: Int -> [Char] -> Interact
[LclId,
Arity=2,
Str=<L><L>,
Unf=Unf{Src=<vanilla>, TopLvl=True,
Value=True, ConLike=True, WorkFree=True, Expandable=True,
Guidance=IF_ARGS [60 30] 585 60},
RULES: "SC:loop0"
forall (sc :: GHC.Prim.Int#). loop (GHC.Types.I# sc) = $sloop sc
"SC:loop1"
forall (sc :: Char) (sc :: [Char]) (sc :: Int).
loop sc (GHC.Types.: @Char sc sc)
= $sloop sc sc sc]
```
Apparently, we lose the specialisation
```
RULES: "SC:loop0"
forall (sc :: GHC.Prim.Int#).
loop (GHC.Types.I# sc) (GHC.Types.[] @Char)
= $sloop sc
```
which IMO is an instance of the (OLD) rule
```
"SC:loop2"
forall (sc :: GHC.Prim.Int#). loop (GHC.Types.I# sc) = $sloop sc]
```
With !11689, we never generate the first one, only the second one. But the `$sloop` of the second one needs to rebox its `I# sc`, resulting in the huge regression, whereas the `$sloop` for the `[]` specialisation does not.
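"Reboxing" here means the specialised worker receives the raw `Int#` but must re-allocate the `I#` box before it can proceed. A contrived source-level sketch of that shape (the names are illustrative, not taken from the dump):

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#)

-- The original loop consumes a boxed Int.
origLoop :: Int -> Int
origLoop n = n

-- A worker specialised on the I# pattern: it gets the raw Int# but
-- must rebuild the box to call the original code -- one fresh
-- allocation per call, which is the regression described above.
sloop :: Int# -> Int
sloop sc = origLoop (I# sc)

-- Boxed entry point, convenient for exercising the worker.
runSloop :: Int -> Int
runSloop (I# sc) = sloop sc
```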
Although I'm tempted to accept this regression because it is ultimately a result of a lack of awareness of reboxing in SpecConstr, I wonder why we so easily discard the specialisation for `[]`.

Milestone: 9.10.1

# Issue #24206: Performance of nofib/spectral/mandel2 is absurdly fragile
Reporter: Simon Peyton Jones (2024-01-16)

Look at this code in `nofib/spectral/mandel2`:
```
main = do
[n] <- getArgs
replicateM_ (read n) $ do
-- m should always be smaller than size, but the compiler can't know that
m <- length <$> getArgs
let size' = max m size
finite (build_tree (0,0) (size',size' `div` 2)) `seq` return ()
```
Very odd. We have just done `getArgs` so we know that `m <- length <$> getArgs` will yield 1. Hence `size' = size` always.
Now, if we copy that last line (an inlining-threshold decision) we end up with code like
```
if m<size then
finite (build_tree (0,0) (size,size `div` 2)) `seq` return ()
else
finite (build_tree (0,0) (m,m `div` 2)) `seq` return ()
```
The first of these duplicated expressions is now a CAF and so can be floated out of the loop and done only once -- thereby defeating the whole purpose of the `replicate`. That happens to occur in HEAD.
But if that expression is a little bit too big, it won't be duplicated, and we won't see a CAF to lift out, and the `replicate` does its job. Result: execution is MUCH more expensive.
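The effect of the float-out can be seen directly with a top-level CAF: however many times the loop runs, the shared expression is computed only once. A small demonstration (the `trace` message goes to stderr exactly once):

```haskell
import Control.Monad (replicateM_)
import Debug.Trace (trace)

-- A top-level CAF: its value is computed on first use and then shared,
-- so "computed" is emitted to stderr a single time no matter how often
-- the loop body demands it.
shared :: Int
shared = trace "computed" (sum [1 .. 1000])

main :: IO ()
main = replicateM_ 3 (print shared)  -- prints 500500 three times
```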
Ironically, this fragility was introduced by @sgraf812 in the following commit, which was intended to stabilise nofib results!
```
commit 8632268ad8405f0c01aaad3ad16e23c65771ba49
Author: Sebastian Graf <sebastian.graf@kit.edu>
Date: Sun Dec 30 19:36:23 2018 +0100
Stabilise benchmarks wrt. GC
Summary:
This is due to #15999, a follow-up on #5793 and #15357 and changes all
benchmarks, some of them (i.e. `wheel-sieve1`, `awards`) rather drastically.
The general plan is outlined in #15999: Identify GC-sensitive benchmarks by
looking at how productivity rates change over different nursery sizes and
iterate `main` of these benchmarks often enough for the non-monotony and
discontinuities to go away.
I was paying attention that the benchmarked logic is actually run $n times more
often, rather than just benchmarking IO operations printing the result of CAFs.
```
Specifically the change was this:
```
-main = if finite(build_tree (0,0) (size,size `div` 2)) then
- print "Success"
- else
- print "Fail"
-
-
+main = do
+ [n] <- getArgs
+ replicateM_ (read n) $ do
+ -- m should always be smaller than size, but the compiler can't know that
+ m <- length <$> getArgs
+ let size' = max m size
+ finite (build_tree (0,0) (size',size' `div` 2)) `seq` return ()
```
Let's fix this fragility.

Assignee: Simon Peyton Jones

# Issue #24264: StgToCmm: Don't generate code for no-op continuations
Reporter: Matthew Craven (clyring@gmail.com) (2024-01-01)

## Motivation
In `IO` and `ST` code it's very common to see a function end with `pure $! someExpression`.
After unarisation, that looks like `case someExpression of vx { __DEFAULT -> Solo# [vx]; };`, where `Solo#` is the constructor for the unary unboxed tuple. Today, the `Cmm` code we generate for this `case` expression does the following:
1. Push a stack frame for `someExpression` to return to,
2. Jump to the code for `someExpression`.
3. When that returns, the stack frame we pushed earlier does nothing except pop itself from the stack and jump to the next stack frame below it.
It would be faster and more direct if we didn't bother pushing a stack frame, skipping steps 1 and 3. We can get away with this because the return convention for a lifted type like that of `someExpression` is compatible with the return convention for a unary unboxed tuple containing a lifted type like `Solo# [vx]`. (The return conventions are not identical: The former promises to return a properly tagged pointer to an evaluated object, while the latter only promises to return a pointer to an object that can be evaluated if need be.)
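The pattern in question, in source form: an `ST` computation whose last action is `pure $!` of the result (a minimal example, not taken from any particular library):

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Ends with `pure $! total`: after unarisation this becomes the
-- `case total of vx { __DEFAULT -> Solo# [vx]; }` continuation that
-- the proposal would avoid generating any code for.
sumST :: [Int] -> Int
sumST xs = runST $ do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  total <- readSTRef ref
  pure $! total
```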
## Proposal
Detect these trivial `of vx { __DEFAULT -> Solo# [vx] }` case continuations in `StgToCmm`, and don't generate any code for them.
Alternative: It is also plausible to instead expand our Stg-CSE to detect this situation and rewrite ``case someExpression of vx { __DEFAULT -> Solo# [vx]; };`` to just `someExpression`, but this is representationally confusing and may cause trouble for one of our other Stg passes. Even if it doesn't, there is no clear benefit to such a rewrite above and beyond the cheap check proposed above.

Assignee: Matthew Craven (clyring@gmail.com)

# Issue #21710: Optimize dataToTag# to do more work at compile time
Reporter: Andreas Klebinger (2023-12-21)

Currently the code for `dataToTag#` is a bit odd.
- [ ] We check at runtime if the constructor tag is too large to be stored in the ptr tag. This should only be done at runtime if the result type is a large constructor family. This would break `dataToTag# (unsafeCoerce# (x :: LargeConFamType) :: SmallConFamType)` but we don't consider this a sound use of unsafeCoerce# anyway. Turn this into a compile time check.
- [x] Make use of the tagSig/lfInfo to check if the argument is already tagged.
- [ ] Share the two code paths which take the constructor tag from the info table.

# Issue #3458: Allocation where none should happen
Reporter: guest (2023-12-21)

These two functions, according to profiling, do a lot of allocation:
```
gen d r n m s p
| r == ll = do
pokeElemOff p n 0x0a
gen d 0 (n+1) (m+1) s p
| n == m = do
pokeElemOff p n 0x0a
return (s, if r == 0 then m else m+1)
| otherwise = do
let t = next s
pokeElemOff p n (pick d t)
gen d (r+1) (n+1) m t p
------------------------------------------------------------------------
pick (c, p) r = loop 0 where
loop i = if r < unsafeAt p i
then fromIntegral $ unsafeAt c i :: Word8
else loop (i+1)
```
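For readability, a list-based sketch of what `pick` computes, assuming the pair holds character codes alongside cumulative probability thresholds (the array-based original performs the same linear scan with `unsafeAt`):

```haskell
import Data.Char (ord)
import Data.Word (Word32, Word8)

-- Scan the cumulative thresholds left to right; return the character
-- (as a Word8 code) at the first index whose threshold exceeds r.
pickList :: ([Char], [Word32]) -> Word32 -> Word8
pickList (cs, ps) r = go cs ps
  where
    go (c : _) (p : _) | r < p = fromIntegral (ord c)
    go (_ : cs') (_ : ps') = go cs' ps'
    go _ _ = error "pickList: r is not below any threshold"
```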
------------------------------------------------------------------------
Core for pick:
```
[GlobalId]
[Arity 3
NoCafRefs
Str: DmdType LLL]
$w$spick_r3kC =
\ (ww_s33o :: GHC.Prim.ByteArray#)
(ww1_s33v :: GHC.Prim.ByteArray#)
(ww2_s33A :: GHC.Prim.Word#) ->
letrec {
$wloop_s38I :: GHC.Prim.Int# -> GHC.Prim.Word#
[Arity 1
Str: DmdType L]
$wloop_s38I =
\ (ww3_s339 :: GHC.Prim.Int#) ->
__scc {pick main:Main !}
case GHC.Prim.ltWord#
ww2_s33A (GHC.Prim.indexWord32Array# ww1_s33v ww3_s339)
of wild_X3O {
GHC.Bool.False -> $wloop_s38I (GHC.Prim.+# ww3_s339 1);
GHC.Bool.True ->
GHC.Prim.narrow8Word# (GHC.Prim.indexWord32Array# ww_s33o ww3_s339)
}; } in
case __scc {pick main:Main}
case $wloop_s38I 0 of ww3_s33d { __DEFAULT ->
GHC.Word.W8# ww3_s33d
}
of ww3_s33D { GHC.Word.W8# ww4_s33E ->
ww4_s33E
}
```
------------------------------------------------------------------------
Core for gen (long):
```
Rec {
$s$wa_r3mi :: GHC.Prim.State# GHC.Prim.RealWorld
-> GHC.Prim.Addr#
-> GHC.Prim.Word#
-> GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Prim.ByteArray#
-> GHC.Types.Int
-> GHC.Types.Int
-> GHC.Types.Int
-> GHC.Prim.ByteArray#
-> GHC.Types.Int
-> GHC.Types.Int
-> GHC.Types.Int
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
[GlobalId]
[Arity 14
NoCafRefs]
$s$wa_r3mi =
\ (sc_s3es :: GHC.Prim.State# GHC.Prim.RealWorld)
(sc1_s3et :: GHC.Prim.Addr#)
(sc2_s3eu :: GHC.Prim.Word#)
(sc3_s3ev :: GHC.Prim.Int#)
(sc4_s3ew :: GHC.Prim.Int#)
(sc5_s3ex :: GHC.Prim.Int#)
(sc6_s3ey :: GHC.Prim.ByteArray#)
(sc7_s3ez :: GHC.Types.Int)
(sc8_s3eA :: GHC.Types.Int)
(sc9_s3eB :: GHC.Types.Int)
(sc10_s3eC :: GHC.Prim.ByteArray#)
(sc11_s3eD :: GHC.Types.Int)
(sc12_s3eE :: GHC.Types.Int)
(sc13_s3eF :: GHC.Types.Int) ->
let {
m_s39b :: GHC.Types.Int
[]
m_s39b = GHC.Types.I# sc3_s3ev } in
((__scc {gen main:Main !}
case sc5_s3ex of wild_B1 {
__DEFAULT ->
case GHC.Prim.==# sc4_s3ew sc3_s3ev of wild1_X3F {
GHC.Bool.False ->
(\ (eta_a2vm :: GHC.Prim.State# GHC.Prim.RealWorld) ->
let {
ww_s33e :: GHC.Prim.Word#
[]
ww_s33e =
GHC.Prim.remWord#
(GHC.Prim.narrow32Word#
(GHC.Prim.plusWord#
(GHC.Prim.narrow32Word# (GHC.Prim.timesWord# __word 3877 sc2_s3eu))
__word 29573))
__word 139968 } in
case $w$spick_r3k8 sc10_s3eC sc6_s3ey ww_s33e
of ww1_s33i { __DEFAULT ->
case GHC.Prim.writeWord8OffAddr#
@ GHC.Prim.RealWorld sc1_s3et sc4_s3ew ww1_s33i eta_a2vm
of s21_a2wV { __DEFAULT ->
$s$wa_r3mi
s21_a2wV
sc1_s3et
ww_s33e
sc3_s3ev
(GHC.Prim.+# sc4_s3ew 1)
(GHC.Prim.+# wild_B1 1)
sc6_s3ey
sc7_s3ez
sc8_s3eA
sc9_s3eB
sc10_s3eC
sc11_s3eD
sc12_s3eE
sc13_s3eF
}
})
`cast` (sym ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int))
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
~
GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int));
GHC.Bool.True ->
(\ (eta_a2vm :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.writeWord8OffAddr#
@ GHC.Prim.RealWorld sc1_s3et sc4_s3ew __word 10 eta_a2vm
of s21_a2wV { __DEFAULT ->
(# s21_a2wV,
(GHC.Word.W32# sc2_s3eu,
case wild_B1 of wild2_X4o {
__DEFAULT -> GHC.Types.I# (GHC.Prim.+# sc3_s3ev 1); 0 -> m_s39b
}) #)
})
`cast` (sym ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int))
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
~
GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int))
};
60 ->
(\ (eta_a2vm :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.writeWord8OffAddr#
@ GHC.Prim.RealWorld sc1_s3et sc4_s3ew __word 10 eta_a2vm
of s21_a2wV { __DEFAULT ->
$s$wa1_r3mk
s21_a2wV
sc1_s3et
sc2_s3eu
(GHC.Prim.+# sc3_s3ev 1)
(GHC.Prim.+# sc4_s3ew 1)
sc6_s3ey
sc7_s3ez
sc8_s3eA
sc9_s3eB
sc10_s3eC
sc11_s3eD
sc12_s3eE
sc13_s3eF
})
`cast` (sym ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int))
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
~
GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int))
})
`cast` ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int)
:: GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int)
~
GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)))
sc_s3es
$s$wa1_r3mk :: GHC.Prim.State# GHC.Prim.RealWorld
-> GHC.Prim.Addr#
-> GHC.Prim.Word#
-> GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Prim.ByteArray#
-> GHC.Types.Int
-> GHC.Types.Int
-> GHC.Types.Int
-> GHC.Prim.ByteArray#
-> GHC.Types.Int
-> GHC.Types.Int
-> GHC.Types.Int
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
[GlobalId]
[Arity 13
NoCafRefs]
$s$wa1_r3mk =
\ (sc_s3fH :: GHC.Prim.State# GHC.Prim.RealWorld)
(sc1_s3fI :: GHC.Prim.Addr#)
(sc2_s3fJ :: GHC.Prim.Word#)
(sc3_s3fK :: GHC.Prim.Int#)
(sc4_s3fL :: GHC.Prim.Int#)
(sc5_s3fM :: GHC.Prim.ByteArray#)
(sc6_s3fN :: GHC.Types.Int)
(sc7_s3fO :: GHC.Types.Int)
(sc8_s3fP :: GHC.Types.Int)
(sc9_s3fQ :: GHC.Prim.ByteArray#)
(sc10_s3fR :: GHC.Types.Int)
(sc11_s3fS :: GHC.Types.Int)
(sc12_s3fT :: GHC.Types.Int) ->
let {
m_s39b :: GHC.Types.Int
[]
m_s39b = GHC.Types.I# sc3_s3fK } in
((__scc {gen main:Main !}
case GHC.Prim.==# sc4_s3fL sc3_s3fK of wild_X3F {
GHC.Bool.False ->
(\ (eta_a2vm :: GHC.Prim.State# GHC.Prim.RealWorld) ->
let {
ww_s33e :: GHC.Prim.Word#
[]
ww_s33e =
GHC.Prim.remWord#
(GHC.Prim.narrow32Word#
(GHC.Prim.plusWord#
(GHC.Prim.narrow32Word# (GHC.Prim.timesWord# __word 3877 sc2_s3fJ))
__word 29573))
__word 139968 } in
case $w$spick_r3k8 sc9_s3fQ sc5_s3fM ww_s33e
of ww1_s33i { __DEFAULT ->
case GHC.Prim.writeWord8OffAddr#
@ GHC.Prim.RealWorld sc1_s3fI sc4_s3fL ww1_s33i eta_a2vm
of s21_a2wV { __DEFAULT ->
$s$wa_r3mi
s21_a2wV
sc1_s3fI
ww_s33e
sc3_s3fK
(GHC.Prim.+# sc4_s3fL 1)
1
sc5_s3fM
sc6_s3fN
sc7_s3fO
sc8_s3fP
sc9_s3fQ
sc10_s3fR
sc11_s3fS
sc12_s3fT
}
})
`cast` (sym ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int))
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
~
GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int));
GHC.Bool.True ->
(\ (eta_a2vm :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.writeWord8OffAddr#
@ GHC.Prim.RealWorld sc1_s3fI sc4_s3fL __word 10 eta_a2vm
of s21_a2wV { __DEFAULT ->
(# s21_a2wV, (GHC.Word.W32# sc2_s3fJ, m_s39b) #)
})
`cast` (sym ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int))
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
~
GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int))
})
`cast` ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int)
:: GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int)
~
GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)))
sc_s3fH
end Rec }
$s$wa2_r3mm :: GHC.Prim.State# GHC.Prim.RealWorld
-> GHC.Prim.Addr#
-> GHC.Word.Word32
-> GHC.Prim.Int#
-> GHC.Prim.Int#
-> (Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32,
Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32)
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
[GlobalId]
[Arity 6
NoCafRefs]
$s$wa2_r3mm =
\ (sc_s3eH :: GHC.Prim.State# GHC.Prim.RealWorld)
(sc1_s3eI :: GHC.Prim.Addr#)
(sc2_s3eJ :: GHC.Word.Word32)
(sc3_s3eK :: GHC.Prim.Int#)
(sc4_s3eL :: GHC.Prim.Int#)
(sc5_s3eM :: (Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32,
Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32)) ->
let {
m_s39b :: GHC.Types.Int
[]
m_s39b = GHC.Types.I# sc3_s3eK } in
((__scc {gen main:Main !}
case GHC.Prim.==# sc4_s3eL sc3_s3eK of wild_X3F {
GHC.Bool.False ->
(\ (eta_a2vm :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case sc5_s3eM of w_X34x { (ww_s32X, ww1_s334) ->
case ww_s32X
of ww2_X34F
{ Data.Array.Base.UArray ww3_s32Z ww4_s330 ww5_s331 ww6_s332 ->
case ww1_s334
of ww7_X34X
{ Data.Array.Base.UArray ww8_s336 ww9_s337 ww10_s338 ww11_s339 ->
case __scc {next main:Main}
case sc2_s3eJ of wild1_a2Bw { GHC.Word.W32# y#_a2By ->
GHC.Word.W32#
(GHC.Prim.remWord#
(GHC.Prim.narrow32Word#
(GHC.Prim.plusWord#
(GHC.Prim.narrow32Word# (GHC.Prim.timesWord# __word 3877 y#_a2By))
__word 29573))
__word 139968)
}
of w1_X35g { GHC.Word.W32# ww12_s33e ->
case $w$spick_r3k8 ww6_s332 ww11_s339 ww12_s33e
of ww13_s33i { __DEFAULT ->
case GHC.Prim.writeWord8OffAddr#
@ GHC.Prim.RealWorld sc1_s3eI sc4_s3eL ww13_s33i eta_a2vm
of s21_a2wV { __DEFAULT ->
$s$wa_r3mi
s21_a2wV
sc1_s3eI
ww12_s33e
sc3_s3eK
(GHC.Prim.+# sc4_s3eL 1)
1
ww11_s339
ww10_s338
ww9_s337
ww8_s336
ww6_s332
ww5_s331
ww4_s330
ww3_s32Z
}
}
}
}
}
})
`cast` (sym ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int))
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
~
GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int));
GHC.Bool.True ->
(\ (eta_a2vm :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.writeWord8OffAddr#
@ GHC.Prim.RealWorld sc1_s3eI sc4_s3eL __word 10 eta_a2vm
of s21_a2wV { __DEFAULT ->
(# s21_a2wV, (sc2_s3eJ, m_s39b) #)
})
`cast` (sym ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int))
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
~
GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int))
})
`cast` ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int)
:: GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int)
~
GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)))
sc_s3eH
$wa1_r3mo :: (Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32,
Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32)
-> GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Word.Word32
-> GHC.Prim.Addr#
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
[GlobalId]
[Arity 7
NoCafRefs
Str: DmdType LLLLLLL]
$wa1_r3mo =
\ (w_s33r :: (Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32,
Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32))
(ww_s33u :: GHC.Prim.Int#)
(ww1_s33y :: GHC.Prim.Int#)
(ww2_s33C :: GHC.Prim.Int#)
(w1_s33E :: GHC.Word.Word32)
(ww3_s33H :: GHC.Prim.Addr#)
(w2_s33J :: GHC.Prim.State# GHC.Prim.RealWorld) ->
let {
m_s39b :: GHC.Types.Int
[]
m_s39b = GHC.Types.I# ww2_s33C } in
((__scc {gen main:Main !}
case ww_s33u of wild_B1 {
__DEFAULT ->
case GHC.Prim.==# ww1_s33y ww2_s33C of wild1_X3F {
GHC.Bool.False ->
(\ (eta_a2vm :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case w_s33r of w3_X34x { (ww4_s32X, ww5_s334) ->
case ww4_s32X
of ww6_X34F
{ Data.Array.Base.UArray ww7_s32Z ww8_s330 ww9_s331 ww10_s332 ->
case ww5_s334
of ww11_X34X
{ Data.Array.Base.UArray ww12_s336 ww13_s337 ww14_s338 ww15_s339 ->
case __scc {next main:Main}
case w1_s33E of wild11_a2Bw { GHC.Word.W32# y#_a2By ->
GHC.Word.W32#
(GHC.Prim.remWord#
(GHC.Prim.narrow32Word#
(GHC.Prim.plusWord#
(GHC.Prim.narrow32Word# (GHC.Prim.timesWord# __word 3877 y#_a2By))
__word 29573))
__word 139968)
}
of w4_X35g { GHC.Word.W32# ww16_s33e ->
case $w$spick_r3k8 ww10_s332 ww15_s339 ww16_s33e
of ww17_s33i { __DEFAULT ->
case GHC.Prim.writeWord8OffAddr#
@ GHC.Prim.RealWorld ww3_s33H ww1_s33y ww17_s33i eta_a2vm
of s21_a2wV { __DEFAULT ->
$s$wa_r3mi
s21_a2wV
ww3_s33H
ww16_s33e
ww2_s33C
(GHC.Prim.+# ww1_s33y 1)
(GHC.Prim.+# wild_B1 1)
ww15_s339
ww14_s338
ww13_s337
ww12_s336
ww10_s332
ww9_s331
ww8_s330
ww7_s32Z
}
}
}
}
}
})
`cast` (sym ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int))
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
~
GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int));
GHC.Bool.True ->
(\ (eta_a2vm :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.writeWord8OffAddr#
@ GHC.Prim.RealWorld ww3_s33H ww1_s33y __word 10 eta_a2vm
of s21_a2wV { __DEFAULT ->
(# s21_a2wV,
(w1_s33E,
case wild_B1 of wild2_X4o {
__DEFAULT -> GHC.Types.I# (GHC.Prim.+# ww2_s33C 1); 0 -> m_s39b
}) #)
})
`cast` (sym ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int))
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
~
GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int))
};
60 ->
(\ (eta_a2vm :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.writeWord8OffAddr#
@ GHC.Prim.RealWorld ww3_s33H ww1_s33y __word 10 eta_a2vm
of s21_a2wV { __DEFAULT ->
$s$wa2_r3mm
s21_a2wV
ww3_s33H
w1_s33E
(GHC.Prim.+# ww2_s33C 1)
(GHC.Prim.+# ww1_s33y 1)
w_s33r
})
`cast` (sym ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int))
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
~
GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int))
})
`cast` ((GHC.IOBase.:CoIO) (GHC.Word.Word32, GHC.Types.Int)
:: GHC.IOBase.IO (GHC.Word.Word32, GHC.Types.Int)
~
GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)))
w2_s33J
a2_r3mq :: (Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32,
Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32)
-> GHC.Types.Int
-> GHC.Types.Int
-> GHC.Types.Int
-> GHC.Word.Word32
-> GHC.Ptr.Ptr GHC.Word.Word8
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
(GHC.Word.Word32, GHC.Types.Int) #)
[GlobalId]
[Arity 7
NoCafRefs
Str: DmdType LU(L)U(L)U(L)LU(L)L]
a2_r3mq =
__inline_me (\ (w_s33r :: (Data.Array.Base.UArray
GHC.Types.Int GHC.Word.Word32,
Data.Array.Base.UArray GHC.Types.Int GHC.Word.Word32))
(w1_s33s :: GHC.Types.Int)
(w2_s33w :: GHC.Types.Int)
(w3_s33A :: GHC.Types.Int)
(w4_s33E :: GHC.Word.Word32)
(w5_s33F :: GHC.Ptr.Ptr GHC.Word.Word8)
(w6_s33J :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case w1_s33s of w7_X35h { GHC.Types.I# ww_s33u ->
case w2_s33w of w8_X35q { GHC.Types.I# ww1_s33y ->
case w3_s33A of w9_X35z { GHC.Types.I# ww2_s33C ->
case w5_s33F of w10_X35J { GHC.Ptr.Ptr ww3_s33H ->
$wa1_r3mo w_s33r ww_s33u ww1_s33y ww2_s33C w4_s33E ww3_s33H w6_s33J
}
}
}
})
```

# Issue #15226: GHC doesn't know that seq# produces something in WHNF
Reporter: David Feuer (2023-12-18)

```hs
data Str a = Str !a
bar :: Maybe a -> IO (Str (Maybe a))
bar x = do
x' <- evaluate x
pure (Str x')
```
This compiles to
```hs
Test.bar1
= \ (@ a_a3Ld)
(x_a3Ah :: Maybe a_a3Ld)
(s_i3Nz :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.seq#
@ (Maybe a_a3Ld) @ GHC.Prim.RealWorld x_a3Ah s_i3Nz
of
{ (# ipv_i3NC, ipv1_i3ND #) ->
(# ipv_i3NC, Test.$WStr @ (Maybe a_a3Ld) ipv1_i3ND #)
}
```
We suspend the application of `$WStr` to `ipv1_i3ND`, when all we actually need to do is apply `Str` directly. We could work around this in `base` by defining
```hs
evaluate x = IO $ \s ->
case seq# x s of
(# s', !x' #) -> (# s', x' #)
```
but that seems more than a little bit silly.
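For context on what `evaluate` (i.e. `seq#`) guarantees: by the time the `IO` action completes, its result is in WHNF, so errors surface at the `evaluate` rather than at a later use site. A small demonstration:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Forcing a bottom with evaluate raises the error inside IO, where it
-- can be caught; a plain `pure` would defer it to the use site.
forceOrCatch :: Int -> IO (Either SomeException Int)
forceOrCatch n = try (evaluate (if n < 0 then error "negative" else n))

main :: IO ()
main = do
  r1 <- forceOrCatch 3
  print (either (const (-1)) id r1)   -- prints 3
  r2 <- forceOrCatch (-1)
  print (either (const (-1)) id r2)   -- the error was caught: prints -1
```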
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.4.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | simonmar |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.6.1

# Issue #8023: dph-examples binaries don't use all CPUs
Reporter: Lethalman (2023-12-01)

Hi,
I've run dph-spectral-quicksort 3000000 +RTS -N6 of dph-examples-0.7.0.5 but it doesn't seem to use all the 6 hyperthreads. You can see the system monitor attached.
Same goes for the other quickhull-vector example.
ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
llvm-config-3.0 --version
3.0
What am I possibly doing wrong?
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | --------------------- |
| Version | 7.6.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Data Parallel Haskell |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | benl |
| Operating system | |
| Architecture | Unknown/Multiple |
</details>
# Issue #5302: Unused arguments in join points
Reporter: reinerp (2023-11-20)

Sometimes GHC produces join points with unused parameters. In the example attached, we get join points like the following (when compiled with -O2):
```
...
$j1_XHI
:: GHC.Prim.Int#
-> GHC.Types.Int
-> (# Unboxed.FingerTree
Unboxed.Size (Unboxed.Node Unboxed.Size b_ahY),
Unboxed.Node Unboxed.Size b_ahY,
Unboxed.FingerTree
Unboxed.Size (Unboxed.Node Unboxed.Size b_ahY) #)
[LclId, Arity=2, Str=DmdType LL]
$j1_XHI =
\ (x2_XE8 :: GHC.Prim.Int#) _ ->
...
```
which is always called as follows:
```
...
$j1_XHI x2_XE2 (GHC.Types.I# x2_XE2)
...
```
i.e. where the second argument is a boxed version of the first. GHC should remove the dead parameter from the join point, to avoid unnecessary boxing.
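At source level, the redundancy amounts to a function that receives both the unboxed value and its boxed wrapper, and never touches the box (a contrived sketch with illustrative names):

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- The boxed second parameter is always `I#` of the first, and the body
-- ignores it; dropping it would remove the allocation at the call site.
joinPt :: Int# -> Int -> Int
joinPt x# _unusedBox = I# (x# +# 1#)

callSite :: Int -> Int
callSite (I# x#) = joinPt x# (I# x#)
```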
I get this Core with 7.0.3 and with 7.1.20110629.
I've attached a self-contained example, as small as I can make it. (Making it smaller lets GHC do more unfolding and the problem disappears.) These join points occur inside the 'Deep' case of '$wsplitTree'.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 7.0.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.4.1
Assignee: Simon Peyton Jones

# Issue #23907: Performance Regression in splitmix from GHC 9.2.8 to GHC 9.4.1 (and later)
Reporter: julianbrunner (2023-11-07)

## Summary
When using the `nextDouble` function from the splitmix package, GHC 9.4.1, GHC 9.4.7, and GHC 9.6.2 all generate code that performs worse than the code generated by GHC 8.10.7, GHC 9.0.2, and GHC 9.2.8.
## Steps to reproduce
I have come up with the following minimal example:
```hs
{-# LANGUAGE BangPatterns #-}
import System.Random.SplitMix
import Test.Tasty.Bench
{-# NOINLINE loop #-}
loop :: Int -> Double -> SMGen -> (Double, SMGen)
loop 0 !a !s = (a, s)
loop n !a !s = loop (n - 1) (a + b) t where (b, t) = nextDouble s
main :: IO ()
main = defaultMain [bench "main" $ whnf (fst . loop 1000000 0) (mkSMGen 0)]
```
When compiled on various GHC versions using `-O2` and run using `+RTS -T`, this results in:
```
GHC 8.10.7: 2.78 ms ± 105 μs, 0 B allocated, 0 B copied, 2.0 MB peak memory
GHC 9.0.2: 2.83 ms ± 195 μs, 0 B allocated, 0 B copied, 2.0 MB peak memory
GHC 9.2.8: 2.77 ms ± 229 μs, 0 B allocated, 0 B copied, 6.0 MB peak memory
GHC 9.4.1: 4.62 ms ± 345 μs, 15 MB allocated, 559 B copied, 6.0 MB peak memory
GHC 9.4.7: 4.93 ms ± 427 μs, 15 MB allocated, 559 B copied, 6.0 MB peak memory
GHC 9.6.2: 5.24 ms ± 270 μs, 15 MB allocated, 477 B copied, 6.0 MB peak memory
```
## Expected behavior
I expect this code to run without allocating any heap memory.
## Investigation
I have looked at the core generated by different versions of GHC.
### GHC 9.2.8
```hs
Rec {
-- RHS size: {terms: 52, types: 12, coercions: 0, joins: 0/3}
$wloop
= \ ww ww1 ww2 ww3 ->
case ww of ds {
__DEFAULT ->
let { seed' = plusWord# ww2 ww3 } in
let {
x#
= timesWord#
(xor# seed' (uncheckedShiftRL# seed' 33#))
18397679294719823053## } in
let {
x#1
= timesWord#
(xor# x# (uncheckedShiftRL# x# 33#)) 14181476777654086739## } in
$wloop
(-# ds 1#)
(+##
ww1
(*##
(word2Double#
(uncheckedShiftRL# (xor# x#1 (uncheckedShiftRL# x#1 33#)) 11#))
1.1102230246251565e-16##))
seed'
ww3;
0# -> (# D# ww1, SMGen ww2 ww3 #)
}
end Rec }
```
### GHC 9.4.1
```hs
Rec {
$wloop
= \ ww ww1 ww2 ww3 ->
case ww of ds {
__DEFAULT ->
let { seed' = plusWord64# ww2 ww3 } in
let {
x#
= timesWord64#
(xor64# seed' (uncheckedShiftRL64# seed' 33#))
18397679294719823053##64 } in
let {
x#1
= timesWord64#
(xor64# x# (uncheckedShiftRL64# x# 33#))
14181476777654086739##64 } in
case integerToDouble#
(integerFromWord64#
(uncheckedShiftRL64#
(xor64# x#1 (uncheckedShiftRL64# x#1 33#)) 11#))
of wild1
{ __DEFAULT ->
$wloop
(-# ds 1#) (+## ww1 (*## wild1 1.1102230246251565e-16##)) seed' ww3
};
0# -> (# ww1, ww2, ww3 #)
}
end Rec }
```
### Analysis
In splitmix, `nextDouble` is defined as follows:
```hs
nextDouble :: SMGen -> (Double, SMGen)
nextDouble g = case nextWord64 g of
(w64, g') -> (fromIntegral (w64 `shiftR` 11) * doubleUlp, g')
```
It looks to me like the main difference is that 9.2.8 translates `fromIntegral` to `word2Double#`, while 9.4.1 translates it to `integerToDouble# . integerFromWord64#`. I suspect that this is where the 16 bytes of allocated heap memory per loop iteration come from.
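As an illustrative aside (not from the original report): on targets where `Word` is 64 bits wide, one can sidestep the `integerFromWord64#` detour by converting through `Word`, which recent GHCs can lower to `word2Double#` directly. `viaWord` is a hypothetical helper name:

```haskell
import Data.Word (Word64)

-- Hypothetical workaround sketch: converting through Word instead of
-- converting Word64 to Double directly avoids the Integer round-trip.
viaWord :: Word64 -> Double
viaWord w = fromIntegral (fromIntegral w :: Word)
```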
Cores from GHC versions earlier than 9.2.8 are fairly unreadable but also use `word2Double#`, while cores from versions later than 9.4.1 look almost identical to the one from 9.4.1.

Milestone: 9.8.1 — Ben Gamari, Sylvain Henry

---

https://gitlab.haskell.org/ghc/ghc/-/issues/17079
Optimize dataToTag# for small constructor families. (Andreas Klebinger, last updated 2023-11-02)

## Motivation
Currently we might have some code `\x -> dataToTag# x :: T -> Int#`.
This evaluates the argument, and then executes a primop to get the tag of the argument.
This works and produces the Cmm code:
```haskell
c1g4: // global
call (I64[R1])(R1) returns to c1g3, args: 8, res: 8, upd: 8;
c1g3: // global
R1 = %MO_UU_Conv_W32_W64(I32[I64[R1 & (-8)] - 4]);
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
```
It follows the pointer to the closure, then follows the pointer to the info table and extracts the tag.
However, for types with few data constructors we don't need to: after evaluating the value, we are guaranteed that the pointer will be tagged. This means we can construct the tag from the pointer alone.
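For concreteness, here is a minimal use of the primop on a small constructor family (an illustration, not code from this ticket):

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), dataToTag#)

-- A small family: after evaluation, the constructor index already lives
-- in the pointer's tag bits, so no info-table load should be required.
data Colour = Red | Green | Blue

tagOf :: Colour -> Int
tagOf c = I# (dataToTag# c)  -- Red ~> 0, Green ~> 1, Blue ~> 2
```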
## Proposal
GHC instead should check if the type is a type for which we can reconstruct the tag from the pointer. And do so if possible.
Possibly via a rewrite rule to a dataToTagSmall# primop or similar. This would save two memory accesses for dataToTag#.

Milestone: 9.2.1

---

https://gitlab.haskell.org/ghc/ghc/-/issues/23783
Stg rewriter should create updatable closures (Jaro Reinders, last updated 2023-10-04)

In https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9874#note_518337 we noticed that the `rewriteRhs` function is creating `ReEntrant` closures, see the last line of this case:
```haskell
rewriteRhs (_id, _tagSig) (StgRhsCon ccs con cn ticks args typ) = {-# SCC rewriteRhs_ #-} do
-- pprTraceM "rewriteRhs" (ppr _id)
-- Look up the nodes representing the constructor arguments.
fieldInfos <- mapM isArgTagged args
-- Filter out non-strict fields.
let strictFields =
getStrictConArgs con (zip args fieldInfos) :: [(StgArg,Bool)] -- (nth-argument, tagInfo)
-- Filter out already tagged arguments.
let needsEval = map fst . --get the actual argument
filter (not . snd) $ -- Keep untagged (False) elements.
strictFields :: [StgArg]
let evalArgs = [v | StgVarArg v <- needsEval] :: [Id]
if (null evalArgs)
then return $! (StgRhsCon ccs con cn ticks args typ)
else do
--assert not (isTaggedSig tagSig)
-- pprTraceM "CreatingSeqs for " $ ppr _id <+> ppr node_id
-- At this point iff we have possibly untagged arguments to strict fields
-- we convert the RHS into a RhsClosure which will evaluate the arguments
-- before allocating the constructor.
let ty_stub = panic "mkSeqs shouldn't use the type arg"
conExpr <- mkSeqs args evalArgs (\taggedArgs -> StgConApp con cn taggedArgs ty_stub)
fvs <- fvArgs args
-- lcls <- getFVs
-- pprTraceM "RhsClosureConversion" (ppr (StgRhsClosure fvs ccs ReEntrant [] $! conExpr) $$ text "lcls:" <> ppr lcls)
return $! (StgRhsClosure fvs ccs ReEntrant [] $! conExpr) typ
```
This means that all allocations from `conExpr` could be repeated many times, which is especially painful if `conExpr` is an infinite recursive data type. See for an example: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9874#note_517869.
If instead we mark this `Updatable` then the allocations in `conExpr` will only be performed once.
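A hypothetical illustration of the difference (the ticket notes a minimal reproducer is hard to find, so this is only a sketch of the shape of code involved):

```haskell
-- A constructor RHS with a strict field whose argument may be untagged.
data Stream = SCons !Int Stream

headS :: Stream -> Int
headS (SCons x _) = x

-- If the closure built for a recursive value like this were re-entrant
-- rather than updatable, each forcing could repeat the evaluation and
-- allocation instead of sharing the result via an update.
nats :: Int -> Stream
nats n = SCons n (nats (n + 1))
```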
It would be nice to find a minimal reproducer for this to add to the test suite, but this `rewriteRhs` function is only run in some very specific situations which are hard to reproduce.

Assignee: Jaro Reinders

---

https://gitlab.haskell.org/ghc/ghc/-/issues/13218
<$ is bad in derived functor instances (David Feuer, last updated 2023-10-02)

`Functor` deriving derives the definition of `fmap`, leaving the definition of `<$` to the default. This is quite bad for recursive types:
```hs
data Tree a = Bin !(Tree a) a !(Tree a) | Tip deriving Functor
```
produces
```
Replace.$fFunctorTree_$c<$ =
\ (@ a_aGl) (@ b_aGm) (eta_aGn :: a_aGl) (eta1_B1 :: Tree b_aGm) ->
Replace.$fFunctorTree_$cfmap
@ b_aGm @ a_aGl (\ _ [Occ=Dead] -> eta_aGn) eta1_B1
```
Why is this bad? It fills the tree with thunks keeping the original values (which we never use again) alive. What we want to generate is
```hs
x <$ Bin l _ r = Bin (x <$ l) x (x <$ r)
```
When there are other functor types in the constructor, like
```hs
| Whatever (Tree (Tree a))
```
we will need to insert `fmap (x <$) t`. The overall shape should be basically the same as `fmap` deriving.
Note: there are some types for which we will not realistically be able to derive optimal definitions. In particular, fixed-shape, undecorated types that appear in nested types allow special treatment:
```hs
data Pair a = Pair a a deriving Functor
data Tree2 a = Z a | S (Tree2 (Pair a)) deriving Functor
```
The ideal definition for this type is
```hs
x <$ Z _ = Z x
x <$ S t = S (Pair x x <$ t)
```
but that requires cleverness. We should probably settle for
```hs
x <$ Z _ = Z x
x <$ S t = S (fmap (x <$) t)
```
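For comparison, a hand-written instance for the `Tree` type above, sketching what the improved deriving could generate:

```haskell
data Tree a = Bin !(Tree a) a !(Tree a) | Tip

instance Functor Tree where
  fmap f (Bin l x r) = Bin (fmap f l) (f x) (fmap f r)
  fmap _ Tip         = Tip
  -- Rebuild the spine directly instead of fmap'ing a constant function,
  -- so no thunks retaining the old elements are left behind.
  x <$ Bin l _ r     = Bin (x <$ l) x (x <$ r)
  _ <$ Tip           = Tip
```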
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 8.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.2.1 — David Feuer

---

https://gitlab.haskell.org/ghc/ghc/-/issues/14003
Allow more worker arguments in SpecConstr (choenerzs, last updated 2023-09-14)

Starting with GHC 8.2 (rc1 -- head) I noticed that the SpecConstr pass does not always optimize completely with SpecConstr-heavy code.
Setting ```-fmax-worker-args=100``` leads to complete specialization again.
However, given that code annotated with ```SPEC``` should be optimized until no more ```SPEC``` arguments are alive, shouldn't ```callToNewPats``` in ```compiler/specialise/SpecConstr.hs``` specialize \*irrespective\* of the size of the worker argument list?
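For reference, SPEC-annotated code of the kind this refers to looks like the following (an illustrative loop, not taken from the failing code):

```haskell
{-# LANGUAGE BangPatterns #-}
import GHC.Types (SPEC (..))

-- SpecConstr is expected to keep specialising 'go' until no SPEC
-- argument remains live in the specialised worker.
sumTo :: Int -> Int
sumTo = go SPEC 0
  where
    go :: SPEC -> Int -> Int -> Int
    go !_ !acc 0 = acc
    go sp !acc n = go sp (acc + n) (n - 1)
```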
Code that actually fails to specialize is fairly large, hence no test case -- though I have some files with core output showing insufficient specialization.
(I'd be willing to write a patch for this)
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 8.2.1-rc3 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 9.10.1

---

https://gitlab.haskell.org/ghc/ghc/-/issues/22338
Performance Regression from GHC 8.10.7 to GHC 9.0.1 (and later) (julianbrunner, last updated 2023-09-07)

## Summary
In some situations, GHC 9.0.1, GHC 9.0.2, GHC 9.2.4, and GHC 9.4.2 all generate code that performs worse than the code generated by GHC 8.10.7.
## Steps to reproduce
I have come up with the following minimal example:
```hs
import Data.Foldable
import Data.Vector
import Test.Tasty.Bench
{-# INLINE arithmean #-}
arithmean :: Foldable f => Fractional a => f a -> a
arithmean = g . Data.Foldable.foldl' f (0 :: Int, 0) where
f (n, a) x = (n + 1, a + x)
g (n, a) = a / fromIntegral n
main :: IO ()
main = defaultMain [bench "arithmean" $ whnf (arithmean . enumFromN (1 :: Double)) 1000000]
```
I run this with `ghc -O2 Arithmean.hs && ./Arithmean +RTS -T`. In GHC 9.0.1 (and later), this results in
```
arithmean: OK (0.30s)
2.32 ms ± 178 μs, 15 MB allocated, 1.8 KB copied, 2.0 MB peak memory
```
## Expected behavior
In GHC 8.10.7, this results in
```
arithmean: OK (0.22s)
854 μs ± 51 μs, 0 B allocated, 0 B copied, 2.0 MB peak memory
```
It is much faster and performs no heap allocation.
## Environment
* GHC version used: 8.10.7, 9.0.1, 9.0.2, 9.2.4, 9.4.2
Optional:
* Operating System: Arch Linux
* System Architecture: x64
## Investigation
I have looked at the core generated by different versions of GHC.
### GHC 8.10.7
GHC 8.10.7 gives me the following worker and benchmark.
```hs
Rec {
-- RHS size: {terms: 31, types: 6, coercions: 0, joins: 0/0}
$s$wfoldlM'_loop
= \ sc sc1 sc2 sc3 ->
case ># sc 0# of {
__DEFAULT ->
case /## sc2 (int2Double# sc3) of wild2 { __DEFAULT -> D# wild2 };
1# ->
$s$wfoldlM'_loop
(-# sc 1#) (+## sc1 1.0##) (+## sc2 sc1) (+# sc3 1#)
}
end Rec }
Rec {
-- RHS size: {terms: 28, types: 27, coercions: 0, joins: 0/0}
main_$s$wbenchLoop1
= \ sc sc1 sc2 ->
case sc1 of wild {
__DEFAULT ->
case seq#
(case sc2 of { I# ww1 -> $s$wfoldlM'_loop ww1 1.0## 0.0## 0# }) sc
of
{ (# ipv, ipv1 #) ->
main_$s$wbenchLoop1 ipv (minusWord# wild 1##) sc2
};
0## -> (# sc, () #)
}
end Rec }
```
The worker exclusively uses unboxed values and thus is very fast and performs no heap allocation. The benchmark simply calls the worker function.
### GHC 9.0.1
GHC 9.0.1 gives me the following worker and benchmark.
```hs
Rec {
-- RHS size: {terms: 31, types: 6, coercions: 0, joins: 0/0}
$s$wfoldlM'_loop
= \ sc sc1 sc2 sc3 ->
case ># sc 0# of {
__DEFAULT ->
case /## sc2 (int2Double# sc3) of ww { __DEFAULT -> D# ww };
1# ->
$s$wfoldlM'_loop
(-# sc 1#) (+## sc1 1.0##) (+## sc2 sc1) (+# sc3 1#)
}
end Rec }
Rec {
-- RHS size: {terms: 71, types: 43, coercions: 0, joins: 1/1}
$s$wbenchLoop
= \ sc sc1 sc2 ->
case sc1 of wild {
__DEFAULT ->
case seq#
(case sc2 of { I# ww1 ->
joinrec {
$wfoldlM'_loop w ww2 ww3 ww4 ww5
= case w of { __DEFAULT ->
case ># ww5 0# of {
__DEFAULT ->
case /## ww3 (int2Double# ww2) of ww6 { __DEFAULT -> D# ww6 };
1# ->
case ww4 of { D# y ->
jump $wfoldlM'_loop
SPEC (+# ww2 1#) (+## ww3 y) (D# (+## y 1.0##)) (-# ww5 1#)
}
}
}; } in
jump $wfoldlM'_loop SPEC 0# 0.0## (D# 1.0##) ww1
})
sc
of
{ (# ipv, ipv1 #) ->
$s$wbenchLoop ipv (minusWord# wild 1##) sc2
};
0## -> (# sc, () #)
}
end Rec }
```
The worker is isomorphic up to renaming to the one generated by GHC 8.10.7. However, the benchmark does not actually use the generated worker! Indeed, the worker is only assigned to a top-level binding:
```hs
-- RHS size: {terms: 9, types: 3, coercions: 0, joins: 0/0}
eta
= \ w ->
case w of { I# ww1 -> $s$wfoldlM'_loop ww1 1.0## 0.0## 0# }
```
This top-level binding is then never used, which is very confusing to me.
Inside the benchmark function, GHC 9.0.1 generates a different version of the worker which boxes one of the parameters for no apparent reason, which explains the slowdown and the heap allocation.
I am very confused by the fact that GHC generates two different worker functions, one of which has additional boxing. I was unable to remove this boxing with strictness annotations.
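One common mitigation sketch (not from the report, and not necessarily effective here) is to accumulate in a custom strict pair, so the fold's state should unbox regardless of how the `(,)` accumulator is treated:

```haskell
import Data.Foldable (foldl')

-- Hypothetical variant of 'arithmean' with a strict accumulator type;
-- under -O2 both fields should unbox via worker/wrapper.
data Acc = Acc !Int !Double

arithmean' :: Foldable f => f Double -> Double
arithmean' = g . foldl' f (Acc 0 0)
  where
    f (Acc n a) x = Acc (n + 1) (a + x)
    g (Acc n a)   = a / fromIntegral n
```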
The core generated by GHC 9.0.1 is also quite a bit larger than the one generated by GHC 8.10.7, although I do not know if this carries over to executable size.
### GHC 9.4.2
The core generated looks very similar to the one generated by GHC 9.0.1. In particular, it also contains two different versions of the worker function, with the one that is actually used in the benchmark suffering from additional boxing.
### Remarks
If other benchmarks are present in the same module, sometimes GHC 9.0.1 generates code that is as fast as the one generated by GHC 8.10.7. However, this behavior is very unpredictable and I have not been able to figure out what is going on or when it happens.

Milestone: 9.8.1

---

https://gitlab.haskell.org/ghc/ghc/-/issues/23083
Simplifier: Insufficient eta expansion of arguments (Sebastian Graf, last updated 2023-08-28)

While hunting regressions in !9874 I found that `T21839r` regressed by 10%. I realised that the argument in the following program is not properly eta-expanded:
```hs
g :: ((Integer -> Integer) -> Integer) -> (Integer -> Integer) -> Integer
g f h = f (h `seq` (h $))
```
After simplification, we have
```
g = \ (f_aiR :: (Integer -> Integer) -> Integer)
(h_aiS :: Integer -> Integer) ->
f_aiR
(case h_aiS of h1_X0 { __DEFAULT ->
$ @GHC.Types.LiftedRep @Integer @Integer h1_X0
})
```
It would be much better to eta-expand the Case expression and subsequently inline `$`.
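Written back as source, the desired eta-expanded form would be roughly the following (a sketch, not compiler output):

```haskell
-- After eta-expanding the argument, the 'case' on h moves under the
-- lambda and ($) can inline away.
g' :: ((Integer -> Integer) -> Integer) -> (Integer -> Integer) -> Integer
g' f h = f (\x -> h `seq` h x)
```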
I believe this is due to the Simplifier never checking whether it can eta-expand an argument.

Assignee: Sebastian Graf

---

https://gitlab.haskell.org/ghc/ghc/-/issues/23287
nospec for incoherent instances regresses performance (Adam Gundry, last updated 2023-08-22)

The change in !9411 to mark incoherent instances as `nospec` causes less specialisation in `optics`, inhibiting optimizations and breaking the performance testsuite. See https://github.com/well-typed/optics/issues/488 and https://gitlab.haskell.org/ghc/ghc/-/issues/22448#note_474141.
I remain of the view that this change should be guarded by a command-line flag and disabled by default. Users who need the additional guarantees requested by #22448 can then opt in to less specialisation, without imposing a cost on others.
CC @simonpj @cactus @arybczak

Milestone: 9.8.1

---

https://gitlab.haskell.org/ghc/ghc/-/issues/23578
Bad fusion for `enumFrom*` on 32-bit targets (Sylvain Henry, last updated 2023-08-10)

Consider the following example:
```haskell
module Test where
import Data.Word
foo :: Word64
foo = sum [0..123456]
```
On x86-64 we get the following Core:
```haskell
Rec {
$wgo3
= \ x_s1f6 ww_s1f9 ->
case x_s1f6 of wild_X2 {
__DEFAULT ->
$wgo3
(plusWord# wild_X2 1##)
(plusWord64# ww_s1f9 (wordToWord64# wild_X2));
123456## -> plusWord64# ww_s1f9 123456#Word64
}
end Rec }
foo
= case $wgo3 0## 0#Word64 of ww_s1fe { __DEFAULT -> W64# ww_s1fe }
```
But with the JS target (32-bit) we get:
```haskell
lvl_r1im = IS 1#
Rec {
foo_$s$wgo3
= \ sc_s1ie ww_s1i2 ->
case ># sc_s1ie 123456# of {
__DEFAULT ->
$wgo3_r1in
(integerAdd (IS sc_s1ie) lvl_r1im)
(plusWord64# ww_s1i2 (int64ToWord64# (intToInt64# sc_s1ie)));
1# -> ww_s1i2
}
$wgo3_r1in
= \ x_s1hZ ww_s1i2 ->
join {
$j_s1hU
= case integerToWord64# x_s1hZ of ds_a1a8 { __DEFAULT ->
$wgo3_r1in
(integerAdd x_s1hZ lvl_r1im) (plusWord64# ww_s1i2 ds_a1a8)
} } in
case x_s1hZ of {
IS x1_a1eQ ->
case ># x1_a1eQ 123456# of {
__DEFAULT -> jump $j_s1hU;
1# -> ww_s1i2
};
IP x1_a1eV -> ww_s1i2;
IN x1_a1hI -> jump $j_s1hU
}
end Rec }
foo
= case foo_$s$wgo3 0# 0#Word64 of ww_s1i7 { __DEFAULT ->
W64# ww_s1i7
}
```
Which is worse because it uses `Integer` (boxed) instead of `Word64#`.
Probably related to this code:
```haskell
-- in base:GHC.Word
instance Enum Word64 where
...
#if WORD_SIZE_IN_BITS < 64
...
#else
-- use Word's Enum as it has better support for fusion. We can't use
-- `boundedEnumFrom` and `boundedEnumFromThen` -- which use Int's Enum
-- instance -- because Word64 isn't compatible with Int/Int64's domain.
--
...
#endif
```
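As a hedged workaround sketch for affected 32-bit targets, the loop can be written by hand over `Word64` so no boxed `Integer` is ever built:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Word (Word64)

-- Manual equivalent of 'sum [0..123456]' with an explicit accumulator;
-- this avoids relying on enumFromTo fusion for Word64 on 32-bit targets.
sumTo64 :: Word64 -> Word64
sumTo64 limit = go 0 0
  where
    go !acc i
      | i > limit = acc
      | otherwise = go (acc + i) (i + 1)
```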
We have the same issue for `Int64` with similar CPP in `base:GHC.Int`.

Assignee: Jaro Reinders

---

https://gitlab.haskell.org/ghc/ghc/-/issues/22352
JavaScript Backend: Implement the Compactor (doyougnu, last updated 2023-08-08)

### Description
GHCJS's [Compactor](https://github.com/ghcjs/ghc/blob/ghc-8.10-ghcjs/compiler/ghcjs/Gen2/Compactor.hs) is a memory optimization that shortens the length of binder names in the generated JS code. This reduces file size and is necessary for certain programs to run.
This is the tracking issue to implement the compactor in the JS Backend, post MR !9133
### Impacted Tests
Tick these off as you go:
* [ ] recomp007 passes
* [ ] recompChangedPackage passes
* [ ] recompTHpackage passes
* [ ] T6145 passes
* [ ] literals passes
* [ ] apirecomp001 passes
* [ ] PartialDownsweep passes
* [ ] parsed passes
* [ ] T13350 passes
* [ ] T13168 passes
* [ ] bug1465 passes
* [ ] T20242 passes
* [ ] T21336b passes (this one is in base/tests/IO)
* [ ] seward-space-leak passes

Assignees: Sylvain Henry, Luite Stegeman, Josh Meredith, doyougnu

---

https://gitlab.haskell.org/ghc/ghc/-/issues/23030
AArch64 backend should handle code generation for bitmasks more efficiently (Ben Gamari, last updated 2023-08-07)

Currently if one attempts to compile the program
```c
#include "Cmm.h"
hi() {
R1 = UNTAG(R1);
jump cont();
}
```
with the ARMv8 NCG you get the code:
```asm
hi:
mov x18, #65528
movk x18, #65535, lsl #16
movk x18, #65535, lsl #32
movk x18, #65535, lsl #48
and x22, x22, x18
b cont
```
This is horrible, using four move instructions and an `and` where a single `and` would do. For instance, the LLVM backend produces the following:
```asm
Disassembly of section .text:
0000000000000000 <hi>:
0: 927df2d6 and x22, x22, #0xfffffffffffffff8
4: 94000000 bl 0 <cont>
8: d65f03c0 ret
```
This is possible because AArch64's logical instructions accept bitmask immediates, which can encode repeating bit patterns such as `0xfffffffffffffff8` directly in the instruction.
However, even using `movn` and `and` would be a considerable improvement over the status quo.