GHC issueshttps://gitlab.haskell.org/ghc/ghc/-/issues2024-03-27T17:08:50Zhttps://gitlab.haskell.org/ghc/ghc/-/issues/24462Exponentially increasing compilation w/ Higher-Kinded Data under 9.8.12024-03-27T17:08:50Zm4dc4pExponentially increasing compilation w/ Higher-Kinded Data under 9.8.1## Summary
GHC 9.8.1 takes increasingly longer (and uses radically more memory) to compile a module as fields are added to a data type; the same did not occur with GHC 9.6.
## Steps to reproduce
The repo at https://github.com/m4dc4p/g...## Summary
GHC 9.8.1 takes increasingly longer (and uses radically more memory) to compile a module as fields are added to a data type; the same did not occur with GHC 9.6.
## Steps to reproduce
The repo at https://github.com/m4dc4p/ghc98-bug can demonstrate the issue. You can just run `cabal build` to try it.
The file `AAAX.hs` contains the `HKDType` data type. As you uncomment fields, the compilation times (`cabal build`) take longer and longer.
For example, on my M2, with GHC 9.8, I get these timings & peak memory usage:
* 1 field - 2.5s, 198MB peak
* 2 fields - 7s, 1.0GB peak
* 3 fields - 26.8s, 4.6GB peak
* 4 fields - 82.9s, 14.5GB peak
Note that with -O0, there is no problem - compilation time does not increase. Also, removing the standalone `Eq` instance also causes the problem to go away (but, of course, I want that instance).
Under GHC 9.6, with -O1, compilation times do not increase either (even up to 10 fields on the data type).
I found this on MacOS 14.2.1, using an M2 chip.
(Note I also reported this as a bug against the Barbies library - https://github.com/jcpetruzza/barbies/issues/51)9.8.3Simon Peyton JonesSimon Peyton Joneshttps://gitlab.haskell.org/ghc/ghc/-/issues/24401fixIO can choke, and report a loop where there is none2024-02-06T15:13:14ZSimon Peyton JonesfixIO can choke, and report a loop where there is noneThis program was written by [Matthew Craven](https://github.com/haskell/core-libraries-committee/issues/242#issuecomment-1907226501):
```
import System.IO( fixIO )
import GHC.Exts (noinline)
main :: IO ()
main = do
r <- fixIO $ \p -> ...This program was written by [Matthew Craven](https://github.com/haskell/core-libraries-committee/issues/242#issuecomment-1907226501):
```
import System.IO( fixIO )
import GHC.Exts (noinline)
main :: IO ()
main = do
r <- fixIO $ \p -> let
{-# NOINLINE q #-}
q = noinline id p
in pure (True : q)
print $! case r of { _:v:_ -> v ; _ -> False }
```
For any law-abiding MonadFix instance, such a call to `mfix` should be semantically equivalent to `pure (repeat True)`. (Specifically, this follows from the 'Purity' law.)
So the `print` should work fine. But actually we get (with -O)
```
run-it: cyclic evaluation in fixIO
```
## Diagnosis
What is going on? Here is the code after optimisation:
```
Main.main1
= \ (s_aRX :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.newMVar#
@GHC.Types.Lifted @GHC.Prim.RealWorld @[Bool] s_aRX
of
{ (# ipv_aUk, ipv1_aUl #) ->
case GHC.IO.Unsafe.unsafeDupableInterleaveIO1
@[Bool]
((\ (eta_aSp [OS=OneShot] :: GHC.Prim.State# GHC.Prim.RealWorld) ->
GHC.Prim.catch# @GHC.Types.LiftedRep @GHC.Types.Lifted @[Bool]
@GHC.Exception.Type.SomeException
(\ (eta1_aUj [OS=OneShot] :: GHC.Prim.State# GHC.Prim.RealWorld) ->
GHC.Prim.readMVar#
@GHC.Types.Lifted @GHC.Prim.RealWorld @[Bool] ipv1_aUl eta1_aUj)
(System.IO.fixIO2 @[Bool])
eta_aSp)
`cast` (Sym (GHC.Types.N:IO[0] <[Bool]>_R)
:: (GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, [Bool] #))
~R# IO [Bool]))
ipv_aUk
of
{ (# ipv2_aUp, ipv3_aUq #) ->
let {
q_s1HX [InlPrag=NOINLINE, Dmd=SL] :: [Bool]
q_s1HX = noinline @(forall a. a -> a) id @[Bool] ipv3_aUq } in
case GHC.Prim.putMVar# @GHC.Types.Lifted @GHC.Prim.RealWorld @[Bool]
ipv1_aUl
(GHC.Types.: @Bool GHC.Types.True q_s1HX)
ipv2_aUp
of s2#_aUw
{ __DEFAULT ->
case q_s1HX of {
[] ->
GHC.IO.Handle.Text.hPutStr2
GHC.IO.Handle.FD.stdout
GHC.Show.$fShowBool5
GHC.Types.True
s2#_aUw;
: v_aJ6 ds2_dRE ->
case v_aJ6 of vx_aV1 { __DEFAULT ->
GHC.IO.Handle.Text.hPutStr2
GHC.IO.Handle.FD.stdout
(case vx_aV1 of {
False -> GHC.Show.$fShowBool5;
True -> GHC.Show.$fShowBool4
})
GHC.Types.True
s2#_aUw
} } } } }
```
* The MVar stuff is to "tie the knot" in `fixIO`.
* The `putMVar#` fills in the MVar
* The `unsafeDupableInterleaveIO1` call returns a thunk in `ipv3` which reads the MVar when forced.
Now, **alas** the strictness analyser works out that `q` i used strictly in the continuation, so
the binding for `q` is done with a `case` not a `let` (look at CorePrep output). So we force `ipv3` too
early, before the `putMVar#` has run.
## Cure
The code for `fixIO` is this:
```
fixIO :: (a -> IO a) -> IO a
fixIO k = do
m <- newEmptyMVar
ans <- unsafeDupableInterleaveIO
(readMVar m `catch` \BlockedIndefinitelyOnMVar ->
throwIO FixIOException)
result <- k ans
putMVar m result
return result
```
Why all this MVar stuff? See `Note [fixST]` in `base:Control.Monad.ST.Imp`.
(And [this comment](https://github.com/haskell/core-libraries-committee/issues/242#issuecomment-1902317732).)
Looking at this code it's clear that we must do the `putMVar` before evaluating `result`.
But if we inline `fixIO`, the consumer will consume the `return result`, and the consumer
may well be strict in `result`. Something like
```
do { result <- fixIO m
; f result } -- where f is strict
```
Two simple solutions
* Do not inline `fixIO`.
* Wrap the result of `fixIO` in `lazy`, thus
```
putMVar m restult
return (lazy result)
```
I am not sure which is best. But the `lazy` solution is the one adopted by `Note [unsafePerformIO and strictness]` in GHC.IO.Unsafe, for a very very similar problem.https://gitlab.haskell.org/ghc/ghc/-/issues/24263Precise exceptions: `stToIO` and `ioToST` can circumvent analysis in Note [Wh...2024-01-22T04:13:40ZSebastian GrafPrecise exceptions: `stToIO` and `ioToST` can circumvent analysis in Note [Which scrutinees may throw precise exceptions]## Summary
The analysis described in `Note [Which scrutinees may throw precise exceptions]` is oblivious wrt. computations in `forall s. ... State# s ...`, even though it triggers for its instance `State# RealWorld#`.
This means we can...## Summary
The analysis described in `Note [Which scrutinees may throw precise exceptions]` is oblivious wrt. computations in `forall s. ... State# s ...`, even though it triggers for its instance `State# RealWorld#`.
This means we can circumvent the analysis by defining most of the code in `ST s r`, parameterising over computations that are actually in `IO`,
and then run the supposedly pure `ST s r` computation with those impure `IO` actions supplied as parameters.
## Steps to reproduce:
Compile and run with -O:
```hs
import GHC.IO
import GHC.ST
foo :: Int -> ST s r -> ST s Int
foo a act = act >> (pure $! a+1)
{-# NOINLINE foo #-}
main = stToIO $ foo (error "Not OK") (ioToST (throwIO (mkUserError "OK")))
```
```
test: Not OK
```
## Expected behavior
```
test: user error (OK)
```
## Discussion
It is questionable whether this is actually desirable. Potentially, the performance of every morally pure `ST s a` computation such as `foo` might be affected for the worse: We may not ever use call-by-value for `foo`, unless it forces `a` before calling `act`.
I'm inclined to simply ignore this issue.
(CC) @clyring and @simonpj with whom I discussed this issue today.https://gitlab.haskell.org/ghc/ghc/-/issues/24203GHC is reluctant to unbox Int with -O when returned in a sum field2023-11-22T08:17:59ZmeooowGHC is reluctant to unbox Int with -O when returned in a sum field## Summary
### Example 1
I have some code like
```hs
{-# LANGUAGE BangPatterns #-}
module M where
foo :: [Int] -> Maybe Int
foo = go 1 0
where
go :: Int -> Int -> [Int] -> Maybe Int
go !i !n [] = Just n
go i n (x:xs)
...## Summary
### Example 1
I have some code like
```hs
{-# LANGUAGE BangPatterns #-}
module M where
foo :: [Int] -> Maybe Int
foo = go 1 0
where
go :: Int -> Int -> [Int] -> Maybe Int
go !i !n [] = Just n
go i n (x:xs)
| i < 10 = go (i+1) (n+x) xs
| otherwise = Nothing
```
If I compile it with -O on GHC 9.4.8 or 9.6.3, `n` is not unboxed even with the bang.
<details>
<summary>Core:</summary>
```hs
-- RHS size: {terms: 31, types: 16, coercions: 0, joins: 0/0}
M.$wgo [InlPrag=[2], Occ=LoopBreaker]
:: GHC.Prim.Int# -> Int -> [Int] -> Maybe Int
[GblId[StrictWorker([~, !, !])],
Arity=3,
Str=<L><1L><1L>,
Unf=OtherCon []]
M.$wgo
= \ (ww_sNC :: GHC.Prim.Int#) (n_sNE :: Int) (ds_sNF :: [Int]) ->
case n_sNE of n1_X1 { GHC.Types.I# ipv_sN7 ->
case ds_sNF of {
[] -> GHC.Maybe.Just @Int n1_X1;
: ipv1_sN9 ipv2_sNa ->
case GHC.Prim.<# ww_sNC 10# of {
__DEFAULT -> GHC.Maybe.Nothing @Int;
1# ->
case ipv1_sN9 of { GHC.Types.I# y_aNl ->
M.$wgo
(GHC.Prim.+# ww_sNC 1#)
(GHC.Types.I# (GHC.Prim.+# ipv_sN7 y_aNl))
ipv2_sNa
}
}
}
}
```
</details>
[Haskell playground link](https://play.haskell.org/saved/lOdYT1ZQ)
### Example 2
```hs
module M where
foo :: [Int] -> Maybe Int
foo [] = Nothing
foo xs = Just $! sum xs
```
The accumulator in `sum` remains boxed.
[Haskell playground link](https://play.haskell.org/saved/eM7gcscU)
This is not the case with GHC 9.2.8. Using -O2 with 9.4.8 and 9.6.3 also unboxes it.
Is this intended behavior?
## Steps to reproduce
Compile the snippet above.
## Expected behavior
`n` is unboxed.
## Environment
* GHC version used: 9.4.8, 9.6.3https://gitlab.haskell.org/ghc/ghc/-/issues/23911`noinline` worsens demand signatures when it should not.2023-09-05T14:20:53ZAndreas Klebinger`noinline` worsens demand signatures when it should not.In terms of demands I would have expected the variants of `foo` and `bar` to behave the same:
```haskell
{-# NOINLINE foo #-}
foo x y = (noinline const) x y
{-# NOINLINE bar #-}
bar x y = const x y
{-# NOINLINE sfoo #-}
sfoo x y = (n...In terms of demands I would have expected the variants of `foo` and `bar` to behave the same:
```haskell
{-# NOINLINE foo #-}
foo x y = (noinline const) x y
{-# NOINLINE bar #-}
bar x y = const x y
{-# NOINLINE sfoo #-}
sfoo x y = (noinline (+)) x y :: Int
{-# NOINLINE sbar #-}
sbar x y = x + y :: Int
```
However `noinline` completely destroys the demand information:
```haskell
==================== Tidy Core ====================
Result size of Tidy Core
= {terms: 21, types: 36, coercions: 0, joins: 0/0}
-- RHS size: {terms: 8, types: 11, coercions: 0, joins: 0/0}
foo [InlPrag=NOINLINE] :: forall {t1} {t2}. t1 -> t2 -> t1
[GblId, Arity=2, Str=<L><L>, Unf=OtherCon []]
foo
= \ (@t_aCy) (@t1_aCu) (x_aiJ :: t_aCy) (y_aiK :: t1_aCu) ->
noinline
@(forall a b. a -> b -> a) const @t_aCy @t1_aCu x_aiJ y_aiK
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
bar [InlPrag=NOINLINE] :: forall {a} {b}. a -> b -> a
[GblId, Arity=2, Str=<1L><A>, Unf=OtherCon []]
bar = const
-- RHS size: {terms: 7, types: 9, coercions: 0, joins: 0/0}
sfoo [InlPrag=NOINLINE] :: Int -> Int -> Int
[GblId, Arity=2, Str=<L><L>, Unf=OtherCon []]
sfoo
= \ (x_aiS :: Int) (y_aiT :: Int) ->
noinline
@(forall a. Num a => a -> a -> a)
+
@Int
GHC.Num.$fNumInt
x_aiS
y_aiT
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
sbar [InlPrag=NOINLINE] :: Int -> Int -> Int
[GblId, Arity=2, Str=<1!P(L)><1!P(L)>, Cpr=1, Unf=OtherCon []]
sbar = GHC.Num.$fNumInt_$c+
```
It seems very reasonale to have a special case for noinline wich works something like:
`dmdAnal' env dmd <noinline @type f> = dmdAnal' env dmd f`.https://gitlab.haskell.org/ghc/ghc/-/issues/23910Replace absent lazy function arguments with rubbishLit even without W/W.2023-09-05T14:21:36ZAndreas KlebingerReplace absent lazy function arguments with rubbishLit even without W/W.Currently for something like `const` (assuming it isn't inlined) we often see code like this:
```haskell
foo x y z = .... const x y ....
```
W/W would iirc transform this into:
```haskell
foo x y z = $wfoo x z
$wfoo = ... const x <...Currently for something like `const` (assuming it isn't inlined) we often see code like this:
```haskell
foo x y z = .... const x y ....
```
W/W would iirc transform this into:
```haskell
foo x y z = $wfoo x z
$wfoo = ... const x <rubbishLit> ...
```
Which means y` isn't unnecessarily retained. However there seems little reason to limit this to worker wrapper.
Even without worker wrapper if we have
```haskell
foo x y z = .... const x y ....
```
it should be fine to rewrite this into:
```haskell
foo x y z = .... const x <rubbishLit> ....
```
It's possible I'm missing something but I think that should be fine and allow us to gc `y` as soon as we entered `foo`.https://gitlab.haskell.org/ghc/ghc/-/issues/23699Regression in the optimizer of GHC 9.6 concerning `unsafePerformIO`2024-02-06T18:08:20ZAndreas AbelRegression in the optimizer of GHC 9.6 concerning `unsafePerformIO`I noticed a regression in the optimizer of GHC 9.6 (over 9.4) concerning `unsafePerformIO` (used for debug messages).
The reproducer for this issue is the `master` branch of this repo: https://github.com/andreasabel/regression-ghc-9.6-un...I noticed a regression in the optimizer of GHC 9.6 (over 9.4) concerning `unsafePerformIO` (used for debug messages).
The reproducer for this issue is the `master` branch of this repo: https://github.com/andreasabel/regression-ghc-9.6-unsafePerformIO
With GHC 9.6.2, a debug message that is printed with `-O0` is not printed with `-O1`.
```shellsession
$ cabal run agda -w ghc-9.6.2 --builddir=dist0 -O0
ReduceM SHOULD ALSO PRINT THIS DEBUG MESSAGE!!!!!!!!!!!!! LET'S MAKE IT VERY LONG SO IT CANNOT BE OVERLOOKED!!!!!!!!!!!!!!!!!!!
agda: An internal error has occurred. Please report this as a bug.
Location of the error: __IMPOSSIBLE_VERBOSE__, called at src/full/Agda/ImpossibleTest.hs:22:11 in Agda-2.6.4-inplace-agda:Agda.ImpossibleTest
impossibleTestReduceM, called at src/main/Main.hs:32:5 in Agda-2.6.4-inplace-agda:Main
$ cabal run agda -w ghc-9.6.2 -O1
agda: An internal error has occurred. Please report this as a bug.
Location of the error: __IMPOSSIBLE_VERBOSE__, called at src/full/Agda/ImpossibleTest.hs:22:11 in Agda-2.6.4-inplace-agda:Agda.ImpossibleTest
impossibleTestReduceM, called at src/main/Main.hs:32:5 in Agda-2.6.4-inplace-agda:Main
```
Note that the separation of these builds in to two different `builddir`s is necessary because Cabal might confuse the builds otherwise:
- https://github.com/haskell/cabal/issues/7711
I shrank Agda from 426 modules to 142, it can likely be further shrunk (to 2 or 3 modules), but one has to proceed with care, as inlining code might make the issue go away.
Here are some branches where the issue has disappeared:
1. `lost-issue-inlining-impossibleTest`: inlining the module `Agda.ImpossibleTest`
2. `lost-issue-inlining-reportSLn`: inlining `reportSLn` into `__IMPOSSIBLE_VERBOSE__`
The latter modification was used to fix the motivating issue:
- https://github.com/agda/agda/issues/6728
- fixed in https://github.com/agda/agda/pull/6710/commits/4de3e5353cadd9a8e448af5f781701fa214ce9f3
This GHC issue for 8.8 and up might be related:
- https://gitlab.haskell.org/ghc/ghc/-/issues/198419.10.1https://gitlab.haskell.org/ghc/ghc/-/issues/23463Demand: retry# should have exnDiv, not botDiv2023-06-15T13:50:17ZSebastian GrafDemand: retry# should have exnDiv, not botDivHere's the entry in primops.txt.pp:
```
-- NB: retry#'s strictness information specifies it to diverge.
-- This lets the compiler perform some extra simplifications, since retry#
-- will technically never return.
--
-- This allows the s...Here's the entry in primops.txt.pp:
```
-- NB: retry#'s strictness information specifies it to diverge.
-- This lets the compiler perform some extra simplifications, since retry#
-- will technically never return.
--
-- This allows the simplifier to replace things like:
-- case retry# s1
-- (# s2, a #) -> e
-- with:
-- retry# s1
-- where 'e' would be unreachable anyway. See #8091.
primop RetryOp "retry#" GenPrimOp
State# RealWorld -> (# State# RealWorld, v #)
with
strictness = { \ _arity -> mkClosedDmdSig [topDmd] botDiv }
out_of_line = True
has_side_effects = True
```
This use of `botDiv` is unsound because it appears that the whole computation is strict in every free variable.
We have introduced `exnDiv` for exactly this reason: To enable dead-code elimination while not being strict in free variables. Let's use it on `retry#`, too.
`retry#` kind of is like a precise exception primop anyway (with a pre-defined catch handler), so it makes a lot of sense to treat it as such.
There's more context about `exnDiv` in `Note [Dead ends]` and `Note [Precise exceptions and strictness analysis]`.https://gitlab.haskell.org/ghc/ghc/-/issues/23319Missed optimization2023-05-21T20:58:05ZmeooowMissed optimizationPlease feel free to change the title to something that better describes the issue.
## Summary
Here is some code that calculates the sum of vertices over a tree.
```hs
sum1 :: (Int -> [Int]) -> Int -> Int
sum1 neighbors root = go root ...Please feel free to change the title to something that better describes the issue.
## Summary
Here is some code that calculates the sum of vertices over a tree.
```hs
sum1 :: (Int -> [Int]) -> Int -> Int
sum1 neighbors root = go root 0 where
f :: Int -> [Int -> Int] -> Int -> Int
f x ks acc = foldl' (\acc' k -> k acc') (acc + x) ks
go :: Int -> Int -> Int
go x = f x (map go (neighbors x))
{-# NOINLINE sum1 #-}
```
This might seem a bit unnatural, but it's a simplified example so please excuse that.
I expect the `[Int -> Int]` to get optimized away, but unfortunately that doesn't happen according to the core.
However, if I eta-expand `go`, the problem disappears. The same happens if I merge `f` into `go`.
Here are some benchmarks:
```hs
import Criterion.Main
import Data.List
main :: IO ()
main = defaultMain
[ bench "sum1" $ whnf (sum1 binTree) 1
, bench "sum2" $ whnf (sum2 binTree) 1
, bench "sum3" $ whnf (sum3 binTree) 1
]
binTree :: Int -> [Int]
binTree x
| x > 1000000 = []
| otherwise = [2*x + 1, 2*x + 2]
sum1 :: (Int -> [Int]) -> Int -> Int
sum1 neighbors root = go root 0 where
f :: Int -> [Int -> Int] -> Int -> Int
f x ks acc = foldl' (\acc' k -> k acc') (acc + x) ks
go :: Int -> Int -> Int
go x = f x (map go (neighbors x))
{-# NOINLINE sum1 #-}
sum2 :: (Int -> [Int]) -> Int -> Int
sum2 neighbors root = go root 0 where
f :: Int -> [Int -> Int] -> Int -> Int
f x ks acc = foldl' (\acc' k -> k acc') (acc + x) ks
go :: Int -> Int -> Int
go x eta1 = f x (map go (neighbors x)) eta1
{-# NOINLINE sum2 #-}
sum3 :: (Int -> [Int]) -> Int -> Int
sum3 neighbors root = go root 0 where
go :: Int -> Int -> Int
go x = \acc -> foldl' (\acc' k -> k acc') (acc + x) (map go (neighbors x))
{-# NOINLINE sum3 #-}
```
With GHC 9.4.4 and -O2:
```
benchmarking sum1
time 49.07 ms (48.88 ms .. 49.31 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 48.52 ms (48.25 ms .. 48.75 ms)
std dev 473.0 μs (371.5 μs .. 619.1 μs)
benchmarking sum2
time 6.521 ms (6.493 ms .. 6.539 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 6.542 ms (6.538 ms .. 6.548 ms)
std dev 14.12 μs (10.22 μs .. 19.91 μs)
benchmarking sum3
time 6.565 ms (6.505 ms .. 6.647 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 6.544 ms (6.531 ms .. 6.570 ms)
std dev 50.54 μs (26.96 μs .. 91.55 μs)
```
## Expected behavior
`sum1` to be as efficient as `sum2` and `sum3`.
## Environment
* GHC version used: 9.4.4
Optional:
* Operating System: Ubuntu
* System Architecture: x86_64
---
**Edit**: I have simplified the example a bit, the original example was
```hs
depthSum1 :: forall a. (a -> [a]) -> a -> Int
depthSum1 neighbors root = go root 0 0 where
f :: a -> [Int -> Int -> Int] -> Int -> Int -> Int
f _ ks depth acc = foldl' (\acc' k -> k (depth+1) acc') (acc + depth) ks
go :: a -> Int -> Int -> Int
go x = f x (map go (neighbors x))
```https://gitlab.haskell.org/ghc/ghc/-/issues/23233Subsequent strictness defeats pseq2024-02-07T11:08:27ZMatthew Cravenclyring@gmail.comSubsequent strictness defeats pseq## Summary
Today, `pseq x y` expands to `case x of !_ -> lazy y`. We reason that the use of `lazy` will prevent us from moving the evaluation of `y` before the evaluation of `x`. Even if that much is true (CSE could potentially defeat i...## Summary
Today, `pseq x y` expands to `case x of !_ -> lazy y`. We reason that the use of `lazy` will prevent us from moving the evaluation of `y` before the evaluation of `x`. Even if that much is true (CSE could potentially defeat it), if `x` is used strictly later on we may defer the eval on `x`.
Se also #22935
## Steps to reproduce
<details><summary>compile and run with -O</summary>
```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where
import GHC.Conc
import GHC.Exts
import Debug.Trace
data StrictPair a b = SP !a !b
deriving Show
f :: a -> Int -> StrictPair a Int
{-# NOINLINE f #-}
f x y = SP x (y * y)
fun :: a -> b -> (b -> Bool) -> StrictPair a Int
fun x y g = case pseq x y of
!u -> case g u of
True -> f x 12
False -> f x 14
p :: Int
{-# NOINLINE p #-}
p = trace "eval p" 3
q :: Int
{-# NOINLINE q #-}
q = trace "eval q" 4
main :: IO ()
main = print (fun p q even)
```
</details>
In this program, `q` should never be evaluated unless `p` has already been evaluated, because it is only ever used as the second arg to `pseq`. But when it is run with optimizations, `q` will be evaluated first.
## Environment
* GHC version used: any version 8.0-9.6.1https://gitlab.haskell.org/ghc/ghc/-/issues/22496Not enough boxing in demand analysis2022-11-22T15:43:15ZSimon Peyton JonesNot enough boxing in demand analysisIn GHC.Types.Demand I see
```
plusSubDmd :: SubDemand -> SubDemand -> SubDemand
-- Shortcuts for neutral and absorbing elements.
-- Below we assume that Boxed always wins.
```
But in the code compiled for GHC.Core.Map.Type, I found some...In GHC.Types.Demand I see
```
plusSubDmd :: SubDemand -> SubDemand -> SubDemand
-- Shortcuts for neutral and absorbing elements.
-- Below we assume that Boxed always wins.
```
But in the code compiled for GHC.Core.Map.Type, I found some terribly sub-optimal code,
like this:
```
$wgo_s7Ti [InlPrag=[2], Occ=LoopBreaker]
:: CmEnv -> Type -> CmEnv -> Type -> TypeEquality
[LclId[StrictWorker([])],
Arity=4,
Str=<SL><SL><L><L>,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=NEVER}]
$wgo_s7Ti
= \ (ww_s7T9 [Dmd=SL] :: CmEnv)
(ww_s7Ta [Dmd=SL] :: Type)
(ww_s7Te :: CmEnv)
(ww_s7Tf :: Type) ->
case ww_s7Te of ww_X1 { CME ipv_s817 ipv_s818 ->
case ww_s7T9 of wild_s77i { CME bx_s77j ds_s77k ->
... many cases
gos_s77g wild_s77i ww_X1 tys_a4kW tys'_a4kY
... many cases
```
That is, `$wgo` is strict in both `CmEnv` arguments, and takes them apart,
but still requires them boxed. (This is *after* strictness analysis and worker/wrapper.)
Why? After faffing around for quite a while I found one of the many recursive
calls inside `$wgo` was this, which uses the boxed values.
```
gos_r8bK wild_s77i ww_X1 tys_a4kW tys'_a4kY
```
And `gos` looks like this:
```
gos_s77g [Occ=LoopBreaker]
:: CmEnv -> CmEnv -> [KindOrType] -> [KindOrType] -> TypeEquality
[LclId,
Arity=4,
Str=<L><L><SL><SL> ]
gos_r8bK
= \ (ds_s8n8 :: GHC.Core.Map.Type.CmEnv)
(ds1_s8n9 :: GHC.Core.Map.Type.CmEnv)
(ds2_s8na [Occ=Once1!] :: [GHC.Core.TyCo.Rep.KindOrType])
(ds3_s8nb [Occ=Once2!] :: [GHC.Core.TyCo.Rep.KindOrType]) ->
case ds2_s8na of {
[] ->
case ds3_s8nb of {
[] -> GHC.Core.Map.Type.TEQ;
: _ [Occ=Dead] _ [Occ=Dead] -> GHC.Core.Map.Type.TNEQ
};
: ty1_s8ng [Occ=Once1] tys1_s8nh [Occ=Once2] ->
case ds3_s8nb of {
[] -> GHC.Core.Map.Type.TNEQ;
: ty2_s8nj [Occ=Once1] tys2_s8nk [Occ=Once2] ->
case GHC.Core.Map.Type.$wgo ds_s8n8 ty1_s8ng ds1_s8n9 ty2_s8nj of {
GHC.Core.Map.Type.TNEQ -> GHC.Core.Map.Type.TNEQ;
GHC.Core.Map.Type.TEQ ->
gos_r8bK ds_s8n8 ds1_s8n9 tys1_s8nh tys2_s8nk;
GHC.Core.Map.Type.TEQX ->
case gos_r8bK ds_s8n8 ds1_s8n9 tys1_s8nh tys2_s8nk
of wild3_s8nm [Occ=Once1] {
__DEFAULT -> wild3_s8nm;
GHC.Core.Map.Type.TEQ -> GHC.Core.Map.Type.TEQX
}
}
}
}
```
Aha. It isn't strict on the `CMEnv` arguments, so they are passed boxed.
It happens that this is a cold path of `$wgo`. Moreover, all of the calls to `gos` are in the RHS of `$wgo`.
So it would really be better to pass the arguments to `gos` unboxed.
However even SpecConstr doesn't catch this, because even though all the calls to `gos`
are explicit constructors, `gos` itself does not take them apart.
There is clearly a lost opportunity here. I'm not sure how to grasp it.https://gitlab.haskell.org/ghc/ghc/-/issues/22442Eta-expansion can cause binders in callers to become lazier.2023-02-07T17:24:18ZAndreas KlebingerEta-expansion can cause binders in callers to become lazier.Consider this code:
```haskell
foo !x = ...
bar =
let x = <thunk>
in foo x
```
Given that `foo` is strict in it's first arg one would hope that bar eventually compiles down to code such as:
```haskell
bar =
case <thunk> of x...Consider this code:
```haskell
foo !x = ...
bar =
let x = <thunk>
in foo x
```
Given that `foo` is strict in it's first arg one would hope that bar eventually compiles down to code such as:
```haskell
bar =
case <thunk> of x
_DEFAULT -> foo x
```
And that's generally what happens. However if we eta-expand `foo` instead we get:
```haskell
foo !x y = ...
bar =
let x = <thunk>
in \y -> foo x y
```
This happens since now only once `bar` is applied to an additional argument `x` will be evaluated.
This can be good or bad depending on the code in question, but it's definitely *surprising*.
I observed this happening in https://gitlab.haskell.org/ghc/ghc/-/issues/22425
I think this has a potential bad interaction with when the simplifier drops seqs that I haven't yet seen in the wild.
In particular we might start out with:
```haskell
-- foo has arity 1
bar x =
case x of x' -> foo x'
-- The simplifier sees that foo is strict in x' and will drop the seq
-- foo has arity 1
bar x =
foo x'
-- We eta expand foo
-- foo has arity 2
bar x =
\y -> foo x' y
```
And suddenly forcing `bar x` no longer forces `x` despite the user potentially having explicitly written it to do so.
But as I said I haven't observed the later part in the wild yet and haven't tried to come up with a reproducer. I doubt it happens often in practice.
@sgraf812 You might find this interesting/have more to say about this.https://gitlab.haskell.org/ghc/ghc/-/issues/22179DmdAnal/Meta: Is strictness the right property to look for?2023-12-20T04:48:20ZSebastian GrafDmdAnal/Meta: Is strictness the right property to look for?In #19917, we have seen how strictness analysis can be "too aggressive". Small, executable reproducer:
```hs
main = print $ loop "foo" (error "woops")
where
loop xs l = case xs of
x:xs -> loop xs x :: String
[] -> l ...In #19917, we have seen how strictness analysis can be "too aggressive". Small, executable reproducer:
```hs
main = print $ loop "foo" (error "woops")
where
loop xs l = case xs of
x:xs -> loop xs x :: String
[] -> l `seq` loop [l] l
```
This program will crash with `"woops"` when compiled with optimisations, because `loop` is inferred to be strict in its `l` parameter. That is *very* surprising! Note that `l` doesn't even *occur freely* in the cons branch of `loop`. And neither will `error "woops"` be evaluated under call-by-need; instead we'll get an infinite loop.
On the other hand, `loop`'s parameter `l` certainly *does* satisfy the strictness definition: Whenever `l` is bottom/diverges, `loop xs l` will, too, simply because `loop` *always* diverges.
That is to say: The above program points out that The Strictness Definition of Olde entirely neglects diverging program traces, simply because Strictness is a property of denotational semantics and all such program traces share the same denotation ⊥.
I think a more useful definition is
> *Strong Strictness*. `f` is strong strict if `f x` will evaluate `x` after finitely many steps in the small-step semantics.
NB: For terminating programs/traces, strong strictness and strictness coincide. It is only about diverging traces where we see the difference.
`loop` above is strong strict in the first but not in its second argument, as `loop "foo" (error "woops")` will loop forever without ever throwing that `error "woops"`. It is still strong strict in `xs`. Nice!
Unfortunately, I'm pretty sure that this stronger notion of strictness would regress performance all over the place. For example, `\x -> error "blah"` would not be strong strict (in `x`, obviously). But I think it's good to describe the ideal model so that one day we might even implement it.
---
Here is a bit Related Work:
1. Usage analysis as implemented in GHC actually works for diverging programs, too: We'd never say that `loop` above *does not* use its first argument, although `loop` diverges. Usage/Absence analysis has to account for diverging/infinite traces to be useful.
2. What Strictness Analysis is to *variables* in lazy functional languages is quite similar to what [Very Busy Expression Analysis](https://web.cs.wpi.edu/~kal/PLT/PLT9.6.html) is to *expressions* in an imperative setting. Very Busy Expression Analysis is a backwards analysis with a MUST powerset lattice (e.g., ⊥ is *all* Exprs, lub is intersection). I investigated how this analysis handles infinite loops. It turns out that infinite loops are very busy *in every expression*, even if it doesn't occur in the loop. Pretty much the same problem as for us!
3. Then I recalled that the gen/kill formulation for Post-dominance analysis is *also* a backwards analysis with a MUST powerset lattice and actually shares the all the same problems wrt. infinite loops. And sure enough, there is an abundance of discussion and literature about post-dominance and infinite loops:
- https://stackoverflow.com/questions/35399281/how-can-i-build-the-post-dominator-tree-of-a-function-with-an-endless-loop discusses how LLVM was failing to properly handle infinite loops in their post dominance algorithm. I *think* it has been fixed, though: https://reviews.llvm.org/D35851. Also it's a bit of a different problem they had: Their solutions were too weak (e.g., Top), ours are too optimistic (e.g., report too many vars that we are strict in).
- [A Formal Model of Program Dependences and its Implications for Software Testing, Debugging, and Maintenance](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=58784) introduces the notion of *strong* post-dominance (they called it strong forward dominance), by
> u strongly post dominates v <=> u post dominates v and there is an integer k > 0 such that every walk in G beginning with v and of length >= k contains u.
And that's indeed why I called my definition above Strong Strictness: It has a an additional "in finitely many steps" condition compared to baseline Strictness.
---
Implementation-wise, I think we can do the following during fixed-point iteration (I don't want to touch strictness of `error` just yet):
1. We keep initialising strictness signatures of recursive functions such as `loop` with bottom (meaning an application of `loop` is strict in every var).
2. We do one round of fixed-point iteration and get to find a too optimistic signature that involves *just the arguments and free variables that occur during any finite evaluation of `loop`*. E.g., we'll get something like `<1B><1B>b`.
3. Now we'll do another iteration, but we'll omit the `b`, replacing it with `x` (`exnDiv`, could have chosen `topDiv` too). NB: We'll keep the argument demands and demands on free vars we accumulated, because they actually occur in finitely many steps.
4. Then onto the next iteration: From the `x:xs` branch, we'll see that we are strict in `x` and `xs`, but not in `l` (because we switched from `b` to `x`!). So when we lub together with the `[]` branch, we see `{l:->L}`.
And then we'll get `<1B><L>x`, which is perfect! Still strict in the first arg, no longer strict in the second and we get to see through `x` that evaluation of `loop` will never terminate, hence dead-code elimination will keep working as before.
What happened in (3) is that we basically used `b` to figure out in the first iteration *just the right set* of FVs that `loop` has to be strict in in the recursive call. We could've instead done a special FV traversal that accounts for FVs in strictness signatures and initialised the set of occuring FVs to demand `B` instead, I suppose. Maybe we should do that instead... The important part is that instead of saying that `loop` is strict in *every* variable during the first iteration, we say that it's strict in *just the sensible set of FVs* instead.
I put up an experimental implementation of this idea in !8983.https://gitlab.haskell.org/ghc/ghc/-/issues/22035slow validate failure: T7931 core lint error hpc2022-08-16T12:54:46ZMatthew Pickeringslow validate failure: T7931 core lint error hpcReproduce with
```
hadrian/build test --freeze1 --docs=none --flavour=slow-validate --test-speed=slow --only="T7931"
```
Looks like something needs to look through ticks.
```
[1 of 2] Compiling Main ( T7931.hs, T7931.o )
...Reproduce with
```
hadrian/build test --freeze1 --docs=none --flavour=slow-validate --test-speed=slow --only="T7931"
```
Looks like something needs to look through ticks.
```
[1 of 2] Compiling Main ( T7931.hs, T7931.o )
*** Core Lint errors : in result of Simplifier ***
<no location info>: warning:
idArity 2 exceeds arity imposed by the strictness signature <S>b: lvl_s1SM
In the RHS of lvl_s1SM :: A -> ShowS
Substitution: <InScope = {}
IdSubst = []
TvSubst = []
CvSubst = []>
*** Offending Program ***
$creadPrec_sXC :: Prec -> forall b. (A -> P b) -> P b
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 0 60}]
$creadPrec_sXC = (hpc<Main,9> $fAlternativeReadPrec4) @A
$creadListPrec_aTs :: ReadPrec [A]
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 20 0}]
$creadListPrec_aTs
= hpc<Main,10>
list
@A
($creadPrec_sXC
`cast` (<Prec>_R %<'Many>_N ->_R Sym (N:ReadP[0] <A>_R)
; Sym (N:ReadPrec[0] <A>_R)
:: (Prec -> forall b. (A -> P b) -> P b) ~R# ReadPrec A))
$creadList_s11Z :: P [A]
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 30 0}]
$creadList_s11Z
= ((($creadListPrec_aTs
`cast` (N:ReadPrec[0] <[A]>_R
:: ReadPrec [A] ~R# (Prec -> ReadP [A])))
$fRead()7)
`cast` (N:ReadP[0] <[A]>_R
:: ReadP [A] ~R# (forall b. ([A] -> P b) -> P b)))
@[A] ($fApplicativeP_$cpure @[A])
$creadList_aSt :: ReadS [A]
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 20 60}]
$creadList_aSt = hpc<Main,8> run @[A] $creadList_s11Z
$creadsPrec_aSl :: Int -> ReadS A
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 60 60}]
$creadsPrec_aSl
= hpc<Main,7>
\ (eta_aXk :: Int) ->
run @A ($creadPrec_sXC eta_aXk @A ($fApplicativeP_$cpure @A))
$fReadA [InlPrag=CONLIKE] :: Read A
[LclIdX[DFunId],
Unf=DFun: \ ->
C:Read TYPE: A
$creadsPrec_aSl
$creadList_aSt
$creadPrec_sXC
`cast` (<Prec>_R %<'Many>_N ->_R Sym (N:ReadP[0] <A>_R)
; Sym (N:ReadPrec[0] <A>_R)
:: (Prec -> forall b. (A -> P b) -> P b) ~R# ReadPrec A)
$creadListPrec_aTs]
$fReadA
= C:Read
@A
$creadsPrec_aSl
$creadList_aSt
($creadPrec_sXC
`cast` (<Prec>_R %<'Many>_N ->_R Sym (N:ReadP[0] <A>_R)
; Sym (N:ReadPrec[0] <A>_R)
:: (Prec -> forall b. (A -> P b) -> P b) ~R# ReadPrec A))
$creadListPrec_aTs
$sreadEither_s1SI [InlPrag=[2]] :: String -> Either String [A]
[LclId,
Arity=1,
Str=<L>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
Tmpl= \ (s_a1Sz [Occ=Once1] :: String) ->
case readEither8
@[A]
(run
@[A]
(((($creadListPrec_aTs
`cast` (N:ReadPrec[0] <[A]>_R
:: ReadPrec [A] ~R# (Prec -> ReadP [A])))
minPrec)
`cast` (N:ReadP[0] <[A]>_R
:: ReadP [A] ~R# (forall b. ([A] -> P b) -> P b)))
@[A] (readEither7 @[A]))
s_a1Sz)
of {
[] -> readEither4 @[A];
: x_a1T1 [Occ=Once1] ds_a1T2 [Occ=Once1!] ->
case ds_a1T2 of {
[] -> Right @String @[A] x_a1T1;
: _ [Occ=Dead] _ [Occ=Dead] -> readEither1 @[A]
}
}}]
$sreadEither_s1SI
= \ (s_a1Sz :: String) ->
case readEither8
@[A]
(run
@[A]
(((($creadListPrec_aTs
`cast` (N:ReadPrec[0] <[A]>_R
:: ReadPrec [A] ~R# (Prec -> ReadP [A])))
minPrec)
`cast` (N:ReadP[0] <[A]>_R
:: ReadP [A] ~R# (forall b. ([A] -> P b) -> P b)))
@[A] (readEither7 @[A]))
s_a1Sz)
of {
[] -> readEither4 @[A];
: x_a1T1 [Dmd=S] ds_a1T2 [Dmd=S] ->
case ds_a1T2 of {
[] -> Right @String @[A] x_a1T1;
: ipv_a1T8 [Dmd=B] ipv1_a1T9 [Dmd=B] -> readEither1 @[A]
}
}
$cshowsPrec_aPA :: Int -> A -> ShowS
[LclId,
Arity=2,
Str=<B><S>b,
Cpr=b,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=True)
Tmpl= \ _ [Occ=Dead, Dmd=B] (z_aOX [Occ=Once1, Dmd=S] :: A) ->
hpc<Main,4> case z_aOX of { }}]
$cshowsPrec_aPA
= \ _ [Occ=Dead, Dmd=B] (z_aOX [Dmd=S] :: A) ->
hpc<Main,4> case z_aOX of { }
lvl_s1SM :: A -> ShowS
[LclId,
Arity=2,
Str=<S>b,
Cpr=b,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=True)}]
lvl_s1SM
= \ (z_aOX [Dmd=S] :: A) (eta_B0 [OS=OneShot] :: String) ->
hpc<Main,4> case z_aOX of { }
$cshowList_aPQ :: [A] -> ShowS
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 60 60}]
$cshowList_aPQ
= hpc<Main,6>
\ (ls_aXp :: [A]) (s_aXq :: String) ->
showList__ @A lvl_s1SM ls_aXp s_aXq
$cshow_aPI :: A -> String
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 10 60}]
$cshow_aPI
= hpc<Main,5>
\ (x_aXu [Dmd=S] :: A) -> hpc<Main,4> case x_aXu of { }
$fShowA [InlPrag=CONLIKE] :: Show A
[LclIdX[DFunId],
Unf=DFun: \ ->
C:Show TYPE: A $cshowsPrec_aPA $cshow_aPI $cshowList_aPQ]
$fShowA = C:Show @A $cshowsPrec_aPA $cshow_aPI $cshowList_aPQ
main_s1bm :: State# RealWorld -> (# State# RealWorld, () #)
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 270 0}]
main_s1bm
= hpc<Main,3>
hpc<Main,2>
\ (eta_a12m [OS=OneShot] :: State# RealWorld) ->
hPutStr2
stdout
($cshowList_aPQ
(hpc<Main,1>
case readEither8
@[A]
(run
@[A]
(((($creadListPrec_aTs
`cast` (N:ReadPrec[0] <[A]>_R
:: ReadPrec [A] ~R# (Prec -> ReadP [A])))
minPrec)
`cast` (N:ReadP[0] <[A]>_R
:: ReadP [A] ~R# (forall b. ([A] -> P b) -> P b)))
@[A] (readEither7 @[A]))
(hpc<Main,0> unpackCString# "[]"#))
of {
[] -> errorWithoutStackTrace @LiftedRep @[A] readEither5;
: x_a1T1 [Dmd=S] ds_a1T2 [Dmd=S] ->
case ds_a1T2 of {
[] -> x_a1T1;
: ipv_a1T8 [Dmd=B] ipv1_a1T9 [Dmd=B] ->
errorWithoutStackTrace @LiftedRep @[A] readEither2
}
})
([] @Char))
True
eta_a12m
main :: IO ()
[LclIdX,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=True)}]
main
= main_s1bm
`cast` (Sym (N:IO[0] <()>_R)
:: (State# RealWorld -> (# State# RealWorld, () #)) ~R# IO ())
main_s1Sq :: State# RealWorld -> (# State# RealWorld, () #)
[LclId,
Arity=1,
Str=<L>,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 60}]
main_s1Sq
= runMainIO1
@()
(main_s1bm
`cast` (Sym (N:IO[0] <()>_R)
:: (State# RealWorld -> (# State# RealWorld, () #)) ~R# IO ()))
main :: IO ()
[LclIdX,
Arity=1,
Str=<L>,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=True)}]
main
= main_s1Sq
`cast` (Sym (N:IO[0] <()>_R)
:: (State# RealWorld -> (# State# RealWorld, () #)) ~R# IO ())
$trModule_s1Sr :: Addr#
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}]
$trModule_s1Sr = "main"#
$trModule_s1Ss :: TrName
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
$trModule_s1Ss = TrNameS $trModule_s1Sr
$trModule_s1St :: Addr#
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}]
$trModule_s1St = "Main"#
$trModule_s1Su :: TrName
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
$trModule_s1Su = TrNameS $trModule_s1St
$trModule :: Module
[LclIdX,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
$trModule = Module $trModule_s1Ss $trModule_s1Su
$tcA_s1Sv :: Addr#
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}]
$tcA_s1Sv = "A"#
$tcA_s1Sw :: TrName
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
$tcA_s1Sw = TrNameS $tcA_s1Sv
$tcA :: TyCon
[LclIdX,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
$tcA
= TyCon
16201120719427956884##64
13080046616073797921##64
$trModule
$tcA_s1Sw
0#
krep$*
*** End of Offense ***
<no location info>: error:
Compilation had errors
<no location info>: error: ExitFailure 1
*** unexpected failure for T7931(hpc)
```Andreas KlebingerAndreas Klebingerhttps://gitlab.haskell.org/ghc/ghc/-/issues/21736Need better absence analysis2023-12-19T12:09:32ZSimon Peyton JonesNeed better absence analysisHere's an example where absence analysis is really failing.
```haskell
data K a = Nil | Kons !a !(K a)
data T = MkT ![Int] !(K Int)
f1 :: T -> Int -> [Int]
-- Str=<1!P(1L,A)><1!P(1L)>,
f1 (MkT xs ys) 0 = xs
f1 (MkT xs ys) n = f1 (MkT x...Here's an example where absence analysis is really failing.
```haskell
data K a = Nil | Kons !a !(K a)
data T = MkT ![Int] !(K Int)
f1 :: T -> Int -> [Int]
-- Str=<1!P(1L,A)><1!P(1L)>,
f1 (MkT xs ys) 0 = xs
f1 (MkT xs ys) n = f1 (MkT xs (Kons n ys)) (n-1)
f2 :: [Int] -> K Int -> Int -> [Int]
-- The bang makes things much worse
-- Str=<1L><1A><1!P(1L)>,
f2 xs !ys 0 = xs
f2 xs ys n = f2 xs (Kons n ys) (n-1)
```
In `f1`, the `ys` argument to `MkT` is never used. Absence analysis discovers that,
and we get a tight loop tha never passes that accumulating `Kons` list.
```
Foo.$wf1 [InlPrag=[2], Occ=LoopBreaker]
:: [Int] -> GHC.Prim.Int# -> [Int]
[GblId[StrictWorker([!, ~])],
Arity=2,
Str=<1L><1L>,
Unf=OtherCon []]
Foo.$wf1
= \ (ww_sMw :: [Int])
(ww1_sMB :: GHC.Prim.Int#) ->
case ww1_sMB of ds_X3 {
__DEFAULT -> Foo.$wf1 ww_sMw (GHC.Prim.-# ds_X3 1#);
0# -> ww_sMw
}
```
But in `f2` I have done a kind of manual worker/wrapper and unboxed the `MkT`. I have also added a bng to `ys`. I get *much* worse code:
```
Foo.$wf2 [InlPrag=[2], Occ=LoopBreaker]
:: [Int] -> K Int -> GHC.Prim.Int# -> [Int]
[GblId[StrictWorker([!, ~, ~])],
Arity=3,
Str=<1L><MA><1L>,
Unf=OtherCon []]
Foo.$wf2
= \ (xs_sMl :: [Int])
(ys_sMm :: K Int)
(ww_sMp :: GHC.Prim.Int#) ->
case ww_sMp of ds_X2 {
__DEFAULT ->
case ys_sMm of ys1_X3 { __DEFAULT ->
Foo.$wf2
xs_sMl
(Foo.Kons @Int (GHC.Types.I# ds_X2) ys1_X3)
(GHC.Prim.-# ds_X2 1#)
};
0# -> xs_sMl
}
```
Look at that ever-increasing `Kons` argument that is eventually thrown away anyway.
## How this happened
This happened in the wild: [see here](https://github.com/Mikolaj/horde-ad/issues/14#issuecomment-1155821381).
The observed behavior is:
* With a SPECIALISE pragama in module `MnistFcnnScalar.hs`, program is fast. Indirectly we create a specialised version of overloaded function `sumConstantData` in that module.
* Remove the SPECIALISE, add `-fexpose-all-unfoldings -fspecialise-aggressively`. Now module `BenchMnistTools.hs` needs a specialised version of `sumConstantData`, and creates one from the exposed unfolding.
* But program runs 30% slower, and allocates 15x as much. Why?
* Turns out to be that the optimised unfolding of the (overloaded) `sumConstantData`, in `MnistFcnnScalar.hi`, has a local loop `$wfoldlM'_loop` which arose from this defn in `vector:Data.Vector.Fusion.Stream.Monadic`:
```
foldlM' :: Monad m => (a -> b -> m a) -> a -> Stream m b -> m a
{-# INLINE_FUSED foldlM' #-}
foldlM' m w (Stream step t) = foldlM'_loop SPEC w t
where
foldlM'_loop !_ z s
= z `seq`
do
r <- step s
case r of
Yield x s' -> do { z' <- m z x; foldlM'_loop SPEC z' s' }
Skip s' -> foldlM'_loop SPEC z s'
Done -> return z
```
Notice that `seq`!
* When we specialise the (optimised) unfolding for `sumConstantData` we are seeing the loop that looks much like `f2`.
* But when we specialise the un-optimised original defn of `sumConstantData` (when the SPECIALISE pragma is there) we have not yet inlined `foldlM'` so we don't see that loop.
So roughly, we get `f2` instead of `f1`, and that makes a *reallyl big* difference. What is worse,
the difference happens when we see a "more optimised" program. The programmer has zero chance of working out what is going on.Sebastian GrafSebastian Grafhttps://gitlab.haskell.org/ghc/ghc/-/issues/21565Case of known constructor can weaken indicated demands on a functions arguments.2022-09-28T11:59:17ZAndreas KlebingerCase of known constructor can weaken indicated demands on a functions arguments.We start out with a function like this.
```haskell
$wfoo x'[Dmd=1S]) _[Dmd=L] =
let box[Dmd=1S(1S,L)] = (x', _)
in
...
bar box
```
=> (inlining) We inline bar which introduces a branch:
```haskell
$wfoo x'[Dmd=SP]) ...We start out with a function like this.
```haskell
$wfoo x'[Dmd=1S]) _[Dmd=L] =
let box[Dmd=1S(1S,L)] = (x', _)
in
...
bar box
```
=> (inlining) We inline bar which introduces a branch:
```haskell
$wfoo x'[Dmd=SP]) _[Dmd=L] =
let box[Dmd=1S(1S,L)] = (x', _)
in
...
case z of
C1 -> baz box
C2 -> baz (case box of (x'',_) -> (x'',sth))
```
=> Known Con ends up eliminating `box` on one of the two branches.
```haskell
-- Different demand sigs
$wfoo x'[Dmd=SP]) _[Dmd=L] =
let box[Dmd=L] = (x', _)
in
...
case z of
C1 -> baz box -- x' only demanded via box on this branch
C2 -> baz (x',sth) -- box dead on this branch!
```
A few confusing things happen here:
* `box` is no longer strict which might or might not matter, but it seems we update the demand on it correctly.
* `x'`s demand is not updated. So it's *currently* still given a strict demand. But if we rerun the demand analysier it *will* be given a `L` demand.
* The behaviour of the RHS in regards to it's arguments has *not* changed. Although the demand we will infer in future demand analysier runs has.
This seems rather rare but I had this come up with a patch of mine where I had assumed strictness on arguments for a RHS never goes down (which I think also isn't true).
And here is a full reproducer where this happens. Compile with -ddump-str-anal and look at the signatures for `foo`.
```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O -fno-worker-wrapper -fno-cpr-anal -fno-full-laziness -O -ddump-stranal -flate-dmd-anal -fstrictness-before=1 -fno-float-in#-}
{-# LANGUAGE PartialTypeSignatures #-}
module A where
import GHC.Exts
data Env = Env Bool Bool {-# NOUNPACK #-} Int Int Char
data MyEnum = A | B | C | D
{-# NOINLINE doSomething #-}
doSomething :: Env -> Bool
doSomething (Env a b _ _ _ ) = a || b
{-# NOINLINE doSomething2 #-}
doSomething2 :: Env -> Bool
doSomething2 (Env a b _ _ _) = a || b
{-# INLINE[0] bar #-}
bar :: MyEnum -> Env -> Bool
bar b env =
case b of
A -> (doSomething) env
B -> (doSomething2) env
C -> (doSomething) (case env of Env x y _ _ _ -> x `seq` (Env x False 1 2 '1'))
D -> doSomething env || doSomething2 env
foo :: Bool -> Bool -> A.MyEnum -> A.MyEnum -> Int -> Char -> Bool
foo x y b c n char =
let env =
Env x (n == 42 ) (n) 4 $ (succ $ succ $ succ $ succ $ succ char)
in case c of
!_ -> bar b env
```
The good thing though is I can't think of a way how this could cause issues at the moment.
* `box`/`env` needs to be in WHNF in order for known constructor to trigger. So dropping a seq is fine either way.
If they are a thunk I don't think we can get into this situation. But it's possible I'm wrong on that.
* A function changing from showing a strict demand for an argument at first, and a lazy demand later is odd, but here the function remains strict in it's *behavior*. It's just that dmdAnal no longer can recognize it as such in future runs.
It can certainly make it more confusing to look at core though! And maybe it is dangerous and I just missed how so. Hence the ticket.https://gitlab.haskell.org/ghc/ghc/-/issues/21513Lack of eta expansion in NoFib's fft22022-05-10T14:44:39ZSebastian GrafLack of eta expansion in NoFib's fft2According to what I wrote in my Master's thesis a few years ago, the following example is inspired by NoFib's [`fft2` benchmark](https://gitlab.haskell.org/ghc/nofib/-/blob/bca01968bb39a252a9771c9dd028bd7f71472361/spectral/fft2/Fourier.l...According to what I wrote in my Master's thesis a few years ago, the following example is inspired by NoFib's [`fft2` benchmark](https://gitlab.haskell.org/ghc/nofib/-/blob/bca01968bb39a252a9771c9dd028bd7f71472361/spectral/fft2/Fourier.lhs#L61-69):
```hs
foo :: ([a] -> [a]) -> [a] -> [a]
foo f a = f a
{-# NOINLINE foo #-}
bar :: Int -> [Int] -> [Int]
bar n =
let bar' = bar (n-1) in
if sum [0..n] < 10
then \xs -> 10:xs
else \xs -> foo bar' xs
main = print ( bar 1 [2])
```
The point here is that even though `bar` has arity 1 as written, it could be eta-expanded: Assuming we'd eta-expand `bar`, the sharing of the pap `bar'` turns out unnecessary, because `foo` will call `bar'` only once, with one arg.
Neither Call Arity nor Demand Analysis is able to detect that:
- Call Arity doesn't see that `foo` will call `bar'` only once, because it doesn't have a signature mechanism. Neither could it, because it analyses let body before RHS.
- Demand Analysis computes a signature for `foo` from which it can tell that `bar'` is only called once with one arg (`MCM(L)`). But likewise, it analyses `bar`'s RHS before looking at how it's used. Thus, it doesn't see that the only external call in `main` is one with two args. Thus, upon analysing `bar`'s RHS, it has to assume that `\xs` can be entered many times, each of which might result in one call to the PAP `bar'`. Thus to DmdAnal, it appears as if `bar'` is shared, which prohibits eta-expansion of `bar`.
The DmdAnal variant of my Master's thesis was able to eta-expand `bar`. I believe !5349 is able to do so, too, but I can't tell at the moment because I don't have it checked out and built at the moment.
Simon and I are working on a way to do the same in a way that is explainable to someone else.Sebastian GrafSebastian Grafhttps://gitlab.haskell.org/ghc/ghc/-/issues/21392Don't float out join points to top-level with an interesting demand2024-02-01T09:42:00ZSebastian GrafDon't float out join points to top-level with an interesting demandDue to a complicated interaction with FloatOut in !7599, I'm seeing roughly the following sequence of steps:
```
==> after DmdAnal
f = ... joinrec go [Dmd=SCS(C1(P(1L,A))),Str=<S><L>] x y =
... case <e> of x' { __DEF...Due to a complicated interaction with FloatOut in !7599, I'm seeing roughly the following sequence of steps:
```
==> after DmdAnal
f = ... joinrec go [Dmd=SCS(C1(P(1L,A))),Str=<S><L>] x y =
... case <e> of x' { __DEFAULT -> go x' y' }
in ... go ...
==> FloatOut + FloatIn strict arg `e`
go [Str=<S><L>]
go = ... go <e> y' ... -- NB: Dropped the eval
f = ... go ...
==> DmdAnal, 2nd run
go [Str=<L><L>] -- Urgh, now lazy!
go = ... go <e> y' ...
f = ... go ...
```
This is what happens:
1. The first run of DmdAnal records a signature `<S><L>` for the join point `go`, assuming a demand of `P(1L,A)` on the join body. NB: DmdAnal currently computes better signatures for join points than for regular functions because their evaluation context is known.
2. The fact that `go` is strict in its first arg allows us to drop the eval on `e` in Core and we get `go e y'` in the recursive call because `go`, which is easier to handle in Core2Core passes.
3. In the meantime, `go` has been simplified enough for the second FloatOut pass to see that it can float to the top-level. Off it goes
4. But on the top-level it's just a regular function where DmdAnal isn't smart enough (yet?) to figure out an accurate evaluation context. So our strictness signature gets worse, `<L><L>`. And now we dropped the eval in (2) but can't recover it! We made the program lazier; in CorePrep we'll let-bind instead of case-bind `e`.
In !7599, we made inlined versions of `addListToUniqDSet` lazier in `GHC.Linker.Loader`, to detrimental effect on `MultiLayerModulesTH_OneShot`, which increased by 4.8%.
In a multi-day effort I could come up with a reproducer, but it only reproduces with !7599 (because `t` has demand `MP(1L,A)` which would be `MP(ML,A)` today. I couldn't reproduce with a demand of `1P(1L,A)` because then `t` will just be turned into a `case` and pushed into the join point):
```hs
module Lib (f) where
import Data.List
import Data.Ord
newtype Unique = U { unU :: Int }
class Uniquable u where getKey :: u -> Unique
instance Uniquable Int where getKey = U
data UMap a = UMap { unS :: ![(Unique,a)], unI :: !Int }
insertBy' f v !xs = insertBy f v xs
{-# NOINLINE insertBy' #-}
addOne :: Uniquable u => UMap a -> u -> a -> UMap a
addOne (UMap set n) x v = UMap (insertBy' (comparing (unU . fst)) (getKey x,v) set) (n+1)
newtype USet u = USet (UMap u)
insertOne :: Uniquable u => USet u -> u -> USet u
insertOne (USet s) x = USet (addOne s x x)
insertMany :: Uniquable u => USet u -> [u] -> USet u
insertMany s vs = foldl' insertOne s (reverse (reverse vs))
seq' = seq
{-# NOINLINE seq' #-}
blah s@(USet m) = unS m `seq'` s
{-# OPAQUE blah #-}
end (USet m) = unS m
{-# NOINLINE end #-}
f :: USet Int -> [Int] -> [(Unique,Int)]
f !xs ys
| length ys == 13 = end $ blah t
| length ys == 23 = reverse $ end $ blah t
| otherwise = []
where
t = insertMany xs (reverse $ reverse $ reverse $ reverse ys)
```
Welp.
My suggestion for a fix: In `SetLevels.destLevel`, only float join points (to the top-level) if their demand is not "interesting", e.g. not just `nCn(C1(L))`. Alternatively, zap their demand signatures when floating. Perhaps a combination of both.
Perhaps a strange artifact is that `go`'s first arg is actually unlifted (in terms of `StrictWorkerId`) and perhaps will be eval'd in Stg2Cmm. But the thunk will be allocated nonetheless which is a waste of ressources. Maybe we could case-bind in CorePrep expressions that go in Unlifted positions. But that isn't strictly the cause of the issue here.Sebastian GrafSebastian Grafhttps://gitlab.haskell.org/ghc/ghc/-/issues/21257DmdAnal: `dmdAnalStar` could yield better usage for trivial args and bindings2022-11-21T19:55:13ZSebastian GrafDmdAnal: `dmdAnalStar` could yield better usage for trivial args and bindingsConsider
```hs
f :: (Bool, Bool) -> (Bool, Bool)
f pr = (pr `seq` True, case pr of (a,b) -> a && b)
{-# NOINLINE f #-}
-- idDmdSig f = LP(1L,1L)
g :: (Bool, Bool) -> ()
g pr = f pr `seq` ()
-- idDmdSig g = L
```
`g` simply calls `f`, ...Consider
```hs
f :: (Bool, Bool) -> (Bool, Bool)
f pr = (pr `seq` True, case pr of (a,b) -> a && b)
{-# NOINLINE f #-}
-- idDmdSig f = LP(1L,1L)
g :: (Bool, Bool) -> ()
g pr = f pr `seq` ()
-- idDmdSig g = L
```
`g` simply calls `f`, yet `g` puts a worse demand on `pr` than `f`; it loses the information that `pr`'s components are only evaluated once.
That is due to `dmdAnalStar e (n :* sd)`, which is used when analysing the argument `pr` in the call `f pr` in demand `LP(1L,1L)`. That will
1. Call `dmdAnal "pr" "P(1L,1L)"`, as if `pr` was evaluated exactly once in sub-demand `P(1L,1L)`. We get back `[pr |-> 1P(1L,1L)]` (which of course is too optimistic).
2. Then it worsens the resulting demand type to account for `n=L`, by multiplying it with `L` via `multDmdType`. That ultimately calls `multDmd "L"` on the demand of `x`: `multDmd L 1P(1L,1L) === LP(LL,LL) === L`.
It's alright to turn the outer `1P(..)` into `LP(..)`, but we don't actually need to worsen the inner `P(1L,1L)`, because it is the exact demand that is put on the argument already.
NB: This ticket is specifically concerned with upper bounds. The `multDmd "L"` situation (with a used more than once upper boudn) only occurs when an arg `pr` is trivial (or if the RHS of a bindings is trivial). Otherwise, `dmdTransformThunkDmd` anticipates the memoisation of complex expressions and call `oneifyDmd` instead. So if we had a call like `f (x, True)` instead, then we'd get `dmdAnalStar "(x, True)" "1P(1L,1L)"`, e.g., with the oneified demand `1P(..)` and all will be well, as we ultimately get a demand of `1L` on `x`.Sebastian GrafSebastian Grafhttps://gitlab.haskell.org/ghc/ghc/-/issues/20819DmdAnal: Handling of RULES and unfolding free variables might be broken2022-05-02T18:14:05ZSebastian GrafDmdAnal: Handling of RULES and unfolding free variables might be brokenWhile working on !7044, I realised that our current `keepAlive*` logic might be broken.
In particular, we have code of this form:
```hs
-- See Note [Absence analysis for stable unfoldings and RULES]
rule_fvs = bndrRul...While working on !7044, I realised that our current `keepAlive*` logic might be broken.
In particular, we have code of this form:
```hs
-- See Note [Absence analysis for stable unfoldings and RULES]
rule_fvs = bndrRuleAndUnfoldingIds id
final_ty = body_ty' `plusDmdType` rhs_ty `keepAliveDmdType` rule_fvs
....
-- See Note [Absence analysis for stable unfoldings and RULES]
rhs_fv2 = rhs_fv1 `keepAliveDmdEnv` bndrRuleAndUnfoldingIds id
```
In `dmdAnalBindLetUp` and `dmdAnalRhsSig` (called from `dmdAnalBindLEtDown`), respectively.
This code fails to unleash demands on free variables in the demand signature of those FVs! E.g., if we could write
```hs
foo :: Integer -> Integer
foo a = f a
where
z :: Integer
z = 0
{-# NOINLINE z #-}
g :: Integer -> Integer -> Integer
g x y = x+z
{-# NOINLINE g #-}
f :: Integer -> Integer
f x = x+a
{-# NOINLINE[0] f #-}
{-# RULES "f to g" f 0 = g 0 #-}
```
(which we can't because RULES on local bindings don't work), then we'd do the following:
1. Compute a signature for `g`, `<1L>{z->1L>}`. Note that it is strict in its FV `x`.
2. Analyse `f`'s RHS. Get DmdType `<1L>`. Note that it doesn't mention `g`!
3. Now we do the `keepAliveDmdEnv` on `{g}`, the only FV of `f`'s RULE. This entails simply *adding* `g` to the `DmdEnv` of `f`. We get the final signature `<1L>{g->L}`.
4. Analyse and unleash `f`s signature at the call `f a`.
5. ... Leave scopes of `f` and `g`, properly annotating them ...
6.Realise that `z` is absent. Wrong wrong wrong!! This might lead to a crash whenever we rewrite `f` to `g`.
This won't work if lambda-lift `f` and `g`, because then we'll automatically record many-uses that end up in the `lazy_fv`s. In particular, we'd automatically record a usage of `z->L` when we add the lazy_fv's of `g`; `z->1L` would never be part of `g`'s signature.
It also doesn't work as written, because we can't declare RULES on nested functions.
So I haven't been able to actually reproduce the issue in production. Nevertheless, the danger is there! And !7044 trips over it, because it considers no variable as weak. So it will actually give `g` the sig `<1L>{z->1L>}`, which we fail to unleash as part of `keepAliveDmdEnv`.
!7044 now has the fix. !5349 fixes it in similar manner.