```haskell
import Data.Vector.Unboxed as U ((!), constructN, length)

coinchangev :: Int -> [Int] -> Int
coinchangev n cs = v ! n
  where
    v = constructN (n+1) f
    f w = case U.length w of
      0 -> 0
      m -> 1 + minimum [w ! x | x <- map (m-) cs, x >= 0]

main = print $ coinchangev 10000000 [1, 5, 10, 25, 100]
```
However, if I change minimum to sum, the runtime in 7.8.4 is unchanged, but the runtime in 7.10.1 drops by a factor of 5! Allocations also decrease by a large factor. So my guess is that something has gone wrong with Call Arity analysis for minimum (and gone very right for sum).
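For reference, a hedged reconstruction of the sum variant being compared (the only change is minimum replaced by sum in f):

```haskell
import Data.Vector.Unboxed as U ((!), constructN, length)

coinchangev :: Int -> [Int] -> Int
coinchangev n cs = v ! n
  where
    v = constructN (n+1) f
    f w = case U.length w of
      0 -> 0
      -- Same program as above, with `sum` in place of `minimum`.
      m -> 1 + sum [w ! x | x <- map (m-) cs, x >= 0]

main = print $ coinchangev 10000000 [1, 5, 10, 25, 100]
```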
Trac metadata

| Trac field | Value |
| --- | --- |
| Version | 7.10.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
In 7.8, both minimum and sum were non-fusing left folds. In 7.10, sum, via foldl, is fusing, so this explains the 5× improvement you observe. minimum does not fuse as easily: it is a foldl1, which treats the first (:) differently from the rest, and it is not clear how to fix that.
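To make the shape difference concrete, here is a rough illustration (made-up names and illustrative definitions, not the actual base source):

```haskell
-- sumLike is a left fold with an explicit seed, a shape that can be
-- re-expressed via foldr and so take part in foldr/build fusion;
-- minimumLike is a foldl1, which has no seed and must treat the first
-- (:) specially.
sumLike :: Num a => [a] -> a
sumLike = foldl (+) 0

minimumLike :: Ord a => [a] -> a
minimumLike = foldl1 min
```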
So the performance difference between minimum and sum in 7.10 can be explained. What needs to be investigated is why 7.10 degraded by 50% over 7.8. I do not expect Call Arity/foldl fusion to play a role here, but I might be wrong.
BTW: One could reasonably expect the compiler to transform minimum (x:xs) into foldl' min x xs, which could then maybe fuse, but that does not seem to be the case.
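A minimal sketch of that hoped-for rewrite, under the assumption that the first element may be used as the seed of a strict left fold (hypothetical helper name, not what GHC currently does):

```haskell
import Data.List (foldl')

-- Hypothetical: what minimum could be rewritten to, treating the head
-- of a non-empty list as the seed of a strict left fold.
strictMinimumSketch :: Ord a => [a] -> a
strictMinimumSketch []     = error "minimum: empty list"
strictMinimumSketch (x:xs) = foldl' min x xs
```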
GHC-7.8’s inlining seems to be a little excessive, but in 7.10 there is certainly a lack of specialization. Maybe some INLINEABLE pragma would help? Not sure...
It looks like more inlining happened in 7.10.1 at the definition site of strictMinimum, and the result was that the unfolding became too large for GHC to want to inline it at use sites.
7.8.4:
```
Considering inlining: Data.List.strictMinimum
  arg infos [ValueArg, NonTrivArg]
  uf arity 2
  interesting continuation CaseCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [30 30] 80 0
  discounted size = -10
  ANSWER = YES

strictMinimum :: GHC.Classes.Ord a -> [a] -> a
  {- Arity: 2, Strictness: <L,1*U(A,A,A,A,A,A,A,1*C(C1(U)))><S,1*U>,
     Unfolding: (\ @ a $dOrd :: GHC.Classes.Ord a ds :: [a] ->
                 case ds of wild {
                   [] -> Data.List.minimum1 @ a
                   : ipv ipv1 ->
                     Data.List.foldl' @ a @ a (GHC.Classes.min @ a $dOrd) ipv ipv1 }) -}
```
7.10.1:
```
Considering inlining: strictMinimum
  arg infos [ValueArg, NonTrivArg]
  interesting continuation CaseCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [30 30] 200 0
  discounted size = 110
  ANSWER = NO

strictMinimum :: Ord a => [a] -> a
  {- Arity: 2, Strictness: <L,1*U(A,A,A,A,A,A,A,1*U)><S,1*U>,
     Unfolding: (\ @ a $dOrd :: Ord a ds :: [a] ->
                 case ds of wild {
                   [] -> minimum1 @ a
                   : ipv ipv1 ->
                     let { k :: a -> a -> a = min @ a $dOrd
                     } in
                     letrec {
                       go :: [a] -> a -> a
                         {- Arity: 2, Strictness: <S,1*U><S,1*U> -}
                         = \ ds1 :: [a] eta :: a ->
                             case ds1 of wild1 {
                               [] -> eta
                               : y ys -> case eta of z { DEFAULT -> go ys (k z y) } }
                     } in
                     go ipv1 ipv }) -}
```
rwbarton changed title from "performance regression involving minimum (and maybe Vector)" to "performance regression involving minimum"
So one solution would be to mark strictMinimum as INLINE, so that its unfolding stays small and both strictMinimum and foldl will be inlined at the use site?
Tried that. The unfolding then mentions foldr instead of the above code, but it is still too large. I was expecting the unfolding to mention foldl1', as I thought the unfolding of something marked INLINE is never changed? Weird.
With the code of strictMinimum written into the definition of minimum directly, instead of the rule rewriting it to strictMinimum, and *not* adding an INLINE pragma to minimum, I get good code in GHC.List, and this is being used here without too much inlining (it inlines the wrapper that distinguishes [] from a non-empty list and unboxes the Int, but then calls the worker GHC.List.$wgo1).
Removing INLINE is important, as otherwise we would end up with this worker in every use of minimum.
Even without INLINE, we have this in the interface:
```
minimum :: Ord a => [a] -> a
  {- Arity: 2, Strictness: <L,1*U(A,A,A,A,A,A,A,1*U)><S,1*U>,
     Unfolding: (\ @ a11 $dOrd :: Ord a11 ds :: [a11] ->
                 case ds of wild {
                   [] -> minimum3 @ a11
                   : ipv ipv1 ->
                     let { k :: a11 -> a11 -> a11 = min @ a11 $dOrd
                     } in
                     letrec {
                       go :: [a11] -> a11 -> a11
                         {- Arity: 2, Strictness: <S,1*U><L,U> -}
                         = \ ds1 :: [a11] eta :: a11 ->
                             case ds1 of wild1 {
                               [] -> eta
                               : y ys -> go ys (k eta y) }
                     } in
                     go ipv1 ipv }) -}
```
So more specialization in some client module seems to be possible. Adding INLINEABLE for reliability does not hurt, though.
This seems to be the right thing to be doing here, but maybe not in general – how can I tell?
I agree that brittleness is bad. Can you stand back and give a description of the brittleness?
Of course, if a function is inlined, it can be specialised for the call site, and without pragmas that decision is indeed dependent on how big the function is. I see no way to avoid that.
But pragmas should not be brittle. Can you show an example in which they seem to be?
In this particular case, I am quite happy with the INLINEABLE/SPECIALIZE solution, and will submit a DR soon.
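A hedged sketch of what such an INLINEABLE/SPECIALIZE combination could look like (stand-in module and names; not the actual GHC.List code or the contents of the mentioned DR):

```haskell
module MinimumSketch (myMinimum) where

import Data.List (foldl')

-- Strict worker; the pragmas below are the point of the sketch.
myMinimum :: Ord a => [a] -> a
myMinimum []     = error "myMinimum: empty list"
myMinimum (x:xs) = foldl' min x xs
{-# INLINABLE myMinimum #-}                   -- keep an unfolding in the interface for client modules
{-# SPECIALIZE myMinimum :: [Int] -> Int #-}  -- guarantee a monomorphic Int version exists
```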
Spoke too soon. Looking at the Core of List, maximum for Int is great (a worker with a strict unboxed Int), but for Integer the strictness analyzer is failing me, and I get this loop:
So it sees that go is strict in its second argument. Why, then, is that argument not evaluated before the recursive call, avoiding this obvious space leak?
Note that max is not inlined (as it is for Int), but the strictness data is there, so that should not make a difference.
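For illustration, here is the accumulation pattern at issue, written as plain Haskell (hypothetical helpers, not the GHC.List code): the lazy variant can build a chain of min thunks in its accumulator, while the strict variant forces it at each step.

```haskell
{-# LANGUAGE BangPatterns #-}

-- lazyGo mirrors the loop shape above: the accumulator is only passed
-- along, so `min acc y` thunks can pile up until the end of the list.
lazyGo :: Ord a => [a] -> a -> a
lazyGo []     acc = acc
lazyGo (y:ys) acc = lazyGo ys (min acc y)

-- strictGo forces the accumulator before recursing, which is the
-- behaviour one hopes the strictness analyser (or CorePrep) delivers.
strictGo :: Ord a => [a] -> a -> a
strictGo []     acc = acc
strictGo (y:ys) acc = let !acc' = min acc y in strictGo ys acc'
```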
Anyway, I guess this discussion is derailing from this particular ticket. I'll look into a combination of RULES that rewrite to strictMinimum, to avoid relying on the strictness analyzer to produce good code.
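A self-contained sketch of that RULES direction, assuming a lazy-looking entry point plus a strict worker (hypothetical names; no claim about the eventual GHC.List patch):

```haskell
module MinimumRule (lazyMinimum) where

import Data.List (foldl')

-- Entry point, held back from inlining early so the rule can fire.
lazyMinimum :: Ord a => [a] -> a
lazyMinimum = foldr1 min
{-# NOINLINE [1] lazyMinimum #-}

-- Strict worker that the rule rewrites to, independent of the
-- strictness analyser.
strictMinimum :: Ord a => [a] -> a
strictMinimum []     = error "lazyMinimum: empty list"
strictMinimum (x:xs) = foldl' min x xs

-- One rule per commonly used element type; Int shown as an example.
{-# RULES "lazyMinimum/Int" lazyMinimum = (strictMinimum :: [Int] -> Int) #-}
```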
**Update:** I looked into this, and it seems that even if -ddump-simpl looks like this, with a space-leaky way of accumulating the argument, in the CorePrep stage the evaluation of cmax is pulled before the call to go, and all is well. I did not know that there are still such things going on in that stage.
Suppose we have a function call f (g x) and f is strict. Then GHC keeps it looking like that so that rewrite rules apply easily. In CorePrep GHC just makes the order of evaluation explicit, by moving to
case (g x) of y -> f y
There is no new analysis; the call always was call-by-value; but after CorePrep that fact is 100% explicit.
I believe that the conclusion here is that GHC is behaving perfectly predictably; but library authors need to take a little care with pragmas and rules if they want to get predictable fusion. That's what your email in ticket:10788#comment:104874 is about, if I read it right. That is, you are not seeking any change to GHC. Correct?