We like to think that do { x <- xs; y <- f x; return (g y) } is the same as [g y | x <- xs, y <- f x], but the former does not fuse nearly as well. Let's fix that.
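For concreteness, here is a small sketch of the two forms (f, g, and the via* names are just illustrative), plus the rough foldr/build shape the comprehension can be desugared into, which is what lets it fuse with good producers and consumers:

```haskell
import GHC.Exts (build)

-- Illustrative stand-ins, only to make the two forms concrete.
f :: Int -> [Int]
f x = [x, x + 1]

g :: Int -> Int
g = (* 2)

-- The do form desugars through (>>=) for the list Monad.
viaDo :: [Int] -> [Int]
viaDo xs = do { x <- xs; y <- f x; return (g y) }

-- The comprehension form; the desugarer can turn this into roughly
-- the foldr/build nest below, so it participates in fusion.
viaComp :: [Int] -> [Int]
viaComp xs = [g y | x <- xs, y <- f x]

-- Approximate fusible form of the comprehension (a sketch, not the
-- exact output of GHC's desugarer).
viaBuild :: [Int] -> [Int]
viaBuild xs =
  build (\c n -> foldr (\x b -> foldr (\y b' -> c (g y) b') b (f x)) n xs)
```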
I have no issues with improving fusion for do-sugared lists; the question is whether this change can be achieved with just a library fix or whether you need to go into the compiler.
There is a lot of crazy code down in the list comprehension module that is used to make lists fast there. It isn't clear to me how we'd lean on it.
The easy way is to use the list comprehension desugaring, as in the linked code review. The hard way (because it forces us to hide it in some imports) is to move concatMap from GHC.List into GHC.Base and define xs >>= f = concatMap f xs, etc.
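To make the "hard way" a bit more concrete, here is a rough sketch of what the moved definitions could look like; concatMap' and bind are stand-in names, not the actual GHC.Base code, and the rewrite rule follows the usual foldr/build pattern:

```haskell
import GHC.Exts (build)

-- Stand-in for a concatMap living in GHC.Base: an ordinary foldr-based
-- list function when no fusion happens.
concatMap' :: (a -> [b]) -> [a] -> [b]
concatMap' f = foldr ((++) . f) []
{-# NOINLINE [1] concatMap' #-}

-- Expose concatMap' as a good producer in the foldr/build framework so
-- that foldr-based consumers can fuse with it.
{-# RULES
"concatMap'" forall f xs.
    concatMap' f xs = build (\c n -> foldr (\x b -> foldr c b (f x)) n xs)
  #-}

-- The proposed list bind, written in terms of concatMap'.
bind :: [a] -> (a -> [b]) -> [b]
bind xs f = concatMap' f xs
```

With a bind like that, the do-sugared pipeline goes through the same fusible concatMap shape that the comprehension already enjoys.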
This showed some minor performance improvements and one huge one, as you already noticed: cryptarithm2’s allocation went down by more than 50%!
(I just spent a few minutes looking for list monad operations in that benchmark until I noticed that your commit also had an unrelated(?) change to mapM.)
The change also improves cryptarithm2 without the mapM change, but not as much, so something else must have been contributing as well.
It would be nice to understand why the mapM change did anything at all. That's an enormous difference. I wonder how much real code in the wild has similar properties (is it a lack of INLINEing?).
I believe it's either a lack of inlining or an unfortunate effect of full laziness; I haven't tried to dig into that particular case very deeply. Short-cut fusion is great stuff, but it's on the finicky side; writing library functions so they don't themselves rely on it seems to be a good idea in many cases. The inliner will be less likely to over-estimate their cost, I believe.
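As an illustration of that last point (these are illustrative definitions, not the actual ones in base): a monadic map can be written as a foldr, which makes it a good consumer but ties its performance to the fusion rules and the inliner, or as plain recursion that never depends on fusion at all:

```haskell
-- Foldr-based version: a good consumer, but if fusion does not fire it
-- still builds a chain of closures and leans on the inliner.
mapMViaFoldr :: Monad m => (a -> m b) -> [a] -> m [b]
mapMViaFoldr f = foldr step (return [])
  where
    step x rest = do
      y  <- f x
      ys <- rest
      return (y : ys)

-- Direct recursion: never relies on fusion, and presents a small,
-- predictable body for the inliner to estimate.
mapMDirect :: Monad m => (a -> m b) -> [a] -> m [b]
mapMDirect f = go
  where
    go []     = return []
    go (x:xs) = do
      y  <- f x
      ys <- go xs
      return (y : ys)
```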