Fusion for literal lists
Should we fuse foldr k z [a,b,c]? Doing so can create a lot of code bloat: instead of a compact literal list we get a chain of calls to k. But of course it can be profitable too.
GHC.Base has this:
-- The foldr/cons rule looks nice, but it can give disastrously
-- bloated code when compiling
-- array (a,b) [(1,2), (2,2), (3,2), ...very long list... ]
-- i.e. when there are very very long literal lists
-- So I've disabled it for now. We could have special cases
-- for short lists, I suppose.
-- "foldr/cons" forall k z x xs. foldr k z (x:xs) = k x (foldr k z xs)
That is, the decision there is "no".
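To make the trade-off concrete, here is a hand-expansion of what that (disabled) foldr/cons rule would do to a literal list. This is a minimal sketch in ordinary Haskell, not GHC internals; the local foldr just mirrors the standard definition so the expansion steps are visible.

```haskell
-- Local copy of the standard foldr, so we can refer to its equations.
import Prelude hiding (foldr)

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr k z []     = z
foldr k z (x:xs) = k x (foldr k z xs)

-- Applying the "foldr/cons" rule repeatedly to a literal list:
--   foldr k z [1,2,3]
--     = k 1 (foldr k z [2,3])
--     = k 1 (k 2 (foldr k z [3]))
--     = k 1 (k 2 (k 3 z))
-- i.e. one inlined call to k per element at the use site, which is
-- exactly the bloat the GHC.Base comment worries about for long lists.

main :: IO ()
main = print (foldr (+) 0 [1,2,3 :: Int])  -- prints 6, via 1 + (2 + (3 + 0))
```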
But GHC.HsToCore.Expr has this:
Note [Desugaring explicit lists]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit lists are desugared in a cleverer way to prevent some
fruitless allocations. Essentially, whenever we see a list literal
[x_1, ..., x_n] we generate the corresponding expression in terms of
build:
Explicit lists (literals) are desugared to allow build/foldr fusion when
beneficial. This is a bit of a trade-off,
* build/foldr fusion can generate far larger code than the corresponding
cons-chain (e.g. see #11707)
* even when it doesn't produce more code, build can still fail to fuse,
requiring that the simplifier do more work to bring the expression
back into cons-chain form; this costs compile time
* when it works, fusion can be a significant win. Allocations are reduced
by up to 25% in some nofib programs. Specifically,
        Program           Size    Allocs   Runtime  CompTime
        rewrite          +0.0%    -26.3%      0.02     -1.8%
        ansi             -0.3%    -13.8%      0.00     +0.0%
        lift             +0.0%     -8.7%      0.00     -2.3%
At the moment we use a simple heuristic to determine whether build will be
fruitful: for small lists we assume the benefits of fusion will be worthwhile;
for long lists we assume that the benefits will be outweighed by the cost of
code duplication. This magic length threshold is @maxBuildLength@. Also, fusion
won't work at all if rewrite rules are disabled, so we don't use the build-based
desugaring in this case.
We used to have a more complex heuristic which would try to break the list into
"static" and "dynamic" parts and only build-desugar the dynamic part.
Unfortunately, determining "static-ness" reliably is a bit tricky and the
heuristic at times produced surprising behavior (see #11710) so it was dropped.
So the decision there is "yes", at least for lists shorter than some threshold, which is currently set to 32.
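For reference, the build-based desugaring the note describes can be sketched with the real GHC.Exts.build; the desugarer emits (roughly) a right-hand side like the one below for a short literal list. The function name desugared is mine, purely for illustration.

```haskell
-- Sketch of the build-based desugaring of a three-element literal list.
import GHC.Exts (build)

-- What [x, y, z] desugars to, approximately: build applied to a function
-- that replays the list through abstract cons (c) and nil (n).
desugared :: a -> a -> a -> [a]
desugared x y z = build (\c n -> c x (c y (c z n)))

main :: IO ()
main = print (foldr (+) 0 (desugared 1 2 3 :: [Int]))  -- prints 6
```

When such a build meets a good consumer like foldr, the foldr/build rule fires and the intermediate list is never allocated; when it doesn't fuse, the simplifier has to inline build to recover the plain cons-chain, which is the compile-time cost the note mentions.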
This all seems a bit unsatisfactory. I tried reducing the threshold to 1, and got some big swings in nofib:
# bytes allocated
+---------------------------+-------------+
|                           | s.tsv (rel) |
+===========================+=============+
| imaginary/integrate       |    +156.54% |
| real/cacheprof            |      +3.71% |
| real/fem                  |      +5.82% |
| real/fluid                |      +1.13% |
| real/lift                 |      +3.43% |
| spectral/constraints      |      +1.58% |
| spectral/dom-lt           |      +1.80% |
| spectral/hartel/nucleic2  |      +1.99% |
| spectral/knights          |      -1.05% |
| spectral/rewrite          |     +89.96% |
+===========================+=============+
| geom mean                 |             |
+---------------------------+-------------+
This ticket is just to record the breadcrumbs. I tripped over this when doing perf-debugging on real/smallpt, where there was a small negative effect from the unrolling; cf. #22157.