Fusion for literal lists
Should we fuse foldr k z [a,b,c]? Doing so can create a lot of code bloat: instead of a compact literal list we get a chain of calls to k. But of course it can be profitable too.
GHC.Base has this:
-- The foldr/cons rule looks nice, but it can give disastrously
-- bloated code when compiling
-- array (a,b) [(1,2), (2,2), (3,2), ...very long list... ]
-- i.e. when there are very very long literal lists
-- So I've disabled it for now. We could have special cases
-- for short lists, I suppose.
-- "foldr/cons" forall k z x xs. foldr k z (x:xs) = k x (foldr k z xs)
That is, the decision there is "no".
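To make the trade-off concrete, here is a hand-expansion of what that (disabled) foldr/cons rule would do to a literal list. This is a minimal sketch in ordinary Haskell, not GHC internals; the local foldr just mirrors the standard definition so the expansion steps are visible.

```haskell
-- Local copy of the standard foldr, so we can refer to its equations.
import Prelude hiding (foldr)

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr k z []     = z
foldr k z (x:xs) = k x (foldr k z xs)

-- Applying the "foldr/cons" rule repeatedly to a literal list:
--   foldr k z [1,2,3]
--     = k 1 (foldr k z [2,3])
--     = k 1 (k 2 (foldr k z [3]))
--     = k 1 (k 2 (k 3 z))
-- i.e. one inlined call to k per element at the use site, which is
-- exactly the bloat the GHC.Base comment worries about for long lists.

main :: IO ()
main = print (foldr (+) 0 [1,2,3 :: Int])  -- prints 6, via 1 + (2 + (3 + 0))
```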
But GHC.HsToCore.Expr has this:
Note [Desugaring explicit lists]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit lists are desugared in a cleverer way to prevent some
fruitless allocations. Essentially, whenever we see a list literal
[x_1, ..., x_n] we generate the corresponding expression in terms of
build:
Explicit lists (literals) are desugared to allow build/foldr fusion when
beneficial. This is a bit of a trade-off,
* build/foldr fusion can generate far larger code than the corresponding
cons-chain (e.g. see #11707)
* even when it doesn't produce more code, build can still fail to fuse,
requiring that the simplifier do more work to bring the expression
back into cons-chain form; this costs compile time
* when it works, fusion can be a significant win. Allocations are reduced
by up to 25% in some nofib programs. Specifically,
        Program           Size    Allocs   Runtime  CompTime
        rewrite          +0.0%    -26.3%      0.02     -1.8%
        ansi             -0.3%    -13.8%      0.00     +0.0%
        lift             +0.0%     -8.7%      0.00     -2.3%
At the moment we use a simple heuristic to determine whether build will be
fruitful: for small lists we assume the benefits of fusion will be worthwhile;
for long lists we assume that the benefits will be outweighed by the cost of
code duplication. This magic length threshold is @maxBuildLength@. Also, fusion
won't work at all if rewrite rules are disabled, so we don't use the build-based
desugaring in this case.
We used to have a more complex heuristic which would try to break the list into
"static" and "dynamic" parts and only build-desugar the dynamic part.
Unfortunately, determining "static-ness" reliably is a bit tricky and the
heuristic at times produced surprising behavior (see #11710) so it was dropped.
So the decision there is "yes", at least for lists shorter than some threshold, which is currently set to 32.
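For reference, the build-based desugaring the note describes can be sketched with the real GHC.Exts.build; the desugarer emits (roughly) a right-hand side like the one below for a short literal list. The function name desugared is mine, purely for illustration.

```haskell
-- Sketch of the build-based desugaring of a three-element literal list.
import GHC.Exts (build)

-- What [x, y, z] desugars to, approximately: build applied to a function
-- that replays the list through abstract cons (c) and nil (n).
desugared :: a -> a -> a -> [a]
desugared x y z = build (\c n -> c x (c y (c z n)))

main :: IO ()
main = print (foldr (+) 0 (desugared 1 2 3 :: [Int]))  -- prints 6
```

When such a build meets a good consumer like foldr, the foldr/build rule fires and the intermediate list is never allocated; when it doesn't fuse, the simplifier has to inline build to recover the plain cons-chain, which is the compile-time cost the note mentions.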
This all seems a bit unsatisfactory. I tried reducing the threshold to 1, and got some big swings in nofib:
# bytes allocated
+---------------------------+-------------+
|                           | s.tsv (rel) |
+===========================+=============+
| imaginary/integrate       |    +156.54% |
| real/cacheprof            |      +3.71% |
| real/fem                  |      +5.82% |
| real/fluid                |      +1.13% |
| real/lift                 |      +3.43% |
| spectral/constraints      |      +1.58% |
| spectral/dom-lt           |      +1.80% |
| spectral/hartel/nucleic2  |      +1.99% |
| spectral/knights          |      -1.05% |
| spectral/rewrite          |     +89.96% |
+===========================+=============+
| geom mean                 |             |
+---------------------------+-------------+
This ticket is just to record the breadcrumbs. I tripped over this when doing perf-debugging on real/smallpt, where there was a small negative effect from the unrolling; cf. #22157.