Define reducing operations in Data.Foldable in terms of foldl' instead of foldMap or foldl1
This is a performance issue for the
base library. The following operations in
Data.Foldable take O(n) amount of space and non-optimal amount of time, which makes them of little use in practice. They can lead to performance issues that makes people waste a lot of time in debugging.
Type class operations:
Non type class operations:
Also, not having an
INLINE on these and other operations in
Data.Foldable makes global optimization involving these operations impossible and therefore requires re-definition to achieve that.
Steps to reproduce
Run the benchmarks at this commit (https://github.com/composewell/streamly/tree/b50e1a52850f8fdd1e70797815047cc7333f743d) in the streamly repository using the following command:
cabal run linear -- serially/foldable
If you increase the size of the stream, the amount of memory required would increase linearly, the following command demonstrates that:
cabal run linear -- --stream-size 100000000 serially/foldable
It should run in constant memory and with much better timings for these operations. When these operations are implemented using
foldl' we can see the difference in performance by running the above commands on this commit https://github.com/composewell/streamly/tree/007df998b99bcbea168763fccd79d49164ab04c9 .
The comparison of benchmarks can be seen in this pull request - https://github.com/composewell/streamly/pull/416 . In this PR we could fix the performance of those operations which are part of the
Foldable type class, but we cannot fix
maximumBy because they are not in the type class. However, they can also be fixed generically if we use
foldl' instead of
Our definitions of the type class operations can be found here: https://github.com/composewell/streamly/blob/04f84d34598b8d3f125b4919f51afb4b233f281e/src/Streamly/Internal/Data/Stream/Instances.hs#L138
Similar issues seem to have been discussed in ghc ticket #10830 (closed) specifically in this comment https://gitlab.haskell.org//ghc/ghc/issues/10830#note_129843.
Is there a good reason to use
foldl1 for these operations instead of using
I understand that some of these operations can terminate early in some cases e.g.
product can terminate if it becomes 0. However, those cases cannot be used as an excuse to penalize all other cases heavily. I think, in practice, it makes better sense to implement these using a strict left fold by default and if early termination cases are important to someone they can use custom definitions.
At the very least, the
minimumBy can be replaced with
- GHC version used: 8.8.2
- Operating System: Mac OS
- System Architecture: x86_64