# GHC issues

## [#9545](https://gitlab.haskell.org/ghc/ghc/-/issues/9545): Evaluate Takano Akio's foldrW/buildW fusion framework as a possible replacement for foldr/build

*Reported by David Feuer; last updated 2019-07-07*

Takano Akio's [worker-wrapper fusion](https://github.com/takano-akio/ww-fusion) modifies `foldr/build` fairly conservatively to allow safe fusion of `foldl` and friends without relying on arity analysis. This advantage is important for two reasons that I know of:
1. As discussed in Joachim Breitner's original paper, the current arity analysis is unable to infer arity correctly when facing certain forms of recursion.
2. The current arity analysis, and probably *any* practical arity analysis, depends on inlining to a degree that can sometimes be unhealthy. I would love to make functions like `hPutStr` fuse, but I suspect doing so safely at present would cause a code explosion: the function being folded is too large for us to want to inline it just to make it available for arity analysis following fusion.
I don't understand it completely yet, but it looks like `foldrW/buildW` can eliminate a lot of this uncertainty. Furthermore, it appears that functions currently written using `foldr` in a "right-handed" way can remain unchanged, using an implementation of `foldr` in terms of `foldrW`. Only "left-handed" folds will need to be rewritten to take advantage of this framework, along with all the `build`s.
That said, `foldrW/buildW` probably has its weaknesses too. The basic fusion rule looks like this.
```hs
{-# RULES
"foldrW/buildW" forall
f z
(i :: Wrap f b)
(g :: forall c g .
(Wrap g c)
-> (a -> c -> c)
-> c
-> c)
.
foldrW i f z (buildW g) = g i f z
#-}
```
If `g` and `i` are not inlined sufficiently to merge with each other, I imagine this could produce worse results than `foldr/build` when the latter works properly. The only way to really get a feel for performance would be to carefully modify everything to work as well as it can with this framework and see how the results compare.
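To make the rule above concrete, here is a minimal, self-contained sketch of the types involved. These definitions are illustrative guesses at the shape of the framework (the `Simple` and `isimple` names are hypothetical); the real definitions live in the ww-fusion repository.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Wrap packages conversions between the fold's result type b and a
-- worker representation f e. (Sketch only; names are illustrative.)
data Wrap f b = Wrap (forall e. f e -> e -> b)   -- wrap
                     (forall e. (e -> b) -> f e) -- unwrap

-- The trivial wrapper: the worker is just a function into b.
newtype Simple b e = Simple (e -> b)

isimple :: Wrap (Simple b) b
isimple = Wrap (\(Simple s) -> s) Simple

foldrW :: Wrap f b -> (a -> b -> b) -> b -> [a] -> b
foldrW (Wrap wr un) f z = wr (un go)
  where
    go []     = z
    go (x:xs) = f x (wr (un go) xs)

buildW :: (forall c f. Wrap f c -> (a -> c -> c) -> c -> c) -> [a]
buildW g = g isimple (:) []
```

With definitions of this shape, `foldrW i f z (buildW g)` and `g i f z` compute the same fold, which is what the rewrite rule exploits; the performance question is whether `g` and `i` fuse after the rule fires.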
<details><summary>Trac metadata</summary>

| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.9 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | libraries/base |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | ekmett, hvr |
| Operating system | |
| Architecture | |
</details>

## [#19881](https://gitlab.haskell.org/ghc/ghc/-/issues/19881): Static arguments

*Reported by Jaro Reinders; last updated 2021-05-21*

For this issue I will use the `foldr` function (on lists) as an example.
It can be naively defined as follows.
```haskell
{-# LANGUAGE LambdaCase #-}
module Foldr where
import Prelude hiding (foldr)
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr k z = \case
  [] -> z
  (x:xs) -> x `k` foldr k z xs
```
However, the `base` library chooses to define it with a local recursive helper function.
```haskell
{-# INLINE [0] foldr #-}
foldr k z = go
  where
    go []     = z
    go (y:ys) = y `k` go ys
```
The reason for this might not be obvious to everyone, especially less advanced users.
I suspect the main reason is that GHC is able to optimize this form more.
An obvious difference is that the recursive `go` function now takes two fewer arguments, which should make the recursion faster, but I would expect such an optimization to be easy for the compiler to do automatically.
No, what really gives a big performance improvement is that the `foldr` function is now non-recursive and hence can be inlined.
Inlining this function basically means specializing this function to known `k` and `z` arguments.
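To illustrate what inlining buys at a call site, here is a sketch (`sumList` is a hypothetical example function, not something from `base`):

```haskell
-- Before inlining: foldr's non-recursive wrapper is applied to
-- known arguments at the call site.
sumList :: [Int] -> Int
sumList = foldr (+) 0

-- After inlining the wrapper, GHC is left with (roughly) a copy of go
-- specialized to (+) and 0, with no higher-order calls remaining:
sumList' :: [Int] -> Int
sumList' = go
  where
    go []     = 0
    go (y:ys) = y + go ys
```

The second form is what further optimizations (strictness analysis, unboxing) can then work on.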
My first question is: **why doesn't GHC transform the naive version to the version that optimizes better?**
An optimization rule could be: for any recursive function (perhaps any `INLINEABLE` recursive function), identify the arguments that do not change during the recursion (this may be impossible in general, but it could be done purely syntactically) and generate a recursive helper function that takes only the changing arguments.
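The rule described above is essentially a worker/wrapper split over the static arguments (GHC has an off-by-default `-fstatic-argument-transformation` flag along these lines). Applied to a hypothetical naive `map'`, it would produce:

```haskell
-- Naive form: f is passed, unchanged, through every recursive call.
map' :: (a -> b) -> [a] -> [b]
map' f []     = []
map' f (x:xs) = f x : map' f xs

-- After splitting out the static argument: the worker go no longer
-- passes f around, and the non-recursive wrapper can inline at call sites.
mapSAT :: (a -> b) -> [a] -> [b]
mapSAT f = go
  where
    go []     = []
    go (x:xs) = f x : go xs
```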
But there is another way to optimize this code, one that relies on specialization instead of inlining.
The `foldr` function can be defined as follows.
```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
module FoldrC where
class FoldrAlg a b | b -> a where
  k :: a -> b -> b
  z :: b

{-# INLINEABLE foldrC #-}
foldrC :: FoldrAlg a b => [a] -> b
foldrC [] = z
foldrC (x:xs) = x `k` foldrC xs
```
Now calling the `foldrC` function requires defining an instance of this `FoldrAlg` type class, but that is very mechanical.
Here is an example.
```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import FoldrC

newtype FoldrSum a = FoldrSum { unFoldrSum :: a }

instance Num a => FoldrAlg a (FoldrSum a) where
  z = FoldrSum 0
  x `k` FoldrSum xs = FoldrSum (x + xs)

main = print (unFoldrSum (foldrC [1,2,3] :: FoldrSum Int))
```
It is not obvious to me how the functional dependency of that `FoldrAlg` class could be inferred, but the class can be split to avoid functional dependencies.
```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
module FoldrClass where

class FoldrZ b where
  z :: b

class FoldrK a b where
  k :: a -> b -> b

{-# INLINEABLE foldrC #-}
foldrC :: (FoldrK a b, FoldrZ b) => [a] -> b
foldrC [] = z
foldrC (x:xs) = x `k` foldrC xs
```
Here is an example of using it, note the extra type applications:
```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, TypeApplications #-}
import FoldrClass
newtype FoldrSum a = FoldrSum { unFoldrSum :: a }

instance Num a => FoldrZ (FoldrSum a) where
  z = FoldrSum 0

instance Num a => FoldrK a (FoldrSum a) where
  x `k` FoldrSum xs = FoldrSum (x + xs)

main = print (unFoldrSum @Int (foldrC @Int [1,2,3]))
```
The type arguments should not be a problem if this were implemented as a Core optimization.
And I believe the creation of the `FoldrSum` type and these instances could be simplified in Core too.
So, I feel like this optimization could also be done automatically by GHC.
Maybe the `SPECIALIZE` pragma could even be extended to allow something like:
```haskell
{-# SPECIALIZE foldr (+) 0 :: Num a => [a] -> a #-}
```
This would automatically create that type class and the instances.
This approach seems preferable to inlining because the specialized version can be reused at every call site that passes the same arguments.
That requires deciding equality of arbitrary values, including functions, which is impossible in general; but again, a simple analysis, purely syntactic if necessary, could be used to approximate function equality.
Edit: this last note about the `SPECIALIZE` pragma seems to already be tracked by https://gitlab.haskell.org/ghc/ghc/-/issues/5059.