Unsatisfactory fusion result on simple example
I was interested in reproducing the fusion example of unlines given in the short cut fusion paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/deforestation-short-cut.pdf.
It gives this Haskell definition:
unlines ls = concat (map (\l -> l ++ ['\n']) ls)
which is then through build/foldr rewrites and some other simplications fused to:
unlines ls = h ls
where h [] = []
h (l:ls) = g l
where g [] = '\n' : h ls
g (x:xs) = x : g xs
which appears to be the most optimal definition. However, if I compile with either GHC 8.7.10 and 9.2.2 I get the following optimized Core (prettified):
newline_chr :: Char
newline_chr = C# '\n'
newline_str :: [Char]
newline_str = : newline_chr []
unlines_impl :: [[Char]] -> [Char]
unlines_impl ds = case ds of
{ : y ys -> ++
(++ y newline_str)
(unlines_impl ys)
[] -> []
}
unlines :: [String] -> String
unlines ls = unlines_impl ls
Which as far I can tell is ~ O(2n) instead of O(n). I benchmarked it on a few dozen sentences of lorem ipsum which confirmed my suspicion (albeit less extreme):
benchmarking my_unlines
time 127.9 μs (125.0 μs .. 130.7 μs)
0.997 R² (0.996 R² .. 0.998 R²)
mean 126.6 μs (124.6 μs .. 128.4 μs)
std dev 6.532 μs (5.742 μs .. 7.504 μs)
variance introduced by outliers: 53% (severely inflated)
benchmarking prelude_unlines
time 80.23 μs (79.87 μs .. 80.53 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 79.70 μs (79.05 μs .. 80.13 μs)
std dev 1.858 μs (1.088 μs .. 2.962 μs)
variance introduced by outliers: 20% (moderately inflated)
So as I see it this is performance regression caused by ++
not properly fusing. Interestingly, the rewrite rule that changes ++
to a foldr/build representation does fire but in a later simplifier pass is written back to ++
.