Unsatisfactory fusion result on simple example

I was interested in reproducing the fusion example of unlines given in the short cut fusion paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/deforestation-short-cut.pdf.

It gives this Haskell definition:

unlines ls = concat (map (\l -> l ++ ['\n']) ls)

which is then through build/foldr rewrites and some other simplications fused to:

unlines ls = h ls
    where h [] = []
          h (l:ls) = g l
              where g [] = '\n' : h ls
              g (x:xs) = x : g xs

which appears to be the most optimal definition. However, if I compile with either GHC 8.7.10 and 9.2.2 I get the following optimized Core (prettified):

newline_chr :: Char
newline_chr = C# '\n'

newline_str :: [Char]
newline_str = : newline_chr []

unlines_impl :: [[Char]] -> [Char]
unlines_impl ds = case ds of
  { : y ys -> ++
                (++ y newline_str)
                (unlines_impl ys)
    [] -> []
  }

unlines :: [String] -> String
unlines ls = unlines_impl ls

Which as far I can tell is ~ O(2n) instead of O(n). I benchmarked it on a few dozen sentences of lorem ipsum which confirmed my suspicion (albeit less extreme):

benchmarking my_unlines
time                 127.9 μs   (125.0 μs .. 130.7 μs)
                     0.997 R²   (0.996 R² .. 0.998 R²)
mean                 126.6 μs   (124.6 μs .. 128.4 μs)
std dev              6.532 μs   (5.742 μs .. 7.504 μs)
variance introduced by outliers: 53% (severely inflated)

benchmarking prelude_unlines
time                 80.23 μs   (79.87 μs .. 80.53 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 79.70 μs   (79.05 μs .. 80.13 μs)
std dev              1.858 μs   (1.088 μs .. 2.962 μs)
variance introduced by outliers: 20% (moderately inflated)

So as I see it this is performance regression caused by ++ not properly fusing. Interestingly, the rewrite rule that changes ++ to a foldr/build representation does fire but in a later simplifier pass is written back to ++.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information