Skip to content

join points produce bad code for stream fusion

Below, I am generating to stream fusion streams xs and ys. Both parameterized on k l. The two streams are then concatenated. Finally I do a strict left fold.

This example needs the 'vector' package but nothing else.

module Test where

import Data.Vector.Fusion.Stream.Monadic as S


foo :: Int -> Int -> IO Int
foo = \i j -> S.foldl' (+) 0 $ xs i j S.++ ys i j
  where xs k l = S.enumFromStepN k l 2
        ys k l = S.enumFromStepN k l 3
        {-# Inline xs #-}
        {-# Inline ys #-}
{-# Inline foo #-}

With ghc-8.0.1 I get nice core:

$wfoo_r1Ai
$wfoo_r1Ai =
  \ ww_s1q5 ww1_s1q9 w_s1q2 ->
    letrec {
      $s$wfoldlM'_loop_s1xc
      $s$wfoldlM'_loop_s1xc =
        \ sc_s1x7 sc1_s1x5 sc2_s1x6 sc3_s1x4 ->
          case tagToEnum# (># sc2_s1x6 0#) of _ {
            False -> (# sc_s1x7, I# sc3_s1x4 #);
            True ->
              $s$wfoldlM'_loop_s1xc
                sc_s1x7
                (+# sc1_s1x5 ww1_s1q9)
                (-# sc2_s1x6 1#)
                (+# sc3_s1x4 sc1_s1x5)
          }; } in
    letrec {
      $s$wfoldlM'_loop1_s1x3
      $s$wfoldlM'_loop1_s1x3 =
        \ sc_s1x2 sc1_s1x0 sc2_s1x1 sc3_s1wZ ->
          case tagToEnum# (># sc2_s1x1 0#) of _ {
            False -> $s$wfoldlM'_loop_s1xc sc_s1x2 ww_s1q5 3# sc3_s1wZ;
            True ->
              $s$wfoldlM'_loop1_s1x3
                sc_s1x2
                (+# sc1_s1x0 ww1_s1q9)
                (-# sc2_s1x1 1#)
                (+# sc3_s1wZ sc1_s1x0)
          }; } in
    $s$wfoldlM'_loop1_s1x3 w_s1q2 ww_s1q5 2# 0#

Now the same with ghc-8.2-rc1. Here, Stream.++ function is not fully optimized away (Left and Right constructors!). Instead we have a join point that executes either of the two parts (xs or ys) based on a case w2_s1U2 of {Left -> ; Right ->}.

$wfoo_r23R
$wfoo_r23R
  = \ ww_s1Ue ww1_s1Ui w_s1Ub ->
      let {
        x1_a1tj
        x1_a1tj = I# ww_s1Ue } in
      let {
        tb_a1wC
        tb_a1wC = (x1_a1tj, lvl1_r23Q) } in
      let {
        lvl2_s1Yh
        lvl2_s1Yh = Right tb_a1wC } in
      joinrec {
        $wfoldlM'_loop_s1U8
        $wfoldlM'_loop_s1U8 w1_s1U0 ww2_s1U6 w2_s1U2 w3_s1U3
          = case w1_s1U0 of { __DEFAULT ->
            case w2_s1U2 of {
              Left sa_a1yP ->
                case sa_a1yP of { (w4_a1zr, m1_a1zs) ->
                case m1_a1zs of { I# x2_a1zw ->
                case tagToEnum# (># x2_a1zw 0#) of {
                  False -> jump $wfoldlM'_loop_s1U8 SPEC ww2_s1U6 lvl2_s1Yh w3_s1U3;
                  True ->
                    case w4_a1zr of { I# y_a1xT ->
                    jump $wfoldlM'_loop_s1U8
                      SPEC
                      (+# ww2_s1U6 y_a1xT)
                      (Left (I# (+# y_a1xT ww1_s1Ui), I# (-# x2_a1zw 1#)))
                      w3_s1U3
                    }
                }
                }
                };
              Right sb_a1z3 ->
                case sb_a1z3 of { (w4_a1zr, m1_a1zs) ->
                case m1_a1zs of { I# x2_a1zw ->
                case tagToEnum# (># x2_a1zw 0#) of {
                  False -> (# w3_s1U3, I# ww2_s1U6 #);
                  True ->
                    case w4_a1zr of { I# y_a1xT ->
                    jump $wfoldlM'_loop_s1U8
                      SPEC
                      (+# ww2_s1U6 y_a1xT)
                      (Right (I# (+# y_a1xT ww1_s1Ui), I# (-# x2_a1zw 1#)))
                      w3_s1U3
                    }
                }
                }
                }
            }
            }; } in
      jump $wfoldlM'_loop_s1U8 SPEC 0# (Left (x1_a1tj, lvl_r23P)) w_s1Ub

For my stream-fusion heavy code, this yields a slowdown of approximately x4 (10 seconds with ghc-8.2-rc1, 2.5 seconds with ghc-8.0.1).

ghc-options:

-O2

-ddump-to-file

-ddump-simpl

-dsuppress-all

-dshow-passes

Edited by choenerzs
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information