Skip to content

GitLab

  • Menu
Projects Groups Snippets
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
  • GHC GHC
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 4,869
    • Issues 4,869
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
  • Merge requests 453
    • Merge requests 453
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Releases
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • Glasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #17592
Closed
Open
Created Dec 19, 2019 by Sebastian Graf@sgraf812Developer

SpecConstr rule base woes

Consider the following stream fusion approach harnessing the type class specialiser (and also constructor-pattern specialisation for subsequently specialising loop):

{-# LANGUAGE TypeFamilies, FlexibleContexts, BangPatterns #-}

data Step iter
  = Yield !iter !(Item iter)
  | Skip !iter
  | Done

class Iterator iter where
  type Item iter
  next :: iter -> Step iter

data Singleton a = Empty' | Singleton !a
singletonS :: a -> Singleton a
singletonS = Singleton
instance Iterator (Singleton a) where
  type Item (Singleton a) = a
  next Empty' = Done
  next (Singleton a) = Yield Empty' a

data EnumFromTo a = EnumFromTo !a !a
enumFromToS :: a -> a -> EnumFromTo a
enumFromToS = EnumFromTo
instance (Ord a, Num a) => Iterator (EnumFromTo a) where
  type Item (EnumFromTo a) = a
  next (EnumFromTo i high)
    | i > high = Done
    | otherwise = Yield (EnumFromTo (i + 1) high) i

sumS :: (Num (Item iter), Iterator iter) => iter -> Item iter
sumS = loop 0
  where
    loop !total iter1 =
      case next iter1 of
        Done -> total
        Skip iter2 -> loop total iter2
        Yield iter2 x -> loop (total + x) iter2
    -- {-# NOINLINE[99] loop #-} -- comment in for 10x speedup
{-# INLINE sumS #-}

data ConcatMap i o = ConcatMap !(Maybe o) !(Item i -> o) !i
concatMapS :: (Item i -> o) -> i -> ConcatMap i o
concatMapS = ConcatMap Nothing
instance (Iterator i, Iterator o) => Iterator (ConcatMap i o) where
  type Item (ConcatMap i o) = Item o
  next (ConcatMap Nothing f i) = case next i of
    Skip i' -> Skip (ConcatMap Nothing f i')
    Yield i' x -> Skip (ConcatMap (Just (f x)) f i')
    Done -> Done
  next (ConcatMap (Just o) f i) = case next o of
    Skip o' -> Skip (ConcatMap (Just o') f i)
    Yield o' x -> Yield (ConcatMap (Just o') f i) x
    Done -> Skip (ConcatMap Nothing f i)

ex1 :: Int -> Int
ex1
  = sumS
  . concatMapS (\a -> singletonS (a*a))
  . enumFromToS 1
{-# NOINLINE ex1 #-}

main = print (ex1 99999999)

If the {-# NOINLINE[99] loop #-} pragma is commented in, ex1 will compile down to this perfect bit of code:

Main.main_$s$wloop
  = \ (sc_s4GJ :: GHC.Prim.Int#)
      (sc1_s4GI :: GHC.Prim.Int#)
      (sc2_s4GH :: GHC.Prim.Int#) ->
      case GHC.Prim.># sc1_s4GI sc_s4GJ of {
        __DEFAULT ->
          Main.main_$s$wloop
            sc_s4GJ
            (GHC.Prim.+# sc1_s4GI 1#)
            (GHC.Prim.+# sc2_s4GH (GHC.Prim.*# sc1_s4GI sc1_s4GI));
        1# -> sc2_s4GH
      }

Beautiful!

Without the NOINLINE pragma, not so much, though. Here's the Core:

Main.main_$s$wloop [Occ=LoopBreaker]
  :: GHC.Prim.Int#
     -> GHC.Prim.Int#
     -> (Item (EnumFromTo Int) -> Singleton Int)
     -> Singleton Int
     -> GHC.Prim.Int#
     -> GHC.Prim.Int#
[GblId,
 Arity=5,
 Caf=NoCafRefs,
 Str=<S,U><S,U><L,C(U)><S,1*U><S,U>,
 Unf=OtherCon []]
Main.main_$s$wloop
  = \ (sc_s4FQ :: GHC.Prim.Int#)
      (sc1_s4FP :: GHC.Prim.Int#)
      (sc2_s4FO :: Item (EnumFromTo Int) -> Singleton Int)
      (sc3_s4FN :: Singleton Int)
      (sc4_s4FM :: GHC.Prim.Int#) ->
      case sc3_s4FN of {
        Empty' ->
          case GHC.Prim.># sc1_s4FP sc_s4FQ of {
            __DEFAULT ->
              Main.main_$s$wloop
                sc_s4FQ
                (GHC.Prim.+# sc1_s4FP 1#)
                sc2_s4FO
                (sc2_s4FO
                   ((GHC.Types.I# sc1_s4FP)
                    `cast` (Sub (Sym (Main.D:R:ItemEnumFromTo[0] <Int>_N))
                            :: Int ~R# Item (EnumFromTo Int))))
                sc4_s4FM;
            1# -> sc4_s4FM
          };
        Singleton a_a1z4 ->
          case a_a1z4 of { GHC.Types.I# y_s4Dg ->
          Main.main_$s$wloop
            sc_s4FQ
            sc1_s4FP
            sc2_s4FO
            (Main.Empty' @ Int)
            (GHC.Prim.+# sc4_s4FM y_s4Dg)
          }
      }

Long story short, constructor specialisation failed to specialise $wloop for some higher-order arguments (the function of type Item (EnumFromTo Int) -> Singleton Int in particular). That results in a slow-down of a factor of 10 and it goes from constant space to allocating huge amounts of memory (still constant residency, though).

Why?

Edit: Don't bother reading this section yet, it's incoherent and inaccurate. Suffice it to say, the conclusion is accurate: Even though we specialise for the stronger call pattern, we don't rewrite to it because the rewrite rule is not attached to the strictly weaker specialisation.

Looking at the difference in -dverbose-core2core, the first real divergence is in the output from desugaring after optimisation, where the inner loop is specialised for the type arguments and dictionaries of the outer sumS function in the version without NOINLINE.

Then after the initial simplifier phase we get the first digression because loop was specialised for the particular types it was called at, whereas it wasn't in the NOINLINE version (which makes sense).

Now, the critical moment is when loop becomes too large to inline into ex1 in the version without NOINLINE. But ex1 calls loop with quite interesting arguments, those we want to specialise for (the particular ConcatMap datum in particular)! In code:

f :: Int -> Singleton Int -- same function for both
f ...

inner :: Int -> EnumFromToInt Int -- the input stream from which ConcatMap draws elements
inner ...

 -- without NOINLINE
loop (ConcatMap x f inner) = HUGE and recursive
ex1 n = loop (ConcatMap Nothing f (inner n))

 -- with NOINLINE[99], loop gets inlined after the initial phase
ex1 n = joinrec loop' (ConcatMap x f (inner n)) = <HUGE and recursive> in loop' (ConcatMap Nothing f (inner n))

Then WW kicks in. loop (and loop') is deeply strict in the ConcatMap we pass it, but it refuses to 'unpack' (i.e. specialise) for ConcatMap's f. Note also that it will WW loop, ex1 and loop':

-- without NOINLINE
$wloop x f <unpacked components from inner> = HUGE and recursive
loop (ConcatMap x f (.. unpack components from inner ...)) = $wloop x f <unpacked components from inner>
ex1 n = $wloop (ConcatMap Nothing f (inner n))

-- with NOINLINE
$wex1 n = joinrec $wloop' (ConcatMap x f inner) = <HUGE and recursive> in $wloop' (ConcatMap Nothing f (inner n))
ex1 (I# n) = $wex1 n

Note how in the NOINLINE case this separated the definition of f from its usage by a lambda binder, so the subsequent SpecConstr pass is unable to figure out what to specialise for. It can't look through $fw. I think this is the same or at least a similar effect that @nomeata experienced in #14068 and accounted for in #14844.

Or at least that's my working hypothesis, I'll edit this post accordingly tomorrow when I had time to investigate.

Edited Dec 21, 2019 by Sebastian Graf
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Time tracking