The Static Argument Transformation (SAT)
This page summarises progress on the Static Argument Transformation.
See:
- Andre Sansos's thesis which has a whole chapter.
- Danvy's lambda-dropping paper
In comment:10 of #5059, Max notes that SAT provides 20-30% wins in nofib, wow!
I think we really want to be able to perform value specialization of recursive functions without SAT. The performance trade-off of SAT when it comes to code generation doesn't look so hot in many cases. BTW, I noticed that the lack of SAT or argument specialization can prevent good optimization of derived code. For example, data Tree a = Branch (Tree a) (Tree a) | Leaf a deriving Foldable
yields a very mediocre-looking foldl'
that would be fixed by a foldr
that could inline. It also produces a very boxing length
, but that smells like a trickier higher-order demand issue. --dfeuer
Tickets
Use Keyword = StaticArgumentTransformation
to ensure that a ticket ends up on these lists.
Open Tickets:
#888 | Implement the static argument transformation |
---|---|
#5059 | Pragma to SPECIALISE on value arguments |
#9374 | Investigate Static Argument Transformation |
#13502 | Static argument transformation should also run after specialisation |
#13966 | Skip-less stream fusion: a missed opportunity |
#14211 | Compiler is unable to INLINE as well as the programmer can manually |
#14231 | Core lint error "in result of Static argument" |
#14649 | ghc panic: mergeSATInfo |
Closed Tickets:
#9545 | Evaluate Takano Akio's foldrW/buildW fusion framework as a possible replacement for foldr/build |
---|
Matt's Notes
The primary value of performing SAT
is to enable the function to be inlined and hence simplified.
The decision about whether to run SAT is currently predicated on the number of static value arguments. If there is more than one static argument then the transformation is run, otherwise it is not. I am unsure why this decision has been made, there are useful situations where running the transformation with one argument (see test2
below). I don't think it has been scrutinised much as 1. Not many people use this optimisation and 2. Usually there is more than one static argument. (For example, a dictionary and a function which uses that dictionary).
Interaction with specialisation
One very useful time to run SAT is after specialisation and SpecConstr. Specialisation especially can create situations where static dictionaries are passed to functions using RankNTypes
which are then not eliminated leading
to slow dictionary lookups.
However, we also have the be careful as sometimes these passes produce recursive groups which are later broken up. SAT only works for recursive groups which contain exactly one definition due to the possibility of blow-up.
test
and test2
are only optimised well with an additional late pass of SAT.
{-# LANGUAGE RankNTypes, BangPatterns #-}
module Standalone where
import qualified Control.Monad.State as S
times :: Monad m => Int -> m a -> m ()
times n ma = go n where
go 0 = pure ()
go n = ma >> go (n - 1)
{-# inline times #-}
newtype VSM s a = VSM { runVSM :: forall m. Monad m => m s -> (s -> m ()) -> m a}
instance Functor (VSM s) where
fmap f (VSM g) = VSM $ \get put ->
g get put >>= \a -> pure (f a)
{-# inline fmap #-}
instance Applicative (VSM s) where
pure a = VSM $ \get put -> pure a
VSM mf <*> VSM ma = VSM $ \get put ->
mf get put >>= \f ->
ma get put >>= \a -> pure (f a)
{-# inline pure #-}
{-# inline (<*>) #-}
instance Monad (VSM s) where
return a = VSM $ \get put -> pure a
VSM ma >>= f = VSM $ \get put ->
ma get put >>= \a -> runVSM (f a) get put
{-# inline return #-}
vmmodify :: (s -> s) -> VSM s ()
vmmodify f = VSM $ \get put ->
get >>= \s ->
let !s' = f s in
put s'
{-# inline vmmodify #-}
vmrunState :: VSM s a -> s -> (a, s)
vmrunState (VSM ma) = S.runState (ma S.get S.put)
{-# inline vmrunState #-}
test :: Int -> ((), Int)
test n = vmrunState (times n $ vmmodify (+1)) n
newtype CS s a = CS {runCS ::
forall r.
(a -> r) -- pure
-> ((s -> r) -> r) -- get
-> (s -> r -> r) -- put
-> r
}
instance Functor (CS s) where
fmap f (CS g) = CS $ \pure get put -> g (pure . f) get put
{-# inline fmap #-}
instance Applicative (CS s) where
pure a = CS $ \pure get put -> pure a
{-# inline pure #-}
CS mf <*> CS ma = CS $ \pure get put ->
mf (\f -> ma (pure . f) get put) get put
{-# inline (<*>) #-}
instance Monad (CS s) where
return a = CS $ \pure get put -> pure a
{-# inline return #-}
CS ma >>= f = CS $ \pure get put ->
ma (\a -> runCS (f a) pure get put) get put
{-# inline (>>=) #-}
CS ma >> CS mb = CS $ \pure get put -> ma (\_ -> mb pure get put) get put
{-# inline (>>) #-}
cmodify :: (s -> s) -> CS s ()
cmodify f = CS $ \pure get put ->
get $ \s -> let !s' = f s in
put s' $
pure ()
{-# inline cmodify #-}
crunState :: CS s a -> s -> (a, s)
crunState (CS f) = f
(\a s -> (a, s))
(\got s -> got s s)
(\s' put s -> put s')
{-# inline crunState #-}
test2 :: Int -> ((), Int)
test2 n = crunState (times n (cmodify (+1))) 0
# Interaction with Join Points
In section 5 of "compiling without continuations" it is noted that join points enable fusion but in order to do so you have to write your programs in a particular style.
module Join where
find :: (a -> Bool) -> [a] -> Maybe a
find p xs = go xs
where
go [] = Nothing
go (x:xs) = if p x then Just x else go xs
any :: (a -> Bool) -> [a] -> Bool
any p xs = case find p xs of
Just _ -> True
Nothing -> False
The core for any
is then nicely fused as go
is identified as a join point which means that it can be inlined even though find
is recursive.
This particular style is exactly the kind of code which the static argument transformation produces. Currently it is not enough to turn on -fstatic-argument-transformation
as the order the transformations are run seems wrong.