This page tracks @akio's attempt at reviving @nomeata's work on NestedCPR.
Latest code can be found at https://github.com/takano-akio/ghc/compare/master...nested-cpr .
Was last rebased by @mpickering here: https://github.com/ghc/ghc/compare/master...mpickering:nested-cpr. D4244 is probably more up to date.
Design
CPRResult
Changes to -
In the product case (
RedProd
), keep oneDmdResult
for each component. This is the main change in the branch. -
Add
NeverReturns
contructor. It means that no value is returned from this expression. It's different fromThrowsExn
in that it might perform some side effect. Indeed it might be an IO-performing infinite loop. The addition of this constructor is necessary to infer a nested CPR property for a tail- recursive function that does some I/O.
In summary, CPRResult
is changed from:
NoCPR
/ \
RetProd RetSum ConTag
to:
NoCPR
/ \
RetProd [DmdResult] RetSum ConTag
\ /
NeverReturns
DmdResult
Changes to - The new
Converges
contructor is added to track definite convergence.
Dunno CPRResult
/ \
ThrowsExn Converges CPRResult (new)
/
Diverges
This information is used to tell when it is safe to perform a nested CPR worker-wrapper transformation. Unpacking a nested component in the return value is safe only when that component definitely converges.
Changes to the demand analyzer
-
Infer nested CPR property when possible.
-
Slightly strenghten the strictness analysis so that in the following code, both
foo
andbar
get the strictnessS(SS)
on their first argument. Previously onlyfoo
gotS(SS)
, wherebar
gotS
.
{-# LANGUAGE BangPatterns #-}
data Complex a = !a :+ !a
foo :: Complex Double -> Int -> Complex Double
foo !x 0 = x
foo (a :+ b) _ = b :+ a
bar :: Complex Double -> Int -> Complex Double
bar x 1 = x
bar (a :+ b) _ = b :+ a
Changes to the worker-wrapper transformer
- Apply nested CPR transformation. For example, if a function that returns
(# State# RealWorld, (Int, Int, Int) #)
has the CPR informationm(t, tm(tm(t), m(t), t))
, then the worker function would return the type(# State# RealWorld, Int#, Int, Int #)
, unboxing the triple and one of theInt
s.
Examples
simple.hs
This is a simple recursive function with an easy-to-spot nested CPR property. Note that a - 1
is always convering in an obvious way.
Status: ok
module Foo where
f :: Int -> (Int, Int)
f 0 = (1, 2)
f n
| even n = f (div n 2)
| otherwise = case f (n - 1) of
(a, b) -> (a - 1, b)
strictness.hs
This one is trickier in that the analysis has to use the nested strictness of foo
on p
to give p
a nested CPR property. Inspired by nofib/imaginary/x2n1
.
Indeed, it's not obvious how the last branch has the nested CPR property. Consider the condition for the first branch, which will evaluate p
and its two Int
s completely. So, we have the components of p
available deconstructed. According to Note [CPR in a product case alternative]
this is enough to give p
the nested CPR property, in the sense that we could supply a constructed product of that depth if we wanted to (because we can immediately deconstruct it when the wrapper is inlined, for example).
Status: ok
{-# LANGUAGE BangPatterns #-}
module Foo where
foo :: Int -> (Int, Int) -> (Int, Int)
foo n p
| even (n + uncurry (+) p), n /= 0 = foo (n - 1) p
| n == 0 = (1, 2)
| otherwise = p
strict_field.hs
This one needs a correct handling of strict constructor fields. Inspired by nofib/imaginary/x2n1
.
Status: ok
{-# LANGUAGE BangPatterns #-}
module Foo where
data C a = C !a !a
pow :: C Double -> Int -> C Double
pow !_ 0 = C 0 1
pow !c 1 = c
pow c n
| even n = let d = pow c (div n 2) in mul d d
| otherwise = mul c (pow c (n - 1))
mul :: C Double -> C Double -> C Double
mul (C a b) (C d e) = C (a*d-b*e) (a*e+b*d)
strict_field1.hs
This is similar to strict_field.hs
, but needs a more aggressive worker-wrapper.
Status: ok
Changing CPR analysis alone wouldn't help here. We need to give the function a better strictness as well.
module Foo (pow) where
data C a = C !a !a
pow :: C Double -> Int -> C Double
pow x y
| even y = pow (x `mul` x) (y `quot` 2)
| y == 1 = x
| otherwise = pow (x `mul` x) ((y - 1) `quot` 2) `mul` x
mul :: C Double -> C Double -> C Double
mul (C a b) (C d e) = C (a*d-b*e) (a*e+b*d)
peek.hs
This example involves reading a tuple of Ints from memory.
Status: needs -fcpr-depth=4 or higher.
{-# LANGUAGE BangPatterns #-}
module Foo (peek4) where
import Foreign.Storable
import Foreign.Ptr
peek4 :: Ptr Int -> Ptr Int -> (Ptr Int -> IO (Ptr Int, Ptr Int)) -> IO (Ptr Int, Ptr Int, (Int, Int, Int, Int))
peek4 ptr end req
| end `minusPtr` ptr >= 32 = do
val <- (,,,)
<$> peekByteOff ptr 0
<*> peekByteOff ptr 8
<*> peekByteOff ptr 16
<*> peekByteOff ptr 24
let !ptr' = ptr `plusPtr` 32
return (ptr', end, val)
| otherwise = do
(ptr', end') <- req ptr
peek4 ptr' end' req
beam-word-poke.hs
This is a real-world example taken from https://github.com/tsurucapital/beamable/blob/master/src/Data/Beamable/Internal.hs#L159.
It serializes a Word
using a variable-length encoding.
Status: ok
module Foo(beamWordPoke) where
import Data.Bits
import Data.Monoid
import Data.Word
import Foreign.Ptr
import Foreign.Storable
newtype Poke = Poke (Ptr Word8 -> IO (Ptr Word8))
instance Monoid Poke where
mempty = Poke return
mappend (Poke a) (Poke b) = Poke $ \ptr -> a ptr >>= b
beamWordPoke :: Word -> Poke
beamWordPoke n
| next == 0 = pokeWord8 firstSeptet
| otherwise = pokeWord8 (firstSeptet .|. 0x80) <> beamWordPoke next
where
firstSeptet :: Word8
firstSeptet = fromIntegral $ n .&. 0x7F
next = n `shiftR` 7
pokeWord8 :: Word8 -> Poke
pokeWord8 w = Poke $ \p -> do poke p w; return $! p `plusPtr` 1