This page tracks @akio's attempt at reviving @nomeata's work on NestedCPR.
Latest code can be found at https://github.com/takanoakio/ghc/compare/master...nestedcpr .
Was last rebased by @mpickering here: https://github.com/ghc/ghc/compare/master...mpickering:nestedcpr. D4244 is probably more up to date.
Design
CPRResult
Changes to 
In the product case (
RedProd
), keep oneDmdResult
for each component. This is the main change in the branch. 
Add
NeverReturns
contructor. It means that no value is returned from this expression. It's different fromThrowsExn
in that it might perform some side effect. Indeed it might be an IOperforming infinite loop. The addition of this constructor is necessary to infer a nested CPR property for a tail recursive function that does some I/O.
In summary, CPRResult
is changed from:
NoCPR
/ \
RetProd RetSum ConTag
to:
NoCPR
/ \
RetProd [DmdResult] RetSum ConTag
\ /
NeverReturns
DmdResult
Changes to  The new
Converges
contructor is added to track definite convergence.
Dunno CPRResult
/ \
ThrowsExn Converges CPRResult (new)
/
Diverges
This information is used to tell when it is safe to perform a nested CPR workerwrapper transformation. Unpacking a nested component in the return value is safe only when that component definitely converges.
Changes to the demand analyzer

Infer nested CPR property when possible.

Slightly strenghten the strictness analysis so that in the following code, both
foo
andbar
get the strictnessS(SS)
on their first argument. Previously onlyfoo
gotS(SS)
, wherebar
gotS
.
{# LANGUAGE BangPatterns #}
data Complex a = !a :+ !a
foo :: Complex Double > Int > Complex Double
foo !x 0 = x
foo (a :+ b) _ = b :+ a
bar :: Complex Double > Int > Complex Double
bar x 1 = x
bar (a :+ b) _ = b :+ a
Changes to the workerwrapper transformer
 Apply nested CPR transformation. For example, if a function that returns
(# State# RealWorld, (Int, Int, Int) #)
has the CPR informationm(t, tm(tm(t), m(t), t))
, then the worker function would return the type(# State# RealWorld, Int#, Int, Int #)
, unboxing the triple and one of theInt
s.
Examples
simple.hs
This is a simple recursive function with an easytospot nested CPR property. Note that a  1
is always convering in an obvious way.
Status: ok
module Foo where
f :: Int > (Int, Int)
f 0 = (1, 2)
f n
 even n = f (div n 2)
 otherwise = case f (n  1) of
(a, b) > (a  1, b)
strictness.hs
This one is trickier in that the analysis has to use the nested strictness of foo
on p
to give p
a nested CPR property. Inspired by nofib/imaginary/x2n1
.
Indeed, it's not obvious how the last branch has the nested CPR property. Consider the condition for the first branch, which will evaluate p
and its two Int
s completely. So, we have the components of p
available deconstructed. According to Note [CPR in a product case alternative]
this is enough to give p
the nested CPR property, in the sense that we could supply a constructed product of that depth if we wanted to (because we can immediately deconstruct it when the wrapper is inlined, for example).
Status: ok
{# LANGUAGE BangPatterns #}
module Foo where
foo :: Int > (Int, Int) > (Int, Int)
foo n p
 even (n + uncurry (+) p), n /= 0 = foo (n  1) p
 n == 0 = (1, 2)
 otherwise = p
strict_field.hs
This one needs a correct handling of strict constructor fields. Inspired by nofib/imaginary/x2n1
.
Status: ok
{# LANGUAGE BangPatterns #}
module Foo where
data C a = C !a !a
pow :: C Double > Int > C Double
pow !_ 0 = C 0 1
pow !c 1 = c
pow c n
 even n = let d = pow c (div n 2) in mul d d
 otherwise = mul c (pow c (n  1))
mul :: C Double > C Double > C Double
mul (C a b) (C d e) = C (a*db*e) (a*e+b*d)
strict_field1.hs
This is similar to strict_field.hs
, but needs a more aggressive workerwrapper.
Status: ok
Changing CPR analysis alone wouldn't help here. We need to give the function a better strictness as well.
module Foo (pow) where
data C a = C !a !a
pow :: C Double > Int > C Double
pow x y
 even y = pow (x `mul` x) (y `quot` 2)
 y == 1 = x
 otherwise = pow (x `mul` x) ((y  1) `quot` 2) `mul` x
mul :: C Double > C Double > C Double
mul (C a b) (C d e) = C (a*db*e) (a*e+b*d)
peek.hs
This example involves reading a tuple of Ints from memory.
Status: needs fcprdepth=4 or higher.
{# LANGUAGE BangPatterns #}
module Foo (peek4) where
import Foreign.Storable
import Foreign.Ptr
peek4 :: Ptr Int > Ptr Int > (Ptr Int > IO (Ptr Int, Ptr Int)) > IO (Ptr Int, Ptr Int, (Int, Int, Int, Int))
peek4 ptr end req
 end `minusPtr` ptr >= 32 = do
val < (,,,)
<$> peekByteOff ptr 0
<*> peekByteOff ptr 8
<*> peekByteOff ptr 16
<*> peekByteOff ptr 24
let !ptr' = ptr `plusPtr` 32
return (ptr', end, val)
 otherwise = do
(ptr', end') < req ptr
peek4 ptr' end' req
beamwordpoke.hs
This is a realworld example taken from https://github.com/tsurucapital/beamable/blob/master/src/Data/Beamable/Internal.hs#L159.
It serializes a Word
using a variablelength encoding.
Status: ok
module Foo(beamWordPoke) where
import Data.Bits
import Data.Monoid
import Data.Word
import Foreign.Ptr
import Foreign.Storable
newtype Poke = Poke (Ptr Word8 > IO (Ptr Word8))
instance Monoid Poke where
mempty = Poke return
mappend (Poke a) (Poke b) = Poke $ \ptr > a ptr >>= b
beamWordPoke :: Word > Poke
beamWordPoke n
 next == 0 = pokeWord8 firstSeptet
 otherwise = pokeWord8 (firstSeptet .. 0x80) <> beamWordPoke next
where
firstSeptet :: Word8
firstSeptet = fromIntegral $ n .&. 0x7F
next = n `shiftR` 7
pokeWord8 :: Word8 > Poke
pokeWord8 w = Poke $ \p > do poke p w; return $! p `plusPtr` 1