Opportunities for eta reduction in STG?
Consider an indexed strict left fold on length indexed vectors:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -ddump-stg-final -dsuppress-all -O2 #-}
module Eta where
import Data.Coerce
data Nat = Z | S Nat
data Vec n a where
Nil :: Vec 'Z a
Cons :: a -> Vec n a -> Vec ('S n) a
newtype Succ b n = Succ { unSucc :: b (S n) }
ifoldl' :: forall b n a. (forall m. b m -> a -> b ('S m)) -> b 'Z -> Vec n a -> b n
ifoldl' f z Nil = z
ifoldl' f !z (Cons x xs) = unSucc $ ifoldl' (\(Succ m) a -> Succ (f m a)) (Succ $ f z x) xs
(Technique adapted from line 251 of this functional pearl submission)
This is implemented using a polymorphic recursion, so our accumulating function has to be wrapped in newtype wrappers for the recursive call.
Since newtypes are supposed to be "free" and without runtime cost, we might expect the accumalating function to eta-reduce and inline away without allocating a closure. However, this is not the case. With -ddump-simpl
, we get:
ifoldl'
= \ @b_als @n_alt @a_alu f_ahC z_ahD ds_dBQ ->
case ds_dBQ of {
Nil co_alv -> z_ahD `cast` <Co:3>;
Cons @ipv_sCu ipv1_sCv ipv2_sCw ipv3_sCx ->
case z_ahD of z1_X2 { __DEFAULT ->
(ifoldl'
@(Succ b_als)
@ipv_sCu
@a_alu
((\ @m_alG ds1_dCf a1_ahJ ->
f_ahC @('S m_alG) (ds1_dCf `cast` <Co:3>) a1_ahJ)
`cast` <Co:15>)
((f_ahC @'Z z1_X2 ipv2_sCw) `cast` <Co:4>)
ipv3_sCx)
`cast` <Co:7>
Indeed, in core we can't get away with eta-reducing \ @m_alG ds1_dCf a1_ahJ -> f_ahC @('S m_alG) (ds1_dCf
cast <Co:3>) a1_ahJ
to just f_ahC
since the @m_alG
type parameter occurs non-trivially in the first argument of f_ahC
and also in <Co:3>
.
However, since STG doesn't include type information, we might expect the eta reduction to be performed there. However, this is not so:
ifoldl' =
\r [f_sGf z_sGg ds_sGh]
case ds_sGh of {
Nil -> z_sGg;
Cons ipv2_sGj ipv3_sGk ->
case z_sGg of z1_sGl {
__DEFAULT ->
case f_sGf z1_sGl ipv2_sGj of sat_sGp {
__DEFAULT ->
let { sat_sGo = \r [ds1_sGm a1_sGn] f_sGf ds1_sGm a1_sGn;
} in ifoldl' sat_sGo sat_sGp ipv3_sGk;
};
};
};
Instead we allocate a closure sat_sGo = \r [ds1_sGm a1_sGn] f_sGf ds1_sGm a1_sGn;
which could clearly just be eta reduced to f_sGf
.
To demonstrate that it is indeed possible to write the same function without allocating a closure, we can rewrite it with coerce
(which was slightly annoying since we don't have lambda bound type variables in the surface language, so it has to be
hacked around with ScopedTypeVariables
):
ifoldl' :: forall b n a. (forall m. b m -> a -> b ('S m)) -> b 'Z -> Vec n a -> b n
ifoldl' f z Nil = z
ifoldl' f !z (Cons x xs) = unSucc $ ifoldl' (coF f) (Succ $ f z x) xs
{-# INLINE coF #-}
coF :: forall b a. (forall m. b m -> a -> b ('S m)) -> forall m. Succ b m -> a -> Succ b (S m)
coF f = go
where
go :: forall m. Succ b m -> a -> Succ b (S m)
go = coerce (f @(S m))
And this results in exactly the stg we want, without unnecessary closure allocations:
ifoldl' =
\r [f_sGc z_sGd ds_sGe]
case ds_sGe of {
Nil -> z_sGd;
Cons ipv2_sGg ipv3_sGh ->
case z_sGd of z1_sGi {
__DEFAULT ->
case f_sGc z1_sGi ipv2_sGg of sat_sGj {
__DEFAULT -> ifoldl' f_sGc sat_sGj ipv3_sGh;
};
};
};