Commit b572aadb authored by Eric Seidel's avatar Eric Seidel Committed by Ben Gamari

Do Worker/Wrapper for NOINLINE things

Disabling worker/wrapper for NOINLINE things can cause unnecessary
reboxing of values. Consider

    {-# NOINLINE f #-}
    f :: Int -> a
    f x = error (show x)

    g :: Bool -> Bool -> Int -> Int
    g True  True  p = f p
    g False True  p = p + 1
    g b     False p = g b True p

the strictness analysis will discover f and g are strict, but because f
has no wrapper, the worker for g will rebox p. So we get

    $wg x y p# =
      let p = I# p# in  -- Yikes! Reboxing!
      case x of
        False ->
          case y of
            False -> $wg False True p#
            True -> +# p# 1#
        True ->
          case y of
            False -> $wg True True p#
            True -> case f p of { }

    g x y p = case p of (I# p#) -> $wg x y p#

Now, in this case the reboxing will float into the True branch, an so
the allocation will only happen on the error path. But it won't float
inwards if there are multiple branches that call (f p), so the reboxing
will happen on every call of g. Disaster.

Solution: do worker/wrapper even on NOINLINE things; but move the
NOINLINE pragma to the worker.

Test Plan: make test TEST="13143"

Reviewers: simonpj, bgamari, dfeuer, austin

Reviewed By: simonpj, bgamari

Subscribers: dfeuer, thomie

Differential Revision: https://phabricator.haskell.org/D3046
parent a9754e3c
......@@ -54,7 +54,7 @@ import DataCon
import Literal
import PrimOp
import IdInfo
import BasicTypes ( Arity )
import BasicTypes ( Arity, InlineSpec(..), inlinePragmaSpec )
import Type
import PrelNames
import TysPrim ( realWorldStatePrimTy )
......@@ -1031,6 +1031,9 @@ certainlyWillInline dflags fn_info
-- See Note [certainlyWillInline: INLINABLE]
do_cunf expr (UnfIfGoodArgs { ug_size = size, ug_args = args })
| not (null args) -- See Note [certainlyWillInline: be careful of thunks]
, case inlinePragmaSpec (inlinePragInfo fn_info) of
NoInline -> False -- NOINLINE; do not say certainlyWillInline!
_ -> True -- INLINE, INLINABLE, or nothing
, let arity = length args
, size - (10 * (arity + 1)) <= ufUseThreshold dflags
= Just (fn_unf { uf_src = InlineStable
......
......@@ -199,17 +199,79 @@ unfolding to the *worker*. So we will get something like this:
How do we "transfer the unfolding"? Easy: by using the old one, wrapped
in work_fn! See CoreUnfold.mkWorkerUnfolding.
Note [Activation for INLINABLE worker]
Note [Worker-wrapper for NOINLINE functions]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We used to disable worker/wrapper for NOINLINE things, but it turns out
this can cause unnecessary reboxing of values. Consider
{-# NOINLINE f #-}
f :: Int -> a
f x = error (show x)
g :: Bool -> Bool -> Int -> Int
g True True p = f p
g False True p = p + 1
g b False p = g b True p
the strictness analysis will discover f and g are strict, but because f
has no wrapper, the worker for g will rebox p. So we get
$wg x y p# =
let p = I# p# in -- Yikes! Reboxing!
case x of
False ->
case y of
False -> $wg False True p#
True -> +# p# 1#
True ->
case y of
False -> $wg True True p#
True -> case f p of { }
g x y p = case p of (I# p#) -> $wg x y p#
Now, in this case the reboxing will float into the True branch, an so
the allocation will only happen on the error path. But it won't float
inwards if there are multiple branches that call (f p), so the reboxing
will happen on every call of g. Disaster.
Solution: do worker/wrapper even on NOINLINE things; but move the
NOINLINE pragma to the worker.
(See Trac #13143 for a real-world example.)
Note [Activation for workers]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Follows on from Note [Worker-wrapper for INLINABLE functions]
It is *vital* that if the worker gets an INLINABLE pragma (from the
original function), then the worker has the same phase activation as
the wrapper (or later). That is necessary to allow the wrapper to
inline into the worker's unfolding: see SimplUtils
Note [Simplifying inside stable unfoldings].
Notihng is lost by giving the worker the same activation as the
worker, because the worker won't have any chance of inlining until the
If the original is NOINLINE, it's important that the work inherit the
original activation. Consider
{-# NOINLINE expensive #-}
expensive x = x + 1
f y = let z = expensive y in ...
If expensive's worker inherits the wrapper's activation, we'll get
{-# NOINLINE[0] $wexpensive #-}
$wexpensive x = x + 1
{-# INLINE[0] expensive #-}
expensive x = $wexpensive x
f y = let z = expensive y in ...
and $wexpensive will be immediately inlined into expensive, followed by
expensive into f. This effectively removes the original NOINLINE!
Otherwise, nothing is lost by giving the worker the same activation as the
wrapper, because the worker won't have any chance of inlining until the
wrapper does; there's no point in giving it an earlier activation.
Note [Don't w/w inline small non-loop-breaker things]
......@@ -326,11 +388,7 @@ tryWW :: DynFlags
-- if two, then a worker and a
-- wrapper.
tryWW dflags fam_envs is_rec fn_id rhs
| isNeverActive inline_act
-- No point in worker/wrappering if the thing is never inlined!
-- Because the no-inline prag will prevent the wrapper ever
-- being inlined at a call site.
= return [ (new_fn_id, rhs) ]
-- See Note [Worker-wrapper for NOINLINE functions]
| Just stable_unf <- certainlyWillInline dflags fn_info
= return [ (fn_id `setIdUnfolding` stable_unf, rhs) ]
......@@ -348,7 +406,6 @@ tryWW dflags fam_envs is_rec fn_id rhs
where
fn_info = idInfo fn_id
inline_act = inlinePragmaActivation (inlinePragInfo fn_info)
(wrap_dmds, res_info) = splitStrictSig (strictnessInfo fn_info)
new_fn_id = zapIdUsedOnceInfo (zapIdUsageEnvInfo fn_id)
......@@ -412,13 +469,18 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs
Just (work_demands, join_arity, wrap_fn, work_fn) -> do
work_uniq <- getUniqueM
let work_rhs = work_fn rhs
work_inline = inl_inline inl_prag
work_act = case work_inline of
-- See Note [Activation for workers]
NoInline -> inl_act inl_prag
_ -> wrap_act
work_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
, inl_inline = inl_inline inl_prag
, inl_inline = work_inline
, inl_sat = Nothing
, inl_act = wrap_act
, inl_act = work_act
, inl_rule = FunLike }
-- idl_inline: copy from fn_id; see Note [Worker-wrapper for INLINABLE functions]
-- idl_act: see Note [Activation for INLINABLE workers]
-- idl_act: see Note [Activation for workers]
-- inl_rule: it does not make sense for workers to be constructorlike.
work_join_arity | isJoinId fn_id = Just join_arity
| otherwise = Nothing
......
......@@ -15,7 +15,9 @@ test('join003',
compile_and_run,
[''])
test('join004',
[stats_num_field('bytes allocated', [(wordsize(64), 48146720, 5)])],
[stats_num_field('bytes allocated', [(wordsize(64), 16130592, 5)])],
# 2017-01-24 48146720 Join point rework
# 2017-02-05 16130592 Do Worker/Wrapper for NOINLINE things
compile_and_run,
[''])
......
......@@ -159,8 +159,9 @@ test('MethSharing',
[(wordsize(32), 360940756, 5),
# expected value: 2685858140 (x86/OS X)
# expected: 360940756 (x86/Linux)
(wordsize(64), 640067672, 5)]),
# expected: 640067672 (amd64/Linux)
(wordsize(64), 480098192, 5)]),
# expected: 640067672 (amd64/Linux)
# 2017-01-31: 480098192 work/wrap noinline things
only_ways(['normal'])
],
compile_and_run,
......@@ -481,10 +482,11 @@ test('T13001',
test('T12990',
[stats_num_field('bytes allocated',
[ (wordsize(64), 21640904, 5) ]),
[ (wordsize(64), 20040936, 5) ]),
# 2017-01-03 34440936 w/o inlining unsaturated
# constructor wrappers
# 2017-01-03 21640904 inline wrappers
# 2017-01-31 20040936 work/wrap noinline things
only_ways(['normal'])],
compile_and_run,
['-O2'])
module T13143 where
{-# NOINLINE f #-}
f :: Int -> a
f x = f x
g :: Bool -> Bool -> Int -> Int
g True True p = f p
g False True p = p + 1
g b False p = g b True p
==================== Tidy Core ====================
Result size of Tidy Core
= {terms: 73, types: 50, coercions: 0, joins: 0/0}
Rec {
-- RHS size: {terms: 3, types: 3, coercions: 0, joins: 0/0}
lvl :: forall a. a
[GblId, Str=b]
lvl = \ (@ a) -> T13143.$wf @ a GHC.Prim.void#
-- RHS size: {terms: 3, types: 4, coercions: 0, joins: 0/0}
T13143.$wf [InlPrag=NOINLINE, Occ=LoopBreaker]
:: forall a. GHC.Prim.Void# -> a
[GblId, Arity=1, Str=<B,A>b]
T13143.$wf = \ (@ a) _ [Occ=Dead] -> lvl @ a
end Rec }
-- RHS size: {terms: 3, types: 4, coercions: 0, joins: 0/0}
f [InlPrag=INLINE[0]] :: forall a. Int -> a
[GblId,
Arity=1,
Str=<B,A>b,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
Tmpl= \ (@ a) _ [Occ=Dead] -> T13143.$wf @ a GHC.Prim.void#}]
f = \ (@ a) _ [Occ=Dead] -> lvl @ a
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
T13143.$trModule4 :: GHC.Prim.Addr#
[GblId,
Caf=NoCafRefs,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}]
T13143.$trModule4 = "main"#
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
T13143.$trModule3 :: GHC.Types.TrName
[GblId,
Caf=NoCafRefs,
Str=m1,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}]
T13143.$trModule3 = GHC.Types.TrNameS T13143.$trModule4
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
T13143.$trModule2 :: GHC.Prim.Addr#
[GblId,
Caf=NoCafRefs,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 30 0}]
T13143.$trModule2 = "T13143"#
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
T13143.$trModule1 :: GHC.Types.TrName
[GblId,
Caf=NoCafRefs,
Str=m1,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}]
T13143.$trModule1 = GHC.Types.TrNameS T13143.$trModule2
-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
T13143.$trModule :: GHC.Types.Module
[GblId,
Caf=NoCafRefs,
Str=m,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 30}]
T13143.$trModule =
GHC.Types.Module T13143.$trModule3 T13143.$trModule1
-- RHS size: {terms: 2, types: 1, coercions: 0, joins: 0/0}
lvl1 :: Int
[GblId, Str=b]
lvl1 = T13143.$wf @ Int GHC.Prim.void#
Rec {
-- RHS size: {terms: 28, types: 7, coercions: 0, joins: 0/0}
T13143.$wg [InlPrag=[0], Occ=LoopBreaker]
:: Bool -> Bool -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=3, Str=<S,1*U><S,1*U><S,U>]
T13143.$wg =
\ (w :: Bool) (w1 :: Bool) (ww :: GHC.Prim.Int#) ->
case w of {
False ->
case w1 of {
False -> T13143.$wg GHC.Types.False GHC.Types.True ww;
True -> GHC.Prim.+# ww 1#
};
True ->
case w1 of {
False -> T13143.$wg GHC.Types.True GHC.Types.True ww;
True -> case lvl1 of wild2 { }
}
}
end Rec }
-- RHS size: {terms: 14, types: 6, coercions: 0, joins: 0/0}
g [InlPrag=INLINE[0]] :: Bool -> Bool -> Int -> Int
[GblId,
Arity=3,
Str=<S,1*U><S,1*U><S(S),1*U(U)>m,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=3,unsat_ok=True,boring_ok=False)
Tmpl= \ (w [Occ=Once] :: Bool)
(w1 [Occ=Once] :: Bool)
(w2 [Occ=Once!] :: Int) ->
case w2 of { GHC.Types.I# ww1 [Occ=Once] ->
case T13143.$wg w w1 ww1 of ww2 { __DEFAULT -> GHC.Types.I# ww2 }
}}]
g =
\ (w :: Bool) (w1 :: Bool) (w2 :: Int) ->
case w2 of { GHC.Types.I# ww1 ->
case T13143.$wg w w1 ww1 of ww2 { __DEFAULT -> GHC.Types.I# ww2 }
}
==================== Tidy Core ====================
Result size of Tidy Core
= {terms: 40, types: 16, coercions: 0, joins: 0/0}
= {terms: 44, types: 19, coercions: 0, joins: 0/0}
Rec {
-- RHS size: {terms: 10, types: 2, coercions: 0, joins: 0/0}
......@@ -15,18 +15,30 @@ $wxs =
}
end Rec }
-- RHS size: {terms: 14, types: 5, coercions: 0, joins: 0/0}
foo [InlPrag=NOINLINE] :: Int -> ()
[GblId, Arity=1, Caf=NoCafRefs, Str=<S(S),1*U(U)>]
foo =
\ (n :: Int) ->
case n of { GHC.Types.I# y ->
case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<# 0# y) of {
-- RHS size: {terms: 11, types: 3, coercions: 0, joins: 0/0}
T3772.$wfoo [InlPrag=NOINLINE] :: GHC.Prim.Int# -> ()
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,U>]
T3772.$wfoo =
\ (ww :: GHC.Prim.Int#) ->
case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<# 0# ww) of {
False -> GHC.Tuple.();
True -> $wxs y
}
True -> $wxs ww
}
-- RHS size: {terms: 6, types: 3, coercions: 0, joins: 0/0}
foo [InlPrag=INLINE[0]] :: Int -> ()
[GblId,
Arity=1,
Caf=NoCafRefs,
Str=<S(S),1*U(U)>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
Tmpl= \ (w [Occ=Once!] :: Int) ->
case w of { GHC.Types.I# ww1 [Occ=Once] -> T3772.$wfoo ww1 }}]
foo =
\ (w :: Int) -> case w of { GHC.Types.I# ww1 -> T3772.$wfoo ww1 }
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
T3772.$trModule2 :: GHC.Prim.Addr#
[GblId,
......
......@@ -20,7 +20,15 @@ T7360.$WFoo3 =
-- RHS size: {terms: 5, types: 2, coercions: 0, joins: 0/0}
fun1 [InlPrag=NOINLINE] :: Foo -> ()
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,1*U>]
[GblId,
Arity=1,
Caf=NoCafRefs,
Str=<S,1*U>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
Tmpl= \ (x [Occ=Once] :: Foo) ->
case x of { __DEFAULT -> GHC.Tuple.() }}]
fun1 = \ (x :: Foo) -> case x of { __DEFAULT -> GHC.Tuple.() }
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
......
expensive [InlPrag=NOINLINE] :: Int -> Int
T7865.$wexpensive [InlPrag=NOINLINE]
T7865.$wexpensive =
expensive [InlPrag=INLINE[0]] :: Int -> Int
case T7865.$wexpensive ww1 of ww2 { __DEFAULT -> GHC.Types.I# ww2 }
expensive =
case expensive sc1 of { GHC.Types.I# x ->
(case expensive x of { GHC.Types.I# x1 ->
case T7865.$wexpensive ww1 of ww2 { __DEFAULT -> GHC.Types.I# ww2 }
case T7865.$wexpensive ww1 of ww2 { __DEFAULT ->
case T7865.$wexpensive ww1 of ww2 { __DEFAULT ->
......@@ -232,6 +232,7 @@ test('T13025',
normal,
run_command,
['$MAKE -s --no-print-directory T13025'])
test('T13143', normal, compile, ['-O -ddump-simpl -dsuppress-uniques'])
test('T13156', normal, run_command, ['$MAKE -s --no-print-directory T13156'])
test('T11444', normal, compile, [''])
test('str-rules',
......
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 59, types: 41, coercions: 0}
Result size of Tidy Core = {terms: 70, types: 63, coercions: 0}
-- RHS size: {terms: 39, types: 23, coercions: 0}
pm [InlPrag=NOINLINE] :: Int -> Int -> (Int, Int)
[GblId, Arity=2, Str=<L,U(U)><L,U(U)>m]
pm =
\ (x_axr :: Int) (y_axs :: Int) ->
-- RHS size: {terms: 39, types: 25, coercions: 0}
T10694.$wpm [InlPrag=NOINLINE] :: Int -> Int -> (# Int, Int #)
[GblId, Arity=2, Str=<L,U(U)><L,U(U)>]
T10694.$wpm =
\ (w_sVU :: Int) (w1_sVV :: Int) ->
let {
l_sVj :: Int
l_sUQ :: Int
[LclId]
l_sVj =
case x_axr of { GHC.Types.I# x1_aUL -> case y_axs of { GHC.Types.I# y1_aUP -> GHC.Types.I# (GHC.Prim.+# x1_aUL y1_aUP) } } } in
l_sUQ = case w_sVU of { GHC.Types.I# x_aUi -> case w1_sVV of { GHC.Types.I# y_aUm -> GHC.Types.I# (GHC.Prim.+# x_aUi y_aUm) } } } in
let {
l1_sVl :: Int
l1_sUS :: Int
[LclId]
l1_sVl =
case x_axr of { GHC.Types.I# x1_aUV -> case y_axs of { GHC.Types.I# y1_aUZ -> GHC.Types.I# (GHC.Prim.-# x1_aUV y1_aUZ) } } } in
l1_sUS = case w_sVU of { GHC.Types.I# x_aUs -> case w1_sVV of { GHC.Types.I# y_aUw -> GHC.Types.I# (GHC.Prim.-# x_aUs y_aUw) } } } in
let {
l2_sVk :: [Int]
[LclId]
l2_sVk = GHC.Types.: @ Int l1_sVl (GHC.Types.[] @ Int) } in
l2_sUR :: [Int]
[LclId, Unf=OtherCon []]
l2_sUR = GHC.Types.: @ Int l1_sUS (GHC.Types.[] @ Int) } in
let {
l3_sVa :: [Int]
[LclId]
l3_sVa = GHC.Types.: @ Int l_sVj l2_sVk } in
(GHC.List.$w!! @ Int l3_sVa 0#, GHC.List.$w!! @ Int l3_sVa 1#)
l3_sUH :: [Int]
[LclId, Unf=OtherCon []]
l3_sUH = GHC.Types.: @ Int l_sUQ l2_sUR } in
(# GHC.List.$w!! @ Int l3_sUH 0#, GHC.List.$w!! @ Int l3_sUH 1# #)
-- RHS size: {terms: 10, types: 11, coercions: 0}
pm [InlPrag=INLINE[0]] :: Int -> Int -> (Int, Int)
[GblId,
Arity=2,
Str=<L,U(U)><L,U(U)>m,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
Tmpl= \ (w_sVU [Occ=Once] :: Int) (w1_sVV [Occ=Once] :: Int) ->
case T10694.$wpm w_sVU w1_sVV of { (# ww1_sW0 [Occ=Once], ww2_sW1 [Occ=Once] #) -> (ww1_sW0, ww2_sW1) }}]
pm = \ (w_sVU :: Int) (w1_sVV :: Int) -> case T10694.$wpm w_sVU w1_sVV of { (# ww1_sW0, ww2_sW1 #) -> (ww1_sW0, ww2_sW1) }
-- RHS size: {terms: 8, types: 7, coercions: 0}
-- RHS size: {terms: 8, types: 9, coercions: 0}
m :: Int -> Int -> Int
[GblId,
Arity=2,
Str=<L,U(U)><L,U(U)>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
Tmpl= \ (x_aCt [Occ=Once] :: Int) (y_aCu [Occ=Once] :: Int) ->
case pm x_aCt y_aCu of { (_ [Occ=Dead], mr_aCw [Occ=Once]) -> mr_aCw }}]
m = \ (x_aCt :: Int) (y_aCu :: Int) -> case pm x_aCt y_aCu of { (pr_aCv, mr_aCw) -> mr_aCw }
Tmpl= \ (x_aCT [Occ=Once] :: Int) (y_aCU [Occ=Once] :: Int) ->
case pm x_aCT y_aCU of { (_ [Occ=Dead], mr_aCW [Occ=Once]) -> mr_aCW }}]
m = \ (x_aCT :: Int) (y_aCU :: Int) -> case T10694.$wpm x_aCT y_aCU of { (# ww1_sW0, ww2_sW1 #) -> ww2_sW1 }
-- RHS size: {terms: 2, types: 0, coercions: 0}
T10694.$trModule2 :: GHC.Types.TrName
......
......@@ -8,7 +8,7 @@ BottomFromInnerLambda.f: <S(S),1*U(U)>
==================== Strictness signatures ====================
BottomFromInnerLambda.$trModule: m
BottomFromInnerLambda.expensive: <S(S),1*U(U)>m
BottomFromInnerLambda.f: <S(S),1*U(U)>
BottomFromInnerLambda.expensive: <S(S),1*U(1*U)>m
BottomFromInnerLambda.f: <S(S),1*U(1*U)>
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment