%
% (c) The AQUA Project, Glasgow University, 1993-1998
%
\section[SimplUtils]{The simplifier utilities}

\begin{code}
{-# OPTIONS -fno-warn-tabs #-}
-- The above warning suppression flag is a temporary kludge.
-- While working on this module you are encouraged to remove it and
-- detab the module (please do the detabbing in a separate patch). See
--     http://hackage.haskell.org/trac/ghc/wiki/Commentary/CodingStyle#TabsvsSpaces
-- for details

module SimplUtils (
	-- Rebuilding
	mkLam, mkCase, prepareAlts, tryEtaExpand,

	-- Inlining,
	preInlineUnconditionally, postInlineUnconditionally, 
	activeUnfolding, activeRule, 
	getUnfoldingInRuleMatch, 
        simplEnvForGHCi, updModeForInlineRules,

	-- The continuation type
	SimplCont(..), DupFlag(..), ArgInfo(..),
        isSimplified,
	contIsDupable, contResultType, contIsTrivial, contArgs, dropArgs, 
	pushSimplifiedArgs, countValArgs, countArgs, addArgTo,
	mkBoringStop, mkRhsStop, mkLazyArgStop, contIsRhsOrArg,
	interestingCallContext, 

	interestingArg, mkArgInfo,
	
	abstractFloats
    ) where

#include "HsVersions.h"

import SimplEnv
import CoreMonad        ( SimplifierMode(..), Tick(..) )
import MkCore           ( sortQuantVars )
import DynFlags
import StaticFlags
import CoreSyn
import qualified CoreSubst
import PprCore
import CoreFVs
import CoreUtils
import CoreArity
import CoreUnfold
import Name
import Id
import Var
import Demand
import SimplMonad
import Type	hiding( substTy )
import Coercion hiding( substCo )
import DataCon          ( dataConWorkId )
import VarSet
import BasicTypes
import Util
import MonadUtils
import Outputable
import FastString
import Pair

import Control.Monad    ( when )
\end{code}


%************************************************************************
%*									*
		The SimplCont type
%*									*
%************************************************************************

A SimplCont allows the simplifier to traverse the expression in a 
zipper-like fashion.  The SimplCont represents the rest of the expression,
"above" the point of interest.

You can also think of a SimplCont as an "evaluation context", using
that term in the way it is used for operational semantics. This is the
way I usually think of it.  For example, you'll often see a syntax for
evaluation context looking like
	C ::= []  |  C e   |  case C of alts  |  C `cast` co
That's the kind of thing we are doing here, and I use that syntax in
the comments.


Key points:
  * A SimplCont describes a *strict* context (just like 
    evaluation contexts do).  E.g. Just [] is not a SimplCont

  * A SimplCont describes a context that *does not* bind
    any variables.  E.g. \x. [] is not a SimplCont

\begin{code}
data SimplCont	
  = Stop		-- An empty context, or hole, []     
	CallCtxt	-- True <=> There is something interesting about
			--          the context, and hence the inliner
			--	    should be a bit keener (see interestingCallContext)
			-- Specifically:
			--     This is an argument of a function that has RULES
			--     Inlining the call might allow the rule to fire

  | CoerceIt 		-- C `cast` co
	OutCoercion		-- The coercion simplified
				-- Invariant: never an identity coercion
	SimplCont

  | ApplyTo  		-- C arg
	DupFlag			-- See Note [DupFlag invariants]
	InExpr StaticEnv	-- The argument and its static env
	SimplCont

  | Select   		-- case C of alts
	DupFlag 	                -- See Note [DupFlag invariants]
	InId InType [InAlt] StaticEnv	-- The case binder, alts type, alts, and subst-env
	SimplCont

  -- The two strict forms have no DupFlag, because we never duplicate them
  | StrictBind 		-- (\x* \xs. e) C
	InId [InBndr]		-- let x* = [] in e 	
	InExpr StaticEnv	--	is a special case 
	SimplCont	

  | StrictArg 		-- f e1 ..en C
 	ArgInfo		-- Specifies f, e1..en, Whether f has rules, etc
			--     plus strictness flags for *further* args
        CallCtxt        -- Whether *this* argument position is interesting
	SimplCont		

  | TickIt
        (Tickish Id)    -- Tick tickish []
        SimplCont

data ArgInfo
  = ArgInfo {
        ai_fun   :: Id,		-- The function
	ai_args  :: [OutExpr],	-- ...applied to these args (which are in *reverse* order)
	ai_rules :: [CoreRule],	-- Rules for this function

	ai_encl :: Bool,	-- Flag saying whether this function 
				-- or an enclosing one has rules (recursively)
				--	True => be keener to inline in all args
	
	ai_strs :: [Bool],	-- Strictness of remaining arguments
				--   Usually infinite, but if it is finite it guarantees
				--   that the function diverges after being given
				--   that number of args
	ai_discs :: [Int]	-- Discounts for remaining arguments; non-zero => be keener to inline
				--   Always infinite
    }

addArgTo :: ArgInfo -> OutExpr -> ArgInfo
addArgTo ai arg = ai { ai_args = arg : ai_args ai }

instance Outputable SimplCont where
  ppr (Stop interesting)    	       = ptext (sLit "Stop") <> brackets (ppr interesting)
  ppr (ApplyTo dup arg _ cont)         = ((ptext (sLit "ApplyTo") <+> ppr dup <+> pprParendExpr arg)
				       	  {-  $$ nest 2 (pprSimplEnv se) -}) $$ ppr cont
  ppr (StrictBind b _ _ _ cont)        = (ptext (sLit "StrictBind") <+> ppr b) $$ ppr cont
  ppr (StrictArg ai _ cont)            = (ptext (sLit "StrictArg") <+> ppr (ai_fun ai)) $$ ppr cont
  ppr (Select dup bndr ty alts se cont) = (ptext (sLit "Select") <+> ppr dup <+> ppr bndr <+> ppr ty) $$ 
				         (nest 2 $ vcat [ppr (seTvSubst se), ppr alts]) $$ ppr cont 
  ppr (CoerceIt co cont)	       = (ptext (sLit "CoerceIt") <+> ppr co) $$ ppr cont
  ppr (TickIt t cont)                  = (ptext (sLit "TickIt") <+> ppr t) $$ ppr cont

data DupFlag = NoDup       -- Unsimplified, might be big
             | Simplified  -- Simplified
             | OkToDup     -- Simplified and small

isSimplified :: DupFlag -> Bool
isSimplified NoDup = False
isSimplified _     = True	-- Invariant: the subst-env is empty

instance Outputable DupFlag where
  ppr OkToDup    = ptext (sLit "ok")
  ppr NoDup      = ptext (sLit "nodup")
  ppr Simplified = ptext (sLit "simpl")
\end{code}

Note [DupFlag invariants]
~~~~~~~~~~~~~~~~~~~~~~~~~
In both (ApplyTo dup _ env k)
   and  (Select dup _ _ env k)
the following invariants hold

  (a) if dup = OkToDup, then continuation k is also ok-to-dup
  (b) if dup = OkToDup or Simplified, the subst-env is empty
      (and hence no need to re-simplify)

\begin{code}
-------------------
mkBoringStop :: SimplCont
mkBoringStop = Stop BoringCtxt

mkRhsStop :: SimplCont	-- See Note [RHS of lets] in CoreUnfold
mkRhsStop = Stop (ArgCtxt False)

mkLazyArgStop :: CallCtxt -> SimplCont
mkLazyArgStop cci = Stop cci

-------------------
contIsRhsOrArg :: SimplCont -> Bool
contIsRhsOrArg (Stop {})       = True
contIsRhsOrArg (StrictBind {}) = True
contIsRhsOrArg (StrictArg {})  = True
contIsRhsOrArg _               = False

-------------------
contIsDupable :: SimplCont -> Bool
contIsDupable (Stop {})                    = True
contIsDupable (ApplyTo  OkToDup _ _ _)     = True	-- See Note [DupFlag invariants]
contIsDupable (Select   OkToDup _ _ _ _ _) = True -- ...ditto...
contIsDupable (CoerceIt _ cont)            = contIsDupable cont
contIsDupable _                            = False

-------------------
contIsTrivial :: SimplCont -> Bool
contIsTrivial (Stop {})                   = True
contIsTrivial (ApplyTo _ (Type _) _ cont) = contIsTrivial cont
contIsTrivial (ApplyTo _ (Coercion _) _ cont) = contIsTrivial cont
contIsTrivial (CoerceIt _ cont)           = contIsTrivial cont
contIsTrivial _                           = False

-------------------
contResultType :: SimplEnv -> OutType -> SimplCont -> OutType
contResultType env ty cont
  = go cont ty
  where
    subst_ty se ty = SimplEnv.substTy (se `setInScope` env) ty
    subst_co se co = SimplEnv.substCo (se `setInScope` env) co

    go (Stop {})                      ty = ty
    go (CoerceIt co cont)             _  = go cont (pSnd (coercionKind co))
    go (StrictBind _ bs body se cont) _  = go cont (subst_ty se (exprType (mkLams bs body)))
    go (StrictArg ai _ cont)          _  = go cont (funResultTy (argInfoResultTy ai))
    go (Select _ _ ty _ se cont)      _  = go cont (subst_ty se ty)
    go (ApplyTo _ arg se cont)        ty = go cont (apply_to_arg ty arg se)
    go (TickIt _ cont)                ty = go cont ty

    apply_to_arg ty (Type ty_arg)     se = applyTy ty (subst_ty se ty_arg)
    apply_to_arg ty (Coercion co_arg) se = applyCo ty (subst_co se co_arg)
    apply_to_arg ty _                 _  = funResultTy ty

argInfoResultTy :: ArgInfo -> OutType
argInfoResultTy (ArgInfo { ai_fun = fun, ai_args = args })
  = foldr (\arg fn_ty -> applyTypeToArg fn_ty arg) (idType fun) args

-------------------
countValArgs :: SimplCont -> Int
countValArgs (ApplyTo _ (Type _) _ cont) = countValArgs cont
countValArgs (ApplyTo _ (Coercion _) _ cont) = countValArgs cont
countValArgs (ApplyTo _ _        _ cont) = 1 + countValArgs cont
countValArgs _                           = 0

countArgs :: SimplCont -> Int
countArgs (ApplyTo _ _ _ cont) = 1 + countArgs cont
countArgs _                    = 0

contArgs :: SimplCont -> (Bool, [ArgSummary], SimplCont)
-- Uses substitution to turn each arg into an OutExpr
contArgs cont@(ApplyTo {})
  = case go [] cont of { (args, cont') -> (False, args, cont') }
  where
    go args (ApplyTo _ arg se cont)
      | isTypeArg arg = go args                           cont
      | otherwise     = go (is_interesting arg se : args) cont
    go args cont      = (reverse args, cont)

    is_interesting arg se = interestingArg (substExpr (text "contArgs") se arg)
    		   -- Do *not* use short-cutting substitution here
		   -- because we want to get as much IdInfo as possible

contArgs cont = (True, [], cont)

pushSimplifiedArgs :: SimplEnv -> [CoreExpr] -> SimplCont -> SimplCont
pushSimplifiedArgs _env []         cont = cont
pushSimplifiedArgs env  (arg:args) cont = ApplyTo Simplified arg env (pushSimplifiedArgs env args cont)
		   -- The env has an empty SubstEnv

dropArgs :: Int -> SimplCont -> SimplCont
dropArgs 0 cont = cont
dropArgs n (ApplyTo _ _ _ cont) = dropArgs (n-1) cont
dropArgs n other		= pprPanic "dropArgs" (ppr n <+> ppr other)
\end{code}


Note [Interesting call context]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We want to avoid inlining an expression where there can't possibly be
any gain, such as in an argument position.  Hence, if the continuation
is interesting (eg. a case scrutinee, application etc.) then we
inline, otherwise we don't.  

Previously some_benefit used to return True only if the variable was
applied to some value arguments.  This didn't work:

	let x = _coerce_ (T Int) Int (I# 3) in
	case _coerce_ Int (T Int) x of
		I# y -> ....

we want to inline x, but can't see that it's a constructor in a case
scrutinee position, and some_benefit is False.

Another example:

dMonadST = _/\_ t -> :Monad (g1 _@_ t, g2 _@_ t, g3 _@_ t)

....  case dMonadST _@_ x0 of (a,b,c) -> ....

we'd really like to inline dMonadST here, but we *don't* want to
inline if the case expression is just

	case x of y { DEFAULT -> ... }

since we can just eliminate this case instead (x is in WHNF).  Similar
applies when x is bound to a lambda expression.  Hence
contIsInteresting looks for case expressions with just a single
default case.

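For illustration (a hypothetical continuation built from the constructors
above; wild, ty, alts and se are placeholders):

	interestingCallContext (Select NoDup wild ty alts se (Stop BoringCtxt))

returns CaseCtxt when wild is a dead binder, and ArgCtxt False (a
strict-let context) when wild is used; see the code just below.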
\begin{code}
interestingCallContext :: SimplCont -> CallCtxt
-- See Note [Interesting call context]
interestingCallContext cont
  = interesting cont
  where
    interesting (Select _ bndr _ _ _ _)
	| isDeadBinder bndr = CaseCtxt
	| otherwise	    = ArgCtxt False	-- If the binder is used, this
						-- is like a strict let
						-- See Note [RHS of lets] in CoreUnfold
		
    interesting (ApplyTo _ arg _ cont)
	| isTypeArg arg = interesting cont
	| otherwise     = ValAppCtxt 	-- Can happen if we have (f Int |> co) y
					-- If f has an INLINE prag we need to give it some
					-- motivation to inline. See Note [Cast then apply]
					-- in CoreUnfold

    interesting (StrictArg _ cci _) = cci
    interesting (StrictBind {})	    = BoringCtxt
    interesting (Stop cci)   	    = cci
    interesting (TickIt _ cci)      = interesting cci
    interesting (CoerceIt _ cont)   = interesting cont
	-- If this call is the arg of a strict function, the context
	-- is a bit interesting.  If we inline here, we may get useful
	-- evaluation information to avoid repeated evals: e.g.
	--	x + (y * z)
	-- Here the contIsInteresting makes the '*' keener to inline,
	-- which in turn exposes a constructor which makes the '+' inline.
	-- Assuming that +,* aren't small enough to inline regardless.
	--
	-- It's also very important to inline in a strict context for things
	-- like
	--		foldr k z (f x)
	-- Here, the context of (f x) is strict, and if f's unfolding is
	-- a build it's *great* to inline it here.  So we must ensure that
	-- the context for (f x) is not totally uninteresting.


-------------------
mkArgInfo :: Id
	  -> [CoreRule]	-- Rules for function
	  -> Int	-- Number of value args
	  -> SimplCont	-- Context of the call
	  -> ArgInfo

mkArgInfo fun rules n_val_args call_cont
  | n_val_args < idArity fun		-- Note [Unsaturated functions]
  = ArgInfo { ai_fun = fun, ai_args = [], ai_rules = rules
            , ai_encl = False
	    , ai_strs = vanilla_stricts 
	    , ai_discs = vanilla_discounts }
  | otherwise
  = ArgInfo { ai_fun = fun, ai_args = [], ai_rules = rules
            , ai_encl = interestingArgContext rules call_cont
	    , ai_strs  = add_type_str (idType fun) arg_stricts
	    , ai_discs = arg_discounts }
  where
    vanilla_discounts, arg_discounts :: [Int]
    vanilla_discounts = repeat 0
    arg_discounts = case idUnfolding fun of
			CoreUnfolding {uf_guidance = UnfIfGoodArgs {ug_args = discounts}}
			      -> discounts ++ vanilla_discounts
			_     -> vanilla_discounts

    vanilla_stricts, arg_stricts :: [Bool]
    vanilla_stricts  = repeat False

    arg_stricts
      = case splitStrictSig (idStrictness fun) of
	  (demands, result_info)
		| not (demands `lengthExceeds` n_val_args)
		-> 	-- Enough args, use the strictness given.
			-- For bottoming functions we used to pretend that the arg
			-- is lazy, so that we don't treat the arg as an
			-- interesting context.  This avoids substituting
			-- top-level bindings for (say) strings into 
			-- calls to error.  But now we are more careful about
			-- inlining lone variables, so it's ok (see SimplUtils.analyseCont)
		   if isBotRes result_info then
			map isStrictDmd demands		-- Finite => result is bottom
		   else
			map isStrictDmd demands ++ vanilla_stricts
	       | otherwise
	       -> WARN( True, text "More demands than arity" <+> ppr fun <+> ppr (idArity fun) 
				<+> ppr n_val_args <+> ppr demands ) 
		   vanilla_stricts	-- Not enough args, or no strictness

    add_type_str :: Type -> [Bool] -> [Bool]
    -- If the function arg types are strict, record that in the 'strictness bits'
    -- No need to instantiate because unboxed types (which dominate the strict
    -- types) can't instantiate type variables.
    -- add_type_str is done repeatedly (for each call); might be better 
    -- once-for-all in the function
    -- But beware primops/datacons with no strictness
    add_type_str _ [] = []
    add_type_str fun_ty strs		-- Look through foralls
	| Just (_, fun_ty') <- splitForAllTy_maybe fun_ty	-- Includes coercions
	= add_type_str fun_ty' strs
    add_type_str fun_ty (str:strs)	-- Add strict-type info
	| Just (arg_ty, fun_ty') <- splitFunTy_maybe fun_ty
	= (str || isStrictType arg_ty) : add_type_str fun_ty' strs
    add_type_str _ strs
	= strs

{- Note [Unsaturated functions]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider (test eyeball/inline4)
	x = a:as
	y = f x
where f has arity 2.  Then we do not want to inline 'x', because
it'll just be floated out again.  Even if f has lots of discounts
on its first argument -- it must be saturated for these to kick in
-}

interestingArgContext :: [CoreRule] -> SimplCont -> Bool
-- If the argument has form (f x y), where x,y are boring,
-- and f is marked INLINE, then we don't want to inline f.
-- But if the context of the argument is
--	g (f x y) 
-- where g has rules, then we *do* want to inline f, in case it
-- exposes a rule that might fire.  Similarly, if the context is
--	h (g (f x x))
-- where h has rules, then we do want to inline f; hence the
-- call_cont argument to interestingArgContext
--
-- The ai-rules flag makes this happen; if it's
-- set, the inliner becomes just keen enough to inline f
-- regardless of how boring f's arguments are, if it's marked INLINE
--
-- The alternative would be to *always* inline an INLINE function,
-- regardless of how boring its context is; but that seems overkill
-- For example, it'd mean that wrapper functions were always inlined
interestingArgContext rules call_cont
  = notNull rules || enclosing_fn_has_rules
  where
    enclosing_fn_has_rules = go call_cont

    go (Select {})	   = False
    go (ApplyTo {})	   = False
    go (StrictArg _ cci _) = interesting cci
    go (StrictBind {})	   = False	-- ??
    go (CoerceIt _ c)	   = go c
    go (Stop cci)          = interesting cci
    go (TickIt _ c)        = go c

    interesting (ArgCtxt rules) = rules
    interesting _               = False
\end{code}


%************************************************************************
%*									*
                  SimplifierMode
%*									*
%************************************************************************

The SimplifierMode controls several switches; see its definition in
CoreMonad
        sm_rules      :: Bool     -- Whether RULES are enabled
        sm_inline     :: Bool     -- Whether inlining is enabled
        sm_case_case  :: Bool     -- Whether case-of-case is enabled
        sm_eta_expand :: Bool     -- Whether eta-expansion is enabled

\begin{code}
simplEnvForGHCi :: DynFlags -> SimplEnv
simplEnvForGHCi dflags
  = mkSimplEnv $ SimplMode { sm_names = ["GHCi"]
                           , sm_phase = InitialPhase
                           , sm_rules = rules_on
                           , sm_inline = False
                           , sm_eta_expand = eta_expand_on
                           , sm_case_case = True }
  where
    rules_on      = dopt Opt_EnableRewriteRules   dflags
    eta_expand_on = dopt Opt_DoLambdaEtaExpansion dflags
   -- Do not do any inlining, in case we expose some unboxed
   -- tuple stuff that confuses the bytecode interpreter

updModeForInlineRules :: Activation -> SimplifierMode -> SimplifierMode
-- See Note [Simplifying inside InlineRules]
updModeForInlineRules inline_rule_act current_mode
  = current_mode { sm_phase = phaseFromActivation inline_rule_act
                 , sm_inline = True
                 , sm_eta_expand = False }
		 -- For sm_rules, just inherit; sm_rules might be "off"
		 -- because of -fno-enable-rewrite-rules
  where
    phaseFromActivation (ActiveAfter n) = Phase n
    phaseFromActivation _               = InitialPhase
\end{code}

Note [Inlining in gentle mode]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Something is inlined if 
   (i)   the sm_inline flag is on, AND
   (ii)  the thing has an INLINE pragma, AND
   (iii) the thing is inlinable in the earliest phase.  

Example of why (iii) is important:
  {-# INLINE [~1] g #-}
  g = ...
  
  {-# INLINE f #-}
  f x = g (g x)

If we were to inline g into f's inlining, then an importing module would
never be able to do
	f e --> g (g e) ---> RULE fires
because the InlineRule for f has had g inlined into it.

On the other hand, it is bad not to do ANY inlining into an
InlineRule, because then recursive knots in instance declarations
don't get unravelled.

However, *sometimes* SimplGently must do no call-site inlining at all
(hence sm_inline = False).  Before full laziness we must be careful
not to inline wrappers, because doing so inhibits floating
    e.g. ...(case f x of ...)...
    ==> ...(case (case x of I# x# -> fw x#) of ...)...
    ==> ...(case x of I# x# -> case fw x# of ...)...
and now the redex (f x) isn't floatable any more.

The no-inlining thing is also important for Template Haskell.  You might be 
compiling in one-shot mode with -O2; but when TH compiles a splice before
running it, we don't want to use -O2.  Indeed, we don't want to inline
anything, because the byte-code interpreter might get confused about 
unboxed tuples and suchlike.

Note [Simplifying inside InlineRules]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We must take care with simplification inside InlineRules (which come from
INLINE pragmas).  

First, consider the following example
     	let f = \pq -> BIG
     	in
     	let g = \y -> f y y
	    {-# INLINE g #-}
     	in ...g...g...g...g...g...
Now, if that's the ONLY occurrence of f, it might be inlined inside g,
and thence copied multiple times when g is inlined. HENCE we treat
any occurrence in an InlineRule as a multiple occurrence, not a single
one; see OccurAnal.addRuleUsage.

Second, we *do* want to do some modest rules/inlining stuff in InlineRules,
partly to eliminate senseless crap, and partly to break the recursive knots
generated by instance declarations.

However, suppose we have
	{-# INLINE <act> f #-}
	f = <rhs>
meaning "inline f in phases p where activation <act>(p) holds". 
Then what inlinings/rules can we apply to the copy of <rhs> captured in
f's InlineRule?  Our model is that literally <rhs> is substituted for
f when it is inlined.  So our conservative plan (implemented by 
updModeForInlineRules) is this:

  -------------------------------------------------------------
  When simplifying the RHS of an InlineRule, set the phase to the
  phase in which the InlineRule first becomes active
  -------------------------------------------------------------

That ensures that

  a) Rules/inlinings that *cease* being active before p will 
     not apply to the InlineRule rhs, consistent with it being
     inlined in its *original* form in phase p.

  b) Rules/inlinings that only become active *after* p will
     not apply to the InlineRule rhs, again to be consistent with
     inlining the *original* rhs in phase p.

For example, 
       	{-# INLINE f #-}
      	f x = ...g...

      	{-# NOINLINE [1] g #-}
      	g y = ...

      	{-# RULE h g = ... #-}
Here we must not inline g into f's RHS, even when we get to phase 0,
because when f is later inlined into some other module we want the
rule for h to fire.

Similarly, consider
 	{-# INLINE f #-}
	f x = ...g...

	g y = ...
and suppose that there are auto-generated specialisations and a strictness
wrapper for g.  The specialisations get activation AlwaysActive, and the
strictness wrapper gets activation (ActiveAfter 0).  So the strictness
wrapper fails the test and won't be inlined into f's InlineRule. That
means f can inline, expose the specialised call to g, so the specialisation
rules can fire.

A note about wrappers
~~~~~~~~~~~~~~~~~~~~~
It's also important not to inline a worker back into a wrapper.
A wrapper looks like
	wrapper = inline_me (\x -> ...worker... )
Normally, the inline_me prevents the worker getting inlined into
the wrapper (initially, the worker's only call site!).  But,
if the wrapper is sure to be called, the strictness analyser will
mark it 'demanded', so when the RHS is simplified, it'll get an ArgOf
continuation. 

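To give the flavour (an illustrative worker/wrapper pair, not code from
this module):

	f x   = case x of I# x# -> fw x#	-- wrapper: small, inlined at call sites
	fw x# = <big strict body>		-- worker

Inlining fw back into f would undo the whole point of the split.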
\begin{code}
activeUnfolding :: SimplEnv -> Id -> Bool
activeUnfolding env
  | not (sm_inline mode) = active_unfolding_minimal
  | otherwise            = case sm_phase mode of
                             InitialPhase -> active_unfolding_gentle
                             Phase n      -> active_unfolding n
  where
    mode = getMode env

getUnfoldingInRuleMatch :: SimplEnv -> IdUnfoldingFun
-- When matching in RULE, we want to "look through" an unfolding
-- (to see a constructor) if *rules* are on, even if *inlinings* 
-- are not.  A notable example is DFuns, which really we want to 
-- match in rules like (op dfun) in gentle mode. Another example
-- is 'otherwise' which we want exprIsConApp_maybe to be able to
-- see very early on
getUnfoldingInRuleMatch env id
  | unf_is_active = idUnfolding id
  | otherwise     = NoUnfolding
  where
    mode = getMode env
    unf_is_active
     | not (sm_rules mode) = active_unfolding_minimal id
     | otherwise           = isActive (sm_phase mode) (idInlineActivation id)

active_unfolding_minimal :: Id -> Bool
-- Compulsory unfoldings only
-- Ignore SimplGently, because we want to inline regardless;
-- the Id has no top-level binding at all
--
-- NB: we used to have a second exception, for data con wrappers.
-- On the grounds that we use gentle mode for rule LHSs, and 
-- they match better when data con wrappers are inlined.
-- But that only really applies to the trivial wrappers (like (:)),
-- and they are now constructed as Compulsory unfoldings (in MkId)
-- so they'll happen anyway.
active_unfolding_minimal id = isCompulsoryUnfolding (realIdUnfolding id)

active_unfolding :: PhaseNum -> Id -> Bool
active_unfolding n id = isActiveIn n (idInlineActivation id)

active_unfolding_gentle :: Id -> Bool
-- Anything that is early-active
-- See Note [Gentle mode]
active_unfolding_gentle id
  =  isInlinePragma prag
  && isEarlyActive (inlinePragmaActivation prag)
       -- NB: wrappers are not early-active
  where
    prag = idInlinePragma id

----------------------
activeRule :: SimplEnv -> Activation -> Bool
-- Nothing => No rules at all
activeRule env
  | not (sm_rules mode) = \_ -> False     -- Rewriting is off
  | otherwise           = isActive (sm_phase mode)
  where
    mode = getMode env
\end{code}



%************************************************************************
%*									*
                  preInlineUnconditionally
%*									*
%************************************************************************

preInlineUnconditionally
~~~~~~~~~~~~~~~~~~~~~~~~
@preInlineUnconditionally@ examines a bndr to see if it is used just
once in a completely safe way, so that it is safe to discard the
binding and inline its RHS at the (unique) usage site, REGARDLESS of how
big the RHS might be.  If this is the case we don't simplify the RHS
first, but just inline it un-simplified.

This is much better than first simplifying a perhaps-huge RHS and then
inlining and re-simplifying it.  Indeed, it can be at least quadratically
better.  Consider

	x1 = e1
	x2 = e2[x1]
	x3 = e3[x2]
	...etc...
	xN = eN[xN-1]

We may end up simplifying e1 N times, e2 N-1 times, e3 N-2 times, etc.
This can happen with cascades of functions too:

	f1 = \x1.e1
	f2 = \xs.e2[f1]
	f3 = \xs.e3[f2]
	...etc...

THE MAIN INVARIANT is this:

	----  preInlineUnconditionally invariant -----
   IF preInlineUnconditionally chooses to inline x = <rhs>
   THEN doing the inlining should not change the occurrence
	info for the free vars of <rhs>
	----------------------------------------------

For example, it's tempting to look at a trivial binding like
	x = y
and inline it unconditionally.  But suppose x is used many times,
but this is the unique occurrence of y.  Then inlining x would change
y's occurrence info, which breaks the invariant.  It matters: y
might have a BIG rhs, which will now be dup'd at every occurrence of x.


Even RHSs labelled InlineMe aren't caught here, because there might be
no benefit from inlining at the call site.

[Sept 01] Don't unconditionally inline a top-level thing, because that
can simply make a static thing into something built dynamically.  E.g.
	x = (a,b)
	main = \s -> h x

[Remember that we treat \s as a one-shot lambda.]  No point in
inlining x unless there is something interesting about the call site.

But watch out: if you aren't careful, some useful foldr/build fusion
can be lost (most notably in spectral/hartel/parstof) because the
foldr didn't see the build.  Doing the dynamic allocation isn't a big
deal, in fact, but losing the fusion can be.  But the right thing here
seems to be to do a callSiteInline based on the fact that there is
something interesting about the call site (it's strict).  Hmm.  That
seems a bit fragile.

Conclusion: inline top level things gaily until Phase 0 (the last
phase), at which point don't.

Note [pre/postInlineUnconditionally in gentle mode]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Even in gentle mode we want to do preInlineUnconditionally.  The
reason is that too little clean-up happens if you don't inline
use-once things.  Also a bit of inlining is *good* for full laziness;
it can expose constant sub-expressions.  Example in
spectral/mandel/Mandel.hs, where the mandelset function gets a useful
let-float if you inline windowToViewport

However, as usual for Gentle mode, do not inline things that are
inactive in the initial stages.  See Note [Gentle mode].

Note [InlineRule and preInlineUnconditionally]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Surprisingly, do not pre-inline-unconditionally Ids with INLINE pragmas!
Example

   {-# INLINE f #-}
   f :: Eq a => a -> a
   f x = ...
   
   fInt :: Int -> Int
   fInt = f Int dEqInt

   ...fInt...fInt...fInt...

Here f occurs just once, in the RHS of fInt. But if we inline it there
we'll lose the opportunity to inline at each of fInt's call sites.
The INLINE pragma will only inline when the application is saturated
for exactly this reason; and we don't want PreInlineUnconditionally
to second-guess it.  A live example is Trac #3736.
    c.f. Note [InlineRule and postInlineUnconditionally]

Note [Top-level bottoming Ids]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Don't inline top-level Ids that are bottoming, even if they are used just
once, because FloatOut has gone to some trouble to extract them out.
Inlining them won't make the program run faster!

Note [Do not inline CoVars unconditionally]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Coercion variables appear inside coercions, and the RHS of a let-binding
is a term (not a coercion) so we can't necessarily inline the latter in
the former.

\begin{code}
preInlineUnconditionally :: SimplEnv -> TopLevelFlag -> InId -> InExpr -> Bool
preInlineUnconditionally env top_lvl bndr rhs
  | not active 		                     = False
  | isStableUnfolding (idUnfolding bndr)     = False    -- Note [InlineRule and preInlineUnconditionally]
  | isTopLevel top_lvl && isBottomingId bndr = False	-- Note [Top-level bottoming Ids]
  | opt_SimplNoPreInlining                   = False
  | isCoVar bndr                             = False -- Note [Do not inline CoVars unconditionally]
  | otherwise = case idOccInfo bndr of
		  IAmDead	     	     -> True	-- Happens in ((\x.1) v)
	  	  OneOcc in_lam True int_cxt -> try_once in_lam int_cxt
		  _                          -> False
  where
    mode = getMode env
    active = isActive (sm_phase mode) act
             -- See Note [pre/postInlineUnconditionally in gentle mode]
    act = idInlineActivation bndr
    try_once in_lam int_cxt	-- There's one textual occurrence
	| not in_lam = isNotTopLevel top_lvl || early_phase
	| otherwise  = int_cxt && canInlineInLam rhs

-- Be very careful before inlining inside a lambda, because (a) we must not 
-- invalidate occurrence information, and (b) we want to avoid pushing a
-- single allocation (here) into multiple allocations (inside lambda).  
-- Inlining a *function* with a single *saturated* call would be ok, mind you.
--	|| (if is_cheap && not (canInlineInLam rhs) then pprTrace "preinline" (ppr bndr <+> ppr rhs) ok else ok)
--	where 
--	 	is_cheap = exprIsCheap rhs
--		ok = is_cheap && int_cxt

	-- 	int_cxt		The context isn't totally boring
	-- E.g. let f = \ab.BIG in \y. map f xs
	-- 	Don't want to substitute for f, because then we allocate
	--	its closure every time the \y is called
        -- But: let f = \ab.BIG in \y. map (f y) xs
	--	Now we do want to substitute for f, even though it's not 
	--	saturated, because we're going to allocate a closure for 
	--	(f y) every time round the loop anyhow.

	-- canInlineInLam => free vars of rhs are (Once in_lam) or Many,
	-- so substituting rhs inside a lambda doesn't change the occ info.
	-- Sadly, not quite the same as exprIsHNF.
    canInlineInLam (Lit _)		= True
    canInlineInLam (Lam b e)		= isRuntimeVar b || canInlineInLam e
    canInlineInLam _                    = False
      -- not ticks.  Counting ticks cannot be duplicated, and non-counting
      -- ticks around a Lam will disappear anyway.

    early_phase = case sm_phase mode of
                    Phase 0 -> False
                    _       -> True
-- If we don't have this early_phase test, consider
--	x = length [1,2,3]
-- The full laziness pass carefully floats all the cons cells to
-- top level, and preInlineUnconditionally floats them all back in.
-- Result is (a) static allocation replaced by dynamic allocation
--	     (b) many simplifier iterations because this tickles
--		 a related problem; only one inlining per pass
-- 
-- On the other hand, I have seen cases where top-level fusion is
-- lost if we don't inline top level things (e.g. string constants)
-- Hence the test for phase zero (which is the phase for all the final
-- simplifications).  Until phase zero we take no special notice of
-- top level things, but then we become more leery about inlining
-- them.  

\end{code}

%************************************************************************
%*									*
                  postInlineUnconditionally
%*									*
%************************************************************************

postInlineUnconditionally
~~~~~~~~~~~~~~~~~~~~~~~~~
@postInlineUnconditionally@ decides whether to unconditionally inline
a thing based on the form of its RHS; in particular if it has a
trivial RHS.  If so, we can inline and discard the binding altogether.

NB: a loop breaker has must_keep_binding = True and non-loop-breakers
only have *forward* references. Hence, it's safe to discard the binding
	
NOTE: This isn't our last opportunity to inline.  We're at the binding
site right now, and we'll get another opportunity when we get to the
occurrence(s).

Note that we do this unconditional inlining only for trivial RHSs.
Don't inline even WHNFs inside lambdas; doing so may simply increase
allocation when the function is called. This isn't the last chance; see
NOTE above.

NB: Even inline pragmas (e.g. IMustBeINLINEd) are ignored here. Why?
Because we don't even want to inline them into the RHS of constructor
arguments. See NOTE above

NB: At one time even NOINLINE was ignored here: if the rhs is trivial
it's best to inline it anyway.  We often get a=E; b=a from desugaring,
with both a and b marked NOINLINE.  But that seems incompatible with
our new view that inlining is like a RULE, so I'm sticking to the 'active'
story for now.

\begin{code}
postInlineUnconditionally 
    :: SimplEnv -> TopLevelFlag
    -> OutId		-- The binder (an InId would be fine too)
       			--            (*not* a CoVar)
    -> OccInfo 		-- From the InId
    -> OutExpr
    -> Unfolding
    -> Bool
postInlineUnconditionally env top_lvl bndr occ_info rhs unfolding
  | not active		        = False
  | isWeakLoopBreaker occ_info  = False	-- If it's a loop-breaker of any kind, don't inline
					-- because it might be referred to "earlier"
  | isExportedId bndr           = False
  | isStableUnfolding unfolding = False	-- Note [InlineRule and postInlineUnconditionally]
  | isTopLevel top_lvl          = False	-- Note [Top level and postInlineUnconditionally]
  | exprIsTrivial rhs 	        = True
  | otherwise
  = case occ_info of
	-- The point of examining occ_info here is that for *non-values* 
	-- that occur outside a lambda, the call-site inliner won't have
	-- a chance (because it doesn't know that the thing
	-- only occurs once).   The pre-inliner won't have gotten
	-- it either, if the thing occurs in more than one branch
	-- So the main target is things like
	--	let x = f y in
	--	case v of
	--	   True  -> case x of ...
	--	   False -> case x of ...
	-- This is very important in practice; e.g. wheel-sieve1 doubles
	-- in allocation if you miss this out
      OneOcc in_lam _one_br int_cxt	-- OneOcc => no code-duplication issue
	->     smallEnoughToInline unfolding	-- Small enough to dup
			-- ToDo: consider discount on smallEnoughToInline if int_cxt is true
			--
		 	-- NB: Do NOT inline arbitrarily big things, even if one_br is True
			-- Reason: doing so risks exponential behaviour.  We simplify a big
			--	   expression, inline it, and simplify it again.  But if the
			--	   very same thing happens in the big expression, we get 
			--	   exponential cost!
			-- PRINCIPLE: when we've already simplified an expression once, 
			-- make sure that we only inline it if it's reasonably small.

           && (not in_lam || 
			-- Outside a lambda, we want to be reasonably aggressive
			-- about inlining into multiple branches of case
			-- e.g. let x = <non-value> 
			--	in case y of { C1 -> ..x..; C2 -> ..x..; C3 -> ... } 
			-- Inlining can be a big win if C3 is the hot-spot, even if
			-- the uses in C1, C2 are not 'interesting'
			-- An example that gets worse if you add int_cxt here is 'clausify'

		(isCheapUnfolding unfolding && int_cxt))
			-- isCheap => acceptable work duplication; in_lam may be true
			-- int_cxt to prevent us inlining inside a lambda without some 
			-- good reason.  See the notes on int_cxt in preInlineUnconditionally

      IAmDead -> True	-- This happens; for example, the case_bndr during case of
			-- known constructor:  case (a,b) of x { (p,q) -> ... }
			-- Here x isn't mentioned in the RHS, so we don't want to
			-- create the (dead) let-binding  let x = (a,b) in ...

      _ -> False

-- Here's an example that we don't handle well:
--	let f = if b then Left (\x.BIG) else Right (\y.BIG)
--	in \y. ....case f of {...} ....
-- Here f is used just once, and duplicating the case work is fine (exprIsCheap).
-- But
--  - We can't preInlineUnconditionally because that would invalidate
--    the occ info for b.
--  - We can't postInlineUnconditionally because the RHS is big, and
--    that risks exponential behaviour
--  - We can't call-site inline, because the rhs is big
-- Alas!

  where
    active = isActive (sm_phase (getMode env)) (idInlineActivation bndr)
        -- See Note [pre/postInlineUnconditionally in gentle mode]
\end{code}

Note [Top level and postInlineUnconditionally]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We don't do postInlineUnconditionally for top-level things (even for
ones that are trivial):

  * Doing so will inline top-level error expressions that have been
    carefully floated out by FloatOut.  More generally, it might 
    replace static allocation with dynamic.

  * Even for trivial expressions there's a problem.  Consider
      {-# RULE "foo" forall (xs::[T]). reverse xs = ruggle xs #-}
      blah xs = reverse xs
      ruggle = sort
    In one simplifier pass we might fire the rule, getting 
      blah xs = ruggle xs
    but in *that* simplifier pass we must not do postInlineUnconditionally
    on 'ruggle' because then we'll have an unbound occurrence of 'ruggle'

    If the rhs is trivial it'll be inlined by callSiteInline, and then
    the binding will be dead and discarded by the next use of OccurAnal

  * There is less point, because the main goal is to get rid of local
    bindings used in multiple case branches.  
    
  * The inliner should inline trivial things at call sites anyway.

Note [InlineRule and postInlineUnconditionally]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Do not do postInlineUnconditionally if the Id has an InlineRule, otherwise
we lose the unfolding.  Example

     -- f has InlineRule with rhs (e |> co)
     --   where 'e' is big
     f = e |> co

Then there's a danger we'll optimise to

     f' = e
     f = f' |> co

and now postInlineUnconditionally, losing the InlineRule on f.  Now f'
won't inline because 'e' is too big.

    c.f. Note [InlineRule and preInlineUnconditionally]


%************************************************************************
%*									*
	Rebuilding a lambda
%*									*
%************************************************************************

\begin{code}
mkLam :: SimplEnv -> [OutBndr] -> OutExpr -> SimplM OutExpr
-- mkLam tries two things
--	a) eta reduction, if that gives a trivial expression
--	b) eta expansion [only if there are some value lambdas]

mkLam _b [] body 
  = return body
mkLam _env bndrs body
  = do	{ dflags <- getDynFlags
	; mkLam' dflags bndrs body }
  where
    mkLam' :: DynFlags -> [OutBndr] -> OutExpr -> SimplM OutExpr
    mkLam' dflags bndrs (Cast body co)
      | not (any bad bndrs)
	-- Note [Casts and lambdas]
      = do { lam <- mkLam' dflags bndrs body
           ; return (mkCast lam (mkPiCos bndrs co)) }
      where
        co_vars  = tyCoVarsOfCo co
	bad bndr = isCoVar bndr && bndr `elemVarSet` co_vars      

    mkLam' dflags bndrs body@(Lam {})
      = mkLam' dflags (bndrs ++ bndrs1) body1
      where
        (bndrs1, body1) = collectBinders body

    mkLam' dflags bndrs body
      | dopt Opt_DoEtaReduction dflags
      , Just etad_lam <- tryEtaReduce bndrs body
      = do { tick (EtaReduction (head bndrs))
	   ; return etad_lam }

      | otherwise
      = return (mkLams bndrs body)
\end{code}

Note [Casts and lambdas]
~~~~~~~~~~~~~~~~~~~~~~~~
Consider 
	(\x. (\y. e) `cast` g1) `cast` g2
There is a danger here that the two lambdas look separated, and the 
full laziness pass might float an expression to between the two.

So this equation in mkLam' floats the g1 out, thus:
	(\x. e `cast` g1)  -->  (\x.e) `cast` (tx -> g1)
where x:tx.

In general, this floats casts outside lambdas, where (I hope) they
might meet and cancel with some other cast:
	\x. e `cast` co   ===>   (\x. e) `cast` (tx -> co)
	/\a. e `cast` co  ===>   (/\a. e) `cast` (/\a. co)
	/\g. e `cast` co  ===>   (/\g. e) `cast` (/\g. co)
  		  	  (if not (g `in` co))

Notice that it works regardless of 'e'.  Originally it worked only
if 'e' was itself a lambda, but in some cases that resulted in 
fruitless iteration in the simplifier.  A good example was when
compiling Text.ParserCombinators.ReadPrec, where we had a definition 
like	(\x. Get `cast` g)
where Get is a constructor with nonzero arity.  Then mkLam eta-expanded
the Get, and the next iteration eta-reduced it, and then eta-expanded 
it again.

Note also the side condition for the case of coercion binders.
It does not make sense to transform
	/\g. e `cast` g  ==>  (/\g.e) `cast` (/\g.g)
because the latter is not well-kinded.

%************************************************************************
%*									*
              Eta expansion									 
%*									*
%************************************************************************

\begin{code}
tryEtaExpand :: SimplEnv -> OutId -> OutExpr -> SimplM (Arity, OutExpr)
-- See Note [Eta-expanding at let bindings]
tryEtaExpand env bndr rhs
  = do { dflags <- getDynFlags
       ; (new_arity, new_rhs) <- try_expand dflags

       ; WARN( new_arity < old_arity || new_arity < _dmd_arity, 
               (ptext (sLit "Arity decrease:") <+> (ppr bndr <+> ppr old_arity
		<+> ppr new_arity <+> ppr _dmd_arity) $$ ppr new_rhs) )
			-- Note [Arity decrease]
         return (new_arity, new_rhs) }
  where
    try_expand dflags
      | exprIsTrivial rhs
      = return (exprArity rhs, rhs)

      | sm_eta_expand (getMode env)      -- Provided eta-expansion is on
      , let new_arity = findArity dflags bndr rhs old_arity
      , new_arity > manifest_arity      -- And the current manifest arity isn't enough
                                        -- See Note [Eta expansion to manifest arity]
      = do { tick (EtaExpansion bndr)
           ; return (new_arity, etaExpand new_arity rhs) }
      | otherwise
      = return (manifest_arity, rhs)

    manifest_arity = manifestArity rhs
    old_arity  = idArity bndr
    _dmd_arity = length $ fst $ splitStrictSig $ idStrictness bndr

findArity :: DynFlags -> Id -> CoreExpr -> Arity -> Arity
-- This implements the fixpoint loop for arity analysis
-- See Note [Arity analysis]
findArity dflags bndr rhs old_arity
  = go (exprEtaExpandArity dflags init_cheap_app rhs)
       -- We always call exprEtaExpandArity once, but usually 
       -- that produces a result equal to old_arity, and then
       -- we stop right away (since arities should not decrease)
       -- Result: the common case is that there is just one iteration
  where
    init_cheap_app :: CheapAppFun
    init_cheap_app fn n_val_args
      | fn == bndr = True   -- On the first pass, this binder gets infinite arity
      | otherwise  = isCheapApp fn n_val_args

    go :: Arity -> Arity
    go cur_arity
      | cur_arity <= old_arity = cur_arity	
      | new_arity == cur_arity = cur_arity
      | otherwise = ASSERT( new_arity < cur_arity )
#ifdef DEBUG
                    pprTrace "Exciting arity" 
                       (vcat [ ppr bndr <+> ppr cur_arity <+> ppr new_arity
                             , ppr rhs])
#endif
                    go new_arity
      where
        new_arity = exprEtaExpandArity dflags cheap_app rhs

        cheap_app :: CheapAppFun
        cheap_app fn n_val_args
          | fn == bndr = n_val_args < cur_arity
          | otherwise  = isCheapApp fn n_val_args
\end{code}

Note [Eta-expanding at let bindings]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We now eta expand at let-bindings, which is where the payoff 
comes. 

One useful consequence is this example:
   genMap :: C a => ...
   {-# INLINE genMap #-}
   genMap f xs = ...

   myMap :: D a => ...
   {-# INLINE myMap #-}
   myMap = genMap

Notice that 'genMap' should only inline if applied to two arguments.
In the InlineRule for myMap we'll have the unfolding 
    (\d -> genMap Int (..d..))  
We do not want to eta-expand to 
    (\d f xs -> genMap Int (..d..) f xs) 
because then 'genMap' will inline, and it really shouldn't: at least
as far as the programmer is concerned, it's not applied to two
arguments!

Note [Eta expansion to manifest arity]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Eta expansion does *not* eta-expand trivial RHSs, like
    x = y
because these will get substituted out in short order.  (Indeed
we *eta-contract* if that yields a trivial RHS.)

Otherwise we eta-expand to produce enough manifest lambdas.
This *does* eta-expand partial applications.  eg
      x = map g		-->    x = \v -> map g v
      y = \_ -> map g	-->    y = \_ v -> map g v
One benefit of this is that in the definition of y there was
a danger that full laziness would transform to
      lvl = map g
      y = \_ -> lvl
which is stupid.  This doesn't happen in the eta-expanded form.

Note [Arity analysis]
~~~~~~~~~~~~~~~~~~~~~
The motivating example for arity analysis is this:
 
  f = \x. let g = f (x+1) 
          in \y. ...g...

What arity does f have?  Really it should have arity 2, but a naive
look at the RHS won't see that.  You need a fixpoint analysis which
says it has arity "infinity" the first time round.

This example happens a lot; it first showed up in Andy Gill's thesis,
fifteen years ago!  It also shows up in the code for 'rnf' on lists
in Trac #4138.

The analysis is easy to achieve because exprEtaExpandArity takes an
argument
     type CheapFun = CoreExpr -> Maybe Type -> Bool
used to decide if an expression is cheap enough to push inside a 
lambda.  And exprIsCheap' in turn takes an argument
     type CheapAppFun = Id -> Int -> Bool
which tells when an application is cheap. This makes it easy to
write the analysis loop.

The analysis is cheap-and-cheerful because it doesn't deal with
mutual recursion.  But the self-recursive case is the important one.

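As an illustrative trace of the fixpoint for the example above (the
numbers are assumed, not computed here): the first call to
exprEtaExpandArity uses init_cheap_app, which treats every call to f as
cheap, and might report arity 2; the second iteration uses cheap_app,
which only treats calls to f with fewer than 2 value args as cheap, and
reports 2 again, so 2 is the fixpoint.

	-- pass 1: init_cheap_app f _ = True     ==> arity 2
	-- pass 2: cheap_app f n      = n < 2    ==> arity 2  (fixpoint)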

%************************************************************************
%*									*
\subsection{Floating lets out of big lambdas}
%*									*
%************************************************************************

Note [Floating and type abstraction]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider this:
	x = /\a. C e1 e2
We'd like to float this to 
	y1 = /\a. e1
	y2 = /\a. e2
	x  = /\a. C (y1 a) (y2 a)
for the usual reasons: we want to inline x rather vigorously.

You may think that this kind of thing is rare.  But in some programs it is
common.  For example, if you do closure conversion you might get:

	data a :-> b = forall e. (e -> a -> b) :$ e

	f_cc :: forall a. a :-> a
	f_cc = /\a. (\e. id a) :$ ()

Now we really want to inline that f_cc thing so that the
construction of the closure goes away. 

So I have elaborated simplLazyBind to understand right-hand sides that look
like
	/\ a1..an. body

and treat them specially. The real work is done in SimplUtils.abstractFloats,
but there is quite a bit of plumbing in simplLazyBind as well.

The same transformation is good when there are lets in the body:

	/\abc -> let(rec) x = e in b
   ==>
	let(rec) x' = /\abc -> let x = x' a b c in e
	in 
	/\abc -> let x = x' a b c in b

This is good because it can turn things like:

	let f = /\a -> letrec g = ... g ... in g
into
	letrec g' = /\a -> ... g' a ...
	in
	let f = /\ a -> g' a

which is better.  In effect, it means that big lambdas don't impede
let-floating.

This optimisation is CRUCIAL in eliminating the junk introduced by
desugaring mutually recursive definitions.  Don't eliminate it lightly!

[May 1999]  If we do this transformation *regardless* then we can
end up with some pretty silly stuff.  For example, 

	let 
	    st = /\ s -> let { x1=r1 ; x2=r2 } in ...
	in ..
becomes
	let y1 = /\s -> r1
	    y2 = /\s -> r2
	    st = /\s -> ...[y1 s/x1, y2 s/x2]
	in ..

Unless the "..." is a WHNF there is really no point in doing this.
Indeed it can make things worse.  Suppose x1 is used strictly,
and is of the form