%
% (c) The GRASP/AQUA Project, Glasgow University, 1993-1998
%
\section[Specialise]{Stamping out overloading, and (optionally) polymorphism}

\begin{code}
module Specialise ( specProgram ) where

#include "HsVersions.h"

import Id
import TcType hiding( substTy, extendTvSubstList )
import Type   hiding( substTy, extendTvSubstList )
import Coercion( Coercion )
import CoreMonad
import qualified CoreSubst
import CoreUnfold
import VarSet
import VarEnv
import CoreSyn
import Rules
import CoreUtils        ( exprIsTrivial, applyTypeToArgs )
import CoreFVs          ( exprFreeVars, exprsFreeVars, idFreeVars )
import UniqSupply
import Name
import MkId             ( voidArgId, realWorldPrimId )
import Maybes           ( catMaybes, isJust )
import BasicTypes
import HscTypes
import Bag
import DynFlags
import Util
import Outputable
import FastString
import State

import Control.Monad
import Data.Map (Map)
import qualified Data.Map as Map
import qualified FiniteMap as Map
\end{code}

%************************************************************************
%*                                                                      *
\subsection[notes-Specialise]{Implementation notes [SLPJ, Aug 18 1993]}
%*                                                                      *
%************************************************************************

These notes describe how we implement specialisation to eliminate
overloading.

The specialisation pass works on Core
syntax, complete with all the explicit dictionary application,
abstraction and construction as added by the type checker.  The
existing type checker remains largely as it is.

One important thought: the {\em types} passed to an overloaded
function, and the {\em dictionaries} passed are mutually redundant.
If the same function is applied to the same type(s) then it is sure to
be applied to the same dictionary(s)---or rather to the same {\em
values}.  (The arguments might look different but they will evaluate
to the same value.)

Second important thought: we know that we can make progress by
treating dictionary arguments as static and worth specialising on.  So
we can do without binding-time analysis, and instead specialise on
dictionary arguments and no others.

The basic idea
~~~~~~~~~~~~~~
Suppose we have

        let f = <f_rhs>
        in <body>

and suppose f is overloaded.
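
A concrete (invented) example to keep in mind while reading the steps
below:

        let f = /\a. \(d::Num a) (x::a). +.sel a d x x
        in ...(f Int dNumInt e1)...(f Bool dNumBool e2)...

Here <f_rhs> is the big-lambda expression, and <body> contains two calls
of f, one at Int and one at Bool.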

STEP 1: CALL-INSTANCE COLLECTION

We traverse <body>, accumulating all applications of f to types and
dictionaries.

(Might there be partial applications, to just some of its types and
dictionaries?  In principle yes, but in practice the type checker only
builds applications of f to all its types and dictionaries, so partial
applications could only arise as a result of transformation, and even
then I think it's unlikely.  In any case, we simply don't accumulate such
partial applications.)


STEP 2: EQUIVALENCES

So now we have a collection of calls to f:
        f t1 t2 d1 d2
        f t3 t4 d3 d4
        ...
Notice that f may take several type arguments.  To avoid ambiguity, we
say that f is called at type t1/t2 and t3/t4.

We take equivalence classes using equality of the *types* (ignoring
the dictionary args, which as mentioned previously are redundant).
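
For instance (an invented call set), the calls

        f Int  Bool d1 d2
        f Int  Bool d3 d4
        f Char Bool d5 d6

form two classes: one at Int/Bool (the first two calls) and one at
Char/Bool; the dictionaries d1..d6 play no part in the grouping.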

STEP 3: SPECIALISATION

For each equivalence class, choose a representative (f t1 t2 d1 d2),
and create a local instance of f, defined thus:

        f@t1/t2 = <f_rhs> t1 t2 d1 d2

f_rhs presumably has some big lambdas and dictionary lambdas, so lots
of simplification will now result.  However we don't actually *do* that
simplification.  Rather, we leave it for the simplifier to do.  If we
*did* do it, though, we'd get more call instances from the specialised
RHS.  We can work out what they are by instantiating the call-instance
set from f's RHS with the types t1, t2.

Add this new id to f's IdInfo, to record that f has a specialised version.

Before doing any of this, check that f's IdInfo doesn't already
tell us about an existing instance of f at the required type/s.
(This might happen if specialisation was applied more than once, or
it might arise from user SPECIALIZE pragmas.)

Recursion
~~~~~~~~~
Wait a minute!  What if f is recursive?  Then we can't just plug in
its right-hand side, can we?

But it's ok.  The type checker *always* creates non-recursive definitions
for overloaded recursive functions.  For example:

        f x = f (x+x)           -- Yes I know it's silly

becomes

        f a (d::Num a) = let p = +.sel a d
                         in
                         letrec fl (y::a) = fl (p y y)
                         in
                         fl

We still have recursion for non-overloaded functions which we
specialise, but the recursive call should get specialised to the
same recursive version.


Polymorphism 1
~~~~~~~~~~~~~~

All this is crystal clear when the function is applied to *constant
types*; that is, types which have no type variables inside.  But what if
it is applied to non-constant types?  Suppose we find a call of f at type
t1/t2.  There are two possibilities:

(a) The free type variables of t1, t2 are in scope at the definition point
of f.  In this case there's no problem, we proceed just as before.  A common
example is as follows.  Here's the Haskell:

        g y = let f x = x+x
              in f y + f y

After typechecking we have

        g a (d::Num a) (y::a) = let f b (d'::Num b) (x::b) = +.sel b d' x x
                                in +.sel a d (f a d y) (f a d y)

Notice that the call to f is at type "a"; a non-constant type.
Both calls to f are at the same type, so we can specialise to give:

        g a (d::Num a) (y::a) = let f@a (x::a) = +.sel a d x x
                                in +.sel a d (f@a y) (f@a y)


(b) The other case is when the type variables in the instance types
are *not* in scope at the definition point of f.  The example we are
working with above is a good case.  There are two instances of (+.sel a d),
but "a" is not in scope at the definition of +.sel.  Can we do anything?
Yes, we can "common them up", a sort of limited common sub-expression deal.
This would give:

        g a (d::Num a) (y::a) = let +.sel@a = +.sel a d
                                    f@a (x::a) = +.sel@a x x
                                in +.sel@a (f@a y) (f@a y)

This can save work, and can't be spotted by the type checker, because
the two instances of +.sel weren't originally at the same type.

Further notes on (b)

* There are quite a few variations here.  For example, the defn of
  +.sel could be floated outside the \y, to attempt to gain laziness.
  It certainly mustn't be floated outside the \d because the d has to
  be in scope too.

* We don't want to inline f_rhs in this case, because
that will duplicate code.  Just commoning up the call is the point.

* Nothing gets added to +.sel's IdInfo.

* Don't bother unless the equivalence class has more than one item!

Not clear whether this is all worth it.  It is of course OK to
simply discard call-instances when passing a big lambda.

Polymorphism 2 -- Overloading
~~~~~~~~~~~~~~
Consider a function whose most general type is

        f :: forall a b. Ord a => [a] -> b -> b

There is really no point in making a version of f at Int/Int and another
at Int/Bool, because only instantiating the type variable "a" buys us
any efficiency. Since f is completely polymorphic in b there ain't
much point in making separate versions of f for the different
b types.

That suggests that we should identify which of f's type variables
are constrained (like "a") and which are unconstrained (like "b").
Then when taking equivalence classes in STEP 2, we ignore the type args
corresponding to unconstrained type variables.  In STEP 3 we make
polymorphic versions.  Thus:

        f@t1/ = /\b -> <f_rhs> t1 b d1 d2

We do this.
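
For example (hypothetical calls), with f as above both

        f Int Int  d xs y
        f Int Bool d xs z

land in the same class, keyed only on the constrained type argument Int,
and the single polymorphic specialisation

        f@Int = /\b -> <f_rhs> Int b d

serves both call sites.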


Dictionary floating
~~~~~~~~~~~~~~~~~~~
Consider this

        f a (d::Num a) = let g = ...
                         in
                         ...(let d1::Ord a = Num.Ord.sel a d in g a d1)...

Here, g is only called at one type, but the dictionary isn't in scope at the
definition point for g.  Usually the type checker would build a
definition for d1 which enclosed g, but the transformation system
might have moved d1's defn inward.  Solution: float dictionary bindings
outwards along with call instances.

Consider

        f x = let g p q = p==q
                  h r s = (r+s, g r s)
              in
              h x x


Before specialisation, leaving out type abstractions we have

        f df x = let g :: Eq a => a -> a -> Bool
                     g dg p q = == dg p q
                     h :: Num a => a -> a -> (a, Bool)
                     h dh r s = let deq = eqFromNum dh
                                in (+ dh r s, g deq r s)
              in
              h df x x

After specialising h we get a specialised version of h, like this:

                    h' r s = let deq = eqFromNum df
                             in (+ df r s, g deq r s)

But we can't naively make an instance for g from this, because deq is not in scope
at the defn of g.  Instead, we have to float out the (new) defn of deq
to widen its scope.  Notice that this floating can't be done in advance -- it only
shows up when specialisation is done.

User SPECIALIZE pragmas
~~~~~~~~~~~~~~~~~~~~~~~
Specialisation pragmas can be digested by the type checker, and implemented
by adding extra definitions along with that of f, in the same way as before

        f@t1/t2 = <f_rhs> t1 t2 d1 d2

Indeed the pragmas *have* to be dealt with by the type checker, because
only it knows how to build the dictionaries d1 and d2!  For example

        g :: Ord a => [a] -> [a]
        {-# SPECIALIZE g :: [Tree Int] -> [Tree Int] #-}

Here, the specialised version of g is an application of g's rhs to the
Ord dictionary for (Tree Int), which only the type checker can conjure
up.  There might not even *be* one, if (Tree Int) is not an instance of
Ord!  (All the other specialisation has suitable dictionaries to hand
from actual calls.)

Problem.  The type checker doesn't have to hand a convenient <f_rhs>, because
it is buried in a complex (as-yet-un-desugared) binding group.
Maybe we should say

        f@t1/t2 = f* t1 t2 d1 d2

where f* is the Id f with an IdInfo which says "inline me regardless!".
Indeed all the specialisation could be done in this way.
That in turn means that the simplifier has to be prepared to inline absolutely
any in-scope let-bound thing.


Again, the pragma should permit polymorphism in unconstrained variables:

        h :: Ord a => [a] -> b -> b
        {-# SPECIALIZE h :: [Int] -> b -> b #-}

We *insist* that all overloaded type variables are specialised to ground types
(and hence there can be no context inside a SPECIALIZE pragma).
We *permit* unconstrained type variables to be specialised to
        - a ground type
        - or left as a polymorphic type variable
but nothing in between.  So

        {-# SPECIALIZE h :: [Int] -> [c] -> [c] #-}

is *illegal*.  (It can be handled, but it adds complication, and gains the
programmer nothing.)


SPECIALISING INSTANCE DECLARATIONS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider

        instance Foo a => Foo [a] where
                ...
        {-# SPECIALIZE instance Foo [Int] #-}

The original instance decl creates a dictionary-function
definition:

        dfun.Foo.List :: forall a. Foo a -> Foo [a]

The SPECIALIZE pragma just makes a specialised copy, just as for
ordinary function definitions:

        dfun.Foo.List@Int :: Foo [Int]
        dfun.Foo.List@Int = dfun.Foo.List Int dFooInt

The information about what instances of the dfun exist gets added to
the dfun's IdInfo in the same way as for a user-defined function.


Automatic instance decl specialisation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Can instance decls be specialised automatically?  It's tricky.
We could collect call-instance information for each dfun, but
then when we specialised their bodies we'd get new call-instances
for ordinary functions; and when we specialised those in turn, we might get
new call-instances of the dfuns, and so on.  This all arises because of
the unrestricted mutual recursion between instance decls and value decls.

Still, there's no actual problem; it just means that we may not do all
the specialisation we could theoretically do.

Furthermore, instance decls are usually exported and used non-locally,
so we'll want to compile enough to get those specialisations done.

Lastly, there's no such thing as a local instance decl, so we can
survive solely by spitting out *usage* information, and then reading that
back in as a pragma when next compiling the file.  So for now,
we only specialise instance decls in response to pragmas.


SPITTING OUT USAGE INFORMATION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To spit out usage information we need to traverse the code collecting
call-instance information for all imported (non-prelude?) functions
and data types. Then we equivalence-class it and spit it out.

This is done at the top level, where all the call instances which escape
must be for imported functions and data types.

*** Not currently done ***


Partial specialisation by pragmas
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What about partial specialisation:

        k :: (Ord a, Eq b) => [a] -> b -> b -> [a]
        {-# SPECIALIZE k :: Eq b => [Int] -> b -> b -> [a] #-}

or even

        {-# SPECIALIZE k :: Eq b => [Int] -> [b] -> [b] -> [a] #-}

Seems quite reasonable.  Similar things could be done with instance decls:

        instance (Foo a, Foo b) => Foo (a,b) where
                ...
        {-# SPECIALIZE instance Foo a => Foo (a,Int) #-}
        {-# SPECIALIZE instance Foo b => Foo (Int,b) #-}

Ho hum.  Things are complex enough without this.  I pass.


Requirements for the simplifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The simplifier has to be able to take advantage of the specialisation.

* When the simplifier finds an application of a polymorphic f, it looks in
f's IdInfo in case there is a suitable instance to call instead.  This converts

        f t1 t2 d1 d2   ===>   f_t1_t2

Note that the dictionaries get eaten up too!

* Dictionary selection operations on constant dictionaries must be
  short-circuited:

        +.sel Int d     ===>  +Int

The obvious way to do this is in the same way as other specialised
calls: +.sel has inside it some IdInfo which tells that if it's applied
to the type Int then it should eat a dictionary and transform to +Int.

In short, dictionary selectors need IdInfo inside them for constant
methods.

* Exactly the same applies if a superclass dictionary is being
  extracted:

        Eq.sel Int d   ===>   dEqInt

* Something similar applies to dictionary construction too.  Suppose
dfun.Eq.List is the function taking a dictionary for (Eq a) to
one for (Eq [a]).  Then we want

        dfun.Eq.List Int d      ===> dEq.List_Int

Where does the Eq [Int] dictionary come from?  It is built in
response to a SPECIALIZE pragma on the Eq [a] instance decl.

In short, dfun Ids need IdInfo with a specialisation for each
constant instance of their instance declaration.

All this uses a single mechanism: the SpecEnv inside an Id


What does the specialisation IdInfo look like?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The SpecEnv of an Id maps a list of types (the template) to an expression

        [Type]  |->  Expr

For example, if f has this SpecInfo:

        [Int, a]  ->  \d:Ord Int. f' a

it means that we can replace the call

        f Int t  ===>  (\d. f' t)

This chucks one dictionary away and proceeds with the
specialised version of f, namely f'.


What can't be done this way?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There is no way, post-typechecker, to get a dictionary for (say)
Eq a from a dictionary for Eq [a].  So if we find

        ==.sel [t] d

we can't transform to

        eqList (==.sel t d')

where
        eqList :: (a->a->Bool) -> [a] -> [a] -> Bool

Of course, we currently have no way to automatically derive
eqList, nor to connect it to the Eq [a] instance decl, but you
can imagine that it might somehow be possible.  Taking advantage
of this is permanently ruled out.

Still, this is no great hardship, because we intend to eliminate
overloading altogether anyway!

A note about non-tyvar dictionaries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some Ids have types like

        forall a,b,c. Eq a -> Ord [a] -> tau

This seems curious at first, because we usually only have dictionary
args whose types are of the form (C a) where a is a type variable.
But this doesn't hold for the functions arising from instance decls,
which sometimes get arguments with types of form (C (T a)) for some
type constructor T.

Should we specialise wrt this compound-type dictionary?  We used to say
"no", saying:
        "This is a heuristic judgement, as indeed is the fact that we
        specialise wrt only dictionaries.  We choose *not* to specialise
        wrt compound dictionaries because at the moment the only place
        they show up is in instance decls, where they are simply plugged
        into a returned dictionary.  So nothing is gained by specialising
        wrt them."

But it is simpler and more uniform to specialise wrt these dicts too;
and in future GHC is likely to support full-fledged type signatures
like
        f :: Eq [(a,b)] => ...


%************************************************************************
%*                                                                      *
\subsubsection{The new specialiser}
%*                                                                      *
%************************************************************************

Our basic game plan is this.  For let(rec) bound function
        f :: (C a, D c) => (a,b,c,d) -> Bool

* Find any specialised calls of f, (f ts ds), where
  ts are the type arguments t1 .. t4, and
  ds are the dictionary arguments d1 .. d2.

* Add a new definition for f1 (say):

        f1 = /\ b d -> (..body of f..) t1 b t3 d d1 d2

  Note that we abstract over the unconstrained type arguments.

* Add the mapping

        [t1,b,t3,d]  |->  \d1 d2 -> f1 b d

  to the specialisations of f.  This will be used by the
  simplifier to replace calls
                (f t1 t2 t3 t4) da db
  by
                (\d1 d2 -> f1 t2 t4) da db

  All the stuff about how many dictionaries to discard, and what types
  to apply the specialised function to, are handled by the fact that the
  SpecEnv contains a template for the result of the specialisation.
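
  In the code below the mapping is in fact carried as a RULE attached to
  f (see specDefn, which returns "Original Ids with RULES added").  As a
  rough sketch in RULE notation (the names here are invented purely for
  illustration), the specialisation above corresponds to something like

        RULE "SPEC f"  forall b d d1 d2.
            f t1 b t3 d d1 d2  =  f1 b d

  so the simplifier's ordinary rule matching does the rewriting, and the
  dictionary arguments are dropped in the process.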

We don't build *partial* specialisations for f.  For example:

  f :: Eq a => a -> a -> Bool
  {-# SPECIALISE f :: (Eq b, Eq c) => (b,c) -> (b,c) -> Bool #-}

Here, little is gained by making a specialised copy of f.
There's a distinct danger that the specialised version would
first build a dictionary for (Eq b, Eq c), and then select the (==)
method from it!  Even if it didn't, not a great deal is saved.

We do, however, generate polymorphic, but not overloaded, specialisations:

  f :: Eq a => [a] -> b -> b -> b
  {-# SPECIALISE f :: [Int] -> b -> b -> b #-}

Hence, the invariant is this:

        *** no specialised version is overloaded ***


%************************************************************************
%*                                                                      *
\subsubsection{The exported function}
%*                                                                      *
%************************************************************************

\begin{code}
specProgram :: DynFlags -> ModGuts -> CoreM ModGuts
specProgram dflags guts@(ModGuts { mg_rules = rules, mg_binds = binds })
  = do { hpt_rules <- getRuleBase
       ; let local_rules = mg_rules guts
             rule_base = extendRuleBaseList hpt_rules rules

             -- Specialise the bindings of this module
       ; (binds', uds) <- runSpecM dflags (go binds)

             -- Specialise imported functions
       ; (new_rules, spec_binds) <- specImports dflags emptyVarSet rule_base uds

       ; let final_binds | null spec_binds = binds'
                         | otherwise       = Rec (flattenBinds spec_binds) : binds'
                   -- Note [Glom the bindings if imported functions are specialised]

       ; return (guts { mg_binds = final_binds
                      , mg_rules = new_rules ++ local_rules }) }
  where
        -- We need to start with a Subst that knows all the things
        -- that are in scope, so that the substitution engine doesn't
        -- accidentally re-use a unique that's already in use
        -- Easiest thing is to do it all at once, as if all the top-level
        -- decls were mutually recursive
    top_subst = SE { se_subst = CoreSubst.mkEmptySubst $ mkInScopeSet $ mkVarSet $
                                bindersOfBinds binds
                   , se_interesting = emptyVarSet }

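    -- Work through the top-level bindings back to front, so that the
    -- usage details (call instances) gathered from a binding's scope,
    -- i.e. the bindings that follow it, are to hand when the binding
    -- itself is passed to specBind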
    go []           = return ([], emptyUDs)
    go (bind:binds) = do (binds', uds) <- go binds
                         (bind', uds') <- specBind top_subst bind uds
                         return (bind' ++ binds', uds')

specImports :: DynFlags
            -> VarSet           -- Don't specialise these ones
                                -- See Note [Avoiding recursive specialisation]
            -> RuleBase         -- Rules from this module and the home package
                                -- (but not external packages, which can change)
            -> UsageDetails     -- Calls for imported things, and floating bindings
            -> CoreM ( [CoreRule]   -- New rules
                     , [CoreBind] ) -- Specialised bindings and floating bindings
-- See Note [Specialise imported INLINABLE things]
specImports dflags done rb uds
  = do { let import_calls = varEnvElts (ud_calls uds)
       ; (rules, spec_binds) <- go rb import_calls
       ; return (rules, wrapDictBinds (ud_binds uds) spec_binds) }
  where
    go _ [] = return ([], [])
    go rb (CIS fn calls_for_fn : other_calls)
      = do { (rules1, spec_binds1) <- specImport dflags done rb fn (Map.toList calls_for_fn)
           ; (rules2, spec_binds2) <- go (extendRuleBaseList rb rules1) other_calls
           ; return (rules1 ++ rules2, spec_binds1 ++ spec_binds2) }

specImport :: DynFlags
           -> VarSet                -- Don't specialise these
                                    -- See Note [Avoiding recursive specialisation]
           -> RuleBase              -- Rules from this module
           -> Id -> [CallInfo]      -- Imported function and calls for it
           -> CoreM ( [CoreRule]    -- New rules
                    , [CoreBind] )  -- Specialised bindings
specImport dflags done rb fn calls_for_fn
  | fn `elemVarSet` done
  = return ([], [])     -- No warning.  This actually happens all the time
                        -- when specialising a recursive function, because
                        -- the RHS of the specialised function contains a recursive
                        -- call to the original function

  | isInlinablePragma (idInlinePragma fn)
  , Just rhs <- maybeUnfoldingTemplate (realIdUnfolding fn)
  = do {     -- Get rules from the external package state
             -- We keep doing this in case we "page-fault in"
             -- more rules as we go along
       ; hsc_env <- getHscEnv
       ; eps <- liftIO $ hscEPS hsc_env
       ; let full_rb = unionRuleBase rb (eps_rule_base eps)
             rules_for_fn = getRules full_rb fn

       ; (rules1, spec_pairs, uds) <- runSpecM dflags $
              specCalls emptySpecEnv rules_for_fn calls_for_fn fn rhs
       ; let spec_binds1 = [NonRec b r | (b,r) <- spec_pairs]
             -- After the rules kick in we may get recursion, but
             -- we rely on a global GlomBinds to sort that out later
             -- See Note [Glom the bindings if imported functions are specialised]

              -- Now specialise any cascaded calls
       ; (rules2, spec_binds2) <- specImports dflags (extendVarSet done fn)
                                                     (extendRuleBaseList rb rules1)
                                                     uds

       ; return (rules2 ++ rules1, spec_binds2 ++ spec_binds1) }

  | otherwise
  = WARN( True, ptext (sLit "specImport discard") <+> ppr fn <+> ppr calls_for_fn )
    return ([], [])
\end{code}

Note [Specialise imported INLINABLE things]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We specialise INLINABLE things but not INLINE things.  The latter
should be inlined bodily, so not much point in specialising them.
Moreover, we risk lots of orphan modules from vigorous specialisation.

Note [Glom the bindings if imported functions are specialised]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suppose we have an imported, *recursive*, INLINABLE function
   f :: Eq a => a -> a
   f = /\a \d x. ...(f a d)...
In the module being compiled we have
   g x = f (x::Int)
Now we'll make a specialised function
   f_spec :: Int -> Int
   f_spec = \x -> ...(f Int dInt)...
   {-# RULE  f Int _ = f_spec #-}
   g = \x. f Int dInt x
Note that f_spec doesn't look recursive
After rewriting with the RULE, we get
   f_spec = \x -> ...(f_spec)...
BUT since f_spec was non-recursive before it'll *stay* non-recursive.
The occurrence analyser never turns a NonRec into a Rec.  So we must
make sure that f_spec is recursive.  Easiest thing is to make all
the specialisations for imported bindings recursive.


Note [Avoiding recursive specialisation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When we specialise 'f' we may find new overloaded calls to 'g', 'h' in
'f's RHS.  So we want to specialise g,h.  But we don't want to
specialise f any more!  It's possible that f's RHS might have a
recursive yet-more-specialised call, so we'd diverge in that case.
And if the call is to the same type, one specialisation is enough.
Avoiding this recursive specialisation loop is the reason for the
'done' VarSet passed to specImports and specImport.

%************************************************************************
%*                                                                      *
\subsubsection{@specExpr@: the main function}
%*                                                                      *
%************************************************************************

\begin{code}
data SpecEnv 
  = SE { se_subst :: CoreSubst.Subst
             -- We carry a substitution down:
             -- a) we must clone any binding that might float outwards,
             --    to avoid name clashes
             -- b) we carry a type substitution to use when analysing
             --    the RHS of specialised bindings (no type-let!)


       , se_interesting :: VarSet 
             -- Dict Ids that we know something about
             -- and hence may be worth specialising against
             -- See Note [Interesting dictionary arguments]
     }

emptySpecEnv :: SpecEnv
emptySpecEnv = SE { se_subst = CoreSubst.emptySubst, se_interesting = emptyVarSet} 

specVar :: SpecEnv -> Id -> CoreExpr
specVar env v = CoreSubst.lookupIdSubst (text "specVar") (se_subst env) v

specExpr :: SpecEnv -> CoreExpr -> SpecM (CoreExpr, UsageDetails)

---------------- First the easy cases --------------------
specExpr env (Type ty)     = return (Type     (substTy env ty), emptyUDs)
specExpr env (Coercion co) = return (Coercion (substCo env co), emptyUDs)
specExpr env (Var v)       = return (specVar env v, emptyUDs)
specExpr _   (Lit lit)     = return (Lit lit,       emptyUDs)
specExpr env (Cast e co)
  = do { (e', uds) <- specExpr env e
       ; return ((Cast e' (substCo env co)), uds) }
specExpr env (Tick tickish body)
  = do { (body', uds) <- specExpr env body
       ; return (Tick (specTickish env tickish) body', uds) }

---------------- Applications might generate a call instance --------------------
specExpr env expr@(App {})
  = go expr []
  where
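    -- 'go' walks down the application spine: each argument is specialised
    -- and collected, so that when we reach the variable at the head of the
    -- application we can record a call instance for it with mkCallUDs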
    go (App fun arg) args = do (arg', uds_arg) <- specExpr env arg
                               (fun', uds_app) <- go fun (arg':args)
                               return (App fun' arg', uds_arg `plusUDs` uds_app)

    go (Var f)       args = case specVar env f of
                                Var f' -> return (Var f', mkCallUDs env f' args)
                                e'     -> return (e', emptyUDs) -- I don't expect this!
    go other         _    = specExpr env other

---------------- Lambda/case require dumping of usage details --------------------
specExpr env e@(Lam _ _) = do
    (body', uds) <- specExpr env' body
    let (free_uds, dumped_dbs) = dumpUDs bndrs' uds
    return (mkLams bndrs' (wrapDictBindsE dumped_dbs body'), free_uds)
  where
    (bndrs, body) = collectBinders e
    (env', bndrs') = substBndrs env bndrs
        -- More efficient to collect a group of binders together all at once
        -- and we don't want to split a lambda group with dumped bindings

specExpr env (Case scrut case_bndr ty alts)
  = do { (scrut', scrut_uds) <- specExpr env scrut
       ; (scrut'', case_bndr', alts', alts_uds)
             <- specCase env scrut' case_bndr alts
       ; return (Case scrut'' case_bndr' (substTy env ty) alts'
                , scrut_uds `plusUDs` alts_uds) }

---------------- Finally, let is the interesting case --------------------
specExpr env (Let bind body)
  = do { -- Clone binders
         (rhs_env, body_env, bind') <- cloneBindSM env bind

         -- Deal with the body
       ; (body', body_uds) <- specExpr body_env body

        -- Deal with the bindings
      ; (binds', uds) <- specBind rhs_env bind' body_uds

        -- All done
      ; return (foldr Let body' binds', uds) }

specTickish :: SpecEnv -> Tickish Id -> Tickish Id
specTickish env (Breakpoint ix ids)
  = Breakpoint ix [ id' | id <- ids, Var id' <- [specVar env id]]
  -- drop vars from the list if they have a non-variable substitution.
  -- should never happen, but it's harmless to drop them anyway.
specTickish _ other_tickish = other_tickish

specCase :: SpecEnv
         -> CoreExpr            -- Scrutinee, already done
         -> Id -> [CoreAlt]
         -> SpecM ( CoreExpr    -- New scrutinee
                  , Id
                  , [CoreAlt]
                  , UsageDetails)
specCase env scrut' case_bndr [(con, args, rhs)]
  | isDictId case_bndr           -- See Note [Floating dictionaries out of cases]
  , interestingDict env scrut'
  , not (isDeadBinder case_bndr && null sc_args')
  = do { (case_bndr_flt : sc_args_flt) <- mapM clone_me (case_bndr' : sc_args')

       ; let sc_rhss = [ Case (Var case_bndr_flt) case_bndr' (idType sc_arg')
                              [(con, args', Var sc_arg')]
                       | sc_arg' <- sc_args' ]

             -- Extend the substitution for RHS to map the *original* binders
             -- to their floated versions.
             mb_sc_flts :: [Maybe DictId]
             mb_sc_flts = map (lookupVarEnv clone_env) args'
             clone_env  = zipVarEnv sc_args' sc_args_flt
             subst_prs  = (case_bndr, Var case_bndr_flt)
                        : [ (arg, Var sc_flt)
                          | (arg, Just sc_flt) <- args `zip` mb_sc_flts ]
             env_rhs' = env_rhs { se_subst = CoreSubst.extendIdSubstList (se_subst env_rhs) subst_prs
                                , se_interesting = se_interesting env_rhs `extendVarSetList`
                                                   (case_bndr_flt : sc_args_flt) }

       ; (rhs', rhs_uds)   <- specExpr env_rhs' rhs
       ; let scrut_bind    = mkDB (NonRec case_bndr_flt scrut')
             case_bndr_set = unitVarSet case_bndr_flt
             sc_binds      = [(NonRec sc_arg_flt sc_rhs, case_bndr_set)
                             | (sc_arg_flt, sc_rhs) <- sc_args_flt `zip` sc_rhss ]
             flt_binds     = scrut_bind : sc_binds
             (free_uds, dumped_dbs) = dumpUDs (case_bndr':args') rhs_uds
             all_uds = flt_binds `addDictBinds` free_uds
             alt'    = (con, args', wrapDictBindsE dumped_dbs rhs')
       ; return (Var case_bndr_flt, case_bndr', [alt'], all_uds) }
  where
    (env_rhs, (case_bndr':args')) = substBndrs env (case_bndr:args)
    sc_args' = filter is_flt_sc_arg args'

    clone_me bndr = do { uniq <- getUniqueM
                       ; return (mkUserLocal occ uniq ty loc) }
       where
         name = idName bndr
         ty   = idType bndr
         occ  = nameOccName name
         loc  = getSrcSpan name

    arg_set = mkVarSet args'
    is_flt_sc_arg var =  isId var
                      && not (isDeadBinder var)
                      && isDictTy var_ty
                      && not (tyVarsOfType var_ty `intersectsVarSet` arg_set)
       where
         var_ty = idType var


specCase env scrut case_bndr alts
  = do { (alts', uds_alts) <- mapAndCombineSM spec_alt alts
       ; return (scrut, case_bndr', alts', uds_alts) }
  where
    (env_alt, case_bndr') = substBndr env case_bndr
    spec_alt (con, args, rhs) = do
          (rhs', uds) <- specExpr env_rhs rhs
          let (free_uds, dumped_dbs) = dumpUDs (case_bndr' : args') uds
          return ((con, args', wrapDictBindsE dumped_dbs rhs'), free_uds)
        where
          (env_rhs, args') = substBndrs env_alt args
\end{code}

Note [Floating dictionaries out of cases]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider
   g = \d. case d of { MkD sc ... -> ...(f sc)... }
Naively we can't float the call (f sc) out of the case expression,
because 'sc' is bound by the case, and that in turn means we can't
specialise f, which seems a pity.

So we invert the case, by floating out a binding
for 'sc_flt' thus:
    sc_flt = case d of { MkD sc ... -> sc }
Now we can float the call instance for 'f'.  Indeed this is just
what'll happen if 'sc' was originally bound with a let binding,
but case is more efficient, and necessary with equalities. So it's
good to work with both.

You might think that this won't make any difference, because the
call instance will only get nuked by the \d.  BUT if 'g' itself is
specialised, then transitively we should be able to specialise f.

In general, given
   case e of cb { MkD sc ... -> ...(f sc)... }
we transform to
   let cb_flt = e
       sc_flt = case cb_flt of { MkD sc ... -> sc }
   in
   case cb_flt of cb { MkD sc ... -> ...(f sc_flt)... }

The "_flt" things are the floated binds; we use the current substitution
to substitute sc -> sc_flt in the RHS

%************************************************************************
%*                                                                      *
                     Dealing with a binding
%*                                                                      *
%************************************************************************

\begin{code}
specBind :: SpecEnv                     -- Use this for RHSs
         -> CoreBind
         -> UsageDetails                -- Info on how the scope of the binding
         -> SpecM ([CoreBind],          -- New bindings
                   UsageDetails)        -- And info to pass upstream

-- Returned UsageDetails:
--    No calls for binders of this bind
specBind rhs_env (NonRec fn rhs) body_uds
  = do { (rhs', rhs_uds) <- specExpr rhs_env rhs
       ; (fn', spec_defns, body_uds1) <- specDefn rhs_env body_uds fn rhs

       ; let pairs = spec_defns ++ [(fn', rhs')]
                        -- fn' mentions the spec_defns in its rules,
                        -- so put the latter first

             combined_uds = body_uds1 `plusUDs` rhs_uds
                -- This way round a call in rhs_uds of a function f
                -- at type T will override a call of f at T in body_uds1; and
                -- that is good because it'll tend to keep "earlier" calls
                -- See Note [Specialisation of dictionary functions]

             (free_uds, dump_dbs, float_all) = dumpBindUDs [fn] combined_uds
                -- See Note [From non-recursive to recursive]

             final_binds | isEmptyBag dump_dbs = [NonRec b r | (b,r) <- pairs]
                         | otherwise = [Rec (flattenDictBinds dump_dbs pairs)]

         ; if float_all then
             -- Rather than discard the calls mentioning the bound variables
             -- we float this binding along with the others
              return ([], free_uds `snocDictBinds` final_binds)
           else
             -- No call in final_uds mentions bound variables,
             -- so we can just leave the binding here
              return (final_binds, free_uds) }


specBind rhs_env (Rec pairs) body_uds
       -- Note [Specialising a recursive group]
  = do { let (bndrs,rhss) = unzip pairs
       ; (rhss', rhs_uds) <- mapAndCombineSM (specExpr rhs_env) rhss
       ; let scope_uds = body_uds `plusUDs` rhs_uds
                       -- Includes binds and calls arising from rhss

       ; (bndrs1, spec_defns1, uds1) <- specDefns rhs_env scope_uds pairs

       ; (bndrs3, spec_defns3, uds3)
             <- if null spec_defns1  -- Common case: no specialisation
                then return (bndrs1, [], uds1)
                else do {            -- Specialisation occurred; do it again
                          (bndrs2, spec_defns2, uds2)
                              <- specDefns rhs_env uds1 (bndrs1 `zip` rhss)
                        ; return (bndrs2, spec_defns2 ++ spec_defns1, uds2) }

       ; let (final_uds, dumped_dbs, float_all) = dumpBindUDs bndrs uds3
             bind = Rec (flattenDictBinds dumped_dbs $
                         spec_defns3 ++ zip bndrs3 rhss')

       ; if float_all then
              return ([], final_uds `snocDictBind` bind)
           else
              return ([bind], final_uds) }


---------------------------
specDefns :: SpecEnv
          -> UsageDetails               -- Info on how it is used in its scope
          -> [(Id,CoreExpr)]            -- The things being bound and their un-processed RHS
          -> SpecM ([Id],               -- Original Ids with RULES added
                    [(Id,CoreExpr)],    -- Extra, specialised bindings
                    UsageDetails)       -- Stuff to fling upwards from the specialised versions

-- Specialise a list of bindings (the contents of a Rec), but flowing usages
-- upwards binding by binding.  Example: { f = ...g ...; g = ...f .... }
-- Then if the input CallDetails has a specialised call for 'g', whose specialisation
-- in turn generates a specialised call for 'f', we catch that in this one sweep.
-- But not vice versa (it's a fixpoint problem).

specDefns _env uds []
  = return ([], [], uds)
specDefns env uds ((bndr,rhs):pairs)
  = do { (bndrs1, spec_defns1, uds1) <- specDefns env uds pairs
       ; (bndr1, spec_defns2, uds2)  <- specDefn env uds1 bndr rhs
       ; return (bndr1 : bndrs1, spec_defns1 ++ spec_defns2, uds2) }

---------------------------
specDefn :: SpecEnv