%
% (c) The GRASP/AQUA Project, Glasgow University, 1993-1998
%
\section[Specialise]{Stamping out overloading, and (optionally) polymorphism}

\begin{code}
module Specialise ( specProgram ) where

#include "HsVersions.h"

import Id
import TcType
import Type
import CoreMonad
import CoreSubst
import CoreUnfold
import VarSet
import VarEnv
import CoreSyn
import Rules
import CoreUtils        ( exprIsTrivial, applyTypeToArgs )
import CoreFVs          ( exprFreeVars, exprsFreeVars, idFreeVars )
import UniqSupply
import Name
import MkId             ( voidArgId, realWorldPrimId )
import Maybes           ( catMaybes, isJust )
import BasicTypes
import HscTypes
import Bag
import DynFlags
import Util
import Outputable
import FastString
import State

import Control.Monad
import Data.Map (Map)
import qualified Data.Map as Map
import qualified FiniteMap as Map
\end{code}

%************************************************************************
%*                                                                      *
\subsection[notes-Specialise]{Implementation notes [SLPJ, Aug 18 1993]}
%*                                                                      *
%************************************************************************

These notes describe how we implement specialisation to eliminate
overloading.

The specialisation pass works on Core
syntax, complete with all the explicit dictionary application,
abstraction and construction as added by the type checker.  The
existing type checker remains largely as it is.

One important thought: the {\em types} passed to an overloaded
function, and the {\em dictionaries} passed are mutually redundant.
If the same function is applied to the same type(s) then it is sure to
be applied to the same dictionary(s)---or rather to the same {\em
values}.  (The arguments might look different but they will evaluate
to the same value.)

Second important thought: we know that we can make progress by
treating dictionary arguments as static and worth specialising on.  So
we can do without binding-time analysis, and instead specialise on
dictionary arguments and no others.

The basic idea
~~~~~~~~~~~~~~
Suppose we have

        let f = <f_rhs>
        in <body>

and suppose f is overloaded.

STEP 1: CALL-INSTANCE COLLECTION

We traverse <body>, accumulating all applications of f to types and
dictionaries.

(Might there be partial applications, to just some of its types and
dictionaries?  In principle yes, but in practice the type checker only
builds applications of f to all its types and dictionaries, so partial
applications could only arise as a result of transformation, and even
then I think it's unlikely.  In any case, we simply don't accumulate such
partial applications.)


STEP 2: EQUIVALENCES

So now we have a collection of calls to f:
        f t1 t2 d1 d2
        f t3 t4 d3 d4
        ...
Notice that f may take several type arguments.  To avoid ambiguity, we
say that f is called at type t1/t2 and t3/t4.

We take equivalence classes using equality of the *types* (ignoring
the dictionary args, which as mentioned previously are redundant).

STEP 3: SPECIALISATION

For each equivalence class, choose a representative (f t1 t2 d1 d2),
and create a local instance of f, defined thus:

        f@t1/t2 = <f_rhs> t1 t2 d1 d2

f_rhs presumably has some big lambdas and dictionary lambdas, so lots
of simplification will now result.  However we don't actually *do* that
simplification.  Rather, we leave it for the simplifier to do.  If we
*did* do it, though, we'd get more call instances from the specialised
RHS.  We can work out what they are by instantiating the call-instance
set from f's RHS with the types t1, t2.

Add this new id to f's IdInfo, to record that f has a specialised version.

Before doing any of this, check that f's IdInfo doesn't already
tell us about an existing instance of f at the required type/s.
(This might happen if specialisation was applied more than once, or
it might arise from user SPECIALIZE pragmas.)
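
To make the three STEPs concrete, here is a small source-level sketch (the
name f_Int and the rewrite shown are purely illustrative; the real pass works
on Core and records the specialisation in f's IdInfo):

        f :: Num a => a -> a
        f x = x + x

        g :: Int -> Int
        g y = f (y + 1)         -- STEP 1: a call of f at type Int

        -- STEP 3 creates, roughly:
        f_Int :: Int -> Int
        f_Int x = x + x         -- the Num Int dictionary has been consumed
        -- together with a rewrite:  f Int dNumInt  ===>  f_Int
        -- so the simplifier later turns g into:  g y = f_Int (y + 1)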

Recursion
~~~~~~~~~
Wait a minute!  What if f is recursive?  Then we can't just plug in
its right-hand side, can we?

But it's ok.  The type checker *always* creates non-recursive definitions
for overloaded recursive functions.  For example:

        f x = f (x+x)           -- Yes I know it's silly

becomes

        f a (d::Num a) = let p = +.sel a d
                         in
                         letrec fl (y::a) = fl (p y y)
                         in
                         fl

We still have recursion for non-overloaded functions which we
specialise, but the recursive call should get specialised to the
same recursive version.


Polymorphism 1
~~~~~~~~~~~~~~

All this is crystal clear when the function is applied to *constant
types*; that is, types which have no type variables inside.  But what if
it is applied to non-constant types?  Suppose we find a call of f at type
t1/t2.  There are two possibilities:

(a) The free type variables of t1, t2 are in scope at the definition point
of f.  In this case there's no problem, we proceed just as before.  A common
example is as follows.  Here's the Haskell:

        g y = let f x = x+x
              in f y + f y

After typechecking we have

        g a (d::Num a) (y::a) = let f b (d'::Num b) (x::b) = +.sel b d' x x
                                in +.sel a d (f a d y) (f a d y)

Notice that the call to f is at type "a"; a non-constant type.
Both calls to f are at the same type, so we can specialise to give:

        g a (d::Num a) (y::a) = let f@a (x::a) = +.sel a d x x
                                in +.sel a d (f@a y) (f@a y)


(b) The other case is when the type variables in the instance types
are *not* in scope at the definition point of f.  The example we are
working with above is a good case.  There are two instances of (+.sel a d),
but "a" is not in scope at the definition of +.sel.  Can we do anything?
Yes, we can "common them up", a sort of limited common sub-expression deal.
This would give:

        g a (d::Num a) (y::a) = let +.sel@a = +.sel a d
                                    f@a (x::a) = +.sel@a x x
                                in +.sel@a (f@a y) (f@a y)

This can save work, and can't be spotted by the type checker, because
the two instances of +.sel weren't originally at the same type.

Further notes on (b)

* There are quite a few variations here.  For example, the defn of
  +.sel could be floated outside the \y, to attempt to gain laziness.
  It certainly mustn't be floated outside the \d because the d has to
  be in scope too.

* We don't want to inline f_rhs in this case, because
  that will duplicate code.  Just commoning up the call is the point.

* Nothing gets added to +.sel's IdInfo.

* Don't bother unless the equivalence class has more than one item!

Not clear whether this is all worth it.  It is of course OK to
simply discard call-instances when passing a big lambda.

Polymorphism 2 -- Overloading
~~~~~~~~~~~~~~
Consider a function whose most general type is

        f :: forall a b. Ord a => [a] -> b -> b

There is really no point in making a version of f at Int/Int and another
at Int/Bool, because it's only instancing the type variable "a" which
buys us any efficiency. Since f is completely polymorphic in b there
ain't much point in making separate versions of f for the different
b types.

That suggests that we should identify which of f's type variables
are constrained (like "a") and which are unconstrained (like "b").
Then when taking equivalence classes in STEP 2, we ignore the type args
corresponding to unconstrained type variables.  In STEP 3 we make
polymorphic versions.  Thus:

        f@t1/ = /\b -> <f_rhs> t1 b d1 d2

We do this.


Dictionary floating
~~~~~~~~~~~~~~~~~~~
Consider this

        f a (d::Num a) = let g = ...
                         in
                         ...(let d1::Ord a = Num.Ord.sel a d in g a d1)...

Here, g is only called at one type, but the dictionary isn't in scope at the
definition point for g.  Usually the type checker would build a
definition for d1 which enclosed g, but the transformation system
might have moved d1's defn inward.  Solution: float dictionary bindings
outwards along with call instances.

Consider

        f x = let g p q = p==q
                  h r s = (r+s, g r s)
              in
              h x x


Before specialisation, leaving out type abstractions we have

        f df x = let g :: Eq a => a -> a -> Bool
                     g dg p q = == dg p q
                     h :: Num a => a -> a -> (a, Bool)
                     h dh r s = let deq = eqFromNum dh
                                in (+ dh r s, g deq r s)
              in
              h df x x

After specialising h we get a specialised version of h, like this:

                    h' r s = let deq = eqFromNum df
                             in (+ df r s, g deq r s)

But we can't naively make an instance for g from this, because deq is not in scope
at the defn of g.  Instead, we have to float out the (new) defn of deq
to widen its scope.  Notice that this floating can't be done in advance -- it only
shows up when specialisation is done.

User SPECIALIZE pragmas
~~~~~~~~~~~~~~~~~~~~~~~
Specialisation pragmas can be digested by the type checker, and implemented
by adding extra definitions along with that of f, in the same way as before

        f@t1/t2 = <f_rhs> t1 t2 d1 d2

Indeed the pragmas *have* to be dealt with by the type checker, because
only it knows how to build the dictionaries d1 and d2!  For example

        g :: Ord a => [a] -> [a]
        {-# SPECIALIZE g :: [Tree Int] -> [Tree Int] #-}

Here, the specialised version of g is an application of g's rhs to the
Ord dictionary for (Tree Int), which only the type checker can conjure
up.  There might not even *be* one, if (Tree Int) is not an instance of
Ord!  (All the other specialisation has suitable dictionaries to hand
from actual calls.)

Problem.  The type checker doesn't have to hand a convenient <f_rhs>, because
it is buried in a complex (as-yet-un-desugared) binding group.
Maybe we should say

        f@t1/t2 = f* t1 t2 d1 d2

where f* is the Id f with an IdInfo which says "inline me regardless!".
Indeed all the specialisation could be done in this way.
That in turn means that the simplifier has to be prepared to inline absolutely
any in-scope let-bound thing.


Again, the pragma should permit polymorphism in unconstrained variables:

        h :: Ord a => [a] -> b -> b
        {-# SPECIALIZE h :: [Int] -> b -> b #-}

We *insist* that all overloaded type variables are specialised to ground types
(and hence there can be no context inside a SPECIALIZE pragma).
We *permit* unconstrained type variables to be specialised to
        - a ground type
        - or left as a polymorphic type variable
but nothing in between.  So

        {-# SPECIALIZE h :: [Int] -> [c] -> [c] #-}

is *illegal*.  (It can be handled, but it adds complication, and gains the
programmer nothing.)
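
To summarise these rules with a small illustrative example (the definitions
here are made up):

        h :: Ord a => [a] -> b -> b
        h xs y = if null xs then y else y

        {-# SPECIALIZE h :: [Int] -> b    -> b    #-}   -- OK: "a" ground, "b" still polymorphic
        {-# SPECIALIZE h :: [Int] -> Bool -> Bool #-}   -- OK: everything ground
     -- {-# SPECIALIZE h :: [Int] -> [c]  -> [c]  #-}   -- illegal: "b" only partially instantiated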


SPECIALISING INSTANCE DECLARATIONS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider

        instance Foo a => Foo [a] where
                ...
        {-# SPECIALIZE instance Foo [Int] #-}

The original instance decl creates a dictionary-function
definition:

        dfun.Foo.List :: forall a. Foo a -> Foo [a]

The SPECIALIZE pragma just makes a specialised copy, just as for
ordinary function definitions:

        dfun.Foo.List@Int :: Foo [Int]
        dfun.Foo.List@Int = dfun.Foo.List Int dFooInt

The information about what instances of the dfun exist gets added to
the dfun's IdInfo in the same way as for a user-defined function.


Automatic instance decl specialisation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Can instance decls be specialised automatically?  It's tricky.
We could collect call-instance information for each dfun, but
then when we specialised their bodies we'd get new call-instances
for ordinary functions; and when we specialised those bodies, we might get
new call-instances of the dfuns, and so on.  This all arises because of
the unrestricted mutual recursion between instance decls and value decls.

Still, there's no actual problem; it just means that we may not do all
the specialisation we could theoretically do.

Furthermore, instance decls are usually exported and used non-locally,
so we'll want to compile enough to get those specialisations done.

Lastly, there's no such thing as a local instance decl, so we can
survive solely by spitting out *usage* information, and then reading that
back in as a pragma when next compiling the file.  So for now,
we only specialise instance decls in response to pragmas.


SPITTING OUT USAGE INFORMATION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To spit out usage information we need to traverse the code collecting
call-instance information for all imported (non-prelude?) functions
and data types. Then we equivalence-class it and spit it out.

This is done at the top-level when all the call instances which escape
must be for imported functions and data types.

*** Not currently done ***


Partial specialisation by pragmas
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What about partial specialisation:

        k :: (Ord a, Eq b) => [a] -> b -> b -> [a]
        {-# SPECIALIZE k :: Eq b => [Int] -> b -> b -> [Int] #-}

or even

        {-# SPECIALIZE k :: Eq b => [Int] -> [b] -> [b] -> [Int] #-}

Seems quite reasonable.  Similar things could be done with instance decls:

        instance (Foo a, Foo b) => Foo (a,b) where
                ...
        {-# SPECIALIZE instance Foo a => Foo (a,Int) #-}
        {-# SPECIALIZE instance Foo b => Foo (Int,b) #-}

Ho hum.  Things are complex enough without this.  I pass.


Requirements for the simplifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The simplifier has to be able to take advantage of the specialisation.

* When the simplifier finds an application of a polymorphic f, it looks in
f's IdInfo in case there is a suitable instance to call instead.  This converts

        f t1 t2 d1 d2   ===>   f_t1_t2

Note that the dictionaries get eaten up too!

* Dictionary selection operations on constant dictionaries must be
  short-circuited:

        +.sel Int d     ===>  +Int

The obvious way to do this is in the same way as other specialised
calls: +.sel has inside it some IdInfo which tells that if it's applied
to the type Int then it should eat a dictionary and transform to +Int.

In short, dictionary selectors need IdInfo inside them for constant
methods.

* Exactly the same applies if a superclass dictionary is being
  extracted:

        Eq.sel Int d   ===>   dEqInt

* Something similar applies to dictionary construction too.  Suppose
dfun.Eq.List is the function taking a dictionary for (Eq a) to
one for (Eq [a]).  Then we want

        dfun.Eq.List Int d      ===> dEq.List_Int

Where does the Eq [Int] dictionary come from?  It is built in
response to a SPECIALIZE pragma on the Eq [a] instance decl.

In short, dfun Ids need IdInfo with a specialisation for each
constant instance of their instance declaration.

All this uses a single mechanism: the SpecEnv inside an Id
438 439 440 441

What does the specialisation IdInfo look like?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The SpecEnv of an Id maps a list of types (the template) to an expression

        [Type]  |->  Expr

For example, if f has this SpecInfo:

        [Int, a]  |->  \d:Ord Int. f' a

it means that we can replace the call

        f Int t  ===>  (\d. f' t)

This chucks one dictionary away and proceeds with the
specialised version of f, namely f'.
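
In the code below this template is carried as a RULE attached to f's IdInfo
(see specCalls and addIdSpecialisations).  Rendered as a rule, the example
above reads roughly:

        forall a (d :: Ord Int).  f Int a d  =  f' a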


What can't be done this way?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There is no way, post-typechecker, to get a dictionary for (say)
Eq a from a dictionary for Eq [a].  So if we find

        ==.sel [t] d

we can't transform to

        eqList (==.sel t d')

where
        eqList :: (a->a->Bool) -> [a] -> [a] -> Bool

Of course, we currently have no way to automatically derive
eqList, nor to connect it to the Eq [a] instance decl, but you
can imagine that it might somehow be possible.  Taking advantage
of this is permanently ruled out.

Still, this is no great hardship, because we intend to eliminate
overloading altogether anyway!

A note about non-tyvar dictionaries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some Ids have types like

        forall a,b,c. Eq a -> Ord [a] -> tau

This seems curious at first, because we usually only have dictionary
args whose types are of the form (C a) where a is a type variable.
But this doesn't hold for the functions arising from instance decls,
which sometimes get arguments with types of form (C (T a)) for some
type constructor T.

Should we specialise wrt this compound-type dictionary?  We used to say
"no", saying:
        "This is a heuristic judgement, as indeed is the fact that we
        specialise wrt only dictionaries.  We choose *not* to specialise
        wrt compound dictionaries because at the moment the only place
        they show up is in instance decls, where they are simply plugged
        into a returned dictionary.  So nothing is gained by specialising
        wrt them."

But it is simpler and more uniform to specialise wrt these dicts too;
and in future GHC is likely to support full-fledged type signatures
like
        f :: Eq [(a,b)] => ...


%************************************************************************
%*                                                                      *
\subsubsection{The new specialiser}
%*                                                                      *
%************************************************************************

Our basic game plan is this.  For let(rec) bound function
        f :: (C a, D c) => (a,b,c,d) -> Bool

* Find any specialised calls of f, (f ts ds), where
  ts are the type arguments t1 .. t4, and
  ds are the dictionary arguments d1 .. d2.

* Add a new definition for f1 (say):

        f1 = /\ b d -> (..body of f..) t1 b t3 d d1 d2

  Note that we abstract over the unconstrained type arguments.

* Add the mapping

        [t1,b,t3,d]  |->  \d1 d2 -> f1 b d

  to the specialisations of f.  This will be used by the
  simplifier to replace calls
                (f t1 t2 t3 t4) da db
  by
                (\d1 d2 -> f1 t2 t4) da db

  All the stuff about how many dictionaries to discard, and what types
  to apply the specialised function to, is handled by the fact that the
  SpecEnv contains a template for the result of the specialisation.
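
  For example, at one call site the rewrite goes like this (purely
  illustrative):

                (f t1 t7 t3 t8) da db
        ===>    (\d1 d2 -> f1 t7 t8) da db   -- template instantiated with b:=t7, d:=t8
        ===>    f1 t7 t8                     -- beta-reduction discards da and db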

We don't build *partial* specialisations for f.  For example:

  f :: Eq a => a -> a -> Bool
  {-# SPECIALISE f :: (Eq b, Eq c) => (b,c) -> (b,c) -> Bool #-}

Here, little is gained by making a specialised copy of f.
There's a distinct danger that the specialised version would
first build a dictionary for (Eq b, Eq c), and then select the (==)
method from it!  Even if it didn't, not a great deal is saved.

We do, however, generate polymorphic, but not overloaded, specialisations:

  f :: Eq a => [a] -> b -> b -> b
  {-# SPECIALISE f :: [Int] -> b -> b -> b #-}

Hence, the invariant is this:

        *** no specialised version is overloaded ***


%************************************************************************
%*                                                                      *
\subsubsection{The exported function}
%*                                                                      *
%************************************************************************

\begin{code}
specProgram :: DynFlags -> ModGuts -> CoreM ModGuts
specProgram dflags guts
  = do { hpt_rules <- getRuleBase
       ; let local_rules = mg_rules guts
             rule_base = extendRuleBaseList hpt_rules (mg_rules guts)

             -- Specialise the bindings of this module
       ; (binds', uds) <- runSpecM dflags (go (mg_binds guts))

             -- Specialise imported functions
       ; (new_rules, spec_binds) <- specImports dflags emptyVarSet rule_base uds

       ; let final_binds | null spec_binds = binds'
                         | otherwise       = Rec (flattenBinds spec_binds) : binds'
                   -- Note [Glom the bindings if imported functions are specialised]

       ; return (guts { mg_binds = final_binds
                      , mg_rules = new_rules ++ local_rules }) }
  where
        -- We need to start with a Subst that knows all the things
        -- that are in scope, so that the substitution engine doesn't
        -- accidentally re-use a unique that's already in use
        -- Easiest thing is to do it all at once, as if all the top-level
        -- decls were mutually recursive
    top_subst = mkEmptySubst $ mkInScopeSet $ mkVarSet $
                bindersOfBinds $ mg_binds guts

    go []           = return ([], emptyUDs)
    go (bind:binds) = do (binds', uds) <- go binds
                         (bind', uds') <- specBind top_subst bind uds
                         return (bind' ++ binds', uds')

specImports :: DynFlags
            -> VarSet           -- Don't specialise these ones
                                -- See Note [Avoiding recursive specialisation]
            -> RuleBase         -- Rules from this module and the home package
                                -- (but not external packages, which can change)
            -> UsageDetails     -- Calls for imported things, and floating bindings
            -> CoreM ( [CoreRule]   -- New rules
                     , [CoreBind] ) -- Specialised bindings and floating bindings
-- See Note [Specialise imported INLINABLE things]
specImports dflags done rb uds
  = do { let import_calls = varEnvElts (ud_calls uds)
       ; (rules, spec_binds) <- go rb import_calls
       ; return (rules, wrapDictBinds (ud_binds uds) spec_binds) }
  where
    go _ [] = return ([], [])
    go rb (CIS fn calls_for_fn : other_calls)
      = do { (rules1, spec_binds1) <- specImport dflags done rb fn (Map.toList calls_for_fn)
           ; (rules2, spec_binds2) <- go (extendRuleBaseList rb rules1) other_calls
           ; return (rules1 ++ rules2, spec_binds1 ++ spec_binds2) }

specImport :: DynFlags
           -> VarSet                -- Don't specialise these
                                    -- See Note [Avoiding recursive specialisation]
           -> RuleBase              -- Rules from this module
           -> Id -> [CallInfo]      -- Imported function and calls for it
           -> CoreM ( [CoreRule]    -- New rules
                    , [CoreBind] )  -- Specialised bindings
specImport dflags done rb fn calls_for_fn
  | fn `elemVarSet` done
  = return ([], [])     -- No warning.  This actually happens all the time
                        -- when specialising a recursive function, because
                        -- the RHS of the specialised function contains a recursive
                        -- call to the original function

  | isInlinablePragma (idInlinePragma fn)
  , Just rhs <- maybeUnfoldingTemplate (realIdUnfolding fn)
  = do {     -- Get rules from the external package state
             -- We keep doing this in case we "page-fault in"
             -- more rules as we go along
       ; hsc_env <- getHscEnv
       ; eps <- liftIO $ hscEPS hsc_env
       ; let full_rb = unionRuleBase rb (eps_rule_base eps)
             rules_for_fn = getRules full_rb fn

       ; (rules1, spec_pairs, uds) <- runSpecM dflags $
              specCalls emptySubst rules_for_fn calls_for_fn fn rhs
       ; let spec_binds1 = [NonRec b r | (b,r) <- spec_pairs]
             -- After the rules kick in we may get recursion, but
             -- we rely on a global GlomBinds to sort that out later
             -- See Note [Glom the bindings if imported functions are specialised]

              -- Now specialise any cascaded calls
       ; (rules2, spec_binds2) <- specImports dflags (extendVarSet done fn)
                                                     (extendRuleBaseList rb rules1)
                                                     uds

       ; return (rules2 ++ rules1, spec_binds2 ++ spec_binds1) }

  | otherwise
  = WARN( True, ptext (sLit "specImport discard") <+> ppr fn <+> ppr calls_for_fn )
    return ([], [])
\end{code}

Note [Specialise imported INLINABLE things]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We specialise INLINABLE things but not INLINE things.  The latter
should be inlined bodily, so not much point in specialising them.
Moreover, we risk lots of orphan modules from vigorous specialisation.
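
For example (a user-level sketch; the module and function names are invented),
it is the INLINABLE pragma that exposes the unfolding we rely on here:

   -- module Lib
   {-# INLINABLE lookupDefault #-}
   lookupDefault :: Ord k => k -> v -> Map k v -> v
   lookupDefault k def m = Map.findWithDefault def k m

   -- module Client
   f :: Map Int String -> String
   f = lookupDefault 3 "none"    -- specImport can build a lookupDefault@Int here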

Note [Glom the bindings if imported functions are specialised]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suppose we have an imported, *recursive*, INLINABLE function
   f :: Eq a => a -> a
   f = /\a \d x. ...(f a d)...
In the module being compiled we have
   g x = f (x::Int)
Now we'll make a specialised function
   f_spec :: Int -> Int
   f_spec = \x -> ...(f Int dInt)...
   {-# RULE  f Int _ = f_spec #-}
   g = \x. f Int dInt x
Note that f_spec doesn't look recursive.
After rewriting with the RULE, we get
   f_spec = \x -> ...(f_spec)...
BUT since f_spec was non-recursive before it'll *stay* non-recursive.
The occurrence analyser never turns a NonRec into a Rec.  So we must
make sure that f_spec is recursive.  Easiest thing is to make all
the specialisations for imported bindings recursive.


Note [Avoiding recursive specialisation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When we specialise 'f' we may find new overloaded calls to 'g', 'h' in
'f's RHS.  So we want to specialise g,h.  But we don't want to
specialise f any more!  It's possible that f's RHS might have a
recursive yet-more-specialised call, so we'd diverge in that case.
And if the call is to the same type, one specialisation is enough.
Avoiding this recursive specialisation loop is the reason for the
'done' VarSet passed to specImports and specImport.

%************************************************************************
%*                                                                      *
\subsubsection{@specExpr@: the main function}
%*                                                                      *
%************************************************************************

\begin{code}
specVar :: Subst -> Id -> CoreExpr
specVar subst v = lookupIdSubst (text "specVar") subst v

specExpr :: Subst -> CoreExpr -> SpecM (CoreExpr, UsageDetails)
-- We carry a substitution down:
--      a) we must clone any binding that might float outwards,
--         to avoid name clashes
--      b) we carry a type substitution to use when analysing
--         the RHS of specialised bindings (no type-let!)

---------------- First the easy cases --------------------
specExpr subst (Type ty) = return (Type (CoreSubst.substTy subst ty), emptyUDs)
specExpr subst (Coercion co) = return (Coercion (CoreSubst.substCo subst co), emptyUDs)
specExpr subst (Var v)   = return (specVar subst v,         emptyUDs)
specExpr _     (Lit lit) = return (Lit lit,                 emptyUDs)
specExpr subst (Cast e co) = do
    (e', uds) <- specExpr subst e
    return ((Cast e' (CoreSubst.substCo subst co)), uds)
specExpr subst (Tick tickish body) = do
    (body', uds) <- specExpr subst body
    return (Tick (specTickish subst tickish) body', uds)


---------------- Applications might generate a call instance --------------------
specExpr subst expr@(App {})
  = go expr []
  where
    go (App fun arg) args = do (arg', uds_arg) <- specExpr subst arg
                               (fun', uds_app) <- go fun (arg':args)
                               return (App fun' arg', uds_arg `plusUDs` uds_app)

    go (Var f)       args = case specVar subst f of
                                Var f' -> return (Var f', mkCallUDs f' args)
                                e'     -> return (e', emptyUDs) -- I don't expect this!
    go other         _    = specExpr subst other

---------------- Lambda/case require dumping of usage details --------------------
specExpr subst e@(Lam _ _) = do
    (body', uds) <- specExpr subst' body
    let (free_uds, dumped_dbs) = dumpUDs bndrs' uds
    return (mkLams bndrs' (wrapDictBindsE dumped_dbs body'), free_uds)
  where
    (bndrs, body) = collectBinders e
    (subst', bndrs') = substBndrs subst bndrs
        -- More efficient to collect a group of binders together all at once
        -- and we don't want to split a lambda group with dumped bindings

specExpr subst (Case scrut case_bndr ty alts)
  = do { (scrut', scrut_uds) <- specExpr subst scrut
       ; (scrut'', case_bndr', alts', alts_uds)
             <- specCase subst scrut' case_bndr alts
       ; return (Case scrut'' case_bndr' (CoreSubst.substTy subst ty) alts'
                , scrut_uds `plusUDs` alts_uds) }

---------------- Finally, let is the interesting case --------------------
specExpr subst (Let bind body) = do
        -- Clone binders
    (rhs_subst, body_subst, bind') <- cloneBindSM subst bind

        -- Deal with the body
    (body', body_uds) <- specExpr body_subst body

        -- Deal with the bindings
    (binds', uds) <- specBind rhs_subst bind' body_uds

        -- All done
    return (foldr Let body' binds', uds)

specTickish :: Subst -> Tickish Id -> Tickish Id
specTickish subst (Breakpoint ix ids)
  = Breakpoint ix [ id' | id <- ids, Var id' <- [specVar subst id]]
  -- drop vars from the list if they have a non-variable substitution.
  -- should never happen, but it's harmless to drop them anyway.
specTickish _ other_tickish = other_tickish

specCase :: Subst
         -> CoreExpr            -- Scrutinee, already done
         -> Id -> [CoreAlt]
         -> SpecM ( CoreExpr    -- New scrutinee
                  , Id
                  , [CoreAlt]
                  , UsageDetails)
specCase subst scrut' case_bndr [(con, args, rhs)]
  | isDictId case_bndr           -- See Note [Floating dictionaries out of cases]
  , interestingDict scrut'
  , not (isDeadBinder case_bndr && null sc_args')
  = do { dflags <- getDynFlags

       ; (case_bndr_flt : sc_args_flt) <- mapM clone_me (case_bndr' : sc_args')

       ; let sc_rhss = [ Case (Var case_bndr_flt) case_bndr' (idType sc_arg')
                              [(con, args', Var sc_arg')]
                       | sc_arg' <- sc_args' ]

             -- Extend the substitution for RHS to map the *original* binders
             -- to their floated versions.  Attach an unfolding to these floated
             -- binders so they look interesting to interestingDict
             mb_sc_flts :: [Maybe DictId]
             mb_sc_flts = map (lookupVarEnv clone_env) args'
             clone_env  = zipVarEnv sc_args' (zipWith (add_unf dflags) sc_args_flt sc_rhss)
             subst_prs  = (case_bndr, Var (add_unf dflags case_bndr_flt scrut'))
                        : [ (arg, Var sc_flt)
                          | (arg, Just sc_flt) <- args `zip` mb_sc_flts ]
             subst_rhs' = extendIdSubstList subst_rhs subst_prs

       ; (rhs',   rhs_uds)   <- specExpr subst_rhs' rhs
       ; let scrut_bind    = mkDB (NonRec case_bndr_flt scrut')
             case_bndr_set = unitVarSet case_bndr_flt
             sc_binds      = [(NonRec sc_arg_flt sc_rhs, case_bndr_set)
                             | (sc_arg_flt, sc_rhs) <- sc_args_flt `zip` sc_rhss ]
             flt_binds     = scrut_bind : sc_binds
             (free_uds, dumped_dbs) = dumpUDs (case_bndr':args') rhs_uds
             all_uds = flt_binds `addDictBinds` free_uds
             alt'    = (con, args', wrapDictBindsE dumped_dbs rhs')
       ; return (Var case_bndr_flt, case_bndr', [alt'], all_uds) }
  where
    (subst_rhs, (case_bndr':args')) = substBndrs subst (case_bndr:args)
    sc_args' = filter is_flt_sc_arg args'

    clone_me bndr = do { uniq <- getUniqueM
                       ; return (mkUserLocal occ uniq ty loc) }
       where
         name = idName bndr
         ty   = idType bndr
         occ  = nameOccName name
         loc  = getSrcSpan name

    add_unf dflags sc_flt sc_rhs  -- Sole purpose: make sc_flt respond True to interestingDictId
      = setIdUnfolding sc_flt (mkSimpleUnfolding dflags sc_rhs)

    arg_set = mkVarSet args'
    is_flt_sc_arg var =  isId var
                      && not (isDeadBinder var)
                      && isDictTy var_ty
                      && not (tyVarsOfType var_ty `intersectsVarSet` arg_set)
       where
         var_ty = idType var


specCase subst scrut case_bndr alts
  = do { (alts', uds_alts) <- mapAndCombineSM spec_alt alts
       ; return (scrut, case_bndr', alts', uds_alts) }
  where
    (subst_alt, case_bndr') = substBndr subst case_bndr
    spec_alt (con, args, rhs) = do
          (rhs', uds) <- specExpr subst_rhs rhs
          let (free_uds, dumped_dbs) = dumpUDs (case_bndr' : args') uds
          return ((con, args', wrapDictBindsE dumped_dbs rhs'), free_uds)
        where
          (subst_rhs, args') = substBndrs subst_alt args
\end{code}

Note [Floating dictionaries out of cases]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider
   g = \d. case d of { MkD sc ... -> ...(f sc)... }
Naively we can't float the call (f sc) out of the case expression,
because 'sc' is bound by the case, and that in turn means we can't
specialise f, which seems a pity.

So we invert the case, by floating out a binding
for 'sc_flt' thus:
    sc_flt = case d of { MkD sc ... -> sc }
Now we can float the call instance for 'f'.  Indeed this is just
what'll happen if 'sc' was originally bound with a let binding,
but case is more efficient, and necessary with equalities. So it's
good to work with both.

You might think that this won't make any difference, because the
call instance will only get nuked by the \d.  BUT if 'g' itself is
specialised, then transitively we should be able to specialise f.

In general, given
   case e of cb { MkD sc ... -> ...(f sc)... }
we transform to
   let cb_flt = e
       sc_flt = case cb_flt of { MkD sc ... -> sc }
   in
   case cb_flt of cb { MkD sc ... -> ...(f sc_flt)... }

The "_flt" things are the floated binds; we use the current substitution
to substitute sc -> sc_flt in the RHS

%************************************************************************
%*                                                                      *
                     Dealing with a binding
%*                                                                      *
%************************************************************************

\begin{code}
specBind :: Subst                       -- Use this for RHSs
         -> CoreBind
         -> UsageDetails                -- Info on how the scope of the binding
         -> SpecM ([CoreBind],          -- New bindings
                   UsageDetails)        -- And info to pass upstream

-- Returned UsageDetails:
--    No calls for binders of this bind
specBind rhs_subst (NonRec fn rhs) body_uds
  = do { (rhs', rhs_uds) <- specExpr rhs_subst rhs
       ; (fn', spec_defns, body_uds1) <- specDefn rhs_subst body_uds fn rhs

       ; let pairs = spec_defns ++ [(fn', rhs')]
                        -- fn' mentions the spec_defns in its rules,
                        -- so put the latter first

             combined_uds = body_uds1 `plusUDs` rhs_uds
                -- This way round a call in rhs_uds of a function f
                -- at type T will override a call of f at T in body_uds1; and
                -- that is good because it'll tend to keep "earlier" calls
                -- See Note [Specialisation of dictionary functions]

             (free_uds, dump_dbs, float_all) = dumpBindUDs [fn] combined_uds
                -- See Note [From non-recursive to recursive]

             final_binds | isEmptyBag dump_dbs = [NonRec b r | (b,r) <- pairs]
                         | otherwise = [Rec (flattenDictBinds dump_dbs pairs)]

         ; if float_all then
             -- Rather than discard the calls mentioning the bound variables
             -- we float this binding along with the others
              return ([], free_uds `snocDictBinds` final_binds)
           else
             -- No call in final_uds mentions bound variables,
             -- so we can just leave the binding here
              return (final_binds, free_uds) }


specBind rhs_subst (Rec pairs) body_uds
       -- Note [Specialising a recursive group]
  = do { let (bndrs,rhss) = unzip pairs
       ; (rhss', rhs_uds) <- mapAndCombineSM (specExpr rhs_subst) rhss
       ; let scope_uds = body_uds `plusUDs` rhs_uds
                       -- Includes binds and calls arising from rhss

       ; (bndrs1, spec_defns1, uds1) <- specDefns rhs_subst scope_uds pairs

       ; (bndrs3, spec_defns3, uds3)
             <- if null spec_defns1  -- Common case: no specialisation
                then return (bndrs1, [], uds1)
                else do {            -- Specialisation occurred; do it again
                          (bndrs2, spec_defns2, uds2)
                              <- specDefns rhs_subst uds1 (bndrs1 `zip` rhss)
                        ; return (bndrs2, spec_defns2 ++ spec_defns1, uds2) }

       ; let (final_uds, dumped_dbs, float_all) = dumpBindUDs bndrs uds3
             bind = Rec (flattenDictBinds dumped_dbs $
                         spec_defns3 ++ zip bndrs3 rhss')

       ; if float_all then
              return ([], final_uds `snocDictBind` bind)
           else
              return ([bind], final_uds) }


---------------------------
specDefns :: Subst
          -> UsageDetails               -- Info on how it is used in its scope
          -> [(Id,CoreExpr)]            -- The things being bound and their un-processed RHS
          -> SpecM ([Id],               -- Original Ids with RULES added
                    [(Id,CoreExpr)],    -- Extra, specialised bindings
                    UsageDetails)       -- Stuff to fling upwards from the specialised versions

-- Specialise a list of bindings (the contents of a Rec), but flowing usages
-- upwards binding by binding.  Example: { f = ...g ...; g = ...f .... }
-- Then if the input CallDetails has a specialised call for 'g', whose specialisation
-- in turn generates a specialised call for 'f', we catch that in this one sweep.
-- But not vice versa (it's a fixpoint problem).

specDefns _subst uds []
  = return ([], [], uds)
specDefns subst uds ((bndr,rhs):pairs)
  = do { (bndrs1, spec_defns1, uds1) <- specDefns subst uds pairs
       ; (bndr1, spec_defns2, uds2)  <- specDefn subst uds1 bndr rhs
       ; return (bndr1 : bndrs1, spec_defns1 ++ spec_defns2, uds2) }

---------------------------
specDefn :: Subst
         -> UsageDetails                -- Info on how it is used in its scope
         -> Id -> CoreExpr              -- The thing being bound and its un-processed RHS
         -> SpecM (Id,                  -- Original Id with added RULES
                   [(Id,CoreExpr)],     -- Extra, specialised bindings
                   UsageDetails)        -- Stuff to fling upwards from the specialised versions

specDefn subst body_uds fn rhs
  = do { let (body_uds_without_me, calls_for_me) = callsForMe fn body_uds
             rules_for_me = idCoreRules fn
       ; (rules, spec_defns, spec_uds) <- specCalls subst rules_for_me
                                                    calls_for_me fn rhs
       ; return ( fn `addIdSpecialisations` rules
                , spec_defns
                , body_uds_without_me `plusUDs` spec_uds) }
                -- It's important that the `plusUDs` is this way
                -- round, because body_uds_without_me may bind
                -- dictionaries that are used in calls_for_me passed
                -- to specDefn.  So the dictionary bindings in
                -- spec_uds may mention dictionaries bound in
                -- body_uds_without_me

---------------------------
specCalls :: Subst
          -> [CoreRule]                 -- Existing RULES for the fn
          -> [CallInfo]
          -> Id -> CoreExpr
          -> SpecM ([CoreRule],         -- New RULES for the fn
                    [(Id,CoreExpr)],    -- Extra, specialised bindings
                    UsageDetails)       -- New usage details from the specialised RHSs

-- This function checks existing rules, and does not create
-- duplicate ones. So the caller does not need to do this filtering.
-- See 'already_covered'

specCalls subst rules_for_me calls_for_me fn rhs
        -- The first case is the interesting one
  |  rhs_tyvars `lengthIs`     n_tyvars -- Rhs of fn's defn has right number of big lambdas
  && rhs_ids    `lengthAtLeast` n_dicts -- and enough dict args
  && notNull calls_for_me               -- And there are some calls to specialise
  && not (isNeverActive (idInlineActivation fn))
        -- Don't specialise NOINLINE things
        -- See Note [Auto-specialisation and RULES]

--   && not (certainlyWillInline (idUnfolding fn))      -- And it's not small
--      See Note [Inline specialisation] for why we do not
--      switch off specialisation for inline functions

  = -- pprTrace "specDefn: some" (ppr fn $$ ppr calls_for_me $$ ppr rules_for_me) $
    do { stuff <- mapM spec_call calls_for_me
       ; let (spec_defns, spec_uds, spec_rules) = unzip3 (catMaybes stuff)
       ; return (spec_rules, spec_defns, plusUDList spec_uds) }

  | otherwise   -- No calls or RHS doesn't fit our preconceptions
  = WARN( notNull calls_for_me, ptext (sLit "Missed specialisation opportunity for")
                                 <+> ppr fn $$ _trace_doc )
          -- Note [Specialisation shape]