1. 30 Aug, 2010 2 commits
  2. 30 Oct, 2009 1 commit
  3. 29 Oct, 2009 1 commit
    • simonpj@microsoft.com's avatar
      The Big INLINE Patch: totally reorganise way that INLINE pragmas work · 72462499
      simonpj@microsoft.com authored
      This patch has been a long time in gestation and has, as a
      result, accumulated some extra bits and bobs that are only
      loosely related.  I separated the bits that are easy to split
      off, but the rest comes as one big patch, I'm afraid.
      
      Note that:
       * It comes together with a patch to the 'base' library
       * Interface file formats change slightly, so you need to
         recompile all libraries
      
      The patch is mainly giant tidy-up, driven in part by the
      particular stresses of the Data Parallel Haskell project. I don't
      expect a big performance win for random programs.  Still, here are the
      nofib results, relative to the state of affairs without the patch
      
              Program           Size    Allocs   Runtime   Elapsed
      --------------------------------------------------------------------------------
                  Min         -12.7%    -14.5%    -17.5%    -17.8%
                  Max          +4.7%    +10.9%     +9.1%     +8.4%
       Geometric Mean          +0.9%     -0.1%     -5.6%     -7.3%
      
      The +10.9% allocation outlier is rewrite, which happens to have a
      very delicate optimisation opportunity involving an interaction
      of CSE and inlining (see nofib/Simon-nofib-notes). The fact that
      the 'before' case found the optimisation is somewhat accidental.
      Runtimes seem to go down, but I never kno wwhether to really trust
      this number.  Binary sizes wobble a bit, but nothing drastic.
      
      
      The Main Ideas are as follows.
      
      InlineRules
      ~~~~~~~~~~~
      When you say 
            {-# INLINE f #-}
            f x = <rhs>
      you intend that calls (f e) are replaced by <rhs>[e/x] So we
      should capture (\x.<rhs>) in the Unfolding of 'f', and never meddle
      with it.  Meanwhile, we can optimise <rhs> to our heart's content,
      leaving the original unfolding intact in Unfolding of 'f'.
      
      So the representation of an Unfolding has changed quite a bit
      (see CoreSyn).  An INLINE pragma gives rise to an InlineRule 
      unfolding.  
      
      Moreover, it's only used when 'f' is applied to the
      specified number of arguments; that is, the number of argument on 
      the LHS of the '=' sign in the original source definition. 
      For example, (.) is now defined in the libraries like this
         {-# INLINE (.) #-}
         (.) f g = \x -> f (g x)
      so that it'll inline when applied to two arguments. If 'x' appeared
      on the left, thus
         (.) f g x = f (g x)
      it'd only inline when applied to three arguments.  This slightly-experimental
      change was requested by Roman, but it seems to make sense.
      
      Other associated changes
      
      * Moving the deck chairs in DsBinds, which processes the INLINE pragmas
      
      * In the old system an INLINE pragma made the RHS look like
         (Note InlineMe <rhs>)
        The Note switched off optimisation in <rhs>.  But it was quite
        fragile in corner cases. The new system is more robust, I believe.
        In any case, the InlineMe note has disappeared 
      
      * The workerInfo of an Id has also been combined into its Unfolding,
        so it's no longer a separate field of the IdInfo.
      
      * Many changes in CoreUnfold, esp in callSiteInline, which is the critical
        function that decides which function to inline.  Lots of comments added!
      
      * exprIsConApp_maybe has moved to CoreUnfold, since it's so strongly
        associated with "does this expression unfold to a constructor application".
        It can now do some limited beta reduction too, which Roman found 
        was an important.
      
      Instance declarations
      ~~~~~~~~~~~~~~~~~~~~~
      It's always been tricky to get the dfuns generated from instance
      declarations to work out well.  This is particularly important in 
      the Data Parallel Haskell project, and I'm now on my fourth attempt,
      more or less.
      
      There is a detailed description in TcInstDcls, particularly in
      Note [How instance declarations are translated].   Roughly speaking
      we now generate a top-level helper function for every method definition
      in an instance declaration, so that the dfun takes a particularly
      stylised form:
        dfun a d1 d2 = MkD (op1 a d1 d2) (op2 a d1 d2) ...etc...
      
      In fact, it's *so* stylised that we never need to unfold a dfun.
      Instead ClassOps have a special rewrite rule that allows us to
      short-cut dictionary selection.  Suppose dfun :: Ord a -> Ord [a]
                                                  d :: Ord a
      Then   
          compare (dfun a d)  -->   compare_list a d 
      in one rewrite, without first inlining the 'compare' selector
      and the body of the dfun.
      
      To support this
      a) ClassOps have a BuiltInRule (see MkId.dictSelRule)
      b) DFuns have a special form of unfolding (CoreSyn.DFunUnfolding)
         which is exploited in CoreUnfold.exprIsConApp_maybe
      
      Implmenting all this required a root-and-branch rework of TcInstDcls
      and bits of TcClassDcl.
      
      
      Default methods
      ~~~~~~~~~~~~~~~
      If you give an INLINE pragma to a default method, it should be just
      as if you'd written out that code in each instance declaration, including
      the INLINE pragma.  I think that it now *is* so.  As a result, library
      code can be simpler; less duplication.
      
      
      The CONLIKE pragma
      ~~~~~~~~~~~~~~~~~~
      In the DPH project, Roman found cases where he had
      
         p n k = let x = replicate n k
                 in ...(f x)...(g x)....
      
         {-# RULE f (replicate x) = f_rep x #-}
      
      Normally the RULE would not fire, because doing so involves 
      (in effect) duplicating the redex (replicate n k).  A new
      experimental modifier to the INLINE pragma, {-# INLINE CONLIKE
      replicate #-}, allows you to tell GHC to be prepared to duplicate
      a call of this function if it allows a RULE to fire.
      
      See Note [CONLIKE pragma] in BasicTypes
      
      
      Join points
      ~~~~~~~~~~~
      See Note [Case binders and join points] in Simplify
      
      
      Other refactoring
      ~~~~~~~~~~~~~~~~~
      * I moved endPass from CoreLint to CoreMonad, with associated jigglings
      
      * Better pretty-printing of Core
      
      * The top-level RULES (ones that are not rules for locally-defined things)
        are now substituted on every simplifier iteration.  I'm not sure how
        we got away without doing this before.  This entails a bit more plumbing
        in SimplCore.
      
      * The necessary stuff to serialise and deserialise the new
        info across interface files.
      
      * Something about bottoming floats in SetLevels
            Note [Bottoming floats]
      
      * substUnfolding has moved from SimplEnv to CoreSubs, where it belongs
      
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs   Runtime   Elapsed
      --------------------------------------------------------------------------------
                 anna          +2.4%     -0.5%      0.16      0.17
                 ansi          +2.6%     -0.1%      0.00      0.00
                 atom          -3.8%     -0.0%     -1.0%     -2.5%
               awards          +3.0%     +0.7%      0.00      0.00
               banner          +3.3%     -0.0%      0.00      0.00
           bernouilli          +2.7%     +0.0%     -4.6%     -6.9%
                boyer          +2.6%     +0.0%      0.06      0.07
               boyer2          +4.4%     +0.2%      0.01      0.01
                 bspt          +3.2%     +9.6%      0.02      0.02
            cacheprof          +1.4%     -1.0%    -12.2%    -13.6%
             calendar          +2.7%     -1.7%      0.00      0.00
             cichelli          +3.7%     -0.0%      0.13      0.14
              circsim          +3.3%     +0.0%     -2.3%     -9.9%
             clausify          +2.7%     +0.0%      0.05      0.06
        comp_lab_zift          +2.6%     -0.3%     -7.2%     -7.9%
             compress          +3.3%     +0.0%     -8.5%     -9.6%
            compress2          +3.6%     +0.0%    -15.1%    -17.8%
          constraints          +2.7%     -0.6%    -10.0%    -10.7%
         cryptarithm1          +4.5%     +0.0%     -4.7%     -5.7%
         cryptarithm2          +4.3%    -14.5%      0.02      0.02
                  cse          +4.4%     -0.0%      0.00      0.00
                eliza          +2.8%     -0.1%      0.00      0.00
                event          +2.6%     -0.0%     -4.9%     -4.4%
               exp3_8          +2.8%     +0.0%     -4.5%     -9.5%
               expert          +2.7%     +0.3%      0.00      0.00
                  fem          -2.0%     +0.6%      0.04      0.04
                  fft          -6.0%     +1.8%      0.05      0.06
                 fft2          -4.8%     +2.7%      0.13      0.14
             fibheaps          +2.6%     -0.6%      0.05      0.05
                 fish          +4.1%     +0.0%      0.03      0.04
                fluid          -2.1%     -0.2%      0.01      0.01
               fulsom          -4.8%     +9.2%     +9.1%     +8.4%
               gamteb          -7.1%     -1.3%      0.10      0.11
                  gcd          +2.7%     +0.0%      0.05      0.05
          gen_regexps          +3.9%     -0.0%      0.00      0.00
               genfft          +2.7%     -0.1%      0.05      0.06
                   gg          -2.7%     -0.1%      0.02      0.02
                 grep          +3.2%     -0.0%      0.00      0.00
               hidden          -0.5%     +0.0%    -11.9%    -13.3%
                  hpg          -3.0%     -1.8%     +0.0%     -2.4%
                  ida          +2.6%     -1.2%      0.17     -9.0%
                infer          +1.7%     -0.8%      0.08      0.09
              integer          +2.5%     -0.0%     -2.6%     -2.2%
            integrate          -5.0%     +0.0%     -1.3%     -2.9%
              knights          +4.3%     -1.5%      0.01      0.01
                 lcss          +2.5%     -0.1%     -7.5%     -9.4%
                 life          +4.2%     +0.0%     -3.1%     -3.3%
                 lift          +2.4%     -3.2%      0.00      0.00
            listcompr          +4.0%     -1.6%      0.16      0.17
             listcopy          +4.0%     -1.4%      0.17      0.18
             maillist          +4.1%     +0.1%      0.09      0.14
               mandel          +2.9%     +0.0%      0.11      0.12
              mandel2          +4.7%     +0.0%      0.01      0.01
              minimax          +3.8%     -0.0%      0.00      0.00
              mkhprog          +3.2%     -4.2%      0.00      0.00
           multiplier          +2.5%     -0.4%     +0.7%     -1.3%
             nucleic2          -9.3%     +0.0%      0.10      0.10
                 para          +2.9%     +0.1%     -0.7%     -1.2%
            paraffins         -10.4%     +0.0%      0.20     -1.9%
               parser          +3.1%     -0.0%      0.05      0.05
              parstof          +1.9%     -0.0%      0.00      0.01
                  pic          -2.8%     -0.8%      0.01      0.02
                power          +2.1%     +0.1%     -8.5%     -9.0%
               pretty         -12.7%     +0.1%      0.00      0.00
               primes          +2.8%     +0.0%      0.11      0.11
            primetest          +2.5%     -0.0%     -2.1%     -3.1%
               prolog          +3.2%     -7.2%      0.00      0.00
               puzzle          +4.1%     +0.0%     -3.5%     -8.0%
               queens          +2.8%     +0.0%      0.03      0.03
              reptile          +2.2%     -2.2%      0.02      0.02
              rewrite          +3.1%    +10.9%      0.03      0.03
                 rfib          -5.2%     +0.2%      0.03      0.03
                  rsa          +2.6%     +0.0%      0.05      0.06
                  scc          +4.6%     +0.4%      0.00      0.00
                sched          +2.7%     +0.1%      0.03      0.03
                  scs          -2.6%     -0.9%     -9.6%    -11.6%
               simple          -4.0%     +0.4%    -14.6%    -14.9%
                solid          -5.6%     -0.6%     -9.3%    -14.3%
              sorting          +3.8%     +0.0%      0.00      0.00
               sphere          -3.6%     +8.5%      0.15      0.16
               symalg          -1.3%     +0.2%      0.03      0.03
                  tak          +2.7%     +0.0%      0.02      0.02
            transform          +2.0%     -2.9%     -8.0%     -8.8%
             treejoin          +3.1%     +0.0%    -17.5%    -17.8%
            typecheck          +2.9%     -0.3%     -4.6%     -6.6%
              veritas          +3.9%     -0.3%      0.00      0.00
                 wang          -6.2%     +0.0%      0.18     -9.8%
            wave4main         -10.3%     +2.6%     -2.1%     -2.3%
         wheel-sieve1          +2.7%     -0.0%     +0.3%     -0.6%
         wheel-sieve2          +2.7%     +0.0%     -3.7%     -7.5%
                 x2n1          -4.1%     +0.1%      0.03      0.04
      --------------------------------------------------------------------------------
                  Min         -12.7%    -14.5%    -17.5%    -17.8%
                  Max          +4.7%    +10.9%     +9.1%     +8.4%
       Geometric Mean          +0.9%     -0.1%     -5.6%     -7.3%
      72462499
  4. 13 Jul, 2009 2 commits
  5. 07 Mar, 2009 1 commit
  6. 20 Sep, 2008 1 commit
    • simonpj@microsoft.com's avatar
      Tidy up the treatment of dead binders · 7e8cba32
      simonpj@microsoft.com authored
      This patch does a lot of tidying up of the way that dead variables are
      handled in Core.  Just the sort of thing to do on an aeroplane.
      
      * The tricky "binder-swap" optimisation is moved from the Simplifier
        to the Occurrence Analyser.  See Note [Binder swap] in OccurAnal.
        This is really a nice change.  It should reduce the number of
        simplifier iteratoins (slightly perhaps).  And it means that
        we can be much less pessimistic about zapping occurrence info
        on binders in a case expression.  
      
      * For example:
      	case x of y { (a,b) -> e }
        Previously, each time around, even if y,a,b were all dead, the
        Simplifier would pessimistically zap their OccInfo, so that we
        can't see they are dead any more.  As a result virtually no 
        case expression ended up with dead binders.  This wasn't Bad
        in itself, but it always felt wrong.
      
      * I added a check to CoreLint to check that a dead binder really
        isn't used.  That showed up a couple of bugs in CSE. (Only in
        this sense -- they didn't really matter.)
        
      * I've changed the PprCore printer to print "_" for a dead variable.
        (Use -dppr-debug to see it again.)  This reduces clutter quite a
        bit, and of course it's much more useful with the above change.
      
      * Another benefit of the binder-swap change is that I could get rid of
        the Simplifier hack (working, but hacky) in which the InScopeSet was
        used to map a variable to a *different* variable. That allowed me
        to remove VarEnv.modifyInScopeSet, and to simplify lookupInScopeSet
        so that it doesn't look for a fixpoint.  This fixes no bugs, but 
        is a useful cleanup.
      
      * Roman pointed out that Id.mkWildId is jolly dangerous, because
        of its fixed unique.  So I've 
      
           - localied it to MkCore, where it is private (not exported)
      
           - renamed it to 'mkWildBinder' to stress that you should only
             use it at binding sites, unless you really know what you are
             doing
      
           - provided a function MkCore.mkWildCase that emodies the most
             common use of mkWildId, and use that elsewhere
      
         So things are much better
      
      * A knock-on change is that I found a common pattern of localising
        a potentially global Id, and made a function for it: Id.localiseId
      7e8cba32
  7. 29 Mar, 2008 1 commit
  8. 03 Feb, 2008 1 commit
  9. 04 Sep, 2007 1 commit
  10. 03 Sep, 2007 1 commit
  11. 01 Sep, 2007 1 commit
  12. 08 Aug, 2007 1 commit
  13. 01 Aug, 2007 2 commits
  14. 31 Jul, 2007 2 commits