1. 18 Jan, 2016 1 commit
    • Simon Peyton Jones's avatar
      Refactoring on IdInfo and system derived names · ec8a188a
      Simon Peyton Jones authored
      Some modest refactoring, triggered in part by Trac #11051
      
      * Kill off PatSynId, ReflectionId in IdDetails
        They were barely used, and only for pretty-printing
      
      * Add helper function Id.mkExportedVanillaId, and use it
      
      * Polish up OccName.isDerivedOccName, as a predicate for
        definitions generated internally by GHC, which we
        might not want to show to the user.
      
      * Kill off unused OccName.mkDerivedTyConOcc
      
      * Shorten the derived OccNames for newtype and data
        instance axioms
      
      * A bit of related refactoring around newFamInstAxiomName
      ec8a188a
  2. 07 Jan, 2016 1 commit
  3. 11 Dec, 2015 1 commit
    • eir@cis.upenn.edu's avatar
      Add kind equalities to GHC. · 67465497
      eir@cis.upenn.edu authored
      This implements the ideas originally put forward in
      "System FC with Explicit Kind Equality" (ICFP'13).
      
      There are several noteworthy changes with this patch:
       * We now have casts in types. These change the kind
         of a type. See new constructor `CastTy`.
      
       * All types and all constructors can be promoted.
         This includes GADT constructors. GADT pattern matches
         take place in type family equations. In Core,
         types can now be applied to coercions via the
         `CoercionTy` constructor.
      
       * Coercions can now be heterogeneous, relating types
         of different kinds. A coercion proving `t1 :: k1 ~ t2 :: k2`
         proves both that `t1` and `t2` are the same and also that
         `k1` and `k2` are the same.
      
       * The `Coercion` type has been significantly enhanced.
         The documentation in `docs/core-spec/core-spec.pdf` reflects
         the new reality.
      
       * The type of `*` is now `*`. No more `BOX`.
      
       * Users can write explicit kind variables in their code,
         anywhere they can write type variables. For backward compatibility,
         automatic inference of kind-variable binding is still permitted.
      
       * The new extension `TypeInType` turns on the new user-facing
         features.
      
       * Type families and synonyms are now promoted to kinds. This causes
         trouble with parsing `*`, leading to the somewhat awkward new
         `HsAppsTy` constructor for `HsType`. This is dispatched with in
         the renamer, where the kind `*` can be told apart from a
         type-level multiplication operator. Without `-XTypeInType` the
         old behavior persists. With `-XTypeInType`, you need to import
         `Data.Kind` to get `*`, also known as `Type`.
      
       * The kind-checking algorithms in TcHsType have been significantly
         rewritten to allow for enhanced kinds.
      
       * The new features are still quite experimental and may be in flux.
      
       * TODO: Several open tickets: #11195, #11196, #11197, #11198, #11203.
      
       * TODO: Update user manual.
      
      Tickets addressed: #9017, #9173, #7961, #10524, #8566, #11142.
      Updates Haddock submodule.
      67465497
  4. 21 Nov, 2015 1 commit
    • niteria's avatar
      Create a deterministic version of tyVarsOfType · 2325bd4e
      niteria authored
      I've run into situations where I need deterministic `tyVarsOfType` and
      this implementation achieves that and also brings an algorithmic
      improvement.  Union of two `VarSet`s takes linear time the size of the
      sets and in the worst case we can have `n` unions of sets of sizes
      `(n-1, 1), (n-2, 1)...` making it quadratic.
      
      One reason why we need deterministic `tyVarsOfType` is in `abstractVars`
      in `SetLevels`. When we abstract type variables when floating we want
      them to be abstracted in deterministic order.
      
      Test Plan: harbormaster
      
      Reviewers: simonpj, goldfire, austin, hvr, simonmar, bgamari
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1468
      
      GHC Trac Issues: #4012
      2325bd4e
  5. 16 Nov, 2015 1 commit
  6. 07 Nov, 2015 1 commit
    • Matthew Pickering's avatar
      Remove PatSynBuilderId · 22080113
      Matthew Pickering authored
      Summary:
      It was only used to pass field labels between the typechecker and
      desugarer. Instead we add an extra field the RecordCon to carry this
      information.
      
      Reviewers: austin, goldfire, bgamari
      
      Reviewed By: bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1443
      
      GHC Trac Issues: #11057
      22080113
  7. 29 Oct, 2015 1 commit
    • Matthew Pickering's avatar
      Record pattern synonyms · 2a74a64e
      Matthew Pickering authored
      This patch implements an extension to pattern synonyms which allows user
      to specify pattern synonyms using record syntax. Doing so generates
      appropriate selectors and update functions.
      
      === Interaction with Duplicate Record Fields ===
      
      The implementation given here isn't quite as general as it could be with
      respect to the recently-introduced `DuplicateRecordFields` extension.
      
      Consider the following module:
      
          {-# LANGUAGE DuplicateRecordFields #-}
          {-# LANGUAGE PatternSynonyms #-}
      
          module Main where
      
          pattern S{a, b} = (a, b)
          pattern T{a}    = Just a
      
          main = do
            print S{ a = "fst", b = "snd" }
            print T{ a = "a" }
      
      In principle, this ought to work, because there is no ambiguity. But at
      the moment it leads to a "multiple declarations of a" error. The problem
      is that pattern synonym record selectors don't do the same name mangling
      as normal datatypes when DuplicateRecordFields is enabled. They could,
      but this would require some work to track the field label and selector
      name separately.
      
      In particular, we currently represent datatype selectors in the third
      component of AvailTC, but pattern synonym selectors are just represented
      as Avails (because they don't have a corresponding type constructor).
      Moreover, the GlobalRdrElt for a selector currently requires it to have
      a parent tycon.
      
      (example due to Adam Gundry)
      
      === Updating Explicitly Bidirectional Pattern Synonyms ===
      
      Consider the following
      
      ```
      pattern Silly{a} <- [a] where
        Silly a = [a, a]
      
      f1 = a [5] -- 5
      
      f2 = [5] {a = 6} -- currently [6,6]
      ```
      
      === Fixing Polymorphic Updates ===
      
      They were fixed by adding these two lines in `dsExpr`. This might break
      record updates but will be easy to fix.
      
      ```
      + ; let req_wrap = mkWpTyApps (mkTyVarTys univ_tvs)
      
      - , pat_wrap = idHsWrapper }
      +, pat_wrap = req_wrap }
      ```
      
      === Mixed selectors error ===
      
      Note [Mixed Record Field Updates]
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Consider the following pattern synonym.
      
          data MyRec = MyRec { foo :: Int, qux :: String }
      
          pattern HisRec{f1, f2} = MyRec{foo = f1, qux=f2}
      
      This allows updates such as the following
      
          updater :: MyRec -> MyRec
          updater a = a {f1 = 1 }
      
      It would also make sense to allow the following update (which we
      reject).
      
          updater a = a {f1 = 1, qux = "two" } ==? MyRec 1 "two"
      
      This leads to confusing behaviour when the selectors in fact refer the
      same field.
      
          updater a = a {f1 = 1, foo = 2} ==? ???
      
      For this reason, we reject a mixture of pattern synonym and normal
      record selectors in the same update block. Although of course we still
      allow the following.
      
          updater a = (a {f1 = 1}) {foo = 2}
      
          > updater (MyRec 0 "str")
          MyRec 2 "str"
      2a74a64e
  8. 10 Oct, 2015 1 commit
  9. 27 Aug, 2015 1 commit
  10. 01 Jun, 2015 1 commit
  11. 11 May, 2015 1 commit
    • Joachim Breitner's avatar
      IdInfo comment update · 27aa733a
      Joachim Breitner authored
      occInfo and callArityInfo is like demandInfo and oneShotInfo: Data about
      how the Id is used. [skip ci]
      27aa733a
  12. 14 Apr, 2015 1 commit
    • Simon Peyton Jones's avatar
      Zap usage info in CSE (Trac #10218) · d261d4cb
      Simon Peyton Jones authored
      Trac #10218 reports a subtle bug that turned out to be:
      
      - CSE invalidated the usage information computed
        by earlier demand analysis, by increasing sharing
      
      - that made a single-entry thunk into a multi-entry thunk
      
      - and with -feager-blackholing, that led to <<loop>>
      
      The patch fixes it by making the CSE pass zap usage information for
      let-bound identifiers.   It can be restored by -flate-dmd-anal.
      
      (But making -flate-dmd-anal the default needs some careful work;
      see Trac #7782.)
      d261d4cb
  13. 10 Feb, 2015 1 commit
  14. 23 Dec, 2014 1 commit
    • Simon Peyton Jones's avatar
      Eliminate so-called "silent superclass parameters" · a6f0f5ab
      Simon Peyton Jones authored
      The purpose of silent superclass parameters was to solve the
      awkward problem of superclass dictinaries being bound to bottom.
      See THE PROBLEM in Note [Recursive superclasses] in TcInstDcls
      
      Although the silent-superclass idea worked,
      
        * It had non-local consequences, and had effects even in Haddock,
          where we had to discard silent parameters before displaying
          instance declarations
      
        * It had unexpected peformance costs, shown up by Trac #3064 and its
          test case.  In monad-transformer code, when constructing a Monad
          dictionary you had to pass an Applicative dictionary; and to
          construct that you neede a Functor dictionary. Yet these extra
          dictionaries were often never used.  (All this got much worse when
          we added Applicative as a superclass of Monad.) Test T3064
          compiled *far* faster after silent superclasses were eliminated.
      
        * It introduced new bugs.  For example SilentParametersOverlapping,
          T5051, and T7862, all failed to compile because of instance overlap
          directly because of the silent-superclass trick.
      
      So this patch takes a new approach, which I worked out with Dimitrios
      in the closing hours before Christmas.  It is described in detail
      in THE PROBLEM in Note [Recursive superclasses] in TcInstDcls.
      
      Seems to work great!
      
      Quite a bit of knock-on effect
      
       * The main implementation work is in tcSuperClasses in TcInstDcls
         Everything else is fall-out
      
       * IdInfo.DFunId no longer needs its n-silent argument
         * Ditto IDFunId in IfaceSyn
         * Hence interface file format changes
      
       * Now that DFunIds do not have silent superclass parameters, printing
         out instance declarations is simpler. There is tiny knock-on effect
         in Haddock, so that submodule is updated
      
       * I realised that when computing the "size of a dictionary type"
         in TcValidity.sizePred, we should be rather conservative about
         type functions, which can arbitrarily increase the size of a type.
         Hence the new datatype TypeSize, which has a TSBig constructor for
         "arbitrarily big".
      
       * instDFunType moves from TcSMonad to Inst
      
       * Interestingly, CmmNode and CmmExpr both now need a non-silent
         (Ord r) in a couple of instance declarations. These were previously
         silent but must now be explicit.
      
       * Quite a bit of wibbling in error messages
      a6f0f5ab
  15. 03 Dec, 2014 1 commit
  16. 20 Aug, 2014 1 commit
  17. 15 May, 2014 1 commit
    • Herbert Valerio Riedel's avatar
      Add LANGUAGE pragmas to compiler/ source files · 23892440
      Herbert Valerio Riedel authored
      In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
      reorganized, while following the convention, to
      
      - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
        any `{-# OPTIONS_GHC #-}`-lines.
      
      - Moreover, if the list of language extensions fit into a single
        `{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one
        line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each
        individual language extension. In both cases, try to keep the
        enumeration alphabetically ordered.
        (The latter layout is preferable as it's more diff-friendly)
      
      While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
      occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
      23892440
  18. 10 Feb, 2014 1 commit
    • Joachim Breitner's avatar
      Implement CallArity analysis · cdceadf3
      Joachim Breitner authored
      This analysis finds out if a let-bound expression with lower manifest
      arity than type arity is always called with more arguments, as in that
      case eta-expansion is allowed and often viable. The analysis is very
      much tailored towards the code generated when foldl is implemented via
      foldr; without this analysis doing so would be a very bad idea!
      
      There are other ways to improve foldr/builder-fusion to cope with foldl,
      if any of these are implemented then this step can probably be moved to
      -O2 to save some compilation times. The current impact of adding this
      phase is just below +2% (measured running GHC's "make").
      cdceadf3
  19. 12 Dec, 2013 1 commit
    • Simon Peyton Jones's avatar
      Improve the handling of used-once stuff · 80989de9
      Simon Peyton Jones authored
      Joachim and I are committing this onto a branch so that we can share it,
      but we expect to do a bit more work before merging it onto head.
      
      Nofib staus:
        - Most programs, no change
        - A few improve
        - A couple get worse (cacheprof, tak, rfib)
      Investigating the "get worse" set is what's holding up putting this
      on head.
      
      The major issue is this.  Consider
      
          map (f g) ys
      
      where f's demand signature looks like
      
         f :: <L,C1(C1(U))> -> <L,U> -> .
      
      So 'f' is not saturated.  What demand do we place on g?
      Answer
              C(C1(U))
      That is, the inner C1 should stay, even though f is not saturated.
      
      I found that this made a significant difference in the demand signatures
      inferred in GHC.IO, which uses lots of higher-order exception handlers.
      
      I also had to add used-once demand signatures for some of the
      'catch' primops, so that we know their handlers are only called once.
      80989de9
  20. 09 Dec, 2013 1 commit
    • Joachim Breitner's avatar
      Rename topDmdType to nopDmdType · f64cf134
      Joachim Breitner authored
      because topDmdType is ''not'' the top of the lattice, as it puts an
      implicit absent demand on free variables, but Abs is the bottom of the
      Usage lattice.
      
      Why nopDmdType? Becuase it is the demand of doing nothing: Everything
      lazy, everything absent, no definite divergence.
      f64cf134
  21. 01 Oct, 2013 1 commit
  22. 02 Feb, 2013 1 commit
  23. 17 Jan, 2013 1 commit
    • Simon Peyton Jones's avatar
      Major patch to implement the new Demand Analyser · 0831a12e
      Simon Peyton Jones authored
      This patch is the result of Ilya Sergey's internship at MSR.  It
      constitutes a thorough overhaul and simplification of the demand
      analyser.  It makes a solid foundation on which we can now build.
      Main changes are
      
      * Instead of having one combined type for Demand, a Demand is
         now a pair (JointDmd) of
            - a StrDmd and
            - an AbsDmd.
         This allows strictness and absence to be though about quite
         orthogonally, and greatly reduces brain melt-down.
      
      * Similarly in the DmdResult type, it's a pair of
           - a PureResult (indicating only divergence/non-divergence)
           - a CPRResult (which deals only with the CPR property
      
      * In IdInfo, the
          strictnessInfo field contains a StrictSig, not a Maybe StrictSig
          demandInfo     field contains a Demand, not a Maybe Demand
        We don't need Nothing (to indicate no strictness/demand info)
        any more; topSig/topDmd will do.
      
      * Remove "boxity" analysis entirely.  This was an attempt to
        avoid "reboxing", but it added complexity, is extremely
        ad-hoc, and makes very little difference in practice.
      
      * Remove the "unboxing strategy" computation. This was an an
        attempt to ensure that a worker didn't get zillions of
        arguments by unboxing big tuples.  But in fact removing it
        DRAMATICALLY reduces allocation in an inner loop of the
        I/O library (where the threshold argument-count had been
        set just too low).  It's exceptional to have a zillion arguments
        and I don't think it's worth the complexity, especially since
        it turned out to have a serious performance hit.
      
      * Remove quite a bit of ad-hoc cruft
      
      * Move worthSplittingFun, worthSplittingThunk from WorkWrap to
        Demand. This allows JointDmd to be fully abstract, examined
        only inside Demand.
      
      Everything else really follows from these changes.
      
      All of this is really just refactoring, so we don't expect
      big performance changes, but acutally the numbers look quite
      good.  Here is a full nofib run with some highlights identified:
      
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
      --------------------------------------------------------------------------------
               expert          -2.6%    -15.5%      0.00      0.00     +0.0%
                fluid          -2.4%     -7.1%      0.01      0.01     +0.0%
                   gg          -2.5%    -28.9%      0.02      0.02    -33.3%
            integrate          -2.6%     +3.2%     +2.6%     +2.6%     +0.0%
              mandel2          -2.6%     +4.2%      0.01      0.01     +0.0%
             nucleic2          -2.0%    -16.3%      0.11      0.11     +0.0%
                 para          -2.6%    -20.0%    -11.8%    -11.7%     +0.0%
               parser          -2.5%    -17.9%      0.05      0.05     +0.0%
               prolog          -2.6%    -13.0%      0.00      0.00     +0.0%
               puzzle          -2.6%     +2.2%     +0.8%     +0.8%     +0.0%
              sorting          -2.6%    -35.9%      0.00      0.00     +0.0%
             treejoin          -2.6%    -52.2%     -9.8%     -9.9%     +0.0%
      --------------------------------------------------------------------------------
                  Min          -2.7%    -52.2%    -11.8%    -11.7%    -33.3%
                  Max          -1.8%     +4.2%    +10.5%    +10.5%     +7.7%
       Geometric Mean          -2.5%     -2.8%     -0.4%     -0.5%     -0.4%
      
      Things to note
      
      * Binary sizes are smaller. I don't know why, but it's good.
      
      * Allocation is sometiemes a *lot* smaller. I believe that all the big numbers
        (I checked treejoin, gg, sorting) arise from one place, namely a function
        GHC.IO.Encoding.UTF8.utf8_decode, which is strict in two Buffers both of
        which have several arugments.  Not w/w'ing both arguments (which is what
        we did before) has a big effect.  So the big win in actually somewhat
        accidental, gained by removing the "unboxing strategy" code.
      
      * A couple of benchmarks allocate slightly more.  This turns out
        to be due to reboxing (integrate).  But the biggest increase is
        mandel2, and *that* turned out also to be a somewhat accidental
        loss of CSE, and pointed the way to doing better CSE: see Trac
        #7596.
      
      * Runtimes are never very reliable, but seem to improve very slightly.
      
      All in all, a good piece of work.  Thank you Ilya!
      0831a12e
  24. 01 Oct, 2012 1 commit
  25. 27 Jun, 2012 2 commits
    • Ian Lynagh's avatar
      Add some more Integer rules; fixes #6111 · c7a8941b
      Ian Lynagh authored
      c7a8941b
    • Simon Peyton Jones's avatar
      Add silent superclass parameters (again) · aa1e0976
      Simon Peyton Jones authored
      Silent superclass parameters solve the problem that
      the superclasses of a dicionary construction can easily
      turn out to be (wrongly) bottom.  The problem and solution
      are described in
         Note [Silent superclass arguments] in TcInstDcls
      
      I first implemented this fix (with Dimitrios) in Dec 2010, but removed
      it again in Jun 2011 becuase we thought it wasn't necessary any
      more. (The reason we thought it wasn't necessary is that we'd stopped
      generating derived superclass constraints for *wanteds*.  But we were
      wrong; that didn't solve the superclass-loop problem.)
      
      So we have to re-implement it.  It's not hard.  Main features:
      
        * The IdDetails for a DFunId says how many silent arguments it has
      
        * A DFunUnfolding describes which dictionary args are
          just parameters (DFunLamArg) and which are a function to apply
          to the parameters (DFunPolyArg).  This adds the DFunArg type
          to CoreSyn
      
        * Consequential changes to IfaceSyn.  (Binary hi file format changes
          slightly.)
      
        * TcInstDcls changes to generate the right dfuns
      
        * CoreSubst.exprIsConApp_maybe handles the new DFunUnfolding
      
      The thing taht is *not* done yet is to alter the vectoriser to
      pass the relevant extra argument when building a PA dictionary.
      aa1e0976
  26. 12 Jun, 2012 1 commit
  27. 04 Nov, 2011 1 commit
  28. 21 Jul, 2011 1 commit
    • Simon Peyton Jones's avatar
      Change loop breaker terminology · e815d4b1
      Simon Peyton Jones authored
      We used to have "loop breaker" and "non-rule loop breaker", but
      the unqualified version in particualr was pretty confusing.  So
      now we have "strong loop breaker" and "weak loop breaker";
      comments in BasicTypes and OccurAnal.
      e815d4b1
  29. 30 Jun, 2011 1 commit
  30. 22 Jun, 2011 1 commit
    • Simon Peyton Jones's avatar
      Remove "silent superclass parameters" · a9d48fd9
      Simon Peyton Jones authored
      We introduced silent superclass parameters as a way to avoid
      superclass loops, but we now solve that problem a different
      way ("derived" superclass constraints carry no evidence). So
      they aren't needed any more.
      
      Apart from being a needless complication, they broke DoCon.
      Admittedly in a very obscure way, but still the result is
      hard to explain. To see the details see Trac #5051, with
      test case typecheck/should_compile/T5051.  (The test is
      nice and small!)
      a9d48fd9
  31. 19 Apr, 2011 1 commit
    • Simon Peyton Jones's avatar
      This BIG PATCH contains most of the work for the New Coercion Representation · fdf86568
      Simon Peyton Jones authored
      See the paper "Practical aspects of evidence based compilation in System FC"
      
      * Coercion becomes a data type, distinct from Type
      
      * Coercions become value-level things, rather than type-level things,
        (although the value is zero bits wide, like the State token)
        A consequence is that a coerion abstraction increases the arity by 1
        (just like a dictionary abstraction)
      
      * There is a new constructor in CoreExpr, namely Coercion, to inject
        coercions into terms
      fdf86568
  32. 19 Feb, 2011 1 commit
  33. 13 Dec, 2010 1 commit
    • simonpj@microsoft.com's avatar
      Fix recursive superclasses (again). Fixes Trac #4809. · a3bab050
      simonpj@microsoft.com authored
      This patch finally deals with the super-delicate question of
      superclases in possibly-recursive dictionaries.  The key idea
      is the DFun Superclass Invariant (see TcInstDcls):
      
           In the body of a DFun, every superclass argument to the
           returned dictionary is
             either   * one of the arguments of the DFun,
             or       * constant, bound at top level
      
      To establish the invariant, we add new "silent" superclass
      argument(s) to each dfun, so that the dfun does not do superclass
      selection internally.  There's a bit of hoo-ha to make sure that
      we don't print those silent arguments in error messages; a knock
      on effect was a change in interface-file format.
      
      A second change is that instead of the complex and fragile
      "self dictionary binding" in TcInstDcls and TcClassDcl,
      using the same mechanism for existential pattern bindings.
      See Note [Subtle interaction of recursion and overlap] in TcInstDcls
      and Note [Binding when looking up instances] in InstEnv.
      
      Main notes are here:
      
        * Note [Silent Superclass Arguments] in TcInstDcls,
          including the DFun Superclass Invariant
      
      Main code changes are:
      
        * The code for MkId.mkDictFunId and mkDictFunTy
      
        * DFunUnfoldings get a little more complicated;
          their arguments are a new type DFunArg (in CoreSyn)
      
        * No "self" argument in tcInstanceMethod
        * No special tcSimplifySuperClasss
        * No "dependents" argument to EvDFunApp
      
      IMPORTANT
         It turns out that it's quite tricky to generate the right
         DFunUnfolding for a specialised dfun, when you use SPECIALISE
         INSTANCE.  For now I've just commented it out (in DsBinds) but
         that'll lose some optimisation, and I need to get back to
         this.
      a3bab050
  34. 20 Oct, 2010 2 commits
  35. 19 Oct, 2010 1 commit
  36. 19 Nov, 2009 2 commits
    • simonpj@microsoft.com's avatar
      Remove the (very) old strictness analyser · 2662dbc5
      simonpj@microsoft.com authored
      I finally got tired of the #ifdef OLD_STRICTNESS stuff.  I had been
      keeping it around in the hope of doing old-to-new comparisions, but
      have failed to do so for many years, so I don't think it's going to
      happen.  This patch deletes the clutter.
      2662dbc5
    • simonpj@microsoft.com's avatar
      Implement -fexpose-all-unfoldings, and fix a non-termination bug · 6a944ae7
      simonpj@microsoft.com authored
      The -fexpose-all-unfoldings flag arranges to put unfoldings for *everything*
      in the interface file.  Of course,  this makes the file a lot bigger, but
      it also makes it complete, and that's great for supercompilation; or indeed
      any whole-program work.
      
      Consequences:
        * Interface files need to record loop-breaker-hood.  (Previously,
          loop breakers were never exposed, so that info wasn't necessary.)
          Hence a small interface file format change. 
      
        * When inlining, must check loop-breaker-hood. (Previously, loop
          breakers didn't have an unfolding at all, so no need to check.)
      
        * Ditto in exprIsConApp_maybe.  Roman actually tripped this bug, 
          because a DFun, which had an unfolding, was also a loop breaker
      
        * TidyPgm.tidyIdInfo must be careful to preserve loop-breaker-hood
      
      So Id.idUnfolding checks for loop-breaker-hood and returns NoUnfolding
      if so. When you want the unfolding regardless of loop-breaker-hood, 
      use Id.realIdUnfolding.
      
      I have not documented the flag yet, because it's experimental.  Nor
      have I tested it thoroughly.  But with the flag off (the normal case)
      everything should work.
      6a944ae7
  37. 29 Oct, 2009 1 commit
    • simonpj@microsoft.com's avatar
      The Big INLINE Patch: totally reorganise way that INLINE pragmas work · 72462499
      simonpj@microsoft.com authored
      This patch has been a long time in gestation and has, as a
      result, accumulated some extra bits and bobs that are only
      loosely related.  I separated the bits that are easy to split
      off, but the rest comes as one big patch, I'm afraid.
      
      Note that:
       * It comes together with a patch to the 'base' library
       * Interface file formats change slightly, so you need to
         recompile all libraries
      
      The patch is mainly giant tidy-up, driven in part by the
      particular stresses of the Data Parallel Haskell project. I don't
      expect a big performance win for random programs.  Still, here are the
      nofib results, relative to the state of affairs without the patch
      
              Program           Size    Allocs   Runtime   Elapsed
      --------------------------------------------------------------------------------
                  Min         -12.7%    -14.5%    -17.5%    -17.8%
                  Max          +4.7%    +10.9%     +9.1%     +8.4%
       Geometric Mean          +0.9%     -0.1%     -5.6%     -7.3%
      
      The +10.9% allocation outlier is rewrite, which happens to have a
      very delicate optimisation opportunity involving an interaction
      of CSE and inlining (see nofib/Simon-nofib-notes). The fact that
      the 'before' case found the optimisation is somewhat accidental.
      Runtimes seem to go down, but I never kno wwhether to really trust
      this number.  Binary sizes wobble a bit, but nothing drastic.
      
      
      The Main Ideas are as follows.
      
      InlineRules
      ~~~~~~~~~~~
      When you say 
            {-# INLINE f #-}
            f x = <rhs>
      you intend that calls (f e) are replaced by <rhs>[e/x] So we
      should capture (\x.<rhs>) in the Unfolding of 'f', and never meddle
      with it.  Meanwhile, we can optimise <rhs> to our heart's content,
      leaving the original unfolding intact in Unfolding of 'f'.
      
      So the representation of an Unfolding has changed quite a bit
      (see CoreSyn).  An INLINE pragma gives rise to an InlineRule 
      unfolding.  
      
      Moreover, it's only used when 'f' is applied to the
      specified number of arguments; that is, the number of argument on 
      the LHS of the '=' sign in the original source definition. 
      For example, (.) is now defined in the libraries like this
         {-# INLINE (.) #-}
         (.) f g = \x -> f (g x)
      so that it'll inline when applied to two arguments. If 'x' appeared
      on the left, thus
         (.) f g x = f (g x)
      it'd only inline when applied to three arguments.  This slightly-experimental
      change was requested by Roman, but it seems to make sense.
      
      Other associated changes
      
      * Moving the deck chairs in DsBinds, which processes the INLINE pragmas
      
      * In the old system an INLINE pragma made the RHS look like
         (Note InlineMe <rhs>)
        The Note switched off optimisation in <rhs>.  But it was quite
        fragile in corner cases. The new system is more robust, I believe.
        In any case, the InlineMe note has disappeared 
      
      * The workerInfo of an Id has also been combined into its Unfolding,
        so it's no longer a separate field of the IdInfo.
      
      * Many changes in CoreUnfold, esp in callSiteInline, which is the critical
        function that decides which function to inline.  Lots of comments added!
      
      * exprIsConApp_maybe has moved to CoreUnfold, since it's so strongly
        associated with "does this expression unfold to a constructor application".
        It can now do some limited beta reduction too, which Roman found 
        was an important.
      
      Instance declarations
      ~~~~~~~~~~~~~~~~~~~~~
      It's always been tricky to get the dfuns generated from instance
      declarations to work out well.  This is particularly important in 
      the Data Parallel Haskell project, and I'm now on my fourth attempt,
      more or less.
      
      There is a detailed description in TcInstDcls, particularly in
      Note [How instance declarations are translated].   Roughly speaking
      we now generate a top-level helper function for every method definition
      in an instance declaration, so that the dfun takes a particularly
      stylised form:
        dfun a d1 d2 = MkD (op1 a d1 d2) (op2 a d1 d2) ...etc...
      
      In fact, it's *so* stylised that we never need to unfold a dfun.
      Instead ClassOps have a special rewrite rule that allows us to
      short-cut dictionary selection.  Suppose dfun :: Ord a -> Ord [a]
                                                  d :: Ord a
      Then   
          compare (dfun a d)  -->   compare_list a d 
      in one rewrite, without first inlining the 'compare' selector
      and the body of the dfun.
      
      To support this
      a) ClassOps have a BuiltInRule (see MkId.dictSelRule)
      b) DFuns have a special form of unfolding (CoreSyn.DFunUnfolding)
         which is exploited in CoreUnfold.exprIsConApp_maybe
      
      Implmenting all this required a root-and-branch rework of TcInstDcls
      and bits of TcClassDcl.
      
      
      Default methods
      ~~~~~~~~~~~~~~~
      If you give an INLINE pragma to a default method, it should be just
      as if you'd written out that code in each instance declaration, including
      the INLINE pragma.  I think that it now *is* so.  As a result, library
      code can be simpler; less duplication.
      
      
      The CONLIKE pragma
      ~~~~~~~~~~~~~~~~~~
      In the DPH project, Roman found cases where he had
      
         p n k = let x = replicate n k
                 in ...(f x)...(g x)....
      
         {-# RULE f (replicate x) = f_rep x #-}
      
      Normally the RULE would not fire, because doing so involves 
      (in effect) duplicating the redex (replicate n k).  A new
      experimental modifier to the INLINE pragma, {-# INLINE CONLIKE
      replicate #-}, allows you to tell GHC to be prepared to duplicate
      a call of this function if it allows a RULE to fire.
      
      See Note [CONLIKE pragma] in BasicTypes
      
      
      Join points
      ~~~~~~~~~~~
      See Note [Case binders and join points] in Simplify
      
      
      Other refactoring
      ~~~~~~~~~~~~~~~~~
      * I moved endPass from CoreLint to CoreMonad, with associated jigglings
      
      * Better pretty-printing of Core
      
      * The top-level RULES (ones that are not rules for locally-defined things)
        are now substituted on every simplifier iteration.  I'm not sure how
        we got away without doing this before.  This entails a bit more plumbing
        in SimplCore.
      
      * The necessary stuff to serialise and deserialise the new
        info across interface files.
      
      * Something about bottoming floats in SetLevels
            Note [Bottoming floats]
      
      * substUnfolding has moved from SimplEnv to CoreSubs, where it belongs
      
      
      --------------------------------------------------------------------------------
              Program           Size    Allocs   Runtime   Elapsed
      --------------------------------------------------------------------------------
                 anna          +2.4%     -0.5%      0.16      0.17
                 ansi          +2.6%     -0.1%      0.00      0.00
                 atom          -3.8%     -0.0%     -1.0%     -2.5%
               awards          +3.0%     +0.7%      0.00      0.00
               banner          +3.3%     -0.0%      0.00      0.00
           bernouilli          +2.7%     +0.0%     -4.6%     -6.9%
                boyer          +2.6%     +0.0%      0.06      0.07
               boyer2          +4.4%     +0.2%      0.01      0.01
                 bspt          +3.2%     +9.6%      0.02      0.02
            cacheprof          +1.4%     -1.0%    -12.2%    -13.6%
             calendar          +2.7%     -1.7%      0.00      0.00
             cichelli          +3.7%     -0.0%      0.13      0.14
              circsim          +3.3%     +0.0%     -2.3%     -9.9%
             clausify          +2.7%     +0.0%      0.05      0.06
        comp_lab_zift          +2.6%     -0.3%     -7.2%     -7.9%
             compress          +3.3%     +0.0%     -8.5%     -9.6%
            compress2          +3.6%     +0.0%    -15.1%    -17.8%
          constraints          +2.7%     -0.6%    -10.0%    -10.7%
         cryptarithm1          +4.5%     +0.0%     -4.7%     -5.7%
         cryptarithm2          +4.3%    -14.5%      0.02      0.02
                  cse          +4.4%     -0.0%      0.00      0.00
                eliza          +2.8%     -0.1%      0.00      0.00
                event          +2.6%     -0.0%     -4.9%     -4.4%
               exp3_8          +2.8%     +0.0%     -4.5%     -9.5%
               expert          +2.7%     +0.3%      0.00      0.00
                  fem          -2.0%     +0.6%      0.04      0.04
                  fft          -6.0%     +1.8%      0.05      0.06
                 fft2          -4.8%     +2.7%      0.13      0.14
             fibheaps          +2.6%     -0.6%      0.05      0.05
                 fish          +4.1%     +0.0%      0.03      0.04
                fluid          -2.1%     -0.2%      0.01      0.01
               fulsom          -4.8%     +9.2%     +9.1%     +8.4%
               gamteb          -7.1%     -1.3%      0.10      0.11
                  gcd          +2.7%     +0.0%      0.05      0.05
          gen_regexps          +3.9%     -0.0%      0.00      0.00
               genfft          +2.7%     -0.1%      0.05      0.06
                   gg          -2.7%     -0.1%      0.02      0.02
                 grep          +3.2%     -0.0%      0.00      0.00
               hidden          -0.5%     +0.0%    -11.9%    -13.3%
                  hpg          -3.0%     -1.8%     +0.0%     -2.4%
                  ida          +2.6%     -1.2%      0.17     -9.0%
                infer          +1.7%     -0.8%      0.08      0.09
              integer          +2.5%     -0.0%     -2.6%     -2.2%
            integrate          -5.0%     +0.0%     -1.3%     -2.9%
              knights          +4.3%     -1.5%      0.01      0.01
                 lcss          +2.5%     -0.1%     -7.5%     -9.4%
                 life          +4.2%     +0.0%     -3.1%     -3.3%
                 lift          +2.4%     -3.2%      0.00      0.00
            listcompr          +4.0%     -1.6%      0.16      0.17
             listcopy          +4.0%     -1.4%      0.17      0.18
             maillist          +4.1%     +0.1%      0.09      0.14
               mandel          +2.9%     +0.0%      0.11      0.12
              mandel2          +4.7%     +0.0%      0.01      0.01
              minimax          +3.8%     -0.0%      0.00      0.00
              mkhprog          +3.2%     -4.2%      0.00      0.00
           multiplier          +2.5%     -0.4%     +0.7%     -1.3%
             nucleic2          -9.3%     +0.0%      0.10      0.10
                 para          +2.9%     +0.1%     -0.7%     -1.2%
            paraffins         -10.4%     +0.0%      0.20     -1.9%
               parser          +3.1%     -0.0%      0.05      0.05
              parstof          +1.9%     -0.0%      0.00      0.01
                  pic          -2.8%     -0.8%      0.01      0.02
                power          +2.1%     +0.1%     -8.5%     -9.0%
               pretty         -12.7%     +0.1%      0.00      0.00
               primes          +2.8%     +0.0%      0.11      0.11
            primetest          +2.5%     -0.0%     -2.1%     -3.1%
               prolog          +3.2%     -7.2%      0.00      0.00
               puzzle          +4.1%     +0.0%     -3.5%     -8.0%
               queens          +2.8%     +0.0%      0.03      0.03
              reptile          +2.2%     -2.2%      0.02      0.02
              rewrite          +3.1%    +10.9%      0.03      0.03
                 rfib          -5.2%     +0.2%      0.03      0.03
                  rsa          +2.6%     +0.0%      0.05      0.06
                  scc          +4.6%     +0.4%      0.00      0.00
                sched          +2.7%     +0.1%      0.03      0.03
                  scs          -2.6%     -0.9%     -9.6%    -11.6%
               simple          -4.0%     +0.4%    -14.6%    -14.9%
                solid          -5.6%     -0.6%     -9.3%    -14.3%
              sorting          +3.8%     +0.0%      0.00      0.00
               sphere          -3.6%     +8.5%      0.15      0.16
               symalg          -1.3%     +0.2%      0.03      0.03
                  tak          +2.7%     +0.0%      0.02      0.02
            transform          +2.0%     -2.9%     -8.0%     -8.8%
             treejoin          +3.1%     +0.0%    -17.5%    -17.8%
            typecheck          +2.9%     -0.3%     -4.6%     -6.6%
              veritas          +3.9%     -0.3%      0.00      0.00
                 wang          -6.2%     +0.0%      0.18     -9.8%
            wave4main         -10.3%     +2.6%     -2.1%     -2.3%
         wheel-sieve1          +2.7%     -0.0%     +0.3%     -0.6%
         wheel-sieve2          +2.7%     +0.0%     -3.7%     -7.5%
                 x2n1          -4.1%     +0.1%      0.03      0.04
      --------------------------------------------------------------------------------
                  Min         -12.7%    -14.5%    -17.5%    -17.8%
                  Max          +4.7%    +10.9%     +9.1%     +8.4%
       Geometric Mean          +0.9%     -0.1%     -5.6%     -7.3%
      72462499