1. 13 Feb, 2014 1 commit
  2. 14 Jan, 2014 2 commits
  3. 12 Dec, 2013 1 commit
    • Improve the handling of used-once stuff · 80989de9
      Simon Peyton Jones authored and Joachim Breitner committed
      Joachim and I are committing this onto a branch so that we can share it,
      but we expect to do a bit more work before merging it onto head.
      
      Nofib status:
        - Most programs, no change
        - A few improve
        - A couple get worse (cacheprof, tak, rfib)
      Investigating the "get worse" set is what's holding up putting this
      on head.
      
      The major issue is this.  Consider
      
          map (f g) ys
      
      where f's demand signature looks like
      
         f :: <L,C1(C1(U))> -> <L,U> -> .
      
      So 'f' is not saturated.  What demand do we place on g?
      Answer
              C(C1(U))
      That is, the inner C1 should stay, even though f is not saturated.
      
      I found that this made a significant difference in the demand signatures
      inferred in GHC.IO, which uses lots of higher-order exception handlers.
      
      I also had to add used-once demand signatures for some of the
      'catch' primops, so that we know their handlers are only called once.
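
To make the `map (f g) ys` situation concrete, here is a small sketch. The definitions of `f`, `g` and `example` are illustrative assumptions, not code from GHC.IO, and no claim is made about the exact demand signatures GHC infers for them.

```haskell
-- Hypothetical definitions for illustration only.

-- 'f' applies its first argument to two arguments, exactly once per call.
f :: (Int -> Int -> Int) -> Int -> Int
f g' x = g' x 1

-- A hypothetical higher-order argument.
g :: Int -> Int -> Int
g a b = a * 10 + b

-- 'f g' is a partial application: 'map' calls it once per list element,
-- so 'g' itself may be called many times, but each result of 'g y' is
-- applied only once.
example :: [Int] -> [Int]
example ys = map (f g) ys
```

The outer call count depends on the list length, while the inner application happens once per element, which matches the intuition behind keeping the inner C1.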
  4. 09 Dec, 2013 1 commit
  5. 25 Oct, 2013 2 commits
    • Update documentation regarding SpecConstr. · 32df4290
      Austin Seipp authored

       * Note new SPEC type in release notes.
       * Document SPEC in the users guide under the documentation for
         -fspec-constr.
       * Clean up comments in SpecConstr regarding the forcing of
         specialisation (see Note [Forcing specialisation].)
      Signed-off-by: Austin Seipp <austin@well-typed.com>
    • Make SpecConstr also check for GHC.Types.SPEC · cee3adbc
      Austin Seipp authored

      SpecConstr has for a while now looked for types with the built
      in ForceSpecConstr annotation, in order to know where to be particularly
      aggressive.
      
      Unfortunately using an annotation has a number of downsides, the most
      prominent two being:
      
        A) ForceSpecConstr is vital for efficiency (even if it's
           a hack), but it means users of it must have GHCi - even though
           stage2 features are not required for anything but the annotation.
      
        B) Any user who might need it (read: vector) has to duplicate the same
           piece of code. In general there are few people actually doing this,
           but it's unclear why they should have to.
      
      This patch makes SpecConstr look for functions applied to the new
      GHC.Types.SPEC type - a copy of the already-extant 'SPEC' type - as well
      as look for annotations, in the stage2 compiler.
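
A minimal sketch of how a loop might carry the SPEC marker follows. It assumes `SPEC` can be imported from `GHC.Exts` (the commit places the type in GHC.Types); the function `sumTo` and its helper `go` are hypothetical names, not code from `vector`.

```haskell
{-# LANGUAGE BangPatterns #-}

import GHC.Exts (SPEC (..))

-- Sum the integers from 1 to n with an accumulating loop.
sumTo :: Int -> Int
sumTo n = go SPEC 0 1
  where
    -- The first argument is only a marker for SpecConstr; it is forced
    -- but otherwise unused.
    go :: SPEC -> Int -> Int -> Int
    go !_ !acc i
      | i > n     = acc
      | otherwise = go SPEC (acc + i) (i + 1)
```

Because the marker is an ordinary data type, a module written this way needs no annotation support from the compiler that builds it.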
      
      In particular, this means `vector` can now be built with a stage1
      compiler, since it no longer depends on stage2 for anything else. This
      is particularly important for e.g. iOS cross-compilers.
      
      This also means we should be able to build `vector` earlier in the build
      process too, but this patch doesn't address that.
      
      This requires an accompanying bump in ghc-prim.
      Signed-off-by: Austin Seipp <austin@well-typed.com>
  6. 30 Aug, 2013 1 commit
  7. 02 Aug, 2013 1 commit
  8. 15 May, 2013 1 commit
    • SpecConstr: seed specialisation of top-level bindings, as with letrecs. · 8a588511
      amosrobinson authored
      When specialising a top-level recursive group, if none of the binders
      are exported then we can start specialising based on the later calls to
      the functions.
      This is instead of creating specialisations based on the RHS of the
      bindings.
      The main benefit of this is that only specialisations that will actually
      be used are created. This saves quite a bit of memory when compiling
      stream-fusion and ForceSpecConstr sort of code.
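
As a rough illustration of the shape of code this applies to, consider the sketch below. The names `loop` and `total` are hypothetical, and the sketch assumes a module that exports only `total`; whether GHC specialises this particular code is not claimed.

```haskell
-- Hypothetical module contents; assume only 'total' is exported.

-- A top-level recursive function whose second argument is a pair.
loop :: Int -> (Int, Int) -> Int
loop 0 (a, b) = a + b
loop n (a, b) = loop (n - 1) (b, a + b)

-- The only external entry point. Its call 'loop n (0, 1)' passes an
-- explicit pair, which is the kind of later call the message describes
-- as a seed for specialisation.
total :: Int -> Int
total n = loop n (0, 1)
```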
      
      Nofib shows an average change in allocation and runtime of -0.7%, maximum 2%.
      There are a few with significant decreases in allocation (10 - 20%)
      but, interestingly, those ones seem to have similar runtimes.
      One of these does have a significantly reduced total elapsed time
      though: -38%.
      
      On average the nofib compilation times are the same, but they do vary
      with s.d. of -4 to 4%.
      I think this is acceptable because of the fairly major code blowup fixes
      this has for fusion-style code.
      (In one example, SpecConstr previously produced a term of size 122,000;
      it now produces only 28,000, with the same object code.)
  9. 03 May, 2013 1 commit
  10. 11 Apr, 2013 1 commit
    • ignore RealWorld in size_expr; flag to keep w/w from creating sharing · af12cf66
      nfrisby authored
      size_expr now ignores RealWorld lambdas, arguments, and applications.
      
      Worker-wrapper previously removed all lambdas from a function, if they
      were all unused. Removing *all* value lambdas is no longer
      allowed. Instead (\_ -> E) will become (\_void -> E), where it used to
      become E. The previous behavior can be recovered via the new
      -ffun-to-thunk flag.
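
The difference between keeping a lambda and dropping it can be pictured at source level. The sketch below is only an analogy with hypothetical names (`asFunction`, `asThunk`); it does not show the Core that worker-wrapper produces.

```haskell
-- Hypothetical definitions for illustration only.

-- A function whose argument is unused. As a function closure, the body
-- is (absent other optimisations) re-evaluated on every call.
asFunction :: () -> Integer
asFunction _ = sum [1 .. 1000]

-- The same expression without the lambda: a thunk that is evaluated at
-- most once, after which the result is shared.
asThunk :: Integer
asThunk = sum [1 .. 1000]
```

Both give the same value; what changes is whether the result is shared between uses, which is the sharing the new flag's description refers to.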
      
      Nofib notables:
      
      ----------------------------------------------------------------
              Program               O2          O2 newly ignoring RealWorld
                                                and not turning function
                                                closures into thunks
      ----------------------------------------------------------------
      
       Allocations
      
        comp_lab_zift            333090392%           -5.0%
      reverse-complem            155188304%           -3.2%
      
              rewrite             15380888%           +4.0%
               boyer2              3901064%           +7.5%
      
      rewrite previously benefited from fortunate LoopBreaker choice that is
      now disrupted.
      
      A function in boyer2 goes from $wonewayunify1 size 700 to size 650,
      thus gets inlined into rewritelemmas, thus exposing a parameter
      scrutinisation, thus allowing SpecConstr, which unfortunately involves
      reboxing.
      
      Run Time
      
       fannkuch-redux                 7.89%          -15.9%
      
                  hpg                 0.25%           +5.6%
                 wang                 0.21%           +5.8%
      
      /shrug
  11. 28 Mar, 2013 1 commit
  12. 02 Feb, 2013 1 commit
  13. 30 Jan, 2013 1 commit
  14. 24 Jan, 2013 1 commit
    • Introduce CPR for sum types (Trac #5075) · d3b8991b
      Simon Peyton Jones authored
      The main payload of this patch is to extend CPR so that it
      detects when a function always returns a result constructed
      with the *same* constructor, even if the constructor comes from
      a sum type.  This doesn't matter very often, but it does improve
      some things (results below).
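
A small sketch of the kind of function this describes: one that returns a value of a sum type but always builds it with the same constructor. The type `Shape` and the function `boundingBox` are hypothetical examples, and the sketch makes no claim about what GHC actually does with them.

```haskell
-- Hypothetical sum type with two constructors.
data Shape
  = Circle Double
  | Rect Double Double
  deriving (Eq, Show)

-- Every branch builds its result with 'Rect', even though 'Shape' also
-- has a 'Circle' constructor.
boundingBox :: Shape -> Shape
boundingBox (Circle r) = Rect (2 * r) (2 * r)
boundingBox (Rect w h) = Rect w h
```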
      
      Binary sizes increase a little bit, I think because there are more
      wrappers.  This is with -split-objs.  Without -split-objs binary sizes
      increased by 6% even for HelloWorld.hs.  It's hard to see exactly why,
      but I think it was because System.Posix.Types.o got included in the
      linked binary, whereas it didn't before.
      
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                fluid          +1.8%     -0.3%      0.01      0.01     +0.0%
                  tak          +2.2%     -0.2%      0.02      0.02     +0.0%
                 ansi          +1.7%     -0.3%      0.00      0.00     +0.0%
            cacheprof          +1.6%     -0.3%     +0.6%     +0.5%     +1.4%
              parstof          +1.4%     -4.4%      0.00      0.00     +0.0%
              reptile          +2.0%     +0.3%      0.02      0.02     +0.0%
      ----------------------------------------------------------------------
                  Min          +1.1%     -4.4%     -4.7%     -4.7%    -15.0%
                  Max          +2.3%     +0.3%     +8.3%     +9.4%    +50.0%
       Geometric Mean          +1.9%     -0.1%     +0.6%     +0.7%     +0.3%
      
      Other things in this commit
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~
      * Got rid of the Lattice class in Demand
      
      * Refactored the way that products and newtypes are
        decomposed (no change in functionality)
  15. 17 Jan, 2013 1 commit
    • Major patch to implement the new Demand Analyser · 0831a12e
      Simon Peyton Jones authored
      This patch is the result of Ilya Sergey's internship at MSR.  It
      constitutes a thorough overhaul and simplification of the demand
      analyser.  It makes a solid foundation on which we can now build.
      Main changes are
      
      * Instead of having one combined type for Demand, a Demand is
         now a pair (JointDmd) of
            - a StrDmd and
            - an AbsDmd.
         This allows strictness and absence to be thought about quite
         orthogonally, and greatly reduces brain melt-down.
      
      * Similarly in the DmdResult type, it's a pair of
           - a PureResult (indicating only divergence/non-divergence)
           - a CPRResult (which deals only with the CPR property)
      
      * In IdInfo, the
          strictnessInfo field contains a StrictSig, not a Maybe StrictSig
          demandInfo     field contains a Demand, not a Maybe Demand
        We don't need Nothing (to indicate no strictness/demand info)
        any more; topSig/topDmd will do.
      
      * Remove "boxity" analysis entirely.  This was an attempt to
        avoid "reboxing", but it added complexity, is extremely
        ad-hoc, and makes very little difference in practice.
      
      * Remove the "unboxing strategy" computation. This was an
        attempt to ensure that a worker didn't get zillions of
        arguments by unboxing big tuples.  But in fact removing it
        DRAMATICALLY reduces allocation in an inner loop of the
        I/O library (where the threshold argument-count had been
        set just too low).  It's exceptional to have a zillion arguments
        and I don't think it's worth the complexity, especially since
        it turned out to have a serious performance hit.
      
      * Remove quite a bit of ad-hoc cruft
      
      * Move worthSplittingFun, worthSplittingThunk from WorkWrap to
        Demand. This allows JointDmd to be fully abstract, examined
        only inside Demand.
      
      Everything else really follows from these changes.
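
The idea of a demand as a pair of independent components, with a top element replacing `Nothing`, can be sketched with toy types. Everything below (`ToyStr`, `ToyAbs`, `ToyDmd`, `toyTopDmd`, the `lub` functions) is a deliberately simplified assumption and is NOT GHC's definition of StrDmd, AbsDmd or JointDmd.

```haskell
-- Toy types, NOT GHC's definitions: each component has just two points.
data ToyStr = Lazy | Strict deriving (Eq, Show)
data ToyAbs = Absent | Used deriving (Eq, Show)

-- A demand is simply a pair of the two components.
data ToyDmd = ToyDmd ToyStr ToyAbs deriving (Eq, Show)

-- The top element says "nothing is known", so no Maybe wrapper is needed.
toyTopDmd :: ToyDmd
toyTopDmd = ToyDmd Lazy Used

-- Least upper bound on each component, defined separately.
lubStr :: ToyStr -> ToyStr -> ToyStr
lubStr Strict Strict = Strict
lubStr _      _      = Lazy

lubAbs :: ToyAbs -> ToyAbs -> ToyAbs
lubAbs Absent Absent = Absent
lubAbs _      _      = Used

-- The joint operation just works component-wise.
lubDmd :: ToyDmd -> ToyDmd -> ToyDmd
lubDmd (ToyDmd s1 a1) (ToyDmd s2 a2) = ToyDmd (lubStr s1 s2) (lubAbs a1 a2)
```

Because the joint operation is component-wise, each half can be reasoned about without reference to the other.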
      
      All of this is really just refactoring, so we don't expect
      big performance changes, but actually the numbers look quite
      good.  Here is a full nofib run with some highlights identified:
      
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
      --------------------------------------------------------------------------------
               expert          -2.6%    -15.5%      0.00      0.00     +0.0%
                fluid          -2.4%     -7.1%      0.01      0.01     +0.0%
                   gg          -2.5%    -28.9%      0.02      0.02    -33.3%
            integrate          -2.6%     +3.2%     +2.6%     +2.6%     +0.0%
              mandel2          -2.6%     +4.2%      0.01      0.01     +0.0%
             nucleic2          -2.0%    -16.3%      0.11      0.11     +0.0%
                 para          -2.6%    -20.0%    -11.8%    -11.7%     +0.0%
               parser          -2.5%    -17.9%      0.05      0.05     +0.0%
               prolog          -2.6%    -13.0%      0.00      0.00     +0.0%
               puzzle          -2.6%     +2.2%     +0.8%     +0.8%     +0.0%
              sorting          -2.6%    -35.9%      0.00      0.00     +0.0%
             treejoin          -2.6%    -52.2%     -9.8%     -9.9%     +0.0%
      --------------------------------------------------------------------------------
                  Min          -2.7%    -52.2%    -11.8%    -11.7%    -33.3%
                  Max          -1.8%     +4.2%    +10.5%    +10.5%     +7.7%
       Geometric Mean          -2.5%     -2.8%     -0.4%     -0.5%     -0.4%
      
      Things to note
      
      * Binary sizes are smaller. I don't know why, but it's good.
      
      * Allocation is sometimes a *lot* smaller. I believe that all the big numbers
        (I checked treejoin, gg, sorting) arise from one place, namely a function
        GHC.IO.Encoding.UTF8.utf8_decode, which is strict in two Buffers both of
        which have several arguments.  Not w/w'ing both arguments (which is what
        we did before) has a big effect.  So the big win is actually somewhat
        accidental, gained by removing the "unboxing strategy" code.
      
      * A couple of benchmarks allocate slightly more.  This turns out
        to be due to reboxing (integrate).  But the biggest increase is
        mandel2, and *that* turned out also to be a somewhat accidental
        loss of CSE, and pointed the way to doing better CSE: see Trac
        #7596.
      
      * Runtimes are never very reliable, but seem to improve very slightly.
      
      All in all, a good piece of work.  Thank you Ilya!
  16. 02 Nov, 2012 2 commits
  17. 09 Oct, 2012 1 commit
    • Make the opt_UF_* static flags dynamic · 0a768bcb
      ian@well-typed.com authored
      I also removed the default values from the "Discounts and thresholds"
      note: most of them were no longer up-to-date.
      
      Along the way I added FloatSuffix to the argument parser, analogous to
      IntSuffix.
  18. 23 Aug, 2012 1 commit
  19. 12 Jun, 2012 1 commit
  20. 02 May, 2012 1 commit
    • Allow cases with empty alternatives · ac230c5e
      Simon Peyton Jones authored
      This patch allows, for the first time, case expressions with an empty
      list of alternatives. Max suggested the idea, and Trac #6067 showed
      that it is really quite important.
      
      So I've implemented the idea, fixing #6067. Main changes
      
       * See Note [Empty case alternatives] in CoreSyn
      
       * Various foldr1's become foldrs
      
       * IfaceCase does not record the type of the alternatives.
         I added IfaceECase for empty-alternative cases.
      
       * Core Lint does not complain about empty cases
      
       * MkCore.castBottomExpr constructs an empty-alternative case
         expression   (case e of ty {})
      
       * CoreToStg converts '(case e of {})' to just 'e'
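
The commit concerns Core, but a source-level analogue may help. The sketch below assumes the `EmptyCase` language extension and `Data.Void`, which are separate from this patch; `fromVoid` and `unwrap` are hypothetical names.

```haskell
{-# LANGUAGE EmptyCase #-}

import Data.Void (Void)

-- 'Void' has no constructors, so a case on it needs no alternatives.
fromVoid :: Void -> a
fromVoid v = case v of {}

-- Only the 'Right' branch can ever be taken for a total value.
unwrap :: Either Void a -> a
unwrap (Left v)  = fromVoid v
unwrap (Right x) = x
```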
  21. 09 Nov, 2011 1 commit
  22. 05 Nov, 2011 1 commit
  23. 04 Nov, 2011 1 commit
  24. 02 Nov, 2011 1 commit
    • Overhaul of infrastructure for profiling, coverage (HPC) and breakpoints · 7bb0447d
      Simon Marlow authored
      User visible changes
      ====================
      
      Profiling
      ---------
      
      Flags renamed (the old ones are still accepted for now):
      
        OLD            NEW
        ---------      ------------
        -auto-all      -fprof-auto
        -auto          -fprof-exported
        -caf-all       -fprof-cafs
      
      New flags:
      
        -fprof-auto              Annotates all bindings (not just top-level
                                 ones) with SCCs
      
        -fprof-top               Annotates just top-level bindings with SCCs
      
        -fprof-exported          Annotates just exported bindings with SCCs
      
        -fprof-no-count-entries  Do not maintain entry counts when profiling
                                 (can make profiled code go faster; useful with
                                 heap profiling where entry counts are not used)
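
For comparison with the automatic flags, a cost centre can also be attached by hand with an SCC pragma. The program below is a hypothetical sketch; the cost-centre name "fibSum" and the file name in the comments are made up.

```haskell
-- Hypothetical program; the cost-centre name "fibSum" is made up.
fib :: Int -> Integer
fib n = if n < 2 then toInteger n else fib (n - 1) + fib (n - 2)

-- A manual SCC annotation. -fprof-auto adds comparable annotations to
-- bindings automatically, so the pragma is optional in that case.
fibSum :: Int -> Integer
fibSum k = {-# SCC "fibSum" #-} sum (map fib [0 .. k])

-- Possible build and run commands, assuming the file is Main.hs and
-- also defines a 'main':
--   ghc -prof -fprof-auto -rtsopts Main.hs
--   ./Main +RTS -p
```

When the program is compiled without profiling, the pragma has no effect on the result.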
      
      Cost-centre stacks have a new semantics, which should in most cases
      result in more useful and intuitive profiles.  If you find this not to
      be the case, please let me know.  This is the area where I have been
      experimenting most, and the current solution is probably not the
      final version, however it does address all the outstanding bugs and
      seems to be better than GHC 7.2.
      
      Stack traces
      ------------
      
      +RTS -xc now gives more information.  If the exception originates from
      a CAF (as is common, because GHC tends to lift exceptions out to the
      top-level), then the RTS walks up the stack and reports the stack in
      the enclosing update frame(s).
      
      Result: +RTS -xc is much more useful now - but you still have to
      compile for profiling to get it.  I've played around a little with
      adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem
      quite accurately.
      
      I plan to add more facilities for stack tracing (e.g. in GHCi) in the
      future.
      
      Coverage (HPC)
      --------------
      
       * derived instances are now coloured yellow if they weren't used
       * likewise record field names
       * entry counts are more accurate (hpc --fun-entry-count)
       * tab width is now correct (markup was previously off in source with
         tabs)
      
      Internal changes
      ================
      
      In Core, the Note constructor has been replaced by
      
              Tick (Tickish b) (Expr b)
      
      which is used to represent all the kinds of source annotation we
      support: profiling SCCs, HPC ticks, and GHCi breakpoints.
      
      Depending on the properties of the Tickish, different transformations
      apply to Tick.  See CoreUtils.mkTick for details.
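
A toy model of an expression type with a single annotation constructor is sketched below. The types `ToyTickish` and `ToyExpr` and the function `stripTicks` are simplified assumptions for illustration; they are NOT the definitions in CoreSyn, and `stripTicks` is not claimed to correspond to any GHC function.

```haskell
-- Toy types, NOT GHC's CoreSyn definitions.
data ToyTickish
  = ProfNote String   -- stands in for a profiling SCC
  | HpcTick Int       -- stands in for an HPC tick
  | Breakpoint Int    -- stands in for a GHCi breakpoint
  deriving (Eq, Show)

data ToyExpr
  = Var String
  | App ToyExpr ToyExpr
  | Lam String ToyExpr
  | Tick ToyTickish ToyExpr   -- one constructor for every annotation kind
  deriving (Eq, Show)

-- Remove every annotation, leaving the underlying expression.
stripTicks :: ToyExpr -> ToyExpr
stripTicks (Tick _ e) = stripTicks e
stripTicks (App f a)  = App (stripTicks f) (stripTicks a)
stripTicks (Lam x e)  = Lam x (stripTicks e)
stripTicks e@(Var _)  = e
```

With one constructor for all annotation kinds, a traversal like this handles them in a single case.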
      
      Tickets
      =======
      
      This commit closes the following tickets, test cases to follow:
      
        - Close #2552: not a bug, but the behaviour is now more intuitive
          (test is T2552)
      
        - Close #680 (test is T680)
      
        - Close #1531 (test is result001)
      
        - Close #949 (test is T949)
      
        - Close #2466: test case has bitrotted (doesn't compile against current
          version of vector-space package)
  25. 08 Sep, 2011 1 commit
  26. 06 Sep, 2011 1 commit
    • Implement -XConstraintKind · 9729fe7c
      batterseapower authored
      Basically as documented in http://hackage.haskell.org/trac/ghc/wiki/KindFact,
      this patch adds a new kind Constraint such that:
      
        Show :: * -> Constraint
        (?x::Int) :: Constraint
        (Int ~ a) :: Constraint
      
      And you can write *any* type with kind Constraint to the left of (=>):
      even if that type is a type synonym, type variable, indexed type or so on.
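
A minimal sketch of a type synonym used to the left of (=>), assuming the extension is enabled under its released name `ConstraintKinds`. The synonym `Stringy` and the function `roundTrip` are hypothetical.

```haskell
{-# LANGUAGE ConstraintKinds #-}

-- A type synonym whose right-hand side has kind Constraint.
type Stringy a = (Show a, Read a)

-- The synonym is used like any class constraint.
roundTrip :: Stringy a => a -> a
roundTrip = read . show
```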
      
      The following (somewhat related) changes are also made:
       1. We now box equality evidence. This is required because we want
          to give (Int ~ a) the *lifted* kind Constraint
       2. For similar reasons, implicit parameters can now only be of
          a lifted kind. (?x::Int#) => ty is now ruled out
       3. Implicit parameter constraints are now allowed in superclasses
          and instance contexts (this just falls out as OK with the new
          constraint solver)
      
      Internally the following major changes were made:
       1. There is now no PredTy in the Type data type. Instead
          GHC checks the kind of a type to figure out if it is a predicate
       2. There is now no AClass TyThing: we represent classes as TyThings
          just as an ATyCon (classes had TyCons anyway)
       3. What used to be (~) is now pretty-printed as (~#). The box
          constructor is EqBox :: (a ~# b) -> (a ~ b)
       4. The type LCoercion is used internally in the constraint solver
          and type checker to represent coercions with free variables
          of type (a ~ b) rather than (a ~# b)
  27. 05 Sep, 2011 2 commits
  28. 03 Aug, 2011 1 commit
  29. 26 May, 2011 1 commit
    • Suppress the alarming SpecConstr message for normal users (Trac #5125) · 3664c198
      Simon Peyton Jones authored
      This is the offending message:
        SpecConstr
            Function `$wks2{v s2dJ} [lid]'
              has one call pattern, but the limit is 0
            Use -fspec-constr-count=n to set the bound
            Use -dppr-debug to see specialisations
      
      The message isn't very good, and is for experts only. So now it
      comes out only
          if you build with -DDEBUG
          or you specify -dppr-debug at runtime
  30. 19 Apr, 2011 1 commit
    • This BIG PATCH contains most of the work for the New Coercion Representation · fdf86568
      Simon Peyton Jones authored
      See the paper "Practical aspects of evidence based compilation in System FC"
      
      * Coercion becomes a data type, distinct from Type
      
      * Coercions become value-level things, rather than type-level things,
        (although the value is zero bits wide, like the State token)
        A consequence is that a coercion abstraction increases the arity by 1
        (just like a dictionary abstraction)
      
      * There is a new constructor in CoreExpr, namely Coercion, to inject
        coercions into terms
  31. 07 Feb, 2011 1 commit
  32. 03 Feb, 2011 1 commit
    • Fix typo in SpecConstr that made it not work at all · 2a130b13
      simonpj@microsoft.com authored
      There was a terrible typo in this patch; I wrote "env"
      instead of "env1".
      
         Mon Jan 31 11:35:29 GMT 2011  simonpj@microsoft.com
           * Improve Simplifier and SpecConstr behaviour
      
      Anyway, this fix is essential to make it work properly.
      Thanks to Max for spotting the problem (again).
  33. 01 Feb, 2011 1 commit
    • Some refactoring of SpecConstr · 8287e232
      simonpj@microsoft.com authored
      This was originally to improve the case when SpecConstr generated a
      function with an unused argument (see Trac #4941), but I ended up
      giving up on that.  But the refactoring is still an improvement.
      
      In particular I got rid of BothOcc, which was unused.
  34. 31 Jan, 2011 1 commit
    • Improve Simplifier and SpecConstr behaviour · 70ad6e6a
      simonpj@microsoft.com authored
      Trac #4908 identified a case where SpecConstr wasn't "seeing" a
      specialisation it should easily get.  The solution was simple: see
      Note [Add scrutinee to ValueEnv too] in SpecConstr.
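
The general situation can be pictured as follows. This is a hypothetical example with a made-up function `countdown`; it is not the test case from the ticket and does not reproduce the Note.

```haskell
-- Hypothetical example, not the test case from the ticket.
countdown :: Int -> (Int, Int) -> Int
countdown 0 p = fst p
countdown n p =
  case p of
    (a, b)
      | a > b     -> a
      -- Inside this alternative 'p' is known to be the pair (a, b),
      -- even though the recursive call mentions 'p', not '(a, b)'.
      | otherwise -> countdown (n - 1) p
```

Recording what is known about the scrutinee `p` inside the alternative is what lets the recursive call be seen as a call with a constructor argument.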
      
      Then it turned out that there was an exactly analogous infelicity in
      the mighty Simplifier too; see Note [Add unfolding for scrutinee] in
      Simplify. This fix is good for Simplify even in the absence of the
      SpecConstr change.  (It arose when I moved the binder-swap stuff to
      OccAnal, not realising that it *remains* valuable to record info
      about the scrutinee of a case expression.  The Note says why.)
      
      Together these two changes are unconditionally good.  Better
      simplification, better specialisation. Thank you Max.
  35. 27 Nov, 2010 1 commit
  36. 25 Nov, 2010 1 commit