1. 30 Mar, 2015 1 commit
    • Joachim Breitner's avatar
      Refactor the story around switches (#10137) · de1160be
      Joachim Breitner authored
      This re-implements the code generation for case expressions at the Stg →
      Cmm level, both for data type cases as well as for integral literal
      cases. (Cases on float are still treated as before).
      
      The goal is to allow for fancier strategies in implementing them, for a
      cleaner separation of the strategy from the gritty details of Cmm, and
      to run this later than the Common Block Optimization, allowing for one
      way to attack #10124. The new module CmmSwitch contains a number of
      notes explaining this changes. For example, it creates larger
      consecutive jump tables than the previous code, if possible.
      
      nofib shows little significant overall improvement of runtime. The
      rather large wobbling comes from changes in the code block order
      (see #8082, not much we can do about it). But the decrease in code size
      alone makes this worthwhile.
      
      ```
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                  Min          -1.8%      0.0%     -6.1%     -6.1%     -2.9%
                  Max          -0.7%     +0.0%     +5.6%     +5.7%     +7.8%
       Geometric Mean          -1.4%     -0.0%     -0.3%     -0.3%     +0.0%
      ```
      
      Compilation time increases slightly:
      ```
              -1 s.d.                -----            -2.0%
              +1 s.d.                -----            +2.5%
              Average                -----            +0.3%
      ```
      
      The test case T783 regresses a lot, but it is the only one exhibiting
      any regression. The cause is the changed order of branches in an
      if-then-else tree, which makes the hoople data flow analysis traverse
      the blocks in a suboptimal order. Reverting that gets rid of this
      regression, but has a consistent, if only very small (+0.2%), negative
      effect on runtime. So I conclude that this test is an extreme outlier
      and no reason to change the code.
      
      Differential Revision: https://phabricator.haskell.org/D720
      de1160be
  2. 23 Dec, 2014 1 commit
    • Simon Peyton Jones's avatar
      Eliminate so-called "silent superclass parameters" · a6f0f5ab
      Simon Peyton Jones authored
      The purpose of silent superclass parameters was to solve the
      awkward problem of superclass dictinaries being bound to bottom.
      See THE PROBLEM in Note [Recursive superclasses] in TcInstDcls
      
      Although the silent-superclass idea worked,
      
        * It had non-local consequences, and had effects even in Haddock,
          where we had to discard silent parameters before displaying
          instance declarations
      
        * It had unexpected peformance costs, shown up by Trac #3064 and its
          test case.  In monad-transformer code, when constructing a Monad
          dictionary you had to pass an Applicative dictionary; and to
          construct that you neede a Functor dictionary. Yet these extra
          dictionaries were often never used.  (All this got much worse when
          we added Applicative as a superclass of Monad.) Test T3064
          compiled *far* faster after silent superclasses were eliminated.
      
        * It introduced new bugs.  For example SilentParametersOverlapping,
          T5051, and T7862, all failed to compile because of instance overlap
          directly because of the silent-superclass trick.
      
      So this patch takes a new approach, which I worked out with Dimitrios
      in the closing hours before Christmas.  It is described in detail
      in THE PROBLEM in Note [Recursive superclasses] in TcInstDcls.
      
      Seems to work great!
      
      Quite a bit of knock-on effect
      
       * The main implementation work is in tcSuperClasses in TcInstDcls
         Everything else is fall-out
      
       * IdInfo.DFunId no longer needs its n-silent argument
         * Ditto IDFunId in IfaceSyn
         * Hence interface file format changes
      
       * Now that DFunIds do not have silent superclass parameters, printing
         out instance declarations is simpler. There is tiny knock-on effect
         in Haddock, so that submodule is updated
      
       * I realised that when computing the "size of a dictionary type"
         in TcValidity.sizePred, we should be rather conservative about
         type functions, which can arbitrarily increase the size of a type.
         Hence the new datatype TypeSize, which has a TSBig constructor for
         "arbitrarily big".
      
       * instDFunType moves from TcSMonad to Inst
      
       * Interestingly, CmmNode and CmmExpr both now need a non-silent
         (Ord r) in a couple of instance declarations. These were previously
         silent but must now be explicit.
      
       * Quite a bit of wibbling in error messages
      a6f0f5ab
  3. 19 Dec, 2014 1 commit
    • Peter Wortmann's avatar
      Some Dwarf generation fixes · f85db756
      Peter Wortmann authored
      - Make abbrev offset absolute on Non-Mac systems
      - Add another termination byte at the end of the abbrev section
        (readelf complains)
      - Scope combination was wrong for the simpler cases
      - Shouldn't have a "global/" in front of all scopes
      f85db756
  4. 16 Dec, 2014 3 commits
    • Peter Wortmann's avatar
      Add unwind information to Cmm · 711a51ad
      Peter Wortmann authored
      Unwind information allows the debugger to discover more information
      about a program state, by allowing it to "reconstruct" other states of
      the program. In practice, this means that we explain to the debugger
      how to unravel stack frames, which comes down mostly to explaining how
      to find their Sp and Ip register values.
      
      * We declare yet another new constructor for CmmNode - and this time
        there's actually little choice, as unwind information can and will
        change mid-block. We don't actually make use of these capabilities,
        and back-end support would be tricky (generate new labels?), but it
        feels like the right way to do it.
      
      * Even though we only use it for Sp so far, we allow CmmUnwind to specify
        unwind information for any register. This is pretty cheap and could
        come in useful in future.
      
      * We allow full CmmExpr expressions for specifying unwind values. The
        advantage here is that we don't have to make up new syntax, and can e.g.
        use the WDS macro directly. On the other hand, the back-end will now
        have to simplify the expression until it can sensibly be converted
        into DWARF byte code - a process which might fail, yielding NCG panics.
        On the other hand, when you're writing Cmm by hand you really ought to
        know what you're doing.
      
      (From Phabricator D169)
      711a51ad
    • Peter Wortmann's avatar
      Tick scopes · 5fecd767
      Peter Wortmann authored
      This patch solves the scoping problem of CmmTick nodes: If we just put
      CmmTicks into blocks we have no idea what exactly they are meant to
      cover.  Here we introduce tick scopes, which allow us to create
      sub-scopes and merged scopes easily.
      
      Notes:
      
      * Given that the code often passes Cmm around "head-less", we have to
        make sure that its intended scope does not get lost. To keep the amount
        of passing-around to a minimum we define a CmmAGraphScoped type synonym
        here that just bundles the scope with a portion of Cmm to be assembled
        later.
      
      * We introduce new scopes at somewhat random places, aligning with
        getCode calls. This works surprisingly well, but we might have to
        add new scopes into the mix later on if we find things too be too
        coarse-grained.
      
      (From Phabricator D169)
      5fecd767
    • Peter Wortmann's avatar
      Source notes (Cmm support) · 7ceaf96f
      Peter Wortmann authored
      This patch adds CmmTick nodes to Cmm code. This is relatively
      straight-forward, but also not very useful, as many blocks will simply
      end up with no annotations whatosever.
      
      Notes:
      
      * We use this design over, say, putting ticks into the entry node of all
        blocks, as it seems to work better alongside existing optimisations.
        Now granted, the reason for this is that currently GHC's main Cmm
        optimisations seem to mainly reorganize and merge code, so this might
        change in the future.
      
      * We have the Cmm parser generate a few source notes as well. This is
        relatively easy to do - worst part is that it complicates the CmmParse
        implementation a bit.
      
      (From Phabricator D169)
      7ceaf96f
  5. 15 May, 2014 1 commit
    • Herbert Valerio Riedel's avatar
      Add LANGUAGE pragmas to compiler/ source files · 23892440
      Herbert Valerio Riedel authored
      In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
      reorganized, while following the convention, to
      
      - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
        any `{-# OPTIONS_GHC #-}`-lines.
      
      - Moreover, if the list of language extensions fit into a single
        `{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one
        line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each
        individual language extension. In both cases, try to keep the
        enumeration alphabetically ordered.
        (The latter layout is preferable as it's more diff-friendly)
      
      While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
      occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
      23892440
  6. 18 Oct, 2013 1 commit
  7. 04 Sep, 2013 1 commit
  8. 03 Sep, 2013 1 commit
  9. 24 Jul, 2013 1 commit
    • Simon Marlow's avatar
      Fix a bug in stack layout with safe foreign calls (#8083) · c2348859
      Simon Marlow authored
      We weren't properly tracking the number of stack arguments in the
      continuation of a foreign call.  It happened to work when the
      continuation was not a join point, but when it was a join point we
      were using the wrong amount of stack fixup.
      c2348859
  10. 14 Apr, 2013 1 commit
  11. 06 Apr, 2013 1 commit
  12. 05 Mar, 2013 1 commit
  13. 12 Nov, 2012 1 commit
    • Simon Marlow's avatar
      Remove OldCmm, convert backends to consume new Cmm · d92bd17f
      Simon Marlow authored
      This removes the OldCmm data type and the CmmCvt pass that converts
      new Cmm to OldCmm.  The backends (NCGs, LLVM and C) have all been
      converted to consume new Cmm.
      
      The main difference between the two data types is that conditional
      branches in new Cmm have both true/false successors, whereas in OldCmm
      the false case was a fallthrough.  To generate slightly better code we
      occasionally need to invert a conditional to ensure that the
      branch-not-taken becomes a fallthrough; this was previously done in
      CmmCvt, and it is now done in CmmContFlowOpt.
      
      We could go further and use the Hoopl Block representation for native
      code, which would mean that we could use Hoopl's postorderDfs and
      analyses for native code, but for now I've left it as is, using the
      old ListGraph representation for native code.
      d92bd17f
  14. 30 Oct, 2012 1 commit
  15. 08 Oct, 2012 1 commit
    • Simon Marlow's avatar
      Produce new-style Cmm from the Cmm parser · a7c0387d
      Simon Marlow authored
      The main change here is that the Cmm parser now allows high-level cmm
      code with argument-passing and function calls.  For example:
      
      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }
      
        return (x,y);
      }
      
      More details on the new cmm syntax are in Note [Syntax of .cmm files]
      in CmmParse.y.
      
      The old syntax is still more-or-less supported for those occasional
      code fragments that really need to explicitly manipulate the stack.
      However there are a couple of differences: it is now obligatory to
      give a list of live GlobalRegs on every jump, e.g.
      
        jump %ENTRY_CODE(Sp(0)) [R1];
      
      Again, more details in Note [Syntax of .cmm files].
      
      I have rewritten most of the .cmm files in the RTS into the new
      syntax, except for AutoApply.cmm which is generated by the genapply
      program: this file could be generated in the new syntax instead and
      would probably be better off for it, but I ran out of enthusiasm.
      
      Some other changes in this batch:
      
       - The PrimOp calling convention is gone, primops now use the ordinary
         NativeNodeCall convention.  This means that primops and "foreign
         import prim" code must be written in high-level cmm, but they can
         now take more than 10 arguments.
      
       - CmmSink now does constant-folding (should fix #7219)
      
       - .cmm files now go through the cmmPipeline, and as a result we
         generate better code in many cases.  All the object files generated
         for the RTS .cmm files are now smaller.  Performance should be
         better too, but I haven't measured it yet.
      
       - RET_DYN frames are removed from the RTS, lots of code goes away
      
       - we now have some more canned GC points to cover unboxed-tuples with
         2-4 pointers, which will reduce code size a little.
      a7c0387d
  16. 31 Aug, 2012 1 commit
    • Simon Marlow's avatar
      Fix a bug in foldExpDeep · 6dd55e8a
      Simon Marlow authored
      This caused the CAF analysis to occasionally miss a CAF sometimes,
      resulting in a very hard to diagnose crash.
      6dd55e8a
  17. 06 Aug, 2012 1 commit
  18. 20 Jul, 2012 1 commit
  19. 09 Jul, 2012 1 commit
  20. 05 Jul, 2012 2 commits
  21. 15 Mar, 2012 2 commits
  22. 13 Feb, 2012 1 commit
  23. 08 Feb, 2012 1 commit
    • Simon Marlow's avatar
      New stack layout algorithm · 76999b60
      Simon Marlow authored
      Also:
       - improvements to code generation: push slow-call continuations
         on the stack instead of generating explicit continuations
      
       - remove unused CmmInfo wrapper type (replace with CmmInfoTable)
      
       - squash Area and AreaId together, remove now-unused RegSlot
      
       - comment out old unused stack-allocation code that no longer
         compiles after removal of RegSlot
      76999b60
  24. 03 Feb, 2012 1 commit
  25. 30 Jan, 2012 1 commit
  26. 25 Jan, 2012 1 commit
  27. 23 Jan, 2012 1 commit
  28. 18 Jan, 2012 1 commit
  29. 19 Dec, 2011 1 commit
  30. 04 Nov, 2011 1 commit
  31. 25 Aug, 2011 1 commit
  32. 19 Aug, 2011 1 commit
  33. 28 Jun, 2011 1 commit
  34. 17 Jun, 2011 1 commit
    • Edward Z. Yang's avatar
      Port MachOp folding to new code generator. · b8e0ce7b
      Edward Z. Yang authored
          * Rewrote cmmMachOpFold to cmmMachOpFoldM, which returns
            Nothing if no folding took place.
          * Wrote some generic mapping functions which take functions
            of form 'a -> Maybe a' and are smart about sharing.
          * Split up optimizations from PIC and PPC work in the native
            codegen, so they'll be easier to turn off later
            (they are not currently being turned off, however.)
          * Whitespace fixes!
      
      ToDo: Turn off MachOp folding when new codegenerator is being used.
      Signed-off-by: Edward Z. Yang's avatarEdward Z. Yang <ezyang@mit.edu>
      b8e0ce7b
  35. 13 Jun, 2011 1 commit
  36. 15 May, 2011 1 commit