1. 03 Aug, 2015 1 commit
    • CmmParse: Don't force alignment in memcpy-ish operations · 64b6733e
      Ben Gamari authored
      The forcing was originally introduced in 681973c3, where I wanted to
      enforce that the alignment passed to %memcpy was a constant expression,
      as this is required by LLVM. However, forcing the alignment breaks
      the knot-tying done in `loopDecls`, causing T8131 to hang.
      
      Here I remove the `seq` and mark T8131 as `expect_broken` in the case
      of the NCG, which doesn't force the alignment in this case.
      
      Fixes #10664.
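The hang can be illustrated with a minimal Haskell sketch of knot-tying. This is not the `loopDecls` code: `Rhs`, `resolve` and the default of 0 for unknown names are hypothetical, chosen only to show why an environment that is defined in terms of itself has to stay lazy.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical declaration language: a name is bound either to a
-- literal or to another name, which may be declared later.
data Rhs = Lit Int | Ref String

-- | Resolve all declarations in one pass. The result `env` is used
-- while it is still being defined; this only works because the values
-- stored in it are not forced during construction.
resolve :: [(String, Rhs)] -> [(String, Int)]
resolve decls = env
  where
    env = [ (name, eval rhs) | (name, rhs) <- decls ]
    eval (Lit n)     = n
    -- Unknown names default to 0 (an assumption of this sketch).
    eval (Ref other) = fromMaybe 0 (lookup other env)
```

Forcing each value before its cell is built, for instance with ``v `seq` (name, v)``, would make the lookup of a forward reference demand the very cell under construction, and evaluation would diverge.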
  2. 04 Jul, 2015 1 commit
    • Replace usages of `-w` by `-fno-warn`s · 69beef56
      thomie authored
      And remove unused imports and language pragmas.
      
      I checked that the minimum Happy and Alex version requirements, as
      listed in aclocal.m4, don't have to change. Before building ghc, I ran:
        - cabal install happy==1.19.4 --with-ghc=ghc-7.8.4
        - cabal install alex==3.1.0 --with-ghc=ghc-7.6.3
      
      Differential Revision: https://phabricator.haskell.org/D1032
  3. 16 Jun, 2015 1 commit
  4. 10 Apr, 2015 1 commit
  5. 30 Mar, 2015 1 commit
    • Refactor the story around switches (#10137) · de1160be
      Joachim Breitner authored
      This re-implements the code generation for case expressions at the Stg →
      Cmm level, for data type cases as well as for integral literal
      cases. (Cases on floats are still treated as before.)
      
      The goal is to allow for fancier strategies in implementing them, for a
      cleaner separation of the strategy from the gritty details of Cmm, and
      to run this later than the Common Block Optimization, allowing for one
      way to attack #10124. The new module CmmSwitch contains a number of
      notes explaining these changes. For example, it creates larger
      consecutive jump tables than the previous code, if possible.
      
      nofib shows little significant overall improvement of runtime. The
      rather large wobbling comes from changes in the code block order
      (see #8082, not much we can do about it). But the decrease in code size
      alone makes this worthwhile.
      
      ```
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                  Min          -1.8%      0.0%     -6.1%     -6.1%     -2.9%
                  Max          -0.7%     +0.0%     +5.6%     +5.7%     +7.8%
       Geometric Mean          -1.4%     -0.0%     -0.3%     -0.3%     +0.0%
      ```
      
      Compilation time increases slightly:
      ```
              -1 s.d.                -----            -2.0%
              +1 s.d.                -----            +2.5%
              Average                -----            +0.3%
      ```
      
      The test case T783 regresses a lot, but it is the only one exhibiting
      any regression. The cause is the changed order of branches in an
      if-then-else tree, which makes the Hoopl data flow analysis traverse
      the blocks in a suboptimal order. Reverting that gets rid of this
      regression, but has a consistent, if only very small (+0.2%), negative
      effect on runtime. So I conclude that this test is an extreme outlier
      and no reason to change the code.
      
      Differential Revision: https://phabricator.haskell.org/D720
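As a rough illustration of the "larger consecutive jump tables" idea, the sketch below groups sorted integer case labels into runs that are dense enough to share one jump table. The names `maxGap` and `segments` and the gap heuristic are assumptions made for this example; this is not the strategy implemented in CmmSwitch.

```haskell
-- Hypothetical sketch; not the algorithm used by CmmSwitch.

-- | Largest run of missing labels tolerated inside one jump table.
-- The value 3 is an arbitrary assumption.
maxGap :: Integer
maxGap = 3

-- | Split strictly ascending case labels into dense runs. Each run
-- could become one jump table, and an if-then-else tree would select
-- between the runs.
segments :: [Integer] -> [[Integer]]
segments = foldr step []
  where
    -- Extend the run to the right if the hole between x and y is small.
    step x ((y : ys) : rest)
      | y - x - 1 <= maxGap = (x : y : ys) : rest
    -- Otherwise (or if there is no run yet) start a new run.
    step x rest = [x] : rest
```

With these assumptions, the labels 1, 2, 3, 10, 11 and 20 fall into three runs, since the holes between 3 and 10 and between 11 and 20 exceed `maxGap`.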
  6. 20 Jan, 2015 1 commit
  7. 19 Jan, 2015 1 commit
    • CMM: add a mechanism to import C .data labels · d82f5925
      Sergei Trofimovich authored
      Summary:
      This introduces new .cmm syntax for import:
      
          'import' 'CLOSURE' <identifier>;
      
      Currently cmm syntax allows importing only function labels:
      
          import pthread_mutex_lock;
      
      but sometimes ghc needs to import global variables
      or haskell closures:
      
          import ghczmprim_GHCziTypes_True_closure;
          import base_ControlziExceptionziBase_nestedAtomically_closure;
          import ghczmprim_GHCziTypes_False_closure;
          import sm_mutex;
      
      This breaks on ia64, where pointers to data and pointers to
      functions differ.

      The patch fixes the threaded runtime on ia64, where dereferencing
      'sm_mutex' from CMM led to an incorrect location.

      The exact breakage mechanics are the same as in e18525fa.

      Merge into the 7.10 branch
      Signed-off-by: Sergei Trofimovich <siarheit@google.com>
      
      Test Plan: passes ./validate, makes ghci work on ghc-7.8.4
      
      Reviewers: simonmar, simonpj, austin
      
      Reviewed By: austin
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D622
  8. 16 Dec, 2014 3 commits
    • Add unwind information to Cmm · 711a51ad
      Peter Wortmann authored
      Unwind information allows the debugger to discover more information
      about a program state, by allowing it to "reconstruct" other states of
      the program. In practice, this means that we explain to the debugger
      how to unravel stack frames, which comes down mostly to explaining how
      to find their Sp and Ip register values.
      
      * We declare yet another new constructor for CmmNode - and this time
        there's actually little choice, as unwind information can and will
        change mid-block. We don't actually make use of these capabilities,
        and back-end support would be tricky (generate new labels?), but it
        feels like the right way to do it.
      
      * Even though we only use it for Sp so far, we allow CmmUnwind to specify
        unwind information for any register. This is pretty cheap and could
        come in useful in future.
      
      * We allow full CmmExpr expressions for specifying unwind values. The
        advantage here is that we don't have to make up new syntax, and can e.g.
        use the WDS macro directly. On the other hand, the back-end will now
        have to simplify the expression until it can sensibly be converted
        into DWARF byte code - a process which might fail, yielding NCG panics.
        Then again, when you're writing Cmm by hand you really ought to
        know what you're doing.
      
      (From Phabricator D169)
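The simplification step mentioned in the last point can be sketched as follows. This is a hedged illustration only: `Expr`, `Unwind` and `simplify` are hypothetical and far smaller than GHC's CmmExpr, and the sketch returns `Nothing` where a real back-end might panic.

```haskell
-- Hypothetical mini expression language; not GHC's CmmExpr.
data Expr
  = Reg String      -- a register such as "Sp"
  | Lit Int         -- an integer literal
  | Add Expr Expr
  | Sub Expr Expr
  | Load Expr       -- a memory read, which this sketch cannot simplify
  deriving (Eq, Show)

-- | Target form: an optional base register plus a constant offset.
data Unwind = Unwind (Maybe String) Int
  deriving (Eq, Show)

-- | Reduce an expression to "register + constant", or fail.
simplify :: Expr -> Maybe Unwind
simplify (Reg r) = Just (Unwind (Just r) 0)
simplify (Lit n) = Just (Unwind Nothing n)
simplify (Add a b) = do
  Unwind ra na <- simplify a
  Unwind rb nb <- simplify b
  base <- case (ra, rb) of
    (Just _, Just _)  -> Nothing        -- sum of two registers: give up
    (Just r, Nothing) -> Just (Just r)
    (Nothing, other)  -> Just other
  Just (Unwind base (na + nb))
simplify (Sub a b) = do
  Unwind ra na <- simplify a
  Unwind rb nb <- simplify b
  case rb of
    Nothing -> Just (Unwind ra (na - nb))
    Just _  -> Nothing                  -- subtracting a register: give up
simplify (Load _) = Nothing
```

On a 64-bit target, an unwind value written as `Sp + WDS(2)` would correspond to `Add (Reg "Sp") (Lit 16)` in this model and simplify to `Unwind (Just "Sp") 16`.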
    • Tick scopes · 5fecd767
      Peter Wortmann authored
      This patch solves the scoping problem of CmmTick nodes: If we just put
      CmmTicks into blocks we have no idea what exactly they are meant to
      cover.  Here we introduce tick scopes, which allow us to create
      sub-scopes and merged scopes easily.
      
      Notes:
      
      * Given that the code often passes Cmm around "head-less", we have to
        make sure that its intended scope does not get lost. To keep the amount
        of passing-around to a minimum we define a CmmAGraphScoped type synonym
        here that just bundles the scope with a portion of Cmm to be assembled
        later.
      
      * We introduce new scopes at somewhat random places, aligning with
        getCode calls. This works surprisingly well, but we might have to
        add new scopes into the mix later on if we find things to be too
        coarse-grained.
      
      (From Phabricator D169)
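One way to picture sub-scopes and merged scopes is the small model below. It is a sketch under assumptions: the type `Scope`, its constructors and the functions `covering` and `isSubScopeOf` are hypothetical names for illustration and do not reproduce the definitions added by the patch.

```haskell
-- Hypothetical model of tick scopes.
data Scope
  = Global               -- the outermost scope
  | Sub Int Scope        -- a fresh sub-scope (tagged by an Int) of a parent
  | Merged Scope Scope   -- code that belongs to both scopes
  deriving (Eq, Show)

-- | All scopes whose ticks apply to code placed in the given scope:
-- the scope itself and its ancestors; for a merged scope, both sides.
covering :: Scope -> [Scope]
covering Global           = [Global]
covering s@(Sub _ parent) = s : covering parent
covering (Merged a b)     = covering a ++ covering b

-- | True if ticks attached to `outer` also describe code in `inner`.
isSubScopeOf :: Scope -> Scope -> Bool
isSubScopeOf inner outer = outer `elem` covering inner
```

In this model a tick placed in a parent scope covers every block in its sub-scopes, and a block created by merging code from two scopes is covered by the ticks of both.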
    • Source notes (Cmm support) · 7ceaf96f
      Peter Wortmann authored
      This patch adds CmmTick nodes to Cmm code. This is relatively
      straight-forward, but also not very useful, as many blocks will simply
      end up with no annotations whatsoever.
      
      Notes:
      
      * We use this design over, say, putting ticks into the entry node of all
        blocks, as it seems to work better alongside existing optimisations.
        Now granted, the reason for this is that currently GHC's main Cmm
        optimisations seem to mainly reorganize and merge code, so this might
        change in the future.
      
      * We have the Cmm parser generate a few source notes as well. This is
        relatively easy to do - worst part is that it complicates the CmmParse
        implementation a bit.
      
      (From Phabricator D169)
  9. 20 Oct, 2014 3 commits
  10. 02 Oct, 2014 3 commits
    • Properly generate info tables for static closures in C--. · 178eb906
      Edward Z. Yang authored
      Summary:
      Previously, we assumed all objects declared in C-- were not-static, even
      ones which were CONSTR_NOCAF_STATIC.  This used to be harmless, but now
      we need this information to be correct.
      
      Part of remove HEAP_ALLOCED patch set (#8199)

      Depends on D264
      Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
      
      Test Plan: validate
      
      Reviewers: simonmar, austin
      
      Subscribers: simonmar, ezyang, carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D265
      
      GHC Trac Issues: #8199
    • BC-breaking changes to C-- CLOSURE syntax. · 3b5a840b
      Edward Z. Yang authored
      Summary:
      Previously, there were two variants of CLOSURE in C--:
      
          - Top-level CLOSURE(foo_closure, foo, lits...), which defines a new
            static closure and gives it a name, and
      
          - Array CLOSURE(foo, lits...), which was used for the static char
            and integer arrays.
      
      They used the same name, were confusing, and didn't even generate
      the correct internal label representation!  So now, we have two
      new forms:
      
          - Top-level CLOSURE(foo, lits...) which automatically generates
            foo_closure (along with foo_info, which we were doing already)
      
          - Array ANONYMOUS_CLOSURE(foo, lits...) which doesn't generate
            a foo_closure identifier.
      
      Part of remove HEAP_ALLOCED patch set (#8199)
      Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
      
      Test Plan: validate
      
      Reviewers: simonmar, austin
      
      Subscribers: simonmar, ezyang, carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D264
      
      GHC Trac Issues: #8199
    • Place static closures in their own section. · b23ba2a7
      Edward Z. Yang authored
      Summary:
      The primary reason for doing this is assisting debuggability:
      if static closures are all in the same section, they are
      guaranteed to be adjacent to one another.  This will help
      later when we add some code that takes section start/end and
      uses this to sanity-check the sections.
      
      Part of remove HEAP_ALLOCED patch set (#8199)
      Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
      
      Test Plan: validate
      
      Reviewers: simonmar, austin
      
      Subscribers: simonmar, ezyang, carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D263
      
      GHC Trac Issues: #8199
  11. 21 Jul, 2014 1 commit
    • Rename PackageId to PackageKey, distinguishing it from Cabal's PackageId. · 4bebab25
      Edward Z. Yang authored
      
      
      Summary:
      Previously, both Cabal and GHC defined the type PackageId, and we expected
      them to be roughly equivalent (but represented differently).  This refactoring
      separates these two notions.
      
      A package ID is a user-visible identifier; it's the thing you write in a
      Cabal file, e.g. containers-0.9.  The components of this ID are semantically
      meaningful, and decompose into a package name and a package version.
      
      A package key is an opaque identifier used by GHC to generate linking symbols.
      Presently, it just consists of a package name and a package version, but
      pursuant to #9265 we are planning to extend it to record other information.
      Within a single executable, it uniquely identifies a package.  It is *not* an
      InstalledPackageId, as the choice of a package key affects the ABI of a package
      (whereas an InstalledPackageId is computed after compilation.)  Cabal computes
      a package key for the package and passes it to GHC using -package-name (now
      *extremely* misnamed).
      
      As an added bonus, we don't have to worry about shadowing anymore.
      
      As a follow on, we should introduce -current-package-key having the same role as
      -package-name, and deprecate the old flag.  This commit is just renaming.
      
      The haddock submodule needed to be updated.
      Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
      
      Test Plan: validate
      
      Reviewers: simonpj, simonmar, hvr, austin
      
      Subscribers: simonmar, relrod, carter
      
      Differential Revision: https://phabricator.haskell.org/D79
      
      Conflicts:
      	compiler/main/HscTypes.lhs
      	compiler/main/Packages.lhs
      	utils/haddock
  12. 29 Mar, 2014 1 commit
    • Add SmallArray# and SmallMutableArray# types · 90329b6c
      tibbe authored
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size by 2
      words in the typical small array case and leads to less work when
      updating or GCing the array.
      
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      
      Fixes #8923.
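The "2 words" figure can be reproduced with a little arithmetic. The layout parameters below are assumptions for illustration and are not stated in the commit message: 64-bit words, one card-table byte per 128 elements, a three-word prefix (header, element count, total size) for Array# and a two-word prefix (header, element count) for SmallArray#. All function names are hypothetical.

```haskell
-- Assumed layout parameters (not taken from the commit message).
wordBytes, cardElems :: Int
wordBytes = 8      -- bytes per word on a 64-bit target
cardElems = 128    -- elements covered by one card-table byte

-- | Words occupied by the card table of an n-element array, rounded up.
cardTableWords :: Int -> Int
cardTableWords n = (cards + wordBytes - 1) `div` wordBytes
  where
    cards = (n + cardElems - 1) `div` cardElems

-- | Assumed Array# size: 3-word prefix, payload, card table.
arrayWords :: Int -> Int
arrayWords n = 3 + n + cardTableWords n

-- | Assumed SmallArray# size: 2-word prefix and payload only.
smallArrayWords :: Int -> Int
smallArrayWords n = 2 + n
```

Under these assumptions a 16-element array takes 20 words as an Array# and 18 words as a SmallArray#: one word saved for the size field and one for the rounded-up card table.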
  13. 16 Jan, 2014 2 commits
    • Allow the argument to 'reserve' to be a compile-time expression · 58e5843a
      Simon Marlow authored
      By using the constant-folder to reduce it to an integer.
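Reducing a compile-time expression to an integer can be sketched as a small constant folder. The `Expr` type and `constFold` are hypothetical and much simpler than GHC's Cmm expressions and constant folder; the point is only that folding succeeds exactly when every leaf is a literal.

```haskell
-- Hypothetical expression type; not GHC's CmmExpr.
data Expr
  = Lit Integer
  | Name String        -- a value only known at run time
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- | Fold an expression to an integer if it is a compile-time constant.
constFold :: Expr -> Maybe Integer
constFold (Lit n)   = Just n
constFold (Name _)  = Nothing
constFold (Add a b) = (+) <$> constFold a <*> constFold b
constFold (Mul a b) = (*) <$> constFold a <*> constFold b
```

An argument such as `2 + 3 * 4` folds to 14 and could be accepted, whereas an expression mentioning a run-time value yields `Nothing` and would have to be rejected.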
    • Add a way to reserve temporary stack space in high-level Cmm · eaa37a0f
      Simon Marlow authored
      We occasionally need to reserve some temporary memory in a primop for
      passing to a foreign function.  We've been using the stack for this,
      but when we moved to high-level Cmm it became quite fragile because
      primops are in high-level Cmm and the stack is supposed to be under
      the control of the Cmm pipeline.
      
      So this change puts things on a firmer footing by adding a new Cmm
      construct 'reserve'.  e.g. in decodeFloat_Int#:
      
          reserve 2 = tmp {
      
            mp_tmp1  = tmp + WDS(1);
            mp_tmp_w = tmp;
      
            /* Perform the operation */
            ccall __decodeFloat_Int(mp_tmp1 "ptr", mp_tmp_w "ptr", arg);
      
            r1 = W_[mp_tmp1];
            r2 = W_[mp_tmp_w];
          }
      
      reserve is described in CmmParse.y.
      
      Unfortunately the argument to reserve must be a compile-time constant.
      We might have to extend the parser to allow expressions with
      arithmetic operators if this is too restrictive.
      
      Note also that the return instruction for the procedure must be
      outside the scope of the reserved stack area, so we have to extract
      the values from the reserved area before we close the scope.  This
      means some more local variables (r1, r2 in the example above).  The
      generated code is more or less identical to what we had before though.
  14. 02 Oct, 2013 1 commit
  15. 01 Oct, 2013 3 commits
  16. 23 Sep, 2013 2 commits
  17. 06 Jun, 2013 1 commit
    • Implement cardinality analysis · 99d4e5b4
      Simon Peyton Jones authored
      This major patch implements the cardinality analysis described
      in our paper "Higher order cardinality analysis". It is joint
      work with Ilya Sergey and Dimitrios Vytiniotis.
      
      The basic idea is to augment the absence-analysis part of the demand
      analyser so that it can tell when something is used
          never
          at most once
          some other way
      
      The "at most once" information is used
          a) to enable transformations, and
             in particular to identify one-shot lambdas
          b) to allow updates on thunks to be omitted.
      
      There are two new flags, mainly there so you can do performance
      comparisons:
          -fkill-absence   stops GHC doing absence analysis at all
          -fkill-one-shot  stops GHC spotting one-shot lambdas
                           and single-entry thunks
      
      The big changes are:
      
      * The Demand type is substantially refactored.  In particular
        the UseDmd is factored as follows
            data UseDmd
              = UCall Count UseDmd
              | UProd [MaybeUsed]
              | UHead
              | Used
      
            data MaybeUsed = Abs | Use Count UseDmd
      
            data Count = One | Many
      
        Notice that UCall recurses straight to UseDmd, whereas
        UProd goes via MaybeUsed.
      
        The "Count" embodies the "at most once" or "many" idea.
      
      * The demand analyser itself was refactored a lot
      
      * The previously ad-hoc stuff in the occurrence analyser for foldr and
        build goes away entirely.  Before, if we had build (\cn -> ...x... )
        then the "\cn" was hackily made one-shot (by spotting 'build' as
        special).  That's essential to allow x to be inlined.  Now the
        occurrence analyser propagates info gotten from build's strictness
        signature (so build isn't special); and that strictness sig is
        in turn derived entirely automatically.  Much nicer!
      
      * The ticky stuff is improved to count single-entry thunks separately.
      
      One shortcoming is that there is no DEBUG way to spot if an
      allegedly-single-entry thunk is actually entered more than once.  It
      would not be hard to generate a bit of code to check for this, and it
      would be reassuring.  But it's fiddly and I have not done it.
      
      Despite all this fuss, the performance numbers are rather underwhelming.
      See the paper for more discussion.
      
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
             nucleic2          -0.8%    -10.9%      0.10      0.10     +0.0%
               sphere          -0.7%     -1.5%      0.08      0.08     +0.0%
      --------------------------------------------------------------------------------
                  Min          -4.7%    -10.9%     -9.3%     -9.3%    -50.0%
                  Max          -0.4%     +0.5%     +2.2%     +2.3%     +7.4%
       Geometric Mean          -0.8%     -0.2%     -1.3%     -1.3%     -1.8%
      
      I don't quite know how much credence to place in the runtime changes,
      but movement seems generally in the right direction.
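The "never / at most once / some other way" classification can be illustrated on a toy language. This is a deliberately naive sketch, not the demand analyser: `Term`, `Usage`, `both` and `usage` are hypothetical, and every lambda is conservatively treated as if it could be called many times, which is exactly the pessimism that one-shot information removes.

```haskell
-- Hypothetical three-point usage lattice, ordered Never < AtMostOnce < Many.
data Usage = Never | AtMostOnce | Many
  deriving (Eq, Ord, Show)

-- Hypothetical toy term language.
data Term
  = Var String
  | App Term Term
  | Lam String Term
  | If Term Term Term

-- | Combine the usages of two parts that are both evaluated.
both :: Usage -> Usage -> Usage
both Never u = u
both u Never = u
both _ _     = Many

-- | How often may the variable x be used when the term is evaluated?
usage :: String -> Term -> Usage
usage x (Var y)
  | x == y    = AtMostOnce
  | otherwise = Never
usage x (App f a) = usage x f `both` usage x a
usage x (Lam y body)
  | x == y    = Never               -- x is shadowed by the binder
  | otherwise = case usage x body of
      Never -> Never
      _     -> Many                 -- the lambda may be called repeatedly
usage x (If c t e) =
  -- Only one branch runs, so take the least upper bound of the two.
  usage x c `both` max (usage x t) (usage x e)
```

For example, a variable used once in each branch of a conditional is still classified as used at most once, while a variable used under a lambda is classified as `Many` in this sketch.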
  18. 09 May, 2013 1 commit
  19. 24 Apr, 2013 1 commit
  20. 01 Feb, 2013 1 commit
  21. 23 Jan, 2013 1 commit
  22. 13 Nov, 2012 1 commit
    • Fix the Slow calling convention (#7192) · 4270d7e7
      Simon Marlow authored
      The Slow calling convention passes the closure in R1, but we were
      ignoring this and hoping it would work, which it often did.  However,
      this bug seems to have been the cause of #7192, because the
      graph-colouring allocator is more sensitive to having correct liveness
      information on jumps.
  23. 05 Nov, 2012 1 commit
  24. 30 Oct, 2012 2 commits
  25. 19 Oct, 2012 1 commit
    • Remove the old codegen · 6fbd46b0
      Simon Marlow authored
      Except for CgUtils.fixStgRegisters that is used in the NCG and LLVM
      backends, and should probably be moved somewhere else.
  26. 16 Oct, 2012 1 commit
    • Some alpha renaming · cd33eefd
      ian@well-typed.com authored
      Mostly d -> g (matching DynFlag -> GeneralFlag).
      Also renamed if* to when*, matching the Haskell if/when names
  27. 08 Oct, 2012 2 commits
    • untab · a94144b8
      Simon Marlow authored
    • Produce new-style Cmm from the Cmm parser · a7c0387d
      Simon Marlow authored
      The main change here is that the Cmm parser now allows high-level cmm
      code with argument-passing and function calls.  For example:
      
      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }
      
        return (x,y);
      }
      
      More details on the new cmm syntax are in Note [Syntax of .cmm files]
      in CmmParse.y.
      
      The old syntax is still more-or-less supported for those occasional
      code fragments that really need to explicitly manipulate the stack.
      However there are a couple of differences: it is now obligatory to
      give a list of live GlobalRegs on every jump, e.g.
      
        jump %ENTRY_CODE(Sp(0)) [R1];
      
      Again, more details in Note [Syntax of .cmm files].
      
      I have rewritten most of the .cmm files in the RTS into the new
      syntax, except for AutoApply.cmm which is generated by the genapply
      program: this file could be generated in the new syntax instead and
      would probably be better off for it, but I ran out of enthusiasm.
      
      Some other changes in this batch:
      
       - The PrimOp calling convention is gone, primops now use the ordinary
         NativeNodeCall convention.  This means that primops and "foreign
         import prim" code must be written in high-level cmm, but they can
         now take more than 10 arguments.
      
       - CmmSink now does constant-folding (should fix #7219)
      
       - .cmm files now go through the cmmPipeline, and as a result we
         generate better code in many cases.  All the object files generated
         for the RTS .cmm files are now smaller.  Performance should be
         better too, but I haven't measured it yet.
      
       - RET_DYN frames are removed from the RTS, lots of code goes away
      
       - we now have some more canned GC points to cover unboxed-tuples with
         2-4 pointers, which will reduce code size a little.
  28. 18 Sep, 2012 1 commit