1. 22 Sep, 2019 2 commits
  2. 27 Jun, 2019 1 commit
    • Matthew Pickering's avatar
      rts: Correct handling of LARGE ARR_WORDS in LDV profiler · a586b33f
      Matthew Pickering authored
      This implements the correct fix for #11627 by skipping over the slop
      (which is zeroed) rather than adding special case logic for LARGE
      ARR_WORDS which runs the risk of not performing a correct census by
      ignoring any subsequent blocks.
      
      This approach implements similar logic to that in Sanity.c
      a586b33f
  3. 25 Apr, 2019 1 commit
  4. 21 Sep, 2018 1 commit
    • Ömer Sinan Ağacan's avatar
      Fix slop zeroing for AP_STACK eager blackholes in debug build · 66c17293
      Ömer Sinan Ağacan authored
      As #15571 reports, eager blackholing breaks sanity checks as we can't
      zero the payload when eagerly blackholing (because we'll be using the
      payload after blackholing), but by the time we blackhole a previously
      eagerly blackholed object (in `threadPaused()`) we don't have the
      correct size information for the object (because the object's type
      becomes BLACKHOLE when we eagerly blackhole it) so can't properly zero
      the slop.
      
      This problem can be solved for AP_STACK eager blackholing (which unlike
      eager blackholing in general, is not optional) by zeroing the payload
      after entering the stack. This patch implements this idea.
      
      Fixes #15571.
      
      Test Plan:
      Previously concprog001 when compiled and run with sanity checks
      
          ghc-stage2 Mult.hs -debug -rtsopts
          ./Mult +RTS -DS
      
      was failing with
      
          Mult: internal error: checkClosure: stack frame
              (GHC version 8.7.20180821 for x86_64_unknown_linux)
              Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
      
      thic patch fixes this panic. The test still panics, but it runs for a while
      before panicking (instead of directly panicking as before), and the new problem
      seems unrelated:
      
          Mult: internal error: ASSERTION FAILED: file rts/sm/Sanity.c, line 296
              (GHC version 8.7.20180919 for x86_64_unknown_linux)
              Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
      
      The new problem will be fixed in another diff.
      
      I also tried slow validate (which requires D5164): this does not introduce any
      new failures.
      
      Reviewers: simonmar, bgamari, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, carter
      
      GHC Trac Issues: #15571
      
      Differential Revision: https://phabricator.haskell.org/D5165
      66c17293
  5. 05 Jun, 2018 1 commit
    • Ömer Sinan Ağacan's avatar
      Rename some mutable closure types for consistency · 4075656e
      Ömer Sinan Ağacan authored
      SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
      SMALL_MUT_ARR_PTRS_FROZEN  -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
      MUT_ARR_PTRS_FROZEN0       -> MUT_ARR_PTRS_FROZEN_DIRTY
      MUT_ARR_PTRS_FROZEN        -> MUT_ARR_PTRS_FROZEN_CLEAN
      
      Naming is now consistent with other CLEAR/DIRTY objects (MVAR, MUT_VAR,
      MUT_ARR_PTRS).
      
      (alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
      
      Removed a few comments in Scav.c about FROZEN0 being on the mut_list
      because it's now clear from the closure type.
      
      Reviewers: bgamari, simonmar, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4784
      4075656e
  6. 16 May, 2018 3 commits
    • Simon Marlow's avatar
      Merge FUN_STATIC closure with its SRT · 838b6903
      Simon Marlow authored
      Summary:
      The idea here is to save a little code size and some work in the GC,
      by collapsing FUN_STATIC closures and their SRTs.
      
      This is (4) in a series; see D4632 for more details.
      
      There's a tradeoff here: more complexity in the compiler in exchange
      for a modest code size reduction (probably around 0.5%).
      
      Results:
      * GHC binary itself (statically linked) is 1% smaller
      * -0.2% binary sizes in nofib (-0.5% module sizes)
      
      Full nofib results comparing D4634 with this: P177 (ignore runtimes,
      these aren't stable on my laptop)
      
      Test Plan: validate, nofib
      
      Reviewers: bgamari, niteria, simonpj, erikd
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4637
      838b6903
    • Simon Marlow's avatar
      Save a word in the info table on x86_64 · 2b0918c9
      Simon Marlow authored
      Summary:
      An info table with an SRT normally looks like this:
      
          StgWord64 srt_offset
          StgClosureInfo layout
          StgWord32 layout
          StgWord32 has_srt
      
      But we only need 32 bits for srt_offset on x86_64, because the small
      memory model requires that code segments are at most 2GB. So we can
      optimise this to
      
          StgClosureInfo layout
          StgWord32 layout
          StgWord32 srt_offset
      
      saving a word.  We can tell whether the info table has an SRT or not,
      because zero is not a valid srt_offset, so zero still indicates that
      there's no SRT.
      
      Test Plan:
      * validate
      * For results, see D4632.
      
      Reviewers: bgamari, niteria, osa1, erikd
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4634
      2b0918c9
    • Simon Marlow's avatar
      An overhaul of the SRT representation · eb8e692c
      Simon Marlow authored
      Summary:
      - Previously we would hvae a single big table of pointers per module,
        with a set of bitmaps to reference entries within it. The new
        representation is identical to a static constructor, which is much
        simpler for the GC to traverse, and we get to remove the complicated
        bitmap-traversal code from the GC.
      
      - Rewrite all the code to generate SRTs in CmmBuildInfoTables, and
        document it much better (see Note [SRTs]). This has been something
        I've wanted to do since we moved to the new code generator, I
        finally had the opportunity to finish it while on a transatlantic
        flight recently :)
      
      There are a series of 4 diffs:
      
      1. D4632 (this one), which does the bulk of the changes
      
      2. D4633 which adds support for smaller `CmmLabelDiffOff` constants
      
      3. D4634 which takes advantage of D4632 and D4633 to save a word in
         info tables that have an SRT on x86_64. This is where most of the
         binary size improvement comes from.
      
      4. D4637 which makes a further optimisation to merge some SRTs with
         static FUN closures.  This adds some complexity and the benefits
         are fairly modest, so it's not clear yet whether we should do this.
      
      Results (after (3), on x86_64)
      
      - GHC itself (staticaly linked) is 5.2% smaller
      
      - -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176
      
      - I measured the overhead of traversing all the static objects in a
        major GC in GHC itself by doing `replicateM_ 1000 performGC` as the
        first thing in `Main.main`.  The new version was 5-10% faster, but
        the results did vary quite a bit.
      
      - I'm not sure if there's a compile-time difference, the results are
        too unreliable.
      
      Test Plan: validate
      
      Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4632
      eb8e692c
  7. 29 Apr, 2017 1 commit
  8. 23 Apr, 2017 1 commit
  9. 13 Dec, 2016 1 commit
  10. 07 Dec, 2016 1 commit
    • Simon Marlow's avatar
      Overhaul of Compact Regions (#12455) · 7036fde9
      Simon Marlow authored
      Summary:
      This commit makes various improvements and addresses some issues with
      Compact Regions (aka Compact Normal Forms).
      
      This was the most important thing I wanted to fix.  Compaction
      previously prevented GC from running until it was complete, which
      would be a problem in a multicore setting.  Now, we compact using a
      hand-written Cmm routine that can be interrupted at any point.  When a
      GC is triggered during a sharing-enabled compaction, the GC has to
      traverse and update the hash table, so this hash table is now stored
      in the StgCompactNFData object.
      
      Previously, compaction consisted of a deepseq using the NFData class,
      followed by a traversal in C code to copy the data.  This is now done
      in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
      longer use the NFData instances, instead the Cmm routine evaluates
      components directly as it compacts.
      
      The new compaction is about 50% faster than the old one with no
      sharing, and a little faster on average with sharing (the cost of the
      hash table dominates when we're doing sharing).
      
      Static objects that don't (transitively) refer to any CAFs don't need
      to be copied into the compact region.  In particular this means we
      often avoid copying Char values and small Int values, because these
      are static closures in the runtime.
      
      Each Compact# object can support a single compactAdd# operation at any
      given time, so the Data.Compact library now enforces mutual exclusion
      using an MVar stored in the Compact object.
      
      We now get exceptions rather than killing everything with a barf()
      when we encounter an object that cannot be compacted (a function, or a
      mutable object).  We now also detect pinned objects, which can't be
      compacted either.
      
      The Data.Compact API has been refactored and cleaned up.  A new
      compactSize operation returns the size (in bytes) of the compact
      object.
      
      Most of the documentation is in the Haddock docs for the compact
      library, which I've expanded and improved here.
      
      Various comments in the code have been improved, especially the main
      Note [Compact Normal Forms] in rts/sm/CNF.c.
      
      I've added a few tests, and expanded a few of the tests that were
      there.  We now also run the tests with GHCi, and in a new test way
      that enables sanity checking (+RTS -DS).
      
      There's a benchmark in libraries/compact/tests/compact_bench.hs for
      measuring compaction speed and comparing sharing vs. no sharing.
      
      The field totalDataW in StgCompactNFData was unnecessary.
      
      Test Plan:
      * new unit tests
      * validate
      * tested manually that we can compact Data.Aeson data
      
      Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
      
      Subscribers: thomie, simonpj
      
      Differential Revision: https://phabricator.haskell.org/D2751
      
      GHC Trac Issues: #12455
      7036fde9
  11. 29 Nov, 2016 1 commit
  12. 20 Jul, 2016 1 commit
    • gcampax's avatar
      Compact Regions · cf989ffe
      gcampax authored
      This brings in initial support for compact regions, as described in the
      ICFP 2015 paper "Efficient Communication and Collection with Compact
      Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni
      Campagna.
      
      Some things may change before the 8.2 release, but I (Simon M.) wanted
      to get the main patch committed so that we can iterate.
      
      What documentation there is is in the Data.Compact module in the new
      compact package.  We'll need to extend and polish the documentation
      before the release.
      
      Test Plan:
      validate
      (new test cases included)
      
      Reviewers: ezyang, simonmar, hvr, bgamari, austin
      
      Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
      
      Differential Revision: https://phabricator.haskell.org/D1264
      
      GHC Trac Issues: #11493
      cf989ffe
  13. 17 May, 2016 1 commit
    • Erik de Castro Lopo's avatar
      rts: More const correct-ness fixes · 33c029dd
      Erik de Castro Lopo authored
      In addition to more const-correctness fixes this patch fixes an
      infelicity of the previous const-correctness patch (995cf0f3) which
      left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter
      but returning a non-const pointer. Here we restore the original type
      signature of `UNTAG_CLOSURE` and add a new function
      `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure`
      pointer and uses that wherever possible.
      
      Test Plan: Validate on Linux, OS X and Windows
      
      Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
      
      Reviewed By: simonmar, trofi
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2231
      33c029dd
  14. 12 May, 2016 1 commit
  15. 04 May, 2016 1 commit
  16. 29 Apr, 2016 1 commit
  17. 23 Jan, 2016 1 commit
    • Joachim Breitner's avatar
      Remove unused IND_PERM · f42db157
      Joachim Breitner authored
      it seems that this closure type has not been in use since 5d52d9, so all
      this is dead and untested code. This removes it. Some of the code might
      be useful for a counting indirection as described in #10613, so when
      implementing that, have a look at what this commit removes.
      
      Test Plan: validate on harbormaster
      
      Reviewers: austin, bgamari, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1821
      f42db157
  18. 11 Sep, 2015 1 commit
  19. 20 Oct, 2014 1 commit
  20. 02 Oct, 2014 1 commit
    • Edward Z. Yang's avatar
      Rename _closure to _static_closure, apply naming consistently. · 35672072
      Edward Z. Yang authored
      Summary:
      In preparation for indirecting all references to closures,
      we rename _closure to _static_closure to ensure any old code
      will get an undefined symbol error.  In order to reference
      a closure foobar_closure (which is now undefined), you should instead
      use STATIC_CLOSURE(foobar).  For convenience, a number of these
      old identifiers are macro'd.
      
      Across C-- and C (Windows and otherwise), there were differing
      conventions on whether or not foobar_closure or &foobar_closure
      was the address of the closure.  Now, all foobar_closure references
      are addresses, and no & is necessary.
      
      CHARLIKE/INTLIKE were not changed, simply alpha-renamed.
      
      Part of remove HEAP_ALLOCED patch set (#8199)
      
      Depends on D265
      Signed-off-by: Edward Z. Yang's avatarEdward Z. Yang <ezyang@mit.edu>
      
      Test Plan: validate
      
      Reviewers: simonmar, austin
      
      Subscribers: simonmar, ezyang, carter, thomie
      
      Differential Revision: https://phabricator.haskell.org/D267
      
      GHC Trac Issues: #8199
      35672072
  21. 20 Aug, 2014 1 commit
  22. 16 Aug, 2014 1 commit
    • Herbert Valerio Riedel's avatar
      Implement {resize,shrink}MutableByteArray# primops · 246436f1
      Herbert Valerio Riedel authored
      The two new primops with the type-signatures
      
        resizeMutableByteArray# :: MutableByteArray# s -> Int#
                                -> State# s -> (# State# s, MutableByteArray# s #)
      
        shrinkMutableByteArray# :: MutableByteArray# s -> Int#
                                -> State# s -> State# s
      
      allow to resize MutableByteArray#s in-place (when possible), and are useful
      for algorithms where memory is temporarily over-allocated. The motivating
      use-case is for implementing integer backends, where the final target size of
      the result is either N or N+1, and only known after the operation has been
      performed.
      
      A future commit will implement a stateful variant of the
      `sizeofMutableByteArray#` operation (see #9447 for details), since now the
      size of a `MutableByteArray#` may change over its lifetime (i.e before
      it gets frozen or GCed).
      
      Test Plan: ./validate --slow
      
      Reviewers: ezyang, austin, simonmar
      
      Reviewed By: austin, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D133
      246436f1
  23. 29 Apr, 2014 1 commit
    • Arash Rouhani's avatar
      Rts: Consistently use StgWord for sizes of bitmaps · 43b3bab3
      Arash Rouhani authored
      A long debate is in issue #8742, but the main motivation is that this
      allows for applying a patch to reuse the function scavenge_small_bitmap
      without changing the .o-file output.
      
      Similarly, I changed the types in rts/sm/Compact.c, so I can create
      a STATIC_INLINE function for the redundant code block:
      
              while (size > 0) {
                  if ((bitmap & 1) == 0) {
                      thread((StgClosure **)p);
                  }
                  p++;
                  bitmap = bitmap >> 1;
                  size--;
              }
      43b3bab3
  24. 29 Mar, 2014 1 commit
    • tibbe's avatar
      Add SmallArray# and SmallMutableArray# types · 90329b6c
      tibbe authored
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size with 2
      words in the typical small array case and leads to less work when
      updating or GC:ing the array.
      
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      
      Fixes #8923.
      90329b6c
  25. 19 Jan, 2014 1 commit
  26. 25 Oct, 2013 1 commit
  27. 11 Oct, 2013 1 commit
  28. 07 Mar, 2013 1 commit
  29. 19 Dec, 2012 1 commit
  30. 08 Oct, 2012 1 commit
    • Simon Marlow's avatar
      Produce new-style Cmm from the Cmm parser · a7c0387d
      Simon Marlow authored
      The main change here is that the Cmm parser now allows high-level cmm
      code with argument-passing and function calls.  For example:
      
      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }
      
        return (x,y);
      }
      
      More details on the new cmm syntax are in Note [Syntax of .cmm files]
      in CmmParse.y.
      
      The old syntax is still more-or-less supported for those occasional
      code fragments that really need to explicitly manipulate the stack.
      However there are a couple of differences: it is now obligatory to
      give a list of live GlobalRegs on every jump, e.g.
      
        jump %ENTRY_CODE(Sp(0)) [R1];
      
      Again, more details in Note [Syntax of .cmm files].
      
      I have rewritten most of the .cmm files in the RTS into the new
      syntax, except for AutoApply.cmm which is generated by the genapply
      program: this file could be generated in the new syntax instead and
      would probably be better off for it, but I ran out of enthusiasm.
      
      Some other changes in this batch:
      
       - The PrimOp calling convention is gone, primops now use the ordinary
         NativeNodeCall convention.  This means that primops and "foreign
         import prim" code must be written in high-level cmm, but they can
         now take more than 10 arguments.
      
       - CmmSink now does constant-folding (should fix #7219)
      
       - .cmm files now go through the cmmPipeline, and as a result we
         generate better code in many cases.  All the object files generated
         for the RTS .cmm files are now smaller.  Performance should be
         better too, but I haven't measured it yet.
      
       - RET_DYN frames are removed from the RTS, lots of code goes away
      
       - we now have some more canned GC points to cover unboxed-tuples with
         2-4 pointers, which will reduce code size a little.
      a7c0387d
  31. 21 Sep, 2012 3 commits
  32. 07 Sep, 2012 1 commit
    • Simon Marlow's avatar
      Deprecate lnat, and use StgWord instead · 41737f12
      Simon Marlow authored
      lnat was originally "long unsigned int" but we were using it when we
      wanted a 64-bit type on a 64-bit machine.  This broke on Windows x64,
      where long == int == 32 bits.  Using types of unspecified size is bad,
      but what we really wanted was a type with N bits on an N-bit machine.
      StgWord is exactly that.
      
      lnat was mentioned in some APIs that clients might be using
      (e.g. StackOverflowHook()), so we leave it defined but with a comment
      to say that it's deprecated.
      41737f12
  33. 27 Aug, 2012 1 commit
  34. 25 Aug, 2012 2 commits