1. 05 Sep, 2017 1 commit
  2. 16 Aug, 2017 1 commit
    • Ben Gamari's avatar
      Speed up compilation of profiling stubs · a8da0de2
      Ben Gamari authored
      Here we encode the cost centre list as static data. This means that the
      initialization stubs are small functions which should be easy for GCC to
      compile, even with optimization.
      
      Fixes #7960.
      
      Test Plan: Test profiling
      
      Reviewers: austin, erikd, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #7960
      
      Differential Revision: https://phabricator.haskell.org/D3853
      a8da0de2
  3. 11 Jul, 2017 1 commit
    • Douglas Wilson's avatar
      Fix Work Balance computation in RTS stats · 7c9e356d
      Douglas Wilson authored
      An additional stat is tracked per gc: par_balanced_copied This is the
      the number of bytes copied by each gc thread under the balanced lmit,
      which is simply (copied_bytes / num_gc_threads).  The stat is added to
      all the appropriate GC structures, so is visible in the eventlog and in
      GHC.Stats.
      
      A note is added explaining how work balance is computed.
      
      Remove some end of line whitespace
      
      Test Plan:
      ./validate
      experiment with the program attached to the ticket
      examine code changes carefully
      
      Reviewers: simonmar, austin, hvr, bgamari, erikd
      
      Reviewed By: simonmar
      
      Subscribers: Phyx, rwbarton, thomie
      
      GHC Trac Issues: #13830
      
      Differential Revision: https://phabricator.haskell.org/D3658
      7c9e356d
  4. 29 Apr, 2017 2 commits
  5. 24 Apr, 2017 1 commit
    • Sergei Trofimovich's avatar
      compiler/cmm/PprC.hs: constify labels in .rodata · b68697e5
      Sergei Trofimovich authored
      Consider one-line module
          module B (v) where v = "hello"
      in -fvia-C mode it generates code like
          static char gibberish_str[] = "hello";
      
      It resides in data section (precious resource on ia64!).
      The patch switches genrator to emit:
          static const char gibberish_str[] = "hello";
      
      Other types if symbols that gained 'const' qualifier are:
      
      - info tables (from haskell and CMM)
      - static reference tables (from haskell and CMM)
      
      Cleanups along the way:
      
      - fixed info tables defined in .cmm to reside in .rodata
      - split out closure declaration into 'IC_' / 'EC_'
      - added label declaration (based on label type) right before
        each label definition (based on section type) so that C
        compiler could check if declaration and definition matches
        at definition site.
      Signed-off-by: default avatarSergei Trofimovich <slyfox@gentoo.org>
      
      Test Plan: ran testsuite on unregisterised x86_64 compiler
      
      Reviewers: simonmar, ezyang, austin, bgamari, erikd
      
      Reviewed By: bgamari, erikd
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #8996
      
      Differential Revision: https://phabricator.haskell.org/D3481
      b68697e5
  6. 23 Apr, 2017 1 commit
  7. 05 Apr, 2017 1 commit
  8. 04 Apr, 2017 2 commits
  9. 01 Mar, 2017 1 commit
    • Ben Gamari's avatar
      rts: Fix build · b86d226f
      Ben Gamari authored
      I evidently neglected to consider that validate doesn't build profiled
      ways. Arg.
      b86d226f
  10. 28 Feb, 2017 1 commit
    • Ben Gamari's avatar
      rts: Allow profile output path to be specified on RTS command line · db2a6676
      Ben Gamari authored
      This introduces a RTS option, -po, which allows the user to override the stem
      used to form the output file names of the heap profile and cost center summary.
      
      It's a bit unclear to me whether this is really the interface we want.
      Alternatively we could just allow the user to specify the `.hp` and `.prof` file
      names separately. This would arguably be a bit more straightforward and would
      allow the user to name JSON output with an appropriate `.json` suffix if they so
      desired. However, this would come at the cost of taking more of the option
      space, which is a somewhat precious commodity.
      
      Test Plan: Validate, try using `-po` RTS option
      
      Reviewers: simonmar, austin, erikd
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3182
      db2a6676
  11. 23 Feb, 2017 1 commit
    • Ben Gamari's avatar
      JSON profiler reports · a2043332
      Ben Gamari authored
      This introduces a JSON output format for cost-centre profiler reports.
      It's not clear whether this is really something we want to introduce
      given that we may also move to a more Haskell-driven output pipeline in
      the future, but I nevertheless found this helpful, so I thought I would
      put it up.
      
      Test Plan: Compile a program with `-prof -fprof-auto`; run with `+RTS
      -pj`
      
      Reviewers: austin, erikd, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: duncan, maoe, thomie, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D3132
      a2043332
  12. 08 Feb, 2017 1 commit
  13. 04 Feb, 2017 1 commit
    • Takenobu Tani's avatar
      Fix comment (old file names) in includes/ · 9984024a
      Takenobu Tani authored
      [skip ci]
      
      There ware some old file names (.lhs, ...) at comments.
      
      * includes/rts/Bytecodes.h
        - ghc/compiler/ghci/ByteCodeGen.lhs -> ByteCodeAsm.hs
      
      * includes/rts/Constants.h
        - libraries/base/GHC/Conc.lhs -> libraries/base/GHC/Conc/Sync.hs
      
      * includes/rts/storage/FunTypes.h
        - utils/genapply/GenApply.hs -> utils/genappl/Main.hs
        - compiler/codeGen/CgCallConv.lhs -> compiler/codeGen/StgCmmLayout.hs
      
      * includes/stg/MiscClosures.h
        - compiler/codeGen/CgStackery.lhs -> compiler/codeGen/StgCmmArgRep.hs
        - HeapStackCheck.hc  -> HeapStackCheck.cmm
      
      Reviewers: bgamari, austin, simonmar, erikd
      
      Reviewed By: erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3074
      9984024a
  14. 02 Feb, 2017 1 commit
    • Ben Gamari's avatar
      Add support for StaticPointers in GHCi · eedb3df0
      Ben Gamari authored
      Here we add support to GHCi for StaticPointers. This process begins by
      adding remote GHCi messages for adding entries to the static pointer
      table. We then collect binders needing SPT entries after linking and
      send the interpreter a message adding entries with the appropriate
      fingerprints.
      
      Test Plan: `make test TEST=StaticPtr`
      
      Reviewers: facundominguez, mboes, simonpj, simonmar, goldfire, austin,
      hvr, erikd
      
      Reviewed By: simonpj, simonmar
      
      Subscribers: RyanGlScott, simonpj, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2504
      
      GHC Trac Issues: #12356
      eedb3df0
  15. 31 Jan, 2017 1 commit
    • alexbiehl's avatar
      Abstract over the way eventlogs are flushed · 4dfc6d1c
      alexbiehl authored
      Currently eventlog data is always written to a file `progname.eventlog`.
      This patch introduces the `flushEventLog` field in `RtsConfig` which
      allows to customize the writing of eventlog data.
      
      One possible scenario is the ongoing live-profile-monitor effort by
      @NCrashed which slurps all eventlog data through `fluchEventLog`.
      
      `flushEventLog` takes a buffer with eventlog data and its size and
      returns `false` (0) in case eventlog data could not be procesed.
      
      Reviewers: simonmar, austin, erikd, bgamari
      
      Reviewed By: simonmar, bgamari
      
      Subscribers: qnikst, thomie, NCrashed
      
      Differential Revision: https://phabricator.haskell.org/D2934
      4dfc6d1c
  16. 10 Jan, 2017 1 commit
  17. 13 Dec, 2016 1 commit
  18. 11 Dec, 2016 1 commit
    • Moritz Angermann's avatar
      Make globals use sharedCAF · c3c70244
      Moritz Angermann authored
      Summary:
      The use of globals is quite painful when multiple rts are loaded, e.g.
      when plugins are loaded, which bring in a second rts. The sharedCAF
      appraoch was employed for the FastStringTable; I've taken the libery
      to extend this to the other globals I could find.
      
      This is a reboot of D2575, that should hopefully not exhibit the same
      windows build issues.
      
      Reviewers: Phyx, simonmar, goldfire, bgamari, austin, hvr, erikd
      
      Reviewed By: Phyx, simonmar, bgamari
      
      Subscribers: mpickering, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2773
      c3c70244
  19. 07 Dec, 2016 1 commit
    • Simon Marlow's avatar
      Overhaul of Compact Regions (#12455) · 7036fde9
      Simon Marlow authored
      Summary:
      This commit makes various improvements and addresses some issues with
      Compact Regions (aka Compact Normal Forms).
      
      This was the most important thing I wanted to fix.  Compaction
      previously prevented GC from running until it was complete, which
      would be a problem in a multicore setting.  Now, we compact using a
      hand-written Cmm routine that can be interrupted at any point.  When a
      GC is triggered during a sharing-enabled compaction, the GC has to
      traverse and update the hash table, so this hash table is now stored
      in the StgCompactNFData object.
      
      Previously, compaction consisted of a deepseq using the NFData class,
      followed by a traversal in C code to copy the data.  This is now done
      in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
      longer use the NFData instances, instead the Cmm routine evaluates
      components directly as it compacts.
      
      The new compaction is about 50% faster than the old one with no
      sharing, and a little faster on average with sharing (the cost of the
      hash table dominates when we're doing sharing).
      
      Static objects that don't (transitively) refer to any CAFs don't need
      to be copied into the compact region.  In particular this means we
      often avoid copying Char values and small Int values, because these
      are static closures in the runtime.
      
      Each Compact# object can support a single compactAdd# operation at any
      given time, so the Data.Compact library now enforces mutual exclusion
      using an MVar stored in the Compact object.
      
      We now get exceptions rather than killing everything with a barf()
      when we encounter an object that cannot be compacted (a function, or a
      mutable object).  We now also detect pinned objects, which can't be
      compacted either.
      
      The Data.Compact API has been refactored and cleaned up.  A new
      compactSize operation returns the size (in bytes) of the compact
      object.
      
      Most of the documentation is in the Haddock docs for the compact
      library, which I've expanded and improved here.
      
      Various comments in the code have been improved, especially the main
      Note [Compact Normal Forms] in rts/sm/CNF.c.
      
      I've added a few tests, and expanded a few of the tests that were
      there.  We now also run the tests with GHCi, and in a new test way
      that enables sanity checking (+RTS -DS).
      
      There's a benchmark in libraries/compact/tests/compact_bench.hs for
      measuring compaction speed and comparing sharing vs. no sharing.
      
      The field totalDataW in StgCompactNFData was unnecessary.
      
      Test Plan:
      * new unit tests
      * validate
      * tested manually that we can compact Data.Aeson data
      
      Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
      
      Subscribers: thomie, simonpj
      
      Differential Revision: https://phabricator.haskell.org/D2751
      
      GHC Trac Issues: #12455
      7036fde9
  20. 06 Dec, 2016 1 commit
    • Simon Marlow's avatar
      Overhaul GC stats · 24e6594c
      Simon Marlow authored
      Summary:
      Visible API changes:
      
      * The C struct `GCDetails` gives the stats about a single GC.  This is
        passed to the `gcDone()` callback if one is set via the
        RtsConfig. (previously we just passed a collection of values, so this
        is more extensible, at the expense of breaking the existing API)
      
      * `RTSStats` gives cumulative stats since the start of the program,
        and includes the `GCDetails` for the most recent GC.  This struct
        can be obtained via `getRTSStats()` (the old `getGCStats()` has been
        removed, and `getGCStatsEnabled()` has been renamed to
        `getRTSStatsEnabled()`)
      
      Improvements:
      
      * The per-GC stats and cumulative stats are now cleanly separated.
      
      * Inside the RTS we have a top-level `RTSStats` struct to keep all our
        stats in, previously this was just a collection of strangely-named
        variables.  This struct is mostly just copied in `getRTSStats()`, so
        the implementation of that function is a lot shorter.
      
      * Types are more consistent.  We use a uint64_t byte count for all
        memory values, and Time for all time values.
      
      * Names are more consistent.  We use a suffix `_bytes` for all byte
        counts and `_ns` for all time values.
      
      * We now collect information about the amount of memory in large
        objects and compact objects in `GCDetails`. (the latter was the reason
        I started doing this patch but it seems to have ballooned a bit!)
      
      * I fixed a bug in the calculation of the elapsed MUT time, and added
        an ASSERT to stop the calculations going wrong in the future.
      
      For now I kept the Haskell API in `GHC.Stats` the same, by
      impedence-matching with the new API.  We could either break that API
      and make it match the C API more closely, or we could add a new API
      and deprecate the old one.  Opinions welcome.
      
      This stuff is very easy to get wrong, and it's hard to test.  Reviews
      welcome!
      
      Test Plan:
      manual testing
      validate
      
      Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2756
      24e6594c
  21. 30 Nov, 2016 1 commit
  22. 29 Nov, 2016 3 commits
    • Ben Gamari's avatar
      Use C99's bool · 428e152b
      Ben Gamari authored
      Test Plan: Validate on lots of platforms
      
      Reviewers: erikd, simonmar, austin
      
      Reviewed By: erikd, simonmar
      
      Subscribers: michalt, thomie
      
      Differential Revision: https://phabricator.haskell.org/D2699
      428e152b
    • Moritz Angermann's avatar
      Make globals use sharedCAF · 6f7ed1e5
      Moritz Angermann authored
      The use of globals is quite painful when multiple rts are loaded, e.g.
      when plugins are loaded, which bring in a second rts. The sharedCAF
      appraoch was employed for the FastStringTable; I've taken the libery
      to extend this to the other globals I could find.
      
      Reviewers: rwbarton, simonmar, austin, hvr, erikd, bgamari
      
      Reviewed By: simonmar, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2575
      6f7ed1e5
    • shlevy's avatar
      Define thread primitives if they're supported. · 1732d7ac
      shlevy authored
      On iOS, we use the pthread-based implementation of Itimer.c even for a
      non-threaded RTS. Since 999c464d, this relies on synchronization
      primitives like Mutex, so ensure those primitives are defined whenever
      they are supported, even if !THREADED_RTS.
      
      Fixes #12799.
      
      Reviewers: erikd, austin, simonmar, bgamari
      
      Reviewed By: simonmar, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2712
      
      GHC Trac Issues: #12799
      1732d7ac
  23. 14 Nov, 2016 1 commit
    • Simon Marlow's avatar
      Remove CONSTR_STATIC · 55d535da
      Simon Marlow authored
      Summary:
      We currently have two info tables for a constructor
      
      * XXX_con_info: the info table for a heap-resident instance of the
        constructor, It has type CONSTR, or one of the specialised types like
        CONSTR_1_0
      
      * XXX_static_info: the info table for a static instance of this
        constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.
      
      I'm getting rid of the latter, and using the `con_info` info table for
      both static and dynamic constructors.  For rationale and more details
      see Note [static constructors] in SMRep.hs.
      
      I also removed these macros: `isSTATIC()`, `ip_STATIC()`,
      `closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC
      distinction, and anyway HEAP_ALLOCED() does the same job.
      
      Test Plan: validate
      
      Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2690
      
      GHC Trac Issues: #12455
      55d535da
  24. 02 Nov, 2016 1 commit
  25. 01 Oct, 2016 1 commit
    • Tamar Christina's avatar
      Support more than 64 logical processors on Windows · 3c179054
      Tamar Christina authored
      Windows support for more than 64 logical processors are implemented
      using processor groups.
      
      Essentially what it's doing is keeping the existing maximum of 64
      processors and keeping the affinity mask a 64 bit value, but adds an
      hierarchy above that.
      
      This support was added to Windows 7 and so we need to at runtime detect
      if the APIs are still there due to our minimum supported version being
      Windows Vista.
      
      The Maximum number of groups supported at this time is 4, so 256 logical
      cores.  The group indices are 0 based. One thread can have affinity with
      multiple groups.
      
      See
      https://msdn.microsoft.com/en-us/library/windows/desktop/ms684251.aspx
      and particularly helpful is the whitepaper: 'Supporting Systems that
      have more than 64 processors' at
      https://msdn.microsoft.com/en-us/library/windows/hardware/dn653313.aspx
      
      Processor groups are not guaranteed to be uniformly distributed nor
      guaranteed to be filled before a next group is needed. The OS will
      assign processors to groups based on physical proximity and will never
      partially assign cores from one physical cpu to more than one group. If
      one has two 48 core CPUs then you'd end up with two groups of 48 logical
      cpus. Now add a 3rd CPU with 10 cores and the group it is assigned to
      depends where the socket is on the board.
      
      Test Plan:
      ./validate or make test -c . in the rts test folder.
      
      This tests for regressions, to test this particular functionality
      itself:
      
         <program> +RTS -N -qa -RTS
      
      Test is detailed in description.
      
      Reviewers: bgamari, simonmar, austin, erikd
      
      Reviewed By: simonmar
      
      Subscribers: thomie, #ghc_windows_task_force
      
      Differential Revision: https://phabricator.haskell.org/D2533
      
      GHC Trac Issues: #11054
      3c179054
  26. 19 Aug, 2016 1 commit
  27. 05 Aug, 2016 1 commit
    • avd's avatar
      codeGen: Remove binutils<2.17 hack, fixes T11758 · e3e2e49a
      avd authored
      There was a complication on the x86_64 platform, where pointers were 64
      bits, but the tools didn't support 64-bit relative relocations.  This
      was true before binutils 2.17, which nowadays is quite standart (even
      CentOs 5 is shipped with 2.17).
      
      Hacks were removed from x86 genSwitch and asm pretty printer. Also
      [x86-64-relative] note was dropped from
      includes/rts/storage/InfoTables.h as it's not referenced anywhere now.
      
      Reviewers: austin, simonmar, rwbarton, erikd, bgamari
      
      Reviewed By: simonmar, erikd, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2426
      e3e2e49a
  28. 27 Jul, 2016 1 commit
  29. 20 Jul, 2016 1 commit
    • gcampax's avatar
      Compact Regions · cf989ffe
      gcampax authored
      This brings in initial support for compact regions, as described in the
      ICFP 2015 paper "Efficient Communication and Collection with Compact
      Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni
      Campagna.
      
      Some things may change before the 8.2 release, but I (Simon M.) wanted
      to get the main patch committed so that we can iterate.
      
      What documentation there is is in the Data.Compact module in the new
      compact package.  We'll need to extend and polish the documentation
      before the release.
      
      Test Plan:
      validate
      (new test cases included)
      
      Reviewers: ezyang, simonmar, hvr, bgamari, austin
      
      Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
      
      Differential Revision: https://phabricator.haskell.org/D1264
      
      GHC Trac Issues: #11493
      cf989ffe
  30. 16 Jul, 2016 1 commit
  31. 17 Jun, 2016 1 commit
    • Simon Marlow's avatar
      NUMA cleanups · 498ed266
      Simon Marlow authored
      - Move the numaMap and nNumaNodes out of RtsFlags to Capability.c
      - Add a test to tests/rts
      498ed266
  32. 10 Jun, 2016 2 commits
    • Simon Marlow's avatar
      Rts flags cleanup · c88f31a0
      Simon Marlow authored
      * Remove unused/old flags from the structs
      * Update old comments
      * Add missing flags to GHC.RTS
      * Simplify GHC.RTS, remove C code and use hsc2hs instead
      * Make ParFlags unconditional, and add support to GHC.RTS
      c88f31a0
    • Simon Marlow's avatar
      NUMA support · 9e5ea67e
      Simon Marlow authored
      Summary:
      The aim here is to reduce the number of remote memory accesses on
      systems with a NUMA memory architecture, typically multi-socket servers.
      
      Linux provides a NUMA API for doing two things:
      * Allocating memory local to a particular node
      * Binding a thread to a particular node
      
      When given the +RTS --numa flag, the runtime will
      * Determine the number of NUMA nodes (N) by querying the OS
      * Assign capabilities to nodes, so cap C is on node C%N
      * Bind worker threads on a capability to the correct node
      * Keep a separate free lists in the block layer for each node
      * Allocate the nursery for a capability from node-local memory
      * Allocate blocks in the GC from node-local memory
      
      For example, using nofib/parallel/queens on a 24-core 2-socket machine:
      
      ```
      $ ./Main 15 +RTS -N24 -s -A64m
        Total   time  173.960s  (  7.467s elapsed)
      
      $ ./Main 15 +RTS -N24 -s -A64m --numa
        Total   time  150.836s  (  6.423s elapsed)
      ```
      
      The biggest win here is expected to be allocating from node-local
      memory, so that means programs using a large -A value (as here).
      
      According to perf, on this program the number of remote memory accesses
      were reduced by more than 50% by using `--numa`.
      
      Test Plan:
      * validate
      * There's a new flag --debug-numa=<n> that pretends to do NUMA without
        actually making the OS calls, which is useful for testing the code
        on non-NUMA systems.
      * TODO: I need to add some unit tests
      
      Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2199
      9e5ea67e
  33. 17 May, 2016 1 commit
    • Erik de Castro Lopo's avatar
      rts: More const correct-ness fixes · 33c029dd
      Erik de Castro Lopo authored
      In addition to more const-correctness fixes this patch fixes an
      infelicity of the previous const-correctness patch (995cf0f3) which
      left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter
      but returning a non-const pointer. Here we restore the original type
      signature of `UNTAG_CLOSURE` and add a new function
      `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure`
      pointer and uses that wherever possible.
      
      Test Plan: Validate on Linux, OS X and Windows
      
      Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
      
      Reviewed By: simonmar, trofi
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2231
      33c029dd
  34. 12 May, 2016 1 commit
  35. 04 May, 2016 1 commit