This project is mirrored from https://gitlab.haskell.org/ghc/ghc.git.
  1. 13 Sep, 2018 1 commit
  2. 17 Jun, 2018 1 commit
    • Use __FILE__ for Cmm assertion locations, fix #8619 · 008ea12d
      Ömer Sinan Ağacan authored
      It seems like we currently support string literals in Cmm, so we can use
      the __FILE__ CPP macro in assertion macros. This improves error messages
      that previously looked like
      
          ASSERTION FAILED: file (null), line 1302
      
      (null) part now shows the actual file name.
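
      A minimal sketch in plain C (not the actual Cmm macros) of how an
      assertion macro can pick up its location from `__FILE__` and `__LINE__`.
      The names `format_assert_failure` and `FORMAT_ASSERT_HERE` are
      hypothetical, and the message is written to a buffer rather than
      aborting so that it can be inspected:

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      /* Hypothetical helper: formats the failure message into buf.
         A NULL file name reproduces the old "(null)" style of message. */
      int format_assert_failure(char *buf, size_t len,
                                const char *file, unsigned line)
      {
          assert(buf != NULL);
          return snprintf(buf, len, "ASSERTION FAILED: file %s, line %u",
                          file ? file : "(null)", line);
      }

      /* The macro captures the location of its use site. */
      #define FORMAT_ASSERT_HERE(buf, len) \
          format_assert_failure((buf), (len), __FILE__, (unsigned)__LINE__)
      ```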
      
      Also inline some single-use string literals in PrimOps.cmm.
      
      Reviewers: bgamari, simonmar, erikd
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4862
      008ea12d
  3. 05 Jun, 2018 1 commit
    • Rename some mutable closure types for consistency · 4075656e
      Ömer Sinan Ağacan authored
      SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
      SMALL_MUT_ARR_PTRS_FROZEN  -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
      MUT_ARR_PTRS_FROZEN0       -> MUT_ARR_PTRS_FROZEN_DIRTY
      MUT_ARR_PTRS_FROZEN        -> MUT_ARR_PTRS_FROZEN_CLEAN
      
      Naming is now consistent with other CLEAN/DIRTY objects (MVAR, MUT_VAR,
      MUT_ARR_PTRS).
      
      (alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
      
      Removed a few comments in Scav.c about FROZEN0 being on the mut_list
      because it's now clear from the closure type.
      
      Reviewers: bgamari, simonmar, erikd
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4784
      4075656e
  4. 02 Jun, 2018 1 commit
  5. 28 May, 2018 1 commit
    • Update GHC.Stats docs · a5446c45
      Ömer Sinan Ağacan authored
      Make it clear that max_live_bytes is updated after a major GC whereas
      live_bytes is updated after all GCs (including minor collections) and
      considers data in uncollected generations as live.
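
      The update rule described above can be sketched in C as follows. This
      is a toy model, not the RTS code: `ToyStats` and `toy_stats_after_gc`
      are hypothetical names.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical, simplified stand-in for the statistics record. */
      typedef struct {
          uint64_t live_bytes;      /* refreshed after every GC */
          uint64_t max_live_bytes;  /* refreshed only after a major GC */
      } ToyStats;

      /* Records the result of one collection. `live` is the amount of data
         considered live afterwards, including uncollected generations. */
      void toy_stats_after_gc(ToyStats *s, uint64_t live, bool major)
      {
          assert(s != NULL);
          s->live_bytes = live;
          if (major && live > s->max_live_bytes) {
              s->max_live_bytes = live;
          }
      }
      ```

      In this model a minor collection can report more live data than
      `max_live_bytes`, because the maximum is only sampled at major GCs.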
      
      Reviewers: bgamari, simonmar, hvr
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4734
      a5446c45
  6. 23 May, 2018 1 commit
    • Disable the SRT offset optimisation on MachO platforms · bf10456e
      Ben Gamari authored
      Unfortunately, this optimisation is infeasible on MachO platforms (e.g.
      Darwin) due to an object format limitation. Specifically, linking fails
      with errors of the form:
      
           error: unsupported relocation with subtraction expression, symbol
           '_integerzmgmp_GHCziIntegerziType_quotInteger_closure' can not be
           undefined in a subtraction expression
      
      Apparently MachO does not permit relocations' subtraction expressions to
      refer to undefined symbols. As far as I can tell this means that it is
      essentially impossible to express an offset between symbols living in
      different compilation units. This means that we likely can't use this
      optimisation on MachO platforms.
      
      Test Plan: Validate on Darwin
      
      Reviewers: simonmar, erikd
      
      Subscribers: rwbarton, thomie, carter, angerman
      
      GHC Trac Issues: #15169
      
      Differential Revision: https://phabricator.haskell.org/D4715
      bf10456e
  7. 20 May, 2018 1 commit
    • Add HeapView functionality · ec22f7dd
      patrickdoc authored
      This pulls parts of Joachim Breitner's ghc-heap-view library inside GHC.
      The bits added are the C hooks into the RTS and a basic Haskell wrapper
      to these C hooks. The main reason for these to be added to GHC proper
      is that the code needs to be kept in sync with the closure types
      defined by the RTS. It is expected that the version of HeapView shipped
      with GHC will always work with that version of GHC and that extra
      functionality can be layered on top with a library like ghc-heap-view
      distributed via Hackage.
      
      Test Plan: validate
      
      Reviewers: simonmar, hvr, nomeata, austin, Phyx, bgamari, erikd
      
      Reviewed By: bgamari
      
      Subscribers: carter, patrickdoc, tmcgilchrist, rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3055
      ec22f7dd
  8. 17 May, 2018 1 commit
  9. 16 May, 2018 4 commits
    • InfoTables: Fix #if uses introduced by D4634 · 3310f7f1
      Ben Gamari authored
      3310f7f1
    • Merge FUN_STATIC closure with its SRT · 838b6903
      Simon Marlow authored
      Summary:
      The idea here is to save a little code size and some work in the GC,
      by collapsing FUN_STATIC closures and their SRTs.
      
      This is (4) in a series; see D4632 for more details.
      
      There's a tradeoff here: more complexity in the compiler in exchange
      for a modest code size reduction (probably around 0.5%).
      
      Results:
      * GHC binary itself (statically linked) is 1% smaller
      * -0.2% binary sizes in nofib (-0.5% module sizes)
      
      Full nofib results comparing D4634 with this: P177 (ignore runtimes,
      these aren't stable on my laptop)
      
      Test Plan: validate, nofib
      
      Reviewers: bgamari, niteria, simonpj, erikd
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4637
      838b6903
    • Save a word in the info table on x86_64 · 2b0918c9
      Simon Marlow authored
      Summary:
      An info table with an SRT normally looks like this:
      
          StgWord64 srt_offset
          StgClosureInfo layout
          StgWord32 layout
          StgWord32 has_srt
      
      But we only need 32 bits for srt_offset on x86_64, because the small
      memory model requires that code segments are at most 2GB. So we can
      optimise this to
      
          StgClosureInfo layout
          StgWord32 layout
          StgWord32 srt_offset
      
      saving a word.  We can tell whether the info table has an SRT or not,
      because zero is not a valid srt_offset, so zero still indicates that
      there's no SRT.
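
      The saving can be checked with a small C sketch. The structs below are
      illustrative stand-ins, not the RTS definitions: `StgClosureInfo` is
      modelled as one 64-bit word, and the 32-bit `layout` field is renamed
      `layout32` only to avoid a duplicate member name.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Layout with a full-word SRT offset and a separate has_srt flag. */
      typedef struct {
          uint64_t srt_offset;
          uint64_t layout;      /* stands in for StgClosureInfo */
          uint32_t layout32;
          uint32_t has_srt;
      } WideInfoTable;

      /* Layout with a 32-bit SRT offset in place of the flag. */
      typedef struct {
          uint64_t layout;      /* stands in for StgClosureInfo */
          uint32_t layout32;
          uint32_t srt_offset;  /* 0 means "no SRT" */
      } CompactInfoTable;

      /* Zero is not a valid offset, so it doubles as the "no SRT" marker. */
      int compact_has_srt(const CompactInfoTable *info)
      {
          assert(info != NULL);
          return info->srt_offset != 0;
      }
      ```

      With these definitions the compact form is 16 bytes against 24, which
      is the one-word saving described above.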
      
      Test Plan:
      * validate
      * For results, see D4632.
      
      Reviewers: bgamari, niteria, osa1, erikd
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4634
      2b0918c9
    • An overhaul of the SRT representation · eb8e692c
      Simon Marlow authored
      Summary:
      - Previously we would have a single big table of pointers per module,
        with a set of bitmaps to reference entries within it. The new
        representation is identical to a static constructor, which is much
        simpler for the GC to traverse, and we get to remove the complicated
        bitmap-traversal code from the GC.
      
      - Rewrite all the code to generate SRTs in CmmBuildInfoTables, and
        document it much better (see Note [SRTs]). This has been something
        I've wanted to do since we moved to the new code generator, I
        finally had the opportunity to finish it while on a transatlantic
        flight recently :)
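
      To illustrate why the new representation is simpler to traverse, here
      is a toy model in C. It is a sketch under loud assumptions: the types
      and functions (`VisitFn`, `visit_bitmap_srt`, `ToySrtObject`,
      `visit_srt_object`, `count_entry`) are hypothetical and do not mirror
      the real GC code.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      typedef void (*VisitFn)(const void *entry, void *ctx);

      /* Old scheme (toy model): a shared table plus a bitmap that selects
         which entries belong to a given closure. */
      size_t visit_bitmap_srt(const void *const *table, size_t table_len,
                              uint64_t bitmap, VisitFn visit, void *ctx)
      {
          size_t visited = 0;
          for (size_t i = 0; i < table_len && i < 64; i++) {
              if (bitmap & (UINT64_C(1) << i)) {
                  visit(table[i], ctx);
                  visited++;
              }
          }
          return visited;
      }

      /* New scheme (toy model): the SRT is an object holding its own
         pointer fields, like a static constructor. */
      typedef struct {
          size_t n_fields;
          const void *const *fields;
      } ToySrtObject;

      size_t visit_srt_object(const ToySrtObject *srt, VisitFn visit, void *ctx)
      {
          assert(srt != NULL);
          for (size_t i = 0; i < srt->n_fields; i++) {
              visit(srt->fields[i], ctx);
          }
          return srt->n_fields;
      }

      /* Example visitor: counts how many entries were seen. */
      void count_entry(const void *entry, void *ctx)
      {
          (void)entry;
          (*(size_t *)ctx)++;
      }
      ```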
      
      There is a series of 4 diffs:
      
      1. D4632 (this one), which does the bulk of the changes
      
      2. D4633 which adds support for smaller `CmmLabelDiffOff` constants
      
      3. D4634 which takes advantage of D4632 and D4633 to save a word in
         info tables that have an SRT on x86_64. This is where most of the
         binary size improvement comes from.
      
      4. D4637 which makes a further optimisation to merge some SRTs with
         static FUN closures.  This adds some complexity and the benefits
         are fairly modest, so it's not clear yet whether we should do this.
      
      Results (after (3), on x86_64)
      
      - GHC itself (statically linked) is 5.2% smaller
      
      - -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176
      
      - I measured the overhead of traversing all the static objects in a
        major GC in GHC itself by doing `replicateM_ 1000 performGC` as the
        first thing in `Main.main`.  The new version was 5-10% faster, but
        the results did vary quite a bit.
      
      - I'm not sure if there's a compile-time difference, the results are
        too unreliable.
      
      Test Plan: validate
      
      Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1
      
      Subscribers: thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4632
      eb8e692c
  10. 13 May, 2018 1 commit
    • Fix a few GCC warnings · eb39f988
      Michal Terepeta authored
      GCC 8 now generates warnings for incompatible function pointer casts
      [-Werror=cast-function-type]. Apparently there are a few of those in rts
      code, which makes `./validate` unhappy (since we compile with `-Werror`)
      
      This commit tries to fix these issues by changing the functions to have
      the correct type (and, if necessary, moving the casts into those
      functions).
      
      For instance, hash/comparison functions are declared (`Hash.h`) to take
      `StgWord` but we want to use `StgWord64[2]` in `StaticPtrTable.c`.
      Instead of casting the function pointers, we can cast the `StgWord`
      parameter to `StgWord*`. I think this should be ok since `StgWord`
      should be the same size as a pointer.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
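
      The pattern can be sketched as follows. This is an illustration, not
      the code from `Hash.h` or `StaticPtrTable.c`: `Word` stands in for
      `StgWord`, and the typedefs and function names are hypothetical.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Stand-in for StgWord: an integer as wide as a pointer. */
      typedef uintptr_t Word;

      /* Callback types declared to take Word keys. */
      typedef int (*HashFn)(Word key, int table_size);
      typedef int (*CompareFn)(Word a, Word b);

      /* The key is really a pointer to two 64-bit words. The function keeps
         the declared parameter type and converts inside its body, so the
         function pointer itself never needs a cast. */
      int hash_fingerprint(Word key, int table_size)
      {
          const uint64_t *fp = (const uint64_t *)key;
          assert(table_size > 0);
          return (int)((fp[0] ^ fp[1]) % (uint64_t)table_size);
      }

      /* Returns 1 if both keys point at equal fingerprints, 0 otherwise. */
      int compare_fingerprint(Word a, Word b)
      {
          const uint64_t *x = (const uint64_t *)a;
          const uint64_t *y = (const uint64_t *)b;
          return x[0] == y[0] && x[1] == y[1];
      }
      ```

      Both functions can be assigned to `HashFn` and `CompareFn` variables
      directly, which is what keeps `-Wcast-function-type` quiet.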
      
      Test Plan: ./validate
      
      Reviewers: bgamari, erikd, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4673
      eb39f988
  11. 12 May, 2018 1 commit
  12. 11 May, 2018 1 commit
  13. 10 May, 2018 1 commit
  14. 25 Apr, 2018 1 commit
  15. 16 Apr, 2018 1 commit
  16. 05 Apr, 2018 1 commit
  17. 30 Mar, 2018 1 commit
  18. 26 Mar, 2018 1 commit
    • rts, base: Refactor stats.c to improve --machine-readable report · f0b258bc
      Douglas Wilson authored
      There should be no change in the output of the '+RTS -s' (summary)
      report, or the '+RTS -t' (one-line) report.
      
      All data shown in the summary report is now shown in the machine
      readable report.
      
      All data in RTSStats is now shown in the machine readable report.
      
      init times are added to RTSStats and added to GHC.Stats.
      
      Example of the new output:
      ```
       [("bytes allocated", "375016384")
       ,("num_GCs", "113")
       ,("average_bytes_used", "148348")
       ,("max_bytes_used", "206552")
       ,("num_byte_usage_samples", "2")
       ,("peak_megabytes_allocated", "6")
       ,("init_cpu_seconds", "0.001642")
       ,("init_wall_seconds", "0.001027")
       ,("mut_cpu_seconds", "3.020166")
       ,("mut_wall_seconds", "0.757244")
       ,("GC_cpu_seconds", "0.037750")
       ,("GC_wall_seconds", "0.009569")
       ,("exit_cpu_seconds", "0.000890")
       ,("exit_wall_seconds", "0.002551")
       ,("total_cpu_seconds", "3.060452")
       ,("total_wall_seconds", "0.770395")
       ,("major_gcs", "2")
       ,("allocated_bytes", "375016384")
       ,("max_live_bytes", "206552")
       ,("max_large_objects_bytes", "159344")
       ,("max_compact_bytes", "0")
       ,("max_slop_bytes", "59688")
       ,("max_mem_in_use_bytes", "6291456")
       ,("cumulative_live_bytes", "296696")
       ,("copied_bytes", "541024")
       ,("par_copied_bytes", "493976")
       ,("cumulative_par_max_copied_bytes", "104104")
       ,("cumulative_par_balanced_copied_bytes", "274456")
       ,("fragmentation_bytes", "2112")
       ,("alloc_rate", "124170795")
       ,("productivity_cpu_percent", "0.986838")
       ,("productivity_wall_percent", "0.982935")
       ,("bound_task_count", "1")
       ,("sparks_count", "5836258")
       ,("sparks_converted", "237")
       ,("sparks_overflowed", "1990408")
       ,("sparks_dud ", "0")
       ,("sparks_gcd", "3455553")
       ,("sparks_fizzled", "390060")
       ,("work_balance", "0.555606")
       ,("n_capabilities", "4")
       ,("task_count", "10")
       ,("peak_worker_count", "9")
       ,("worker_count", "9")
       ,("gc_alloc_block_sync_spin", "162")
       ,("gc_alloc_block_sync_yield", "0")
       ,("gc_alloc_block_sync_spin", "162")
       ,("gc_spin_spin", "18840855")
       ,("gc_spin_yield", "10355")
       ,("mut_spin_spin", "70331392")
       ,("mut_spin_yield", "61700")
       ,("waitForGcThreads_spin", "241")
       ,("waitForGcThreads_yield", "2797")
       ,("whitehole_gc_spin", "0")
       ,("whitehole_lockClosure_spin", "0")
       ,("whitehole_lockClosure_yield", "0")
       ,("whitehole_executeMessage_spin", "0")
       ,("whitehole_threadPaused_spin", "0")
       ,("any_work", "1667")
       ,("no_work", "1662")
       ,("scav_find_work", "1026")
       ,("gen_0_collections", "111")
       ,("gen_0_par_collections", "111")
       ,("gen_0_cpu_seconds", "0.036126")
       ,("gen_0_wall_seconds", "0.036126")
       ,("gen_0_max_pause_seconds", "0.036126")
       ,("gen_0_avg_pause_seconds", "0.000081")
       ,("gen_0_sync_spin", "21")
       ,("gen_0_sync_yield", "0")
       ,("gen_1_collections", "2")
       ,("gen_1_par_collections", "1")
       ,("gen_1_cpu_seconds", "0.001624")
       ,("gen_1_wall_seconds", "0.001624")
       ,("gen_1_max_pause_seconds", "0.001624")
       ,("gen_1_avg_pause_seconds", "0.000272")
       ,("gen_1_sync_spin", "3")
       ,("gen_1_sync_yield", "0")
       ]
      ```
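
      For readers who want to emit the same shape of output, here is a small
      C sketch that writes key/value pairs in the list style shown above. It
      is not taken from `stats.c`; `StatEntry` and `format_stats` are
      hypothetical names.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      typedef struct {
          const char *key;
          const char *value;
      } StatEntry;

      /* Writes the entries as a list of ("key", "value") pairs, one per line.
         Returns the number of characters written, or -1 if buf is too small. */
      int format_stats(char *buf, size_t len, const StatEntry *entries, size_t n)
      {
          size_t used = 0;
          assert(buf != NULL);
          if (n == 0) {
              int w0 = snprintf(buf, len, " []\n");
              return (w0 < 0 || (size_t)w0 >= len) ? -1 : w0;
          }
          for (size_t i = 0; i < n; i++) {
              /* The first entry opens the list, later ones are comma-led. */
              int w = snprintf(buf + used, len - used, " %c(\"%s\", \"%s\")\n",
                               i == 0 ? '[' : ',',
                               entries[i].key, entries[i].value);
              if (w < 0 || (size_t)w >= len - used) return -1;
              used += (size_t)w;
          }
          int w = snprintf(buf + used, len - used, " ]\n");
          if (w < 0 || (size_t)w >= len - used) return -1;
          return (int)(used + (size_t)w);
      }
      ```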
      
      Test Plan: Ensure that one-line and summary reports are unchanged.
      
      Reviewers: erikd, simonmar, hvr
      
      Subscribers: duog, carter, thomie, rwbarton
      
      GHC Trac Issues: #14660
      
      Differential Revision: https://phabricator.haskell.org/D4529
      f0b258bc
  19. 20 Mar, 2018 1 commit
  20. 19 Mar, 2018 3 commits
    • rts, base: Refactor stats.c to improve --machine-readable report · 2d4bda2e
      Douglas Wilson authored
      There should be no change in the output of the '+RTS -s' (summary)
      report, or the '+RTS -t' (one-line) report.
      
      All data shown in the summary report is now shown in the machine
      readable report.
      
      All data in RTSStats is now shown in the machine readable report.
      
      init times are added to RTSStats and added to GHC.Stats.
      
      Example of the new output:
      ```
       [("bytes allocated", "375016384")
       ,("num_GCs", "113")
       ,("average_bytes_used", "148348")
       ,("max_bytes_used", "206552")
       ,("num_byte_usage_samples", "2")
       ,("peak_megabytes_allocated", "6")
       ,("init_cpu_seconds", "0.001642")
       ,("init_wall_seconds", "0.001027")
       ,("mut_cpu_seconds", "3.020166")
       ,("mut_wall_seconds", "0.757244")
       ,("GC_cpu_seconds", "0.037750")
       ,("GC_wall_seconds", "0.009569")
       ,("exit_cpu_seconds", "0.000890")
       ,("exit_wall_seconds", "0.002551")
       ,("total_cpu_seconds", "3.060452")
       ,("total_wall_seconds", "0.770395")
       ,("major_gcs", "2")
       ,("allocated_bytes", "375016384")
       ,("max_live_bytes", "206552")
       ,("max_large_objects_bytes", "159344")
       ,("max_compact_bytes", "0")
       ,("max_slop_bytes", "59688")
       ,("max_mem_in_use_bytes", "6291456")
       ,("cumulative_live_bytes", "296696")
       ,("copied_bytes", "541024")
       ,("par_copied_bytes", "493976")
       ,("cumulative_par_max_copied_bytes", "104104")
       ,("cumulative_par_balanced_copied_bytes", "274456")
       ,("fragmentation_bytes", "2112")
       ,("alloc_rate", "124170795")
       ,("productivity_cpu_percent", "0.986838")
       ,("productivity_wall_percent", "0.982935")
       ,("bound_task_count", "1")
       ,("sparks_count", "5836258")
       ,("sparks_converted", "237")
       ,("sparks_overflowed", "1990408")
       ,("sparks_dud ", "0")
       ,("sparks_gcd", "3455553")
       ,("sparks_fizzled", "390060")
       ,("work_balance", "0.555606")
       ,("n_capabilities", "4")
       ,("task_count", "10")
       ,("peak_worker_count", "9")
       ,("worker_count", "9")
       ,("gc_alloc_block_sync_spin", "162")
       ,("gc_alloc_block_sync_yield", "0")
       ,("gc_alloc_block_sync_spin", "162")
       ,("gc_spin_spin", "18840855")
       ,("gc_spin_yield", "10355")
       ,("mut_spin_spin", "70331392")
       ,("mut_spin_yield", "61700")
       ,("waitForGcThreads_spin", "241")
       ,("waitForGcThreads_yield", "2797")
       ,("whitehole_gc_spin", "0")
       ,("whitehole_lockClosure_spin", "0")
       ,("whitehole_lockClosure_yield", "0")
       ,("whitehole_executeMessage_spin", "0")
       ,("whitehole_threadPaused_spin", "0")
       ,("any_work", "1667")
       ,("no_work", "1662")
       ,("scav_find_work", "1026")
       ,("gen_0_collections", "111")
       ,("gen_0_par_collections", "111")
       ,("gen_0_cpu_seconds", "0.036126")
       ,("gen_0_wall_seconds", "0.036126")
       ,("gen_0_max_pause_seconds", "0.036126")
       ,("gen_0_avg_pause_seconds", "0.000081")
       ,("gen_0_sync_spin", "21")
       ,("gen_0_sync_yield", "0")
       ,("gen_1_collections", "2")
       ,("gen_1_par_collections", "1")
       ,("gen_1_cpu_seconds", "0.001624")
       ,("gen_1_wall_seconds", "0.001624")
       ,("gen_1_max_pause_seconds", "0.001624")
       ,("gen_1_avg_pause_seconds", "0.000272")
       ,("gen_1_sync_spin", "3")
       ,("gen_1_sync_yield", "0")
       ]
      ```
      
      Test Plan: Ensure that one-line and summary reports are unchanged.
      
      Reviewers: bgamari, erikd, simonmar, hvr
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #14660
      
      Differential Revision: https://phabricator.haskell.org/D4303
      2d4bda2e
    • Improve accuracy of get/setAllocationCounter · 20cbb016
      Ben Gamari authored
      Summary:
      get/setAllocationCounter didn't take into account allocations in the
      current block. This was known at the time, but it turns out to be
      important to have more accuracy when using these in a fine-grained
      way.
      
      Test Plan:
      New unit test to test incrementally larger allocations.  Before I got
      results like this:
      
      ```
      +0
      +0
      +0
      +0
      +0
      +4096
      +0
      +0
      +0
      +0
      +0
      +4064
      +0
      +0
      +4088
      +4056
      +0
      +0
      +0
      +4088
      +4096
      +4056
      +4096
      ```
      
      Notice how the results aren't always monotonically increasing.  After
      this patch:
      
      ```
      +344
      +416
      +488
      +560
      +632
      +704
      +776
      +848
      +920
      +992
      +1064
      +1136
      +1208
      +1280
      +1352
      +1424
      +1496
      +1568
      +1640
      +1712
      +1784
      +1856
      +1928
      +2000
      +2072
      +2144
      ```
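
      The difference between the two outputs can be reproduced with a toy
      model in C. This is only a sketch: `ToyNursery` and its functions are
      hypothetical, and the 4096-byte block size is an assumption suggested
      by the numbers above.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define TOY_BLOCK_SIZE 4096u

      /* Toy allocator state: bytes in blocks that are already finished, plus
         the bytes used so far in the block currently being filled. */
      typedef struct {
          uint64_t full_block_bytes;
          uint32_t used_in_current;
      } ToyNursery;

      void toy_alloc(ToyNursery *n, uint32_t bytes)
      {
          assert(n != NULL && bytes <= TOY_BLOCK_SIZE);
          if (n->used_in_current + bytes > TOY_BLOCK_SIZE) {
              /* The current block is finished; account for it and move on. */
              n->full_block_bytes += n->used_in_current;
              n->used_in_current = 0;
          }
          n->used_in_current += bytes;
      }

      /* Coarse reading: ignores the current block, so it stays flat for a
         while and then jumps by roughly one block. */
      uint64_t toy_allocated_coarse(const ToyNursery *n)
      {
          return n->full_block_bytes;
      }

      /* Fine reading: also counts what was used in the current block. */
      uint64_t toy_allocated_fine(const ToyNursery *n)
      {
          return n->full_block_bytes + n->used_in_current;
      }
      ```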
      
      Reviewers: hvr, erikd, simonmar, jrtc27, trommler
      
      Reviewed By: simonmar
      
      Subscribers: trommler, jrtc27, rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4363
      20cbb016
    • rts: Add --internal-counters RTS flag and several counters · 2918abf7
      Douglas Wilson authored
      The existing internal counters:
      * gc_alloc_block_sync
      * whitehole_spin
      * gen[g].sync
      * gen[1].sync
      
      are now not shown in the -s report unless --internal-counters is also passed.
      
      If --internal-counters is passed we now show the counters above, reformatted, as
      well as several other counters. In particular, we now count the yieldThread()
      calls that SpinLocks do as well as their spins.
      
      The added counters are:
      * gc_spin (spin and yield)
      * mut_spin (spin and yield)
      * whitehole_threadPaused (spin only)
      * whitehole_executeMessage (spin only)
      * whitehole_lockClosure (spin only)
      * waitForGcThreads (spin and yield)
      
      As well as the following, which are not SpinLock-like things:
      * any_work
      * no_work
      * scav_find_work
      
      See the Note for descriptions of what these counters are.
      
      We add busy_wait_nops in these loops along with the counter increment where it
      was absent.
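
      As an illustration of counting both spins and yields, here is a small
      C11 sketch. It is not the RTS SpinLock: `CountingSpinLock` and its
      functions are hypothetical, the counters are plain integers (so they
      are only approximate under contention), and the attempt is bounded so
      that the example always terminates.

      ```c
      #include <assert.h>
      #include <sched.h>
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdint.h>

      typedef struct {
          atomic_flag held;
          uint64_t spins;   /* failed attempts to take the lock */
          uint64_t yields;  /* times a waiter gave up its time slice */
      } CountingSpinLock;

      /* Tries to take the lock. After every `spins_per_yield` failed attempts
         the caller yields; after `max_yields` yields and one more round of
         spinning it gives up and returns false. */
      bool counting_lock_acquire(CountingSpinLock *l, unsigned spins_per_yield,
                                 unsigned max_yields)
      {
          unsigned yielded = 0;
          assert(l != NULL && spins_per_yield > 0);
          for (;;) {
              for (unsigned i = 0; i < spins_per_yield; i++) {
                  if (!atomic_flag_test_and_set_explicit(&l->held,
                                                         memory_order_acquire)) {
                      return true;
                  }
                  l->spins++;
              }
              if (yielded == max_yields) return false;
              l->yields++;
              yielded++;
              sched_yield();
          }
      }

      void counting_lock_release(CountingSpinLock *l)
      {
          assert(l != NULL);
          atomic_flag_clear_explicit(&l->held, memory_order_release);
      }
      ```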
      
      Old internal counters output:
      ```
      gc_alloc_block_sync: 0
      whitehole_gc_spin: 0
      gen[0].sync: 0
      gen[1].sync: 0
      ```
      
      New internal counters output:
      ```
      Internal Counters:
                                                 Spins        Yields
          gc_alloc_block_sync                      323             0
          gc_spin                              9016713           752
          mut_spin                            57360944         47716
          whitehole_gc                               0           n/a
          whitehole_threadPaused                     0           n/a
          whitehole_executeMessage                   0           n/a
          whitehole_lockClosure                      0             0
          waitForGcThreads                           2           415
          gen[0].sync                                6             0
          gen[1].sync                                1             0
      
          any_work                                2017
          no_work                                 2014
          scav_find_work                          1004
      ```
      
      Test Plan:
      ./validate
      
      Check it builds with #define PROF_SPIN removed from includes/rts/Config.h
      
      Reviewers: bgamari, erikd, simonmar, hvr
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #3553, #9221
      
      Differential Revision: https://phabricator.haskell.org/D4302
      2918abf7
  21. 09 Mar, 2018 1 commit
    • UNREG: fix implicit declarations from pdep and pext · dd3906bf
      Sergei Trofimovich authored
      Unreg build failed as:
      
        $ ./configure --enable-unregisterised
        $ make
      
        HC [stage 1] libraries/ghc-prim/dist-install/build/GHC/PrimopWrappers.o
          ghc_1.hc: In function 'ghczmprim_GHCziPrimopWrappers_pdep8zh_entry':
      
          ghc_1.hc:1810:9: error:
           error: implicit declaration of function 'hs_pdep8'; did you mean 'hs_ctz8'?
             [-Werror=implicit-function-declaration]
           _c3jz = hs_pdep8(*Sp, Sp[1]);
                   ^~~~~~~~
                   hs_ctz8
             |
        1810 | _c3jz = hs_pdep8(*Sp, Sp[1]);
             |         ^
      Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
      dd3906bf
  22. 08 Feb, 2018 1 commit
  23. 06 Feb, 2018 1 commit
  24. 01 Feb, 2018 1 commit
  25. 26 Jan, 2018 1 commit
    • Fix Windows stack allocations. · a55d581f
      Tamar Christina authored
      On Windows we use the function `win32AllocStack` to do stack
      allocations in 4k blocks and insert a stack check afterwards
      to ensure the allocation returned a valid block.
      
      The problem is this function does something that by C semantics
      is pointless. The stack allocated value can never escape the
      function, and the stack isn't used so the compiler just optimizes
      away the entire function body.
      
      After considering a bunch of other possibilities I think the simplest
      fix is to just disable optimizations for the function.
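
      The commit message does not say which mechanism was used, so the
      following is only one possible sketch: GCC's `optimize` pragmas applied
      to a hypothetical function `toy_touch_stack_block`. It is not the real
      `win32AllocStack`.

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define TOY_STACK_BLOCK 4096

      #if defined(__GNUC__) && !defined(__clang__)
      #pragma GCC push_options
      #pragma GCC optimize ("O0")
      #endif

      /* Reserves one block on the stack and touches both ends of it. Nothing
         observes the stores, so an optimising compiler may drop the array and
         the stores entirely; the pragmas above ask GCC not to optimise this
         function. */
      size_t toy_touch_stack_block(void)
      {
          char block[TOY_STACK_BLOCK];
          block[0] = 1;
          block[TOY_STACK_BLOCK - 1] = 1;
          assert(sizeof block == TOY_STACK_BLOCK);
          return sizeof block;
      }

      #if defined(__GNUC__) && !defined(__clang__)
      #pragma GCC pop_options
      #endif
      ```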
      
      Alternatively inline assembly is an option but the stack check function
      doesn't have a very portable name as it relies on e.g. `libgcc`.
      
      Thanks to Sergey Vinokurov for helping diagnose and test.
      
      Test Plan: ./validate
      
      Reviewers: bgamari, erikd, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #14669
      
      Differential Revision: https://phabricator.haskell.org/D4343
      a55d581f
  26. 18 Jan, 2018 1 commit
  27. 08 Jan, 2018 1 commit
    • Improve accuracy of get/setAllocationCounter · a1a689dd
      Simon Marlow authored
      Summary:
      get/setAllocationCounter didn't take into account allocations in the
      current block. This was known at the time, but it turns out to be
      important to have more accuracy when using these in a fine-grained
      way.
      
      Test Plan:
      New unit test to test incrementally larger allocations.  Before I got
      results like this:
      
      ```
      +0
      +0
      +0
      +0
      +0
      +4096
      +0
      +0
      +0
      +0
      +0
      +4064
      +0
      +0
      +4088
      +4056
      +0
      +0
      +0
      +4088
      +4096
      +4056
      +4096
      ```
      
      Notice how the results aren't always monotonically increasing.  After
      this patch:
      
      ```
      +344
      +416
      +488
      +560
      +632
      +704
      +776
      +848
      +920
      +992
      +1064
      +1136
      +1208
      +1280
      +1352
      +1424
      +1496
      +1568
      +1640
      +1712
      +1784
      +1856
      +1928
      +2000
      +2072
      +2144
      ```
      
      Reviewers: niteria, bgamari, hvr, erikd
      
      Subscribers: rwbarton, thomie, carter
      
      Differential Revision: https://phabricator.haskell.org/D4288
      a1a689dd
  28. 23 Nov, 2017 1 commit
  29. 22 Nov, 2017 1 commit
  30. 16 Nov, 2017 1 commit
    • Detect overly long GC sync · 2f463873
      Simon Marlow authored
      Summary:
      GC sync is the time between a GC being initiated and all the mutator
      threads finally stopping so that the GC can start. Problems that cause
      the GC sync to be delayed are hard to find and can cause dramatic
      slowdowns for heavily parallel programs.
      
      The new flag --long-gc-sync=<time> helps by emitting a warning and
      calling a user-overridable hook when the GC sync time exceeds the
      specified threshold. A debugger can be used to set a breakpoint when
      this happens and inspect the stacks of threads to find the culprit.
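
      The check itself can be sketched in C as below. This is a hedged
      illustration rather than the RTS code: `LongSyncHook`, `check_gc_sync`
      and `remember_longest` are hypothetical names, and treating a zero
      threshold as "disabled" is an assumption of the sketch.

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Hypothetical hook type: receives the observed sync time in ns. */
      typedef void (*LongSyncHook)(uint64_t waited_ns, void *ctx);

      /* Compares the measured sync time with the threshold. When it is
         exceeded, prints a warning and calls the hook. Returns 1 if the
         warning fired, 0 otherwise. */
      int check_gc_sync(uint64_t waited_ns, uint64_t threshold_ns,
                        LongSyncHook hook, void *ctx)
      {
          if (threshold_ns == 0 || waited_ns <= threshold_ns) return 0;
          fprintf(stderr, "Warning: waited %lluus for GC sync\n",
                  (unsigned long long)(waited_ns / 1000));
          if (hook != NULL) hook(waited_ns, ctx);
          return 1;
      }

      /* Example hook: remembers the longest wait seen so far. A debugger
         breakpoint on a function like this one is a convenient stop point. */
      void remember_longest(uint64_t waited_ns, void *ctx)
      {
          uint64_t *longest = ctx;
          assert(longest != NULL);
          if (waited_ns > *longest) *longest = waited_ns;
      }
      ```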
      
      Test Plan:
      ```
      $ ./inplace/bin/ghc-stage2 +RTS --long-gc-sync=0.0000001 -S
          Alloc    Copied     Live     GC     GC      TOT      TOT  Page Flts
          bytes     bytes     bytes   user   elap     user     elap
        1135856     51144    153736  0.000  0.000    0.002    0.002    0    0  (Gen:  0)
        1034760     94704    188752  0.000  0.000    0.002    0.002    0    0  (Gen:  0)
        1038888    134832    228888  0.009  0.009    0.011    0.011    0    0  (Gen:  1)
        1025288     90128    235184  0.000  0.000    0.012    0.012    0    0  (Gen:  0)
        1049088    130080    333984  0.000  0.000    0.013    0.013    0    0  (Gen:  0)
      Warning: waited 0us for GC sync
        1034424     73360    331976  0.000  0.000    0.013    0.013    0    0  (Gen:  0)
      ```
      
      Also tested on a real production problem.
      
      Reviewers: niteria, bgamari, erikd
      
      Subscribers: rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D4193
      2f463873
  31. 08 Nov, 2017 1 commit
  32. 30 Oct, 2017 1 commit
    • Allow packing constructor fields · cca2d6b7
      Michal Terepeta authored
      This is another step for fixing #13825 and is based on D38 by Simon
      Marlow.
      
      The change allows storing multiple constructor fields within the same
      word. This currently applies only to `Float`s, e.g.,
      ```
      data Foo = Foo {-# UNPACK #-} !Float {-# UNPACK #-} !Float
      ```
      on 64-bit arch, will now store both fields within the same constructor
      word. For `WordX/IntX` we'll need to introduce new primop types.
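
      A byte-based layout of this kind can be sketched in C. The function
      below is a simplified illustration, not `StgCmmLayout`: `layout_fields`
      is a hypothetical name, a 64-bit word is assumed, and fields are laid
      out in the order given without any reordering.

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define TOY_WORD_SIZE 8u  /* 64-bit architecture assumed */

      /* Computes a byte offset for each field. Every field is aligned to its
         own size, capped at the word size, with padding inserted where
         needed. Returns the payload size rounded up to whole words. */
      size_t layout_fields(const size_t *sizes, size_t n, size_t *offsets)
      {
          size_t off = 0;
          assert(sizes != NULL && offsets != NULL);
          for (size_t i = 0; i < n; i++) {
              size_t align = sizes[i] < TOY_WORD_SIZE ? sizes[i] : TOY_WORD_SIZE;
              assert(align > 0);
              off = (off + align - 1) / align * align;  /* insert padding */
              offsets[i] = off;
              off += sizes[i];
          }
          return (off + TOY_WORD_SIZE - 1) / TOY_WORD_SIZE * TOY_WORD_SIZE;
      }
      ```

      With two 4-byte fields the offsets are 0 and 4 and the payload is a
      single 8-byte word, which matches the `Foo` example above.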
      
      Main changes:
      
      - We now use sizes in bytes when we compute the offsets for
        constructor fields in `StgCmmLayout` and introduce padding if
        necessary (word-sized fields are still word-aligned)
      
      - `ByteCodeGen` had to be updated to correctly construct the data
        types. This required some new bytecode instructions to allow pushing
        things that are not full words onto the stack (and updating
        `Interpreter.c`). Note that we only use the packed stuff when
        constructing data types (i.e., for `PACK`), in all other cases the
        behavior should not change.
      
      - `RtClosureInspect` was changed to handle the new layout when
        extracting subterms. This seems to be used by things like `:print`.
        I've also added a test for this.
      
      - I deviated slightly from Simon's approach and use `PrimRep` instead
        of `ArgRep` for computing the size of fields.  This seemed more
        natural and in the future we'll probably want to introduce new
        primitive types (e.g., `Int8#`) and `PrimRep` seems like a better
        place to do that (where we already have `Int64Rep` for example).
        `ArgRep` on the other hand seems to be more focused on calling
        functions.
      Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
      
      Test Plan: ./validate
      
      Reviewers: bgamari, simonmar, austin, hvr, goldfire, erikd
      
      Reviewed By: bgamari
      
      Subscribers: maoe, rwbarton, thomie
      
      GHC Trac Issues: #13825
      
      Differential Revision: https://phabricator.haskell.org/D3809
      cca2d6b7
  33. 22 Oct, 2017 1 commit
    • Add stack traces on crashes on Windows · 99c61e22
      Tamar Christina authored
      Summary:
      This patch adds the ability to generate stack traces on crashes for Windows.
      When running in the interpreter this attempts to use symbol information from
      the interpreter and information we know about the loaded object files to
      resolve addresses to symbols.
      
      When running compiled it doesn't have this information and then defaults
      to using symbol information from PDB files. Which for now means only
      files compiled with ICC or MSVC will show traces compiled.
      
      But I have a future patch that may address this shortcoming.
      
      Also since I don't know how to walk a pure haskell stack, I can for now
      only show the last entry. I'm hoping to figure out how Apply.cmm works to
      be able to walk the stack and give more entries for pure haskell code.
      
      In GHCi
      
      ```
      $ echo main | inplace/bin/ghc-stage2.exe --interactive ./testsuite/tests/rts/derefnull.hs
      GHCi, version 8.3.20170830: http://www.haskell.org/ghc/  :? for help
      Ok, 1 module loaded.
      Prelude Main>
      Access violation in generated code when reading 0x0
      
       Attempting to reconstruct a stack trace...
      
         Frame        Code address
       * 0x77cde10    0xc370229 E:\..\base\dist-install\build\HSbase-4.10.0.0.o+0x190031
                       (base_ForeignziStorable_zdfStorableInt4_info+0x3f)
      ```
      
      and compiled
      
      ```
      Access violation in generated code when reading 0x0
      
       Attempting to reconstruct a stack trace...
      
         Frame        Code address
       * 0xf0dbd0     0x40bb01 E:\..\rts\derefnull.run\derefnull.exe+0xbb01
      ```
      
      Test Plan: ./validate
      
      Reviewers: austin, hvr, bgamari, erikd, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3913
      99c61e22
  34. 16 Oct, 2017 1 commit
    • Implement new `compareByteArrays#` primop · e3ba26f8
      Herbert Valerio Riedel authored
      The new primop
      
          compareByteArrays# :: ByteArray# -> Int# {- offset -}
                             -> ByteArray# -> Int# {- offset -}
                             -> Int# {- length -}
                             -> Int#
      
      allows comparing a subrange of the first `ByteArray#` with
      the (same-length) subrange of the second `ByteArray#` and returns a
      value less than, equal to, or greater than zero if the first range is
      found, respectively, to be byte-wise lexicographically less than, to
      match, or to be greater than the second range.
      
      Under the hood, the new primop is implemented in terms of the standard
      ISO C `memcmp(3)` function. It is currently an out-of-line primop but
      work is underway to optimise this into an inline primop for a future
      follow-up Differential (see D4091).
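
      The semantics can be illustrated directly with `memcmp` in C. The
      wrapper `compare_byte_ranges` below is a hypothetical stand-in for the
      primop, working on plain byte buffers rather than `ByteArray#` values:

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      /* Compares `len` bytes of `a` starting at `a_off` with `len` bytes of
         `b` starting at `b_off`. The caller must ensure that both subranges
         are in bounds. The result is negative, zero or positive, as with
         memcmp. */
      int compare_byte_ranges(const unsigned char *a, size_t a_off,
                              const unsigned char *b, size_t b_off, size_t len)
      {
          assert(a != NULL && b != NULL);
          return memcmp(a + a_off, b + b_off, len);
      }
      ```

      Only the sign of the result is meaningful; the magnitude is not
      specified by the C standard.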
      
      This primop has applications in packages like `text`, `text-short`,
      `bytestring`, `text-containers`, `primitive`, etc.  which currently
      have to incur the overhead of an ordinary FFI call to directly or
      indirectly invoke `memcmp(3)` as well as having to deal with some
      `unsafePerformIO`-variant.
      
      While at it, this also improves the documentation for the existing
      `copyByteArray#` primitive which has a non-trivial type-signature
      that significantly benefits from a more explicit description of its
      arguments.
      
      Reviewed By: bgamari
      
      Differential Revision: https://phabricator.haskell.org/D4090
      e3ba26f8
  35. 03 Oct, 2017 1 commit
    • Add ability to produce crash dumps on Windows · ec9ac20d
      Tamar Christina authored
      It's often hard to debug things like segfaults on Windows,
      mostly because gdb isn't always of use and users don't know
      how to effectively use it.
      
      This patch provides a way to create a crash dump by passing
      
      `+RTS --generate-crash-dumps` as an option. If any unhandled
      exception is triggered a dump is made that contains enough
      information to be able to diagnose things successfully.
      
      Currently the created dumps are a bit big because I include
      all registers, code and threads information.
      
      This looks like
      
      ```
      $ testsuite/tests/rts/derefnull.run/derefnull.exe +RTS
      --generate-crash-dumps
      
      Access violation in generated code when reading 0000000000000000
      Crash dump created. Dump written to:
              E:\msys64\tmp\ghc-20170901-220250-11216-16628.dmp
      ```
      
      Test Plan: ./validate
      
      Reviewers: austin, hvr, bgamari, erikd, simonmar
      
      Reviewed By: bgamari, simonmar
      
      Subscribers: rwbarton, thomie
      
      Differential Revision: https://phabricator.haskell.org/D3912
      ec9ac20d