1. 12 Jan, 2019 1 commit
  2. 10 Jan, 2019 1 commit
  3. 03 Jan, 2019 1 commit
  4. 31 Mar, 2018 1 commit
    • Tamar Christina's avatar
      Remove MAX_PATH restrictions from RTS, I/O manager and various utilities · 4de585a5
      Tamar Christina authored
      Summary:
      This shims out fopen and sopen so that they use modern APIs under the hood
      along with namespaced paths.
      
      This lifts the MAX_PATH restrictions from Haskell programs and makes the new
      limit ~32k.
      
      There are only some slight caveats that have been documented.
      
      Some utilities have not been upgraded such as lndir, since all these things are
      different cabal packages I have been forced to copy the source in different places
      which is less than ideal. But it's the only way to keep sdist working.
      
      Test Plan: ./validate
      
      Reviewers: hvr, bgamari, erikd, simonmar
      
      Reviewed By: bgamari
      
      Subscribers: rwbarton, thomie, carter
      
      GHC Trac Issues: #10822
      
      Differential Revision: https://phabricator.haskell.org/D4416
      4de585a5
  5. 16 Aug, 2017 1 commit
    • Ben Gamari's avatar
      Speed up compilation of profiling stubs · a8da0de2
      Ben Gamari authored
      Here we encode the cost centre list as static data. This means that the
      initialization stubs are small functions which should be easy for GCC to
      compile, even with optimization.
      
      Fixes #7960.
      
      Test Plan: Test profiling
      
      Reviewers: austin, erikd, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: rwbarton, thomie
      
      GHC Trac Issues: #7960
      
      Differential Revision: https://phabricator.haskell.org/D3853
      a8da0de2
  6. 29 Apr, 2017 1 commit
  7. 01 Mar, 2017 1 commit
    • Ben Gamari's avatar
      rts: Fix build · b86d226f
      Ben Gamari authored
      I evidently neglected to consider that validate doesn't build profiled
      ways. Arg.
      b86d226f
  8. 28 Feb, 2017 1 commit
    • Ben Gamari's avatar
      rts: Allow profile output path to be specified on RTS command line · db2a6676
      Ben Gamari authored
      This introduces a RTS option, -po, which allows the user to override the stem
      used to form the output file names of the heap profile and cost center summary.
      
      It's a bit unclear to me whether this is really the interface we want.
      Alternatively we could just allow the user to specify the `.hp` and `.prof` file
      names separately. This would arguably be a bit more straightforward and would
      allow the user to name JSON output with an appropriate `.json` suffix if they so
      desired. However, this would come at the cost of taking more of the option
      space, which is a somewhat precious commodity.
      
      Test Plan: Validate, try using `-po` RTS option
      
      Reviewers: simonmar, austin, erikd
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3182
      db2a6676
  9. 23 Feb, 2017 1 commit
    • Ben Gamari's avatar
      JSON profiler reports · a2043332
      Ben Gamari authored
      This introduces a JSON output format for cost-centre profiler reports.
      It's not clear whether this is really something we want to introduce
      given that we may also move to a more Haskell-driven output pipeline in
      the future, but I nevertheless found this helpful, so I thought I would
      put it up.
      
      Test Plan: Compile a program with `-prof -fprof-auto`; run with `+RTS
      -pj`
      
      Reviewers: austin, erikd, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: duncan, maoe, thomie, simonmar
      
      Differential Revision: https://phabricator.haskell.org/D3132
      a2043332
  10. 12 Feb, 2017 2 commits
    • Ben Gamari's avatar
      rts/Profiling: Factor out report generation · 56c9bb39
      Ben Gamari authored
      Here we move the actual report generation logic to
      `rts/ProfilerReport.c`. This break is actually quite clean,
      
          void writeCCSReport( FILE *prof_file, CostCentreStack const *ccs,
                               ProfilerTotals totals );
      
      This is more profiler refactoring in preparation for machine-readable
      output.
      
      Test Plan: Validate
      
      Reviewers: austin, erikd, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3097
      56c9bb39
    • Ben Gamari's avatar
      rts/Profiling: Kill a few globals and add consts · 1a14d384
      Ben Gamari authored
      Previously it was quite difficult to follow the dataflow through this
      file due to global mutation and rather non-descriptive types.
      
      This is a cleanup in preparation for factoring out the report-generating
      logic, which is itself in preparation for somedayteaching the profiler
      to produce more machine-readable reports (JSON perhaps?).
      
      Test Plan: Validate
      
      Reviewers: austin, erikd, simonmar
      
      Reviewed By: simonmar
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D3096
      1a14d384
  11. 06 Jan, 2017 1 commit
    • Simon Marlow's avatar
      More fixes for #5654 · 3a18baff
      Simon Marlow authored
      * In stg_ap_0_fast, if we're evaluating a thunk, the thunk might
        evaluate to a function in which case we may have to adjust its CCS.
      
      * The interpreter has its own implementation of stg_ap_0_fast, so we
        have to do the same shenanigans with creating empty PAPs and copying
        PAPs there.
      
      * GHCi creates Cost Centres as children of CCS_MAIN, which enterFunCCS()
        wrongly assumed to imply that they were CAFs.  Now we use the is_caf
        flag for this, which we have to correctly initialise when we create a
        Cost Centre in GHCi.
      3a18baff
  12. 29 Nov, 2016 1 commit
  13. 14 Nov, 2016 1 commit
    • Simon Marlow's avatar
      Remove CONSTR_STATIC · 55d535da
      Simon Marlow authored
      Summary:
      We currently have two info tables for a constructor
      
      * XXX_con_info: the info table for a heap-resident instance of the
        constructor, It has type CONSTR, or one of the specialised types like
        CONSTR_1_0
      
      * XXX_static_info: the info table for a static instance of this
        constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.
      
      I'm getting rid of the latter, and using the `con_info` info table for
      both static and dynamic constructors.  For rationale and more details
      see Note [static constructors] in SMRep.hs.
      
      I also removed these macros: `isSTATIC()`, `ip_STATIC()`,
      `closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC
      distinction, and anyway HEAP_ALLOCED() does the same job.
      
      Test Plan: validate
      
      Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2690
      
      GHC Trac Issues: #12455
      55d535da
  14. 08 Jun, 2016 1 commit
    • Ömer Sinan Ağacan's avatar
      Show sources of cost centers in .prof · d7933cbc
      Ömer Sinan Ağacan authored
      This fixes the problem with duplicate cost-centre names that was
      reported a couple of times before. When a module implements a typeclass
      multiple times for different types, methods of different implementations
      get same cost-centre names and are reported like this:
      
          COST CENTRE MODULE            %time %alloc
      
          CAF         GHC.IO.Handle.FD    0.0   32.8
          CAF         GHC.Read            0.0    1.0
          CAF         GHC.IO.Encoding     0.0    1.8
          showsPrec   Main                0.0    1.2
          readPrec    Main                0.0   19.4
          readPrec    Main                0.0   20.5
          main        Main                0.0   20.2
      
                                                  individual      inherited
          COST CENTRE  MODULE  no.     entries  %time %alloc   %time %alloc
      
          MAIN         MAIN     53          0    0.0    0.2     0.0  100.0
           CAF         Main    105          0    0.0    0.3     0.0   62.5
            readPrec   Main    109          1    0.0    0.6     0.0    0.6
            readPrec   Main    107          1    0.0    0.6     0.0    0.6
            main       Main    106          1    0.0   20.2     0.0   61.0
             ==        Main    114          1    0.0    0.0     0.0    0.0
             ==        Main    113          1    0.0    0.0     0.0    0.0
             showsPrec Main    112          2    0.0    1.2     0.0    1.2
             showsPrec Main    111          2    0.0    0.9     0.0    0.9
             readPrec  Main    110          0    0.0   18.8     0.0   18.8
             readPrec  Main    108          0    0.0   19.9     0.0   19.9
      
      It's not possible to tell from the report which `==` took how long. This
      patch adds one more column at the cost of making outputs wider. The
      report now looks like this:
      
          COST CENTRE MODULE           SRC                       %time %alloc
      
          CAF         GHC.IO.Handle.FD <entire-module>             0.0   32.9
          CAF         GHC.IO.Encoding  <entire-module>             0.0    1.8
          CAF         GHC.Read         <entire-module>             0.0    1.0
          showsPrec   Main             Main_1.hs:7:19-22           0.0    1.2
          readPrec    Main             Main_1.hs:7:13-16           0.0   19.5
          readPrec    Main             Main_1.hs:4:13-16           0.0   20.5
          main        Main             Main_1.hs:(10,1)-(20,20)    0.0   20.2
      
                                                                             individual      inherited
          COST CENTRE  MODULE        SRC                      no. entries  %time %alloc   %time %alloc
      
          MAIN         MAIN          <built-in>                53      0    0.0    0.2     0.0  100.0
           CAF         Main          <entire-module>          105      0    0.0    0.3     0.0   62.5
            readPrec   Main          Main_1.hs:7:13-16        109      1    0.0    0.6     0.0    0.6
            readPrec   Main          Main_1.hs:4:13-16        107      1    0.0    0.6     0.0    0.6
            main       Main          Main_1.hs:(10,1)-(20,20) 106      1    0.0   20.2     0.0   61.0
             ==        Main          Main_1.hs:7:25-26        114      1    0.0    0.0     0.0    0.0
             ==        Main          Main_1.hs:4:25-26        113      1    0.0    0.0     0.0    0.0
             showsPrec Main          Main_1.hs:7:19-22        112      2    0.0    1.2     0.0    1.2
             showsPrec Main          Main_1.hs:4:19-22        111      2    0.0    0.9     0.0    0.9
             readPrec  Main          Main_1.hs:7:13-16        110      0    0.0   18.8     0.0   18.8
             readPrec  Main          Main_1.hs:4:13-16        108      0    0.0   19.9     0.0   19.9
           CAF         Text.Read.Lex <entire-module>          102      0    0.0    0.5     0.0    0.5
      
      To fix failing test cases because of different orderings of cost centres
      (e.g. optimized and non-optimized build printing in different order),
      with this patch we also start sorting cost centres before printing. The
      order depends on 1) entries (more entered cost centres come first) 2)
      names (using strcmp() on cost centre names).
      
      Reviewers: simonmar, austin, erikd, bgamari
      
      Reviewed By: simonmar, bgamari
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2282
      
      GHC Trac Issues: #11543, #8473, #7105
      d7933cbc
  15. 17 May, 2016 1 commit
    • Erik de Castro Lopo's avatar
      rts: More const correct-ness fixes · 33c029dd
      Erik de Castro Lopo authored
      In addition to more const-correctness fixes this patch fixes an
      infelicity of the previous const-correctness patch (995cf0f3) which
      left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter
      but returning a non-const pointer. Here we restore the original type
      signature of `UNTAG_CLOSURE` and add a new function
      `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure`
      pointer and uses that wherever possible.
      
      Test Plan: Validate on Linux, OS X and Windows
      
      Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
      
      Reviewed By: simonmar, trofi
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D2231
      33c029dd
  16. 04 May, 2016 1 commit
  17. 07 Feb, 2016 1 commit
  18. 29 Jan, 2016 1 commit
  19. 26 Jan, 2016 1 commit
    • thomie's avatar
      Fix segmentation fault when .prof file not writeable · 6d2bdfd8
      thomie authored
      There are two ways to do retainer profiling. Quoting from the user's guide:
        1. `+RTS -hr` "Breaks down the graph by retainer set"
        2. `+RTS -hr<cc> -h<x>`, where `-h<x>` is one of normal heap profiling
           break-down options (e.g. `-hc`), and `-hr<cc> means "Restrict the
           profile to closures with retainer sets containing cost-centre
           stacks with one of the specified cost centres at the top."
      
      Retainer profiling writes to a .hp file, like the other heap profiling
      options, but also to a .prof file. Therefore, when the .prof file is not
      writeable for whatever reason, retainer profiling should be turned off
      completely.
      
      This worked ok when running the program with `+RTS -hr` (option 1), but a
      segfault would occur when using `+RTS -hr<cc> -h<x>`, with `x!=r` (option 2).
      
      This commit fixes that.
      
      Reviewed by: bgamari
      
      Differential Revision: https://phabricator.haskell.org/D1849
      
      GHC Trac Issues: #11489
      6d2bdfd8
  20. 21 Dec, 2015 1 commit
    • Simon Marlow's avatar
      Maintain cost-centre stacks in the interpreter · c8c44fd9
      Simon Marlow authored
      Summary:
      Breakpoints become SCCs, so we have detailed call-stack info for
      interpreted code.  Currently this only works when GHC is compiled with
      -prof, but D1562 (Remote GHCi) removes this constraint so that in the
      future call stacks will be available without building your own GHCi.
      
      How can you get a stack trace?
      
      * programmatically: GHC.Stack.currentCallStack
      * I've added an experimental :where command that shows the stack when
        stopped at a breakpoint
      * `error` attaches a call stack automatically, although since calls to
        `error` are often lifted out to the top level, this is less useful
        than it might be (ImplicitParams still works though).
      * Later we might attach call stacks to all exceptions
      
      Other related changes in this diff:
      
      * I reduced the number of places that get ticks attached for
        breakpoints.  In particular there was a breakpoint around the whole
        declaration, which was often redundant because it bound no variables.
        This reduces clutter in the stack traces and speeds up compilation.
      
      * I tidied up some RealSrcSpan stuff in InteractiveUI, and made a few
        other small cleanups
      
      Test Plan: validate
      
      Reviewers: ezyang, bgamari, austin, hvr
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1595
      
      GHC Trac Issues: #11047
      c8c44fd9
  21. 07 Nov, 2015 1 commit
    • Simon Marlow's avatar
      Make GHCi & TH work when the compiler is built with -prof · ce1f1607
      Simon Marlow authored
      Summary:
      Amazingly, there were zero changes to the byte code generator and very
      few changes to the interpreter - mainly because we've used good
      abstractions that hide the differences between profiling and
      non-profiling.  So that bit was pleasantly straightforward, but there
      were a pile of other wibbles to get the whole test suite through.
      
      Note that a compiler built with -prof is now like one built with
      -dynamic, in that to use TH you have to build the code the same way.
      For dynamic, we automatically enable -dynamic-too when TH is required,
      but we don't have anything equivalent for profiling, so you have to
      explicitly use -prof when building code that uses TH with a profiled
      compiler.  For this reason Cabal won't work with TH.  We don't expect
      to ship a profiled compiler, so I think that's OK.
      
      Test Plan: validate with GhcProfiled=YES in validate.mk
      
      Reviewers: goldfire, bgamari, rwbarton, austin, hvr, erikd, ezyang
      
      Reviewed By: ezyang
      
      Subscribers: thomie
      
      Differential Revision: https://phabricator.haskell.org/D1407
      
      GHC Trac Issues: #4837, #545
      ce1f1607
  22. 06 Apr, 2015 1 commit
  23. 21 Oct, 2014 1 commit
  24. 29 Sep, 2014 1 commit
  25. 28 Jul, 2014 1 commit
  26. 02 Jul, 2014 1 commit
  27. 05 Oct, 2013 1 commit
  28. 04 Sep, 2013 1 commit
    • Simon Marlow's avatar
      Don't move Capabilities in setNumCapabilities (#8209) · aa779e09
      Simon Marlow authored
      We have various problems with reallocating the array of Capabilities,
      due to threads in waitForReturnCapability that are already holding a
      pointer to a Capability.
      
      Rather than add more locking to make this safer, I decided it would be
      easier to ensure that we never move the Capabilities at all.  The
      capabilities array is now an array of pointers to Capabaility.  There
      are extra indirections, but it rarely matters - we don't often access
      Capabilities via the array, normally we already have a pointer to
      one.  I ran the parallel benchmarks and didn't see any difference.
      aa779e09
  29. 25 Oct, 2012 1 commit
  30. 23 Oct, 2012 1 commit
  31. 07 Sep, 2012 1 commit
    • Simon Marlow's avatar
      Deprecate lnat, and use StgWord instead · 41737f12
      Simon Marlow authored
      lnat was originally "long unsigned int" but we were using it when we
      wanted a 64-bit type on a 64-bit machine.  This broke on Windows x64,
      where long == int == 32 bits.  Using types of unspecified size is bad,
      but what we really wanted was a type with N bits on an N-bit machine.
      StgWord is exactly that.
      
      lnat was mentioned in some APIs that clients might be using
      (e.g. StackOverflowHook()), so we leave it defined but with a comment
      to say that it's deprecated.
      41737f12
  32. 20 Aug, 2012 1 commit
  33. 11 Jul, 2012 1 commit
  34. 02 Dec, 2011 1 commit
    • Simon Marlow's avatar
      More changes aimed at improving call stacks. · 1469f1eb
      Simon Marlow authored
        - Attach a SrcSpan to every CostCentre.  This had the side effect
          that CostCentres that used to be merged because they had the same
          name are now considered distinct; so I had to add a Unique to
          CostCentre to give them distinct object-code symbols.
      
        - New flag: -fprof-auto-calls.  This flag adds an automatic SCC to
          every call site (application, to be precise).  This is typically
          more useful for call stacks than annotating whole functions.
      
      Various tidy-ups at the same time: removed unused NoCostCentre
      constructor, and refactored a bit in Coverage.lhs.
      
      The call stack we get from traceStack now looks like this:
      
      Stack trace:
        Main.CAF (<entire-module>)
        Main.main.xs (callstack002.hs:18:12-24)
        Main.map (callstack002.hs:13:12-16)
        Main.map.go (callstack002.hs:15:21-34)
        Main.map.go (callstack002.hs:15:21-23)
        Main.f (callstack002.hs:10:7-43)
      1469f1eb
  35. 01 Dec, 2011 1 commit
  36. 29 Nov, 2011 1 commit
    • Simon Marlow's avatar
      Make profiling work with multiple capabilities (+RTS -N) · 50de6034
      Simon Marlow authored
      This means that both time and heap profiling work for parallel
      programs.  Main internal changes:
      
        - CCCS is no longer a global variable; it is now another
          pseudo-register in the StgRegTable struct.  Thus every
          Capability has its own CCCS.
      
        - There is a new built-in CCS called "IDLE", which records ticks for
          Capabilities in the idle state.  If you profile a single-threaded
          program with +RTS -N2, you'll see about 50% of time in "IDLE".
      
        - There is appropriate locking in rts/Profiling.c to protect the
          shared cost-centre-stack data structures.
      
      This patch does enough to get it working, I have cut one big corner:
      the cost-centre-stack data structure is still shared amongst all
      Capabilities, which means that multiple Capabilities will race when
      updating the "allocations" and "entries" fields of a CCS.  Not only
      does this give unpredictable results, but it runs very slowly due to
      cache line bouncing.
      
      It is strongly recommended that you use -fno-prof-count-entries to
      disable the "entries" count when profiling parallel programs. (I shall
      add a note to this effect to the docs).
      50de6034
  37. 25 Nov, 2011 1 commit
    • Simon Marlow's avatar
      Time handling overhaul · 6b109851
      Simon Marlow authored
      Terminology cleanup: the type "Ticks" has been renamed "Time", which
      is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds).
      The terminology "tick" is now used consistently to mean the interval
      between timer signals.
      
      The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if
      we have it).  Before it used CPU time in the non-threaded RTS and
      realtime in the threaded RTS, but I've discovered that the CPU timer
      has terrible resolution (at least on Linux) and isn't much use for
      profiling.  So now we always use realtime.  This should also fix
      
      The default tick interval is now 10ms, except when profiling where we
      drop it to 1ms.  This gives more accurate profiles without affecting
      runtime too much (<1%).
      
      Lots of cleanups - the resolution of Time is now in one place
      only (Rts.h) rather than having calculations that depend on the
      resolution scattered all over the RTS.  I hope I found them all.
      6b109851
  38. 14 Nov, 2011 1 commit
  39. 08 Nov, 2011 1 commit