1. 02 Sep, 2014 1 commit
  2. 10 Aug, 2014 2 commits
    • Austin Seipp's avatar
    • Sergei Trofimovich's avatar
      rts/Printer.c: drop zcode mangling/demangling support in C code · 7a754a94
      Sergei Trofimovich authored
      
      
      Summary:
      GHC's RTS contains ancient Zdecode code which changed format a bit.
      It's easier to drop broken part and show original names.
      
      The patch changes output for './hello +RTS -Da' (apply)
      from such gibberish:
      
          stg_ap_v_ret... PAP/1(0x92922a, &i!_-&i!_:<.s_r=Z)
          stg_ap_0_ret... base:GHC.MVar.MVar(0x7fd3d1f040f8)
          stg_ap_v_ret... THUNK(&i!_-&i!_i!f.Z)
          stg_ap_v_ret... PAP/1(0x92c1f3, EO_:<.s_r=Z, EP_:<.s_r=Z)
          stg_ap_0_ret... ghc-prim:GHC.Tuple.(,)(0x7fd3d1f04209, 0x7fd3d1f041fa)
          stg_ap_0_ret... ghc-prim:GHC.Types.:(0x7fd3d1f04301, 0x7fd3d1f042ea)
          stg_ap_0_ret... THUNK(3F0_i!f.Z, 0x9152a1)
          stg_ap_0_ret... FUN/3(&s=_GHCziIOziFD_z/fB_ff=r=/IOFD14_i!f.Z)
          stg_ap_ppv_ret... FUN/3(&s=_GHCziIOziFD_z/fB_ff=r=/IOFD14_i!f.Z)
          stg_ap_0_ret... FUN/2(&s=_GHCziIOziFD_z/fIOD=vi:=FD15_i!f.Z)
          stg_ap_pv_ret... FUN/2(&s=_GHCziIOziFD_z/fIOD=vi:=FD15_i!f.Z)
          stg_ap_0_ret... base:GHC.IO.Handle.Types.FileHandle(5'A_:<.s_r=Z, 0x7fd3d1f04ef0)
          stg_ap_v_ret... THUNK(*>_&+_2__+/_i!f.Z, 0x7fd3d1f0602a, 0x7fd3d1f04f10)
          stg_ap_v_ret... PAP/1(0x7fd3d1f0602a, 0x7fd3d1f04f10)
      
      to something more readable:
      
          stg_ap_v_ret... PAP/1(0x92922a, <Main_main_closure>[0x90b710])
          stg_ap_0_ret... base:GHC.MVar.MVar(0x7f1e256040f8)
          stg_ap_v_ret... THUNK(<Main_main_info>[0x4046c8])
          stg_ap_v_ret... PAP/1(0x92c1f3, <sEO_closure>[0x90b6f0], <sEP_closure>[0x90b6d0])
          stg_ap_0_ret... ghc-prim:GHC.Tuple.(,)(0x7f1e25604209, 0x7f1e256041fa)
          stg_ap_0_ret... ghc-prim:GHC.Types.:(0x7f1e25604301, 0x7f1e256042ea)
          stg_ap_0_ret... THUNK(<s3F0_info>[0x434f70], 0x9152a1)
          stg_ap_0_ret... FUN/3(<base_GHCziIOziFD_zdfBufferedIOFD14_info>[0x5f5198])
          stg_ap_ppv_ret... FUN/3(<base_GHCziIOziFD_zdfBufferedIOFD14_info>[0x5f5198])
          stg_ap_0_ret... FUN/2(<base_GHCziIOziFD_zdfIODeviceFD15_info>[0x5f7c60])
          stg_ap_pv_ret... FUN/2(<base_GHCziIOziFD_zdfIODeviceFD15_info>[0x5f7c60])
          stg_ap_0_ret... base:GHC.IO.Handle.Types.FileHandle(<r5qA_closure>[0x91a920], 0x7f1e25604ef0)
          stg_ap_v_ret... THUNK(<stg_ap_2_upd_info>[0x6b1c60], 0x7f1e2560602a, 0x7f1e25604f10)
          stg_ap_v_ret... PAP/1(0x7f1e2560602a, 0x7f1e25604f10)
      
      First observed on '+RTS -Di' (interpreter) on unregisterised builds.
      Signed-off-by: default avatarSergei Trofimovich <slyfox@gentoo.org>
      
      Test Plan: built 'hello world' with -debug in moth modes and ran under '+RTS -Da'
      
      Reviewers: simonmar, austin, ezyang
      
      Reviewed By: austin, ezyang
      
      Subscribers: phaskell, rwbarton, simonmar, relrod, ezyang, carter
      
      Differential Revision: https://phabricator.haskell.org/D116
      7a754a94
  3. 28 Jul, 2014 1 commit
  4. 29 Mar, 2014 1 commit
    • tibbe's avatar
      Add SmallArray# and SmallMutableArray# types · 90329b6c
      tibbe authored
      These array types are smaller than Array# and MutableArray# and are
      faster when the array size is small, as they don't have the overhead
      of a card table. Having no card table reduces the closure size with 2
      words in the typical small array case and leads to less work when
      updating or GC:ing the array.
      
      Reduces both the runtime and memory allocation by 8.8% on my insert
      benchmark for the HashMap type in the unordered-containers package,
      which makes use of lots of small arrays. With tuned GC settings
      (i.e. `+RTS -A6M`) the runtime reduction is 15%.
      
      Fixes #8923.
      90329b6c
  5. 15 Jan, 2014 1 commit
  6. 16 Nov, 2012 1 commit
    • Simon Marlow's avatar
      Add a write barrier for TVAR closures · 6d784c43
      Simon Marlow authored
      This improves GC performance when there are a lot of TVars in the
      heap.  For instance, a TChan with a lot of elements causes a massive
      GC drag without this patch.
      
      There's more to do - several other STM closure types don't have write
      barriers, so GC performance when there are a lot of threads blocked on
      STM isn't great.  But fixing the problem for TVar is a good start.
      6d784c43
  7. 08 Oct, 2012 1 commit
    • Simon Marlow's avatar
      Produce new-style Cmm from the Cmm parser · a7c0387d
      Simon Marlow authored
      The main change here is that the Cmm parser now allows high-level cmm
      code with argument-passing and function calls.  For example:
      
      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }
      
        return (x,y);
      }
      
      More details on the new cmm syntax are in Note [Syntax of .cmm files]
      in CmmParse.y.
      
      The old syntax is still more-or-less supported for those occasional
      code fragments that really need to explicitly manipulate the stack.
      However there are a couple of differences: it is now obligatory to
      give a list of live GlobalRegs on every jump, e.g.
      
        jump %ENTRY_CODE(Sp(0)) [R1];
      
      Again, more details in Note [Syntax of .cmm files].
      
      I have rewritten most of the .cmm files in the RTS into the new
      syntax, except for AutoApply.cmm which is generated by the genapply
      program: this file could be generated in the new syntax instead and
      would probably be better off for it, but I ran out of enthusiasm.
      
      Some other changes in this batch:
      
       - The PrimOp calling convention is gone, primops now use the ordinary
         NativeNodeCall convention.  This means that primops and "foreign
         import prim" code must be written in high-level cmm, but they can
         now take more than 10 arguments.
      
       - CmmSink now does constant-folding (should fix #7219)
      
       - .cmm files now go through the cmmPipeline, and as a result we
         generate better code in many cases.  All the object files generated
         for the RTS .cmm files are now smaller.  Performance should be
         better too, but I haven't measured it yet.
      
       - RET_DYN frames are removed from the RTS, lots of code goes away
      
       - we now have some more canned GC points to cover unboxed-tuples with
         2-4 pointers, which will reduce code size a little.
      a7c0387d
  8. 21 Sep, 2012 1 commit
  9. 14 Sep, 2012 1 commit
  10. 07 Sep, 2012 1 commit
    • Simon Marlow's avatar
      Deprecate lnat, and use StgWord instead · 41737f12
      Simon Marlow authored
      lnat was originally "long unsigned int" but we were using it when we
      wanted a 64-bit type on a 64-bit machine.  This broke on Windows x64,
      where long == int == 32 bits.  Using types of unspecified size is bad,
      but what we really wanted was a type with N bits on an N-bit machine.
      StgWord is exactly that.
      
      lnat was mentioned in some APIs that clients might be using
      (e.g. StackOverflowHook()), so we leave it defined but with a comment
      to say that it's deprecated.
      41737f12
  11. 31 Aug, 2012 1 commit
  12. 25 Aug, 2012 1 commit
    • ian@well-typed.com's avatar
      Make a function for get_itbl, rather than using a CPP macro · 8413d838
      ian@well-typed.com authored
      This has several advantages:
      * It can be called from gdb
      * There is more type information for the user, and type checking
        for the compiler
      * Less opportunity for things to go wrong, e.g. due to missing
        parentheses or repeated execution
      
      The sizes of the non-debug .o files hasn't changed (other than
      Inlines.o), so I'm pretty sure the compiled code is identical.
      8413d838
  13. 26 Apr, 2012 1 commit
    • Ian Lynagh's avatar
      Fix warnings on Win64 · 1dbe6d59
      Ian Lynagh authored
      Mostly this meant getting pointer<->int conversions to use the right
      sizes. lnat is now size_t, rather than unsigned long, as that seems a
      better match for how it's used.
      1dbe6d59
  14. 14 Mar, 2012 1 commit
  15. 24 Jun, 2011 1 commit
  16. 15 Dec, 2010 1 commit
    • Simon Marlow's avatar
      Implement stack chunks and separate TSO/STACK objects · f30d5273
      Simon Marlow authored
      This patch makes two changes to the way stacks are managed:
      
      1. The stack is now stored in a separate object from the TSO.
      
      This means that it is easier to replace the stack object for a thread
      when the stack overflows or underflows; we don't have to leave behind
      the old TSO as an indirection any more.  Consequently, we can remove
      ThreadRelocated and deRefTSO(), which were a pain.
      
      This is obviously the right thing, but the last time I tried to do it
      it made performance worse.  This time I seem to have cracked it.
      
      2. Stacks are now represented as a chain of chunks, rather than
         a single monolithic object.
      
      The big advantage here is that individual chunks are marked clean or
      dirty according to whether they contain pointers to the young
      generation, and the GC can avoid traversing clean stack chunks during
      a young-generation collection.  This means that programs with deep
      stacks will see a big saving in GC overhead when using the default GC
      settings.
      
      A secondary advantage is that there is much less copying involved as
      the stack grows.  Programs that quickly grow a deep stack will see big
      improvements.
      
      In some ways the implementation is simpler, as nothing special needs
      to be done to reclaim stack as the stack shrinks (the GC just recovers
      the dead stack chunks).  On the other hand, we have to manage stack
      underflow between chunks, so there's a new stack frame
      (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects.
      The total amount of code is probably about the same as before.
      
      There are new RTS flags:
      
         -ki<size> Sets the initial thread stack size (default 1k)  Egs: -ki4k -ki2m
         -kc<size> Sets the stack chunk size (default 32k)
         -kb<size> Sets the stack chunk buffer size (default 1k)
      
      -ki was previously called just -k, and the old name is still accepted
      for backwards compatibility.  These new options are documented.
      f30d5273
  17. 20 Jun, 2010 1 commit
  18. 01 Jan, 2010 1 commit
  19. 06 May, 2010 1 commit
  20. 01 Apr, 2010 1 commit
    • Simon Marlow's avatar
      Remove the IND_OLDGEN and IND_OLDGEN_PERM closure types · 70a2431f
      Simon Marlow authored
      These are no longer used: once upon a time they used to have different
      layout from IND and IND_PERM respectively, but that is no longer the
      case since we changed the remembered set to be an array of addresses
      instead of a linked list of closures.
      70a2431f
  21. 29 Mar, 2010 1 commit
    • Simon Marlow's avatar
      New implementation of BLACKHOLEs · 5d52d9b6
      Simon Marlow authored
      This replaces the global blackhole_queue with a clever scheme that
      enables us to queue up blocked threads on the closure that they are
      blocked on, while still avoiding atomic instructions in the common
      case.
      
      Advantages:
      
       - gets rid of a locked global data structure and some tricky GC code
         (replacing it with some per-thread data structures and different
         tricky GC code :)
      
       - wakeups are more prompt: parallel/concurrent performance should
         benefit.  I haven't seen anything dramatic in the parallel
         benchmarks so far, but a couple of threading benchmarks do improve
         a bit.
      
       - waking up a thread blocked on a blackhole is now O(1) (e.g. if
         it is the target of throwTo).
      
       - less sharing and better separation of Capabilities: communication
         is done with messages, the data structures are strictly owned by a
         Capability and cannot be modified except by sending messages.
      
       - this change will utlimately enable us to do more intelligent
         scheduling when threads block on each other.  This is what started
         off the whole thing, but it isn't done yet (#3838).
      
      I'll be documenting all this on the wiki in due course.
      5d52d9b6
  22. 11 Mar, 2010 1 commit
    • Simon Marlow's avatar
      Use message-passing to implement throwTo in the RTS · 7408b392
      Simon Marlow authored
      This replaces some complicated locking schemes with message-passing
      in the implementation of throwTo. The benefits are
      
       - previously it was impossible to guarantee that a throwTo from
         a thread running on one CPU to a thread running on another CPU
         would be noticed, and we had to rely on the GC to pick up these
         forgotten exceptions. This no longer happens.
      
       - the locking regime is simpler (though the code is about the same
         size)
      
       - threads can be unblocked from a blocked_exceptions queue without
         having to traverse the whole queue now.  It's a rare case, but
         replaces an O(n) operation with an O(1).
      
       - generally we move in the direction of sharing less between
         Capabilities (aka HECs), which will become important with other
         changes we have planned.
      
      Also in this patch I replaced several STM-specific closure types with
      a generic MUT_PRIM closure type, which allowed a lot of code in the GC
      and other places to go away, hence the line-count reduction.  The
      message-passing changes resulted in about a net zero line-count
      difference.
      7408b392
  23. 03 Dec, 2009 1 commit
    • Simon Marlow's avatar
      GC refactoring, remove "steps" · 214b3663
      Simon Marlow authored
      The GC had a two-level structure, G generations each of T steps.
      Steps are for aging within a generation, mostly to avoid premature
      promotion.  
      
      Measurements show that more than 2 steps is almost never worthwhile,
      and 1 step is usually worse than 2.  In theory fractional steps are
      possible, so the ideal number of steps is somewhere between 1 and 3.
      GHC's default has always been 2.
      
      We can implement 2 steps quite straightforwardly by having each block
      point to the generation to which objects in that block should be
      promoted, so blocks in the nursery point to generation 0, and blocks
      in gen 0 point to gen 1, and so on.
      
      This commit removes the explicit step structures, merging generations
      with steps, thus simplifying a lot of code.  Performance is
      unaffected.  The tunable number of steps is now gone, although it may
      be replaced in the future by a way to tune the aging in generation 0.
      214b3663
  24. 29 Aug, 2009 1 commit
    • Simon Marlow's avatar
      Unify event logging and debug tracing. · a5288c55
      Simon Marlow authored
        - tracing facilities are now enabled with -DTRACING, and -DDEBUG
          additionally enables debug-tracing.  -DEVENTLOG has been
          removed.
      
        - -debug now implies -eventlog
      
        - events can be printed to stderr instead of being sent to the
          binary .eventlog file by adding +RTS -v (which is implied by the
          +RTS -Dx options).
      
        - -Dx debug messages can be sent to the binary .eventlog file
          by adding +RTS -l.  This should help debugging by reducing
          the impact of debug tracing on execution time.
      
        - Various debug messages that duplicated the information in events
          have been removed.
      a5288c55
  25. 03 Aug, 2009 1 commit
  26. 02 Aug, 2009 1 commit
    • Simon Marlow's avatar
      RTS tidyup sweep, first phase · a2a67cd5
      Simon Marlow authored
      The first phase of this tidyup is focussed on the header files, and in
      particular making sure we are exposinng publicly exactly what we need
      to, and no more.
      
       - Rts.h now includes everything that the RTS exposes publicly,
         rather than a random subset of it.
      
       - Most of the public header files have moved into subdirectories, and
         many of them have been renamed.  But clients should not need to
         include any of the other headers directly, just #include the main
         public headers: Rts.h, HsFFI.h, RtsAPI.h.
      
       - All the headers needed for via-C compilation have moved into the
         stg subdirectory, which is self-contained.  Most of the headers for
         the rest of the RTS APIs have moved into the rts subdirectory.
      
       - I left MachDeps.h where it is, because it is so widely used in
         Haskell code.
       
       - I left a deprecated stub for RtsFlags.h in place.  The flag
         structures are now exposed by Rts.h.
      
       - Various internal APIs are no longer exposed by public header files.
      
       - Various bits of dead code and declarations have been removed
      
       - More gcc warnings are turned on, and the RTS code is more
         warning-clean.
      
       - More source files #include "PosixSource.h", and hence only use
         standard POSIX (1003.1c-1995) interfaces.
      
      There is a lot more tidying up still to do, this is just the first
      pass.  I also intend to standardise the names for external RTS APIs
      (e.g use the rts_ prefix consistently), and declare the internal APIs
      as hidden for shared libraries.
      a2a67cd5
  27. 02 Jun, 2009 1 commit
  28. 12 Jan, 2009 1 commit
  29. 18 Nov, 2008 1 commit
    • Simon Marlow's avatar
      Add optional eager black-holing, with new flag -feager-blackholing · d600bf7a
      Simon Marlow authored
      Eager blackholing can improve parallel performance by reducing the
      chances that two threads perform the same computation.  However, it
      has a cost: one extra memory write per thunk entry.  
      
      To get the best results, any code which may be executed in parallel
      should be compiled with eager blackholing turned on.  But since
      there's a cost for sequential code, we make it optional and turn it on
      for the parallel package only.  It might be a good idea to compile
      applications (or modules) with parallel code in with
      -feager-blackholing.
      
      ToDo: document -feager-blackholing.
      d600bf7a
  30. 11 Oct, 2007 1 commit
    • Simon Marlow's avatar
      Add a proper write barrier for MVars · 1ed01a87
      Simon Marlow authored
      Previously MVars were always on the mutable list of the old
      generation, which meant every MVar was visited during every minor GC.
      With lots of MVars hanging around, this gets expensive.  We addressed
      this problem for MUT_VARs (aka IORefs) a while ago, the solution is to
      use a traditional GC write-barrier when the object is modified.  This
      patch does the same thing for MVars.
      
      TVars are still done the old way, they could probably benefit from the
      same treatment too.
      1ed01a87
  31. 29 Aug, 2007 1 commit
  32. 13 Jun, 2007 3 commits
    • Simon Marlow's avatar
      FIX #1418 (partially) · 23e5985c
      Simon Marlow authored
      When the con_desc field of an info table was made into a relative
      reference, this had the side effect of making the profiling fields
      (closure_desc and closure_type) also relative, but only when compiling
      via C, and the heap profiler was still treating them as absolute,
      leading to crashes when profiling with -hd or -hy.
      
      This patch fixes up the story to be consistent: these fields really
      should be relative (otherwise we couldn't make shared versions of the
      profiling libraries), so I've made them relative and fixed up the RTS
      to know about this.
      23e5985c
    • Simon Marlow's avatar
    • Simon Marlow's avatar
      warning police · edb58a77
      Simon Marlow authored
      edb58a77
  33. 17 Apr, 2007 1 commit
    • Simon Marlow's avatar
      Re-working of the breakpoint support · cdce6477
      Simon Marlow authored
      This is the result of Bernie Pope's internship work at MSR Cambridge,
      with some subsequent improvements by me.  The main plan was to
      
       (a) Reduce the overhead for breakpoints, so we could enable 
           the feature by default without incurrent a significant penalty
       (b) Scatter more breakpoint sites throughout the code
      
      Currently we can set a breakpoint on almost any subexpression, and the
      overhead is around 1.5x slower than normal GHCi.  I hope to be able to
      get this down further and/or allow breakpoints to be turned off.
      
      This patch also fixes up :print following the recent changes to
      constructor info tables.  (most of the :print tests now pass)
      
      We now support single-stepping, which just enables all breakpoints.
      
        :step <expr>     executes <expr> with single-stepping turned on
        :step            single-steps from the current breakpoint
      
      The mechanism is quite different to the previous implementation.  We
      share code with the HPC (haskell program coverage) implementation now.
      The coverage pass annotates source code with "tick" locations which
      are tracked by the coverage tool.  In GHCi, each "tick" becomes a
      potential breakpoint location.
      
      Previously breakpoints were compiled into code that magically invoked
      a nested instance of GHCi.  Now, a breakpoint causes the current
      thread to block and control is returned to GHCi.
      
      See the wiki page for more details and the current ToDo list:
      
        http://hackage.haskell.org/trac/ghc/wiki/NewGhciDebugger
      cdce6477
  34. 28 Feb, 2007 1 commit
  35. 15 Nov, 2006 1 commit
  36. 07 Oct, 2006 1 commit
  37. 23 Sep, 2006 1 commit