1. 04 Jun, 2009 3 commits
  2. 02 Jun, 2009 2 commits
  3. 30 May, 2009 1 commit
  4. 29 May, 2009 1 commit
  5. 18 May, 2009 1 commit
    • Ben.Lippmeier@anu.edu.au's avatar
      Split Reg into vreg/hreg and add register pairs · f9288086
      Ben.Lippmeier@anu.edu.au authored
       * The old Reg type is now split into VirtualReg and RealReg.
       * For the graph coloring allocator, the type of the register graph
         is now (Graph VirtualReg RegClass RealReg), which shows that it colors
         in nodes representing virtual regs with colors representing real regs.
         (as was intended)  
       * RealReg contains two contructors, RealRegSingle and RealRegPair,
         where RealRegPair is used to represent a SPARC double reg 
         constructed from two single precision FP regs. 
       * On SPARC we can now allocate double regs into an arbitrary register
         pair, instead of reserving some reg ranges to only hold float/double values. 
      f9288086
  6. 24 May, 2009 1 commit
  7. 23 May, 2009 1 commit
  8. 19 May, 2009 1 commit
  9. 14 May, 2009 1 commit
  10. 08 May, 2009 1 commit
  11. 28 Apr, 2009 1 commit
  12. 26 Apr, 2009 1 commit
  13. 24 Apr, 2009 1 commit
  14. 23 Apr, 2009 2 commits
  15. 13 Apr, 2009 2 commits
  16. 03 Apr, 2009 1 commit
  17. 31 Mar, 2009 1 commit
    • Ben.Lippmeier@anu.edu.au's avatar
      SPARC NCG: HpLim is now always stored on the stack, not in a register · 456dc6d6
      Ben.Lippmeier@anu.edu.au authored
         This fixes the out of memory errors we were getting on sparc
         after the following patch:
      
           Fri Mar 13 03:45:16 PDT 2009  Simon Marlow <marlowsd@gmail.com>
           * Instead of a separate context-switch flag, set HpLim to zero
           Ignore-this: 6c5bbe1ce2c5ef551efe98f288483b0
           This reduces the latency between a context-switch being triggered and
           the thread returning to the scheduler, which in turn should reduce the
           cost of the GC barrier when there are many cores. 
      456dc6d6
  18. 18 Mar, 2009 1 commit
  19. 19 Mar, 2009 1 commit
  20. 17 Mar, 2009 3 commits
    • Simon Marlow's avatar
      Add fast event logging · 8b18faef
      Simon Marlow authored
      Generate binary log files from the RTS containing a log of runtime
      events with timestamps.  The log file can be visualised in various
      ways, for investigating runtime behaviour and debugging performance
      problems.  See for example the forthcoming ThreadScope viewer.
      
      New GHC option:
      
        -eventlog   (link-time option) Enables event logging.
      
        +RTS -l     (runtime option) Generates <prog>.eventlog with
                    the binary event information.
      
      This replaces some of the tracing machinery we already had in the RTS:
      e.g. +RTS -vg  for GC tracing (we should do this using the new event
      logging instead).
      
      Event logging has almost no runtime cost when it isn't enabled, though
      in the future we might add more fine-grained events and this might
      change; hence having a link-time option and compiling a separate
      version of the RTS for event logging.  There's a small runtime cost
      for enabling event-logging, for most programs it shouldn't make much
      difference.
      
      (Todo: docs)
      8b18faef
    • Simon Marlow's avatar
      FIX biographical profiling (#3039, probably #2297) · f8f4cb3f
      Simon Marlow authored
      Since we introduced pointer tagging, we no longer always enter a
      closure to evaluate it.  However, the biographical profiler relies on
      closures being entered in order to mark them as "used", so we were
      getting spurious amounts of data attributed to VOID.  It turns out
      there are various places that need to be fixed, and I think at least
      one of them was also wrong before pointer tagging (CgCon.cgReturnDataCon).
      f8f4cb3f
    • Simon Marlow's avatar
      Add getNumberOfProcessors(), FIX MacOS X build problem (hopefully) · 0ee0be10
      Simon Marlow authored
      Somebody needs to implement getNumberOfProcessors() for MacOS X,
      currently it will return 1.
      0ee0be10
  21. 13 Mar, 2009 2 commits
    • Simon Marlow's avatar
      Use work-stealing for load-balancing in the GC · 4e354226
      Simon Marlow authored
        
      New flag: "+RTS -qb" disables load-balancing in the parallel GC
      (though this is subject to change, I think we will probably want to do
      something more automatic before releasing this).
      
      To get the "PARGC3" configuration described in the "Runtime support
      for Multicore Haskell" paper, use "+RTS -qg0 -qb -RTS".
      
      The main advantage of this is that it allows us to easily disable
      load-balancing altogether, which turns out to be important in parallel
      programs.  Maintaining locality is sometimes more important that
      spreading the work out in parallel GC.  There is a side benefit in
      that the parallel GC should have improved locality even when
      load-balancing, because each processor prefers to take work from its
      own queue before stealing from others.
      4e354226
    • Simon Marlow's avatar
      Instead of a separate context-switch flag, set HpLim to zero · 304e7fb7
      Simon Marlow authored
      This reduces the latency between a context-switch being triggered and
      the thread returning to the scheduler, which in turn should reduce the
      cost of the GC barrier when there are many cores.
      
      We still retain the old context_switch flag which is checked at the
      end of each block of allocation.  The idea is that setting HpLim may
      fail if the the target thread is modifying HpLim at the same time; the
      context_switch flag is a fallback.  It also allows us to "context
      switch soon" without forcing an immediate switch, which can be costly.
      304e7fb7
  22. 06 Mar, 2009 1 commit
    • Simon Marlow's avatar
      Partial fix for #2917 · 1b62aece
      Simon Marlow authored
       - add newAlignedPinnedByteArray# for allocating pinned BAs with
         arbitrary alignment
      
       - the old newPinnedByteArray# now aligns to 16 bytes
      
      Foreign.alloca will use newAlignedPinnedByteArray#, and so might end
      up wasting less space than before (we used to align to 8 by default).
      Foreign.allocaBytes and Foreign.mallocForeignPtrBytes will get 16-byte
      aligned memory, which is enough to avoid problems with SSE
      instructions on x86, for example.
      
      There was a bug in the old newPinnedByteArray#: it aligned to 8 bytes,
      but would have failed if the header was not a multiple of 8
      (fortunately it always was, even with profiling).  Also we
      occasionally wasted some space unnecessarily due to alignment in
      allocatePinned().
      
      I haven't done anything about Foreign.malloc/mallocBytes, which will
      give you the same alignment guarantees as malloc() (8 bytes on
      Linux/x86 here).
      1b62aece
  23. 19 Feb, 2009 1 commit
    • Simon Marlow's avatar
      Rewrite of signal-handling (ghc patch; see also base and unix patches) · 7ed3f755
      Simon Marlow authored
      The API is the same (for now).  The new implementation has the
      capability to define signal handlers that have access to the siginfo
      of the signal (#592), but this functionality is not exposed in this
      patch.
      
      #2451 is the ticket for the new API.
      
      The main purpose of bringing this in now is to fix race conditions in
      the old signal handling code (#2858).  Later we can enable the new
      API in the HEAD.
      
      Implementation differences:
      
       - More of the signal-handling is moved into Haskell.  We store the
         table of signal handlers in an MVar, rather than having a table of
         StablePtrs in the RTS.
      
       - In the threaded RTS, the siginfo of the signal is passed down the
         pipe to the IO manager thread, which manages the business of
         starting up new signal handler threads.  In the non-threaded RTS,
         the siginfo of caught signals is stored in the RTS, and the
         scheduler starts new signal handler threads.
      7ed3f755
  24. 12 Feb, 2009 1 commit
  25. 11 Feb, 2009 1 commit
  26. 13 Feb, 2009 1 commit
  27. 11 Feb, 2009 1 commit
  28. 06 Feb, 2009 3 commits
  29. 03 Feb, 2009 1 commit
  30. 27 Jan, 2009 1 commit