1. 18 Nov, 2008 1 commit
    • Simon Marlow's avatar
      Add optional eager black-holing, with new flag -feager-blackholing · d600bf7a
      Simon Marlow authored
      Eager blackholing can improve parallel performance by reducing the
      chances that two threads perform the same computation.  However, it
      has a cost: one extra memory write per thunk entry.  
      
      To get the best results, any code which may be executed in parallel
      should be compiled with eager blackholing turned on.  But since
      there's a cost for sequential code, we make it optional and turn it on
      for the parallel package only.  It might be a good idea to compile
      applications (or modules) with parallel code in with
      -feager-blackholing.
      
      ToDo: document -feager-blackholing.
      d600bf7a
  2. 31 Oct, 2007 1 commit
    • Simon Marlow's avatar
      Initial parallel GC support · f2ca6dee
      Simon Marlow authored
      eg. use +RTS -g2 -RTS for 2 threads.  Only major GCs are parallelised,
      minor GCs are still sequential. Don't use more threads than you
      have CPUs.
      
      It works most of the time, although you won't see much speedup yet.
      Tuning and more work on stability still required.
      f2ca6dee
  3. 11 Oct, 2007 1 commit
    • Simon Marlow's avatar
      Add a proper write barrier for MVars · 1ed01a87
      Simon Marlow authored
      Previously MVars were always on the mutable list of the old
      generation, which meant every MVar was visited during every minor GC.
      With lots of MVars hanging around, this gets expensive.  We addressed
      this problem for MUT_VARs (aka IORefs) a while ago, the solution is to
      use a traditional GC write-barrier when the object is modified.  This
      patch does the same thing for MVars.
      
      TVars are still done the old way, they could probably benefit from the
      same treatment too.
      1ed01a87
  4. 06 Jun, 2007 1 commit
  5. 28 Feb, 2007 1 commit
  6. 07 Oct, 2006 1 commit
  7. 07 Sep, 2006 1 commit
    • Simon Marlow's avatar
      Remove CONSTR_CHARLIKE and CONSTR_INTLIKE closure types · a0be7e7c
      Simon Marlow authored
      These closure types aren't used/needed, as far as I can tell.  The
      commoning up of Chars/Ints happens by comparing info pointers, and
      the info table for a dynamic C#/I# is CONSTR_0_1.  The RTS seemed
      a little confused about whether CONSTR_CHARLIKE/CONSTR_INTLIKE were
      supposed to be static or dynamic closures, too.
      a0be7e7c
  8. 07 Apr, 2006 1 commit
    • Simon Marlow's avatar
      Reorganisation of the source tree · 0065d5ab
      Simon Marlow authored
      Most of the other users of the fptools build system have migrated to
      Cabal, and with the move to darcs we can now flatten the source tree
      without losing history, so here goes.
      
      The main change is that the ghc/ subdir is gone, and most of what it
      contained is now at the top level.  The build system now makes no
      pretense at being multi-project, it is just the GHC build system.
      
      No doubt this will break many things, and there will be a period of
      instability while we fix the dependencies.  A straightforward build
      should work, but I haven't yet fixed binary/source distributions.
      Changes to the Building Guide will follow, too.
      0065d5ab
  9. 17 Jan, 2006 2 commits
    • simonmar's avatar
      [project @ 2006-01-17 16:13:18 by simonmar] · 91b07216
      simonmar authored
      Improve the GC behaviour of IORefs (see Ticket #650).
      
      This is a small change to the way IORefs interact with the GC, which
      should improve GC performance for programs with plenty of IORefs.
      
      Previously we had a single closure type for mutable variables,
      MUT_VAR.  Mutable variables were *always* on the mutable list in older
      generations, and always traversed on every GC.
      
      Now, we have two closure types: MUT_VAR_CLEAN and MUT_VAR_DIRTY.  The
      latter is on the mutable list, but the former is not.  (NB. this
      differs from MUT_ARR_PTRS_CLEAN and MUT_ARR_PTRS_DIRTY, both of which
      are on the mutable list).  writeMutVar# now implements a write
      barrier, by calling dirty_MUT_VAR() in the runtime, that does the
      necessary modification of MUT_VAR_CLEAN into MUT_VAR_DIRY, and adding
      to the mutable list if necessary.
      
      This results in some pretty dramatic speedups for GHC itself.  I've
      just measureed a 30% overall speedup compiling a 31-module program
      (anna) with the default heap settings :-D
      91b07216
    • simonmar's avatar
      [project @ 2006-01-17 16:03:47 by simonmar] · da69fa9c
      simonmar authored
      Improve the GC behaviour of IOArrays/STArrays
      
      See Ticket #650
      
      This is a small change to the way mutable arrays interact with the GC,
      that can have a dramatic effect on performance, and make tricks with
      unsafeThaw/unsafeFreeze redundant.  Data.HashTable should be faster
      now (I haven't measured it yet).
      
      We now have two mutable array closure types, MUT_ARR_PTRS_CLEAN and
      MUT_ARR_PTRS_DIRTY.  Both are on the mutable list if the array is in
      an old generation.  writeArray# sets the type to MUT_ARR_PTRS_DIRTY.
      The garbage collector can set the type to MUT_ARR_PTRS_CLEAN if it
      finds that no element of the array points into a younger generation
      (discovering this required a small addition to evacuate(), but rough
      tests indicate that it doesn't measurably affect performance).
      
      NOTE: none of this affects unboxed arrays (IOUArray/STUArray), only
      boxed arrays (IOArray/STArray).
      
      We could go further and extend the DIRTY bit to be per-block rather
      than for the whole array, but for now this is an easy improvement.
      da69fa9c
  10. 07 Nov, 2005 1 commit
  11. 25 Jul, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-07-25 14:12:48 by simonmar] · e792bb84
      simonmar authored
      Remove the ForeignObj# type, and all its PrimOps.  The new efficient
      representation of ForeignPtr doesn't use ForeignObj# underneath, and
      there seems no need to keep it.
      e792bb84
  12. 20 Apr, 2005 1 commit
  13. 18 Nov, 2004 1 commit
  14. 12 Sep, 2004 1 commit
  15. 11 Dec, 2002 1 commit
    • simonmar's avatar
      [project @ 2002-12-11 15:36:20 by simonmar] · 0bffc410
      simonmar authored
      Merge the eval-apply-branch on to the HEAD
      ------------------------------------------
      
      This is a change to GHC's evaluation model in order to ultimately make
      GHC more portable and to reduce complexity in some areas.
      
      At some point we'll update the commentary to describe the new state of
      the RTS.  Pending that, the highlights of this change are:
      
        - No more Su.  The Su register is gone, update frames are one
          word smaller.
      
        - Slow-entry points and arg checks are gone.  Unknown function calls
          are handled by automatically-generated RTS entry points (AutoApply.hc,
          generated by the program in utils/genapply).
      
        - The stack layout is stricter: there are no "pending arguments" on
          the stack any more, the stack is always strictly a sequence of
          stack frames.
      
          This means that there's no need for LOOKS_LIKE_GHC_INFO() or
          LOOKS_LIKE_STATIC_CLOSURE() any more, and GHC doesn't need to know
          how to find the boundary between the text and data segments (BIG WIN!).
      
        - A couple of nasty hacks in the mangler caused by the neet to
          identify closure ptrs vs. info tables have gone away.
      
        - Info tables are a bit more complicated.  See InfoTables.h for the
          details.
      
        - As a side effect, GHCi can now deal with polymorphic seq.  Some bugs
          in GHCi which affected primitives and unboxed tuples are now
          fixed.
      
        - Binary sizes are reduced by about 7% on x86.  Performance is roughly
          similar, some programs get faster while some get slower.  I've seen
          GHCi perform worse on some examples, but haven't investigated
          further yet (GHCi performance *should* be about the same or better
          in theory).
      
        - Internally the code generator is rather better organised.  I've moved
          info-table generation from the NCG into the main codeGen where it is
          shared with the C back-end; info tables are now emitted as arrays
          of words in both back-ends.  The NCG is one step closer to being able
          to support profiling.
      
      This has all been fairly thoroughly tested, but no doubt I've messed
      up the commit in some way.
      0bffc410
  16. 19 Apr, 2002 1 commit
  17. 14 Aug, 2001 1 commit
    • sewardj's avatar
      [project @ 2001-08-14 13:40:07 by sewardj] · bc5c8021
      sewardj authored
      Change the story about POSIX headers in C compilation.
      
      Until now, all C code in the RTS and library cbits has by default been
      compiled with settings for POSIXness enabled, that is:
         #define _POSIX_SOURCE   1
         #define _POSIX_C_SOURCE 199309L
         #define _ISOC9X_SOURCE
      If you wanted to negate this, you'd have to define NON_POSIX_SOURCE
      before including headers.
      
      This scheme has some bad effects:
      
      * It means that ccall-unfoldings exported via interfaces from a
        module compiled with -DNON_POSIX_SOURCE may not compile when
        imported into a module which does not -DNON_POSIX_SOURCE.
      
      * It overlaps with the feature tests we do with autoconf.
      
      * It seems to have caused borkage in the Solaris builds for some
        considerable period of time.
      
      The New Way is:
      
      * The default changes to not-being-in-Posix mode.
      
      * If you want to force a C file into Posix mode, #include as
        the **first** include the new file ghc/includes/PosixSource.h.
        Most of the RTS C sources have this include now.
      
      * NON_POSIX_SOURCE is almost totally expunged.  Unfortunately
        we have to retain some vestiges of it in ghc/compiler so that
        modules compiled via C on Solaris using older compilers don't
        break.
      bc5c8021
  18. 23 Jul, 2001 1 commit
    • simonmar's avatar
      [project @ 2001-07-23 17:23:19 by simonmar] · dfd7d6d0
      simonmar authored
      Add a compacting garbage collector.
      
      It isn't enabled by default, as there are still a couple of problems:
      there's a fallback case I haven't implemented yet which means it will
      occasionally bomb out, and speed-wise it's quite a bit slower than the
      copying collector (about 1.8x slower).
      
      Until I can make it go faster, it'll only be useful when you're
      actually running low on real memory.
      
      '+RTS -c' to enable it.
      
      Oh, and I cleaned up a few things in the RTS while I was there, and
      fixed one or two possibly real bugs in the existing GC.
      dfd7d6d0
  19. 22 Mar, 2001 1 commit
    • hwloidl's avatar
      [project @ 2001-03-22 03:51:08 by hwloidl] · 20fc2f0c
      hwloidl authored
      -*- outline -*-
      Time-stamp: <Thu Mar 22 2001 03:50:16 Stardate: [-30]6365.79 hwloidl>
      
      This commit covers changes in GHC to get GUM (way=mp) and GUM/GdH (way=md)
      working. It is a merge of my working version of GUM, based on GHC 4.06,
      with GHC 4.11. Almost all changes are in the RTS (see below).
      
      GUM is reasonably stable, we used the 4.06 version in large-ish programs for
      recent papers. Couple of things I want to change, but nothing urgent.
      GUM/GdH has just been merged and needs more testing. Hope to do that in the
      next weeks. It works in our working build but needs tweaking to run.
      GranSim doesn't work yet (*sigh*). Most of the code should be in, but needs
      more debugging.
      
      ToDo: I still want to make the following minor modifications before the release
      - Better wrapper skript for parallel execution [ghc/compiler/main]
      - Update parallel docu: started on it but it's minimal [ghc/docs/users_guide]
      - Clean up [nofib/parallel]: it's a real mess right now (*sigh*)
      - Update visualisation tools (minor things only IIRC) [ghc/utils/parallel]
      - Add a Klingon-English glossary
      
      * RTS:
      
      Almost all changes are restricted to ghc/rts/parallel and should not
      interfere with the rest. I only comment on changes outside the parallel
      dir:
      
      - Several changes in Schedule.c (scheduling loop; createThreads etc);
        should only affect parallel code
      - Added ghc/rts/hooks/ShutdownEachPEHook.c
      - ghc/rts/Linker.[ch]: GUM doesn't know about Stable Names (ifdefs)!!
      - StgMiscClosures.h: END_TSO_QUEUE etc now defined here (from StgMiscClosures.hc)
                           END_ECAF_LIST was missing a leading stg_
      - SchedAPI.h: taskStart now defined in here; it's only a wrapper around
                    scheduleThread now, but might use some init, shutdown later
      - RtsAPI.h: I have nuked the def of rts_evalNothing
      
      * Compiler:
      
      - ghc/compiler/main/DriverState.hs
        added PVM-ish flags to the parallel way
        added new ways for parallel ticky profiling and distributed exec
      
      - ghc/compiler/main/DriverPipeline.hs
        added a fct run_phase_MoveBinary which is called with way=mp after linking;
        it moves the bin file into a PVM dir and produces a wrapper script for
        parallel execution
        maybe cleaner to add a MoveBinary phase in DriverPhases.hs but this way
        it's less intrusive and MoveBinary makes probably only sense for mp anyway
      
      * Nofib:
      
      - nofib/spectral/Makefile, nofib/real/Makefile, ghc/tests/programs/Makefile:
        modified to skip some tests if HWL_NOFIB_HACK is set; only tmp to record
        which test prgs cause problems in my working build right now
      20fc2f0c
  20. 02 Mar, 2001 1 commit
  21. 29 Jan, 2001 1 commit
  22. 13 Jan, 2000 1 commit
    • hwloidl's avatar
      [project @ 2000-01-13 14:33:57 by hwloidl] · 1b28d4e1
      hwloidl authored
      Merged GUM-4-04 branch into the main trunk. In particular merged GUM and
      SMP code. Most of the GranSim code in GUM-4-04 still has to be carried over.
      1b28d4e1
  23. 12 Jan, 2000 1 commit
  24. 09 Nov, 1999 1 commit
    • simonmar's avatar
      [project @ 1999-11-09 15:46:49 by simonmar] · 30681e79
      simonmar authored
      A slew of SMP-related changes.
      
       - New locking scheme for thunks: we now check whether the thunk
         being entered is in our private allocation area, and if so
         we don't lock it.  Well, that's the upshot.  In practice it's
         a lot more fiddly than that.
      
       - I/O blocking is handled a bit more sanely now (but still not
         properly, methinks)
      
       - deadlock detection is back
      
       - remove old pre-SMP scheduler code
      
       - revamp the timing code.  We actually get reasonable-looking
         timing info for SMP programs now.
      
       - fix a bug in the garbage collector to do with IND_OLDGENs appearing
         on the mutable list of the old generation.
      
       - move BDescr() function from rts/BlockAlloc.h to includes/Block.h.
      
       - move struct generation and struct step into includes/StgStorage.h (sigh)
      
       - add UPD_IND_NOLOCK for updating with an indirection where locking
         the black hole is not required.
      30681e79
  25. 02 Nov, 1999 1 commit
    • simonmar's avatar
      [project @ 1999-11-02 15:05:38 by simonmar] · f6692611
      simonmar authored
      This commit adds in the current state of our SMP support.  Notably,
      this allows the new way 's' to be built, providing support for running
      multiple Haskell threads simultaneously on top of any pthreads
      implementation, the idea being to take advantage of commodity SMP
      boxes.
      
      Don't expect to get much of a speedup yet; due to the excessive
      locking required to synchronise access to mutable heap objects, you'll
      see a slowdown in most cases, even on a UP machine.  The best I've
      seen is a 1.6-1.7 speedup on an example that did no locking (two
      optimised nfibs in parallel).
      
      	- new RTS -N flag specifies how many pthreads to start.
      
      	- new driver -smp flag, tells the driver to use way 's'.
      
      	- new compiler -fsmp option (not for user comsumption)
      	  tells the compiler not to generate direct jumps to
      	  thunk entry code.
      
      	- largely rewritten scheduler
      
      	- _ccall_GC is now done by handing back a "token" to the
      	  RTS before executing the ccall; it should now be possible
      	  to execute blocking ccalls in the current thread while
      	  allowing the RTS to continue running Haskell threads as
      	  normal.
      
      	- you can only call thread-safe C libraries from a way 's'
      	  build, of course.
      
      Pthread support is still incomplete, and weird things (including
      deadlocks) are likely to happen.
      f6692611
  26. 11 May, 1999 1 commit
    • keithw's avatar
      [project @ 1999-05-11 16:47:39 by keithw] · eb407ca1
      keithw authored
      (this is number 9 of 9 commits to be applied together)
      
        Usage verification changes / ticky-ticky changes:
      
        We want to verify that SingleEntry thunks are indeed entered at most
        once.  In order to do this, -ticky / -DTICKY_TICKY turns on eager
        blackholing.  We blackhole with new blackholes: SE_BLACKHOLE and
        SE_CAF_BLACKHOLE.  We will enter one of these if we attempt to enter
        a SingleEntry thunk twice.  Note that CAFs are dealt with in by
        codeGen, and ordinary thunks by the RTS.
      
        We also want to see how many times we enter each Updatable thunk.
        To this end, we have modified -ticky.  When -ticky is on, we update
        with a permanent indirection, and arrange that when we enter a
        permanent indirection we count the entry and then convert the
        indirection to a normal indirection.  This gives us a means of
        counting the number of thunks entered again after the first entry.
        Obviously this screws up profiling, and so you can't build a ticky
        and profiling compiler any more.
      
        Also a few other changes that didn't make it into the previous 8
        commits, but form a part of this set.
      eb407ca1
  27. 15 Mar, 1999 1 commit