1. 19 Jan, 2014 1 commit
  2. 25 Oct, 2013 1 commit
    • Simon Marlow's avatar
      Make integer overflow less likely to happen (#7762) · 36b042fb
      Simon Marlow authored
      The particular problematic code in #7762 was this:
      
                  nat newSize = size - n;
                  char *freeAddr = MBLOCK_ROUND_DOWN(bd->start);
                  freeAddr += newSize * MBLOCK_SIZE;
                              ^^^^^^^^^^^^^^^^^^^^^^  OVERFLOW!!!
      
      For good measure, I'm going to fix the bug twice.  This patch fixes
      the class of bugs of this kind, by making sure that any expressions
      involving BLOCK_SIZE or MBLOCK_SIZE are promoted to unsigned long.  In
      a separate patch, I'll fix a bunch of individual instances (including
      the one above).
      36b042fb
  3. 13 Nov, 2012 1 commit
  4. 07 Sep, 2012 1 commit
  5. 02 Feb, 2011 1 commit
    • Simon Marlow's avatar
      A small GC optimisation · bef3da1e
      Simon Marlow authored
      Store the *number* of the destination generation in the Bdescr struct,
      so that in evacuate() we don't have to deref gen to get it.
      This is another improvement ported over from my GC branch.
      bef3da1e
  6. 10 Dec, 2010 1 commit
  7. 18 Jul, 2010 1 commit
    • marcotmarcot's avatar
      Don't check for swept blocks in -DS. · eff182c3
      marcotmarcot authored
      The checkHeap function assumed the allocated part of the block contained only
      alive objects and slops.  This was not true for blocks that are collected using
      mark sweep.  The code in this patch skip the test for this kind of blocks.
      eff182c3
  8. 01 Apr, 2010 1 commit
    • Simon Marlow's avatar
      Change the representation of the MVar blocked queue · f4692220
      Simon Marlow authored
      The list of threads blocked on an MVar is now represented as a list of
      separately allocated objects rather than being linked through the TSOs
      themselves.  This lets us remove a TSO from the list in O(1) time
      rather than O(n) time, by marking the list object.  Removing this
      linear component fixes some pathalogical performance cases where many
      threads were blocked on an MVar and became unreachable simultaneously
      (nofib/smp/threads007), or when sending an asynchronous exception to a
      TSO in a long list of thread blocked on an MVar.
      
      MVar performance has actually improved by a few percent as a result of
      this change, slightly to my surprise.
      
      This is the final cleanup in the sequence, which let me remove the old
      way of waking up threads (unblockOne(), MSG_WAKEUP) in favour of the
      new way (tryWakeupThread and MSG_TRY_WAKEUP, which is idempotent).  It
      is now the case that only the Capability that owns a TSO may modify
      its state (well, almost), and this simplifies various things.  More of
      the RTS is based on message-passing between Capabilities now.
      f4692220
  9. 03 Dec, 2009 1 commit
    • Simon Marlow's avatar
      GC refactoring, remove "steps" · 214b3663
      Simon Marlow authored
      The GC had a two-level structure, G generations each of T steps.
      Steps are for aging within a generation, mostly to avoid premature
      promotion.  
      
      Measurements show that more than 2 steps is almost never worthwhile,
      and 1 step is usually worse than 2.  In theory fractional steps are
      possible, so the ideal number of steps is somewhere between 1 and 3.
      GHC's default has always been 2.
      
      We can implement 2 steps quite straightforwardly by having each block
      point to the generation to which objects in that block should be
      promoted, so blocks in the nursery point to generation 0, and blocks
      in gen 0 point to gen 1, and so on.
      
      This commit removes the explicit step structures, merging generations
      with steps, thus simplifying a lot of code.  Performance is
      unaffected.  The tunable number of steps is now gone, although it may
      be replaced in the future by a way to tune the aging in generation 0.
      214b3663
  10. 29 Nov, 2009 1 commit
    • Simon Marlow's avatar
      Store a destination step in the block descriptor · f9d15f9f
      Simon Marlow authored
      At the moment, this just saves a memory reference in the GC inner loop
      (worth a percent or two of GC time).  Later, it will hopefully let me
      experiment with partial steps, and simplifying the generation/step
      infrastructure.
      f9d15f9f
  11. 02 Aug, 2009 1 commit
    • Simon Marlow's avatar
      RTS tidyup sweep, first phase · a2a67cd5
      Simon Marlow authored
      The first phase of this tidyup is focussed on the header files, and in
      particular making sure we are exposinng publicly exactly what we need
      to, and no more.
      
       - Rts.h now includes everything that the RTS exposes publicly,
         rather than a random subset of it.
      
       - Most of the public header files have moved into subdirectories, and
         many of them have been renamed.  But clients should not need to
         include any of the other headers directly, just #include the main
         public headers: Rts.h, HsFFI.h, RtsAPI.h.
      
       - All the headers needed for via-C compilation have moved into the
         stg subdirectory, which is self-contained.  Most of the headers for
         the rest of the RTS APIs have moved into the rts subdirectory.
      
       - I left MachDeps.h where it is, because it is so widely used in
         Haskell code.
       
       - I left a deprecated stub for RtsFlags.h in place.  The flag
         structures are now exposed by Rts.h.
      
       - Various internal APIs are no longer exposed by public header files.
      
       - Various bits of dead code and declarations have been removed
      
       - More gcc warnings are turned on, and the RTS code is more
         warning-clean.
      
       - More source files #include "PosixSource.h", and hence only use
         standard POSIX (1003.1c-1995) interfaces.
      
      There is a lot more tidying up still to do, this is just the first
      pass.  I also intend to standardise the names for external RTS APIs
      (e.g use the rts_ prefix consistently), and declare the internal APIs
      as hidden for shared libraries.
      a2a67cd5
  12. 29 May, 2009 1 commit
  13. 09 Sep, 2008 1 commit
  14. 09 Jun, 2008 1 commit
  15. 16 Apr, 2008 1 commit
  16. 28 Feb, 2008 1 commit
  17. 20 Feb, 2008 1 commit
  18. 14 Dec, 2006 1 commit
    • Simon Marlow's avatar
      Rework the block allocator · 485b8d1a
      Simon Marlow authored
      The main goal here is to reduce fragmentation, which turns out to be
      the case of #743.  While I was here I found some opportunities to
      improve performance too.  The code is rather more complex, but it also
      contains a long comment describing the strategy, so please take a look
      at that for the details.
      485b8d1a
  19. 30 May, 2006 1 commit
    • Simon Marlow's avatar
      replace stgMallocBytesRWX() with our own allocator · e3c55aeb
      Simon Marlow authored
      See bug #738
      
      Allocating executable memory is getting more difficult these days.  In
      particular, the default SELinux policy on Fedora Core 5 disallows
      making the heap (i.e. malloc()'d memory) executable, although it does
      apparently allow mmap()'ing anonymous executable memory by default.
      
      Previously, stgMallocBytesRWX() used malloc() underneath, and then
      tried to make the page holding the memory executable.  This was rather
      hacky and fails with Fedora Core 5.  
      
      This patch adds a mini-allocator for executable memory, based on the
      block allocator.  We grab page-sized blocks and make them executable,
      then allocate small objects from the page.  There's a simple free
      function, that will free whole pages back to the system when they are
      empty.
      e3c55aeb
  20. 07 Apr, 2006 1 commit
    • Simon Marlow's avatar
      Reorganisation of the source tree · 0065d5ab
      Simon Marlow authored
      Most of the other users of the fptools build system have migrated to
      Cabal, and with the move to darcs we can now flatten the source tree
      without losing history, so here goes.
      
      The main change is that the ghc/ subdir is gone, and most of what it
      contained is now at the top level.  The build system now makes no
      pretense at being multi-project, it is just the GHC build system.
      
      No doubt this will break many things, and there will be a period of
      instability while we fix the dependencies.  A straightforward build
      should work, but I haven't yet fixed binary/source distributions.
      Changes to the Building Guide will follow, too.
      0065d5ab
  21. 21 Oct, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-10-21 14:02:17 by simonmar] · 03a9ff01
      simonmar authored
      Big re-hash of the threaded/SMP runtime
      
      This is a significant reworking of the threaded and SMP parts of
      the runtime.  There are two overall goals here:
      
        - To push down the scheduler lock, reducing contention and allowing
          more parts of the system to run without locks.  In particular,
          the scheduler does not require a lock any more in the common case.
      
        - To improve affinity, so that running Haskell threads stick to the
          same OS threads as much as possible.
      
      At this point we have the basic structure working, but there are some
      pieces missing.  I believe it's reasonably stable - the important
      parts of the testsuite pass in all the (normal,threaded,SMP) ways.
      
      In more detail:
      
        - Each capability now has a run queue, instead of one global run
          queue.  The Capability and Task APIs have been completely
          rewritten; see Capability.h and Task.h for the details.
      
        - Each capability has its own pool of worker Tasks.  Hence, Haskell
          threads on a Capability's run queue will run on the same worker
          Task(s).  As long as the OS is doing something reasonable, this
          should mean they usually stick to the same CPU.  Another way to
          look at this is that we're assuming each Capability is associated
          with a fixed CPU.
      
        - What used to be StgMainThread is now part of the Task structure.
          Every OS thread in the runtime has an associated Task, and it
          can ask for its current Task at any time with myTask().
      
        - removed RTS_SUPPORTS_THREADS symbol, use THREADED_RTS instead
          (it is now defined for SMP too).
      
        - The RtsAPI has had to change; we must explicitly pass a Capability
          around now.  The previous interface assumed some global state.
          SchedAPI has also changed a lot.
      
        - The OSThreads API now supports thread-local storage, used to
          implement myTask(), although it could be done more efficiently
          using gcc's __thread extension when available.
      
        - I've moved some POSIX-specific stuff into the posix subdirectory,
          moving in the direction of separating out platform-specific
          implementations.
      
        - lots of lock-debugging and assertions in the runtime.  In particular,
          when DEBUG is on, we catch multiple ACQUIRE_LOCK()s, and there is
          also an ASSERT_LOCK_HELD() call.
      
      What's missing so far:
      
        - I have almost certainly broken the Win32 build, will fix soon.
      
        - any kind of thread migration or load balancing.  This is high up
          the agenda, though.
      
        - various performance tweaks to do
      
        - throwTo and forkProcess still do not work in SMP mode
      03a9ff01
  22. 13 Jun, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-06-13 12:29:48 by simonmar] · b07f3876
      simonmar authored
      Block allocator performance fix: instead of keeping the free list
      ordered, keep it doubly-linked, and introduce a new flag BF_FREE so we
      can tell when a block is free.  We can still coalesce blocks on the
      free list because block descriptors are kept consecutively in memory,
      so we can tell based on the BF_FREE flag whether to coalesce with the
      next higher/lower blocks when freeing a block.
      
      This (almost) make freeChain O(n) rather than O(n^2), and has been
      reported to help a lot when dealing with very large heaps.
      b07f3876
  23. 12 Apr, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-04-12 09:04:23 by simonmar] · 693550d9
      simonmar authored
      Per-task nurseries for SMP.  This was kind-of implemented before, but
      it's much cleaner now.  There is now one *step* per capability, so we
      have somewhere to hang the block count.  So for SMP, there are simply
      multiple instances of generation 0 step 0.  The rNursery entry in the
      register table now points to the step rather than the head block of
      the nurersy.
      693550d9
  24. 27 Mar, 2005 1 commit
    • panne's avatar
      [project @ 2005-03-27 13:41:13 by panne] · 03dc2dd3
      panne authored
      * Some preprocessors don't like the C99/C++ '//' comments after a
        directive, so use '/* */' instead. For consistency, a lot of '//' in
        the include files were converted, too.
      
      * UnDOSified libraries/base/cbits/runProcess.c.
      
      * My favourite sport: Killed $Id$s.
      03dc2dd3
  25. 10 Feb, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-02-10 13:01:52 by simonmar] · e7c3f957
      simonmar authored
      GC changes: instead of threading old-generation mutable lists
      through objects in the heap, keep it in a separate flat array.
      
      This has some advantages:
      
        - the IND_OLDGEN object is now only 2 words, so the minimum
          size of a THUNK is now 2 words instead of 3.  This saves
          some amount of allocation (about 2% on average according to
          my measurements), and is more friendly to the cache by
          squashing objects together more.
      
        - keeping the mutable list separate from the IND object
          will be necessary for our multiprocessor implementation.
      
        - removing the mut_link field makes the layout of some objects
          more uniform, leading to less complexity and special cases.
      
        - I also unified the two mutable lists (mut_once_list and mut_list)
          into a single mutable list, which lead to more simplifications
          in the GC.
      e7c3f957
  26. 13 Aug, 2004 1 commit
  27. 26 Nov, 2003 1 commit
  28. 12 Nov, 2003 1 commit
    • sof's avatar
      [project @ 2003-11-12 17:27:00 by sof] · a814590c
      sof authored
      Tidy up a couple of unportable coding issues:
      
      - conditionally use empty structs.
      - use GNU attributes only if supported.
      - 'long long' usage
      - use of 'inline' in declarations and definitions.
      
      Upshot of these changes is that MSVC is now capable of compiling
      the non-.hc portions of the RTS.
      a814590c
  29. 23 Sep, 2003 1 commit
    • simonmar's avatar
      [project @ 2003-09-23 15:38:35 by simonmar] · 76ebf3dc
      simonmar authored
      Add a BF_PINNED block flag, and attach it to blocks containing pinned
      objects (in addition to the usual BF_LARGE).
      
      In heapCensus, we now ignore blocks containing pinned objects, because
      they might contain gaps, and in any case it isn't clear that we want
      to include the whole block in a heap census, because much of it might
      well be dead.  Ignoring it isn't right either, though, so this patch
      just fixes the crash and leaves a ToDo.
      76ebf3dc
  30. 28 Mar, 2003 1 commit
  31. 25 Mar, 2003 1 commit
  32. 11 Dec, 2002 1 commit
    • simonmar's avatar
      [project @ 2002-12-11 15:36:20 by simonmar] · 0bffc410
      simonmar authored
      Merge the eval-apply-branch on to the HEAD
      ------------------------------------------
      
      This is a change to GHC's evaluation model in order to ultimately make
      GHC more portable and to reduce complexity in some areas.
      
      At some point we'll update the commentary to describe the new state of
      the RTS.  Pending that, the highlights of this change are:
      
        - No more Su.  The Su register is gone, update frames are one
          word smaller.
      
        - Slow-entry points and arg checks are gone.  Unknown function calls
          are handled by automatically-generated RTS entry points (AutoApply.hc,
          generated by the program in utils/genapply).
      
        - The stack layout is stricter: there are no "pending arguments" on
          the stack any more, the stack is always strictly a sequence of
          stack frames.
      
          This means that there's no need for LOOKS_LIKE_GHC_INFO() or
          LOOKS_LIKE_STATIC_CLOSURE() any more, and GHC doesn't need to know
          how to find the boundary between the text and data segments (BIG WIN!).
      
        - A couple of nasty hacks in the mangler caused by the neet to
          identify closure ptrs vs. info tables have gone away.
      
        - Info tables are a bit more complicated.  See InfoTables.h for the
          details.
      
        - As a side effect, GHCi can now deal with polymorphic seq.  Some bugs
          in GHCi which affected primitives and unboxed tuples are now
          fixed.
      
        - Binary sizes are reduced by about 7% on x86.  Performance is roughly
          similar, some programs get faster while some get slower.  I've seen
          GHCi perform worse on some examples, but haven't investigated
          further yet (GHCi performance *should* be about the same or better
          in theory).
      
        - Internally the code generator is rather better organised.  I've moved
          info-table generation from the NCG into the main codeGen where it is
          shared with the C back-end; info tables are now emitted as arrays
          of words in both back-ends.  The NCG is one step closer to being able
          to support profiling.
      
      This has all been fairly thoroughly tested, but no doubt I've messed
      up the commit in some way.
      0bffc410
  33. 03 Oct, 2001 1 commit
    • simonmar's avatar
      [project @ 2001-10-03 13:57:42 by simonmar] · b4623557
      simonmar authored
      Tidy up ghc/includes/Constants and related things.
      
      Now all the constants that the compiler needs to know, such as header
      size, update frame size, info table size and so on are generated
      automatically into a header file, DeriviedConstants.h, by a small C
      program in the same way as NativeDefs.h.  The C code in the RTS is
      expected to use sizeof() directly (it already does).
      
      Also tidied up the constants in MachDeps.h - all the constants
      representing the sizes of various types are named SIZEOF_<foo>, to
      match the constants defined in config.h.  PrelStorable.lhs now doesn't
      contain any special knowledge about GHC's conventions as regards the
      size of certain types, this is all in MachDeps.h.
      b4623557
  34. 23 Jul, 2001 2 commits
    • simonmar's avatar
      [project @ 2001-07-23 17:23:19 by simonmar] · dfd7d6d0
      simonmar authored
      Add a compacting garbage collector.
      
      It isn't enabled by default, as there are still a couple of problems:
      there's a fallback case I haven't implemented yet which means it will
      occasionally bomb out, and speed-wise it's quite a bit slower than the
      copying collector (about 1.8x slower).
      
      Until I can make it go faster, it'll only be useful when you're
      actually running low on real memory.
      
      '+RTS -c' to enable it.
      
      Oh, and I cleaned up a few things in the RTS while I was there, and
      fixed one or two possibly real bugs in the existing GC.
      dfd7d6d0
    • simonmar's avatar
      [project @ 2001-07-23 10:47:16 by simonmar] · 6f83fbc0
      simonmar authored
      Small changes to improve GC performance slightly:
      
        - store the generation *number* in the block descriptor rather
          than a pointer to the generation structure, since the most
          common operation is to pull out the generation number, and
          it's one less indirection this way.
      
        - cache the generation number in the step structure too, which
          avoids an extra indirection in several places.
      6f83fbc0
  35. 05 Apr, 2000 1 commit
    • panne's avatar
      [project @ 2000-04-05 14:26:31 by panne] · a5841e53
      panne authored
      Changed a bunch of `#endif FOO' to `#endif /* FOO */', the former is
      not strictly ANSI (don't know if the latter is, but `gcc -Wall -ansi
      -pedantic' is silent then).
      a5841e53
  36. 09 Nov, 1999 1 commit
    • simonmar's avatar
      [project @ 1999-11-09 15:46:49 by simonmar] · 30681e79
      simonmar authored
      A slew of SMP-related changes.
      
       - New locking scheme for thunks: we now check whether the thunk
         being entered is in our private allocation area, and if so
         we don't lock it.  Well, that's the upshot.  In practice it's
         a lot more fiddly than that.
      
       - I/O blocking is handled a bit more sanely now (but still not
         properly, methinks)
      
       - deadlock detection is back
      
       - remove old pre-SMP scheduler code
      
       - revamp the timing code.  We actually get reasonable-looking
         timing info for SMP programs now.
      
       - fix a bug in the garbage collector to do with IND_OLDGENs appearing
         on the mutable list of the old generation.
      
       - move BDescr() function from rts/BlockAlloc.h to includes/Block.h.
      
       - move struct generation and struct step into includes/StgStorage.h (sigh)
      
       - add UPD_IND_NOLOCK for updating with an indirection where locking
         the black hole is not required.
      30681e79
  37. 02 Mar, 1999 1 commit
  38. 05 Feb, 1999 1 commit
  39. 13 Jan, 1999 1 commit
    • simonm's avatar
      [project @ 1999-01-13 17:25:37 by simonm] · 4391e44f
      simonm authored
      Added a generational garbage collector.
      
      The collector is reliable but fairly untuned as yet.  It works with an
      arbitrary number of generations: use +RTS -G<gens> to change the
      number of generations used (default 2).
      
      Stats: +RTS -Sstderr is quite useful, but to really see what's going
      on compile the RTS with -DDEBUG and use +RTS -D32.
      
      ARR_PTRS removed - it wasn't used anywhere.
      
      Sanity checking improved:
      	- free blocks are now spammed when sanity checking is turned on
      	- a check for leaking blocks is performed after each GC.
      4391e44f