1. 03 Aug, 2009 1 commit
  2. 02 Aug, 2009 1 commit
    • Simon Marlow's avatar
      RTS tidyup sweep, first phase · a2a67cd5
      Simon Marlow authored
      The first phase of this tidyup is focussed on the header files, and in
      particular making sure we are exposinng publicly exactly what we need
      to, and no more.
      
       - Rts.h now includes everything that the RTS exposes publicly,
         rather than a random subset of it.
      
       - Most of the public header files have moved into subdirectories, and
         many of them have been renamed.  But clients should not need to
         include any of the other headers directly, just #include the main
         public headers: Rts.h, HsFFI.h, RtsAPI.h.
      
       - All the headers needed for via-C compilation have moved into the
         stg subdirectory, which is self-contained.  Most of the headers for
         the rest of the RTS APIs have moved into the rts subdirectory.
      
       - I left MachDeps.h where it is, because it is so widely used in
         Haskell code.
       
       - I left a deprecated stub for RtsFlags.h in place.  The flag
         structures are now exposed by Rts.h.
      
       - Various internal APIs are no longer exposed by public header files.
      
       - Various bits of dead code and declarations have been removed
      
       - More gcc warnings are turned on, and the RTS code is more
         warning-clean.
      
       - More source files #include "PosixSource.h", and hence only use
         standard POSIX (1003.1c-1995) interfaces.
      
      There is a lot more tidying up still to do, this is just the first
      pass.  I also intend to standardise the names for external RTS APIs
      (e.g use the rts_ prefix consistently), and declare the internal APIs
      as hidden for shared libraries.
      a2a67cd5
  3. 18 Nov, 2008 1 commit
    • Simon Marlow's avatar
      Add optional eager black-holing, with new flag -feager-blackholing · d600bf7a
      Simon Marlow authored
      Eager blackholing can improve parallel performance by reducing the
      chances that two threads perform the same computation.  However, it
      has a cost: one extra memory write per thunk entry.  
      
      To get the best results, any code which may be executed in parallel
      should be compiled with eager blackholing turned on.  But since
      there's a cost for sequential code, we make it optional and turn it on
      for the parallel package only.  It might be a good idea to compile
      applications (or modules) with parallel code in with
      -feager-blackholing.
      
      ToDo: document -feager-blackholing.
      d600bf7a
  4. 12 Sep, 2008 1 commit
  5. 17 Apr, 2008 1 commit
  6. 16 Apr, 2008 3 commits
  7. 12 Oct, 2007 1 commit
  8. 11 Oct, 2007 1 commit
    • Simon Marlow's avatar
      Add a proper write barrier for MVars · 1ed01a87
      Simon Marlow authored
      Previously MVars were always on the mutable list of the old
      generation, which meant every MVar was visited during every minor GC.
      With lots of MVars hanging around, this gets expensive.  We addressed
      this problem for MUT_VARs (aka IORefs) a while ago, the solution is to
      use a traditional GC write-barrier when the object is modified.  This
      patch does the same thing for MVars.
      
      TVars are still done the old way, they could probably benefit from the
      same treatment too.
      1ed01a87
  9. 03 Sep, 2007 1 commit
  10. 26 Aug, 2007 1 commit
  11. 08 Aug, 2007 1 commit
  12. 27 Jul, 2007 1 commit
    • Simon Marlow's avatar
      Pointer Tagging · 6015a94f
      Simon Marlow authored
        
      This patch implements pointer tagging as per our ICFP'07 paper "Faster
      laziness using dynamic pointer tagging".  It improves performance by
      10-15% for most workloads, including GHC itself.
      
      The original patches were by Alexey Rodriguez Yakushev
      <mrchebas@gmail.com>, with additions and improvements by me.  I've
      re-recorded the development as a single patch.
      
      The basic idea is this: we use the low 2 bits of a pointer to a heap
      object (3 bits on a 64-bit architecture) to encode some information
      about the object pointed to.  For a constructor, we encode the "tag"
      of the constructor (e.g. True vs. False), for a function closure its
      arity.  This enables some decisions to be made without dereferencing
      the pointer, which speeds up some common operations.  In particular it
      enables us to avoid costly indirect jumps in many cases.
      
      More information in the commentary:
      
      http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging
      6015a94f
  13. 13 Jun, 2007 1 commit
    • Simon Marlow's avatar
      FIX #1418 (partially) · 23e5985c
      Simon Marlow authored
      When the con_desc field of an info table was made into a relative
      reference, this had the side effect of making the profiling fields
      (closure_desc and closure_type) also relative, but only when compiling
      via C, and the heap profiler was still treating them as absolute,
      leading to crashes when profiling with -hd or -hy.
      
      This patch fixes up the story to be consistent: these fields really
      should be relative (otherwise we couldn't make shared versions of the
      profiling libraries), so I've made them relative and fixed up the RTS
      to know about this.
      23e5985c
  14. 28 Feb, 2007 1 commit
  15. 15 Nov, 2006 1 commit
  16. 10 Nov, 2006 1 commit
  17. 24 Oct, 2006 1 commit
    • Simon Marlow's avatar
      Split GC.c, and move storage manager into sm/ directory · ab0e778c
      Simon Marlow authored
      In preparation for parallel GC, split up the monolithic GC.c file into
      smaller parts.  Also in this patch (and difficult to separate,
      unfortunatley):
        
        - Don't include Stable.h in Rts.h, instead just include it where
          necessary.
        
        - consistently use STATIC_INLINE in source files, and INLINE_HEADER
          in header files.  STATIC_INLINE is now turned off when DEBUG is on,
          to make debugging easier.
        
        - The GC no longer takes the get_roots function as an argument.
          We weren't making use of this generalisation.
      ab0e778c
  18. 07 Oct, 2006 1 commit
  19. 07 Sep, 2006 1 commit
    • Simon Marlow's avatar
      Remove CONSTR_CHARLIKE and CONSTR_INTLIKE closure types · a0be7e7c
      Simon Marlow authored
      These closure types aren't used/needed, as far as I can tell.  The
      commoning up of Chars/Ints happens by comparing info pointers, and
      the info table for a dynamic C#/I# is CONSTR_0_1.  The RTS seemed
      a little confused about whether CONSTR_CHARLIKE/CONSTR_INTLIKE were
      supposed to be static or dynamic closures, too.
      a0be7e7c
  20. 07 Apr, 2006 1 commit
    • Simon Marlow's avatar
      Reorganisation of the source tree · 0065d5ab
      Simon Marlow authored
      Most of the other users of the fptools build system have migrated to
      Cabal, and with the move to darcs we can now flatten the source tree
      without losing history, so here goes.
      
      The main change is that the ghc/ subdir is gone, and most of what it
      contained is now at the top level.  The build system now makes no
      pretense at being multi-project, it is just the GHC build system.
      
      No doubt this will break many things, and there will be a period of
      instability while we fix the dependencies.  A straightforward build
      should work, but I haven't yet fixed binary/source distributions.
      Changes to the Building Guide will follow, too.
      0065d5ab
  21. 21 Mar, 2006 2 commits
  22. 08 Feb, 2006 1 commit
    • Simon Marlow's avatar
      make the smp way RTS-only, normal libraries now work with -smp · beb5737b
      Simon Marlow authored
      We had to bite the bullet here and add an extra word to every thunk,
      to enable running ordinary libraries on SMP.  Otherwise, we would have
      needed to ship an extra set of libraries with GHC 6.6 in addition to
      the two sets we already ship (normal + profiled), and all Cabal
      packages would have to be compiled for SMP too.  We decided it best
      just to take the hit now, making SMP easily accessible to everyone in
      GHC 6.6.
      
      Incedentally, although this increases allocation by around 12% on
      average, the performance hit is around 5%, and much less if your inner
      loop doesn't use any laziness.
      beb5737b
  23. 17 Jan, 2006 2 commits
    • simonmar's avatar
      [project @ 2006-01-17 16:13:18 by simonmar] · 91b07216
      simonmar authored
      Improve the GC behaviour of IORefs (see Ticket #650).
      
      This is a small change to the way IORefs interact with the GC, which
      should improve GC performance for programs with plenty of IORefs.
      
      Previously we had a single closure type for mutable variables,
      MUT_VAR.  Mutable variables were *always* on the mutable list in older
      generations, and always traversed on every GC.
      
      Now, we have two closure types: MUT_VAR_CLEAN and MUT_VAR_DIRTY.  The
      latter is on the mutable list, but the former is not.  (NB. this
      differs from MUT_ARR_PTRS_CLEAN and MUT_ARR_PTRS_DIRTY, both of which
      are on the mutable list).  writeMutVar# now implements a write
      barrier, by calling dirty_MUT_VAR() in the runtime, that does the
      necessary modification of MUT_VAR_CLEAN into MUT_VAR_DIRY, and adding
      to the mutable list if necessary.
      
      This results in some pretty dramatic speedups for GHC itself.  I've
      just measureed a 30% overall speedup compiling a 31-module program
      (anna) with the default heap settings :-D
      91b07216
    • simonmar's avatar
      [project @ 2006-01-17 16:03:47 by simonmar] · da69fa9c
      simonmar authored
      Improve the GC behaviour of IOArrays/STArrays
      
      See Ticket #650
      
      This is a small change to the way mutable arrays interact with the GC,
      that can have a dramatic effect on performance, and make tricks with
      unsafeThaw/unsafeFreeze redundant.  Data.HashTable should be faster
      now (I haven't measured it yet).
      
      We now have two mutable array closure types, MUT_ARR_PTRS_CLEAN and
      MUT_ARR_PTRS_DIRTY.  Both are on the mutable list if the array is in
      an old generation.  writeArray# sets the type to MUT_ARR_PTRS_DIRTY.
      The garbage collector can set the type to MUT_ARR_PTRS_CLEAN if it
      finds that no element of the array points into a younger generation
      (discovering this required a small addition to evacuate(), but rough
      tests indicate that it doesn't measurably affect performance).
      
      NOTE: none of this affects unboxed arrays (IOUArray/STUArray), only
      boxed arrays (IOArray/STArray).
      
      We could go further and extend the DIRTY bit to be per-block rather
      than for the whole array, but for now this is an easy improvement.
      da69fa9c
  24. 26 Jul, 2005 2 commits
  25. 25 Jul, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-07-25 14:12:48 by simonmar] · e792bb84
      simonmar authored
      Remove the ForeignObj# type, and all its PrimOps.  The new efficient
      representation of ForeignPtr doesn't use ForeignObj# underneath, and
      there seems no need to keep it.
      e792bb84
  26. 13 Jun, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-06-13 12:29:48 by simonmar] · b07f3876
      simonmar authored
      Block allocator performance fix: instead of keeping the free list
      ordered, keep it doubly-linked, and introduce a new flag BF_FREE so we
      can tell when a block is free.  We can still coalesce blocks on the
      free list because block descriptors are kept consecutively in memory,
      so we can tell based on the BF_FREE flag whether to coalesce with the
      next higher/lower blocks when freeing a block.
      
      This (almost) make freeChain O(n) rather than O(n^2), and has been
      reported to help a lot when dealing with very large heaps.
      b07f3876
  27. 26 Apr, 2005 1 commit
  28. 22 Apr, 2005 2 commits
    • sof's avatar
      [project @ 2005-04-22 16:49:38 by sof] · 23e16cda
      sof authored
      resetStaticObjectForRetainerProfiling(): warning wibble
      23e16cda
    • simonmar's avatar
      [project @ 2005-04-22 09:32:39 by simonmar] · 0f3205e6
      simonmar authored
      SMP: the rest of the changes to support safe thunk entry & updates.  I
      thought the compiler changes were independent, but I ended up breaking
      the HEAD, so I'll have to commit the rest.  non-SMP compilation should
      not be affected.
      0f3205e6
  29. 05 Apr, 2005 2 commits
    • simonmar's avatar
      [project @ 2005-04-05 12:19:54 by simonmar] · 16214216
      simonmar authored
      Some multi-processor hackery, including
      
        - Don't hang blocked threads off BLACKHOLEs any more, instead keep
          them all on a separate queue which is checked periodically for
          threads to wake up.
      
          This is good because (a) we don't have to worry about locking the
          closure in SMP mode when we want to block on it, and (b) it means
          the standard update code doesn't need to wake up any threads or
          check for a BLACKHOLE_BQ, simplifying the update code.
      
          The downside is that if there are lots of threads blocked on
          BLACKHOLEs, we might have to do a lot of repeated list traversal.
          We don't expect this to be common, though.  conc023 goes slower
          with this change, but we expect most programs to benefit from the
          shorter update code.
      
        - Fixing up the Capability code to handle multiple capabilities (SMP
          mode), and related changes to get the SMP mode at least building.
      16214216
    • simonmar's avatar
      [project @ 2005-04-05 09:38:00 by simonmar] · 3f4fd743
      simonmar authored
      Main x86_64 hacking: we have a problem on this arch where binutils
      can't generate 64-bit relative relocations (R_X86_64_PC64), which many
      of our info-table fields are.  So far we've been hacking around it by
      putting everything in the text section, but I've decided to adopt
      another approach: we'll use explicit 32-bit offset fields on this
      platform instead.  This is safe in the default "small" memory model
      where all symbols are guaranteed to be in the lower 2Gb of the address
      space.
      
      NCG changes coming; mangler changes are probably required too.
      3f4fd743
  30. 27 Mar, 2005 1 commit
    • panne's avatar
      [project @ 2005-03-27 13:41:13 by panne] · 03dc2dd3
      panne authored
      * Some preprocessors don't like the C99/C++ '//' comments after a
        directive, so use '/* */' instead. For consistency, a lot of '//' in
        the include files were converted, too.
      
      * UnDOSified libraries/base/cbits/runProcess.c.
      
      * My favourite sport: Killed $Id$s.
      03dc2dd3
  31. 24 Feb, 2005 1 commit
  32. 10 Feb, 2005 1 commit
    • simonmar's avatar
      [project @ 2005-02-10 13:01:52 by simonmar] · e7c3f957
      simonmar authored
      GC changes: instead of threading old-generation mutable lists
      through objects in the heap, keep it in a separate flat array.
      
      This has some advantages:
      
        - the IND_OLDGEN object is now only 2 words, so the minimum
          size of a THUNK is now 2 words instead of 3.  This saves
          some amount of allocation (about 2% on average according to
          my measurements), and is more friendly to the cache by
          squashing objects together more.
      
        - keeping the mutable list separate from the IND object
          will be necessary for our multiprocessor implementation.
      
        - removing the mut_link field makes the layout of some objects
          more uniform, leading to less complexity and special cases.
      
        - I also unified the two mutable lists (mut_once_list and mut_list)
          into a single mutable list, which lead to more simplifications
          in the GC.
      e7c3f957
  33. 07 Oct, 2004 1 commit
    • wolfgang's avatar
      [project @ 2004-10-07 15:54:03 by wolfgang] · b4d045ae
      wolfgang authored
      Position Independent Code and Dynamic Linking Support, Part 1
      
      This commit allows generation of position independent code (PIC) that fully supports dynamic linking on Mac OS X and PowerPC Linux.
      Other platforms are not yet supported, and there is no support for actually linking or using dynamic libraries - so if you use the -fPIC or -dynamic code generation flags, you have to type your (platform-specific) linker command lines yourself.
      
      
      nativeGen/PositionIndependentCode.hs:
      New file. Look here for some more comments on how this works.
      
      cmm/CLabel.hs:
      Add support for DynamicLinkerLabels and PIC base labels - for use inside the NCG.
      needsCDecl: Case alternative labels now need C decls, see the codeGen/CgInfoTbls.hs below for details
      
      cmm/Cmm.hs:
      Add CmmPicBaseReg (used in NCG),
      and CmmLabelDiffOff (used in NCG and for offsets in info tables)
      
      cmm/CmmParse.y:
      support offsets in info tables
      
      cmm/PprC.hs:
      support CmmLabelDiffOff
      Case alternative labels now need C decls (see the codeGen/CgInfoTbls.hs for details), so we need to pprDataExterns for info tables.
      
      cmm/PprCmm.hs:
      support CmmLabelDiffOff
      
      codeGen/CgInfoTbls.hs:
      no longer store absolute addresses in info tables, instead, we store offsets.
      Also, for vectored return points, emit the alternatives _after_ the vector table. This is to work around a limitation in Apple's as, which refuses to handle label differences where one label is at the end of a section. Emitting alternatives after vector info tables makes sure this never happens in GHC generated code. Case alternatives now require prototypes in hc code, though (see changes in PprC.hs, CLabel.hs).
      
      main/CmdLineOpts.lhs:
      Add a new option, -fPIC.
      
      main/DriverFlags.hs:
      Pass the correct options for PIC to gcc, depending on the platform. Only for powerpc for now.
      
      nativeGen/AsmCodeGen.hs:
      Many changes...
      Mac OS X-specific management of import stubs is no longer, it's now part of a general mechanism to handle such things for all platforms that need it (Darwin [both ppc and x86], Linux on ppc, and some platforms we don't support).
      Move cmmToCmm into its own monad which can accumulate a list of imported symbols. Make it call cmmMakeDynamicReference at the right places.
      
      nativeGen/MachCodeGen.hs:
      nativeGen/MachInstrs.hs:
      nativeGen/MachRegs.lhs:
      nativeGen/PprMach.hs:
      nativeGen/RegAllocInfo.hs:
      Too many changes to enumerate here, PowerPC specific.
      
      nativeGen/NCGMonad.hs:
      NatM still tracks imported symbols, as more labels can be created during code generation (float literals, jump tables; on some platforms all data access has to go through the dynamic linking mechanism).
      
      driver/mangler/ghc-asm.lprl:
      Mangle absolute addresses in info tables to offsets.
      Correctly pass through GCC-generated PIC for Mac OS X and powerpc linux.
      
      includes/Cmm.h:
      includes/InfoTables.h:
      includes/Storage.h:
      includes/mkDerivedConstants.c:
      rts/GC.c:
      rts/GCCompact.c:
      rts/HeapStackCheck.cmm:
      rts/Printer.c:
      rts/RetainerProfile.c:
      rts/Sanity.c:
      Adapt to the fact that info tables now contain offsets.
      
      rts/Linker.c:
      Mac-specific: change machoInitSymbolsWithoutUnderscore to support PIC.
      b4d045ae