1. 10 Feb, 2005 1 commit
    • [project @ 2005-02-10 13:01:52 by simonmar] · e7c3f957
      simonmar authored
      GC changes: instead of threading old-generation mutable lists
      through objects in the heap, keep each list in a separate flat array.
      
      This has some advantages:
      
        - the IND_OLDGEN object is now only 2 words, so the minimum
          size of a THUNK is now 2 words instead of 3.  This saves
          some amount of allocation (about 2% on average according to
          my measurements), and is more friendly to the cache by
          squashing objects together more.
      
        - keeping the mutable list separate from the IND object
          will be necessary for our multiprocessor implementation.
      
        - removing the mut_link field makes the layout of some objects
          more uniform, leading to less complexity and special cases.
      
        - I also unified the two mutable lists (mut_once_list and mut_list)
          into a single mutable list, which led to more simplifications
          in the GC.
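
The flat-array idea can be sketched as follows. This is a minimal illustration, not the RTS code: the names `MutList`, `record_mutable`, `reset_mutable_list` and the fixed capacity are all hypothetical. The point is that the object carries no link field; the generation owns an array of pointers instead.

```c
#include <stddef.h>

/* Hypothetical closure layout: note there is no mut_link field. */
typedef struct Closure {
    const void     *info;     /* stand-in for the info pointer */
    struct Closure *payload;  /* stand-in for the object's contents */
} Closure;

#define MUT_LIST_CAPACITY 1024  /* illustrative fixed size, an assumption */

/* Hypothetical flat mutable list belonging to one old generation. */
typedef struct {
    Closure *entries[MUT_LIST_CAPACITY];
    size_t   count;
} MutList;

/* Record an old-generation object that may point into a younger
 * generation. Returns 0 on success, -1 if the array is full. */
int record_mutable(MutList *ml, Closure *c)
{
    if (ml->count == MUT_LIST_CAPACITY) {
        return -1;
    }
    ml->entries[ml->count++] = c;
    return 0;
}

/* Empty the list after a collection has scanned it.
 * Returns the number of entries that were dropped. */
size_t reset_mutable_list(MutList *ml)
{
    size_t n = ml->count;
    ml->count = 0;
    return n;
}
```

Because the list lives outside the heap objects, adding an object to it does not touch the object itself, which is presumably part of why the layout suits a multiprocessor collector.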
  2. 07 Oct, 2004 1 commit
    • [project @ 2004-10-07 15:54:03 by wolfgang] · b4d045ae
      wolfgang authored
      Position Independent Code and Dynamic Linking Support, Part 1
      
      This commit allows generation of position independent code (PIC) that fully supports dynamic linking on Mac OS X and PowerPC Linux.
      Other platforms are not yet supported, and there is no support for actually linking or using dynamic libraries - so if you use the -fPIC or -dynamic code generation flags, you have to type your (platform-specific) linker command lines yourself.
      
      
      nativeGen/PositionIndependentCode.hs:
      New file. Look here for some more comments on how this works.
      
      cmm/CLabel.hs:
      Add support for DynamicLinkerLabels and PIC base labels - for use inside the NCG.
      needsCDecl: Case alternative labels now need C decls, see the codeGen/CgInfoTbls.hs below for details
      
      cmm/Cmm.hs:
      Add CmmPicBaseReg (used in NCG),
      and CmmLabelDiffOff (used in NCG and for offsets in info tables)
      
      cmm/CmmParse.y:
      support offsets in info tables
      
      cmm/PprC.hs:
      support CmmLabelDiffOff
      Case alternative labels now need C decls (see the codeGen/CgInfoTbls.hs for details), so we need to pprDataExterns for info tables.
      
      cmm/PprCmm.hs:
      support CmmLabelDiffOff
      
      codeGen/CgInfoTbls.hs:
      No longer store absolute addresses in info tables; instead, we store offsets.
      Also, for vectored return points, emit the alternatives _after_ the vector table. This is to work around a limitation in Apple's as, which refuses to handle label differences where one label is at the end of a section. Emitting alternatives after vector info tables makes sure this never happens in GHC generated code. Case alternatives now require prototypes in hc code, though (see changes in PprC.hs, CLabel.hs).
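
A minimal sketch of what "offsets instead of absolute addresses" means in practice. The struct layout, the field name `srt_offset`, and the choice of the info table itself as the base of the offset are assumptions made for illustration; the commit only states that info tables now hold offsets.

```c
#include <stdint.h>

/* Hypothetical info table holding a self-relative offset rather than
 * an absolute pointer. */
typedef struct {
    intptr_t srt_offset;  /* byte distance from this table to the target */
    uint32_t type;
} InfoTable;

/* Compute the value to store: the difference between two addresses,
 * which is what an assembler label difference would produce. */
intptr_t make_offset(const InfoTable *info, const void *target)
{
    return (intptr_t)((uintptr_t)target - (uintptr_t)info);
}

/* Recover the absolute address at run time by adding the stored offset
 * to the address of the table. */
const void *resolve_offset(const InfoTable *info)
{
    return (const void *)((uintptr_t)info + (uintptr_t)info->srt_offset);
}
```

Since the stored value does not depend on where the code is loaded, the table needs no relocation, which is the property position independent code relies on.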
      
      main/CmdLineOpts.lhs:
      Add a new option, -fPIC.
      
      main/DriverFlags.hs:
      Pass the correct options for PIC to gcc, depending on the platform. Only for powerpc for now.
      
      nativeGen/AsmCodeGen.hs:
      Many changes...
      Mac OS X-specific management of import stubs is gone; it is now part of a general mechanism to handle such things for all platforms that need it (Darwin [both ppc and x86], Linux on ppc, and some platforms we don't support).
      Move cmmToCmm into its own monad which can accumulate a list of imported symbols. Make it call cmmMakeDynamicReference at the right places.
      
      nativeGen/MachCodeGen.hs:
      nativeGen/MachInstrs.hs:
      nativeGen/MachRegs.lhs:
      nativeGen/PprMach.hs:
      nativeGen/RegAllocInfo.hs:
      Too many changes to enumerate here, PowerPC specific.
      
      nativeGen/NCGMonad.hs:
      NatM still tracks imported symbols, as more labels can be created during code generation (float literals, jump tables; on some platforms all data access has to go through the dynamic linking mechanism).
      
      driver/mangler/ghc-asm.lprl:
      Mangle absolute addresses in info tables to offsets.
      Correctly pass through GCC-generated PIC for Mac OS X and powerpc linux.
      
      includes/Cmm.h:
      includes/InfoTables.h:
      includes/Storage.h:
      includes/mkDerivedConstants.c:
      rts/GC.c:
      rts/GCCompact.c:
      rts/HeapStackCheck.cmm:
      rts/Printer.c:
      rts/RetainerProfile.c:
      rts/Sanity.c:
      Adapt to the fact that info tables now contain offsets.
      
      rts/Linker.c:
      Mac-specific: change machoInitSymbolsWithoutUnderscore to support PIC.
  3. 13 Aug, 2004 2 commits
  4. 12 Nov, 2003 1 commit
    • [project @ 2003-11-12 17:49:05 by sof] · 20593d1d
      sof authored
      Tweaks to have RTS (C) sources compile with MSVC. Apart from wibbles
      related to the handling of 'inline', changed Schedule.h:POP_RUN_QUEUE()
      not to use expression-level statement blocks.
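
For context, a GNU C statement expression has the form `({ ...; value; })`, which MSVC does not accept. The sketch below shows one portable alternative, delivering the result through an lvalue argument. The types and the macro name `POP_RUN_QUEUE_INTO` are hypothetical; the real macro in Schedule.h may look different.

```c
#include <stddef.h>

/* Hypothetical thread object and run queue. */
typedef struct TSO {
    struct TSO *link;
    int         id;
} TSO;

typedef struct {
    TSO *hd;
    TSO *tl;
} RunQueue;

/* Portable replacement for a statement expression: the popped thread
 * (or NULL if the queue is empty) is assigned to the lvalue pt. */
#define POP_RUN_QUEUE_INTO(q, pt)                \
    do {                                         \
        TSO *t_ = (q)->hd;                       \
        if (t_ != NULL) {                        \
            (q)->hd = t_->link;                  \
            t_->link = NULL;                     \
            if ((q)->hd == NULL) {               \
                (q)->tl = NULL;                  \
            }                                    \
        }                                        \
        (pt) = t_;                               \
    } while (0)
```

The `do { ... } while (0)` wrapper keeps the macro usable as a single statement, at the cost of no longer being usable inside an expression.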
  5. 22 Apr, 2003 1 commit
    • [project @ 2003-04-22 16:25:08 by simonmar] · 1da232fc
      simonmar authored
      Fix an obscure bug: the most general kind of heap check,
      HEAP_CHECK_GEN(), is supposed to save the contents of *every* register
      known to the STG machine (used in cases where we either can't figure
      out which ones are live, or doing so would be too much hassle).  The
      problem is that it wasn't saving the L1 register.
      
      A slight complication arose in that saving the L1 register pushed the
      size of the frame over the 16 words allowed for the size of the bitmap
      stored in the frame, so I changed the layout of the frame a bit.
      Describing all the registers using a single bitmap is overkill when
      only 8 of them can actually be pointers, so now the bitmap is only 8
      bits long and we always skip over a fixed number of non-ptr words to
      account for all the non-ptr regs.  This is all described in StgMacros.h.
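
A rough sketch of how such a frame could be walked. The real layout is the one described in StgMacros.h; here the order of the words, the value of `N_NONPTR_WORDS`, and the convention that a set bit marks a pointer are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define N_PTR_REGS     8  /* from the commit: only 8 registers can be pointers */
#define N_NONPTR_WORDS 4  /* illustrative value only, an assumption */

/* Copy the saved words that the 8-bit bitmap marks as pointers into out,
 * and report in *next the index of the first word after the saved
 * registers. The non-pointer words are skipped as a fixed-size block
 * without consulting the bitmap. Returns the number of pointers found. */
size_t collect_frame_ptrs(const uintptr_t *frame, uint8_t bitmap,
                          uintptr_t *out, size_t *next)
{
    size_t n = 0;
    for (size_t i = 0; i < N_PTR_REGS; i++) {
        if (bitmap & (1u << i)) {
            out[n++] = frame[i];
        }
    }
    *next = N_PTR_REGS + N_NONPTR_WORDS;
    return n;
}
```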
  6. 27 Mar, 2003 1 commit
    • [project @ 2003-03-27 13:54:31 by simonmar] · bf8b921f
      simonmar authored
      Two performance tweaks:
      
        - Use specialised indirections, which perform the right kind of
          return without needing to enter the object they point to.  This
          saves a small percentage of memory reads.
      
        - Tweak the update code to generate better code with gcc.  This
          saves a few instructions per update.
  7. 26 Mar, 2003 1 commit
  8. 24 Mar, 2003 1 commit
    • [project @ 2003-03-24 14:46:53 by simonmar] · b3f53081
      simonmar authored
      Fix some bugs in compacting GC.
      
      Bug 1: When threading the fields of an AP or PAP, we were grabbing the
      info table of the function without unthreading it first.
      
      Bug 2: eval_thunk_selector() might accidentally find itself in
      to-space when going through indirections in a compacted generation.
      We must check for this case and bale out if necessary.
      
      Bug 3: This is somewhat more nasty.  When we have an AP or PAP that
      points to a BCO, the layout info for the AP/PAP is in the BCO's
      instruction array, which is two objects deep from the AP/PAP itself.
      The trouble is, during compacting GC, we can only safely look one
      object deep from the current object, because pointers from objects any
      deeper might have been already updated to point to their final
      destinations.
      
      The solution is to put the arity and bitmap info for a BCO into the
      BCO object itself.  This means BCOs become variable-length, which is a
      slight annoyance, but it also means that looking up the arity/bitmap
      is quicker.  There is a slight reduction in complexity in the byte
      code generator due to not having to stuff the bitmap at the front of
      the instruction stream.
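
One way to picture a variable-length BCO that carries its own arity and bitmap is a C struct ending in a flexible array member. This is only a sketch: the field names, the field order, and the helpers `bco_sizeof` and `bco_new` are hypothetical and do not reproduce the RTS definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical BCO: arity and bitmap live in the object itself, so a
 * collector never has to follow the instrs pointer to find them. */
typedef struct {
    const void *instrs;        /* separate heap object: instruction array */
    const void *literals;      /* separate heap object: literals */
    uint32_t    arity;
    uint32_t    bitmap_words;  /* number of words in bitmap[] */
    uintptr_t   bitmap[];      /* flexible array member: variable length */
} BCO;

/* Size in bytes of a BCO with the given number of bitmap words. */
size_t bco_sizeof(uint32_t bitmap_words)
{
    return sizeof(BCO) + (size_t)bitmap_words * sizeof(uintptr_t);
}

/* Allocate a zeroed BCO; returns NULL if allocation fails. */
BCO *bco_new(uint32_t arity, uint32_t bitmap_words)
{
    BCO *bco = calloc(1, bco_sizeof(bitmap_words));
    if (bco != NULL) {
        bco->arity = arity;
        bco->bitmap_words = bitmap_words;
    }
    return bco;
}
```

With this shape, reading the arity or bitmap is a field access on the BCO, one object deep from an AP or PAP that points to it.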
  9. 21 Mar, 2003 1 commit
    • [project @ 2003-03-21 16:18:37 by sof] · 557bca73
      sof authored
      Friday morning code-wibbling:
      - made RetainerProfile.c:firstStack a 'static'
      - added RetainerProfile.c:retainerStackBlocks()
  10. 13 Dec, 2002 1 commit
  11. 11 Dec, 2002 1 commit
    • [project @ 2002-12-11 15:36:20 by simonmar] · 0bffc410
      simonmar authored
      Merge the eval-apply-branch on to the HEAD
      ------------------------------------------
      
      This is a change to GHC's evaluation model in order to ultimately make
      GHC more portable and to reduce complexity in some areas.
      
      At some point we'll update the commentary to describe the new state of
      the RTS.  Pending that, the highlights of this change are:
      
        - No more Su.  The Su register is gone, update frames are one
          word smaller.
      
        - Slow-entry points and arg checks are gone.  Unknown function calls
          are handled by automatically-generated RTS entry points (AutoApply.hc,
          generated by the program in utils/genapply).
      
        - The stack layout is stricter: there are no "pending arguments" on
          the stack any more, the stack is always strictly a sequence of
          stack frames.
      
          This means that there's no need for LOOKS_LIKE_GHC_INFO() or
          LOOKS_LIKE_STATIC_CLOSURE() any more, and GHC doesn't need to know
          how to find the boundary between the text and data segments (BIG WIN!).
      
        - A couple of nasty hacks in the mangler caused by the need to
          identify closure ptrs vs. info tables have gone away.
      
        - Info tables are a bit more complicated.  See InfoTables.h for the
          details.
      
        - As a side effect, GHCi can now deal with polymorphic seq.  Some bugs
          in GHCi which affected primitives and unboxed tuples are now
          fixed.
      
        - Binary sizes are reduced by about 7% on x86.  Performance is roughly
          similar, some programs get faster while some get slower.  I've seen
          GHCi perform worse on some examples, but haven't investigated
          further yet (GHCi performance *should* be about the same or better
          in theory).
      
        - Internally the code generator is rather better organised.  I've moved
          info-table generation from the NCG into the main codeGen where it is
          shared with the C back-end; info tables are now emitted as arrays
          of words in both back-ends.  The NCG is one step closer to being able
          to support profiling.
      
      This has all been fairly thoroughly tested, but no doubt I've messed
      up the commit in some way.
  12. 21 Oct, 2002 1 commit
    • [project @ 2002-10-21 11:38:53 by simonmar] · 2be44cb2
      simonmar authored
      Bite the bullet and generalise the central memory allocation scheme.
      Previously we tried to allocate memory starting from a fixed address,
      which was set for each architecture (0x5000000 was a common one), and
      to decide whether a particular address was in the heap or not we would
      do a simple comparison against this address.
      
      This doesn't work too well, because:
      
       - if we dynamically-load some objects above the boundary, the
         heap-allocated test becomes invalid
      
       - on Windows we have less control, and the heap might be
         split into multiple sections
      
       - it turns out that on some Linux kernels we don't get memory where
         we asked for it.  This might be a bug in those kernels, but it
         exposes the fragility of our allocation scheme.
      
      The solution is to bite the bullet and maintain a table mapping
      addresses to a value indicating whether that address is in the heap or
      not.  Since we normally allocate heap in chunks of 1Mb, the table is
      quite small: 4k on a 32-bit machine, using one byte for each 1Mb
      block.  Testing an address for heap residency now involves a memory
      access, but the table is normally cache-resident.  I didn't manage to
      measure any slowdown after making the change.
      
      On a 64-bit machine, we'll need to use a 2-level table; I haven't
      implemented that yet.
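
The table lookup can be sketched in a few lines. The names `mblock_map`, `mark_megablock` and `addr_in_heap` are hypothetical; the sizes follow the commit message (a 32-bit address space divided into 1Mb blocks gives 4096 one-byte entries, i.e. 4k).

```c
#include <stdint.h>

#define MBLOCK_SHIFT    20    /* 1Mb megablocks: 2^20 bytes */
#define MBLOCK_MAP_SIZE 4096  /* 4Gb address space / 1Mb per entry */

/* One byte per megablock: nonzero means the block belongs to the heap. */
static uint8_t mblock_map[MBLOCK_MAP_SIZE];

/* Mark the megablock containing addr as heap memory. */
void mark_megablock(uint32_t addr)
{
    mblock_map[addr >> MBLOCK_SHIFT] = 1;
}

/* Heap-residency test: a shift and a single memory access. */
int addr_in_heap(uint32_t addr)
{
    return mblock_map[addr >> MBLOCK_SHIFT] != 0;
}
```

Unlike a comparison against a fixed base address, this test stays correct when heap memory is scattered or when other objects are loaded above the old boundary.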
      
      Now we can generalise the procedure used to grab memory from the OS.
      In the general case, we allocate one megablock more than we need to,
      and trim off the slop around the allocation to leave an aligned chunk.
      The next time around, however, we try to allocate memory right after
      the last chunk allocated, on the grounds that it is aligned and
      probably free: if this doesn't work, we have to back off to the
      general mechanism (it seems to work most of the time).
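
The arithmetic behind "allocate one megablock more and trim the slop" can be sketched like this. The struct and function names are hypothetical, and the sketch only computes addresses; it does not call any OS allocation routine.

```c
#include <stdint.h>

#define MBLOCK_SIZE ((uintptr_t)1 << 20)  /* 1Mb */

/* Result of trimming an over-sized, unaligned region. */
typedef struct {
    uintptr_t aligned;      /* start of the megablock-aligned chunk */
    uintptr_t slop_before;  /* bytes to release in front of the chunk */
    uintptr_t slop_after;   /* bytes to release behind the chunk */
} Trim;

/* Given the start address raw of a region of (n + 1) megablocks, find the
 * aligned chunk of n megablocks inside it and the slop on either side. */
Trim trim_to_megablocks(uintptr_t raw, uintptr_t n)
{
    Trim t;
    /* Round raw up to the next megablock boundary. */
    t.aligned = (raw + MBLOCK_SIZE - 1) & ~(MBLOCK_SIZE - 1);
    t.slop_before = t.aligned - raw;
    /* End of the raw region minus end of the aligned chunk. */
    t.slop_after = (raw + (n + 1) * MBLOCK_SIZE)
                 - (t.aligned + n * MBLOCK_SIZE);
    return t;
}
```

The two slop sizes always add up to exactly one megablock, which is why one extra megablock is enough to guarantee an aligned chunk of the requested size.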
      
      This cleans up the Windows story too: is_heap_alloced() has gone, and
      we should be able to handle more than 256M of memory (or whatever the
      arbitrary limit was before).
      
      MERGE TO STABLE (after lots of testing)
  13. 26 Mar, 2002 2 commits
    • [project @ 2002-03-26 23:56:44 by sof] · b6beb173
      sof authored
      TEXT_BEFORE_HEAP & cygwin: same as for mingw
    • [project @ 2002-03-26 10:43:15 by simonmar] · 4b5f32d7
      simonmar authored
      A couple of cleanups to the previous change: we should test
      TABLES_NEXT_TO_CODE rather than USE_MINIINTERPRETER to enable the
      MacOSX "plan C", and use structure field selection rather than array
      indexing to get the entry code ptr from the info table.
  14. 21 Mar, 2002 1 commit
    • [project @ 2002-03-21 11:23:59 by sebc] · d182db3a
      sebc authored
      Implement Plan C, with correct code to detect the data and text
      sections for MacOS X.
      Also add a sanity check in initStorage, to make sure we are able to
      make the distinction between closures and infotables.
  15. 14 Feb, 2002 1 commit
  16. 04 Feb, 2002 1 commit
  17. 01 Feb, 2002 1 commit
    • [project @ 2002-02-01 10:50:35 by simonmar] · b8684d58
      simonmar authored
      When distinguishing between code & data pointers, rather than testing
      for membership of the text section, test for non-membership of any of
      the data sections.
      
      The reason for this change is that testing for membership of the text
      section was fragile:  we could only test whether a value was smaller
      than the end address, because there doesn't appear to be a portable
      way to find the beginning of the text section.  Indeed, the test
      breaks on very recent Linux kernels which mmap() memory below the
      program text.
      
      In fact, the reversed test may be faster because the expected common
      case is when the pointer is into the dynamic heap, and we eliminate
      this case immediately in the new test.  A quick test shows no
      measurable performance difference with the change.
      
      MERGE TO STABLE
  18. 25 Jan, 2002 1 commit
  19. 22 Nov, 2001 1 commit
    • [project @ 2001-11-22 14:25:11 by simonmar] · db61851c
      simonmar authored
      Retainer Profiling / Lag-drag-void profiling.
      
      This is mostly work by Sungwoo Park, who spent a summer internship at
      MSR Cambridge this year implementing these two types of heap profiling
      in GHC.
      
      Relative to Sungwoo's original work, I've made some improvements to
      the code:
      
         - it's now possible to apply constraints to retainer and LDV profiles
           in the same way as we do for other types of heap profile (eg.
           +RTS -hc{foo,bar} -hR -RTS gives you a retainer profile considering
           only closures with cost centres 'foo' and 'bar').
      
         - the heap-profile timer implementation is cleaned up.
      
         - heap profiling no longer has to be run in a two-space heap.
      
         - general cleanup of the code and application of the SDM C coding
           style guidelines.
      
      Profiling will be a little slower and require more space than before,
      mainly because closures have an extra header word to support either
      retainer profiling or LDV profiling (you can't do both at the same
      time).
      
      We've used the new profiling tools on GHC itself, with moderate
      success.  Fixes for some space leaks in GHC to follow...
  20. 08 Aug, 2001 1 commit
    • [project @ 2001-08-08 10:50:36 by simonmar] · 52c07834
      simonmar authored
      Had a brainwave on the way to work this morning, and realised that the
      garbage collector can handle "pinned objects" as long as they don't
      contain any pointers.
      
      This is absolutely ideal for doing temporary allocation in the FFI,
      because what we really want to do is allocate a pinned ByteArray and
      let the GC clean it up later.  So this set of changes adds the
      required framework.
      
      There are two new primops:
      
       newPinnedByteArray# :: Int# -> State# s -> (# State# s, MutByteArr# s #)
       byteArrayContents#  :: ByteArr# -> Addr#
      
      Obviously, byteArrayContents# is highly unsafe.
      
      Allocating a pinned ByteArr# isn't the default, because a pinned
      ByteArr# will hold an entire block (currently 4k) live until it is
      garbage collected (that doesn't mean each pinned ByteArr# requires
      4k of storage, just that if a block contains a single live pinned
      ByteArray, the whole block must be retained).
  21. 24 Jul, 2001 1 commit
  22. 23 Jul, 2001 2 commits
    • [project @ 2001-07-23 17:23:19 by simonmar] · dfd7d6d0
      simonmar authored
      Add a compacting garbage collector.
      
      It isn't enabled by default, as there are still a couple of problems:
      there's a fallback case I haven't implemented yet which means it will
      occasionally bomb out, and speed-wise it's quite a bit slower than the
      copying collector (about 1.8x slower).
      
      Until I can make it go faster, it'll only be useful when you're
      actually running low on real memory.
      
      '+RTS -c' to enable it.
      
      Oh, and I cleaned up a few things in the RTS while I was there, and
      fixed one or two possibly real bugs in the existing GC.
    • [project @ 2001-07-23 10:47:16 by simonmar] · 6f83fbc0
      simonmar authored
      Small changes to improve GC performance slightly:
      
        - store the generation *number* in the block descriptor rather
          than a pointer to the generation structure, since the most
          common operation is to pull out the generation number, and
          it's one less indirection this way.
      
        - cache the generation number in the step structure too, which
          avoids an extra indirection in several places.
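
The saving from the first change can be seen by comparing two hypothetical block descriptor layouts. The struct and field names below are illustrative only and are not the RTS definitions.

```c
#include <stdint.h>

/* Hypothetical generation structure. */
typedef struct {
    uint32_t no;  /* generation number */
} Generation;

/* Before: the descriptor points at the generation structure. */
typedef struct {
    void       *start;
    Generation *gen;
} BlockDescrOld;

/* After: the descriptor stores the generation number directly. */
typedef struct {
    void    *start;
    uint32_t gen_no;
} BlockDescrNew;

/* Two dependent loads: the gen pointer, then the number it points to. */
uint32_t gen_no_old(const BlockDescrOld *bd)
{
    return bd->gen->no;
}

/* A single load from the descriptor itself. */
uint32_t gen_no_new(const BlockDescrNew *bd)
{
    return bd->gen_no;
}
```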
  23. 03 May, 2001 1 commit
  24. 02 Mar, 2001 2 commits
    • [project @ 2001-03-02 16:15:53 by simonmar] · 435b1086
      simonmar authored
      ASSERT in updateWithIndirection() that we haven't already updated this
      object with an indirection, and fix two places in the RTS where this
      could happen.
      
      The problem only occurs when we're in a black-hole-style loop, and
      there are multiple update frames on the stack pointing to the same
      object (this is possible because of lazy black-holing).  Both stack
      squeezing and asynchronous exception raising walk down the stack and
      remove update frames, updating their contents with indirections.  If
      we don't protect against multiple updates, the mutable list in the old
      generation may get into a bogus state.
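
The guard described above might look roughly like this. The closure representation and the names `already_updated` and `update_with_indirection` are hypothetical stand-ins, and the standard `assert` is used in place of the RTS's ASSERT macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified closure: just a type tag and a target. */
typedef enum { THUNK, IND } ClosureType;

typedef struct Closure {
    ClosureType     type;
    struct Closure *indirectee;
} Closure;

/* True if the object has already been overwritten with an indirection. */
int already_updated(const Closure *c)
{
    return c->type == IND;
}

/* Overwrite updatee with an indirection to target, asserting first that
 * this is the only update the object will ever receive. */
void update_with_indirection(Closure *updatee, Closure *target)
{
    assert(!already_updated(updatee));
    updatee->type = IND;
    updatee->indirectee = target;
}
```

A caller that walks the stack and may meet several update frames for the same object would check `already_updated` before calling, so that the second and later frames are simply skipped.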
    • [project @ 2001-03-02 14:36:16 by simonmar] · ffaa2614
      simonmar authored
      Add some ASSERT()s so we can catch updates where updatee==target.
  25. 11 Feb, 2001 1 commit
    • [project @ 2001-02-11 17:51:07 by simonmar] · 6d35596c
      simonmar authored
      Bite the bullet and make GHCi support non-optional in the RTS.  GHC
      4.11 should be able to build GHCi without any additional tweaks now.
      
      - the Linker is split into two parts: LinkerBasic.c, containing the
        routines required by the rest of the RTS, and Linker.c, containing
        the linker proper, which is not referred to from the rest of the RTS.
        Only Linker.c requires -ldl, so programs which don't make use of the
        linker (everything except GHC, in other words) won't need -ldl.
  26. 09 Feb, 2001 2 commits
  27. 08 Feb, 2001 1 commit
  28. 29 Jan, 2001 1 commit
  29. 26 Jan, 2001 2 commits
  30. 24 Jan, 2001 1 commit
    • [project @ 2001-01-24 15:46:19 by simonmar] · 43b212f5
      simonmar authored
      Add a CAF list for GHCi.
      
      Retaining all looked-up symbols in a list in the interpreter was the
      Wrong Thing To Do, since we can't guarantee that the transitive
      closure of this list points to all the CAFs so far evaluated (the
      transitive closure gets smaller as reachable CAFs are evaluated).
      
      A Better Thing To Do is just to retain all the CAFs.  A refinement is
      to only retain all CAFs in dynamically linked code, which is what this
      patch implements.
  31. 09 Jan, 2001 1 commit
  32. 19 Dec, 2000 1 commit
  33. 11 Dec, 2000 1 commit
  34. 04 Dec, 2000 1 commit