1. 13 Jul, 2013 1 commit
  2. 10 Jul, 2013 1 commit
  3. 09 Jul, 2013 1 commit
  4. 15 Jun, 2013 1 commit
      Allow multiple C finalizers to be attached to a Weak# · d61c623e
      aljee@hyper.cx authored
      The commit replaces mkWeakForeignEnv# with addCFinalizerToWeak#.
      This new primop mutates an existing Weak# object and adds a new
      C finalizer to it.
      This change removes an invariant in MarkWeak.c, namely that the relative
      order of Weak# objects in the list needs to be preserved across GC. This
      makes it easier to split the list into per-generation structures.
      The patch also removes a race condition between two threads calling
      finalizeWeak# on the same WEAK object at the same time.
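      The idea of growing a finalizer list on an existing weak object can be
      sketched in plain C. This is only an illustration: every name below
      (weak_obj, weak_add_finalizer, weak_finalize) is hypothetical and none
      of it is RTS code. Detaching the list with one atomic exchange is an
      assumed way of ensuring that two concurrent finalize calls cannot both
      run the same finalizers; it is not a description of how the patch
      itself does it. The sketch also does not track a "dead" state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a weak object with a growable finalizer list.
   None of these names come from the GHC RTS. */
typedef void (*c_finalizer)(void *);

typedef struct fin_node {
    c_finalizer fn;
    void *arg;
    struct fin_node *next;
} fin_node;

typedef struct {
    _Atomic(fin_node *) finalizers;  /* NULL when empty or already finalized */
} weak_obj;

/* Attach one more finalizer to an existing weak object.
   Returns 0 on success, -1 if allocation fails. */
int weak_add_finalizer(weak_obj *w, c_finalizer fn, void *arg)
{
    assert(w != NULL && fn != NULL);
    fin_node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->fn = fn;
    n->arg = arg;
    n->next = atomic_load(&w->finalizers);
    /* Retry until the new head is installed; a failed exchange refreshes
       n->next with the current head. */
    while (!atomic_compare_exchange_weak(&w->finalizers, &n->next, n)) { }
    return 0;
}

/* Run and free all attached finalizers. The exchange hands the whole
   list to exactly one caller, so a concurrent second call runs nothing.
   Returns the number of finalizers that were run. */
int weak_finalize(weak_obj *w)
{
    assert(w != NULL);
    fin_node *n = atomic_exchange(&w->finalizers, NULL);
    int count = 0;
    while (n != NULL) {
        fin_node *next = n->next;
        n->fn(n->arg);
        free(n);
        n = next;
        count++;
    }
    return count;
}
```

      Since free() already has the c_finalizer signature, attaching it twice
      with two malloc'd buffers and then calling weak_finalize() runs both
      finalizers; a second weak_finalize() call finds an empty list.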
  5. 06 Jun, 2013 1 commit
      Implement cardinality analysis · 99d4e5b4
      Simon Peyton Jones authored
      This major patch implements the cardinality analysis described
      in our paper "Higher order cardinality analysis". It is joint
      work with Ilya Sergey and Dimitrios Vytiniotis.
      The basic idea is to augment the absence-analysis part of the demand
      analyser so that it can tell when something is used
           at most once
           some other way
      The "at most once" information is used
          a) to enable transformations, and
             in particular to identify one-shot lambdas
          b) to allow updates on thunks to be omitted.
      There are two new flags, mainly there so you can do performance
      comparisons:
          -fkill-absence   stops GHC doing absence analysis at all
          -fkill-one-shot  stops GHC spotting one-shot lambdas
                           and single-entry thunks
      The big changes are:
      * The Demand type is substantially refactored.  In particular
        the UseDmd is factored as follows
            data UseDmd
              = UCall Count UseDmd
              | UProd [MaybeUsed]
              | UHead
              | Used
            data MaybeUsed = Abs | Use Count UseDmd
            data Count = One | Many
        Notice that UCall recurses straight to UseDmd, whereas
        UProd goes via MaybeUsed.
        The "Count" embodies the "at most once" or "many" idea.
      * The demand analyser itself was refactored a lot
      * The previously ad-hoc stuff in the occurrence analyser for foldr and
        build goes away entirely.  Before, if we had build (\cn -> ...x... )
        then the "\cn" was hackily made one-shot (by spotting 'build' as
        special).  That's essential to allow x to be inlined.  Now the
        occurrence analyser propagates info gotten from 'build's strictness
        signature (so build isn't special); and that strictness sig is
        in turn derived entirely automatically.  Much nicer!
      * The ticky stuff is improved to count single-entry thunks separately.
      One shortcoming is that there is no DEBUG way to spot if an
      allegedly-single-entry thunk is actually entered more than once.  It
      would not be hard to generate a bit of code to check for this, and it
      would be reassuring.  But it's fiddly and I have not done it.
      Despite all this fuss, the performance numbers are rather under-whelming.
      See the paper for more discussion.
             nucleic2          -0.8%    -10.9%      0.10      0.10     +0.0%
               sphere          -0.7%     -1.5%      0.08      0.08     +0.0%
                  Min          -4.7%    -10.9%     -9.3%     -9.3%    -50.0%
                  Max          -0.4%     +0.5%     +2.2%     +2.3%     +7.4%
       Geometric Mean          -0.8%     -0.2%     -1.3%     -1.3%     -1.8%
      I don't quite know how much credence to place in the runtime changes,
      but movement seems generally in the right direction.
  6. 19 May, 2013 1 commit
  7. 17 May, 2013 1 commit
  8. 26 Apr, 2013 1 commit
  9. 22 Apr, 2013 1 commit
  10. 11 Apr, 2013 1 commit
  11. 29 Mar, 2013 1 commit
      ticky enhancements · 460abd75
      nfrisby authored
        * the new StgCmmArgRep module breaks a dependency cycle; I also
          untabified it, but made no real changes
        * updated the documentation in the wiki and changed the user guide to
          point there
        * moved the allocation enters for ticky and CCS to after the heap check
          * I left LDV where it was, which was before the heap check at least
            once, since I have no idea what it is
        * standardized all (active?) ticky alloc totals to bytes
        * in order to avoid double counting StgCmmLayout.adjustHpBackwards
          no longer bumps ALLOC_HEAP_ctr
        * I resurrected the SLOW_CALL counters
          * the new module StgCmmArgRep breaks cyclic dependency between
            Layout and Ticky (which the SLOW_CALL counters cause)
          * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL
        * added ALLOC_RTS_ctr and _tot ticky counters
          * eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap_*_info
          * resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and
          * added -ticky and -DTICKY_TICKY in ways.mk for debug ways
        * added a ticky counter for total LNE entries
        * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE
          * all off by default
          * -ticky-allocd: tracks allocation *of* closure in addition to
             allocation *by* that closure
          * -ticky-dyn-thunk tracks dynamic thunks as if they were functions
          * -ticky-LNE tracks LNEs as if they were functions
        * updated the ticky report format, including making the argument
          categories (more?) accurate again
        * the printed name for things in the report includes the unique of
          their ticky parent as well as if they are not top-level
  12. 15 Mar, 2013 1 commit
  13. 14 Feb, 2013 1 commit
  14. 12 Feb, 2013 1 commit
  15. 01 Feb, 2013 1 commit
  16. 29 Jan, 2013 1 commit
  17. 23 Jan, 2013 1 commit
  18. 11 Dec, 2012 1 commit
  19. 23 Nov, 2012 1 commit
  20. 16 Nov, 2012 1 commit
      Add a write barrier for TVAR closures · 6d784c43
      Simon Marlow authored
      This improves GC performance when there are a lot of TVars in the
      heap.  For instance, a TChan with a lot of elements causes a massive
      GC drag without this patch.
      There's more to do - several other STM closure types don't have write
      barriers, so GC performance when there are a lot of threads blocked on
      STM isn't great.  But fixing the problem for TVar is a good start.
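      As a rough illustration of what a write barrier buys in general, here
      is a small C sketch. It is not the RTS implementation: mut_cell,
      remembered_set, cell_write and minor_gc_scan are hypothetical names,
      and the fixed-size array is an assumption made to keep the example
      short. The point it shows is that a cell is recorded only the first
      time it is written, so a collection visits the written cells rather
      than every mutable cell in the heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REMEMBERED_MAX 1024  /* arbitrary bound for this sketch */

/* Hypothetical mutable cell, standing in for something like a TVar. */
typedef struct {
    void *value;
    bool  dirty;   /* set once the cell has been written since the last scan */
} mut_cell;

/* Hypothetical remembered set: the cells a collection must look at. */
typedef struct {
    mut_cell *cells[REMEMBERED_MAX];
    size_t    count;
} remembered_set;

/* Write barrier: record the cell the first time it is written, then
   perform the actual store. Repeated writes add no further entries. */
void cell_write(remembered_set *rs, mut_cell *c, void *v)
{
    if (!c->dirty) {
        assert(rs->count < REMEMBERED_MAX);
        c->dirty = true;
        rs->cells[rs->count++] = c;
    }
    c->value = v;
}

/* Stand-in for the scanning step of a collection: visit only the
   remembered cells, mark them clean, and empty the set.
   Returns how many cells were visited. */
size_t minor_gc_scan(remembered_set *rs)
{
    size_t visited = rs->count;
    for (size_t i = 0; i < rs->count; i++) {
        /* a real collector would trace rs->cells[i]->value here */
        rs->cells[i]->dirty = false;
    }
    rs->count = 0;
    return visited;
}
```

      With this scheme, writing one cell out of a million leaves a single
      entry for the next scan; cells that were never written cost nothing.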
  21. 26 Oct, 2012 1 commit
  22. 15 Oct, 2012 1 commit
      Add a new traceMarker# primop for use in profiling output · a609027d
      Duncan Coutts authored
      In time-based profiling visualisations (e.g. heap profiles and ThreadScope)
      it would be useful to be able to mark particular points in the execution and
      have those points in time marked in the visualisation.
      The traceMarker# primop currently emits an event into the eventlog. In
      principle it could be extended to do something in the heap profiling too.
  23. 13 Oct, 2012 2 commits
  24. 08 Oct, 2012 1 commit
      Produce new-style Cmm from the Cmm parser · a7c0387d
      Simon Marlow authored
      The main change here is that the Cmm parser now allows high-level cmm
      code with argument-passing and function calls.  For example:
      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }
        return (x,y);
      }
      More details on the new cmm syntax are in Note [Syntax of .cmm files]
      in CmmParse.y.
      The old syntax is still more-or-less supported for those occasional
      code fragments that really need to explicitly manipulate the stack.
      However there are a couple of differences: it is now obligatory to
      give a list of live GlobalRegs on every jump, e.g.
        jump %ENTRY_CODE(Sp(0)) [R1];
      Again, more details in Note [Syntax of .cmm files].
      I have rewritten most of the .cmm files in the RTS into the new
      syntax, except for AutoApply.cmm which is generated by the genapply
      program: this file could be generated in the new syntax instead and
      would probably be better off for it, but I ran out of enthusiasm.
      Some other changes in this batch:
       - The PrimOp calling convention is gone, primops now use the ordinary
         NativeNodeCall convention.  This means that primops and "foreign
         import prim" code must be written in high-level cmm, but they can
         now take more than 10 arguments.
       - CmmSink now does constant-folding (should fix #7219)
       - .cmm files now go through the cmmPipeline, and as a result we
         generate better code in many cases.  All the object files generated
         for the RTS .cmm files are now smaller.  Performance should be
         better too, but I haven't measured it yet.
       - RET_DYN frames are removed from the RTS, lots of code goes away
       - we now have some more canned GC points to cover unboxed-tuples with
         2-4 pointers, which will reduce code size a little.
  25. 14 Sep, 2012 1 commit
  26. 07 Sep, 2012 1 commit
      Deprecate lnat, and use StgWord instead · 41737f12
      Simon Marlow authored
      lnat was originally "long unsigned int" but we were using it when we
      wanted a 64-bit type on a 64-bit machine.  This broke on Windows x64,
      where long == int == 32 bits.  Using types of unspecified size is bad,
      but what we really wanted was a type with N bits on an N-bit machine.
      StgWord is exactly that.
      lnat was mentioned in some APIs that clients might be using
      (e.g. StackOverflowHook()), so we leave it defined but with a comment
      to say that it's deprecated.
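      The size mismatch described above can be checked with a few lines of
      C. In this sketch word_t is a hypothetical stand-in defined as
      uintptr_t; it is an assumption for illustration and not the RTS's
      actual definition of StgWord. The helper names are likewise invented.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a word-sized unsigned type. The real
   StgWord definition is not reproduced here. */
typedef uintptr_t word_t;

/* Compile-time check that word_t can hold a pointer-sized value. */
static_assert(sizeof(word_t) >= sizeof(void *),
              "word_t must be at least as wide as a pointer");

/* Width in bits of each candidate type on the current platform. */
size_t bits_of_unsigned_long(void) { return sizeof(unsigned long) * CHAR_BIT; }
size_t bits_of_word(void)          { return sizeof(word_t) * CHAR_BIT; }
size_t bits_of_pointer(void)       { return sizeof(void *) * CHAR_BIT; }

/* Non-zero when "unsigned long" is narrower than a pointer, which is the
   situation the commit describes for Windows x64 (long is 32 bits). */
int long_is_narrower_than_pointer(void)
{
    return bits_of_unsigned_long() < bits_of_pointer();
}
```

      On a typical 64-bit Linux build long_is_narrower_than_pointer()
      returns 0, which is why code using "unsigned long" as a word type
      appeared to work until it was compiled for Windows x64.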
  27. 21 Aug, 2012 2 commits
  28. 20 Aug, 2012 1 commit
  29. 13 Aug, 2012 1 commit
      Fix GHCi segfault during startup on linux-powerpc (#2972). · 3e6c9308
      Erik de Castro Lopo authored
      Slightly modified version of a patch from Ben Collins <bcollins@ubuntu.com>
      who did the final debugging that showed the segfault was being caused by the
      memory protection mechanism.
      Due to the requirement of "jump islands" to handle 24 bit relative jump
      offsets, GHCi on PowerPC did not use mmap to load object files like the
      other architectures. Instead, it allocated memory using malloc and fread
      to load the object code. However there is a quirk in the GNU libc malloc
      implementation. For memory regions over a certain size (dynamic and
      configurable), malloc will use mmap to obtain the required memory instead
      of sbrk and malloc's call to mmap sets the memory readable and writable,
      but not executable. That means when GHCi loads code into a memory region
      that was mmapped instead of malloc-ed and tries to execute it we get a
      segfault.
      This solution drops the malloc/fread object loading in favour of using
      mmap and then puts the jump island for each object code module at the
      end of the mmaped region for that object.
      This patch may also be a solution on other ELF based powerpc systems
      but does not work on darwin-powerpc.
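      The approach described above (map the object with explicit
      permissions and reserve room for jump islands at the end of the same
      mapping) might look roughly like the following POSIX C sketch. It is
      not the linker code from the patch: loaded_object, load_object,
      jump_islands and unload_object are hypothetical names, and alignment,
      relocation and instruction-cache handling are deliberately left out.
      Some hardened systems refuse mappings that are both writable and
      executable, in which case load_object simply reports failure.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical record of one loaded object module. */
typedef struct {
    void  *base;       /* start of the mapping; the object code lives here */
    size_t code_len;   /* bytes of object code */
    size_t total_len;  /* code_len plus the space reserved for jump islands */
} loaded_object;

/* Map a region that is readable, writable and executable, copy the
   object code to its start, and leave island_len bytes free at the end.
   Returns 0 on success, -1 if the mapping could not be created. */
int load_object(loaded_object *obj, const void *code,
                size_t code_len, size_t island_len)
{
    assert(obj != NULL && code != NULL && code_len > 0);
    size_t total = code_len + island_len;
    void *base = mmap(NULL, total,
                      PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;
    memcpy(base, code, code_len);
    obj->base = base;
    obj->code_len = code_len;
    obj->total_len = total;
    return 0;
}

/* Address of the jump-island area: directly after the object code, so
   it stays within short relative-jump range of that code. */
void *jump_islands(const loaded_object *obj)
{
    return (char *)obj->base + obj->code_len;
}

/* Release the mapping. Returns the result of munmap. */
int unload_object(loaded_object *obj)
{
    return munmap(obj->base, obj->total_len);
}
```

      Requesting PROT_EXEC explicitly is the key difference from the old
      malloc/fread path, where the permissions of the memory were whatever
      malloc happened to obtain.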
  30. 10 Aug, 2012 1 commit
  31. 31 Jul, 2012 1 commit
  32. 19 Jun, 2012 1 commit
  33. 17 Jun, 2012 1 commit
  34. 16 Jun, 2012 1 commit
  35. 09 Jun, 2012 1 commit
  36. 09 May, 2012 2 commits
  37. 08 May, 2012 1 commit