1. 13 May, 2013 1 commit
  2. 24 Apr, 2013 1 commit
  3. 29 Mar, 2013 1 commit
    • nfrisby's avatar
      ticky enhancements · 460abd75
      nfrisby authored
        * the new StgCmmArgRep module breaks a dependency cycle; I also
          untabified it, but made no real changes
      
        * updated the documentation in the wiki and change the user guide to
          point there
      
        * moved the allocation enters for ticky and CCS to after the heap check
      
          * I left LDV where it was, which was before the heap check at least
            once, since I have no idea what it is
      
        * standardized all (active?) ticky alloc totals to bytes
      
        * in order to avoid double counting StgCmmLayout.adjustHpBackwards
          no longer bumps ALLOC_HEAP_ctr
      
        * I resurrected the SLOW_CALL counters
      
          * the new module StgCmmArgRep breaks cyclic dependency between
            Layout and Ticky (which the SLOW_CALL counters cause)
      
          * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL
      
        * added ALLOC_RTS_ctr and _tot ticky counters
      
          * eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap_*_info
      
          * resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and
            ALLOC_PRIM
      
          * added -ticky and -DTICKY_TICKY in ways.mk for debug ways
      
        * added a ticky counter for total LNE entries
      
        * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE
      
          * all off by default
      
          * -ticky-allocd: tracks allocation *of* closure in addition to
             allocation *by* that closure
      
          * -ticky-dyn-thunk tracks dynamic thunks as if they were functions
      
          * -ticky-LNE tracks LNEs as if they were functions
      
        * updated the ticky report format, including making the argument
          categories (more?) accurate again
      
        * the printed name for things in the report include the unique of
          their ticky parent as well as if they are not top-level
      460abd75
  4. 24 Jan, 2013 1 commit
    • Simon Peyton Jones's avatar
      Introduce CPR for sum types (Trac #5075) · d3b8991b
      Simon Peyton Jones authored
      The main payload of this patch is to extend CPR so that it
      detects when a function always returns a result constructed
      with the *same* constructor, even if the constructor comes from
      a sum type.  This doesn't matter very often, but it does improve
      some things (results below).
      
      Binary sizes increase a little bit, I think because there are more
      wrappers.  This with -split-objs.  Without split-ojbs binary sizes
      increased by 6% even for HelloWorld.hs.  It's hard to see exactly why,
      but I think it was because System.Posix.Types.o got included in the
      linked binary, whereas it didn't before.
      
              Program           Size    Allocs   Runtime   Elapsed  TotalMem
                fluid          +1.8%     -0.3%      0.01      0.01     +0.0%
                  tak          +2.2%     -0.2%      0.02      0.02     +0.0%
                 ansi          +1.7%     -0.3%      0.00      0.00     +0.0%
            cacheprof          +1.6%     -0.3%     +0.6%     +0.5%     +1.4%
              parstof          +1.4%     -4.4%      0.00      0.00     +0.0%
              reptile          +2.0%     +0.3%      0.02      0.02     +0.0%
      ----------------------------------------------------------------------
                  Min          +1.1%     -4.4%     -4.7%     -4.7%    -15.0%
                  Max          +2.3%     +0.3%     +8.3%     +9.4%    +50.0%
       Geometric Mean          +1.9%     -0.1%     +0.6%     +0.7%     +0.3%
      
      Other things in this commit
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~
      * Got rid of the Lattice class in Demand
      
      * Refactored the way that products and newtypes are
        decomposed (no change in functionality)
      d3b8991b
  5. 16 Oct, 2012 1 commit
    • ian@well-typed.com's avatar
      Some alpha renaming · cd33eefd
      ian@well-typed.com authored
      Mostly d -> g (matching DynFlag -> GeneralFlag).
      Also renamed if* to when*, matching the Haskell if/when names
      cd33eefd
  6. 08 Oct, 2012 1 commit
    • Simon Marlow's avatar
      Produce new-style Cmm from the Cmm parser · a7c0387d
      Simon Marlow authored
      The main change here is that the Cmm parser now allows high-level cmm
      code with argument-passing and function calls.  For example:
      
      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }
      
        return (x,y);
      }
      
      More details on the new cmm syntax are in Note [Syntax of .cmm files]
      in CmmParse.y.
      
      The old syntax is still more-or-less supported for those occasional
      code fragments that really need to explicitly manipulate the stack.
      However there are a couple of differences: it is now obligatory to
      give a list of live GlobalRegs on every jump, e.g.
      
        jump %ENTRY_CODE(Sp(0)) [R1];
      
      Again, more details in Note [Syntax of .cmm files].
      
      I have rewritten most of the .cmm files in the RTS into the new
      syntax, except for AutoApply.cmm which is generated by the genapply
      program: this file could be generated in the new syntax instead and
      would probably be better off for it, but I ran out of enthusiasm.
      
      Some other changes in this batch:
      
       - The PrimOp calling convention is gone, primops now use the ordinary
         NativeNodeCall convention.  This means that primops and "foreign
         import prim" code must be written in high-level cmm, but they can
         now take more than 10 arguments.
      
       - CmmSink now does constant-folding (should fix #7219)
      
       - .cmm files now go through the cmmPipeline, and as a result we
         generate better code in many cases.  All the object files generated
         for the RTS .cmm files are now smaller.  Performance should be
         better too, but I haven't measured it yet.
      
       - RET_DYN frames are removed from the RTS, lots of code goes away
      
       - we now have some more canned GC points to cover unboxed-tuples with
         2-4 pointers, which will reduce code size a little.
      a7c0387d
  7. 25 Sep, 2012 1 commit
  8. 18 Sep, 2012 1 commit
  9. 04 Sep, 2012 1 commit
  10. 03 Sep, 2012 1 commit
  11. 13 Jun, 2012 2 commits
  12. 19 Oct, 2011 1 commit
  13. 04 Oct, 2011 1 commit
  14. 02 Oct, 2011 5 commits
  15. 25 Aug, 2011 3 commits
  16. 29 Jul, 2011 3 commits
  17. 28 Jul, 2011 3 commits
  18. 15 Jul, 2011 1 commit
    • Ian Lynagh's avatar
      More work towards cross-compilation · f07af788
      Ian Lynagh authored
      There's now a variant of the Outputable class that knows what
      platform we're targetting:
      
      class PlatformOutputable a where
          pprPlatform :: Platform -> a -> SDoc
          pprPlatformPrec :: Platform -> Rational -> a -> SDoc
      
      and various instances have had to be converted to use that class,
      and we pass Platform around accordingly.
      f07af788
  19. 07 Jul, 2011 1 commit
  20. 05 Jul, 2011 1 commit
  21. 01 May, 2011 1 commit
  22. 30 Apr, 2011 1 commit
  23. 19 Apr, 2011 1 commit
  24. 12 Apr, 2011 1 commit
    • Simon Marlow's avatar
      Change the way module initialisation is done (#3252, #4417) · a52ff761
      Simon Marlow authored
      Previously the code generator generated small code fragments labelled
      with __stginit_M for each module M, and these performed whatever
      initialisation was necessary for that module and recursively invoked
      the initialisation functions for imported modules.  This appraoch had
      drawbacks:
      
       - FFI users had to call hs_add_root() to ensure the correct
         initialisation routines were called.  This is a non-standard,
         and ugly, API.
      
       - unless we were using -split-objs, the __stginit dependencies would
         entail linking the whole transitive closure of modules imported,
         whether they were actually used or not.  In an extreme case (#4387,
         #4417), a module from GHC might be imported for use in Template
         Haskell or an annotation, and that would force the whole of GHC to
         be needlessly linked into the final executable.
      
      So now instead we do our initialisation with C functions marked with
      __attribute__((constructor)), which are automatically invoked at
      program startup time (or DSO load-time).  The C initialisers are
      emitted into the stub.c file.  This means that every time we compile
      with -prof or -hpc, we now get a stub file, but thanks to #3687 that
      is now invisible to the user.
      
      There are some refactorings in the RTS (particularly for HPC) to
      handle the fact that initialisers now get run earlier than they did
      before.
      
      The __stginit symbols are still generated, and the hs_add_root()
      function still exists (but does nothing), for backwards compatibility.
      a52ff761
  25. 22 Mar, 2011 1 commit
  26. 29 Mar, 2010 1 commit
    • Simon Marlow's avatar
      New implementation of BLACKHOLEs · 5d52d9b6
      Simon Marlow authored
      This replaces the global blackhole_queue with a clever scheme that
      enables us to queue up blocked threads on the closure that they are
      blocked on, while still avoiding atomic instructions in the common
      case.
      
      Advantages:
      
       - gets rid of a locked global data structure and some tricky GC code
         (replacing it with some per-thread data structures and different
         tricky GC code :)
      
       - wakeups are more prompt: parallel/concurrent performance should
         benefit.  I haven't seen anything dramatic in the parallel
         benchmarks so far, but a couple of threading benchmarks do improve
         a bit.
      
       - waking up a thread blocked on a blackhole is now O(1) (e.g. if
         it is the target of throwTo).
      
       - less sharing and better separation of Capabilities: communication
         is done with messages, the data structures are strictly owned by a
         Capability and cannot be modified except by sending messages.
      
       - this change will utlimately enable us to do more intelligent
         scheduling when threads block on each other.  This is what started
         off the whole thing, but it isn't done yet (#3838).
      
      I'll be documenting all this on the wiki in due course.
      5d52d9b6
  27. 06 Jan, 2010 1 commit
  28. 02 Jan, 2010 2 commits