    Make profiling work with multiple capabilities (+RTS -N) · 50de6034
    Simon Marlow authored
    This means that both time and heap profiling work for parallel
    programs.  Main internal changes:
    
      - CCCS is no longer a global variable; it is now another
        pseudo-register in the StgRegTable struct.  Thus every
        Capability has its own CCCS.
    
      - There is a new built-in CCS called "IDLE", which records ticks for
        Capabilities in the idle state.  If you profile a single-threaded
        program with +RTS -N2, you'll see about 50% of time in "IDLE".
    
      - There is appropriate locking in rts/Profiling.c to protect the
        shared cost-centre-stack data structures.
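The first two changes above can be sketched roughly as follows. This is a minimal illustration, not the RTS code: the struct layouts, the `cccs` field name, and the functions `capability_becomes_idle`, `profiling_tick`, and `idle_percentage` are all hypothetical simplifications. It assumes that an idle Capability simply has its current stack pointed at the built-in "IDLE" stack, and that each timer tick charges every Capability's own current stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for illustration only;
 * these are NOT the real GHC RTS definitions. */
typedef struct CostCentreStack {
    const char *label;
    uint64_t    time_ticks;   /* profiling timer ticks charged to this stack */
} CostCentreStack;

typedef struct StgRegTable {
    CostCentreStack *cccs;    /* current cost-centre stack, one per Capability */
} StgRegTable;

typedef struct Capability {
    StgRegTable r;
} Capability;

/* Built-in stack that absorbs ticks while a Capability has nothing to run. */
static CostCentreStack CCS_IDLE = { "IDLE", 0 };

/* Assumed hook: called when a Capability runs out of work. */
static void capability_becomes_idle(Capability *cap) {
    cap->r.cccs = &CCS_IDLE;
}

/* Assumed timer handler: charge one tick to each Capability's own CCCS. */
static void profiling_tick(Capability *caps, size_t n_caps) {
    assert(caps != NULL);
    for (size_t i = 0; i < n_caps; i++) {
        caps[i].r.cccs->time_ticks++;
    }
}

/* Simulate a single-threaded program on two Capabilities: one works in
 * a "MAIN" stack, the other stays idle. Returns the percentage of all
 * ticks that were charged to IDLE. */
static unsigned idle_percentage(uint64_t ticks) {
    CostCentreStack main_ccs = { "MAIN", 0 };
    Capability caps[2];

    CCS_IDLE.time_ticks = 0;
    caps[0].r.cccs = &main_ccs;
    capability_becomes_idle(&caps[1]);

    for (uint64_t t = 0; t < ticks; t++) {
        profiling_tick(caps, 2);
    }

    uint64_t total = main_ccs.time_ticks + CCS_IDLE.time_ticks;
    return total == 0 ? 0 : (unsigned)(100 * CCS_IDLE.time_ticks / total);
}
```

Under these assumptions, calling `idle_percentage` with any non-zero tick count models the -N2 case described above: half of the ticks land in "IDLE".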
    
    This patch does enough to get it working, but I have cut one big corner:
    the cost-centre-stack data structure is still shared amongst all
    Capabilities, which means that multiple Capabilities will race when
    updating the "allocations" and "entries" fields of a CCS.  Not only
    does this give unpredictable results, but it runs very slowly due to
    cache line bouncing.
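For contrast, here is a sketch of what not cutting that corner might look like. It is not what this patch does, and every name in it (`PerCapCounters`, `ShardedCCS`, `count_entry`, and the two `total_*` functions) is hypothetical, as are the 64-byte cache line and the fixed bound on Capabilities. The idea is that each Capability updates only its own counters, aligned so that no two Capabilities share a cache line, and the totals are summed when the report is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define MAX_CAPS   8    /* hypothetical upper bound, for the sketch only */

/* Hypothetical layout, NOT what this patch implements: each Capability
 * gets its own counters, aligned so two Capabilities never share a line. */
typedef struct PerCapCounters {
    _Alignas(CACHE_LINE) uint64_t entries;
    uint64_t allocations;
} PerCapCounters;

typedef struct ShardedCCS {
    const char    *label;
    PerCapCounters per_cap[MAX_CAPS];
} ShardedCCS;

/* Each Capability writes only to its own slot, so in this sketch the
 * update needs neither a lock nor an atomic operation. */
static void count_entry(ShardedCCS *ccs, size_t cap_no, uint64_t words) {
    assert(cap_no < MAX_CAPS);
    ccs->per_cap[cap_no].entries++;
    ccs->per_cap[cap_no].allocations += words;
}

/* Totals are computed once, after the Capabilities have stopped updating. */
static uint64_t total_entries(const ShardedCCS *ccs) {
    uint64_t sum = 0;
    for (size_t i = 0; i < MAX_CAPS; i++) {
        sum += ccs->per_cap[i].entries;
    }
    return sum;
}

static uint64_t total_allocations(const ShardedCCS *ccs) {
    uint64_t sum = 0;
    for (size_t i = 0; i < MAX_CAPS; i++) {
        sum += ccs->per_cap[i].allocations;
    }
    return sum;
}
```

The trade-off in such a layout is memory: every cost-centre stack carries one padded slot per Capability, which is presumably part of why the simpler shared structure was kept here.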
    
    It is strongly recommended that you use -fno-prof-count-entries to
    disable the "entries" count when profiling parallel programs. (I shall
    add a note to this effect to the docs.)