Commit 50de6034 authored by Simon Marlow

Make profiling work with multiple capabilities (+RTS -N)

This means that both time and heap profiling work for parallel
programs.  Main internal changes:

  - CCCS is no longer a global variable; it is now another
    pseudo-register in the StgRegTable struct.  Thus every
    Capability has its own CCCS.

  - There is a new built-in CCS called "IDLE", which records ticks for
    Capabilities in the idle state.  If you profile a single-threaded
    program with +RTS -N2, you'll see about 50% of time in "IDLE".

  - There is appropriate locking in rts/Profiling.c to protect the
    shared cost-centre-stack data structures.
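
As a rough illustration of the first two points, the sketch below gives each capability its own current cost-centre stack and charges ticks from idle capabilities to a shared "IDLE" entry. All type, field, and function names here (RegTableSketch, CapabilitySketch, ccs_idle_sketch, prof_tick_sketch) are hypothetical stand-ins chosen for this example, not the real RTS definitions, and the locking mentioned in the third point is left out for brevity.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cost-centre stack. */
typedef struct CostCentreStackSketch {
    const char   *label;
    unsigned long time_ticks;
} CostCentreStackSketch;

/* Assumption: the register table carries the current CCS as one more
 * pseudo-register, so each capability has its own copy. */
typedef struct RegTableSketch {
    CostCentreStackSketch *cccs;
} RegTableSketch;

typedef struct CapabilitySketch {
    RegTableSketch r;
    int            idle;   /* non-zero when this capability has no work */
} CapabilitySketch;

/* Built-in stack that collects ticks for idle capabilities. */
static CostCentreStackSketch ccs_idle_sketch = { "IDLE", 0 };

/* On each profiling tick, charge one tick per capability: to IDLE if
 * the capability is idle, otherwise to that capability's own CCCS. */
static void prof_tick_sketch(CapabilitySketch *caps, size_t n_caps)
{
    for (size_t i = 0; i < n_caps; i++) {
        CostCentreStackSketch *ccs =
            caps[i].idle ? &ccs_idle_sketch : caps[i].r.cccs;
        ccs->time_ticks++;
    }
}
```

With two capabilities, one always running and one always idle, half of all ticks land on "IDLE", which is where the 50% figure for a single-threaded program under +RTS -N2 comes from.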

This patch does enough to get it working, but I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS.  Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.
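
To make the race concrete, here is a minimal sketch of one possible interleaving, replayed deterministically in a single thread. It assumes the counter update is a plain, unsynchronised increment, which decomposes into a load, an add, and a store. The names (SharedCCSSketch, lost_update_demo) are hypothetical and exist only for this example.

```c
/* Hypothetical stand-in for a CCS shared by all capabilities. */
typedef struct SharedCCSSketch {
    unsigned long entries;
} SharedCCSSketch;

/* Replays, step by step, what can happen when two capabilities both
 * execute an unsynchronised `ccs->entries++` at the same time. */
static unsigned long lost_update_demo(void)
{
    SharedCCSSketch ccs = { 0 };

    unsigned long seen_by_cap0 = ccs.entries;  /* capability 0 loads 0 */
    unsigned long seen_by_cap1 = ccs.entries;  /* capability 1 loads 0 */

    ccs.entries = seen_by_cap0 + 1;            /* capability 0 stores 1 */
    ccs.entries = seen_by_cap1 + 1;            /* capability 1 stores 1,
                                                  overwriting the first */

    /* Two entries happened, but only one was recorded. */
    return ccs.entries;
}
```

Which updates are lost in a real run depends on timing, hence the unpredictable counts. The slowdown is a separate effect: every write forces the cache line holding the counter to move between cores.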

It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs.  (I shall
add a note to this effect to the docs.)
parent 1c2b8381
73 additions and 47 deletions