- 08 Sep, 2013 2 commits
-
-
thoughtpolice authored
This reverts commit d85044f6.
-
thoughtpolice authored
When servicing a stack overflow, only throw an exception to the given thread if the user explicitly set a max stack size using +RTS -K. Otherwise, just service it normally and grow the stack. If we actually run out of *heap* (stack chunks are allocated on the heap), we need to bail out by calling the stackOverflow() hook and exiting immediately.
Authored-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Austin Seipp <aseipp@pobox.com>
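The decision described in this commit can be sketched as a small pure function. This is only an illustration: `OverflowAction`, `service_stack_overflow` and its parameters are hypothetical names, not the RTS API, and the order of the checks is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of servicing a stack overflow. */
typedef enum {
    GROW_STACK,         /* allocate another stack chunk and carry on */
    THROW_TO_THREAD,    /* raise a stack overflow exception in the thread */
    CALL_HOOK_AND_EXIT  /* heap exhausted: call the overflow hook and exit */
} OverflowAction;

/* user_set_max:   true if an explicit limit was given (e.g. +RTS -K).
 * stack_size:     current stack size, in words.
 * max_stack_size: the limit, only meaningful when user_set_max is true.
 * heap_available: false if a new stack chunk cannot be allocated. */
static OverflowAction service_stack_overflow(bool user_set_max,
                                             size_t stack_size,
                                             size_t max_stack_size,
                                             bool heap_available)
{
    if (user_set_max && stack_size >= max_stack_size)
        return THROW_TO_THREAD;
    if (!heap_available)
        return CALL_HOOK_AND_EXIT;
    return GROW_STACK;
}
```

Without an explicit limit the function never returns `THROW_TO_THREAD`; the only failure mode left is heap exhaustion.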
-
- 04 Sep, 2013 1 commit
-
-
Simon Marlow authored
We have various problems with reallocating the array of Capabilities, due to threads in waitForReturnCapability that are already holding a pointer to a Capability. Rather than add more locking to make this safer, I decided it would be easier to ensure that we never move the Capabilities at all. The capabilities array is now an array of pointers to Capability. There are extra indirections, but it rarely matters: we don't often access Capabilities via the array, since normally we already have a pointer to one. I ran the parallel benchmarks and didn't see any difference.
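A minimal sketch of why an array of pointers avoids the problem: growing the array reallocates only the pointer table, so a pointer to an individual Capability stays valid. The `Capability` struct here is a one-field stand-in and `more_capabilities` is a hypothetical helper, not the RTS function.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the real (much larger) Capability struct. */
typedef struct { unsigned no; } Capability;

static Capability **capabilities = NULL;  /* array of pointers */
static unsigned n_capabilities = 0;

/* Grow to `to` capabilities. Only the pointer array is reallocated;
 * each Capability is allocated once and never moves afterwards. */
static int more_capabilities(unsigned to)
{
    if (to <= n_capabilities) return 0;

    Capability **arr = realloc(capabilities, to * sizeof *arr);
    if (arr == NULL) return -1;
    capabilities = arr;

    for (unsigned i = n_capabilities; i < to; i++) {
        capabilities[i] = malloc(sizeof(Capability));
        if (capabilities[i] == NULL) { n_capabilities = i; return -1; }
        capabilities[i]->no = i;
    }
    n_capabilities = to;
    return 0;
}
```

A thread that grabbed `capabilities[1]` before the array grew can keep using that pointer afterwards, which is exactly what a flat array of structs could not guarantee.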
-
- 09 Jul, 2013 1 commit
-
-
Edward Z. Yang authored
We add the invariant to the MVar blocked threads queue that threads blocked on an atomic read are always at the front of the queue. This invariant is easy to maintain, since takers are only ever added to the end of the queue.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
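The invariant can be illustrated with a simple linked queue. This is a sketch under assumptions: the `Waiter` and `WaitQueue` types are hypothetical, and pushing readers at the head is just one way to keep them ahead of the takers; the commit itself only states the invariant and that takers go to the end.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BLOCKED_READ, BLOCKED_TAKE } BlockKind;

typedef struct Waiter {
    BlockKind kind;
    struct Waiter *next;
} Waiter;

typedef struct { Waiter *head, *tail; } WaitQueue;

/* Takers are only ever appended to the end of the queue. */
static void enqueue_taker(WaitQueue *q, Waiter *w)
{
    w->kind = BLOCKED_TAKE;
    w->next = NULL;
    if (q->tail != NULL) q->tail->next = w; else q->head = w;
    q->tail = w;
}

/* Readers are pushed at the front, so they precede every taker. */
static void enqueue_reader(WaitQueue *q, Waiter *w)
{
    w->kind = BLOCKED_READ;
    w->next = q->head;
    q->head = w;
    if (q->tail == NULL) q->tail = w;
}

/* Returns 1 if no reader appears after a taker, 0 otherwise. */
static int readers_first(const WaitQueue *q)
{
    int seen_taker = 0;
    for (const Waiter *w = q->head; w != NULL; w = w->next) {
        if (w->kind == BLOCKED_TAKE) seen_taker = 1;
        else if (seen_taker) return 0;
    }
    return 1;
}
```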
-
- 07 Jul, 2013 1 commit
-
-
ian@well-typed.com authored
-
- 21 Jun, 2013 1 commit
-
-
thoughtpolice authored
Again, the range of gc_type is actually 1-3, which is technically outside the range of rtsBool.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
-
- 20 Feb, 2013 1 commit
-
-
Simon Marlow authored
We were changing n_capabilities after we had released the Capabilities, which led to a range of interesting crashes. This should fix test failures in setnumcapabilities001.
-
- 17 Feb, 2013 1 commit
-
-
ian@well-typed.com authored
-
- 16 Feb, 2013 1 commit
-
-
Ian Lynagh authored
-
- 12 Feb, 2013 2 commits
-
-
AndreasVoellmy authored
ioManagerCapabilitiesChanged now queries getNumCapabilities for the current number of enabled capabilities.
-
AndreasVoellmy authored
This enables the IO manager to change the number of IO loops it uses (usually one per capability).
-
- 07 Feb, 2013 1 commit
-
-
Simon Marlow authored
Fixes the following crash: internal error: threadStackUnderflow: not enough space for return values when using STM.
-
- 16 Jan, 2013 1 commit
-
-
Edward Z. Yang authored
This adds some new functions: peekRunQueue, promoteInRunQueue, singletonRunQueue and truncateRunQueue, which help abstract away manual linked list manipulation, making it easier to swap in a new queue implementation.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
-
- 16 Nov, 2012 1 commit
-
-
Simon Marlow authored
This improves GC performance when there are a lot of TVars in the heap. For instance, a TChan with a lot of elements causes a massive GC drag without this patch. There's more to do - several other STM closure types don't have write barriers, so GC performance when there are a lot of threads blocked on STM isn't great. But fixing the problem for TVar is a good start.
-
- 25 Oct, 2012 2 commits
-
-
Simon Marlow authored
-
Simon Marlow authored
-
- 23 Oct, 2012 1 commit
-
-
Simon Marlow authored
-
- 22 Oct, 2012 1 commit
-
-
Simon Marlow authored
-
- 24 Sep, 2012 1 commit
-
-
Simon Marlow authored
Improvements:
- we now turn off the timer signal in the non-threaded RTS after idleGCDelay. This should make the xmonad users on #5991 happy.
- we now turn off the timer signal after idleGCDelay even if the idle GC is disabled with +RTS -I0.
- we now do *not* turn off the timer when profiling.
- more comments to explain the meaning of the various ACTIVITY_* values
-
- 07 Sep, 2012 1 commit
-
-
Simon Marlow authored
lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that. lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
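The portability issue can be seen by comparing type widths. In this sketch `Word` is only a stand-in for StgWord, defined via `uintptr_t`, which is pointer-sized on mainstream platforms; the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for StgWord: an unsigned integer as wide as a pointer. */
typedef uintptr_t Word;

/* Width in bits of the word type, of a pointer, and of unsigned long. */
static unsigned word_bits(void)    { return (unsigned)(sizeof(Word) * CHAR_BIT); }
static unsigned pointer_bits(void) { return (unsigned)(sizeof(void *) * CHAR_BIT); }
static unsigned long_bits(void)    { return (unsigned)(sizeof(unsigned long) * CHAR_BIT); }
```

On 64-bit Linux `long_bits()` and `word_bits()` agree, which hid the problem; on Windows x64 `unsigned long` is 32 bits wide while pointers are 64 bits, so only the pointer-sized type has N bits on an N-bit machine.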
-
- 21 Aug, 2012 1 commit
-
-
Simon Marlow authored
-
- 07 Aug, 2012 1 commit
-
-
Simon Marlow authored
The problem occurred when the idle GC was turned off with +RTS -I0. Then the scheduler would go into the state ACTIVITY_DONE_GC directly without doing a GC, and a subsequent GC would put it back to ACTIVITY_YES but without turning the timer back on. Instead, if the GC finds the state is ACTIVITY_DONE_GC, it should leave it there.
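The fixed transition can be sketched as follows. Only the two states named in this commit are modelled; `Activity` and `activity_after_gc` are hypothetical names used for illustration, not the scheduler's actual code.

```c
#include <assert.h>

/* Only the two states mentioned in the commit are modelled here. */
typedef enum { ACTIVITY_YES, ACTIVITY_DONE_GC } Activity;

/* State of the activity flag after a GC has run.
 * In ACTIVITY_DONE_GC the timer is off, so the GC must not move the
 * state back to ACTIVITY_YES: nothing would turn the timer on again. */
static Activity activity_after_gc(Activity before)
{
    if (before == ACTIVITY_DONE_GC)
        return ACTIVITY_DONE_GC;
    return ACTIVITY_YES;
}
```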
-
- 10 Jul, 2012 2 commits
-
-
Duncan Coutts authored
Based on initial patches by Mikolaj Konarski <mikolaj@well-typed.com>.
Use the new task tracing functions traceTaskCreate/Migrate/Delete. There are two key places. One is for worker tasks, which have a relatively simple life cycle. Worker tasks are created and deleted by the RTS. The other case is bound tasks, which are either created by the RTS or appear as foreign C threads making calls into the RTS. For bound threads we do the tracing in rts_lock/unlock, which actually covers both threads coming in from outside and bound threads made by the RTS.
-
Simon Marlow authored
We do a final GC before shutting down the system, to clean up. However, we were doing an ordinary GC rather than forcing a major GC, so especially when the allocation area is large, this final GC could be expensive. This is really just a bug - the final GC should have virtually nothing to do, because there is nothing live.
-
- 07 Jun, 2012 1 commit
-
-
Ian Lynagh authored
If we are interrupted to do a GC, then we do not immediately do another one. This avoids a starvation situation where one Capability keeps forcing a GC and the other Capabilities make no progress at all.
-
- 27 May, 2012 1 commit
-
-
Ian Lynagh authored
-
- 04 Apr, 2012 2 commits
-
-
Mikolaj Konarski authored
There was a discrepancy between GC times reported in +RTS -s and the timestamps of GC_START and GC_END events on the cap, on which +RTS -s stats for the given GC are based. This is fixed by posting the events with exactly the same timestamp as generated for the stat calculation. The calls posting the events are moved too, so that the events are emitted close to the time instant they claim to be emitted at. The GC_STATS_GHC was moved, too, ensuring it's emitted before the moved GC_END on all caps, which simplifies tools code.
-
Duncan Coutts authored
Now that we can adjust the number of capabilities on the fly, we need this reflected in the eventlog. Previously the eventlog had a single startup event that declared a static number of capabilities. Obviously that's no good anymore.

For compatibility we're keeping the EVENT_STARTUP but adding new EVENT_CAP_CREATE/DELETE. The EVENT_CAP_DELETE is actually just the old EVENT_SHUTDOWN but renamed and extended (using the existing mechanism to extend eventlog events in a compatible way). So we now emit both EVENT_STARTUP and EVENT_CAP_CREATE. One day we will drop EVENT_STARTUP.

Since reducing the number of capabilities at runtime does not really delete them, but just disables them, we also have new events for disable/enable.

The old EVENT_SHUTDOWN was in the scheduler class of events. The new EVENT_CAP_* events are in the unconditional class, along with the EVENT_CAPSET_* ones. Knowing when capabilities are created and deleted is crucial to making sense of eventlogs, so you always want those events. In any case, they're extremely low volume.
-
- 19 Mar, 2012 1 commit
-
-
Ian Lynagh authored
-
- 18 Mar, 2012 1 commit
-
-
Ian Lynagh authored
-
- 16 Mar, 2012 1 commit
-
-
Ian Lynagh authored
-
- 27 Feb, 2012 1 commit
-
-
Gabor Greif authored
-
- 06 Jan, 2012 1 commit
-
-
Simon Marlow authored
-
- 15 Dec, 2011 1 commit
-
-
Simon Marlow authored
This patch allows setNumCapabilities to /reduce/ the number of active capabilities as well as increase it. This is particularly tricky to do, because a Capability is a large data structure and ties into the rest of the system in many ways. Trying to clean it all up would be extremely error prone.

So instead, the solution is to mark the extra capabilities as "disabled". This has the following consequences:
- threads on a disabled capability are migrated away by the scheduler loop
- disabled capabilities do not participate in GC (see scheduleDoGC())
- No spark threads are created on this capability (see scheduleActivateSpark())
- We do not attempt to migrate threads *to* a disabled capability (see schedulePushWork()).

So a disabled capability should do no work, and does not participate in GC, although it remains alive in other respects. For example, a blocked thread might wake up on a disabled capability, and it will get quickly migrated to a live capability. A disabled capability can still initiate GC if necessary. Indeed, it turns out to be hard to migrate bound threads, so we wait until the next GC to do this (see comments for details).
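A rough sketch of the "disabled" marking, assuming a capability is disabled when its index is at or beyond the requested count. The `Cap` struct and the helpers `set_enabled`, `can_receive_work` and `gc_participants` are hypothetical and stand in for the checks spread across the scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in capability: just an index and the disabled flag. */
typedef struct { unsigned no; bool disabled; } Cap;

/* Mark capabilities with index >= enabled as disabled; nothing is freed. */
static void set_enabled(Cap *caps, unsigned total, unsigned enabled)
{
    for (unsigned i = 0; i < total; i++) {
        caps[i].no = i;
        caps[i].disabled = (i >= enabled);
    }
}

/* Threads are never migrated *to* a disabled capability. */
static bool can_receive_work(const Cap *cap) { return !cap->disabled; }

/* Disabled capabilities do not take part in GC. */
static unsigned gc_participants(const Cap *caps, unsigned total)
{
    unsigned n = 0;
    for (unsigned i = 0; i < total; i++)
        if (!caps[i].disabled) n++;
    return n;
}
```

Raising the count again only has to clear the flag, which is why disabling is so much cheaper than tearing a Capability down.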
-
- 13 Dec, 2011 1 commit
-
-
Simon Marlow authored
This is an experimental tweak to the parallel GC that avoids waking up a Capability to do parallel GC if we know that the capability has been idle for a (tunable) number of GC cycles. The idea is that if you're only using a few Capabilities, there's no point waking up the ones that aren't busy. e.g. +RTS -qi3 says "A Capability will participate in parallel GC if it was running at all since the last 3 GC cycles." Results are a bit hit and miss, and I don't completely understand why yet. Hence, for now it is turned off by default, and also not documented except in the +RTS -? output.
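One way to read the -qi rule is as a per-capability idle counter compared against the threshold. This is a sketch under assumptions: the `Cap` struct, the counter, and the convention that a threshold of 0 means "feature off" are all hypothetical, not taken from the RTS source.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in capability: counts consecutive GC cycles spent idle. */
typedef struct { unsigned idle_gcs; } Cap;

/* Called once per GC cycle for each capability. */
static void note_gc_cycle(Cap *cap, bool ran_since_last_gc)
{
    if (ran_since_last_gc) cap->idle_gcs = 0;
    else                   cap->idle_gcs++;
}

/* threshold == 0 is assumed to mean the tweak is off: everyone takes part.
 * Otherwise a capability takes part only if it ran at some point during
 * the last `threshold` GC cycles. */
static bool participates_in_parallel_gc(const Cap *cap, unsigned threshold)
{
    if (threshold == 0) return true;
    return cap->idle_gcs < threshold;
}
```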
-
- 06 Dec, 2011 2 commits
-
-
Simon Marlow authored
At present the number of capabilities can only be *increased*, not decreased. The latter presents a few more challenges!
-
Simon Marlow authored
Consider this experimental for the time being. There are a lot of things that could go wrong, but I've verified that at least it works on the test cases we have.

I also did some API cleanups while I was here. Previously we had:

    Capability * rts_eval (Capability *cap, HaskellObj p, /*out*/HaskellObj *ret);

but this API is particularly error-prone: if you forget to discard the Capability * you passed in and use the return value instead, then you're in for subtle bugs with +RTS -N later on. So I changed all these functions to this form:

    void rts_eval (/* inout */ Capability **cap,
                   /* in */ HaskellObj p,
                   /* out */ HaskellObj *ret)

It's much harder to use this version incorrectly, because you have to pass the Capability in by reference.
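The difference between the two calling conventions can be shown with a toy model. Everything here is hypothetical: `eval_old` and `eval_new` only mimic the shape of the old and new signatures, integers stand in for HaskellObj, and the "migration" to another capability is simulated.

```c
#include <assert.h>

/* Stand-in capability; two of them so a call can "migrate". */
typedef struct { int no; } Capability;
static Capability caps[2] = { { 0 }, { 1 } };

/* Old shape: the caller must remember to switch to the returned
 * capability; the one passed in may be stale afterwards. */
static Capability *eval_old(Capability *cap, int in, int *out)
{
    *out = in + 1;
    return &caps[(cap->no + 1) % 2];   /* pretend we moved capability */
}

/* New shape: the capability is passed by reference and updated in
 * place, so the caller cannot keep using a stale pointer by accident. */
static void eval_new(Capability **cap, int in, int *out)
{
    *out = in + 1;
    *cap = &caps[((*cap)->no + 1) % 2];
}
```

With `eval_old`, code that ignores the return value keeps compiling and silently uses the wrong capability; with `eval_new` the caller's own variable is updated.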
-
- 01 Dec, 2011 1 commit
-
-
Simon Marlow authored
The parallel GC was using setContextSwitches() to stop all the other threads, which sets the context_switch flag on every Capability. That had the side effect of causing every Capability to also switch threads, and since GCs can be much more frequent than context switches, this increased the context switch frequency. When context switches are expensive (because the switch is between two bound threads or a bound and unbound thread), the difference is quite noticeable. The fix is to have a separate flag to indicate that a Capability should stop and return to the scheduler, but not switch threads. I've called this the "interrupt" flag.
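The separation of the two flags can be sketched like this. The `Cap` struct and helper names are hypothetical; the point is only that both flags send a capability back to the scheduler, while just one of them asks for a thread switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in capability with the two flags discussed in the commit. */
typedef struct {
    bool context_switch;  /* stop and switch to another thread */
    bool interrupt;       /* stop and return to the scheduler only */
} Cap;

static void request_context_switch(Cap *cap) { cap->context_switch = true; }
static void request_interrupt(Cap *cap)      { cap->interrupt = true; }

/* Either flag makes the running thread return to the scheduler. */
static bool should_return_to_scheduler(const Cap *cap)
{
    return cap->context_switch || cap->interrupt;
}

/* Only the context_switch flag asks for a different thread to run. */
static bool should_switch_thread(const Cap *cap)
{
    return cap->context_switch;
}
```

A GC request would then use `request_interrupt`, so frequent GCs no longer raise the context switch rate.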
-
- 29 Nov, 2011 1 commit
-
-
Simon Marlow authored
This means that both time and heap profiling work for parallel programs. Main internal changes:
- CCCS is no longer a global variable; it is now another pseudo-register in the StgRegTable struct. Thus every Capability has its own CCCS.
- There is a new built-in CCS called "IDLE", which records ticks for Capabilities in the idle state. If you profile a single-threaded program with +RTS -N2, you'll see about 50% of time in "IDLE".
- There is appropriate locking in rts/Profiling.c to protect the shared cost-centre-stack data structures.

This patch does enough to get it working, but I have cut one big corner: the cost-centre-stack data structure is still shared amongst all Capabilities, which means that multiple Capabilities will race when updating the "allocations" and "entries" fields of a CCS. Not only does this give unpredictable results, but it runs very slowly due to cache line bouncing. It is strongly recommended that you use -fno-prof-count-entries to disable the "entries" count when profiling parallel programs. (I shall add a note to this effect to the docs).
-
- 25 Nov, 2011 1 commit
-
-
Simon Marlow authored
Terminology cleanup: the type "Ticks" has been renamed "Time", which is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds). The terminology "tick" is now used consistently to mean the interval between timer signals.

The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if we have it). Before it used CPU time in the non-threaded RTS and realtime in the threaded RTS, but I've discovered that the CPU timer has terrible resolution (at least on Linux) and isn't much use for profiling. So now we always use realtime. This should also fix

The default tick interval is now 10ms, except when profiling where we drop it to 1ms. This gives more accurate profiles without affecting runtime too much (<1%).

Lots of cleanups - the resolution of Time is now in one place only (Rts.h) rather than having calculations that depend on the resolution scattered all over the RTS. I hope I found them all.
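Keeping the resolution in one place might look roughly like this. The sketch uses `uint64_t` as a stand-in for StgWord64, and the helpers `ms_to_time`, `time_to_ms` and `default_tick_interval` are hypothetical names rather than the macros defined in Rts.h.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the RTS Time type (an StgWord64 in the commit). */
typedef uint64_t Time;

/* Units of Time per second; nanoseconds, as stated in the commit. */
#define TIME_RESOLUTION 1000000000ULL

/* All conversions go through TIME_RESOLUTION, so changing the
 * resolution means changing exactly one definition. */
static Time     ms_to_time(uint64_t ms) { return ms * (TIME_RESOLUTION / 1000); }
static uint64_t time_to_ms(Time t)      { return t / (TIME_RESOLUTION / 1000); }

/* Default tick interval: 10ms normally, 1ms when profiling. */
static Time default_tick_interval(int profiling)
{
    return ms_to_time(profiling ? 1 : 10);
}
```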
-