- 08 Dec, 2009 1 commit
Simon Marlow authored

- 07 Dec, 2009 1 commit
Simon Marlow authored

- 01 Dec, 2009 1 commit
Simon Marlow authored
This is a batch of refactoring to remove some of the GC's global state, as we move towards CPU-local GC.
- allocateLocal() now allocates large objects into the local nursery, rather than taking a global lock and allocating them in gen 0 step 0.
- allocatePinned() was still allocating from global storage and taking a lock each time; now it uses local storage. (mallocForeignPtrBytes should be faster with -threaded.)
- We had a gen 0 step 0, distinct from the nurseries, which are stored in a separate nurseries[] array. This is slightly strange. I removed the g0s0 global that pointed to gen 0 step 0, and removed all uses of it. I think now we don't use gen 0 step 0 at all, except possibly when there is only one generation. Possibly more tidying up is needed here.
- I removed the global allocate() function, and renamed allocateLocal() to allocate().
- The alloc_blocks global is gone. MAYBE_GC() and doYouWantToGC() now check the local nursery only.
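
The lock-free local allocation described above comes down to bumping a pointer inside memory owned by a single capability. The following is a minimal C sketch of that idea under simplifying assumptions: `Nursery`, `nursery_init` and `nursery_alloc` are hypothetical names, not the RTS's real types or functions, and the block chaining, large-object handling and GC trigger of the real allocate() are reduced to a NULL return.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-capability nursery: one contiguous region owned by a
 * single capability, so allocating from it needs no global lock.
 * This is NOT the RTS's real nursery/bdescr representation. */
typedef struct {
    uint8_t *free;   /* next free byte */
    uint8_t *limit;  /* one past the end of the region */
} Nursery;

static void nursery_init(Nursery *n, uint8_t *base, size_t size)
{
    n->free = base;
    n->limit = base + size;
}

/* Bump-pointer allocation of `bytes` bytes.  Returns NULL when the
 * nursery is exhausted; in this sketch that stands in for the point
 * where the caller would have to request a GC. */
static void *nursery_alloc(Nursery *n, size_t bytes)
{
    if ((size_t)(n->limit - n->free) < bytes)
        return NULL;
    void *p = n->free;
    n->free += bytes;
    return p;
}
```

Because each capability only ever touches its own `Nursery`, the fast path takes no lock at all; contention only returns when a capability runs out of local space.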

- 14 Oct, 2009 1 commit
Simon Marlow authored
While fixing #3578 I noticed that this function was just a field access to StgTRecHeader, so I inlined it manually.

- 25 Sep, 2009 1 commit
Simon Marlow authored
added:

    primop TraceEventOp "traceEvent#" GenPrimOp
       Addr# -> State# s -> State# s
       { Emits an event via the RTS tracing framework. The contents
         of the event is the zero-terminated byte string passed as the first
         argument. The event will be emitted either to the .eventlog file,
         or to stderr, depending on the runtime RTS flags. }

and added the required RTS functionality to support it. Also a bit of refactoring in the RTS tracing code.

- 18 Aug, 2009 1 commit
Simon Marlow authored
There were two bugs, and had it not been for the first one we would not have noticed the second one, so this is quite fortunate. The first bug is in stg_unblockAsyncExceptionszh_ret: when we found a pending exception to raise but didn't end up raising it, there was a missing adjustment to the stack pointer. The second bug was that this case was actually happening at all: it ought to be incredibly rare, because the pending exception thread would have to be killed between us finding it and attempting to raise the exception. This made me suspicious. It turned out that there was a race condition on the tso->flags field; multiple threads were updating this bitmask field non-atomically (one of the bits is the dirty bit for the generational GC). The fix is to move the dirty bit into its own field of the TSO, making the TSO one word larger (sadly).
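
To illustrate why a shared bitmask is a problem: `flags |= bit` is a load, an OR and a store, so two threads setting different bits at the same time can lose one of the updates. Below is a minimal C sketch of the shape of the fix, using a hypothetical `ToyTSO` in place of the real TSO; the struct, flag and function names are illustrative only and do not match the RTS's actual layout.

```c
#include <stdint.h>

/* Hypothetical example flag bit; not a real RTS flag name. */
#define TOY_FLAG_A (1u << 0)

/* Cut-down stand-in for a thread state object. */
typedef struct {
    uint32_t flags;  /* bitmask, updated by read-modify-write */
    uint32_t dirty;  /* dirty bit moved into its own field */
} ToyTSO;

/* Racy if called concurrently: the compiler emits load, OR, store, so a
 * concurrent update to another bit of `flags` can be overwritten. */
static void set_flag_racy(ToyTSO *t, uint32_t bit)
{
    t->flags |= bit;
}

/* After the fix: marking the object dirty is a plain store to a
 * separate field, so it never rewrites the bits held in `flags`. */
static void set_dirty(ToyTSO *t)
{
    t->dirty = 1;
}

static void clear_dirty(ToyTSO *t)
{
    t->dirty = 0;
}
```

The cost of this design is the extra field, which is the "one word larger" trade-off the commit message mentions; an atomic OR on the shared word would be an alternative approach, but it is not what this commit describes.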

- 03 Aug, 2009 1 commit
Simon Marlow authored
For consistency with other RTS exported symbols

- 24 Jun, 2009 1 commit
Simon Marlow authored

- 13 Jun, 2009 1 commit
Duncan Coutts authored

- 10 Jun, 2009 1 commit
Duncan Coutts authored
Using global temp vars is really ugly and in the threaded case it needs slots in the StgRegTable. It'd also be pretty silly once we move the cmm primops out of the rts, into the integer-gmp package.

- 02 Jun, 2009 1 commit
Ian Lynagh authored

- 15 May, 2009 1 commit
Simon Marlow authored

- 13 Mar, 2009 1 commit
Simon Marlow authored
This reduces the latency between a context switch being triggered and the thread returning to the scheduler, which in turn should reduce the cost of the GC barrier when there are many cores. We still retain the old context_switch flag, which is checked at the end of each block of allocation. The idea is that setting HpLim may fail if the target thread is modifying HpLim at the same time; the context_switch flag is a fallback. It also allows us to "context switch soon" without forcing an immediate switch, which can be costly.
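
A toy, single-threaded C model of the two mechanisms, assuming hypothetical names (`ToyCap`, `heap_check`, `request_switch_now`, `request_switch_soon`) that do not correspond to real RTS functions: zeroing the heap limit makes the very next heap check fail, while setting only the flag defers the switch until the thread runs off the end of its current block.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of one capability's allocation state. All names are
 * hypothetical; this is not the RTS's Capability or StgRegTable. */
typedef struct {
    uintptr_t hp;         /* heap pointer: next free address */
    uintptr_t hp_lim;     /* limit tested by every heap check */
    uintptr_t block_end;  /* true end of the current allocation block */
    int context_switch;   /* fallback flag, examined on the slow path */
} ToyCap;

typedef enum { ALLOC_OK, ALLOC_YIELD, ALLOC_NEED_BLOCK } AllocResult;

/* Immediate switch: set the flag and zero hp_lim so that the very
 * next heap check fails. */
static void request_switch_now(ToyCap *c)
{
    c->context_switch = 1;
    c->hp_lim = 0;
}

/* "Context switch soon": set only the flag; the thread yields when it
 * next reaches the slow path, i.e. at the end of its current block. */
static void request_switch_soon(ToyCap *c)
{
    c->context_switch = 1;
}

static AllocResult heap_check(ToyCap *c, size_t bytes)
{
    if (c->hp + bytes <= c->hp_lim) {  /* fast path: fits under limit */
        c->hp += bytes;
        return ALLOC_OK;
    }
    /* Slow path: either hp_lim was zeroed or the block really is full. */
    if (c->context_switch) {
        c->context_switch = 0;
        c->hp_lim = c->block_end;      /* restore the real limit */
        return ALLOC_YIELD;            /* return to the scheduler */
    }
    return ALLOC_NEED_BLOCK;           /* would fetch a new block or GC */
}
```

In this model, even if the write to `hp_lim` were lost in a race, the flag would still force a yield once the block fills, which is the fallback role the commit message gives to context_switch.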

- 11 Mar, 2009 1 commit
Ian Lynagh authored

- 06 Mar, 2009 1 commit
Simon Marlow authored
- add newAlignedPinnedByteArray# for allocating pinned byte arrays with arbitrary alignment
- the old newPinnedByteArray# now aligns to 16 bytes

Foreign.alloca will use newAlignedPinnedByteArray#, and so might end up wasting less space than before (we used to align to 8 by default). Foreign.allocaBytes and Foreign.mallocForeignPtrBytes will get 16-byte aligned memory, which is enough to avoid problems with SSE instructions on x86, for example.

There was a bug in the old newPinnedByteArray#: it aligned to 8 bytes, but would have failed if the header was not a multiple of 8 (fortunately it always was, even with profiling). Also we occasionally wasted some space unnecessarily due to alignment in allocatePinned().

I haven't done anything about Foreign.malloc/mallocBytes, which will give you the same alignment guarantees as malloc() (8 bytes on Linux/x86 here).
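
The alignment requirement can be checked with a little arithmetic: what has to be aligned is the payload, which starts `header` bytes after the start of the object. A hedged C sketch with the hypothetical helpers `payload_padding` and `start_padding`; the second helper is only one plausible reading of the old bug, in which aligning the object start leaves the payload misaligned unless the header size is a multiple of the alignment.

```c
#include <stdint.h>

/* Padding (in bytes) to skip at `free_ptr` so that the payload, which
 * begins `header` bytes into the object, lands on an `align`-byte
 * boundary.  `align` must be a power of two. */
static uintptr_t payload_padding(uintptr_t free_ptr, uintptr_t header,
                                 uintptr_t align)
{
    return (align - ((free_ptr + header) & (align - 1))) & (align - 1);
}

/* Padding that aligns the object start instead.  The payload is then
 * aligned only if `header` is itself a multiple of `align`. */
static uintptr_t start_padding(uintptr_t free_ptr, uintptr_t align)
{
    return (align - (free_ptr & (align - 1))) & (align - 1);
}
```

For example, with `free_ptr = 0x1008`, a 16-byte header and 16-byte alignment, `payload_padding` gives 8, putting the payload at 0x1020. With a 12-byte header, `start_padding(0x1008, 8)` gives 0, and the payload at 0x1014 is only 4-byte aligned.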

- 19 Feb, 2009 2 commits
Simon Marlow authored
Simon Marlow authored

- 27 Jan, 2009 1 commit
SamB authored
In this version, I untag R1 before using it, and even enter R2 at the end rather than simply returning it (which didn't work right when R2 was a thunk).

- 07 Jan, 2009 1 commit
Simon Marlow authored

- 10 Dec, 2008 1 commit
Simon Marlow authored
Patch originally by Ivan Tomac <tomac@pacific.net.au>, amended by Simon Marlow:
- mkWeakFinalizer# commoned up with mkWeakFinalizerEnv#
- GC parameters to ALLOC_PRIM fixed

- 14 Aug, 2008 1 commit
dias@eecs.harvard.edu authored
This merge does not turn on the new codegen (which only compiles a select few programs at this point), but it does introduce some changes to the old code generator.

The high bits:
1. The Rep Swamp patch is finally here. The highlight is that the representation of types at the machine level has changed. Consequently, this patch contains updates across several back ends.
2. The new Stg -> Cmm path is here, although it appears to have a fair number of bugs lurking.
3. Many improvements along the CmmCPSZ path, including:
   o stack layout
   o some code for infotables, half of which is right and half wrong
   o proc-point splitting

- 07 Nov, 2008 1 commit
Simon Marlow authored

- 06 Nov, 2008 2 commits
Simon Marlow authored
lost in patch "Run sparks in batches"
Simon Marlow authored
Significantly reduces the overhead for par, which means that we can make use of parallelism at a much finer granularity.

- 10 Oct, 2008 1 commit
Simon Marlow authored

- 08 Oct, 2008 1 commit
Simon Marlow authored
This should improve scaling when using atomicModifyIORef

- 19 Sep, 2008 1 commit
Simon Marlow authored
Fixes a long-standing bug that could in some cases cause sub-optimal scheduling behaviour.

- 12 Aug, 2008 1 commit
Ross Paterson authored

- 30 Jul, 2008 1 commit
Ian Lynagh authored

- 28 Jul, 2008 1 commit
Simon Marlow authored
When returning an unboxed tuple with a single non-void component, we now use the same calling convention as for returning a value of the same type as that component. This means that the return convention for IO now doesn't vary depending on the platform, which makes some parts of the RTS simpler, and fixes a problem I was having with making the FFI work in unregisterised GHCi (the byte-code compiler makes some assumptions about calling conventions to keep things simple).

- 10 Jul, 2008 2 commits
Simon Marlow authored
Simon Marlow authored
fixes crash with -threaded -debug for me

- 09 Jul, 2008 1 commit
Simon Marlow authored
This showed up as a crash in conc032 for me.

- 17 Jun, 2008 1 commit
Simon Marlow authored

- 03 Jun, 2008 1 commit
Simon Marlow authored

- 16 Apr, 2008 3 commits
simonmarhaskell@gmail.com authored
simonmarhaskell@gmail.com authored
simonmarhaskell@gmail.com authored

- 14 Jun, 2008 1 commit
Ian Lynagh authored

- 26 Apr, 2008 1 commit
Ian Lynagh authored
In delayzh_fast we act as if tickInterval was 50, not 0.