- 11 Mar, 2010 1 commit
Simon Marlow authored
This replaces some complicated locking schemes with message-passing in the implementation of throwTo. The benefits are:
  - previously it was impossible to guarantee that a throwTo from a thread running on one CPU to a thread running on another CPU would be noticed, and we had to rely on the GC to pick up these forgotten exceptions. This no longer happens.
  - the locking regime is simpler (though the code is about the same size)
  - threads can be unblocked from a blocked_exceptions queue without having to traverse the whole queue now. It's a rare case, but replaces an O(n) operation with an O(1) one.
  - generally we move in the direction of sharing less between Capabilities (aka HECs), which will become important with other changes we have planned.
Also in this patch I replaced several STM-specific closure types with a generic MUT_PRIM closure type, which allowed a lot of code in the GC and other places to go away, hence the line-count reduction. The message-passing changes resulted in about a net zero line-count difference.
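The O(1) unblocking mentioned above can be illustrated with a tombstone: instead of unlinking a revoked throwTo message from the target's blocked_exceptions queue (which needs a traversal to find its predecessor), the message is marked dead in place and skipped when the queue is drained. This is a minimal single-threaded C model under that assumption, not the RTS code; the types and names (Message, MSG_NULL, revoke, next_live) are hypothetical, and the cross-Capability message passing itself is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds; a revoked message becomes MSG_NULL. */
typedef enum { MSG_THROWTO, MSG_NULL } MsgKind;

typedef struct Message {
    MsgKind kind;
    int source_thread;         /* thread that called throwTo */
    int exception;             /* stand-in for the exception value */
    struct Message *link;      /* next message in the target's queue */
} Message;

/* Stand-in for a target thread's blocked_exceptions queue. */
typedef struct {
    Message *head, *tail;
} MsgQueue;

/* Append a message in O(1). */
static void enqueue(MsgQueue *q, Message *m) {
    m->link = NULL;
    if (q->tail) q->tail->link = m; else q->head = m;
    q->tail = m;
}

/* Revoke a message in O(1): overwrite its kind instead of unlinking it,
   so no traversal of the queue is needed. */
static void revoke(Message *m) {
    m->kind = MSG_NULL;
}

/* When the target unblocks exceptions it pops messages in order and
   skips tombstoned ones. Returns the next live message, or NULL. */
static Message *next_live(MsgQueue *q) {
    while (q->head) {
        Message *m = q->head;
        q->head = m->link;
        if (!q->head) q->tail = NULL;
        if (m->kind != MSG_NULL) return m;
    }
    return NULL;
}
```

Revoking costs one store regardless of queue length; the price is that dead entries stay in the queue until the target drains it.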
- 03 Aug, 2009 2 commits
Simon Marlow authored
Simon Marlow authored
- 18 Nov, 2008 1 commit
Simon Marlow authored
Eager blackholing can improve parallel performance by reducing the chances that two threads perform the same computation. However, it has a cost: one extra memory write per thunk entry. To get the best results, any code which may be executed in parallel should be compiled with eager blackholing turned on. But since there's a cost for sequential code, we make it optional and turn it on for the parallel package only. It might be a good idea to compile applications (or modules) that contain parallel code with -feager-blackholing. ToDo: document -feager-blackholing.
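A minimal single-threaded model of the trade-off described above, assuming a thunk header that can be in one of three hypothetical states. With eager blackholing, entering a thunk costs one extra write (THUNK to BLACKHOLE), but a second thread entering the same thunk before it is updated sees the blackhole and does not start a duplicate evaluation. The sketch ignores any later (lazy) blackholing the RTS may perform, and all names are illustrative.

```c
#include <assert.h>

/* Hypothetical thunk states for the sketch. */
typedef enum { THUNK, BLACKHOLE, EVALUATED } ThunkState;

typedef struct {
    ThunkState state;
    int value;
} Thunk;

static int memory_writes = 0;    /* extra header writes paid for eager blackholing */
static int computations  = 0;    /* how often the thunk body actually ran */

/* Called when a thread enters a thunk. With eager blackholing the thread
   overwrites the header immediately (one extra write); without it the
   header still says THUNK until the update at the end.
   Returns 1 if this thread should run the computation, 0 if the thunk is
   already claimed (the thread would block) or already evaluated. */
static int enter(Thunk *t, int eager) {
    if (t->state != THUNK)
        return 0;
    if (eager) {
        t->state = BLACKHOLE;
        memory_writes++;
    }
    return 1;
}

/* Finish the computation and update the thunk with its value. */
static void update(Thunk *t, int v) {
    computations++;
    t->value = v;
    t->state = EVALUATED;
}
```

Interleaving two calls to enter before either update shows the difference: two computations without eager blackholing, one computation plus one extra write with it.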
- 31 Oct, 2007 1 commit
Simon Marlow authored
e.g. use +RTS -g2 -RTS for 2 threads. Only major GCs are parallelised; minor GCs are still sequential. Don't use more threads than you have CPUs. It works most of the time, although you won't see much speedup yet. Tuning and more work on stability are still required.
- 11 Oct, 2007 1 commit
Simon Marlow authored
Previously MVars were always on the mutable list of the old generation, which meant every MVar was visited during every minor GC. With lots of MVars hanging around, this gets expensive. We addressed this problem for MUT_VARs (aka IORefs) a while ago; the solution is to use a traditional GC write barrier when the object is modified. This patch does the same thing for MVars. TVars are still done the old way; they could probably benefit from the same treatment too.
- 06 Jun, 2007 1 commit
Michael D. Adams authored
- 28 Feb, 2007 1 commit
Simon Marlow authored
We recently discovered that they aren't a win any more, and just cost code size.
- 07 Oct, 2006 1 commit
tharris@microsoft.com authored
- 07 Sep, 2006 1 commit
Simon Marlow authored
These closure types aren't used/needed, as far as I can tell. The commoning up of Chars/Ints happens by comparing info pointers, and the info table for a dynamic C#/I# is CONSTR_0_1. The RTS seemed a little confused about whether CONSTR_CHARLIKE/CONSTR_INTLIKE were supposed to be static or dynamic closures, too.
- 07 Apr, 2006 1 commit
Simon Marlow authored
Most of the other users of the fptools build system have migrated to Cabal, and with the move to darcs we can now flatten the source tree without losing history, so here goes. The main change is that the ghc/ subdir is gone, and most of what it contained is now at the top level. The build system now makes no pretense at being multi-project, it is just the GHC build system. No doubt this will break many things, and there will be a period of instability while we fix the dependencies. A straightforward build should work, but I haven't yet fixed binary/source distributions. Changes to the Building Guide will follow, too.
- 17 Jan, 2006 2 commits
simonmar authored
Improve the GC behaviour of IORefs (see Ticket #650). This is a small change to the way IORefs interact with the GC, which should improve GC performance for programs with plenty of IORefs. Previously we had a single closure type for mutable variables, MUT_VAR. Mutable variables were *always* on the mutable list in older generations, and always traversed on every GC. Now, we have two closure types: MUT_VAR_CLEAN and MUT_VAR_DIRTY. The latter is on the mutable list, but the former is not. (NB. this differs from MUT_ARR_PTRS_CLEAN and MUT_ARR_PTRS_DIRTY, both of which are on the mutable list.) writeMutVar# now implements a write barrier by calling dirty_MUT_VAR() in the runtime, which changes MUT_VAR_CLEAN into MUT_VAR_DIRTY and adds the variable to the mutable list if necessary. This results in some pretty dramatic speedups for GHC itself. I've just measured a 30% overall speedup compiling a 31-module program (anna) with the default heap settings :-D
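The write barrier described above can be sketched as follows, reusing the two closure type names from the commit message but otherwise with hypothetical structures: only the first write to a clean variable costs anything extra, because it flips the variable to dirty and, if the variable lives in an old generation, puts it on the mutable list. The after_minor_gc helper assumes the collection leaves every listed variable pointing only at old objects, so it can be marked clean again; a real collector would have to check that per object. The 2007 MVar commit describes applying the same pattern to MVars.

```c
#include <assert.h>
#include <stddef.h>

/* Closure types from the commit message; the struct layout is hypothetical. */
typedef enum { MUT_VAR_CLEAN, MUT_VAR_DIRTY } MutVarType;

typedef struct MutVar {
    MutVarType type;
    void *var;                   /* the pointer the IORef holds */
    int in_old_gen;              /* 1 if the object lives in an old generation */
} MutVar;

#define MUT_LIST_MAX 64
static MutVar *mut_list[MUT_LIST_MAX];   /* old-gen objects a minor GC must scan */
static size_t mut_list_len = 0;

/* Write barrier (hypothetical analogue of dirty_MUT_VAR()): only the
   first write to a clean variable does any extra work. */
static void dirty_mut_var(MutVar *mv) {
    if (mv->type == MUT_VAR_CLEAN) {
        mv->type = MUT_VAR_DIRTY;
        if (mv->in_old_gen) {
            assert(mut_list_len < MUT_LIST_MAX);
            mut_list[mut_list_len++] = mv;
        }
    }
}

/* Stand-in for writeMutVar#: barrier first, then the store. */
static void write_mut_var(MutVar *mv, void *new_value) {
    dirty_mut_var(mv);
    mv->var = new_value;
}

/* After a minor GC has scanned the mutable list, assume the variables
   no longer point into the young generation: clean them and empty the list. */
static void after_minor_gc(void) {
    for (size_t i = 0; i < mut_list_len; i++)
        mut_list[i]->type = MUT_VAR_CLEAN;
    mut_list_len = 0;
}
```

Repeated writes to an already-dirty variable skip the list entirely, which is where programs with many rarely-written IORefs save work.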
simonmar authored
Improve the GC behaviour of IOArrays/STArrays (see Ticket #650). This is a small change to the way mutable arrays interact with the GC, that can have a dramatic effect on performance, and make tricks with unsafeThaw/unsafeFreeze redundant. Data.HashTable should be faster now (I haven't measured it yet). We now have two mutable array closure types, MUT_ARR_PTRS_CLEAN and MUT_ARR_PTRS_DIRTY. Both are on the mutable list if the array is in an old generation. writeArray# sets the type to MUT_ARR_PTRS_DIRTY. The garbage collector can set the type to MUT_ARR_PTRS_CLEAN if it finds that no element of the array points into a younger generation (discovering this required a small addition to evacuate(), but rough tests indicate that it doesn't measurably affect performance). NOTE: none of this affects unboxed arrays (IOUArray/STUArray), only boxed arrays (IOArray/STArray). We could go further and extend the DIRTY bit to be per-block rather than for the whole array, but for now this is an easy improvement.
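For arrays the commit keeps both closure types on the mutable list and lets the GC decide when an array can go back to clean. A minimal sketch, assuming (as the names suggest) that a clean array on the mutable list need not have its elements scanned, and modelling each element only by the generation of the object it points to (0 being the youngest). The struct layout, the fixed 8-element capacity, and the function names are hypothetical; evacuation and promotion are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Closure types from the commit message; everything else is hypothetical. */
typedef enum { MUT_ARR_PTRS_CLEAN, MUT_ARR_PTRS_DIRTY } ArrType;

typedef struct {
    ArrType type;
    int gen;             /* generation of the array itself */
    size_t n;            /* number of elements in use (at most 8) */
    int elem_gen[8];     /* generation of the object each element points to */
} MutArr;

static size_t elements_scanned = 0;

/* Stand-in for writeArray#: store, then mark dirty. No mutable-list
   manipulation is needed, since old-generation arrays stay on it. */
static void write_array(MutArr *a, size_t i, int target_gen) {
    assert(i < a->n);
    a->elem_gen[i] = target_gen;
    a->type = MUT_ARR_PTRS_DIRTY;
}

/* What a minor GC might do with an array on the mutable list: skip clean
   arrays, scan dirty ones, and re-mark clean when no element points into
   a younger generation than the array's own. */
static void scavenge_mut_arr(MutArr *a) {
    if (a->type == MUT_ARR_PTRS_CLEAN)
        return;
    int points_younger = 0;
    for (size_t i = 0; i < a->n; i++) {
        elements_scanned++;
        if (a->elem_gen[i] < a->gen)
            points_younger = 1;
    }
    a->type = points_younger ? MUT_ARR_PTRS_DIRTY : MUT_ARR_PTRS_CLEAN;
}
```

The per-array dirty bit means one write forces a scan of the whole array; the per-block refinement the commit mentions would shrink that scan to the blocks actually written.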
- 07 Nov, 2005 1 commit
simonmar authored
Fix some problems with array thawing/freezing and the GC.
- 25 Jul, 2005 1 commit
simonmar authored
Remove the ForeignObj# type, and all its PrimOps. The new efficient representation of ForeignPtr doesn't use ForeignObj# underneath, and there seems no need to keep it.
- 20 Apr, 2005 1 commit
simonmar authored
Update to match ClosureTypes.h
- 18 Nov, 2004 1 commit
tharris authored
Support for atomic memory transactions and associated regression tests conc041-048
- 12 Sep, 2004 1 commit
panne authored
Removed the annoying "Id" CVS keywords, they're a real PITA when it comes to merging...
- 11 Dec, 2002 1 commit
simonmar authored
Merge the eval-apply-branch on to the HEAD
------------------------------------------
This is a change to GHC's evaluation model in order to ultimately make GHC more portable and to reduce complexity in some areas. At some point we'll update the commentary to describe the new state of the RTS. Pending that, the highlights of this change are:
  - No more Su. The Su register is gone, and update frames are one word smaller.
  - Slow-entry points and arg checks are gone. Unknown function calls are handled by automatically-generated RTS entry points (AutoApply.hc, generated by the program in utils/genapply).
  - The stack layout is stricter: there are no "pending arguments" on the stack any more; the stack is always strictly a sequence of stack frames. This means that there's no need for LOOKS_LIKE_GHC_INFO() or LOOKS_LIKE_STATIC_CLOSURE() any more, and GHC doesn't need to know how to find the boundary between the text and data segments (BIG WIN!).
  - A couple of nasty hacks in the mangler caused by the need to identify closure ptrs vs. info tables have gone away.
  - Info tables are a bit more complicated. See InfoTables.h for the details.
  - As a side effect, GHCi can now deal with polymorphic seq. Some bugs in GHCi which affected primitives and unboxed tuples are now fixed.
  - Binary sizes are reduced by about 7% on x86. Performance is roughly similar: some programs get faster while some get slower. I've seen GHCi perform worse on some examples, but haven't investigated further yet (GHCi performance *should* be about the same or better in theory).
  - Internally the code generator is rather better organised. I've moved info-table generation from the NCG into the main codeGen where it is shared with the C back-end; info tables are now emitted as arrays of words in both back-ends. The NCG is one step closer to being able to support profiling.
This has all been fairly thoroughly tested, but no doubt I've messed up the commit in some way.
- 19 Apr, 2002 1 commit
simonmar authored
Update this file to not use ISO C99 labelled initializers - this means it will compile on MacOS/X.
- 14 Aug, 2001 1 commit
sewardj authored
Change the story about POSIX headers in C compilation.
Until now, all C code in the RTS and library cbits has by default been compiled with settings for POSIXness enabled, that is:
    #define _POSIX_SOURCE 1
    #define _POSIX_C_SOURCE 199309L
    #define _ISOC9X_SOURCE
If you wanted to negate this, you'd have to define NON_POSIX_SOURCE before including headers.
This scheme has some bad effects:
  * It means that ccall-unfoldings exported via interfaces from a module compiled with -DNON_POSIX_SOURCE may not compile when imported into a module which does not -DNON_POSIX_SOURCE.
  * It overlaps with the feature tests we do with autoconf.
  * It seems to have caused borkage in the Solaris builds for some considerable period of time.
The New Way is:
  * The default changes to not-being-in-Posix mode.
  * If you want to force a C file into Posix mode, #include the new file ghc/includes/PosixSource.h as the **first** include. Most of the RTS C sources have this include now.
  * NON_POSIX_SOURCE is almost totally expunged. Unfortunately we have to retain some vestiges of it in ghc/compiler so that modules compiled via C on Solaris using older compilers don't break.
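A minimal sketch of opting a C file into Posix mode under the new scheme. The three macros are the ones the commit message lists as the old default; whether PosixSource.h defines exactly these is an assumption, so its contents are inlined here rather than included. The point is the ordering: feature-test macros only take effect if they are defined before the first system header is included, which is why PosixSource.h has to be the first include. nanosleep is used purely as an example of a POSIX.1b (199309L) interface this makes visible.

```c
/* --- assumed contents of a PosixSource.h-style header --- */
#ifndef POSIXSOURCE_H
#define POSIXSOURCE_H
#define _POSIX_SOURCE 1
#define _POSIX_C_SOURCE 199309L
#define _ISOC9X_SOURCE
#endif
/* --- end of header; system headers may only follow from here --- */

#include <assert.h>
#include <time.h>   /* declares nanosleep() when _POSIX_C_SOURCE >= 199309L */

/* Hypothetical helper: sleep for the given number of milliseconds using a
   POSIX.1b call that is only guaranteed to be declared in Posix mode. */
static int sleep_ms(long ms) {
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL);
}
```

Moving the defines below the #include lines would leave the header's choices to whatever the C library defaults to, which is exactly the silent divergence the new rule avoids.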
- 23 Jul, 2001 1 commit
simonmar authored
Add a compacting garbage collector. It isn't enabled by default, as there are still a couple of problems: there's a fallback case I haven't implemented yet, which means it will occasionally bomb out, and speed-wise it's quite a bit slower than the copying collector (about 1.8x slower). Until I can make it go faster, it'll only be useful when you're actually running low on real memory. Use '+RTS -c' to enable it. Oh, and I cleaned up a few things in the RTS while I was there, and fixed one or two possibly real bugs in the existing GC.
- 22 Mar, 2001 1 commit
hwloidl authored
-*- outline -*- Time-stamp: <Thu Mar 22 2001 03:50:16 Stardate: [-30]6365.79 hwloidl>
This commit covers changes in GHC to get GUM (way=mp) and GUM/GdH (way=md) working. It is a merge of my working version of GUM, based on GHC 4.06, with GHC 4.11. Almost all changes are in the RTS (see below).
GUM is reasonably stable; we used the 4.06 version in large-ish programs for recent papers. There are a couple of things I want to change, but nothing urgent.
GUM/GdH has just been merged and needs more testing. I hope to do that in the next few weeks. It works in our working build but needs tweaking to run.
GranSim doesn't work yet (*sigh*). Most of the code should be in, but it needs more debugging.
ToDo: I still want to make the following minor modifications before the release:
  - Better wrapper script for parallel execution [ghc/compiler/main]
  - Update parallel docs: started on it but it's minimal [ghc/docs/users_guide]
  - Clean up [nofib/parallel]: it's a real mess right now (*sigh*)
  - Update visualisation tools (minor things only IIRC) [ghc/utils/parallel]
  - Add a Klingon-English glossary
* RTS:
  Almost all changes are restricted to ghc/rts/parallel and should not interfere with the rest. I only comment on changes outside the parallel dir:
  - Several changes in Schedule.c (scheduling loop, createThreads etc.); these should only affect parallel code
  - Added ghc/rts/hooks/ShutdownEachPEHook.c
  - ghc/rts/Linker.[ch]: GUM doesn't know about Stable Names (ifdefs)!!
  - StgMiscClosures.h: END_TSO_QUEUE etc. are now defined here (from StgMiscClosures.hc); END_ECAF_LIST was missing a leading stg_
  - SchedAPI.h: taskStart is now defined here; it's only a wrapper around scheduleThread now, but might use some init/shutdown later
  - RtsAPI.h: I have nuked the def of rts_evalNothing
* Compiler:
  - ghc/compiler/main/DriverState.hs: added PVM-ish flags to the parallel way; added new ways for parallel ticky profiling and distributed exec
  - ghc/compiler/main/DriverPipeline.hs: added a function run_phase_MoveBinary which is called with way=mp after linking; it moves the bin file into a PVM dir and produces a wrapper script for parallel execution. It may be cleaner to add a MoveBinary phase in DriverPhases.hs, but this way is less intrusive, and MoveBinary probably only makes sense for mp anyway.
* Nofib:
  - nofib/spectral/Makefile, nofib/real/Makefile, ghc/tests/programs/Makefile: modified to skip some tests if HWL_NOFIB_HACK is set; this is only temporary, to record which test programs cause problems in my working build right now
- 02 Mar, 2001 1 commit
simonmar authored
Add a new closure flag, IND, to identify indirections.
- 29 Jan, 2001 1 commit
simonmar authored
Remove the old Hugs CAF code, install our own (minimal, somewhat cryptic, but better commented) CAF reversion story. See Storage.c:newCaf() for the details.
- 13 Jan, 2000 1 commit
hwloidl authored
Merged GUM-4-04 branch into the main trunk. In particular merged GUM and SMP code. Most of the GranSim code in GUM-4-04 still has to be carried over.
- 12 Jan, 2000 1 commit
simonmar authored
mark INDirections as non-sparkable.
- 09 Nov, 1999 1 commit
simonmar authored
A slew of SMP-related changes.
  - New locking scheme for thunks: we now check whether the thunk being entered is in our private allocation area, and if so we don't lock it. Well, that's the upshot. In practice it's a lot more fiddly than that.
  - I/O blocking is handled a bit more sanely now (but still not properly, methinks)
  - deadlock detection is back
  - remove old pre-SMP scheduler code
  - revamp the timing code. We actually get reasonable-looking timing info for SMP programs now.
  - fix a bug in the garbage collector to do with IND_OLDGENs appearing on the mutable list of the old generation.
  - move BDescr() function from rts/BlockAlloc.h to includes/Block.h.
  - move struct generation and struct step into includes/StgStorage.h (sigh)
  - add UPD_IND_NOLOCK for updating with an indirection where locking the black hole is not required.
- 02 Nov, 1999 1 commit
simonmar authored
This commit adds in the current state of our SMP support. Notably, this allows the new way 's' to be built, providing support for running multiple Haskell threads simultaneously on top of any pthreads implementation, the idea being to take advantage of commodity SMP boxes.
Don't expect to get much of a speedup yet; due to the excessive locking required to synchronise access to mutable heap objects, you'll see a slowdown in most cases, even on a UP machine. The best I've seen is a 1.6-1.7x speedup on an example that did no locking (two optimised nfibs in parallel).
  - new RTS -N flag specifies how many pthreads to start.
  - new driver -smp flag tells the driver to use way 's'.
  - new compiler -fsmp option (not for user consumption) tells the compiler not to generate direct jumps to thunk entry code.
  - largely rewritten scheduler
  - _ccall_GC is now done by handing back a "token" to the RTS before executing the ccall; it should now be possible to execute blocking ccalls in the current thread while allowing the RTS to continue running Haskell threads as normal.
  - you can only call thread-safe C libraries from a way 's' build, of course.
Pthread support is still incomplete, and weird things (including deadlocks) are likely to happen.
- 11 May, 1999 1 commit
keithw authored
(this is number 9 of 9 commits to be applied together) Usage verification changes / ticky-ticky changes:
We want to verify that SingleEntry thunks are indeed entered at most once. In order to do this, -ticky / -DTICKY_TICKY turns on eager blackholing. We blackhole with new blackholes: SE_BLACKHOLE and SE_CAF_BLACKHOLE. We will enter one of these if we attempt to enter a SingleEntry thunk twice. Note that CAFs are dealt with by codeGen, and ordinary thunks by the RTS.
We also want to see how many times we enter each Updatable thunk. To this end, we have modified -ticky. When -ticky is on, we update with a permanent indirection, and arrange that when we enter a permanent indirection we count the entry and then convert the indirection to a normal indirection. This gives us a means of counting the number of thunks entered again after the first entry. Obviously this screws up profiling, and so you can't build a ticky and profiling compiler any more.
Also a few other changes that didn't make it into the previous 8 commits, but form a part of this set.
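The counting trick for Updatable thunks can be modelled in a few lines. The closure kinds and names below are hypothetical stand-ins for the permanent and ordinary indirections described above: the first entry of a permanent indirection is counted and downgrades it, so each re-entered thunk is counted once no matter how often it is entered afterwards.

```c
#include <assert.h>

/* Hypothetical closure kinds: a thunk updated under -ticky becomes IND_PERM. */
typedef enum { IND_PERM, IND } IndKind;

typedef struct {
    IndKind kind;
    int value;        /* stand-in for the pointer to the updated value */
} Indirection;

static int reentries = 0;  /* thunks entered again after their first entry */

/* Entering an updated thunk: a permanent indirection means this is the
   first re-entry, so count it and downgrade the closure to an ordinary
   indirection; later entries of the same thunk are not counted again. */
static int enter_ind(Indirection *ind) {
    if (ind->kind == IND_PERM) {
        reentries++;
        ind->kind = IND;
    }
    return ind->value;
}
```

Because the downgrade happens in place, the counter measures how many distinct thunks were re-entered, not how many re-entries occurred in total.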
- 15 Mar, 1999 1 commit
simonm authored
Remove flags field from info tables; create a separate table of flags indexed by the closure type in the RTS.
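A sketch of a flags table indexed by closure type, using a hypothetical subset of closure types and hypothetical flag bits (the real closure types live in ClosureTypes.h). It uses positional rather than ISO C99 labelled initializers, in line with the 2002 commit above that dropped them for MacOS/X, so a compile-time check guards against the table and the enum drifting apart. The IND and not-sparkable bits echo the 2001 and 2000 commits; the exact flag assignments here are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of closure types, in table order. */
typedef enum {
    INVALID_OBJECT,
    CONSTR,
    FUN,
    THUNK,
    IND,
    N_CLOSURE_TYPES
} ClosureType;

/* Hypothetical flag bits. */
#define F_HNF  (1u << 0)   /* in head normal form */
#define F_THU  (1u << 1)   /* is a thunk */
#define F_IND  (1u << 2)   /* is an indirection */
#define F_NS   (1u << 3)   /* not sparkable */

/* One entry per closure type. Positional initializers: the order must
   match the enum exactly, hence the size check below. */
static const uint16_t closure_flags[] = {
    /* INVALID_OBJECT */ 0,
    /* CONSTR         */ F_HNF | F_NS,
    /* FUN            */ F_HNF | F_NS,
    /* THUNK          */ F_THU,
    /* IND            */ F_IND | F_NS,
};

/* Fails to compile (negative array size) if table and enum differ in length. */
typedef char closure_flags_size_check[
    sizeof closure_flags / sizeof closure_flags[0] == N_CLOSURE_TYPES ? 1 : -1];

/* Property lookups are one array index plus a mask. */
static int closure_is_ind(ClosureType t)    { return (closure_flags[t] & F_IND) != 0; }
static int closure_sparkable(ClosureType t) { return (closure_flags[t] & F_NS) == 0; }
```

Keeping the flags out of the info tables means each info table is smaller, at the cost of one extra indirection through the closure type on every lookup.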