- 10 Feb, 2005 1 commit
-
-
simonmar authored
GC changes: instead of threading old-generation mutable lists through objects in the heap, keep them in a separate flat array. This has some advantages:
- the IND_OLDGEN object is now only 2 words, so the minimum size of a THUNK is now 2 words instead of 3. This saves some amount of allocation (about 2% on average according to my measurements), and is more friendly to the cache by squashing objects together more.
- keeping the mutable list separate from the IND object will be necessary for our multiprocessor implementation.
- removing the mut_link field makes the layout of some objects more uniform, leading to less complexity and fewer special cases.
- I also unified the two mutable lists (mut_once_list and mut_list) into a single mutable list, which led to more simplifications in the GC.
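The idea of a separate flat mutable list can be sketched roughly as follows. This is only an illustration under stated assumptions: `Closure`, `MutList`, `mut_list_record` and `mut_list_free` are hypothetical names, the growable `realloc` array is a stand-in for whatever storage the RTS actually uses, and none of this is the code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap object. Note that it carries no
   mut_link field: the list lives outside the object. */
typedef struct Closure { int payload; } Closure;

/* Assumed representation: one growable flat array of object pointers
   per old generation. */
typedef struct {
    Closure **items;
    size_t    len;
    size_t    cap;
} MutList;

/* Record an old-generation object that may now point into a younger
   generation. Returns 0 on success, -1 if memory could not be obtained. */
static int mut_list_record(MutList *ml, Closure *c)
{
    assert(ml != NULL);
    if (ml->len == ml->cap) {
        size_t new_cap = ml->cap ? ml->cap * 2 : 16;
        Closure **p = realloc(ml->items, new_cap * sizeof *p);
        if (p == NULL) return -1;
        ml->items = p;
        ml->cap = new_cap;
    }
    ml->items[ml->len++] = c;
    return 0;
}

/* Release the array and reset the list to empty. */
static void mut_list_free(MutList *ml)
{
    assert(ml != NULL);
    free(ml->items);
    ml->items = NULL;
    ml->len = 0;
    ml->cap = 0;
}
```

Because the list is an array of pointers rather than a chain through the objects, the objects themselves need no link word, which is the space saving the commit message describes.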
-
- 07 Oct, 2004 1 commit
-
-
wolfgang authored
Position Independent Code and Dynamic Linking Support, Part 1

This commit allows generation of position independent code (PIC) that fully supports dynamic linking on Mac OS X and PowerPC Linux. Other platforms are not yet supported, and there is no support for actually linking or using dynamic libraries, so if you use the -fPIC or -dynamic code generation flags, you have to type your (platform-specific) linker command lines yourself.

nativeGen/PositionIndependentCode.hs: New file. Look here for some more comments on how this works.

cmm/CLabel.hs: Add support for DynamicLinkerLabels and PIC base labels, for use inside the NCG. needsCDecl: Case alternative labels now need C decls, see codeGen/CgInfoTbls.hs below for details.

cmm/Cmm.hs: Add CmmPicBaseReg (used in NCG), and CmmLabelDiffOff (used in NCG and for offsets in info tables).

cmm/CmmParse.y: Support offsets in info tables.

cmm/PprC.hs: Support CmmLabelDiffOff. Case alternative labels now need C decls (see codeGen/CgInfoTbls.hs for details), so we need to pprDataExterns for info tables.

cmm/PprCmm.hs: Support CmmLabelDiffOff.

codeGen/CgInfoTbls.hs: No longer store absolute addresses in info tables; instead, we store offsets. Also, for vectored return points, emit the alternatives _after_ the vector table. This is to work around a limitation in Apple's as, which refuses to handle label differences where one label is at the end of a section. Emitting alternatives after vector info tables makes sure this never happens in GHC generated code. Case alternatives now require prototypes in hc code, though (see changes in PprC.hs, CLabel.hs).

main/CmdLineOpts.lhs: Add a new option, -fPIC.

main/DriverFlags.hs: Pass the correct options for PIC to gcc, depending on the platform. Only for powerpc for now.

nativeGen/AsmCodeGen.hs: Many changes... Mac OS X-specific management of import stubs is gone; it is now part of a general mechanism to handle such things for all platforms that need it (Darwin [both ppc and x86], Linux on ppc, and some platforms we don't support). Move cmmToCmm into its own monad which can accumulate a list of imported symbols. Make it call cmmMakeDynamicReference at the right places.

nativeGen/MachCodeGen.hs, nativeGen/MachInstrs.hs, nativeGen/MachRegs.lhs, nativeGen/PprMach.hs, nativeGen/RegAllocInfo.hs: Too many changes to enumerate here, PowerPC specific.

nativeGen/NCGMonad.hs: NatM still tracks imported symbols, as more labels can be created during code generation (float literals, jump tables; on some platforms all data access has to go through the dynamic linking mechanism).

driver/mangler/ghc-asm.lprl: Mangle absolute addresses in info tables to offsets. Correctly pass through GCC-generated PIC for Mac OS X and powerpc linux.

includes/Cmm.h, includes/InfoTables.h, includes/Storage.h, includes/mkDerivedConstants.c, rts/GC.c, rts/GCCompact.c, rts/HeapStackCheck.cmm, rts/Printer.c, rts/RetainerProfile.c, rts/Sanity.c: Adapt to the fact that info tables now contain offsets.

rts/Linker.c: Mac-specific: change machoInitSymbolsWithoutUnderscore to support PIC.
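A rough sketch of what "offsets instead of absolute addresses" can look like at the C level. The commit does not say here what the offsets are relative to, so this sketch assumes they are relative to the table's own address; `InfoTable`, `target_offset`, `info_set_target` and `info_get_target` are hypothetical names, not the definitions in InfoTables.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical info table with one self-relative reference.
   Assumption: the stored value is (target address - table address). */
typedef struct {
    intptr_t target_offset;
} InfoTable;

/* Store a reference as an offset from the table itself. */
static void info_set_target(InfoTable *it, const void *target)
{
    assert(it != NULL);
    it->target_offset = (intptr_t)((uintptr_t)target - (uintptr_t)it);
}

/* Recover the absolute address by adding the offset back to the
   table's own address. */
static const void *info_get_target(const InfoTable *it)
{
    assert(it != NULL);
    return (const void *)((uintptr_t)it + (uintptr_t)it->target_offset);
}
```

Since the stored value does not depend on where the image is loaded, a table of this shape needs no load-time relocation, which is the property position independent code is after.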
-
- 13 Aug, 2004 2 commits
- 12 Nov, 2003 1 commit
-
-
sof authored
Tweaks to have RTS (C) sources compile with MSVC. Apart from wibbles related to the handling of 'inline', changed Schedule.h:POP_RUN_QUEUE() not to use expression-level statement blocks.
-
- 22 Apr, 2003 1 commit
-
-
simonmar authored
Fix an obscure bug: the most general kind of heap check, HEAP_CHECK_GEN(), is supposed to save the contents of *every* register known to the STG machine (used in cases where we either can't figure out which ones are live, or doing so would be too much hassle). The problem is that it wasn't saving the L1 register. A slight complication arose in that saving the L1 register pushed the size of the frame over the 16 words allowed for the size of the bitmap stored in the frame, so I changed the layout of the frame a bit. Describing all the registers using a single bitmap is overkill when only 8 of them can actually be pointers, so now the bitmap is only 8 bits long and we always skip over a fixed number of non-ptr words to account for all the non-ptr regs. This is all described in StgMacros.h.
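A minimal sketch of the "8-bit bitmap plus a fixed run of non-pointer words" layout described above. Several details are assumptions made for illustration only: the names are hypothetical, the count of non-pointer words is invented (the commit gives none), and the convention that a clear bit marks a pointer slot is chosen for this sketch. StgMacros.h is the authoritative description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    PTR_REG_SLOTS    = 8,  /* from the commit: only 8 registers can be pointers */
    NONPTR_REG_WORDS = 10  /* hypothetical value, not taken from the commit */
};

/* Write the indices of the pointer slots among the first PTR_REG_SLOTS
   words of the frame into out_slots and return how many there are.
   Assumed convention: a clear bit means the slot holds a pointer. */
static size_t gen_frame_pointer_slots(uint8_t bitmap,
                                      size_t out_slots[PTR_REG_SLOTS])
{
    size_t n = 0;
    assert(out_slots != NULL);
    for (size_t i = 0; i < PTR_REG_SLOTS; i++) {
        if (((bitmap >> i) & 1u) == 0) {
            out_slots[n++] = i;
        }
    }
    return n;
}

/* The non-pointer registers need no bitmap at all: a walker skips a
   fixed number of words after the bitmap-described slots. */
static size_t gen_frame_payload_words(void)
{
    return (size_t)PTR_REG_SLOTS + (size_t)NONPTR_REG_WORDS;
}
```

With this shape the bitmap width no longer grows when another non-pointer register is saved; only the fixed skip count changes.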
-
- 27 Mar, 2003 1 commit
-
-
simonmar authored
Two performance tweaks:
- Use specialised indirections, which perform the right kind of return without needing to enter the object they point to. This saves a small percentage of memory reads.
- Tweak the update code to generate better code with gcc. This saves a few instructions per update.
-
- 26 Mar, 2003 1 commit
-
-
sof authored
wibbles - drop references to PleaseStopAllocating(), use CloseNursery() to express ExtendNursery()
-
- 24 Mar, 2003 1 commit
-
-
simonmar authored
Fix some bugs in compacting GC.

Bug 1: When threading the fields of an AP or PAP, we were grabbing the info table of the function without unthreading it first.

Bug 2: eval_thunk_selector() might accidentally find itself in to-space when going through indirections in a compacted generation. We must check for this case and bale out if necessary.

Bug 3: This is somewhat more nasty. When we have an AP or PAP that points to a BCO, the layout info for the AP/PAP is in the BCO's instruction array, which is two objects deep from the AP/PAP itself. The trouble is, during compacting GC, we can only safely look one object deep from the current object, because pointers from objects any deeper might have been already updated to point to their final destinations. The solution is to put the arity and bitmap info for a BCO into the BCO object itself. This means BCOs become variable-length, which is a slight annoyance, but it also means that looking up the arity/bitmap is quicker. There is a slight reduction in complexity in the byte code generator due to not having to stuff the bitmap at the front of the instruction stream.
-
- 21 Mar, 2003 1 commit
-
-
sof authored
Friday morning code-wibbling: - made RetainerProfile.c:firstStack a 'static' - added RetainerProfile.c:retainerStackBlocks()
-
- 13 Dec, 2002 1 commit
-
-
simonmar authored
Fix bug in stack_frame_sizeW
-
- 11 Dec, 2002 1 commit
-
-
simonmar authored
Merge the eval-apply-branch on to the HEAD
------------------------------------------

This is a change to GHC's evaluation model in order to ultimately make GHC more portable and to reduce complexity in some areas. At some point we'll update the commentary to describe the new state of the RTS. Pending that, the highlights of this change are:
- No more Su. The Su register is gone, update frames are one word smaller.
- Slow-entry points and arg checks are gone. Unknown function calls are handled by automatically-generated RTS entry points (AutoApply.hc, generated by the program in utils/genapply).
- The stack layout is stricter: there are no "pending arguments" on the stack any more, the stack is always strictly a sequence of stack frames. This means that there's no need for LOOKS_LIKE_GHC_INFO() or LOOKS_LIKE_STATIC_CLOSURE() any more, and GHC doesn't need to know how to find the boundary between the text and data segments (BIG WIN!).
- A couple of nasty hacks in the mangler caused by the need to identify closure ptrs vs. info tables have gone away.
- Info tables are a bit more complicated. See InfoTables.h for the details.
- As a side effect, GHCi can now deal with polymorphic seq. Some bugs in GHCi which affected primitives and unboxed tuples are now fixed.
- Binary sizes are reduced by about 7% on x86. Performance is roughly similar, some programs get faster while some get slower. I've seen GHCi perform worse on some examples, but haven't investigated further yet (GHCi performance *should* be about the same or better in theory).
- Internally the code generator is rather better organised. I've moved info-table generation from the NCG into the main codeGen where it is shared with the C back-end; info tables are now emitted as arrays of words in both back-ends. The NCG is one step closer to being able to support profiling.

This has all been fairly thoroughly tested, but no doubt I've messed up the commit in some way.
-
- 21 Oct, 2002 1 commit
-
-
simonmar authored
Bite the bullet and generalise the central memory allocation scheme.

Previously we tried to allocate memory starting from a fixed address, which was set for each architecture (0x5000000 was a common one), and to decide whether a particular address was in the heap or not we would do a simple comparison against this address. This doesn't work too well, because:
- if we dynamically-load some objects above the boundary, the heap-allocated test becomes invalid
- on windows we have less control, and the heap might be split into multiple sections
- it turns out that on some Linux kernels we don't get memory where we asked for it. This might be a bug in those kernels, but it exposes the fragility of our allocation scheme.

The solution is to bite the bullet and maintain a table mapping addresses to a value indicating whether that address is in the heap or not. Since we normally allocate heap in chunks of 1Mb, the table is quite small: 4k on a 32-bit machine, using one byte for each 1Mb block. Testing an address for heap residency now involves a memory access, but the table is normally cache-resident. I didn't manage to measure any slowdown after making the change. On a 64-bit machine, we'll need to use a 2-level table; I haven't implemented that yet.

Now we can generalise the procedure used to grab memory from the OS. In the general case, we allocate one megablock more than we need to, and trim off the slop around the allocation to leave an aligned chunk. The next time around, however, we try to allocate memory right after the last chunk allocated, on the grounds that it is aligned and probably free: if this doesn't work, we have to back off to the general mechanism (it seems to work most of the time).

This cleans up the Windows story too: is_heap_alloced() has gone, and we should be able to handle more than 256M of memory (or whatever the arbitrary limit was before).

MERGE TO STABLE (after lots of testing)
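The table arithmetic described above can be sketched as follows for a 32-bit address space: 2^32 bytes divided into 2^20-byte megablocks gives 4096 entries of one byte each, i.e. the 4k table the message mentions. All names here (`mblock_map`, `mark_heap_alloced`, `is_heap_address`, `mblock_round_up`) are hypothetical and chosen for the illustration; this is not the RTS code.

```c
#include <assert.h>
#include <stdint.h>

#define MBLOCK_SHIFT        20u                   /* 1Mb megablocks */
#define MBLOCK_SIZE         (1u << MBLOCK_SHIFT)
#define MBLOCK_MAP_ENTRIES  4096u                 /* 2^32 / 2^20 */

/* One byte per megablock of a 32-bit address space: 4k in total. */
static uint8_t mblock_map[MBLOCK_MAP_ENTRIES];

/* Mark the megablock containing addr as belonging to the heap. */
static void mark_heap_alloced(uint32_t addr)
{
    uint32_t idx = addr >> MBLOCK_SHIFT;
    assert(idx < MBLOCK_MAP_ENTRIES);
    mblock_map[idx] = 1;
}

/* Heap residency test: a single table lookup. */
static int is_heap_address(uint32_t addr)
{
    return mblock_map[addr >> MBLOCK_SHIFT] != 0;
}

/* Round an address up to the next megablock boundary. This is the
   "trim off the slop" step when an over-sized, unaligned region has
   been obtained. Addresses within one megablock of the top of the
   address space would wrap and are not handled in this sketch. */
static uint32_t mblock_round_up(uint32_t addr)
{
    return (addr + MBLOCK_SIZE - 1u) & ~(MBLOCK_SIZE - 1u);
}
```

For example, a region returned at 0x05012345 would be trimmed to start at 0x05100000, and after marking that megablock every address from 0x05100000 to 0x051FFFFF tests as heap-resident.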
-
- 26 Mar, 2002 2 commits
-
-
sof authored
TEXT_BEFORE_HEAP & cygwin: same as for mingw
-
simonmar authored
A couple of cleanups to the previous change: we should test TABLES_NEXT_TO_CODE rather than USE_MINIINTERPRETER to enable the MacOSX "plan C", and use structure field selection rather than array indexing to get the entry code ptr from the info table.
-
- 21 Mar, 2002 1 commit
-
-
sebc authored
Implement Plan C, with correct code to detect the data and text sections for MacOS X. Also add a sanity check in initStorage, to make sure we are able to make the distinction between closures and infotables.
-
- 14 Feb, 2002 1 commit
-
-
sof authored
widen the scope of is_heap_alloced() proto; for all mingw builds
-
- 04 Feb, 2002 1 commit
-
-
sof authored
- sm_mutex is now a Mutex (not a pthread_mutex_t). - sm_mutex lock/unlocks are only done for SMP builds.
-
- 01 Feb, 2002 1 commit
-
-
simonmar authored
When distinguishing between code & data pointers, rather than testing for membership of the text section, test for non-membership of any of the data sections. The reason for this change is that testing for membership of the text section was fragile: we could only test whether a value was smaller than the end address, because there doesn't appear to be a portable way to find the beginning of the text section. Indeed, the test breaks on very recent Linux kernels which mmap() memory below the program text. In fact, the reversed test may be faster because the expected common case is when the pointer is into the dynamic heap, and we eliminate this case immediately in the new test. A quick test shows no measurable performance difference with the change. MERGE TO STABLE
-
- 25 Jan, 2002 1 commit
-
-
simonmar authored
Fix bit-rot in TICKY_TICKY
-
- 22 Nov, 2001 1 commit
-
-
simonmar authored
Retainer Profiling / Lag-drag-void profiling.

This is mostly work by Sungwoo Park, who spent a summer internship at MSR Cambridge this year implementing these two types of heap profiling in GHC. Relative to Sungwoo's original work, I've made some improvements to the code:
- it's now possible to apply constraints to retainer and LDV profiles in the same way as we do for other types of heap profile (eg. +RTS -hc{foo,bar} -hR -RTS gives you a retainer profile considering only closures with cost centres 'foo' and 'bar').
- the heap-profile timer implementation is cleaned up.
- heap profiling no longer has to be run in a two-space heap.
- general cleanup of the code and application of the SDM C coding style guidelines.

Profiling will be a little slower and require more space than before, mainly because closures have an extra header word to support either retainer profiling or LDV profiling (you can't do both at the same time). We've used the new profiling tools on GHC itself, with moderate success. Fixes for some space leaks in GHC to follow...
-
- 08 Aug, 2001 1 commit
-
-
simonmar authored
Had a brainwave on the way to work this morning, and realised that the garbage collector can handle "pinned objects" as long as they don't contain any pointers. This is absolutely ideal for doing temporary allocation in the FFI, because what we really want to do is allocate a pinned ByteArray and let the GC clean it up later. So this set of changes adds the required framework.

There are two new primops:

    newPinnedByteArray# :: Int# -> State# s -> (# State# s, MutByteArr# s #)
    byteArrayContents# :: ByteArr# -> Addr#

Obviously byteArrayContents# is highly unsafe. Allocating a pinned ByteArr# isn't the default, because a pinned ByteArr# will hold an entire block (currently 4k) live until it is garbage collected (that doesn't mean each pinned ByteArr# requires 4k of storage, just that if a block contains a single live pinned ByteArray, the whole block must be retained).
-
- 24 Jul, 2001 1 commit
-
-
ken authored
Innocent changes to resurrect/add 64-bit support.
-
- 23 Jul, 2001 2 commits
-
-
simonmar authored
Add a compacting garbage collector. It isn't enabled by default, as there are still a couple of problems: there's a fallback case I haven't implemented yet which means it will occasionally bomb out, and speed-wise it's quite a bit slower than the copying collector (about 1.8x slower). Until I can make it go faster, it'll only be useful when you're actually running low on real memory. '+RTS -c' to enable it. Oh, and I cleaned up a few things in the RTS while I was there, and fixed one or two possibly real bugs in the existing GC.
-
simonmar authored
Small changes to improve GC performance slightly: - store the generation *number* in the block descriptor rather than a pointer to the generation structure, since the most common operation is to pull out the generation number, and it's one less indirection this way. - cache the generation number in the step structure too, which avoids an extra indirection in several places.
-
- 03 May, 2001 1 commit
-
-
simonmar authored
silence gcc 2.96 warning
-
- 02 Mar, 2001 2 commits
-
-
simonmar authored
ASSERT in updateWithIndirection() that we haven't already updated this object with an indirection, and fix two places in the RTS where this could happen. The problem only occurs when we're in a black-hole-style loop, and there are multiple update frames on the stack pointing to the same object (this is possible because of lazy black-holing). Both stack squeezing and asynchronous exception raising walk down the stack and remove update frames, updating their contents with indirections. If we don't protect against multiple updates, the mutable list in the old generation may get into a bogus state.
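The invariant being asserted can be sketched as below. The closure representation, the `ClosureKind` tags and the `update_once` helper are all hypothetical and exist only for this illustration; the real updateWithIndirection() works on RTS closures and also maintains the mutable list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified closure representation. */
typedef enum { THUNK, CONSTR, IND } ClosureKind;

typedef struct Closure {
    ClosureKind     kind;
    struct Closure *indirectee;  /* meaningful only when kind == IND */
} Closure;

/* Overwrite updatee with an indirection to target. The assertions
   encode the invariant: an object is never updated twice, and never
   with an indirection to itself. */
static void update_with_indirection(Closure *updatee, Closure *target)
{
    assert(updatee != NULL && target != NULL);
    assert(updatee->kind != IND);
    assert(updatee != target);
    updatee->kind = IND;
    updatee->indirectee = target;
}

/* A stack walk that may meet several update frames for the same object
   can guard the call. Returns 1 if the update happened, 0 if the
   object had already been updated. */
static int update_once(Closure *updatee, Closure *target)
{
    if (updatee->kind == IND) {
        return 0;
    }
    update_with_indirection(updatee, target);
    return 1;
}
```

A guard of this kind is one way a caller such as stack squeezing could avoid performing the second update when two frames refer to the same object.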
-
simonmar authored
Add some ASSERT()s so we can catch updates where updatee==target.
-
- 11 Feb, 2001 1 commit
-
-
simonmar authored
Bite the bullet and make GHCi support non-optional in the RTS. GHC 4.11 should be able to build GHCi without any additional tweaks now. - the Linker is split into two parts: LinkerBasic.c, containing the routines required by the rest of the RTS, and Linker.c, containing the linker proper, which is not referred to from the rest of the RTS. Only Linker.c requires -ldl, so programs which don't make use of the linker (everything except GHC, in other words) won't need -ldl.
-
- 09 Feb, 2001 2 commits
- 08 Feb, 2001 1 commit
-
-
simonmar authored
Fix bitrot in SMP code.
-
- 29 Jan, 2001 1 commit
-
-
simonmar authored
Remove the old Hugs CAF code, install our own (minimal, somewhat cryptic, but better commented) CAF reversion story. See Storage.c:newCaf() for the details.
-
- 26 Jan, 2001 2 commits
- 24 Jan, 2001 1 commit
-
-
simonmar authored
Add a CAF list for GHCI. Retaining all looked-up symbols in a list in the interpreter was the Wrong Thing To Do, since we can't guarantee that the transitive closure of this list points to all the CAFs so far evaluated (the transitive closure gets smaller as reachable CAFs are evaluated). A Better Thing To Do is just to retain all the CAFs. A refinement is to only retain all CAFs in dynamically linked code, which is what this patch implements.
-
- 09 Jan, 2001 1 commit
-
-
sewardj authored
Various bug fixes for the interpreter/byte-code-gen combination.
-
- 19 Dec, 2000 1 commit
-
-
simonmar authored
Remove setHeapSize, we'll do this in the compiler proper now.
-
- 11 Dec, 2000 1 commit
-
-
simonmar authored
- update representation of BCOs - add setHeapSize for use from within GHC
-
- 04 Dec, 2000 1 commit
-
-
simonmar authored
merge recent changes from before-ghci-branch onto the HEAD
-