- 20 Oct, 2014 1 commit
-
-
Edward Z. Yang authored
This reverts commit 35672072. Conflicts: compiler/main/DriverPipeline.hs
-
- 02 Oct, 2014 1 commit
-
-
Edward Z. Yang authored
Summary: In preparation for indirecting all references to closures, we rename _closure to _static_closure to ensure any old code will get an undefined symbol error. In order to reference a closure foobar_closure (which is now undefined), you should instead use STATIC_CLOSURE(foobar). For convenience, a number of these old identifiers are macro'd. Across C-- and C (Windows and otherwise), there were differing conventions on whether or not foobar_closure or &foobar_closure was the address of the closure. Now, all foobar_closure references are addresses, and no & is necessary. CHARLIKE/INTLIKE were not changed, simply alpha-renamed. Part of remove HEAP_ALLOCED patch set (#8199) Depends on D265 Signed-off-by:
Edward Z. Yang <ezyang@mit.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D267 GHC Trac Issues: #8199
-
- 26 Sep, 2014 1 commit
-
-
thomie authored
Summary: More thorough version of a75383cd Test Plan: change of comments only [skip ci] Reviewers: austin, simonmar, ekmett Reviewed By: austin, ekmett Subscribers: simonmar, ezyang, carter Differential Revision: https://phabricator.haskell.org/D239
-
- 04 Apr, 2014 1 commit
-
-
Simon Marlow authored
-
- 21 Oct, 2013 1 commit
-
-
Edward Z. Yang authored
Signed-off-by:
Edward Z. Yang <ezyang@mit.edu>
-
- 01 Oct, 2013 1 commit
-
-
Simon Marlow authored
We were passing the function address to stg_gc_prim_p in R9, which was wrong because the call was a high-level call and didn't declare R9 as a parameter. Passing R9 as an argument is the right way, but unfortunately that exposed another bug: we were using the same macro in some low-level Cmm, where it is illegal to call functions with arguments (see Note [syntax of cmm files]). So we now have low-level variants of STK_CHK() and STK_CHK_P() for use in low-level Cmm code.
-
- 29 Mar, 2013 1 commit
-
-
nfrisby authored
* the new StgCmmArgRep module breaks a dependency cycle; I also untabified it, but made no real changes * updated the documentation in the wiki and change the user guide to point there * moved the allocation enters for ticky and CCS to after the heap check * I left LDV where it was, which was before the heap check at least once, since I have no idea what it is * standardized all (active?) ticky alloc totals to bytes * in order to avoid double counting StgCmmLayout.adjustHpBackwards no longer bumps ALLOC_HEAP_ctr * I resurrected the SLOW_CALL counters * the new module StgCmmArgRep breaks cyclic dependency between Layout and Ticky (which the SLOW_CALL counters cause) * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL * added ALLOC_RTS_ctr and _tot ticky counters * eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap_*_info * resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and ALLOC_PRIM * added -ticky and -DTICKY_TICKY in ways.mk for debug ways * added a ticky counter for total LNE entries * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE * all off by default * -ticky-allocd: tracks allocation *of* closure in addition to allocation *by* that closure * -ticky-dyn-thunk tracks dynamic thunks as if they were functions * -ticky-LNE tracks LNEs as if they were functions * updated the ticky report format, including making the argument categories (more?) accurate again * the printed name for things in the report include the unique of their ticky parent as well as if they are not top-level
-
- 16 Nov, 2012 1 commit
-
-
Simon Marlow authored
This improves GC performance when there are a lot of TVars in the heap. For instance, a TChan with a lot of elements causes a massive GC drag without this patch. There's more to do - several other STM closure types don't have write barriers, so GC performance when there are a lot of threads blocked on STM isn't great. But fixing the problem for TVar is a good start.
-
- 05 Nov, 2012 1 commit
-
-
Simon Marlow authored
-
- 19 Oct, 2012 1 commit
-
-
Simon Marlow authored
-
- 08 Oct, 2012 1 commit
-
-
Simon Marlow authored
The main change here is that the Cmm parser now allows high-level cmm code with argument-passing and function calls. For example: foo ( gcptr a, bits32 b ) { if (b > 0) { // we can make tail calls passing arguments: jump stg_ap_0_fast(a); } return (x,y); } More details on the new cmm syntax are in Note [Syntax of .cmm files] in CmmParse.y. The old syntax is still more-or-less supported for those occasional code fragments that really need to explicitly manipulate the stack. However there are a couple of differences: it is now obligatory to give a list of live GlobalRegs on every jump, e.g. jump %ENTRY_CODE(Sp(0)) [R1]; Again, more details in Note [Syntax of .cmm files]. I have rewritten most of the .cmm files in the RTS into the new syntax, except for AutoApply.cmm which is generated by the genapply program: this file could be generated in the new syntax instead and would probably be better off for it, but I ran out of enthusiasm. Some other changes in this batch: - The PrimOp calling convention is gone, primops now use the ordinary NativeNodeCall convention. This means that primops and "foreign import prim" code must be written in high-level cmm, but they can now take more than 10 arguments. - CmmSink now does constant-folding (should fix #7219) - .cmm files now go through the cmmPipeline, and as a result we generate better code in many cases. All the object files generated for the RTS .cmm files are now smaller. Performance should be better too, but I haven't measured it yet. - RET_DYN frames are removed from the RTS, lots of code goes away - we now have some more canned GC points to cover unboxed-tuples with 2-4 pointers, which will reduce code size a little.
-
- 29 Nov, 2011 1 commit
-
-
Simon Marlow authored
This means that both time and heap profiling work for parallel programs. Main internal changes: - CCCS is no longer a global variable; it is now another pseudo-register in the StgRegTable struct. Thus every Capability has its own CCCS. - There is a new built-in CCS called "IDLE", which records ticks for Capabilities in the idle state. If you profile a single-threaded program with +RTS -N2, you'll see about 50% of time in "IDLE". - There is appropriate locking in rts/Profiling.c to protect the shared cost-centre-stack data structures. This patch does enough to get it working, I have cut one big corner: the cost-centre-stack data structure is still shared amongst all Capabilities, which means that multiple Capabilities will race when updating the "allocations" and "entries" fields of a CCS. Not only does this give unpredictable results, but it runs very slowly due to cache line bouncing. It is strongly recommended that you use -fno-prof-count-entries to disable the "entries" count when profiling parallel programs. (I shall add a note to this effect to the docs).
-
- 16 Nov, 2011 1 commit
-
-
Simon Marlow authored
-
- 14 Nov, 2011 1 commit
-
-
Simon Marlow authored
-
- 02 Nov, 2011 1 commit
-
-
Simon Marlow authored
User visible changes ==================== Profilng -------- Flags renamed (the old ones are still accepted for now): OLD NEW --------- ------------ -auto-all -fprof-auto -auto -fprof-exported -caf-all -fprof-cafs New flags: -fprof-auto Annotates all bindings (not just top-level ones) with SCCs -fprof-top Annotates just top-level bindings with SCCs -fprof-exported Annotates just exported bindings with SCCs -fprof-no-count-entries Do not maintain entry counts when profiling (can make profiled code go faster; useful with heap profiling where entry counts are not used) Cost-centre stacks have a new semantics, which should in most cases result in more useful and intuitive profiles. If you find this not to be the case, please let me know. This is the area where I have been experimenting most, and the current solution is probably not the final version, however it does address all the outstanding bugs and seems to be better than GHC 7.2. Stack traces ------------ +RTS -xc now gives more information. If the exception originates from a CAF (as is common, because GHC tends to lift exceptions out to the top-level), then the RTS walks up the stack and reports the stack in the enclosing update frame(s). Result: +RTS -xc is much more useful now - but you still have to compile for profiling to get it. I've played around a little with adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem quite accurately. I plan to add more facilities for stack tracing (e.g. in GHCi) in the future. Coverage (HPC) -------------- * derived instances are now coloured yellow if they weren't used * likewise record field names * entry counts are more accurate (hpc --fun-entry-count) * tab width is now correct (markup was previously off in source with tabs) Internal changes ================ In Core, the Note constructor has been replaced by Tick (Tickish b) (Expr b) which is used to represent all the kinds of source annotation we support: profiling SCCs, HPC ticks, and GHCi breakpoints. Depending on the properties of the Tickish, different transformations apply to Tick. See CoreUtils.mkTick for details. Tickets ======= This commit closes the following tickets, test cases to follow: - Close #2552: not a bug, but the behaviour is now more intuitive (test is T2552) - Close #680 (test is T680) - Close #1531 (test is result001) - Close #949 (test is T949) - Close #2466: test case has bitrotted (doesn't compile against current version of vector-space package)
-
- 01 Sep, 2011 1 commit
-
-
Simon Marlow authored
maskUninterruptible state instead of ordinary mask, due to a misinterpretation of the way the TSO_INTERRUPTIBLE flag works. Remarkably this must have been broken for quite some time. Indeed we even had a test that demonstrated the wrong behaviour (conc015a) but presumably I didn't look hard enough at the output to notice that it was wrong.
-
- 15 Dec, 2010 1 commit
-
-
Simon Marlow authored
This patch makes two changes to the way stacks are managed: 1. The stack is now stored in a separate object from the TSO. This means that it is easier to replace the stack object for a thread when the stack overflows or underflows; we don't have to leave behind the old TSO as an indirection any more. Consequently, we can remove ThreadRelocated and deRefTSO(), which were a pain. This is obviously the right thing, but the last time I tried to do it it made performance worse. This time I seem to have cracked it. 2. Stacks are now represented as a chain of chunks, rather than a single monolithic object. The big advantage here is that individual chunks are marked clean or dirty according to whether they contain pointers to the young generation, and the GC can avoid traversing clean stack chunks during a young-generation collection. This means that programs with deep stacks will see a big saving in GC overhead when using the default GC settings. A secondary advantage is that there is much less copying involved as the stack grows. Programs that quickly grow a deep stack will see big improvements. In some ways the implementation is simpler, as nothing special needs to be done to reclaim stack as the stack shrinks (the GC just recovers the dead stack chunks). On the other hand, we have to manage stack underflow between chunks, so there's a new stack frame (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects. The total amount of code is probably about the same as before. There are new RTS flags: -ki<size> Sets the initial thread stack size (default 1k) Egs: -ki4k -ki2m -kc<size> Sets the stack chunk size (default 32k) -kb<size> Sets the stack chunk buffer size (default 1k) -ki was previously called just -k, and the old name is still accepted for backwards compatibility. These new options are documented.
-
- 23 Oct, 2010 1 commit
-
-
Ian Lynagh authored
-
- 24 Sep, 2010 1 commit
-
-
Simon Marlow authored
-
- 08 Jul, 2010 1 commit
-
-
Simon Marlow authored
As discussed on the libraries/haskell-cafe mailing lists http://www.haskell.org/pipermail/libraries/2010-April/013420.html This is a replacement for block/unblock in the asychronous exceptions API to fix a problem whereby a function could unblock asynchronous exceptions even if called within a blocked context. The new terminology is "mask" rather than "block" (to avoid confusion due to overloaded meanings of the latter). In GHC, we changed the names of some primops: blockAsyncExceptions# -> maskAsyncExceptions# unblockAsyncExceptions# -> unmaskAsyncExceptions# asyncExceptionsBlocked# -> getMaskingState# and added one new primop: maskUninterruptible# See the accompanying patch to libraries/base for the API changes.
-
- 05 May, 2010 1 commit
-
-
Simon Marlow authored
-
- 11 Mar, 2010 1 commit
-
-
Simon Marlow authored
This replaces some complicated locking schemes with message-passing in the implementation of throwTo. The benefits are - previously it was impossible to guarantee that a throwTo from a thread running on one CPU to a thread running on another CPU would be noticed, and we had to rely on the GC to pick up these forgotten exceptions. This no longer happens. - the locking regime is simpler (though the code is about the same size) - threads can be unblocked from a blocked_exceptions queue without having to traverse the whole queue now. It's a rare case, but replaces an O(n) operation with an O(1). - generally we move in the direction of sharing less between Capabilities (aka HECs), which will become important with other changes we have planned. Also in this patch I replaced several STM-specific closure types with a generic MUT_PRIM closure type, which allowed a lot of code in the GC and other places to go away, hence the line-count reduction. The message-passing changes resulted in about a net zero line-count difference.
-
- 14 Oct, 2009 1 commit
-
-
Simon Marlow authored
While fixing #3578 I noticed that this function was just a field access to StgTRecHeader, so I inlined it manually.
-
- 18 Aug, 2009 1 commit
-
-
Simon Marlow authored
There were two bugs, and had it not been for the first one we would not have noticed the second one, so this is quite fortunate. The first bug is in stg_unblockAsyncExceptionszh_ret, when we found a pending exception to raise, but don't end up raising it, there was a missing adjustment to the stack pointer. The second bug was that this case was actually happening at all: it ought to be incredibly rare, because the pending exception thread would have to be killed between us finding it and attempting to raise the exception. This made me suspicious. It turned out that there was a race condition on the tso->flags field; multiple threads were updating this bitmask field non-atomically (one of the bits is the dirty-bit for the generational GC). The fix is to move the dirty bit into its own field of the TSO, making the TSO one word larger (sadly).
-
- 03 Aug, 2009 2 commits
-
-
Simon Marlow authored
For consistency with other RTS exported symbols
-
Simon Marlow authored
amazing this hasn't caused any problems before now
-
- 16 Jun, 2009 1 commit
-
-
Simon Marlow authored
See comments for details
-
- 10 Jun, 2009 1 commit
-
-
Duncan Coutts authored
It using the mp_tmp_w register/global as a convenient temporary variable. This is naughty because those vars are supposed to be for gmp. Also, we want to remove the gmp temp vars so we must now use a local stack slot instead.
-
- 14 Aug, 2008 1 commit
-
-
dias@eecs.harvard.edu authored
This merge does not turn on the new codegen (which only compiles a select few programs at this point), but it does introduce some changes to the old code generator. The high bits: 1. The Rep Swamp patch is finally here. The highlight is that the representation of types at the machine level has changed. Consequently, this patch contains updates across several back ends. 2. The new Stg -> Cmm path is here, although it appears to have a fair number of bugs lurking. 3. Many improvements along the CmmCPSZ path, including: o stack layout o some code for infotables, half of which is right and half wrong o proc-point splitting
-
- 28 Jul, 2008 1 commit
-
-
Simon Marlow authored
When returning an unboxed tuple with a single non-void component, we now use the same calling convention as for returning a value of the same type as that component. This means that the return convention for IO now doesn't vary depending on the platform, which make some parts of the RTS simpler, and fixes a problem I was having with making the FFI work in unregisterised GHCi (the byte-code compiler makes some assumptions about calling conventions to keep things simple).
-
- 09 Jul, 2008 1 commit
-
-
Simon Marlow authored
-
- 17 Apr, 2008 1 commit
-
-
simonmarhaskell@gmail.com authored
-
- 23 May, 2008 1 commit
-
-
Ian Lynagh authored
This fixes a segfault in #1657
-
- 02 Apr, 2008 1 commit
-
-
Simon Marlow authored
This has several advantages: - -fvia-C is consistent with -fasm with respect to FFI declarations: both bind to the ABI, not the API. - foreign calls can now be inlined freely across module boundaries, since a header file is not required when compiling the call. - bootstrapping via C will be more reliable, because this difference in behavour between the two backends has been removed. There is one disadvantage: - we get no checking by the C compiler that the FFI declaration is correct. So now, the c-includes field in a .cabal file is always ignored by GHC, as are header files specified in an FFI declaration. This was previously the case only for -fasm compilations, now it is also the case for -fvia-C too.
-
- 23 Mar, 2008 1 commit
-
-
Ian Lynagh authored
Integer, Bool and Unit/Inl/Inr are now in new packages integer and ghc-prim.
-
- 17 Dec, 2007 1 commit
-
-
Simon Marlow authored
-
- 18 Oct, 2007 1 commit
-
-
Simon Marlow authored
-
- 10 Aug, 2007 1 commit
-
-
Clemens Fruhwirth authored
Properly guard imports because they have to be precise on Windows and Darwin sets __PIC__ automatically
-
- 06 Aug, 2007 1 commit
-
-
Clemens Fruhwirth authored
-
- 27 Jun, 2007 1 commit
-
-
Michael D. Adams authored
-