GHC issues
https://gitlab.haskell.org/ghc/ghc/-/issues

Issue #10046: Linker script patch in rts/Linker.c doesn't work for (non-C or non-en..) locales
https://gitlab.haskell.org/ghc/ghc/-/issues/10046
Reported by Howard B. Golden; updated 2019-07-07T18:37:48Z

Please see [ticket:2615\#comment:95729](https://gitlab.haskell.org//ghc/ghc/issues/2615#note_95729) and replies.
A bug is illustrated by this Haskell program:
```
import ObjLink
import Foreign
import Foreign.C.Types
import Foreign.C.String
foreign import ccall "setlocale" c_setlocale :: CInt -> CString -> IO CString
main = do
  withCString "zh_CN.UTF-8" $ \lc -> c_setlocale 5 lc  -- 5 is LC_MESSAGES on glibc
  r <- loadDLL "/usr/lib/libc.so"
  putStrLn (show r)
```
which outputs:
```
Just "/usr/lib/libc.so: \26080\25928\30340 ELF \22836"
```
The "\\26080\\25928\\30340 ELF \\22836" part is "无效的ELF头" in Chinese, i.e. "invalid ELF header".
This error only occurs on systems where linker scripts are used. The linker script patch (as it has evolved) assumes that the error messages it will receive are in English. This would be true if the locale (LC_MESSAGES) is C or en (or one of the en variants). However, in other locales, the message will be in a different language. Unfortunately, the semantics of POSIX dlerror() specify that the error is returned as a pointer to a human-readable text string, rather than an error code. The string returned depends on the locale.
The code could be made more robust by momentarily changing the locale (LC_MESSAGES) to C before calling dlerror() and reverting it to its previous value immediately after. This has been tested on a zh_CN.UTF-8 locale (see [ticket:2615\#comment:95752](https://gitlab.haskell.org//ghc/ghc/issues/2615#note_95752)) and works. My only concern is that multithreaded code _might_ be affected if it is running while the locale is changed. I don't know enough to tell whether this is a real issue, nor how to deal with it if necessary.
Also see #9237 for another corner case in the linker script code that should be dealt with at the same time.

Milestone: 8.0.1. Assignee: Simon Marlow.

Issue #8733: I/O manager causes unnecessary syscalls in send/recv loops
https://gitlab.haskell.org/ghc/ghc/-/issues/8733
Reported by tibbe; updated 2019-07-07T18:43:44Z

Network applications often call `send` followed by `recv`, to send a message and then read an answer. This causes syscall traces like this one:
```
recvfrom(13, ) -- Haskell thread A
sendto(13, ) -- Haskell thread A
recvfrom(13, ) = -1 EAGAIN -- Haskell thread A
epoll_ctl(3, ) -- Haskell thread A (a job for the IO manager)
recvfrom(14, ) -- Haskell thread B
sendto(14, ) -- Haskell thread B
recvfrom(14, ) = -1 EAGAIN -- Haskell thread B
epoll_ctl(3, ) -- Haskell thread B (a job for the IO manager)
```
The `recvfrom` call always fails, as the response from the partner we're communicating with won't be available right after we send the request.
We ought to consider descheduling the thread as soon as sending is "done". The hard part is to figure out when that is.
See http://www.yesodweb.com/blog/2014/02/new-warp for a real-world example.

Milestone: 8.0.1.

Issue #8732: Global big object heap allocator lock causes contention
https://gitlab.haskell.org/ghc/ghc/-/issues/8732
Reported by tibbe; updated 2019-07-07T18:43:44Z

The lock that `allocate` takes when allocating big objects hurts the scalability of I/O-bound applications. `Network.Socket.ByteString.recv` is typically called with a buffer size of 4096, which causes a `ByteString` of that size to be allocated. The size of this `ByteString` causes it to be allocated from the big object space, which leads to contention on the global lock that guards that space.
See http://www.yesodweb.com/blog/2014/02/new-warp for a real-world example.

Milestone: 8.0.1. Assignee: Simon Marlow.

Issue #8623: Strange slowness when using async library with FFI callbacks
https://gitlab.haskell.org/ghc/ghc/-/issues/8623
Reported by JohnWiegley; updated 2021-03-31T03:41:02Z

I've attached a Haskell file and a C file; compile them as follows:
```
ghc -DSPEED_BUG=0 -threaded -O2 -main-is SpeedTest SpeedTest.hs SpeedTestC.c
```
You should find that with 7.4.2, 7.6.3 or a recent build of 7.8, building with `SPEED_BUG=0` produces an executable that takes more than a second to run, while building with `SPEED_BUG=1` runs very quickly. I've also attached the Core for both scenarios.

Milestone: 8.0.1. Assignee: Simon Marlow.

Issue #8303: defer StackOverflow exceptions (rather than dropping them) when exceptions are masked
https://gitlab.haskell.org/ghc/ghc/-/issues/8303
Reported by rwbarton; updated 2019-07-07T18:45:39Z

See http://www.reddit.com/r/haskell/comments/1luan1/strange_io_sequence_behaviour/ for a very simple program (`main'`) that accidentally evades the stack size limit, running to completion even though it has allocated hundreds of megabytes of stack chunks, and [my comment](http://www.reddit.com/r/haskell/comments/1luan1/strange_io_sequence_behaviour/cc32ec4) for an explanation of this behavior.
ryani suggested that when a thread exceeds its stack limit but it is currently blocking exceptions, the RTS shouldn't simply drop the `StackOverflow` exception, but rather deliver it when the `mask` operation completes. That sounds sensible to me and it would give a nice guarantee that when any individual `mask` operation uses a small amount of stack, the stack size limit is approximately enforced.
(I know that the default stack size limit may go away or essentially go away, but it can still be nice when developing to use a small stack size limit, so that one's system isn't run into the ground by infinite recursion quickly gobbling up tons of memory.)
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.6.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.0.1. Assignee: thoughtpolice.

Issue #8238: Implement unloading of shared libraries
https://gitlab.haskell.org/ghc/ghc/-/issues/8238
Reported by Simon Marlow; updated 2023-03-27T12:22:37Z

In #8039 we added support for unloading static objects from the runtime linker, with the GC detecting when there are no references left. We could add this functionality for shared libraries too, using `dl_iterate_phdr`.
\@heisenbug's comment from #8039 with the relevant pointers:
[Eli Bendersky's article](http://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries) suggests using the [dl_iterate_phdr](http://linux.die.net/man/3/dl_iterate_phdr) function to find information about loaded libraries. It seems to be Linux-only, though there is a [workaround on OS X](http://stackoverflow.com/questions/10009043/dl-iterate-phdr-equivalent-on-mac) described on StackOverflow.
And here is how the Böhm-Demers-Weiser GC implements a [callback function for dl_iterate_phdr](https://github.com/ivmai/bdwgc/blob/master/dyn_load.c#L451).
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.7 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.0.1.

Issue #7670: StablePtrs should be organized by generation for efficient minor collections
https://gitlab.haskell.org/ghc/ghc/-/issues/7670
Reported by Edward Z. Yang; updated 2019-07-07T18:48:41Z

Currently, stable pointers are all in one giant pointer table (see markStablePtrTable); this results in pretty bad GC behavior when you create a lot of stable pointers (Peaker has a test case which he thinks is suffering due to repeated traversal of the stable pointers list). We should partition them up into generations like we do for mutable lists. There might be some trickiness keeping the table up to date after GCs.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.7 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.0.1.

Issue #7606: Stride scheduling for Haskell threads with priorities
https://gitlab.haskell.org/ghc/ghc/-/issues/7606
Reported by Edward Z. Yang; updated 2019-07-07T18:49:01Z

Currently, GHC uses a round-robin scheduler for Haskell threads, with some heuristics for when threads should be bumped to the front of the queue. This patch set replaces that scheduler with 'stride scheduling', as described by [Waldspurger and Weihl '95](http://read.seas.harvard.edu/~kohler/class/aosref/waldspurger95stride.pdf), an efficient, deterministic method for scheduling processes with differing priorities. Priorities are assigned by giving 'tickets' to threads; a thread with twice as many tickets as another will run twice as often. I'd like to replace the round-robin scheduler completely with this scheduler.
Here are nofib benchmarks comparing the old scheduler to the new:
```
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
Min -0.0% -52.2% -18.8% -18.6% -83.1%
Max +1.0% +4.2% +4.9% +5.1% +7.7%
Geometric Mean +0.1% -2.8% -0.9% -0.9% -2.9%
```
Here are some technical details:
- Without any changes to ticket values, scheduling behavior should be functionally identical to round-robin. (By default, new threads, including the IO thread, are allocated the maximum number of tickets possible.) It is not quite identical, since our heap does not have the FIFO property (see below).
- The current patch-set uses a very simple (e.g. undergrad level) resizable-array backed heap to implement the priority queue; we can play some tricks to reduce the memory footprint of the priority queue (e.g. using the container_of macro to eliminate the extra keys store); and a more fancy data structure would make it easier for us to surgically remove entries or reweight them, but I wanted to keep overhead low. If anyone has a pet implementation of priority queues in C feel free to swap it in. Right now, this only affects the uses of promoteInRunQueue() in Messages.c; I still need to check if #3838 has regressed.
- We get three new primops: `setTickets#`, `getTickets#` and `modifyTickets#`. We don't support creating threads with specific numbers of tickets (mostly because that would have added an annoyingly large set of extra primops); instead, you're expected to spawn a thread which gets max-ticket allocation, and then weight it down.
- `_link` is no longer used for linking TSOs in the run queue. I have tried my best to stamp out any code which operated on this assumption, but I may have missed some.
- Modifying a TSO's tickets takes out the scheduler lock; the hope is that this operation is quick and rare enough that a global lock here is not too bad.
- We are unfortunately stuck with some magic constants w.r.t. ticket values: 1 \<\< 20 is the maximum number of tickets our implementation is hard-coded to support.
- Sleeping and blocked tasks do not get any 'bonus' for being blocked.
- In an ideal world, when a thread hits a black hole, it should temporarily give its tickets to the thread evaluating the black hole, so it will unblock more quickly. Unfortunately, implementing this is pretty complicated (the blackhole evaluating thread could die; or it could get stuck on a blackhole itself and need to gift its tickets; it shouldn't be able to give away the tickets it was gifted.) So this implementation leaves that out. Similar semantics for MVars are probably possible, but will require userland assistance too.
I haven't prepared a patch to base yet with a user-level API, but here is a proposed draft:
```
type Tickets = Int
-- | Maximum number of tickets we support a thread having. (Currently 2 ^ 20.)
-- Note that this doesn't bound the *global* maximum tickets.
maxTickets :: Tickets
-- | Changes the number of tickets allocated to a thread. The ticket count must
-- not be less than or equal to zero, or greater than maxTickets. (Corresponds
-- to setTickets# primop)
setTickets :: ThreadId -> Tickets -> IO ()
-- | Returns the number of tickets currently allocated to a thread. (Corresponds to
-- getTickets# primop)
getTickets :: ThreadId -> IO Tickets
-- | Atomically performs a linear transformation on the number of tickets of a thread;
-- e.g. scaling the number of tickets by a rational number, adding another fixed
-- set of tickets, and then returning the number of 'leftover' tickets from the operation; e.g.
-- if the net amount of tickets is reduced, then the returned result is positive;
-- if the net amount of tickets is increased, the returned result is negative.
-- In the absence of concurrent threads, the following property holds forall
-- t, m and x:
--
-- do r <- getTickets t
-- d <- scaleTickets t m x
-- r' <- getTickets t
-- return (r == r' + d)
--
-- If the scaling would reduce the number of tickets below zero or increase the
-- number of tickets beyond maxTickets, this function will instead fail with
-- an exception. This function will be subject to some rounding error; powers of two
-- are, however, likely to be exact. (Corresponds to modifyTickets# primop; note
-- that the sentinel value for failure is maxTickets + 1, since it is impossible for
-- a thread's ticket value to change by that much.)
modifyTickets :: ThreadId -> Ratio Int -> Tickets -> IO Tickets
-- | Forks a new thread, transferring some percentage of tickets from the current
-- thread to it (so the net number of tickets stays constant.) Fails if the rational
-- is greater than 1 or less than or equal to zero, or if there are not enough tickets
-- in the current thread.
forkIOWith :: IO a -> Ratio Int -> IO ThreadId
```

Milestone: 8.0.1. Assignee: Edward Z. Yang.

Issue #7300: Allow CAFs kept reachable by FFI to be forcibly made unreachable for GC
https://gitlab.haskell.org/ghc/ghc/-/issues/7300
Reported by absence; updated 2019-07-07T18:50:28Z

CAFs used by a foreign exported function are kept reachable for the entire session, because GHC can't know when the function will be called from C. If such a CAF is an evolving expression, like an FRP network, a space leak occurs because (I'm guessing) the thunks that build up during iteration go all the way back to the initial CAF, and the GC can't start collecting because it considers the CAF reachable. According to JaffaCake on the \#haskell IRC channel, the runtime is capable of solving this problem; it just needs a function that tells it to consider the specific CAF unreachable. It is then the responsibility of the user not to call the foreign exported function after the CAF is forced unreachable.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.4.1 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler (FFI) |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.0.1.

Issue #5219: need a version of hs_init that returns an error code for command-line errors
https://gitlab.haskell.org/ghc/ghc/-/issues/5219
Reported by Simon Marlow; updated 2019-07-07T18:56:22Z

This ticket is extracted from Roman's comment in #4464:
`hs_init` simply aborts if it doesn't like the RTS arguments, which is quite unhelpful for dynamic libraries. It took me a day to find out that an application crash was caused by a failing hs_init (because of the -rtsopts problem). I would like to add a check for this to our code, but there doesn't seem to be a way to do it. It would be much nicer if hs_init returned a failure/success code, at least for dynamic libraries.
To which I responded:
If hs_init needs to return an error condition rather than aborting, then we need to define a new API for that, and fix setupRtsFlags. I don't think we need to do this for every case of stg_exit and certainly not for barf: these are all internal error conditions and we have no sensible way to recover.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.0.3 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | rl |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.0.1.

Issue #5197: Support static linker semantics for archives and weak symbols
https://gitlab.haskell.org/ghc/ghc/-/issues/5197
Reported by duncan; updated 2021-11-11T17:51:13Z

While looking at #5004, I had a go at getting the GHCi linker to load the LLVM libs, i.e. making this work: `ghci -package llvm`.
This involves loading a whole bunch of LLVM\*.a files and linking them against the C and C++ standard libs. This in turn requires a few more features in the GHCi linker:
- support for weak symbols
- search archives only on demand
- support .oS files in archives
The last is trivial.
The first is not so hard to do. When the linker finds a definition of a weak symbol and there is already a symbol with that name in the symbol table, we just ignore the new one. When resolving symbols, if we find an unresolved weak symbol we just give it the value zero. Doing this is enough to load e.g. libstdc++.a and libc.a (rather than libc.so).
The next part is a bit more work. When you give the system static linker some .a files, it only uses them to provide definitions for unresolved symbols in the main target. In particular, it is fine for two .a files to provide definitions of the same symbol, because the linker just takes the first one it finds. (This is in contrast to duplicate symbols in the main .o input files.) On Linux, both libm and libc define some of the same symbols, such as `__isinf`.
Similarly, there is a problem that the GHCi linker predefines the `atexit` symbol, but that is also defined by `libc.a`.
So for the GHCi linker to load both libm and libc then it has to follow the standard semantics for linking archives. Currently it treats an archive as a request to link all the .o files in that archive.
Probably it is not sensible to go as far as implementing the full standard static linking semantics in the GHCi linker. It's of somewhat questionable value since we mainly need it for linking Haskell code, not C/C++ code.
Nevertheless, if we do need this, then the attached Linker.c has a partial implementation. It does the weak symbols and `.oS` files in archives.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.0.3 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | GHCi |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
Milestone: 8.0.1.

Issue #5143: Soft heap limit flag
https://gitlab.haskell.org/ghc/ghc/-/issues/5143
Reported by Simon Marlow; updated 2019-07-07T18:56:42Z

This came up in discussion on IRC yesterday. The `+RTS -M<size>` flag does two things:
- it starts tuning the GC to be more frugal as we get closer to `<size>`, by enabling in-place compaction and making major GC more frequent (reducing `-F` in effect).
- it stops the system with an out-of-memory error if memory usage gets too close to `<size>`
The problem is that you often want the first but not the second: you'd like to tell the RTS "try to use no more than 2GB, because after that we're into swap space", but you don't want the program to fail if the limit is exceeded.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.0.3 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
https://gitlab.haskell.org/ghc/ghc/-/issues/4913
Make event tracing conditional on an RTS flag only (tibbe, 2019-07-07T18:57:57Z)

The current event tracing mechanism is enabled at link time by linking in one of two different versions of a C function, one being a no-op function. We could allow users to enable/disable tracing using an RTS flag instead, making event logging more convenient to use. At the same time we would introduce tracing levels.
Design:
Add a static memory location where the current tracing level is stored:
```
unsigned int tracelevel = 0;
```
Modify `GHC.Exts.traceEvent` to read
```
foreign import ccall unsafe "&tracelevel" tracelevel :: Ptr Word

traceEvent :: String -> IO ()
traceEvent msg =
  if unsafePerformIO (peek tracelevel) > 0
    then withCString msg $ \(Ptr p) -> IO $ \s ->
           case traceEvent# p s of s' -> (# s', () #)
    else return ()
```
and (optionally) add some more functions that log at different trace levels.
This should be no slower than the current system. With inlining this should result in a load and a branch at the call site. The load should be cheap as the value is likely to be in cache (as it never changes). The branch should be easy to predict as it's always the same. With some cooperation from the code generator we could make sure that the branch is always cheap.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 7.0.1 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
https://gitlab.haskell.org/ghc/ghc/-/issues/4824
Windows: Dynamic linking doesn't work out-of-the-box (Orphi, 2021-04-21T21:11:06Z)

I did this:
1. Download and install GHC 7.0.1
1. Compile `HelloWorld.hs` with the "-dynamic" flag.
1. Run the resulting binary.
When I run the program, I get a dialog box telling me that the program can't find the RTS DLL.
If I move the necessary DLLs to somewhere in the search path, the binary runs just fine. The bug is that the Windows installer for GHC does not do this itself.
(I've classified this bug as "runtime system", but really it's an installer bug, not an RTS bug.)
Related to this are two complications:
1. The documentation fails to describe how GHC tries to find dynamic libraries on Windows. (Section 4.12.4 of the User Guide only explains how this is done for Unix and Mac OS.)
1. The DLLs are scattered all over the place. If they were all in one folder, I could just add that to the search path. But they aren't.

https://gitlab.haskell.org/ghc/ghc/-/issues/4243
Make a proper options parser for the RTS (Ian Lynagh <igloo@earth.li>, 2022-12-16T03:52:35Z)

The RTS options parsing is getting increasingly crufty, and the new `rtsOptsEnabled` options have made it even worse. We should really have a proper options parser.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 6.13 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | high |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
https://gitlab.haskell.org/ghc/ghc/-/issues/3462
New codegen: allocate large objects using allocateLocal() (Simon Marlow, 2019-07-07T19:03:45Z)

See #3424.
In the new code generator, we should allocate large objects (larger than F \* block size, for some suitable fraction F) using the RTS `allocateLocal()` API rather than from the nursery. It works to allocate them from the nursery -- this is what GHC 6.12 does after the fix in #3424 -- but then they will not be treated as large objects and will be copied during GC. Also, the allocation is likely to fail, requiring a trip through the RTS to put a large enough block in the nursery to satisfy the allocation.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | ------------ |
| Version | 6.11 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
https://gitlab.haskell.org/ghc/ghc/-/issues/3251
split rts headers into public and private (duncan, 2019-07-07T19:04:44Z)

C code calling into the rts, eg to initialise it, uses header files like `HsFFI.h` or `RtsAPI.h`. However, these header files, which describe the public API of the rts, are installed in `$libdir/ghc-x.y/include`, whereas the standard location for public header files should be `$prefix/include/ghc-x.y`.
The private header files that are only used by `.hc` files when compiling `-fvia-C` should remain where they are. So this would involve identifying which headers are public and which are private.
Once we have a set of public header files it might be nice to provide a `pkg-config` .pc file to make it easy for C programs to locate the header files and libraries. Eg it should be possible to compile and link a C program that uses the RTS public API with just:
```
gcc -o main main.c `pkg-config --cflags --libs ghc-rts-6.12.1`
```
and this would supply the flags like
```
-I/usr/include/ghc-6.12.1 -lHSrts -L/usr/lib/ghc-6.12.1
```
Note that `pkg-config` supports both shared and static libs (ie allows specifying the extra private deps of a static lib, eg `-lm -ldl` etc).
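For concreteness, such a `.pc` file might look like this (the package name, paths, and version are purely illustrative):

```
prefix=/usr
includedir=${prefix}/include/ghc-6.12.1
libdir=${prefix}/lib/ghc-6.12.1

Name: ghc-rts
Description: GHC runtime system
Version: 6.12.1
Cflags: -I${includedir}
Libs: -L${libdir} -lHSrts
Libs.private: -lm -ldl
```

The `Libs.private` field is what carries the extra static-link dependencies mentioned above.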
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 6.10.2 |
| Type | FeatureRequest |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
https://gitlab.haskell.org/ghc/ghc/-/issues/3107
Over-eager GC when blocked on a signal in the non-threaded runtime (awick, 2019-07-07T19:05:21Z)

Right now, in the non-threaded runtime, if you have the situation where all the threads are blocked on MVars, the deadlock detection / avoidance code runs a GC. This is fine, generally, but if they're blocked on MVars held by signal handlers, it has a profoundly bad effect on performance. In short, the runtime starts collection (blocking signals?), and we're stuck waiting for it to finish before handling signals. Or, if the signal doesn't come in, we sit waiting in a loop, garbage collecting.
It would be nice if the runtime either didn't trigger collection if signal handlers are in place, or, at the very least, held off on doing so for a nontrivial period of time.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 6.10.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | dons |
| Operating system | |
| Architecture | |
</details>
https://gitlab.haskell.org/ghc/ghc/-/issues/3061
GHC's GC default heap growth strategy is not as good as other runtimes (dons, 2019-07-07T19:05:35Z)

This is a GC performance ticket. The benchmark is the binary-trees benchmark, and the question is whether GHC's ability to grow the heap without a heap hint is worse than that of other similar language runtimes.
Looking at the binary-trees benchmark, we see that GHC does very well on a parallel system, when we give a GC hint to the size.
E.g. the attached binary-trees program is very competitive:
Compile with:
```
ghc -O2 -fasm --make -threaded A.hs
```
Run with:
```
$ ./A 20 +RTS -N4 -H340M
```
And we get:
```
$ time ./A +RTS -N4 -H340M -RTS 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1
./A +RTS -N4 -H340M -RTS 20 17.08s user 0.30s system 315% cpu 5.511 total
```
However, without that GC hint performance deteriorates dramatically,
```
$ time ./A +RTS -N4 -RTS 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1
./A +RTS -N4 -RTS 20 33.83s user 0.42s system 136% cpu 25.033 total
```
So: a 5x slowdown.
Could the GC heap growth algorithm be tuned? Other language runtimes, which are slower than Haskell on this benchmark when our GC hint is in play, don't seem to suffer as badly without the hint:
> http://shootout.alioth.debian.org/u64q/benchmark.php?test=binarytrees&lang=all
See e.g. Erlang. So: is there a better growth algorithm?
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 6.10.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
</details>
https://gitlab.haskell.org/ghc/ghc/-/issues/1466
Stack check for AP_STACK (Simon Marlow, 2019-07-07T19:13:26Z)

This is a bug I uncovered in the interpreter that I think is also present in the compiler.
When compiling code for a function or thunk, we aggregate the stack usage from case continuations up to the top of the function/thunk and perform a single stack check. This normally works fine: we know the maximum amount of stack that will be required in the evaluation of this function/thunk, and we check for it up front.
However, this goes wrong if the evaluation is suspended by an asynchronous exception: the stack is broken into chunks and stored in `AP_STACK` objects, which may later be resumed. On resumption, we might not have enough stack any more: the code might now be running on another stack entirely, even.
Our proposed solution is as follows:
- Continuations that require more than a certain amount of stack (say 4k)
  do their own stack checks.
- An AP_STACK object records the amount of stack available at the time it
  was suspended, if the amount is <4k. On resumption of an AP_STACK, we
  ensure that at least this amount of stack is available. (There's a
  spare half-word field in AP_STACK that we can use for this.)
The 4k limit is important: it puts a bound on the amount of stack growth due to evaluating an AP_STACK. Essentially the 4k limit is large enough that it almost never happens.
<details><summary>Trac metadata</summary>
| Trac field | Value |
| ---------------------- | -------------- |
| Version | 6.6.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | Unknown |
| Architecture | Unknown |
</details>