RTS with timerfd-based ticker delays for up to 10 ms during shutdown

Summary

On POSIX systems with timerfd support, the RTS currently waits for the ticker thread to stop before continuing with shutdown. It does this by setting the exiting variable to true and then calling pthread_join() on the ticker thread, which blocks until the ticker thread finishes sleeping and notices the new value of exiting. Because the RTS timer signal interval is usually 10 ms (unless altered directly or by enabling profiling), a Haskell program spends on average 5 ms doing nothing during shutdown, which is a waste of perfectly good milliseconds.
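To make the delay concrete, here is a minimal standalone sketch of the pattern described above. It is not the RTS source: the names blocking_ticker_loop and measure_join_delay_ms are hypothetical, and the only things taken from the description are the exiting flag, the blocking read() on the timerfd, and the pthread_join() at shutdown. It assumes Linux with timerfd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

static atomic_bool exiting;   /* stand-in for the RTS "exiting" flag */
static int timer_fd = -1;

/* Ticker thread: sleeps in a blocking read() on the timerfd. */
static void *blocking_ticker_loop(void *arg) {
    (void)arg;
    while (!atomic_load(&exiting)) {
        uint64_t expirations;
        /* Blocks until the next expiry; "exiting" is only re-checked
         * after this call returns. */
        if (read(timer_fd, &expirations, sizeof expirations)
                != (ssize_t)sizeof expirations)
            break;
    }
    return NULL;
}

static double elapsed_ms(const struct timespec *a, const struct timespec *b) {
    return (b->tv_sec - a->tv_sec) * 1e3 + (b->tv_nsec - a->tv_nsec) / 1e6;
}

/* Hypothetical helper: starts a ticker, requests shutdown right away, and
 * returns the milliseconds spent in pthread_join() (-1.0 on setup failure). */
double measure_join_delay_ms(long interval_ms) {
    assert(interval_ms > 0);
    struct timespec period = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };
    struct itimerspec spec = { .it_interval = period, .it_value = period };
    struct timespec before, after;
    pthread_t thread;

    atomic_store(&exiting, false);
    timer_fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (timer_fd < 0)
        return -1.0;
    if (timerfd_settime(timer_fd, 0, &spec, NULL) != 0 ||
        pthread_create(&thread, NULL, blocking_ticker_loop, NULL) != 0) {
        close(timer_fd);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &before);
    atomic_store(&exiting, true);   /* ask the ticker to stop ... */
    pthread_join(thread, NULL);     /* ... then wait out the rest of its sleep */
    clock_gettime(CLOCK_MONOTONIC, &after);

    close(timer_fd);
    return elapsed_ms(&before, &after);
}
```

Calling measure_join_delay_ms(10) should return anything from roughly zero up to about one timer interval, depending on where in its sleep the ticker thread is when the flag is set.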

In most cases 5 ms will not be significant, but for short-lived (CLI) programs it can be a noticeable share of the total runtime. For example, hledger runs in ~23 ms on my machine, so a 5 ms saving would represent more than 20% of the total runtime. For programs that run even faster, the saving as a percentage would be greater still: "Hello world" goes from 11 to 2 ms with the RTS ticker disabled, see below. For Haskell programs which spawn (many) additional Haskell processes, like cabal and hadrian, the saving would apply once per process.

Steps to reproduce

Create a small "hello world" program and compile it with -rtsopts.
Run it with time ./hello_world and observe that it takes ~11 milliseconds.
Run it with time ./hello_world +RTS -V0 to disable the RTS timer and observe that it takes much less time (~2 ms on my machine).
Run it with strace -T ./hello_world and observe that the missing 9 milliseconds are spent waiting on a futex, which is caused by the pthread_join() call.

Expected behavior

I expect the RTS to shut down promptly when my program code ends, rather than idling for several milliseconds.

Proposed solution

Similar to how the IO manager manages sleeping, one way to fix this would be to create a pipe during ticker initialization and replace the blocking read() on the timerfd with a poll() on both the timerfd and the read end of the pipe. In normal operation the pipe would be empty, so poll() would only return when the timerfd becomes readable. During shutdown the RTS can cut the "sleep" short by writing a byte to the pipe, which makes the pipe readable.
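A rough sketch of what this could look like, outside the RTS and under stated assumptions: the Ticker struct and the functions pollable_ticker_loop, ticker_start and ticker_stop_ms are hypothetical names made up for illustration, and the tick counter merely stands in for whatever the real tick handler does. Only timerfd_create(), timerfd_settime(), pipe(), poll() and the pthread calls are real APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

typedef struct {
    int timer_fd;       /* periodic timer */
    int wake_pipe[2];   /* [0] is polled by the ticker, [1] is written at shutdown */
    pthread_t thread;
    uint64_t ticks;     /* stand-in for the real tick handler's work */
} Ticker;

static void *pollable_ticker_loop(void *arg) {
    Ticker *t = arg;
    struct pollfd fds[2] = {
        { .fd = t->timer_fd,     .events = POLLIN },
        { .fd = t->wake_pipe[0], .events = POLLIN },
    };
    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (fds[1].revents != 0)
            break;      /* pipe readable (or closed): shutdown requested */
        if (fds[0].revents & POLLIN) {
            uint64_t expirations;
            if (read(t->timer_fd, &expirations, sizeof expirations)
                    == (ssize_t)sizeof expirations)
                t->ticks += expirations;
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 on failure. */
int ticker_start(Ticker *t, long interval_ms) {
    assert(interval_ms > 0);
    struct timespec period = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };
    struct itimerspec spec = { .it_interval = period, .it_value = period };

    t->ticks = 0;
    t->timer_fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (t->timer_fd < 0)
        return -1;
    if (pipe(t->wake_pipe) != 0) {
        close(t->timer_fd);
        return -1;
    }
    if (timerfd_settime(t->timer_fd, 0, &spec, NULL) != 0 ||
        pthread_create(&t->thread, NULL, pollable_ticker_loop, t) != 0) {
        close(t->timer_fd);
        close(t->wake_pipe[0]);
        close(t->wake_pipe[1]);
        return -1;
    }
    return 0;
}

/* Wakes the ticker through the pipe, joins it, and returns the milliseconds
 * the shutdown took (-1.0 if the wake-up byte could not be written). */
double ticker_stop_ms(Ticker *t) {
    struct timespec before, after;
    const char byte = 0;

    clock_gettime(CLOCK_MONOTONIC, &before);
    if (write(t->wake_pipe[1], &byte, 1) != 1)
        return -1.0;
    pthread_join(t->thread, NULL);
    clock_gettime(CLOCK_MONOTONIC, &after);

    close(t->timer_fd);
    close(t->wake_pipe[0]);
    close(t->wake_pipe[1]);
    return (after.tv_sec - before.tv_sec) * 1e3
         + (after.tv_nsec - before.tv_nsec) / 1e6;
}
```

With this shape, ticker_stop_ms() should return well before the next timer expiry regardless of the configured interval, because the ticker thread wakes up as soon as the pipe becomes readable rather than when the timerfd next fires.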

Adding the pipe would add minimal time and memory overhead, and both pipe() and poll() are part of the POSIX standard, so they should be available everywhere that timerfd is available.

Environment

GHC version used: latest master branch.

Optional:

Operating System: POSIX
System Architecture: probably all.
