We have a web application using happstack. It always used ~1% CPU, which was fine. Recently, we started caching more things in memory, and CPU usage during idle time rose to 15-30%. We eventually determined this was due to the idle time GC (we compile with -threaded). Using +RTS -I2 removed the problem.
This is similar to #3408 (closed), except that that ticket says the problem was fixed in 6.12 and that the interval was raised to 5s. The docs [1] still mention 0.3s, though, and since changing the interval to 2s fixes our problem, it appears the default is not 5s.
It is difficult to attach a simple test case, but I can test things if needed.
The idle GC is supposed to run just once, and then the application should go idle. However, there are some things that can cause it to run repeatedly:
- the idle GC invokes a finalizer; the finalizer creates a new finalizable object, which is then finalized by the next idle GC, and so on (a minimal sketch of this pattern follows after this list);
- the idle GC detects a deadlocked thread and sends it an exception; the deadlocked thread catches the exception and proceeds to deadlock again;
- if your application has a timer signal that executes some Haskell code every 1s, say, then you'll get an idle GC after each one. Setting the idle GC to a longer period than your timer signal (e.g. 2s) will prevent this from happening.
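As a concrete illustration of the first scenario, here is a minimal sketch (hypothetical code, not taken from any real application): a finalizer that allocates a fresh finalizable object, so each idle GC leaves new finalization work for the next one.

    import Data.IORef (newIORef)
    import System.Mem.Weak (addFinalizer)

    -- Each run of the finalizer creates a fresh object with its own finalizer,
    -- so every GC (including every idle GC) queues up more finalization work.
    respawn :: IO ()
    respawn = do
      ref <- newIORef ()        -- a fresh heap object to hang a finalizer on
      addFinalizer ref respawn  -- when it dies, the cycle starts again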
In the case of #3408 (closed), it was the last of these: the Windows console was sending events every 1s, which caused some code to execute in GHCi.
So my question is, does any of these scenarios apply to you? You can compile the program with -debug and watch the events with +RTS -Ds to see what is happening during the idle time. On a Unix system, strace may help too, or Process Monitor on Windows.
I compiled using -debug, and ran using +RTS -N -Ds. Here is the output I get:
b0103000: cap 0: starting GC
all threads:
threads on capability 0:
threads on capability 1:
  thread 32 @ 0x8c2d000 is not blocked (TSO_DIRTY)
other threads:
  thread 29 @ 0x8c6246c is blocked on an MVar @ 0x7c9c744
  thread 2 @ 0x8c2b000 is blocked on an external call
  thread 3 @ 0x8da3000 is blocked on an MVar @ 0x7ca5840
  thread 10 @ 0x7cb0af0 is blocked on an MVar @ 0x8c2f368
  thread 11 @ 0x8c1a250 is blocked on an STM operation
  thread 13 @ 0x8c63000 is blocked on an STM operation
b0185000: cap 1: starting GC
a039a720: woken up on capability 1
a039a720: capability 1 is owned by another task
b0185000: cap 1: finished GC
b0185000: cap 1: running thread 32 (ThreadRunGHC)
b0185000: cap 1: thread 32 stopped (yielding)
b0185000: cap 1: thread 32 appended to run queue
b0185000: cap 1: running thread 32 (ThreadRunGHC)
b0185000: cap 1: thread 32 stopped (heap overflow)
b0185000: cap 1: requesting parallel GC
b0185000: ready_to_gc, grabbing GC threads
b0103000: cap 0: finished GC
This repeats continuously. Sometimes, after ready_to_gc, there are 10-20 lines saying b0185000: passing capability 0 to worker 0xb0103000. I've attached a full log to this ticket, which shows a slightly different run. There is a lot of noise during startup. The server is up and running by line 25194 of the log. At line 26162, I make a simple request; this triggers the CPU usage: before it, CPU usage is 0%; after it, 9%. At line 28362, I shut the server down again.
As for the three points: we have finalizers that free/destroy C/C++ structures, nothing else. We don't catch deadlock exceptions anywhere. I'm not sure about the timers; I don't see anything in the -Ds logs, but I don't really know how to read them.
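For reference, finalizers of the kind described above usually look something like this sketch (the foreign import and names are hypothetical, not the real bindings); since such a finalizer only calls a C destructor and allocates no new finalizable Haskell objects, it should not by itself be able to chain idle GCs.

    {-# LANGUAGE ForeignFunctionInterface, EmptyDataDecls #-}
    import Foreign.Ptr (Ptr, FunPtr)
    import Foreign.ForeignPtr (ForeignPtr, newForeignPtr)

    data CStruct  -- opaque C/C++ structure owned by the foreign library

    -- hypothetical C destructor, used directly as the finalizer
    foreign import ccall unsafe "&destroy_struct"
      p_destroy_struct :: FunPtr (Ptr CStruct -> IO ())

    -- the finalizer just frees the C structure; no new Haskell work is created
    wrapStruct :: Ptr CStruct -> IO (ForeignPtr CStruct)
    wrapStruct = newForeignPtr p_destroy_struct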
One thing I forgot to mention earlier: the cache we use consists of some Data.Map's and Data.Set's stored in TVar's. When we 'deepseq' these before storing them, CPU usage drops to 2-10%, which is still high, but lower than before. I'm guessing this is just because the GC has fewer indirections to traverse, but perhaps it is useful info.
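For concreteness, the change amounts to something like the following sketch (hypothetical names; it assumes NFData instances for the cached structures): the new contents are fully forced before being written into the TVar, so later GC passes see evaluated data instead of chains of thunks.

    import Control.Concurrent.STM (TVar, atomically, writeTVar)
    import Control.DeepSeq (deepseq)
    import qualified Data.Map as Map

    -- Fully evaluate the new cache contents before publishing them, so the
    -- (idle) GC no longer has to keep traversing unevaluated thunks.
    updateCache :: TVar (Map.Map String Int) -> Map.Map String Int -> IO ()
    updateCache cacheVar newEntries =
      newEntries `deepseq` atomically (writeTVar cacheVar newEntries)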
It might help to add the timestamps too (+RTS -vt). Anyway, it looks like between every two idle GCs the IO manager thread (thread 2) wakes up one of the program threads (thread 29). I imagine you have a thread in your program that is sitting in a loop calling threadDelay, with maybe a 1 second timeout?
I don't understand why this is happening:
a039a720: woken up on capability 1
a039a720: resuming capability 1
a039a720: giving up capability 1
a039a720: freeing capability 1
a039a720: woken up on capability 1
a039a720: resuming capability 1
a039a720: giving up capability 1
a039a720: freeing capability 1
a039a720: woken up on capability 1
a039a720: resuming capability 1
a039a720: giving up capability 1
a039a720: freeing capability 1
a039a720: woken up on capability 1
a039a720: resuming capability 1
a039a720: giving up capability 1
a039a720: freeing capability 1
That's the main thread waking up several times; finding nothing to do, it goes back to sleep again.
Yes, that led me to it! In happstack there is Happstack.Server.HTTP.Clock, which starts a thread that fills an IORef with the current time every second, using threadDelay. It is used for the request headers. I don't know why it is there; perhaps it is some kind of caching that improves performance over querying the time on every request.
I imagine removing this would fix the problem, but haven't tried yet.
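The pattern in question looks roughly like the sketch below (paraphrased, not the actual happstack source): a background thread wakes up once a second to refresh a cached time value, which is exactly the kind of periodic activity that triggers an idle GC after each wakeup.

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Monad (forever)
    import Data.IORef (IORef, newIORef, writeIORef)
    import Data.Time.Clock (UTCTime, getCurrentTime)

    -- Refresh a cached copy of the current time once a second; request
    -- handlers read the IORef instead of asking the OS for the time each time.
    startClock :: IO (IORef UTCTime)
    startClock = do
      ref <- newIORef =<< getCurrentTime
      _ <- forkIO $ forever $ do
             threadDelay 1000000                 -- sleep for 1s, then wake up...
             writeIORef ref =<< getCurrentTime   -- ...and refresh the cached time
      return ref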
Would you say that this is a bug in happstack then, or just something users should be aware of? Because this is a hard-to-debug problem (profiling, for example, doesn't point you to the right place), and it would be nice if we could change something to prevent others from running into this in the future.
which marks a thread in such a way that if it wakes up it will not cause the runtime to think that activity has occurred, which would trigger another idle GC. We would mark the IO manager thread in the same way, since all these wakeups involve the IO manager thread too.
Even with this, I suggest that for your application you probably want to set the idle GC to something large like 30s.
Note that we don't see any difference between an idle GC time of 2s and 30s. Once we get above the 1s of the threadDelay, the CPU usage drops sharply from ~10% to ~0.25% and stays stable.
It seems I still don't understand something. If the threadDelay causes an idle GC, I should see a GC run every 1s, right? (The threadDelay is 1e6 microseconds.) But I just created a log with -vt, and, assuming the timestamps are in microseconds as well (which they seem to be), I see a GC run every 1000-2000 microseconds, which is roughly 1000x more often than I'd expect.