This is an alternative to !4787 addressing #15774
I created this MR for easier review of the idea described in #15774 (comment 325899)
mentioned in issue #15774
```c
/* -----------------------------------------------------------------------------
 * Global variables
 * -------------------------------------------------------------------------- */

/* Counter for number of requests for backtrace, wrap around allowed
 * Typically triggered by SIGQUIT, i.e. Ctrl^\ pressed by user
 */
volatile StgWord backtrace_request = 0;
```

I'm really pleased this solved your problem! I've left a few inline comments.
I'm quite unfamiliar with this part of the codebase, so I doubt I can offer much useful advice.
I don't understand what the forking codepath is doing. In particular, the static variable looks suspicious; it reads like you intended it to be thread-local? A static variable won't accomplish that.
We would want a better way of exposing this behaviour than swapping it into the SIGQUIT handler with `#ifdef`s. I don't know what the interface should look like.

It looks like a SIGQUIT commits every TSO to do a stack trace the next time it returns from Haskell code? That sounds annoying when there are thousands of TSOs.
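A minimal, self-contained sketch of the flow being described here, assuming that reading is right: the SIGQUIT handler bumps the global counter, and each thread compares its own last-seen value when it returns from Haskell code. The `ToyTSO` type, the `backtrace_seen` field, and `maybe_dump_backtrace` are hypothetical names for illustration, with plain C types standing in for the RTS's `StgTSO`/`StgWord`; this is not the actual patch.

```c
#include <signal.h>
#include <stdio.h>

/* Toy stand-in for the RTS's global counter shown in the diff above. */
static volatile sig_atomic_t backtrace_request = 0;

/* Toy stand-in for a TSO: only the one field this sketch needs. */
typedef struct { sig_atomic_t backtrace_seen; } ToyTSO;

static void sigquit_handler(int sig)
{
    (void)sig;
    backtrace_request++;          /* wrap-around allowed; only inequality matters */
}

/* Imagined hook run when a thread returns from Haskell code to the
 * scheduler: dump its stack trace at most once per SIGQUIT request. */
static void maybe_dump_backtrace(ToyTSO *tso)
{
    if (tso->backtrace_seen != backtrace_request) {
        tso->backtrace_seen = backtrace_request;
        printf("would dump this thread's stack trace here\n");
    }
}

int main(void)
{
    signal(SIGQUIT, sigquit_handler);
    ToyTSO t = { 0 };
    maybe_dump_backtrace(&t);     /* no-op until a SIGQUIT has arrived */
    return 0;
}
```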
Thanks for your review @duog. I doubt this approach is desirable to all users of GHC as a general solution, so I only intend to start some discussion about the situation here. Your comments on specific code are valid, but I think they are technical concerns, so I would put more thought into them only after we decide whether to go further in this direction.
> I don't understand what the forking codepath is doing. In particular, the static variable looks suspicious; it reads like you intended it to be thread-local? A static variable won't accomplish that.
It is designed to diagnose situations where many threads keep being forked but live so briefly that there is no chance to dump their backtraces, by printing the forker's backtrace after Ctrl+\ has been pressed. Obviously we should not keep doing that forever once Ctrl+\ has been pressed, but printing only one forker's backtrace is also undesirable. The hardware thread's caching behavior is exploited here so that more (but not too many) forkers get printed, roughly one per core, which is what I want. And no, thread-local storage is not what I meant.
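To make that concrete, here is a rough, self-contained sketch of the kind of check being described, under a few assumptions: the hook name `maybe_report_forker` is hypothetical, and C11 relaxed atomics stand in for the plain variables in the MR so the sketch stays well-defined. The point is the deliberate window between the load and the store: during a burst of forks, roughly one forker per core can pass the `req != seen` test before the update becomes visible to it.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Global request counter bumped by the SIGQUIT handler (modelled with a
 * C11 atomic here; the MR itself uses a volatile StgWord). */
static _Atomic unsigned long backtrace_request;

/* Last request this fork path has already reported.  Deliberately one
 * shared static, not thread-local: the window between the load and the
 * store below is what lets a handful of forkers print during a burst. */
static _Atomic unsigned long last_reported;

/* Hypothetical hook called from the thread-creation path. */
static void maybe_report_forker(void)
{
    unsigned long req  = atomic_load_explicit(&backtrace_request,
                                              memory_order_relaxed);
    unsigned long seen = atomic_load_explicit(&last_reported,
                                              memory_order_relaxed);
    if (req != seen) {
        atomic_store_explicit(&last_reported, req, memory_order_relaxed);
        printf("would print the forking thread's backtrace here\n");
    }
}

int main(void)
{
    maybe_report_forker();   /* no-op until backtrace_request is bumped */
    return 0;
}
```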
> We would want a better way of exposing this behaviour than swapping it into the SIGQUIT handler with `#ifdef`s. I don't know what the interface should look like.

Quite agree; I anticipate more ideas about the setup in further discussion.
> It looks like a SIGQUIT commits every TSO to do a stack trace the next time it returns from Haskell code? That sounds annoying when there are thousands of TSOs.
Yes. My use cases are dominated by many short-lived threads, so there won't be that many threads at any instant of time, but other programs with thousands of threads could find this rather annoying, if the output is readable at all.
> the hardware thread's caching behavior is exploited here
Does this mean that you are intentionally not using atomic ops, and riding the wave of undefined behaviour? That is both awesome and disgusting!
Stepping back a little, I assume you tried using the profiling report, `+RTS -p`, to diagnose your issue? I would expect that to highlight hot cost centres in the same way that your backtrace printing does.

> intentionally not using atomic ops
Um, atomicity is desired (at least on x86_64 it is sure to be there); only the memory barrier/fence is intentionally left out, so I don't think it is undefined behaviour anyway. The nice thing about this trick is that more forker backtraces are printed only when the forks happen in a short burst, which keeps loosely paced forking from dumping too many backtraces.
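For what it's worth, "atomic but no fence" can be stated explicitly with C11 relaxed atomics; the snippet below is an assumption about the intent, not what the MR actually does (it uses a plain volatile StgWord). A relaxed load or store is indivisible but imposes no ordering on surrounding memory operations, so on x86_64 it compiles to an ordinary load or store.

```c
#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned long backtrace_request;

/* Read the counter: atomic, but with no ordering/fence implied. */
static unsigned long read_request(void)
{
    return atomic_load_explicit(&backtrace_request, memory_order_relaxed);
}

/* Bump the counter (e.g. from the SIGQUIT handler): the increment is
 * indivisible, but again no fence is requested around it. */
static void bump_request(void)
{
    atomic_fetch_add_explicit(&backtrace_request, 1, memory_order_relaxed);
}

int main(void)
{
    bump_request();
    printf("requests so far: %lu\n", read_request());
    return 0;
}
```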
> `+RTS -p` to diagnose your issue?

I didn't consider that in the first place, as my situation cannot be reproduced reliably and quickly enough; knowing how much traffic the LSP would impose, I just wanted to save myself from analyzing input with far too low a signal-to-noise ratio.
I would rather take another approach. See my comment on the issue: #15774 (comment 330116)