SIGQUIT is intended to mirror the same signal's functionality in the JVM: provide a snapshot of a process's state in the form of backtraces on stderr. However, GHC's implementation currently only prints a backtrace for a single, arbitrary thread (namely the one running on the main capability). This should be fixed.
There is the question of what the precise semantics of SIGQUIT should be. Should we dump the state of all threads, all schedulable threads, or only currently scheduled threads? My feeling is the latter.
This isn't entirely trivial to implement, especially given that we need to ensure that the output is readable (e.g. prevent interleaving of output from different capabilities). One option would be to add a new Message variety which can be used to request a backtrace from a capability. The thread handling the SIGQUIT could then send this message to each capability, wait for their replies, and print the results. This sounds complex for a debugging feature, however.
For my situation (troubleshooting 100%–400% CPU hogging), it would help a lot more to just print the stack traces of the busy capabilities.
But I have no experience with libdw; do you think it is feasible to snapshot all busy threads' frames from the signal-handling thread? I intend to work on such a small improvement if that's possible.
Maybe iterating over more threads can be done here? But it seems our code is responsible for figuring out the list of TIDs, and I have no clue how to obtain the list of busy threads at the moment. Some insights, please?
I looked into libdwfl; it seems we can iterate threads via dwfl_getthreads and obtain frames via dwfl_thread_getframes, then print them as is currently done (rough sketch below).
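For reference, here is a minimal sketch of that iteration, modelled on eu-stack rather than on GHC's own libdw backend (rts/Libdw.c supplies its own Dwfl_Thread_Callbacks via dwfl_attach_state, with set_initial_registers seeding each thread's registers, since a process cannot ptrace its own thread group). It walks all threads of an *external* process given by PID; error handling is mostly omitted:

```c
/* Sketch: print raw PCs for every thread of an external process, eu-stack
 * style.  Build with: gcc sketch.c -ldw -lelf                              */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <inttypes.h>
#include <sys/types.h>
#include <dwarf.h>               /* DWARF_CB_OK / DWARF_CB_ABORT */
#include <elfutils/libdwfl.h>

static int frame_cb(Dwfl_Frame *state, void *arg)
{
  (void) arg;
  Dwarf_Addr pc;
  bool is_activation;
  if (!dwfl_frame_pc(state, &pc, &is_activation))
    return DWARF_CB_ABORT;
  /* For return addresses (not activations) step back one byte so that a
   * later address->line lookup lands inside the call site.               */
  Dwarf_Addr pc_adjusted = pc - (is_activation ? 0 : 1);
  printf("    0x%" PRIx64 "\n", (uint64_t) pc_adjusted);
  return DWARF_CB_OK;
}

static int thread_cb(Dwfl_Thread *thread, void *arg)
{
  (void) arg;
  printf("TID %ld:\n", (long) dwfl_thread_tid(thread));
  dwfl_thread_getframes(thread, frame_cb, NULL);
  return DWARF_CB_OK;
}

int main(int argc, char **argv)
{
  if (argc < 2) return 1;
  pid_t pid = (pid_t) atoi(argv[1]);

  static char *debuginfo_path;
  static const Dwfl_Callbacks cbs = {
    .find_elf       = dwfl_linux_proc_find_elf,
    .find_debuginfo = dwfl_standard_find_debuginfo,
    .debuginfo_path = &debuginfo_path,
  };

  Dwfl *dwfl = dwfl_begin(&cbs);
  dwfl_linux_proc_report(dwfl, pid);         /* load the process's modules */
  dwfl_report_end(dwfl, NULL, NULL);
  dwfl_linux_proc_attach(dwfl, pid, false);  /* enables thread iteration   */
  dwfl_getthreads(dwfl, thread_cb, NULL);
  dwfl_end(dwfl);
  return 0;
}
```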
I'm wondering whether I need to check that a thread is serving as a capability, or whether that just doesn't matter; if it is needed, how would I do it?
Hi, this is cool! Could you post this as a merge request?
I don't have much insight on libdw, but a few comments:
Your sigRecieverThId logic looks racy. I would use a cas(&sigRecieverThId, 0, currThId).
You don't set sigRecieverThId back to 0 when you're finished. I guess this mostly works, since the same thread is always the propagator, but it would be more robust to set it back to 0 (see the sketch after these comments).
As bgamari mentions in the comments above, your output callstacks can be interleaved. One (disgusting) way to avoid this would be to build a linked list with CAS operations, then have the propagator busy-wait until the list has the right length, then write the stack traces and free the memory. I believe your posted callstacks actually are interleaved; e.g. the yieldCapability and schedule entries at the top of the second stack belong to the first stack.
n_capabilities can increase. It would be safest to take n_caps = RELAXED_LOAD(&n_capabilities) at the start and use n_caps, especially if you end up doing more logic with it.
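To illustrate the claim/release pattern suggested in the first two points, here is a sketch using portable C11 atomics in place of the RTS's cas()/RELAXED_LOAD macros; the 0/1 claim flag and the function names are invented for illustration (the actual MR keys on a thread id):

```c
/* Claim/release sketch for the "first receiver is the propagator" idea. */
#include <stdatomic.h>

static atomic_int sig_propagator_claimed;   /* 0 = unclaimed */

static void on_sigquit(void)
{
  int expected = 0;
  if (atomic_compare_exchange_strong(&sig_propagator_claimed, &expected, 1)) {
    /* We won the race: propagate the signal, gather backtraces, print.   */
    /* ...                                                                */
    /* Release the claim when finished, so the next SIGQUIT still works
     * even if the OS delivers it to a different thread.                  */
    atomic_store(&sig_propagator_claimed, 0);
  } else {
    /* Another thread is already propagating; respond as a follower.      */
  }
}
```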
@duog Thanks for your encouragement. I've further polished the code and opened !4786 (closed); please continue the review there.
I gave up the propagator-thread logic, as I figured out it won't work on POSIX systems other than Linux: the first receiver thread won't always be the sole receiver afterwards. Per the POSIX signal-delivery spec, any thread not blocking SIGQUIT can receive it from the OS, and the alternative of blocking SIGQUIT in all threads except one won't work either, because while the propagated SIGQUIT is blocked we can't be triggered to dump a backtrace.
Setting sigRecieverThId back to 0 is a solution that hadn't come to my mind before I drafted the other solution in that MR; it might be valid too, so let's consider it as well.
The output above is actually truncated from a seemingly infinite loop spitting out the last line. In any case, I later saw all sorts of strange things (partial backtraces and even no backtrace at all, for example) if I didn't rm -rf ~/.cabal/store/ghc-8.10.3/ and cabal clean before cabal run again, after rebuilding GHC with slight source modifications. A fresh cabal run with cleared intermediate artifacts
(update: not stably)
printed much saner output.
RELAXED_LOAD(&n_capabilities) should definitely be incorporated. I also read that fprintf() is not safe to use from a signal handler, so I'll put more thought into how to make it safer; that may even need a separate serializer thread, I feel. I'll experiment a bit and update the MR for more discussion and review.
@duog On second thought about setting sigRecieverThId back to 0: I think it is still racy for other threads to observe it being set or cleared. When another thread responding to a propagated SIGQUIT sees sigRecieverThId as 0, it will wrongly decide to propagate the signal further, causing storms of echoes. I don't think we can safely make the propagator's 0-clearing happen-after all other threads have seen a non-zero value, as those observations happen in signal-handler code, and the usual mutex-like synchronization seems dangerous there. Or do we have such options?
And I remember the original idea was to record the thread that received the very first SIGQUIT sent to the process. Given that I wrongly assumed subsequent SIGQUITs would be delivered solely to that same thread (though that is true on Linux in the usual cases), the expected protocol was that sigRecieverThId would be set once and remain unchanged from then on. I later realized it won't work on other POSIX systems, so I abandoned that solution.
I later find all sort of strange things (partial trace backs and even no trace back at all e.g.) if I don't rm -rf ~/.cabal/store/ghc-8.10.3/ and cabal clean
Yes, it can be tricky working with your own GHCs. cabal has a --store-dir flag which you can use to point it at a separate global store. If you are working with the master branch, then https://ghc.gitlab.haskell.org/head.hackage/ is a collection of patches to make various packages work.
I read fprintf() is not safe to be used from a signal handler,
Me too! I created #19205 for this. My understanding is that it's probably fine if you don't try to print floats. I don't think you should worry about this too much.
I see what you mean about the storm of echoes. Unfortunately I don't think your timing solution solves this either. Processes can have arbitrarily long gaps between time slices; this can manifest on a system under heavy load, when being OOM-killed, when waking from suspend, or if the process is SIGSTOPped.
usual mutex-like sync seems dangerous. Or do we have such options?
We don't really have such options. We can busy-wait though: while (SEQ_CST_LOAD(&still_going)) { busy_wait_nop(); }
may even need a separate serializer thread
This would likely be too high a cost to pay for a niche feature. However, having an existing thread serialise the output may be a solution (bgamari elaborates in a comment above).
and the solution to block SIGQUIT from all threads except one will not work, because when the propagated SIGQUIT is blocked,
This is true, but there is nothing special about SIGQUIT here. We could use SIGQUIT for the leader, then it could send a SIG??? to follower threads. I don't know what a good choice is, and this would have to be documented.
I'm interested in how you intend to deal with interleaved output. I think a good first step is to print the thread id in each frame, so that we can easily see the interleaving. My inclination would be to have the leader busy-wait for each follower to finish, then do all the printing itself, which helps solve the "storm of echoes" too.
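A rough sketch of that leader/follower shape, with C11 atomics standing in for the RTS's SEQ_CST_LOAD/busy_wait_nop; every name here is invented for illustration and the backtrace formatting is hand-waved:

```c
/* Leader/follower sketch: each follower formats its backtrace into its own
 * slot and bumps a counter; the leader busy-waits until everyone is done,
 * then prints all slots itself so output cannot interleave.              */
#include <stdatomic.h>
#include <stdio.h>

#define MAX_FOLLOWERS 64
#define SLOT_BYTES    8192

static char       slots[MAX_FOLLOWERS][SLOT_BYTES]; /* one buffer per follower */
static atomic_int n_done;                            /* followers finished      */

static void follower_report(int my_index)
{
  /* Format this thread's backtrace into its private slot, then signal done. */
  snprintf(slots[my_index], SLOT_BYTES, "backtrace of follower %d ...\n", my_index);
  atomic_fetch_add(&n_done, 1);
}

static void leader_collect(int n_followers)
{
  /* ... trigger the followers here ... */
  while (atomic_load(&n_done) < n_followers)
    ;                                   /* busy_wait_nop() in the RTS        */

  for (int i = 0; i < n_followers; i++)
    fputs(slots[i], stderr);            /* single writer: no interleaving    */

  atomic_store(&n_done, 0);             /* reset for the next SIGQUIT        */
}
```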
I don't think your timing solution solves this either.
It's not perfect, but I think it will at least work in the usual cases, and it won't crash or flood output in extreme situations. Is there some unacceptable worst-case failure that I'm not aware of?
I'm interested in how you intend to deal with interleaved output.
I hesitate to add any form of wait to a signal handler; a more sophisticated solution in my mind would be something like Android's circular log buffer in the kernel, which can be consumed by its logcat utility.
We would then still need a counterpart to logcat; that could be an external process, or an in-process thread, to dump the asynchronous payload to stderr.
A full-featured in-process circular log buffer would be good to have IMHO; the question is how its cost is justified given where we are. I'm glad to work on a basic version if we can agree on an implementation that is simple enough, plus maybe an in-process thread, controlled by an RTS option, to keep dumping the payload to stderr.
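If it helps the discussion, here is a minimal single-producer/single-consumer ring sketch of that idea: one writer (e.g. the signal handler) that never blocks and drops on overflow, and one drainer thread that keeps flushing to stderr. A real multi-capability version would need per-capability rings or an MPSC design; all names here are invented:

```c
/* Minimal SPSC ring buffer for log payloads.                              */
#include <stdatomic.h>
#include <string.h>
#include <unistd.h>

#define RING_SLOTS 64
#define MSG_BYTES  512

static char        ring[RING_SLOTS][MSG_BYTES];
static atomic_uint head;   /* next slot to write (monotonic)  */
static atomic_uint tail;   /* next slot to read  (monotonic)  */

/* Producer side: lock-free, never blocks, drops the message when full.   */
static int ring_push(const char *msg)
{
  unsigned h = atomic_load_explicit(&head, memory_order_relaxed);
  unsigned t = atomic_load_explicit(&tail, memory_order_acquire);
  if (h - t >= RING_SLOTS)
    return 0;                                   /* full: drop              */
  strncpy(ring[h % RING_SLOTS], msg, MSG_BYTES - 1);
  ring[h % RING_SLOTS][MSG_BYTES - 1] = '\0';
  atomic_store_explicit(&head, h + 1, memory_order_release);
  return 1;
}

/* Drainer thread body: write anything pending to stderr at its own pace. */
static void ring_drain_once(void)
{
  unsigned t = atomic_load_explicit(&tail, memory_order_relaxed);
  unsigned h = atomic_load_explicit(&head, memory_order_acquire);
  while (t != h) {
    const char *msg = ring[t % RING_SLOTS];
    write(STDERR_FILENO, msg, strlen(msg));
    t++;
  }
  atomic_store_explicit(&tail, t, memory_order_release);
}
```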
Anyway, the unstable output as in !4787 is blocking my progress; it's meaningless to continue before we can get stable, meaningful stack traces from DWARF. I really need some hints to get over that first ...
I think a good first step is to print the threadId in each frame
maybe an RTS option controlled in-process thread, to keep dumping the payload to stderr.
On my mind now is something like a broadcast TChan, for simplicity, instead of a real circular buffer consisting of binary blocks to be filled with bytes. But C doesn't have a garbage collector, so the desirable feature of a broadcast TChan, i.e. that incoming items simply get discarded when no receiver is attached, won't work right away. It seems a dedicated thread is unavoidable, at least to free the malloc'ed payload if we're configured not to actually dump to stderr.
Or do you think we could just implement it with a real broadcast TChan, then add a Haskell API to dup it and run a read-print loop? (But I'm not sure how safe it is to atomically write to a TChan from a signal handler.)
```c
/* Return *PC (program counter) for thread-specific frame STATE.
   Set *ISACTIVATION according to DWARF frame "activation" definition.
   Typically you need to subtract 1 from *PC if *ACTIVATION is false to safely
   find function of the caller.  ACTIVATION may be NULL.  PC must not be NULL.
   Function returns false if it failed to find *PC.  */
bool dwfl_frame_pc (Dwfl_Frame *state, Dwarf_Addr *pc, bool *isactivation)
  __nonnull_attribute__ (1, 2);
```
Typically you need to subtract 1 from *PC if *ACTIVATION is false to safely ...
I understand this to imply if (!is_activation) pc -= 1; but it seems the contrary is done:
Is it a bug, or should my understanding be corrected?
For the record, I don't really understand libdw's semantics around ACTIVATION; I just interpreted the sentence literally. And TBH I'm really confused by the subtract-1 thing. I'm no expert at that low a level, but gut feeling tells me function entry addresses should be aligned to 8 bytes or so; if that 1 means a 1-byte offset, then what the hell is it for?
Yes, it looks like a bug to me. In C, adding (or subtracting) 1 to a pointer changes the value by the size of the pointed-to type. It works the same as with arrays, so (p + 1) == &p[1]. Also (uint64_t)p + sizeof(*p) == (uint64_t)(p + 1).
Having said that, all backtraces will have a set_initial_registers frame at the top, so I'd be surprised if this bug is causing issues here.
Yes, I can confirm it's a bug by referencing libdw's print_frames(), but fixing it didn't change the symptom in any apparent way. I then found another quirk to work around (!4787 (ddc679a5)), and the backtraces can now be obtained stably.
So a common way of handling asynchronous signals in a multi-threaded program is to mask signals in all the threads, and then create a separate thread (or threads) whose sole purpose is to catch signals and handle them. The signal-handler thread catches signals by calling the function sigwait() with details of the signals it wishes to wait for.
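A minimal sketch of that pattern: block the signal before spawning any thread (so every thread inherits the mask), then let one dedicated thread wait for it synchronously with sigwait():

```c
/* Dedicated signal-handling thread: SIGQUIT is blocked in every thread,
 * and one thread waits for it with sigwait(), so handling happens in
 * ordinary thread context rather than inside an async signal handler.   */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

static void *signal_waiter(void *arg)
{
  sigset_t *set = arg;
  for (;;) {
    int sig;
    if (sigwait(set, &sig) == 0 && sig == SIGQUIT) {
      /* Safe to do ordinary I/O here: we are not in a signal handler.   */
      fprintf(stderr, "SIGQUIT received; dumping thread state...\n");
      /* ... walk threads / request backtraces ...                       */
    }
  }
  return NULL;
}

int main(void)
{
  static sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGQUIT);
  /* Block SIGQUIT before creating threads so all threads inherit the mask. */
  pthread_sigmask(SIG_BLOCK, &set, NULL);

  pthread_t tid;
  pthread_create(&tid, NULL, signal_waiter, &set);

  /* ... rest of the program ... */
  pthread_join(tid, NULL);
  return 0;
}
```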
@bgamari I looked at how the JVM has implemented it; it turns out they do use a dedicated signal-handling thread in the fashion described in the article above.
```cpp
case SIGBREAK: {
#if INCLUDE_SERVICES
  ...
#endif
      // Print stack traces
      // Any SIGBREAK operations added here should make sure to flush
      // the output stream (e.g. tty->flush()) after output.  See 4803766.
      // Each module also prints an extra carriage return after its output.
      VM_PrintThreads op;
      VMThread::execute(&op);
      ...
```
The actual dumping work for the VM_PrintThreads operation is done here:
```cpp
// Threads::print_on() is called at safepoint by VM_PrintThreads operation.
void Threads::print_on(outputStream* st, bool print_stacks,
                       bool internal_format, bool print_concurrent_locks,
                       bool print_extended_info) {
  ...
  ALL_JAVA_THREADS(p) {
    ResourceMark rm;
    p->print_on(st, print_extended_info);
    if (print_stacks) {
      if (internal_format) {
        p->trace_stack();
      } else {
        p->print_stack_on(st);
      }
    }
    st->cr();
#if INCLUDE_SERVICES
    ...
#endif // INCLUDE_SERVICES
  }
  ...
```
I believe that by safepoint they mean the world is stopped with all Java threads pending, so all threads' stacks can be observed and dumped. Nevertheless, when the outputStream stalls due to back pressure, I believe they let the whole world wait.
Do we have a counterpart concept/mechanism in GHC/RTS to the JVM's safepoint?
If we can afford a dedicated thread to receive and handle/dispatch signals, that thread would be the perfect place to do the log-dumping work safely with respect to back pressure from stderr, in response to SIGQUIT. That's just what the JVM has been doing all these years.
Btw, they alias SIGBREAK to SIGQUIT like this:
```cpp
// SIGBREAK is sent by the keyboard to query the VM state
#ifndef SIGBREAK
#define SIGBREAK SIGQUIT
#endif
```
Do we have a counterpart concept/mechanism in GHC/RTS to the JVM's safepoint?
We do. GHC "capabilities" (which can be thought of as OS threads running Haskell code) can be interrupted by starting a "sync" event. This is, for instance, how garbage collection is initiated.
I think the best model to follow here is that of the non-moving GC's remembered set flush. Specifically, the non-moving collector triggers all capabilities to synchronize with the GC by way of a flush. The relevant moving parts are:
- nonmovingBeginFlush, which calls stopAllCapabilitiesWith(..., SYNC_FLUSH_UPD_REM_SET)
- stopAllCapabilities sets the interrupt flag for each capability
- when a capability hits a yield point it sees that the interrupt flag is set and enters the scheduler, which eventually enters yieldCapability
- yieldCapability looks at the sync type and does the appropriate thing
- execution continues
To implement backtracing I would simply introduce a new sync type which prints a backtrace. This should be relatively simple. However, it does mean that threads that are stuck in non-allocating loops (which won't encounter a safepoint) won't produce backtraces. Nevertheless, I think this is acceptable; in such a case the user can just whip out gdb.
This is promising. Back when I was tinkering with the codebase, the capability-sync machinery looked very appealing for useful things to be done with it, but as it seems "hard-coded" and connected to many things I can't yet understand, I was frightened off from playing with it.
That said, I'm still worried about which backtraces get printed. As my case has shown, cost-centre stacks appear much more useful: backtraces that hit scheduling or GC code reveal nothing about the misbehaviour of the application code. Such hits, even with a handful of stack frames (program-counter addresses) based on the hardware calling convention, carry little information and are mostly noise for this kind of diagnosis.
I reproduced my situation with 2 threads hogging the CPU, but the most seemingly useful hot spots captured stop at functions like textzm1zi2zi4zi1_DataziText_length_info (libraries/text/src/Data/Text.hs:630.1), (null) (Text/Megaparsec/Internal.hs:317.33), and megaparseczm9zi0zi1zmf1fe707daa2bef8db9d3ff7f9f72aee8df8d35fd20edcae7fcf9b53d5659e4b5_TextziMegaparsecziInternal_zdfFunctorParsecT1_info (Text/Megaparsec/Internal.hs:159.3). I'm not sure whether that's the effect of TCO (tail-call optimization)?
The rest of the captures caught scheduling, GC, and profiling spots, as shown in the attached log.
Please help me see whether anything is wrong with this result; maybe we should resort to Haskell stack frames instead of the hardware-calling-convention stack?
I looked deeper into the code and got this idea: instead, print a backtrace of the CostCentreStack, with something like fprintCCS_stderr(), from each thread:
- Have a global counter, increased per SIGQUIT received (then the signal handler only needs a very safe atomic increment).
- Add a per-thread counter field to struct StgTSO_, which follows the global counter as the thread gets scheduled.
- Perform the follow-up action at each Haskell yield point: when the thread's counter does not match the global counter, print the CostCentreStack of either the just-finished or the next-coming work (or maybe both), then update the thread's counter to match the global counter.
Interleaving of output may be solved the same way, though at the cost that the traces are no longer taken in parallel at the same instant.
That might be really useful for diagnosing CPU hogging as in my situation; do you think it's feasible? A rough sketch of the counter protocol follows.
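This sketch uses portable C11 atomics rather than RTS internals; check_at_yield_point stands in for wherever the RTS would hook this, and printing the real CostCentreStack is of course the part that needs the actual RTS:

```c
/* Epoch-counter sketch: the SIGQUIT handler only bumps a global counter
 * (async-signal-safe); each thread compares its own counter against the
 * global one at its next "yield point" and reports when it is behind.    */
#include <stdatomic.h>
#include <stdio.h>

static atomic_ulong         sigquit_epoch;  /* bumped per SIGQUIT           */
static _Thread_local unsigned long my_epoch; /* per-thread (cf. StgTSO field) */

static void sigquit_handler(int sig)
{
  (void) sig;
  atomic_fetch_add(&sigquit_epoch, 1);      /* the only work in the handler */
}

/* Called at each safe point in a thread (cf. a Haskell yield point).      */
static void check_at_yield_point(void)
{
  unsigned long global = atomic_load(&sigquit_epoch);
  if (my_epoch != global) {
    my_epoch = global;
    /* In the real RTS this would print the thread's CostCentreStack.      */
    fprintf(stderr, "thread behind epoch %lu: reporting current work\n", global);
  }
}
```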
It turns out cost-centre-based backtraces are more informative than the DWARF ones; I tracked down my CPU-hogging bug (excessive STM retries due to misuse).
The shortcomings are apparent, especially that a backtrace won't be printed as soon as Ctrl^\ is pressed unless the thread is busy on the CPU, though that's not a problem in hogging cases. Also, as currently implemented, the backtrace reveals what work will be scheduled after Ctrl^\ is pressed, not what has been keeping the CPU busy at that moment; this is also not much of a problem when the CPU has been hogged for a long time and the culprit code keeps coming back.
It currently depends on a profiling build. I wonder whether the implicit ?callStack parameter could be leveraged for builds without profiling enabled, but I have no idea where to look for that.
Please review MR !4891 and suggest if/how we can go further this way.