Shutdown interacts badly with requestSync()

I've been investigating #10860 (closed), and the problem is more involved than I thought. It's not an easy fix, so I'll record what I know and come back to it later.

We have a mechanism for synchronising all capabilities, requestSync(). This is used by

  • scheduleDoGC()
  • setNumCapabilities()
  • forkProcess()

It ensures that only one of these operations at a time can attempt to seize control of the whole runtime. If several are in progress, the others will yieldCapability() to the one that got in first.
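As a rough illustration of the "first one in wins" rule, here is a toy model in C. It is a sketch under assumptions, not the RTS code: the names SyncKind, SyncRequest, pending, try_request_sync() and finish_sync() are all hypothetical, and the real requestSync() also has to deal with capabilities and yielding, which are left out here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical, not an RTS definition. */
typedef enum { SYNC_GC, SYNC_SET_NUM_CAPS, SYNC_FORK } SyncKind;

typedef struct {
    SyncKind kind;   /* which operation wants the whole runtime */
    int requester;   /* index of the requesting capability */
} SyncRequest;

/* At most one request can be pending at a time; NULL means "none". */
static _Atomic(SyncRequest *) pending = NULL;

/* Try to become the single owner of the sync. Returns true if this
 * request got in first. On false, a real caller would yield its
 * capability to the winner and retry later. */
static bool try_request_sync(SyncRequest *req) {
    SyncRequest *expected = NULL;
    return atomic_compare_exchange_strong(&pending, &expected, req);
}

/* The winner clears the slot when it is done, so that a waiting
 * request can succeed on its next attempt. */
static void finish_sync(void) {
    atomic_store(&pending, NULL);
}
```

In this model a second try_request_sync() fails while the first request is still pending and succeeds once finish_sync() has been called. Note that nothing in it knows about shutdown, which is the gap described next.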

However, if we start to shut down the runtime (exitScheduler()) and *then* attempt requestSync(), bad things can happen. Some capabilities might already be shut down, so acquireAllCapabilities() can never acquire them, and deadlock can ensue. This is what happens in #10860 (closed).

scheduleDoGC() tries to avoid this, but I don't think it goes far enough: checking for SCHED_SHUTTING_DOWN right after requestSync() doesn't close the window, because exitScheduler() can run between requestSync() and acquireAllCapabilities().
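The problematic interleaving can be walked through with a small single-threaded model. This is only a sketch under assumptions: ToyRuntime and the helper functions are hypothetical, the two-value SchedState enum is a simplification, and a capability that could never be acquired is counted rather than actually blocked on, so that the example terminates.

```c
#include <stdbool.h>

#define N_CAPS 4

/* Simplified scheduler state for the model (hypothetical, two values only). */
typedef enum { SCHED_RUNNING, SCHED_SHUTTING_DOWN } SchedState;

typedef struct {
    SchedState state;
    bool cap_alive[N_CAPS];  /* false once a capability has shut down */
} ToyRuntime;

static void toy_init(ToyRuntime *rt) {
    rt->state = SCHED_RUNNING;
    for (int i = 0; i < N_CAPS; i++) {
        rt->cap_alive[i] = true;
    }
}

/* Step 1: the check made right after the sync request. */
static bool shutdown_check_passes(const ToyRuntime *rt) {
    return rt->state != SCHED_SHUTTING_DOWN;
}

/* Step 2: shutdown begins on another thread; the first caps_gone
 * capabilities have already gone away. */
static void toy_begin_shutdown(ToyRuntime *rt, int caps_gone) {
    rt->state = SCHED_SHUTTING_DOWN;
    for (int i = 0; i < caps_gone && i < N_CAPS; i++) {
        rt->cap_alive[i] = false;
    }
}

/* Step 3: count the capabilities an acquire-all loop would wait on
 * forever. Any non-zero result stands for a deadlock in this model. */
static int unacquirable_caps(const ToyRuntime *rt) {
    int n = 0;
    for (int i = 0; i < N_CAPS; i++) {
        if (!rt->cap_alive[i]) {
            n++;
        }
    }
    return n;
}
```

Calling shutdown_check_passes(), then toy_begin_shutdown(), then unacquirable_caps() in that order reproduces the window: the check succeeds, yet by the time the acquire step runs some capabilities are gone. A check placed only at step 1 cannot see what happens at step 2.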

Really, shutdown needs to be part of the requestSync() game, but that needs a lot of thought.

Trac metadata
| Trac field | Value |
| --- | --- |
| Version | 7.10.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | #10860 (closed) |
| CC | simonmar |
| Operating system | |
| Architecture | |