Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
  • GHC GHC
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 5,344
    • Issues 5,344
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
  • Merge requests 571
    • Merge requests 571
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Releases
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • Glasgow Haskell CompilerGlasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #21651
Closed
Open
Issue created May 25, 2022 by Ben Gamari@bgamari🐢Maintainer

GHC.Event.Thread.closeFdWith isn't concurrency-safe

While looking at some preparatory work for !8073 I noticed that GHC.Event.Thread.closeFdWith is extremely suspicious. Specifically, it:

  1. iterates over the capabilities and takes the event callback table MVar of each
  2. iterates over the capabilities and calls GHC.Event.Manager.closeFd_ on each previously-taken table to remove the to-be-closed fd
  3. closes the fd
  4. iterates over the capabilities and for each:
    • puts the event callback table MVar
    • runs the event notifier registered for the closed fd, if any

All of this happens under an uninterruptibleMask_.

First, a bit of background: closeFdWith is needed because POSIX does not specify the behavior of select or poll when an fd being blocked on in one thread is closed by another thread. Consequently, the IO manager must (atomically) ensure that all events registered against the fd are dispatched before the fd is closed.

However, AFAICT, the above implementation can very easily break if the number of capabilities is concurrently changed. Specifically I can imagine the following problematic interleaving:

                                          │
       Capability 1                       │                  Capability 2
       ────────────                       │                  ────────────
                                          │
0. `closeFdWith fd` is called             │
                                          │
1. Takes event callback tables for all N  │
   capabilities                           │
                                          │
                                          │ 2. Calls `setNumCapabilities (N+1)`, blocks
                                          │    waiting to acquire all capabilities
                                          │
3. Yields its capability (e.g. due to GC) │
                                          │
                                          │ 3. Takes capability, and now holds all capabilities.
                                          │    A new capability is spun-up and all capabilities
                                          │    are again released.
                                          │
                                          │ 4. A thread is scheduled to capability N+1
                                          │
                                          │ 5. The thread registers an event for `fd` and blocks on `select`/`poll`
                                          │
6. `closeFdWith` continues, calling       │
   `closeFd_` on N capabilities.          │
   This removes the fd from the event     │
   list of all capabilities and           │
   wakes up those that are blocked        │
   on the fd                              │
                                          │
7. the fd is closed                       │
                                          │
                                          │ 8. The thread is now blocked on `fd`, which was closed while
                                          │    the thread was blocked, triggering undefined behavior.
                                          │

This is quite problematic.

Edited May 27, 2022 by Ben Gamari
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Time tracking