#22528 · Closed
Issue created Nov 29, 2022 by Andreas Klebinger

Race condition in GC logic of sparks.

The reproducer from #22373 (closed) triggered a race condition when GCing sparks. The issue is as follows:

  • In the parallel GC every capability prunes its own spark pool (pruneSparkPool).
  • We do this by checking whether the object the spark points to has been evacuated. If it has, we can assume the spark is alive and safely retain it. If it hasn't been evacuated, we GC the spark.
  • However, sometimes there is a race where we look at a spark before the object it points to has been evacuated, conclude the spark is dead, and remove it from the spark pool (see the sketch below).
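
To illustrate the liveness test this relies on, here is a minimal self-contained sketch, not the actual RTS code: the struct and function names are illustrative, and the real pruneSparkQueue in rts/Sparks.c also deals with pointer tagging, closure_SHOULD_SPARK, and statistics. The key point is that "alive" is inferred from whether the target has already been overwritten with a forwarding pointer by evacuation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model, not RTS code: in a copying GC, an object's header is
 * overwritten with a forwarding pointer once the object has been evacuated. */
typedef struct Closure {
    struct Closure *forwarded_to;   /* non-NULL once the object was evacuated */
} Closure;

/* The liveness test pruning relies on: keep a spark only if its target has
 * already been evacuated by the time we inspect it. */
static bool spark_considered_alive(const Closure *target)
{
    return target->forwarded_to != NULL;
}

/* The race described above: if we inspect the target before another GC
 * thread has evacuated it, spark_considered_alive() returns false and the
 * spark is dropped even though the target is in fact still reachable. */
```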

The problem arises in gcWorkerThread, where the call to pruneSparkQueue assumes the whole heap has been marked by the time it is called. But since there is no explicit synchronization point before we start GCing sparks, that is a rather optimistic assumption.
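Schematically, the ordering in question is roughly the following. This is a simplified sketch rather than the actual body of gcWorkerThread; the *_sketch names are placeholders. The point is only that pruning directly follows a worker's own scavenging, with no barrier guaranteeing that every other worker has finished evacuating.

```c
/* Hypothetical sketch of the problematic ordering, not the real gcWorkerThread. */
void scavenge_until_all_done_sketch(void);  /* may return early, see below    */
void prune_spark_queue_sketch(void);        /* assumes a fully evacuated heap */

void gc_worker_thread_sketch(void)
{
    scavenge_until_all_done_sketch();
    /* <-- no synchronization point here: other GC threads may still be
     *     evacuating objects that sparks in this capability's pool point to */
    prune_spark_queue_sketch();
}
```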

While a thread will have finished marking its assigned blocks, there is no guarantee that all blocks have been marked, as other GC threads might still be busy evacuating their share of the work.

I think this can only arise when we collect without work_stealing being true (usually the case for minor collections with a nursery size <= 32M).
In that case, inside scavenge_until_all_done the condition `if (is_par_gc() && work_stealing && r != 0)` will always be false, so a thread will break out of scavenge_until_all_done as soon as it has done its own work and move on to GCing sparks, possibly incorrectly, since all of the heap hasn't been marked yet (see the sketch below).
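
A rough sketch of that early exit, under the assumption that work_stealing is false; apart from work_stealing and the is_par_gc()/scavenge_until_all_done names quoted from the issue text, the helpers here are purely illustrative:

```c
#include <stdbool.h>

static bool work_stealing = false;          /* off for small-nursery minor GCs  */
static bool is_par_gc_sketch(void)          { return true; }
static long scavenge_own_work_sketch(void)  { return 0; /* own queue drained */ }

static void scavenge_until_all_done_sketch(void)
{
    for (;;) {
        long r = scavenge_own_work_sketch();

        /* With work_stealing false this condition can never hold, so the
         * worker never waits for (or steals from) the other GC threads. */
        if (is_par_gc_sketch() && work_stealing && r != 0) {
            continue;
        }
        break;  /* proceed to pruneSparkQueue, possibly before the rest of
                 * the heap has been evacuated by the other workers */
    }
}
```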
