Commit a5d26f26 authored and committed by Sergei Trofimovich

rts: enable parallel GC scan of large (32M+) allocation area


The parallel GC does not scan a large allocation area (-A)
efficiently, because it does not do work stealing from the
nursery by default.

That leads to a large imbalance when only one of the threads
overflows its allocation area: most GC threads finish
quickly (as there is not much to collect) and sit idle,
waiting while a single GC thread finishes scanning the
allocation area of that one thread.

The patch enables work stealing (the equivalent of -qb0)
for allocation areas of -A32M or larger.

Tested on the highlighting-kate package from Trac #9221.

On an 8-core machine wall-clock time improves by around 5%.
On a 24-core VM the speedup is 20%.

Signed-off-by: Sergei Trofimovich <siarheit@google.com>

Test Plan: measured wall time and GC parallelism on highlighting-kate build

Reviewers: austin, bgamari, erikd, simonmar

Reviewed By: bgamari, simonmar

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2483

GHC Trac Issues: #9221
parent 9d175605
Branches wip/fprof-overloaded
@@ -449,7 +449,7 @@ performance.
 .. rts-flag:: -qb <gen>
-    :default: 1
+    :default: 1 for ``-A`` < 32M, 0 otherwise
     :since: 6.12.1
     Use load-balancing in the parallel GC in generation ⟨gen⟩ and higher.
@@ -227,7 +227,7 @@ void initRtsFlagsDefaults(void)
     RtsFlags.ParFlags.parGcEnabled = 1;
     RtsFlags.ParFlags.parGcGen = 0;
     RtsFlags.ParFlags.parGcLoadBalancingEnabled = rtsTrue;
-    RtsFlags.ParFlags.parGcLoadBalancingGen = 1;
+    RtsFlags.ParFlags.parGcLoadBalancingGen = ~0u; /* auto, based on -A */
     RtsFlags.ParFlags.parGcNoSyncWithIdle = 0;
     RtsFlags.ParFlags.parGcThreads = 0; /* defaults to -N */
     RtsFlags.ParFlags.setAffinity = 0;
@@ -393,7 +393,8 @@ usage_text[] = {
 " -qg[<n>] Use parallel GC only for generations >= <n>",
 " (default: 0, -qg alone turns off parallel GC)",
 " -qb[<n>] Use load-balancing in the parallel GC only for generations >= <n>",
-" (default: 1, -qb alone turns off load-balancing)",
+" (default: 1 for -A < 32M, 0 otherwise;"
+" -qb alone turns off load-balancing)",
 " -qn<n> Use <n> threads for parallel GC (defaults to value of -N)",
 " -qa Use the OS to set thread affinity (experimental)",
 " -qm Don't automatically migrate threads between CPUs",
@@ -1450,6 +1451,22 @@ static void normaliseRtsOpts (void)
         errorUsage();
     }
+    if (RtsFlags.ParFlags.parGcLoadBalancingGen == ~0u) {
+        StgWord alloc_area_bytes
+            = RtsFlags.GcFlags.minAllocAreaSize * BLOCK_SIZE;
+        // If allocation area is larger than CPU cache
+        // we can finish scanning quicker doing work-stealing
+        // scan. Trac #9221
+        // 32M looks big enough not to fit into L2 cache
+        // of popular modern CPUs.
+        if (alloc_area_bytes >= 32 * 1024 * 1024) {
+            RtsFlags.ParFlags.parGcLoadBalancingGen = 0;
+        } else {
+            RtsFlags.ParFlags.parGcLoadBalancingGen = 1;
+        }
+    }
 #ifdef THREADED_RTS
     if (RtsFlags.ParFlags.parGcThreads > RtsFlags.ParFlags.nCapabilities) {
         errorBelch("GC threads (-qn) must be between 1 and the value of -N");
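
For readers who want to try the heuristic outside the RTS, the selection logic in normaliseRtsOpts can be sketched as a standalone function. This is only an illustration, not the RTS code: the names resolve_lb_gen, SKETCH_AUTO_GEN, SKETCH_BLOCK_SIZE and SKETCH_THRESHOLD_BYTES are hypothetical, and the 4096-byte block size is an assumption standing in for the RTS's BLOCK_SIZE.

```c
#include <stdint.h>

/* Hypothetical stand-ins for RTS definitions (assumptions, not GHC's names). */
#define SKETCH_BLOCK_SIZE      4096u                  /* assumed bytes per block */
#define SKETCH_AUTO_GEN        (~0u)                  /* "auto" sentinel, as in the patch */
#define SKETCH_THRESHOLD_BYTES (32ull * 1024 * 1024)  /* the 32M cut-off from the patch */

/*
 * Resolve the generation from which load-balancing (work stealing) applies.
 * An explicit request is returned unchanged. The "auto" sentinel becomes
 * 0 (steal work in every generation, including the nursery) when the
 * allocation area is at least 32M, and 1 otherwise.
 */
static unsigned resolve_lb_gen(unsigned requested_gen, uint64_t alloc_area_blocks)
{
    if (requested_gen != SKETCH_AUTO_GEN) {
        return requested_gen;   /* user passed an explicit -qb<n> */
    }
    uint64_t alloc_area_bytes = alloc_area_blocks * SKETCH_BLOCK_SIZE;
    return alloc_area_bytes >= SKETCH_THRESHOLD_BYTES ? 0u : 1u;
}
```

Under the assumed block size, an allocation area of 8192 blocks is exactly 32M and resolves to generation 0, while 8191 blocks stays at generation 1. Using a sentinel rather than resolving the default up front matters because the value of -A is only known after all flags are parsed.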