-
Nursery chunks help reduce the cost of GC when capabilities are unevenly loaded, by ensuring that we use more of the available nursery. The rationale for enabling this at -A16m is that any negative effects due to loss of cache locality are less likely to be an issue at -A16m and above. It's a conservative guess. If we had a lot of benchmark data we could probably do better. Results for nofib/parallel at -N4 -A32m with and without -n4m: ``` ------------------------------------------------------------------------ Program Size Allocs Runtime Elapsed TotalMem ------------------------------------------------------------------------ blackscholes 0.0% -9.5% -9.0% -15.0% -2.2% coins 0.0% -4.7% -3.6% -0.6% -13.6% mandel 0.0% -0.3% +7.7% +13.1% +0.1% matmult 0.0% +1.5% +10.0% +7.7% +0.1% nbody 0.0% -4.1% -2.9% 0.085 0.0% parfib 0.0% -1.4% +1.0% +1.5% +0.2% partree 0.0% -0.3% +0.8% +2.9% -0.8% prsa 0.0% -0.5% -2.1% -7.6% 0.0% queens 0.0% -3.2% -1.4% +2.2% +1.3% ray 0.0% -5.6% -14.5% -7.6% +0.8% sumeuler 0.0% -0.4% +2.4% +1.1% 0.0% ------------------------------------------------------------------------ Min 0.0% -9.5% -14.5% -15.0% -13.6% Max 0.0% +1.5% +10.0% +13.1% +1.3% Geometric Mean +0.0% -2.6% -1.3% -0.5% -1.4% ``` Not conclusive, but slightly better. This matters a lot more when you have more cores. Test Plan: validate, nofib/paralel Reviewers: niteria, ezyang, nh2, trofi, austin, erikd, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2581 GHC Trac Issues: #9221
85e81a85