GHC issue #21460. Created Apr 30, 2022 by Andreas Klebinger (@AndreasK)

Increase default -fmax-worker-args value

In #20325 I suggested that we should probably consider some kind of cost model for worker/wrapper (W/W) splitting.

Until someone tackles that, however, we should probably increase -fmax-worker-args. The current default of 10 seems to have been chosen arbitrarily.

It seems that in the past we assumed that, beyond 10 arguments, the extra register pressure from a higher argument count costs more than the allocations saved by unboxing.
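For readers less familiar with W/W, here is a minimal hand-written sketch of the transformation on a tiny example. GHC performs this itself on Core (its generated workers carry a $w prefix); the names V3, sumV3 and wsumV3 are hypothetical. The point is that unboxing a record with k fields yields a worker taking k separate arguments, and that argument count is what -fmax-worker-args limits (roughly: GHC stops unboxing once the worker would get too many arguments).

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), Int#, (+#))

-- A small product type with strict fields (hypothetical example).
data V3 = V3 !Int !Int !Int

-- Wrapper: keeps the original type, takes the record and its boxed Ints
-- apart, and passes the raw Int# fields to the worker. In GHC's version
-- the wrapper is inlined at call sites, so a caller that builds a V3
-- only to pass it here can skip that allocation.
sumV3 :: V3 -> Int
sumV3 (V3 (I# x) (I# y) (I# z)) = I# (wsumV3 x y z)

-- Worker: one unboxed argument per field. A record with k fields turns
-- into k worker arguments.
wsumV3 :: Int# -> Int# -> Int# -> Int#
wsumV3 x y z = x +# y +# z

main :: IO ()
main = print (sumV3 (V3 1 2 3))
```

Each extra worker argument needs a register or a stack slot, which is the register-pressure cost being weighed against the saved allocation.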

I was always a bit skeptical of the claim that 10 is a good cutoff, i.e. that more arguments are generally worse even when they let us unbox more. So today I benchmarked every value from 6 to 20 to check the actual behaviour.

Here are the overall results. Note that -fmax-worker-args=6 is the baseline here, so every other column is relative to it.
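Since every column is relative to the 6 baseline, comparing the proposed 15 against today's default of 10 takes one division. Because a geometric mean of ratios is multiplicative, dividing two geom-mean values gives the same answer as the geometric mean of the per-benchmark 15/10 ratios, assuming both columns cover the same benchmarks. A small sketch with a hypothetical helper rebase, using the geom-mean values from the tables below:

```haskell
module Main where

import Text.Printf (printf)

-- Hypothetical helper: both arguments are percentage changes relative to
-- the same baseline (-fmax-worker-args=6). The result is the change of
-- the first column relative to the second.
rebase :: Double -> Double -> Double
rebase newPct refPct = ((1 + newPct / 100) / (1 + refPct / 100) - 1) * 100

main :: IO ()
main = do
  -- geom mean, bytes allocated: 15 -> -0.14%, 10 -> +0.08% (vs. 6)
  printf "bytes allocated, 15 vs 10: %+.2f%%\n" (rebase (-0.14) 0.08)
  -- geom mean, instructions: 15 -> -0.10%, 10 -> +0.03% (vs. 6)
  printf "instructions, 15 vs 10: %+.2f%%\n" (rebase (-0.10) 0.03)
```

With these numbers it prints -0.22% for bytes allocated and -0.13% for instructions.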

I would summarize the results thus:

  • Setting -fmax-worker-args to 15 or higher seems beneficial for both bytes allocated and instructions executed: relative to the current default of 10, the geometric mean drops by roughly 0.2% in allocations and 0.13% in instructions. I assume 15 is where we become able to fully unbox some common dictionary?
  • Values between 6 and 14 make little difference overall, although they can help or hurt individual benchmarks.
  • When more W/W splits reduce allocations, instructions executed generally go down too.
# bytes allocated

| -fmax-worker-args | 6 (baseline) | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| geom mean (rel. to 6) | | -0.02% | +0.29% | +0.08% | +0.08% | +0.08% | +0.09% | +0.09% | +0.09% | -0.14% | -0.13% | -0.14% | -0.14% | -0.14% | -0.13% |


# instructions


| -fmax-worker-args | 6 (baseline) | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| geom mean (rel. to 6) | | -0.03% | +0.07% | +0.02% | +0.03% | +0.03% | +0.03% | +0.05% | +0.04% | -0.10% | -0.10% | -0.10% | -0.10% | -0.09% | -0.09% |




# LLC cache misses


| -fmax-worker-args | 6 (baseline) | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| geom mean (rel. to 6) | | -0.00% | -0.06% | +0.00% | -0.06% | +0.02% | -0.05% | -0.07% | -0.02% | -0.06% | +0.05% | -0.08% | -0.04% | -0.05% | -0.01% |




# L1 cache misses


| -fmax-worker-args | 6 (baseline) | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| geom mean (rel. to 6) | | +0.04% | +0.28% | -0.02% | +0.06% | +0.11% | +0.09% | +0.07% | +0.08% | -0.06% | -0.09% | -0.11% | -0.11% | -0.09% | -0.08% |
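The geom mean rows above summarize many per-benchmark ratios, so a change of a few hundredths of a percent can hide much larger swings in individual benchmarks. Assuming the rows are the conventional geometric mean of new/baseline ratios (the comparison tool is not named here), the computation looks roughly like this; geomMeanChange is a hypothetical helper.

```haskell
module Main where

-- Geometric mean of per-benchmark ratios (new / baseline), reported as a
-- percentage change in the style of the "geom mean" rows.
geomMeanChange :: [Double] -> Double
geomMeanChange [] = 0 -- no benchmarks: report no change
geomMeanChange ratios =
  (exp (sum (map log ratios) / fromIntegral (length ratios)) - 1) * 100

main :: IO ()
main = do
  -- Two hypothetical benchmarks: one allocates 10% more, one 10% less.
  print (geomMeanChange [1.10, 0.90])
  -- Identical results give no change at all.
  print (geomMeanChange [1.0, 1.0, 1.0])
```

Two benchmarks moving by +10% and -10% show up as only about -0.5% in the summary, which is why individual benchmarks still matter in the 6 to 14 range even though the overall numbers barely move.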



What does that mean?

  • I think we can confidently increase -fmax-worker-args to 15, at least on x64.
  • I don't think we need to be all that careful about increasing the argument count of workers through W/W in general. So #20325 is indeed probably a good idea.
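To try the proposed limit on a particular module before the default changes, the flag can be given per module in an OPTIONS_GHC pragma, or on the command line as -fmax-worker-args=15. The module below is a hypothetical test case: a function consuming a record with 12 strict fields, which is above the current default of 10 but within 15. Compiling it with ghc -O -ddump-simpl -dsuppress-all, once with and once without the pragma, should show whether GHC produces a worker that takes the twelve fields unboxed; the exact Core is not verified here.

```haskell
{-# OPTIONS_GHC -fmax-worker-args=15 #-}
module Main where

-- Hypothetical record with 12 strict fields: fully unboxing it would give
-- a worker with 12 arguments, above the old default of 10 but within 15.
data R = R !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int

-- Uses every field, so strictness analysis sees the whole record demanded.
sumR :: R -> Int
sumR (R a b c d e f g h i j k l) =
  a + b + c + d + e + f + g + h + i + j + k + l

main :: IO ()
main = print (sumR (R 1 2 3 4 5 6 7 8 9 10 11 12))
```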
Edited Apr 30, 2022 by Andreas Klebinger