STM Implementation: Make the coarse-grained locking variant available at runtime or configure time, or remove it completely.

Summary

GHC's STM implementation surprisingly offers several different locking implementations:

No locking (non-threaded runtime).
Fine-grained locking (STM_FG_LOCK) - used by default in threaded builds.
Coarse-grained locking (STM_CG_LOCK) - critical sections in the STM implementation are guarded by one global mutex.
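To make the difference between the two threaded variants concrete, here is a minimal C sketch using pthread mutexes. It is a simplified model under stated assumptions, not the code in STM.c: the names toy_tvar, toy_tvar_init, cg_commit_transfer and fg_commit_transfer are hypothetical, and the real fine-grained variant is not assumed to use one pthread mutex per TVar. The point is only that coarse locking serialises every commit on a single mutex, while fine-grained locking takes just the locks of the TVars a commit touches.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, simplified model (not rts/STM.c): a "TVar" is a value plus
 * its own lock. The per-TVar lock is only used by the fine-grained variant. */
typedef struct {
    long value;
    pthread_mutex_t lock;
} toy_tvar;

/* Coarse-grained: one global mutex guards every commit. */
static pthread_mutex_t toy_global_lock = PTHREAD_MUTEX_INITIALIZER;

void toy_tvar_init(toy_tvar *tv, long v) {
    tv->value = v;
    pthread_mutex_init(&tv->lock, NULL);
}

/* Coarse-grained commit: a single lock, regardless of which TVars are written. */
void cg_commit_transfer(toy_tvar *from, toy_tvar *to, long amount) {
    pthread_mutex_lock(&toy_global_lock);
    from->value -= amount;
    to->value += amount;
    pthread_mutex_unlock(&toy_global_lock);
}

/* Fine-grained commit: lock only the TVars touched, always in address order,
 * so two concurrent commits on the same pair cannot deadlock. */
void fg_commit_transfer(toy_tvar *from, toy_tvar *to, long amount) {
    toy_tvar *first  = (uintptr_t)from < (uintptr_t)to ? from : to;
    toy_tvar *second = (first == from) ? to : from;

    pthread_mutex_lock(&first->lock);
    if (second != first)              /* same TVar twice: lock it only once */
        pthread_mutex_lock(&second->lock);

    from->value -= amount;
    to->value += amount;

    if (second != first)
        pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

The coarse variant takes exactly one lock per commit, which may be part of why it can win on small benchmarks; the fine-grained variant pays for more lock operations but lets commits on disjoint TVars proceed in parallel.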

I tried enabling it; it seems to work, but it compiles with a number of minor warnings caused by missing casts.

As far as I can tell, coarse locking is currently faster in certain scenarios, in particular in some toy examples derived from #24142. Based on the comments in STM.c, however, I assume it would fare far worse than fine-grained locks in other cases.

Perhaps there would be value in making this configuration easier for users to enable than having to change #defines by hand in the source.
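One way to avoid hand-editing the source would be to guard the locking #defines so that a strategy can be chosen from the build system, for example by passing -DSTM_CG_LOCK to the C compiler. The sketch below is hypothetical: the guard layout, the stm_lock_strategy helper, and the STM_UNIPROC name for the no-locking case are assumptions, not the current contents of STM.c. THREADED_RTS is the RTS macro that marks threaded builds.

```c
/* Hypothetical selection logic, not the current rts/STM.c:
 * honour a strategy given on the command line (e.g. -DSTM_CG_LOCK),
 * otherwise fall back to today's defaults. */
#if defined(STM_CG_LOCK) + defined(STM_FG_LOCK) + defined(STM_UNIPROC) > 1
#error "select at most one STM locking strategy"
#endif

#if !defined(STM_CG_LOCK) && !defined(STM_FG_LOCK) && !defined(STM_UNIPROC)
#  if defined(THREADED_RTS)
#    define STM_FG_LOCK   /* threaded default: fine-grained locks */
#  else
#    define STM_UNIPROC   /* assumed name: non-threaded runtime, no locking */
#  endif
#endif

/* Hypothetical helper reporting which variant this build was compiled with. */
const char *stm_lock_strategy(void) {
#if defined(STM_CG_LOCK)
    return "coarse-grained";
#elif defined(STM_FG_LOCK)
    return "fine-grained";
#else
    return "none (uniprocessor)";
#endif
}
```

Exposing the choice as a runtime flag instead would mean compiling both variants into the RTS and dispatching on every lock operation, which is one reason a configure-time switch may be the cheaper option.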

Personally, I'm not sure the advantages of this option warrant the larger configuration space and the cost of carrying this code forward. It hasn't been used in a released GHC for 19 years, and it adds complexity when trying to understand the STM code.

For now, I plan to work out the details of #24142, get Ben's improvements to STM TRecs merged, and perhaps check afterwards whether this variant still provides speedups.

See the comments in STM.c for more details.

Environment

GHC version used: Any GHC from the last 19 years.
