runST isn't free
While optimizing some code I discovered that runST isn't free. I had a function on the form:
f ... = ...let x = runST in (f x)...
Manually transforming it into
f ... = runST (g ...)
where g ... = do
...
x <- ...
g x
...
(The real example is at https://github.com/tibbe/unordered-containers/commit/58b7815a057b3575c58b746d5971d59d806b1333)
improved performance quite a bit on this, already highly tuned, function. Unfortunately, combining all the calls to runST into one and pulling them "up" is not all good. The code is now less modular, has a somewhat over specified evaluation order, and generally looks more imperative.
The cause of the decrease in performance is that runSTRep cannot be inlined, which causes allocation of closures inline in the code (where none should be necessary.)
The comment next to runSTRep explains why it's implemented the way it is, but I wonder if matters could be improved by making it a primop. If we create a fresh state token every time, instead of reusing realWorld#, it should be impossible for mutable structures to let-float and become CAFs (which is what runSTRep tries to avoid.)
Trac metadata
| Trac field | Value |
|---|---|
| Version | 7.4.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture |