Speed up the RTS hash table
The RTS hash table is rather slow. Every lookup involves at least one indirect call (to get a hash code). It also uses separate chaining, which is itself slow.
Until recently, this didn't really matter, since the RTS hash table wasn't used for anything particularly performance critical other than StableName
operations. However, it has since become the bottleneck when compacting with sharing, to the point that compacting without sharing is around 10x faster and is thus the default.
Fortunately, there are easy ways to make the hash table faster. These include:
- Use linear probing instead of separate chaining.
- Specialize for the case of pointer keys
- Don't use indirect calls for hashing
- Use a fast, universal hash function for pointers.
- Use SSE instructions where available to do fast searches.
- Minimize the number of indirections in the use of the table.
In both the case of the StablePtr table and that of compact regions, the keys of the table are not GC pointers, but the values are, so there needs to be a way to ensure that the GC handles the table correctly