Now that the SMP and threaded runtimes are the same, there is some unnecessary locking going on in the single threaded case. For example, MVar operations now require an atomic memory operation, which can be quite slow. It might be a good idea to dynamically turn on these locks if we're running multi-threaded (ie. +RTS -N2 or greater). !ToDo: measure the difference.