Optimize casMutVar# for single-threaded runtime
In the non-threaded RTS, stg_casMutVarzh, etc., shouldn't need to actually use atomic instructions, but they seem to do so. I believe this makes them substantially slower than necessary in that context.
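
To illustrate the idea, here is a minimal C sketch of a compare-and-swap that only uses an atomic instruction in the threaded build. The name cas_word is hypothetical and this is not the RTS's actual code; it assumes the build is selected by the THREADED_RTS macro and that the GCC/Clang __atomic builtins are available. The plain path relies on the assumption that, in the non-threaded RTS, no other OS thread can touch the cell between the load and the store.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only (not the RTS's real cas()).
 * Returns the value that was in *p before the operation; the swap
 * succeeded exactly when the return value equals `expected`. */
static inline uintptr_t cas_word(volatile uintptr_t *p,
                                 uintptr_t expected,
                                 uintptr_t desired)
{
#if defined(THREADED_RTS)
    /* Threaded build: other capabilities may race on *p, so use a real
     * atomic compare-exchange. On failure the builtin writes the observed
     * value into `expected`; on success `expected` already equals it. */
    __atomic_compare_exchange_n(p, &expected, desired, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
#else
    /* Non-threaded build (assumed single mutator thread): a plain
     * load, compare, and store is enough. */
    uintptr_t old = *p;
    if (old == expected) {
        *p = desired;
    }
    return old;
#endif
}
```

Compiled without THREADED_RTS defined, the function body reduces to an ordinary load, compare, and conditional store, which is the behaviour I would expect stg_casMutVarzh to have in the non-threaded RTS.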