rts: Simplify atomicModifyMutVar2# implementation
Previously, the atomicModifyMutVar2# implementation performed a redundant load in the non-threaded RTS for the benefit of the non-moving GC's write barrier. Eliminate this.
Also add an atomicModifyIORef test, which has been quite useful in finding memory ordering issues.