Memory fence on writes to MutVar/Array missing on ARM
The memory model question has been debated now and again. This thread from ten years back (https://mail.haskell.org/pipermail/haskell-prime/2006-April/001237.html) lays out the basic situation with thunk update, writeIORef, and memory fences.
But we recently began experimenting with GHC on ARM platforms, and it seems to lack a memory fence that the participants in the cited thread expect it to have.
Here's an attempt to construct a program that writes the fields of a data structure and then writes the pointer to that structure to an IORef, without the proper fence in between:
import Data.IORef
import Control.Concurrent
data Foo = Foo Int deriving Show
{-# NOINLINE mkfoo #-}
mkfoo x = Foo x
{-# NOINLINE dowrite #-}
dowrite r n = writeIORef r $! mkfoo n
main =
  do r <- newIORef (Foo 3)
     forkIO (dowrite r 4)
     x <- readIORef r
     print x
Here are the relevant parts of the Cmm produced when this is compiled on a 64-bit ARM machine:
mkfoo_rn1_entry() // []
{ []
}
{offset
c40i:
P64[MainCapability+872] = P64[MainCapability+872] + 16;
if (P64[MainCapability+872] > I64[MainCapability+880]) goto c40m; else goto c40l;
c40m:
I64[MainCapability+928] = 16;
P64[MainCapability+24] = mkfoo_rn1_closure;
call (I64[MainCapability+16])(R1) args: 16, res: 0, upd: 8;
c40l:
I64[P64[MainCapability+872] - 8] = Foo_con_info;
P64[P64[MainCapability+872]] = P64[I64[MainCapability+856]];
P64[MainCapability+24] = P64[MainCapability+872] - 7;
I64[MainCapability+856] = I64[MainCapability+856] + 8;
call (I64[P64[I64[MainCapability+856]]])(R1) args: 8, res: 0, upd: 8;
}
}
dowrite_entry() // []
{ []
}
{offset
c44j:
call a_r3Dy_entry() args: 24, res: 0, upd: 8;
}
}
a_r3Dy_entry() // [R1]
{ []
}
{offset
c41D:
if (I64[MainCapability+856] - 16 < I64[MainCapability+864]) goto c41H; else goto c41I;
c41H:
P64[MainCapability+24] = a_r3Dy_closure;
call (I64[MainCapability+16])(R1) args: 24, res: 0, upd: 8;
c41I:
I64[I64[MainCapability+856] - 8] = block_c41B_info;
P64[I64[MainCapability+856] - 16] = P64[I64[MainCapability+856] + 8];
I64[MainCapability+856] = I64[MainCapability+856] - 16;
call mkfoo_rn1_entry() args: 16, res: 8, upd: 8;
}
}
block_c41B_entry() // [R1]
{ []
}
{offset
c41B:
_s3Ep::P64 = P64[I64[MainCapability+856] + 8];
I64[I64[MainCapability+856] + 8] = block_c41G_info;
_s3Es::P64 = P64[MainCapability+24];
P64[MainCapability+24] = _s3Ep::P64;
P64[I64[MainCapability+856] + 16] = _s3Es::P64;
I64[MainCapability+856] = I64[MainCapability+856] + 8;
if (P64[MainCapability+24] & 7 != 0) goto u41S; else goto c41K;
u41S:
call block_c41G_entry(R1) args: 0, res: 0, upd: 0;
c41K:
call (I64[I64[P64[MainCapability+24]]])(R1) args: 8, res: 8, upd: 8;
}
}
block_c41G_entry() // [R1]
{ []
}
{offset
c41G:
_s3Ev::P64 = P64[P64[MainCapability+24] + 7];
P64[_s3Ev::P64 + 8] = P64[I64[MainCapability+856] + 8];
call "ccall" arg hints: [PtrHint,
PtrHint] result hints: [] dirty_MUT_VAR(MainCapability+24, _s3Ev::P64);
P64[MainCapability+24] = ()_closure+1;
I64[MainCapability+856] = I64[MainCapability+856] + 16;
call (I64[P64[I64[MainCapability+856]]])(R1) args: 8, res: 0, upd: 8;
}
}
The fence should happen before the write of the pointer into the IORef. I can't find the fence, and can't find a codepath in the compiler that would insert it (i.e. with MO_WriteBarrier).
dirty_MUT_VAR is actually too late to perform the fence, but it doesn't either:
void
dirty_MUT_VAR(StgRegTable *reg, StgClosure *p)
{
    Capability *cap = regTableToCapability(reg);
    if (p->header.info == &stg_MUT_VAR_CLEAN_info) {
        p->header.info = &stg_MUT_VAR_DIRTY_info;
        recordClosureMutated(cap,p);
    }
}
(Neither does recordClosureMutated.)
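For reference, the ordering I would expect can be sketched in portable C11. This is only an illustration of the publish pattern, not RTS code: the types Foo and MutVar and the functions publish and consume are hypothetical stand-ins, and the release fence plays the role that I assume MO_WriteBarrier would play in the generated code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Haskell-level objects; not GHC RTS types. */
typedef struct { long field; } Foo;
typedef struct { _Atomic(Foo *) var; } MutVar;

/* Writer: initialise the object, fence, then publish the pointer. */
void publish(MutVar *mv, Foo *obj, long n)
{
    obj->field = n;                             /* like the field store in mkfoo */
    atomic_thread_fence(memory_order_release);  /* the fence that appears to be missing */
    atomic_store_explicit(&mv->var, obj,
                          memory_order_relaxed); /* like the store into the MutVar */
}

/* Reader: load the pointer, then read the field through it. */
long consume(MutVar *mv)
{
    Foo *p = atomic_load_explicit(&mv->var, memory_order_acquire);
    assert(p != NULL);  /* this sketch assumes the variable was initialised */
    return p->field;
}
```

Without the release fence, a weakly ordered machine such as ARM is allowed to make the pointer store visible to another core before the field store, so a reader could follow the new pointer and see an uninitialised field. The fence has to sit between the two stores, which is why anything done inside dirty_MUT_VAR comes too late.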