Relax load_load_barrier for aarch64
This patch relaxes the barrier instruction emitted by load_load_barrier() on aarch64.
Currently, load_load_barrier() is implemented as a full barrier, dmb sy.
That is stronger than necessary for ordering loads against later loads.
We can relax it to dmb ld, which orders earlier loads against later loads and stores.
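A minimal sketch of the relaxed barrier in C, assuming a GCC/Clang toolchain. load_load_barrier() here is a stand-in for the real one, and payload, ready, publish() and try_read() are hypothetical names. On aarch64 it emits dmb ld; on other hosts it falls back to a C11 acquire fence so the sketch still compiles. The reader side of a message-passing pattern shows where a load-load barrier belongs.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the patched load_load_barrier().
 * Before the patch: dmb sy (full barrier, full system domain).
 * After the patch:  dmb ld (earlier loads vs. later loads and stores).
 * dmb ishld, the inner-shareable variant used by Linux's arm64
 * smp_rmb(), would be a drop-in alternative. */
static inline void load_load_barrier(void)
{
#if defined(__aarch64__)
    __asm__ __volatile__("dmb ld" ::: "memory");
#else
    /* Portable fallback for non-Arm hosts: an acquire fence also orders
     * earlier loads before later loads and stores. */
    atomic_thread_fence(memory_order_acquire);
#endif
}

/* Hypothetical message-passing state shared by a writer and a reader. */
static int payload;
static atomic_int ready;

/* Writer: store the data, then publish the flag with release ordering. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: the flag load must be ordered before the payload load,
 * which is exactly the load-load ordering the barrier provides. */
static int try_read(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return 0;
    load_load_barrier();
    *out = payload;
    return 1;
}
```

In real use publish() and try_read() run on different cores; the barrier only matters on the reader side, between the two loads.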
If any caller relies on the current load_load_barrier() as a full barrier (ordering earlier loads and stores against later loads and stores), this patch is not suitable.
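To illustrate the case this patch does not cover, here is a hypothetical Dekker-style handshake. It needs an earlier store ordered before a later load, which dmb ld does not provide, so a caller like this needs a full barrier such as the dmb sy that load_load_barrier() currently emits. full_barrier(), flag and try_enter() are illustrative names, not part of the patch; non-Arm hosts fall back to a C11 seq_cst fence.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical full barrier: orders all earlier loads and stores
 * against all later loads and stores. dmb ld would NOT order the
 * store in try_enter() against the load that follows it. */
static inline void full_barrier(void)
{
#if defined(__aarch64__)
    __asm__ __volatile__("dmb sy" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* One "I want to enter" flag per participant (0 or 1). */
static atomic_int flag[2];

/* Announce intent, then check the other side. Returns 1 if the other
 * participant has not announced itself yet. */
static int try_enter(int self)
{
    atomic_store_explicit(&flag[self], 1, memory_order_relaxed);
    full_barrier(); /* store -> load ordering requires a full barrier */
    return atomic_load_explicit(&flag[1 - self], memory_order_relaxed) == 0;
}
```

With a full barrier on both sides, two concurrent callers cannot both see the other's flag as 0; replacing it with a load-only barrier would allow exactly that outcome.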
See also the Linux kernel's smp_rmb() implementation:
Using dmb ishld rather than dmb ld would likely improve performance further.
However, I have only checked this patch statically; I have not validated its effects on a real many-core Arm machine.