
Yes, this can make sense if

- the value often doesn't require an update, and

- there's contention on the cache line, i.e., at least two cores frequently read or write that cache line.

But there are important details to consider:

1) The probing load must be atomic. Both the compiler and the processor in general are allowed to split non-atomic loads into two or more partial loads. Only atomic loads – even with relaxed ordering – are guaranteed to not return intermediate or mixed values from other atomic stores.

2) If the ordering on the read part of the atomic read-modify-write operation is not relaxed, the probing load must reflect this. For example, an acq-rel RMW op would require an acquire ordering on the probing read.
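To make both points concrete, here is a minimal sketch in Rust of a probing load in front of an acq-rel RMW. The function name `update_max` and the "running maximum" use case are hypothetical, chosen only for illustration; the point is that the probe is itself an atomic load and uses acquire ordering to match the acquire half of the RMW.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: raise `max` to at least `candidate`.
/// Returns true if this call actually increased the stored value.
pub fn update_max(max: &AtomicU64, candidate: u64) -> bool {
    // Probing load. It is atomic (point 1) and uses Acquire because the
    // RMW below is AcqRel (point 2).
    if max.load(Ordering::Acquire) >= candidate {
        // Common case: nothing to update, so no write to the cache line.
        return false;
    }
    // Slow path: the real read-modify-write. It re-checks the value
    // atomically, so a racing update between probe and RMW is still handled.
    let previous = max.fetch_max(candidate, Ordering::AcqRel);
    previous < candidate
}
```

Whether the probe pays off still depends on the two conditions above: if most calls end up in the slow path, the extra load is just overhead.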



Thanks for your insights. (2) makes sense to me, but for (1), on ARM64 can an aligned 64-bit store really be observed as torn by a 64-bit non-atomic load? The spec says "A write that is generated by a store instruction that stores a single general-purpose register and is aligned to the size of the write in the instruction is single-copy atomic" (B2.2.1).


> […] on ARM64 […]

Well, if you target a specific architecture, then of course you can assume more guarantees than in general, portable code. And in general, a processor might distinguish between non-atomic and relaxed-atomic reads and writes – in theory.

But more important, and relevant in practice, is the behavior of the compiler. C, C++, and Rust compilers are allowed to assume that non-atomic reads aren't influenced by concurrent writes, so the compiler is allowed to split non-atomic reads into smaller reads (unlikely) or even optimize the reads away if it can prove that the memory location isn't written to by the local thread (more likely).
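A small sketch of the "optimize the reads away" case, assuming a simple spin-wait on a flag (the function name `wait_for_flag` is made up for this example). With a plain non-atomic `bool`, the compiler could legally read the flag once before the loop, and in Rust the concurrent access would be a data race and therefore undefined behavior. With a relaxed atomic load, the read is performed on every iteration in practice, so the loop observes the other thread's store:

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical demo: spin until another thread sets a flag.
pub fn wait_for_flag() -> bool {
    let ready = Arc::new(AtomicBool::new(false));

    let writer = {
        let ready = Arc::clone(&ready);
        thread::spawn(move || ready.store(true, Ordering::Relaxed))
    };

    // Relaxed is enough here because only the flag itself is inspected.
    // The load is atomic, so it is not hoisted out of the loop the way a
    // non-atomic read of a location the local thread never writes could be.
    while !ready.load(Ordering::Relaxed) {
        hint::spin_loop();
    }

    writer.join().unwrap();
    ready.load(Ordering::Relaxed)
}
```

If the flag were guarding other data, the load would need acquire ordering and the store release ordering; relaxed only covers the flag value itself.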


Sure, no doubt a non-atomic load would be dangerous to write in C, C++, or Rust rather than in assembly.



