> Non-temporal instructions don't have anything to do with correctness. They are...

m0th87 · 2025-08-11T09:19:08 1754903948

I had interpreted GP to mean that you don’t slap on NTs for correctness reasons, rather you do it for performance reasons.

orlp · 2025-08-11T09:21:05 1754904065

That is something I can agree with, but I can't in good faith just let "it's just a hint, they don't have anything to do with correctness" stand unchallenged.

Sesse__ · 2025-08-11T09:27:39 1754904459

You mean if you access it from a different core? I believe that within the same core, you still have the normal ordering, but indeed, non-temporal writes don't have an implicit write fence after them like x86 stores normally do.

In any case, if so they are potentially _less_ correct; they never help you.

m0th87 · 2025-08-11T10:07:09 1754906829

There are no guarantees even if everything operates on the same core. Rust docs have some details: https://doc.rust-lang.org/stable/core/arch/x86_64/fn._mm_sfe...

Sesse__ · 2025-08-11T10:42:36 1754908956

Do you have any Intel references for it? I mean, Rust has its own memory model and it will not always give the same guarantees as when writing assembler.

m0th87 · 2025-08-11T11:46:28 1754912788

https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

Intel's docs are unfortunately spartan, but the guarantees around program order are a hint that this is what it does.

Sesse__ · 2025-08-11T12:30:44 1754915444

That doc is about visibility _outside the core_ (“globally visible”), so it's not what I'm looking for.

Similarly, if I look up MOVNTDQ in the Intel manuals (https://www.intel.com/content/dam/www/public/us/en/documents...), they say:

“Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with the SFENCE or MFENCE instruction should be used in conjunction with VMOVNTDQ instructions if multiple processors might use different memory types to read/write the destination memory locations”

Note _if multiple processors_.
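
For anyone following along, here is a minimal sketch of the pattern being debated: fill a buffer with non-temporal stores, issue an SFENCE, and only then publish a flag that another core might read. It assumes x86_64 and the `_mm_stream_si32` / `_mm_sfence` intrinsics from `std::arch`. The function name `fill_and_publish` and the `ready` flag are made up for illustration, and the non-x86 fallback just uses ordinary stores. It shows where the fence goes; it is not a statement of what Intel guarantees within a single core.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical helper: write `value` into every slot of `buf` with
/// non-temporal stores, then signal `ready` to other threads.
#[cfg(target_arch = "x86_64")]
fn fill_and_publish(buf: &mut [i32], value: i32, ready: &AtomicBool) {
    use std::arch::x86_64::{_mm_sfence, _mm_stream_si32};

    unsafe {
        for slot in buf.iter_mut() {
            // MOVNTI: a weakly ordered, write-combining store.
            _mm_stream_si32(slot as *mut i32, value);
        }
        // SFENCE: make the streaming stores above globally visible
        // before any store that follows it in program order.
        _mm_sfence();
    }

    // Only after the fence is it reasonable to publish the flag.
    ready.store(true, Ordering::Release);
}

/// Fallback for other architectures (assumption: plain stores plus a
/// release store are enough there, since no streaming stores are used).
#[cfg(not(target_arch = "x86_64"))]
fn fill_and_publish(buf: &mut [i32], value: i32, ready: &AtomicBool) {
    buf.fill(value);
    ready.store(true, Ordering::Release);
}
```

A consumer on another core would spin on `ready.load(Ordering::Acquire)` and read the buffer only once it observes `true`. Dropping the `_mm_sfence()` call is exactly the case where the weak ordering of the stores could let the flag become visible before the data.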