At that point it's probably almost always better to just use some temporary spac...

Snild · on Jan 18, 2021

The XOR swap trick is rarely better. Each XOR depends on the result of the previous one, so the three instructions form a serial dependency chain that stalls the pipeline, and it is likely to be significantly slower than the trivial swap. The temporary will live in a register, so it's (usually) not causing memory accesses.
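A minimal C sketch of the two swaps being compared; the function names `temp_swap` and `xor_swap` are hypothetical, chosen for illustration. The comments trace why each XOR step needs the previous one's result, and the guard covers the classic aliasing pitfall: XOR-swapping an object with itself zeroes it.

```c
/* Trivial swap: the temporary normally lives in a register,
   so this costs no extra memory traffic. */
static void temp_swap(unsigned *a, unsigned *b)
{
    unsigned t = *a;
    *a = *b;
    *b = t;
}

/* XOR swap: no temporary, but every step consumes the result of the
   previous step, so the three XORs cannot execute in parallel. */
static void xor_swap(unsigned *a, unsigned *b)
{
    if (a == b)     /* same object: a ^= a would zero it */
        return;
    *a ^= *b;       /* a = a0 ^ b0            */
    *b ^= *a;       /* b = b0 ^ a0 ^ b0 = a0  */
    *a ^= *b;       /* a = a0 ^ b0 ^ a0 = b0  */
}
```

Both functions produce the same result; the difference is only in how the work maps onto the hardware.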

Taniwha · on Jan 18, 2021

Yeah, exactly this. It only really makes sense in some very limited situations in embedded software, for example when you might be in an interrupt handler and have no free registers.

Someone · on Jan 18, 2021

If they’re local variables, the compiler may not need to generate any code for a swap. It could do the equivalent of register renaming (https://en.wikipedia.org/wiki/Register_renaming)
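A small illustration of that point, assuming an optimizing build (e.g. `-O2`); `sub_after_swap` is a hypothetical name. The swap exists only in the source: after optimization, compilers typically just compute `b - a` straight from the argument registers and emit no moves for the swap at all, though the exact output depends on the compiler and target.

```c
/* Swap two locals, then use them. An optimizing compiler can track
   which value lives where and drop the swap entirely. */
int sub_after_swap(int a, int b)
{
    int t = a;
    a = b;
    b = t;
    return a - b;   /* effectively b - a of the original arguments */
}
```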

saagarjha · on Jan 26, 2021

And if your compiler can't do it, your hardware might.

mhh__ · on Jan 19, 2021

On top of that, the CPU can literally elide movs at runtime (move elimination), so the whole swap could end up being zero latency, depending on the code motion.