At that point it's almost always better to just use some temporary space rather than pay the overhead of a branch. What if it's mispredicted? And the branch predictor resources it ties up might cause something else to be mispredicted.
On the other hand, there are many situations where you can guarantee that x and y are not the same location in memory (for example, because they are distinct local variables). There this trick might still be interesting (for the compiler or assembly programmer).
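To make the aliasing issue concrete, here is a minimal sketch in C. The function names (xor_swap, xor_swap_guarded, tmp_swap) are made up for illustration and aren't from any library. It shows the plain XOR swap, a branch-guarded version, and the ordinary temporary-variable swap.

```c
#include <stdint.h>

/* Plain XOR swap. Only correct when a and b point to different objects:
   if a == b, the first XOR computes *a ^ *a and zeroes the value. */
static void xor_swap(uint32_t *a, uint32_t *b) {
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Guarded variant: safe under aliasing, but costs a compare and a branch. */
static void xor_swap_guarded(uint32_t *a, uint32_t *b) {
    if (a != b) {
        xor_swap(a, b);
    }
}

/* Ordinary swap through a temporary: no branch needed, and it is
   correct even when a == b. */
static void tmp_swap(uint32_t *a, uint32_t *b) {
    uint32_t t = *a;
    *a = *b;
    *b = t;
}
```

Calling xor_swap(&z, &z) leaves z at 0 whatever it held before, while the guarded and temporary versions leave it unchanged. When the two operands are known to be distinct, the guard is unnecessary.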
The xor swap trick is rarely better. Each of the three XORs depends on the result of the previous one, so they can't execute in parallel and tend to stall the pipeline, which makes it likely to be slower than the trivial swap. The temporary storage will be in a register, so it's (usually) not causing memory accesses.
Yeah, exactly this. It only really makes sense in some very limited situations in embedded software, for example when you're in an interrupt handler and have no free registers.