If each of the four variables is a vector, the naive approach evaluates (x + y) first, producing a new vector. It then multiplies that by z, producing a second new vector, and finally subtracts w, producing a third. That is three passes over the length of the vectors and three allocations, two of which are temporaries that are discarded as soon as the result exists.
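
To make the cost concrete, here is a minimal sketch of that step-by-step evaluation in Python, using plain lists as stand-ins for vectors. The function name `naive_eval` and the sample values are assumptions for illustration only.

```python
def naive_eval(x, y, z, w):
    """Evaluate (x + y) * z - w one operation at a time."""
    # Pass 1: allocate a temporary holding x + y.
    t1 = [a + b for a, b in zip(x, y)]
    # Pass 2: allocate a second temporary holding (x + y) * z.
    t2 = [a * b for a, b in zip(t1, z)]
    # Pass 3: allocate the final result; t1 and t2 are no longer needed.
    return [a - b for a, b in zip(t2, w)]


# Hypothetical sample data.
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
z = [2.0, 2.0, 2.0]
w = [1.0, 1.0, 1.0]
naive_result = naive_eval(x, y, z, w)
```

Each list comprehension is one full pass over the data and one fresh allocation, which is the pattern described above.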

A better approach is to allocate a single result vector and fill it by evaluating the full expression for each element in one pass. For large vectors this can be much faster, since it avoids the temporary vectors and the extra passes over memory. The numexpr package for Python (among others) is designed to do just this.
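
The single-pass idea can be sketched in plain Python as follows. This is only an illustration of the structure, not how numexpr is implemented; the function name `fused_eval` and the sample values are assumptions.

```python
def fused_eval(x, y, z, w):
    """Evaluate (x + y) * z - w in one pass with one allocation."""
    n = len(x)
    # The only allocation: the result vector itself.
    result = [0.0] * n
    for i in range(n):
        # The whole expression is computed per element, so no
        # intermediate vectors are ever created.
        result[i] = (x[i] + y[i]) * z[i] - w[i]
    return result


# Hypothetical sample data.
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
z = [2.0, 2.0, 2.0]
w = [1.0, 1.0, 1.0]
fused_result = fused_eval(x, y, z, w)
```

In pure Python the interpreter overhead of the explicit loop would likely hide any gain, so this shows the shape of the computation rather than a speedup. With numexpr installed, the equivalent call on NumPy arrays would be along the lines of `numexpr.evaluate("(x + y) * z - w")`, which takes the expression as a string and looks up the variables by name.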