
Sharing queues wins. It's as simple as that.

It's a general result in queueing theory that you want one global queue as early as possible in the system. Requests don't all take the average service time to clear a backend; service times follow a distribution. The more workers that can pull from a single queue, the less a worst-case service time drags down the system as a whole: one slow request ties up one worker while the rest keep draining the queue.

One queue per worker with no global queue is the worst configuration and should be avoided if at all possible. Anyone who's run large reverse proxy installs knows this pain well.
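A quick simulation illustrates the gap (a sketch, not production code; the arrival and service rates are made-up values): it compares one shared queue feeding four workers against four per-worker queues with random assignment, at the same total load.

```python
import heapq
import random

random.seed(42)

def mean_wait(arrivals, services, c, shared):
    """Mean time a request waits before service starts."""
    waits = []
    if shared:
        # One global FIFO queue: the next request goes to whichever
        # of the c workers frees up first.
        free = [0.0] * c            # heap of worker-free times
        for t, s in zip(arrivals, services):
            avail = heapq.heappop(free)
            start = max(t, avail)
            waits.append(start - t)
            heapq.heappush(free, start + s)
    else:
        # Per-worker queues: each request is bound to one worker at
        # arrival time and must wait behind that worker's backlog,
        # even if another worker is idle.
        free = [0.0] * c
        for t, s in zip(arrivals, services):
            k = random.randrange(c)
            start = max(t, free[k])
            waits.append(start - t)
            free[k] = start + s
    return sum(waits) / len(waits)

N, C = 100_000, 4
LAM = 3.2                           # total arrival rate -> 80% utilization
t = 0.0
arrivals = []
for _ in range(N):
    t += random.expovariate(LAM)    # Poisson arrivals
    arrivals.append(t)
services = [random.expovariate(1.0) for _ in range(N)]  # mean service 1.0

w_shared = mean_wait(arrivals, services, C, shared=True)
w_split = mean_wait(arrivals, services, C, shared=False)
print(f"shared queue mean wait:     {w_shared:.3f}")
print(f"per-worker queues mean wait: {w_split:.3f}")
```

At identical load, the shared queue's mean wait comes out several times lower, because idle workers never sit behind a request that happens to be stuck on a slow neighbor.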

The ideal system would be for the balancer machine(s) to hold the requests, and for backends to pull them in a sort of ping/pong fashion. I gather fuzed runs in a pattern like this, though I've not used it.
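That pull pattern can be sketched in a few lines (hypothetical illustration, not fuzed's actual implementation): the balancer holds one global queue, and each backend pulls the next request only when it is idle, so no request is ever committed to a busy backend in advance.

```python
import queue
import threading

jobs = queue.Queue()        # lives on the balancer machine
results = queue.Queue()

def backend(worker_id):
    while True:
        req = jobs.get()                   # pull only when free
        if req is None:                    # sentinel: shut down
            jobs.task_done()
            return
        results.put((worker_id, req * 2))  # stand-in for real work
        jobs.task_done()

workers = [threading.Thread(target=backend, args=(i,)) for i in range(4)]
for w in workers:
    w.start()

for req in range(10):
    jobs.put(req)
jobs.join()                                # all requests drained

for _ in workers:                          # one sentinel per worker
    jobs.put(None)
for w in workers:
    w.join()

done = []
while not results.empty():
    done.append(results.get())
print(sorted(r for _, r in done))
```

The key property is that dispatch happens at the last possible moment: a slow request delays only the worker running it, never a request that another worker could have served.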

Telecom folks have analyzed this stuff in detail for the better part of a century. There's a lot of theory out there, and it's surprisingly practical and applicable to real-world web applications.


