There are times when a cache is appropriate, but I often find that it's more appropriate for the cache to live on the side of whoever is making all the requests. This doesn't apply when the load comes from, say, millions of different clients each making their own requests, but rather when one internal service is putting heavy load on another one.
The team with the demanding service can add a cache that's appropriate for their needs, and will be motivated to do so in order to avoid hitting the rate limit (or reduce costs, which should be attributed to them).
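To make the suggestion concrete, here's a minimal sketch of what "cache on the client side" can look like: a TTL cache wrapped around the call to the upstream service. The names (`ttl_cached`, `get_user`) and the 60-second TTL are illustrative assumptions, not anything from a real API.

```python
import time

def ttl_cached(ttl_seconds):
    """Client-side cache decorator: reuse a response for ttl_seconds
    before hitting the upstream service again (hypothetical example)."""
    def decorator(fetch):
        cache = {}  # key -> (expiry, value)
        def wrapper(key):
            now = time.monotonic()
            hit = cache.get(key)
            if hit and hit[0] > now:
                return hit[1]          # fresh enough: no upstream request
            value = fetch(key)
            cache[key] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator

calls = []

@ttl_cached(ttl_seconds=60)
def get_user(user_id):
    calls.append(user_id)  # stands in for a real network request
    return {"id": user_id}

get_user(1); get_user(1); get_user(2)
# despite three calls, only two requests reached the "service"
```

The nice property is that the demanding team picks the TTL and eviction policy that match *their* consistency needs, instead of the provider guessing at one cache policy for everybody.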
You cannot trust your clients. Period. It doesn’t matter if they’re internal or external. If you design (and test!) with this assumption in mind, you’ll never have a bad day. I’ve really never understood why teams and companies have taken this defensive stance that their service is being “abused” despite having nothing even resembling an SLA. It seemed pretty inexcusable to not have a horizontally scaling service back in 2010 when I first started interning at tech companies, and I’m really confused why this is still an issue today.
I fully agree. The rate limits are how you control the behaviour of the clients. My suggestion is to leave caching to the clients, who may well want to add it anyway in order to avoid hitting the rate limit.
>why teams and companies have taken this defensive stance that their service is being “abused” despite having nothing even resembling an SLA.
I mean, because bad code on a fast client system can generate more load than all other users put together. This is why half the internet sits behind something like Cloudflare these days. Limiting, blocking, and banning have to be baked in.
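For what "baked in" limiting can mean in practice, a token bucket is the classic building block: steady refill rate, bounded burst. This is a toy sketch, not any particular library's API; the rate and capacity numbers are made up.

```python
import time

class TokenBucket:
    """Toy token-bucket limiter: allow roughly `rate` requests/sec,
    with bursts of up to `capacity` requests."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request admitted
        return False      # request rejected (429 territory)

bucket = TokenBucket(rate=10, capacity=5)
results = [bucket.allow() for _ in range(20)]
# the first 5 requests (the burst) get through; a tight loop of 20
# runs far faster than the refill rate, so most of the rest don't
```

One misbehaving client then degrades only its own bucket, not everyone else's service.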
In all seriousness sometimes a cache is what you need. Inline caching is a classic example.
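Since inline caching came up: the idea is that a call site remembers the last receiver type it saw and skips the full method lookup while the type stays the same. A toy monomorphic version, simulated in Python rather than inside a real VM:

```python
class InlineCache:
    """Toy monomorphic inline cache for one call site: remember the
    last receiver type and its resolved method, redo the (simulated)
    slow lookup only when the type changes."""
    def __init__(self):
        self.cached_type = None
        self.cached_method = None
        self.slow_lookups = 0

    def call(self, obj, name):
        if type(obj) is not self.cached_type:
            self.slow_lookups += 1                      # slow path
            self.cached_type = type(obj)
            self.cached_method = getattr(type(obj), name)
        return self.cached_method(obj)                  # fast path

class Dog:
    def speak(self):
        return "woof"

site = InlineCache()
results = [site.call(Dog(), "speak") for _ in range(1000)]
# 1000 calls, but only the very first takes the slow lookup path
```

It's the same "put the cache next to whoever makes the requests" instinct, just inside a language runtime instead of between two services.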