I know at AWS they drill into folks that spot capacity isn't guaranteed capacity. I didn't even think about more demand for spot reducing supply, or about AWS simply not investing in as much hardware for spot because of a lack of growth. That said, we're seeing leading indicators that this trend will reverse as cloud growth returns: https://www.vantage.sh/cloud-cost-report/2023-q1
Even on-demand isn't guaranteed capacity. There have been a handful of times in my career when I've tried to spin up a lot more instances and been met with an out-of-capacity error. There's a special kind of reserved instance for guaranteed capacity.
You have the option to get On-Demand Capacity Reservations now, which are separate from RIs. Capacity reservations don't require 1- or 3-year commitments like RIs do, so you can reserve the capacity itself and still get your savings through Regional RIs or Savings Plans.
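Since a capacity reservation is requested per instance type and Availability Zone, here is a minimal sketch of the request parameters, assuming Python and boto3's EC2 `create_capacity_reservation` call. The helper `capacity_reservation_params` is hypothetical and only builds the keyword arguments, so it runs without AWS credentials.

```python
from typing import Any


def capacity_reservation_params(
    instance_type: str,
    availability_zone: str,
    instance_count: int,
    platform: str = "Linux/UNIX",
) -> dict[str, Any]:
    """Build keyword arguments for EC2 CreateCapacityReservation (hypothetical helper).

    EndDateType="unlimited" keeps the reservation active until it is
    cancelled, so there is no fixed 1- or 3-year term as with RIs.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "InstanceType": instance_type,
        "InstancePlatform": platform,
        "AvailabilityZone": availability_zone,
        "InstanceCount": instance_count,
        "EndDateType": "unlimited",
        # "open": any running instance with matching attributes in this AZ
        # consumes the reservation automatically.
        "InstanceMatchCriteria": "open",
    }
```

With boto3 installed and credentials configured, you would pass these as `boto3.client("ec2").create_capacity_reservation(**params)` and release the capacity later with `cancel_capacity_reservation(CapacityReservationId=reservation_id)`. Keep in mind that reserved capacity is billed at the On-Demand rate whether or not instances are running in it.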
A few years ago we ran builds on m5d instances (the ones with local NVMe), and they'd pretty regularly run out of capacity in certain AZs, although it was usually only a single AZ and there'd be a slight delay before the ASG grabbed one from a different AZ.
Exactly this. Our group regularly maxes out instance types in a region for some of our simulations, and we have to overflow the requests to other regions in those cases. Unfortunately, the load is not consistent enough to justify paying for reserved capacity.
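The overflow pattern can be sketched as trying regions in order of preference and moving on whenever one reports it is out of capacity. This is a minimal sketch under stated assumptions: `launch` is a hypothetical callable you would back with your own provisioning code, and `InsufficientCapacityError` is a local stand-in for EC2's `InsufficientInstanceCapacity` error, not a boto3 class.

```python
from typing import Callable, Sequence


class InsufficientCapacityError(Exception):
    """Local stand-in for EC2's InsufficientInstanceCapacity error."""


def launch_with_overflow(
    regions: Sequence[str],
    count: int,
    launch: Callable[[str, int], int],
) -> dict[str, int]:
    """Place `count` instances, overflowing to later regions as needed.

    `launch(region, n)` is hypothetical: it starts up to n instances in
    `region`, returns how many actually started, and raises
    InsufficientCapacityError when the region has none available.
    Returns per-region counts; a total below `count` means every region ran dry.
    """
    placed: dict[str, int] = {}
    remaining = count
    for region in regions:
        if remaining == 0:
            break
        try:
            started = launch(region, remaining)
        except InsufficientCapacityError:
            continue  # this region is out of capacity; try the next one
        if started > 0:
            placed[region] = started
            remaining -= started
    return placed
```

Returning a partial placement rather than raising lets the caller decide whether a shortfall is acceptable for a given simulation run.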
The Q1 on-demand metrics are definitely very interesting!
One possible cause of this trend is actually still consistent with aggregate growth in spot demand: when customers deploy spot fleets with on-demand backup capacity and all spot prices get closer to on-demand, those fleets become more likely to revert to on-demand rather than to a more expensive pool of spot instances.
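The fallback behavior described above can be sketched as a simple selection policy: use the cheapest spot pool while it is still meaningfully cheaper than on-demand, otherwise revert to on-demand. This is a minimal sketch, assuming a hypothetical `max_spot_fraction` threshold as the policy knob; it is not how any particular AWS fleet API decides.

```python
from typing import Mapping


def choose_capacity(
    spot_prices: Mapping[str, float],
    on_demand_price: float,
    max_spot_fraction: float = 0.9,
) -> str:
    """Return the spot pool to use, or "on-demand" as the fallback.

    A pool is acceptable only while its price is below
    max_spot_fraction * on_demand_price (a hypothetical policy knob),
    i.e. while the discount still justifies the interruption risk.
    """
    ceiling = max_spot_fraction * on_demand_price
    affordable = {pool: price for pool, price in spot_prices.items() if price < ceiling}
    if not affordable:
        # Every spot pool is priced close to on-demand: fall back to
        # on-demand instead of moving to a pricier spot pool.
        return "on-demand"
    # Otherwise take the cheapest acceptable spot pool.
    return min(affordable, key=affordable.__getitem__)
```

Under a policy like this, an across-the-board rise in spot prices shifts fleets onto on-demand even if the number of customers running spot fleets keeps growing.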
Measuring this would be a great opportunity to leverage the vantage point that your team has!
It would be a bit odd to build capacity for spot instances. The whole point of low-priority, preemptible instances is to fill up capacity that you have available but have earmarked for scheduling big jobs or handling failures (say, a power domain or a decent chunk of the network goes down).
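That reasoning can be illustrated with a toy model in which spot only ever receives otherwise idle slots and is preempted the moment higher-priority work (a big job or a failover) needs them back. This is a minimal sketch: the `Cluster` class and its slot-counting abstraction are hypothetical and say nothing about how AWS actually schedules capacity.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy model: capacity counted in identical instance slots (hypothetical)."""

    total_slots: int
    on_demand: int = 0
    spot: int = 0

    def idle(self) -> int:
        return self.total_slots - self.on_demand - self.spot

    def request_spot(self, n: int) -> int:
        """Grant spot only from idle slots, including earmarked headroom."""
        granted = min(n, self.idle())
        self.spot += granted
        return granted

    def request_on_demand(self, n: int) -> int:
        """Serve higher-priority work, preempting spot if idle slots run out.

        Returns how many spot instances were interrupted.
        """
        if n > self.total_slots - self.on_demand:
            raise RuntimeError("insufficient capacity")
        shortfall = max(0, n - self.idle())
        self.spot -= shortfall
        self.on_demand += n
        return shortfall
```

For example, with 100 slots and 60 on-demand, a request for 50 spot instances gets 40; a later failover needing 25 slots interrupts 25 of those spot instances. No hardware was added for spot, yet the headroom is no longer sitting idle.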
Otherwise they would be no different from any other instance type.