Since Gemini CLI's recent release, many people on the "free" tier have noticed their sessions immediately falling back from Gemini 2.5 Pro to Flash "due to high utilization". I asked Gemini itself about this, and it reported that the finite GPU/TPU resources in Google's cloud infrastructure can get oversubscribed for Pro usage. Google (no secret here) has a subscription option for higher-tier customers to request guaranteed provisioning for the Pro model. Once capacity is nearly exhausted, Google must throttle lower-tier (including free) sessions down to the less resource-intensive models.
Price is one lever to pull once capacity becomes constrained. Yet, as the top-voted comment on this post explains, it's not honest to simply label this as a price increase. They raised Flash pricing on input tokens but lowered pricing on output tokens up to certain limits -- which lends credence to the theory that they are trying to shape demand to better match their capacity.