
I often find myself trying to tell people that KISS is a good thing. If something is somewhat complex, it will be really complex after a few years and a few rotations of personnel.


Quite often the tradeoff is not between complexity (to cover a bunch of different cases) and simplicity (do one thing simply), but rather where that complexity lies. Do you have dependency fanout? It probably makes sense to shove all that complexity into the central component and manage it centrally. Otherwise it probably makes sense to make all the components a bit more complex than they could be, but still manageable.


Another great one is POLA, the Principle of Least Astonishment. Stable and reliable software and systems should avoid astonishing surprises.

https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...


> Most people should start with a single-zone setup and just accept that there's a risk associated with zone failure. If you have a single-zone setup, you have a node group in that one zone, you have the managed database in the same zone, and you're done.

I don't disagree, but there is one issue with this approach: RDS is a multi-AZ service by itself. That means that when a maintenance event occurs on your instance, AWS will start a new instance in a new zone and fail over to that one.

You could of course manually fail RDS back over to your primary zone afterwards. Not sure if that is better than manually scaling up a node pool if a zone fails.

> So you are presuming that, when RDS automatically fails over to zone b to account for zone a failure, that you will certainly be able to scale up a full scale production environment in zone b as well, in spite of nearly every other AWS customer attempting more or less the same strategy;

That's up to the user to decide via the Kyverno policy. We used the preferredDuringSchedulingIgnoredDuringExecution affinity setting to instruct the scheduler to try to place the pods in the optimal zone.
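Roughly, a minimal sketch of such a Kyverno policy (the zone is hard-coded here purely for illustration; the real policy would derive it from the resolved IP/subnet of the target service):

    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: prefer-database-zone
    spec:
      rules:
        - name: add-preferred-zone-affinity
          match:
            any:
              - resources:
                  kinds:
                    - Pod
          mutate:
            patchStrategicMerge:
              spec:
                affinity:
                  nodeAffinity:
                    # "preferred", not "required": scheduling still succeeds
                    # if the target zone has no free capacity.
                    preferredDuringSchedulingIgnoredDuringExecution:
                      - weight: 100
                        preference:
                          matchExpressions:
                            - key: topology.kubernetes.io/zone
                              operator: In
                              values:
                                - eu-central-1a   # example zone

In practice you would narrow the match (namespaces, labels) so only the workloads that actually talk to the database get mutated.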

I believe the only way to be 100% sure that you have compute capacity available in your AWS account is to use EC2 On-Demand Capacity Reservations (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capa...). If your current zone is at full capacity, and for some reason the nodes your VMs are running on die, that capacity is lost, and you won't get it back either.


> That means that when a maintenance event occurs on your instance, AWS will start a new instance in a new zone and fail over to that one.

Not true for single-AZ deployments. There is downtime during the maintenance event, but that is also true in multi-AZ deployments when the instance in the second AZ is promoted. A multi-AZ maintenance window has slightly less downtime, but not much. Downtime is downtime, but generally not enough to affect a 99.9% SLA anyway.

> EC2 On-Demand Capacity Reservations

Also quite expensive to maintain just for outage recovery events.

The point I'm trying to make is that formal risk analysis forces you to think about actual sources of risk, and SRE/FinOps principles force you to think about how much budget you are willing to spend to address those risks. And I don't understand how a tool like this fits into formal risk analysis and where it presents an optimum solution for those risks.


> And I don't understand how a tool like this fits into formal risk analysis and where it presents an optimum solution for those risks.

Seems it does not fit your risk analysis?


Are you thinking about already-cached container images at the host level? I'm not sure how AZP fits in here.

Since you mentioned it: what I've done before to improve CI builds is to use Karpenter + local SSD mounts with very large instance types and an idle timeout of ~1h. This allowed us to have very performant build machines at a low cost. The first build of the day took a while to get going, but from a price/benefit perspective it was great.
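Roughly, a NodePool sketch against the current karpenter.sh/v1 API (the instance types, the taint and the 1h idle window are illustrative; the local-NVMe mount itself is configured in the referenced EC2NodeClass):

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: ci-builders
    spec:
      template:
        spec:
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: ci-builders        # assumed to handle the NVMe/instance-store setup
          taints:
            - key: ci-builders
              effect: NoSchedule     # keep regular workloads off the big boxes
          requirements:
            - key: node.kubernetes.io/instance-type
              operator: In
              values: ["m6id.8xlarge", "c6id.8xlarge", "i4i.8xlarge"]  # large types with local NVMe
      disruption:
        consolidationPolicy: WhenEmpty
        consolidateAfter: 1h         # scale back to zero after ~1h of idle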


Are the container image repositories and the container images also "external resources" that could make CI build pod placement more efficient?

Thanks; that sounds faster than most self-hosted CI services.


If the image repositories were AZ-bound resources, that would make the CI build process more efficient.

Or, if the resources the CI build is using from within the image (after the image is pulled and started) are AZ-bound, then yes, the build process would be improved, since the CI build would fetch AZ-local resources rather than crossing the AZ boundary.


AWS publishes its own metrics for cross-AZ and intra-AZ latency: https://eu-central-1.console.aws.amazon.com/nip/ (Network Manager > Infrastructure Performance)

> In general the goal should be to deploy as much of the stack in one zone as possible

Agree. There can be a few downsides to consider if you have to fail over to another zone. Worst case, there isn't sufficient capacity available when you fail over, if everyone else is asking for capacity at the same time. If you use e.g. Karpenter, you should be able to be very diverse in the instance selection process, so that you get at least some capacity, though maybe not your preferred instance types.
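For example (karpenter.sh/v1, illustrative values), keeping the requirements deliberately broad:

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: general
    spec:
      template:
        spec:
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default            # assumed existing EC2NodeClass
          requirements:
            # Broad on purpose: many families and sizes improve the odds of
            # getting *some* capacity when a whole zone's demand moves at once.
            - key: karpenter.k8s.aws/instance-category
              operator: In
              values: ["c", "m", "r"]
            - key: karpenter.k8s.aws/instance-generation
              operator: Gt
              values: ["4"]
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["on-demand", "spot"]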


That was the origin of this solution. A client app had to issue millions of small SQL queries, where the first query had to complete before the second could be made. Millions of milliseconds add up.

The lowest possible latency would of course be running the client code on the same physical box as the SQL server, but that's hard to do.


I would LOVE to pitch something else I'm working on that solves this problem in EKS: cross-zone data transfer.

It's a plugin that enables traffic redirection for any service that uses an IP in a given VPC. If you have, say, multiple RDS reader instances, it will attempt to use local-AZ instances first, but the other instances remain available if the local instances are non-functional. So you do not lose HA or failover features.

The plugin does not require any reconfiguration of your apps. It works similarly to Topology Aware Routing (https://kubernetes.io/docs/concepts/services-networking/topo...) in Kubernetes, but for services outside of Kubernetes. The plugin even works for non-Kubernetes setups as well.
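For comparison, the in-cluster equivalent is a single annotation on the Service (Kubernetes 1.27+; older versions used service.kubernetes.io/topology-aware-hints):

    apiVersion: v1
    kind: Service
    metadata:
      name: my-backend             # illustrative in-cluster service
      annotations:
        service.kubernetes.io/topology-mode: Auto   # prefer same-zone endpoints
    spec:
      selector:
        app: my-backend
      ports:
        - port: 80
          targetPort: 8080

The plugin has to do that zone-preference logic itself, since endpoints outside the cluster have no EndpointSlice hints.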

This AZP solution is fine for services that have one IP or a primary instance, like an RDS Writer instance. It does not work for anything that is "stateless" and multi-AZ, like RDS Read-only instances or ALBs.


I was surprised too. Of course it makes sense when you look at it hard enough: two separate DCs won't have the same latency as internal DC communication. They might have the same physical wire speed, but physical distance matters.


The nice thing about this solution is that it's not limited to RDS. I used RDS as an example because many are familiar with it and know that it will change AZ during maintenance events.

Any hostname for a service in AWS that can relocate to another AZ (for whatever reason) can use this.


> Kyverno requirement makes it limited.

You don't have to use Kyverno. You could use a standard mutating webhook, but you would have to generate your own certificate and mutate on every Pod CREATE operation. Not really a problem, but it depends.
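A minimal sketch of what the plain-webhook route looks like (the service name, namespace and CA bundle are placeholders; you also have to run the webhook server that computes the affinity patch):

    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
      name: zone-placement-webhook           # hypothetical name
    webhooks:
      - name: zone-placement.example.com     # hypothetical
        admissionReviewVersions: ["v1"]
        sideEffects: None
        failurePolicy: Ignore                # don't block Pod creation if the webhook is down
        rules:
          - apiGroups: [""]
            apiVersions: ["v1"]
            operations: ["CREATE"]
            resources: ["pods"]
        clientConfig:
          service:
            name: zone-placement             # your webhook Service
            namespace: kube-system
            path: /mutate
          caBundle: <base64-encoded CA certificate>  # the self-managed certificate mentioned above

Kyverno mostly saves you from running that webhook server and rotating the certificate yourself.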

> There is no "automatic-zone-placement-disabled"

True. That's why I chose preferredDuringSchedulingIgnoredDuringExecution over requiredDuringSchedulingIgnoredDuringExecution. In my case, where this solution originated, the Kubernetes cluster was already multi-AZ, with at least one node in each AZ. It was nice if the Pod could be scheduled into the same AZ, but it was not a hard requirement.

> No automatic look up of IPs and Zones.

Yup, it would generate a lot of extra "stuff" to mess with: IAM roles, how to look up IP/subnet information in a multi-account AWS setup with VPC peerings. In our case it was "good enough" with a static approach; the subnet/network topology didn't change frequently enough to justify another layer of complexity.
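For illustration, the static mapping could be nothing more than a ConfigMap like this (CIDRs and zones made up; the mutation logic matches the resolved IP against these CIDRs to pick the zone):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: subnet-zone-map          # hypothetical name
      namespace: kube-system
    data:
      # subnet CIDR -> availability zone, maintained by hand since the
      # network topology rarely changes
      subnets.yaml: |
        10.10.0.0/20: eu-central-1a
        10.10.16.0/20: eu-central-1b
        10.10.32.0/20: eu-central-1c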

> What if we only have one node in specific zone?

That's why we defaulted to preferredDuringSchedulingIgnoredDuringExecution and not the required variant.


Totally agree!

This service is published more as a concept to build on top of than as a complete solution.

You wouldn't even need IAM rights to read RDS information, only subnet information. As subnets are zonal, it does not matter if the service is RDS or Redis/ElastiCache. The IP returned from the hostname lookup at the time your pod is scheduled determines which AZ that Pod should (optimally) be deployed to.

This solution was created in a multi-account AWS environment. Doing DescribeSubnets API calls across multiple accounts is a hassle. It was "good enough" to have a static mapping of subnets, as they didn't change frequently.

