As far as I can tell, and from some quick research into the guy's previous experience, that's all it is. I think the implication is that LLMs will be architecting and deploying the cluster setups at some point? Which sounds horrific, so I'm assuming I'm interpreting it wrong.
The article itself reminds me of the enthusiasm I felt for Plan 9 when I first heard about it back in uni. I also thought everyone should have their own compute grids and that clustered computing was the future; of course now I realize there are a lot of reasons why that doesn't actually work. Considering this appears to be a start-up ad, I hope the author knows something I don't.
I'm assuming you're at least overseeing the creation/updates of the Ansible playbooks and have some familiarity with what is being managed outside of that. While I personally would not do that[0], I can see the reasoning behind it.
ClusterdOS appears to be a Kubernetes-in-a-box multi-node setup whose goal is to work so well that the user doesn't know or care what it's doing. I wouldn't trust an LLM with managing one machine by itself, let alone a whole cluster of them running the incredibly complex mess that Kubernetes is (and that's not even counting the 8 other layers of software in this), so this feels like an order of magnitude worse.
[0] Using LLMs for sysadmin research or boilerplate writing is one thing, but after a certain amount of use you're really just paying $X a month for Anthropic to manage your systems for you. I'd rather just pay a real person to do it at that point. I'd also rather people get over their pathological fear of learning how to run a server but I've given up on that.
I've been using various UNIXes and clones since the 90s so I do generally know what's going on but I also have no desire to fill my brain with the syntax to the new new new commands to configure an Ethernet interface on Linux etc, or the work necessary to understand fully why VA-API on a certain chip has specific quirks that break freerdp, nor the toil of backporting and patching the necessary libraries, or the specific dance required to set up a machine to TFTP new firmware onto one of the switches, or.... You get the idea.
I'm also not a fan of all the complexity of Kubernetes. One directory with simple-to-read files makes it a lot more transparent what's where and how it's set up, and the commit history + changelogs make it relatively clear what's changing and why. No distributed database or fancy bootstrapping, just a ubiquitous config format and a tool to apply it. Changes are at the granularity of "a new host is available at A.B.C.D, configure it as a dev server" or "add a new Debian system container named 'blah' to X, bridge it to the research network only, limit to 16 hyperthreads / 64 GB RAM, set up for development on git://<whatever>". It works ok for now.
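To make the "add a new Debian system container" request concrete, here is a rough sketch of what such a change could boil down to. It assumes Incus is the container manager (the comment doesn't say which one is used), and the bridge name `br-research` and the `ContainerRequest` type are made up for illustration. `limits.cpu` and `limits.memory` are real Incus instance options; the command is only built, not run.

```python
from dataclasses import dataclass


@dataclass
class ContainerRequest:
    """Hypothetical description of one container, as the request might be written down."""
    name: str
    image: str        # Incus image alias, e.g. "images:debian/12"
    network: str      # bridge to attach to (name is an assumption)
    cpus: int         # number of CPU threads to expose
    memory_gib: int   # memory limit, taken here as GiB


def incus_launch_argv(req: ContainerRequest) -> list[str]:
    """Translate a request into an `incus launch` command line (not executed)."""
    return [
        "incus", "launch", req.image, req.name,
        "--network", req.network,
        "--config", f"limits.cpu={req.cpus}",
        "--config", f"limits.memory={req.memory_gib}GiB",
    ]


# The example from the comment: container 'blah', 16 threads, 64 GB of RAM.
request = ContainerRequest("blah", "images:debian/12", "br-research", 16, 64)
argv = incus_launch_argv(request)

if __name__ == "__main__":
    print(" ".join(argv))
```

In an Ansible-driven setup the same values would more likely live in a vars file and be applied by a task, but the shape of the change is the same: a handful of readable key/value pairs under version control.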
The next major change will be when models that run locally are capable enough to drive the config changes themselves.
Fair read.
Concrete version: clusterdOS is an ArgoCD Application that points at a GitLab repo. It installs and reconciles a defined stack (Cilium, Rook/Ceph, GPU Operator, Traefik, cert-manager, Kubeseal, the Grafana LGTM stack, Node Feature Discovery, vLLM+llm-d for inference, Slinky for Slurm-on-K8s, a few others) as modular gitapps you toggle in YAML. GPL v3, repo's at gitlab.com/aranya-tech/public/clusterdos.
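For anyone who hasn't used Argo CD, a minimal sketch of the idea: a set of toggles selects which gitapps get an Application pointing at the repo. The toggle names, the `gitapps/<name>` path layout, the `https://` prefix on the repo URL, and the sync policy are assumptions for illustration, not the actual clusterdOS layout. The manifest fields themselves follow the standard `argoproj.io/v1alpha1` Application schema.

```python
import json

# Hypothetical toggle map; the real clusterdOS YAML may look different.
GITAPPS = {
    "cilium": True,
    "rook-ceph": True,
    "gpu-operator": False,
    "traefik": True,
    "cert-manager": True,
}

# Repo location from the comment, with an assumed https:// prefix.
REPO_URL = "https://gitlab.com/aranya-tech/public/clusterdos"


def enabled_apps(toggles: dict[str, bool]) -> list[str]:
    """Return the names of the gitapps that are switched on, sorted."""
    return sorted(name for name, on in toggles.items() if on)


def argocd_application(name: str, repo_url: str, path: str, revision: str = "HEAD") -> dict:
    """Build an Argo CD Application manifest as a plain dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "path": path, "targetRevision": revision},
            "destination": {"server": "https://kubernetes.default.svc", "namespace": name},
            # Automated sync is what makes Argo CD reconcile drift back to git.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# One Application per enabled gitapp; the path layout is an assumption.
manifests = [
    argocd_application(app, REPO_URL, f"gitapps/{app}")
    for app in enabled_apps(GITAPPS)
]

if __name__ == "__main__":
    print(json.dumps(manifests, indent=2))
```

Flipping `gpu-operator` to `True` and committing would, under these assumptions, add one more Application for Argo CD to reconcile.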
The "OS" framing isn't about the bundle though. Agreed that opinionated bundles already exist. The claim is about what the bundle standardizes: observability primitives, state representation, lifecycle hooks, control plane behavior. Every cluster running it emits the same signals and exposes the same ground truth. That's the substrate. Whether that's worth calling an OS is more of a semantic argument, but the standardization is what the post should have led with.
Cluster creation is out of scope on purpose. Bring your own K8s. kubeadm, Talos, a managed control plane, doesn't matter.
clusterdOS sits on top.
How does the IncusOS API compare to Talos? When I first looked at it, it seemed very minimal and I didn't see a lot of options for more complex installs (e.g. network bonding, disk partitioning).
I was an early container adopter at a large RHEL shop and they absolutely required us to use their forked version of Docker for the daemon and RHEL-based images with systemd.
This was mostly so containers could register with systems manager and count against our allowed systems.
We ignored them because it was so bad and buggy. This is when I switched to CoreOS for containerized workloads.
Everything trends upwards. Even the services they killed in the past year I’m sure were getting new customers.
But Amazon isn’t interested and doesn’t have capacity to support hundreds of services that don’t make a lot of money.
I would be fine if they built a new tool with 2024 IaC experience and control. But I think trying to evolve CFN into a new thing would take far too long and have so many edge cases that they should just start over and stop trying to paper over it with CDK, Proton, ACK, etc.
CDK seems like it could support multiple backends, and in fact there’s cdk8s already (having never used cdk8s, I can’t comment more on it). The CF data model seems fine enough though, so I think overall CF just needs a lot of love. Not holding my breath though.
Shoot, looks like just a me problem after disabling some extensions. Currently narrowing down which one is misbehaving. Sorry for the false alarm, should have tried this first.
EB has languished for one reason: the team that built/maintained EB was reorged to build AppRunner (as EB v2), and they never had enough cycles to maintain both and weren't allowed to deprecate v1.
It sits on top of Kubernetes and seems very hand-wavy about how you create and manage those clusters.