I think there's a fundamental difference between programming language repos and package repositories like the official RPM, deb, and ports trees.
These repositories, typically curated by an operating system vendor, have oversight and are tested to work as a coherent set of versions. Repositories with public contribution and publishing have no compatibility guarantees, so the cruft described in the article must be kept indefinitely.
Unfortunately, I don't think abstracting those repositories to work within the OS package ecosystem would solve that problem, and I suspect the package managers' SAT solvers would have a hard time resolving the dependencies.
I agree re: the fundamental difference when it comes to compiled languages. I wrote rashly and out of frustration without thinking about it too deeply.
re: interpreted languages, though, I think it's still a shit show. I don't want to run "composer" or "npm" or whatever the Ruby and Python equivalents are on my production environment. I just want packages analogous to binaries that I can cleanly deploy / remove with OS package management functionality.
I suspect there's some marketing component at play here. People who don't own a device but hear it making seemingly unnecessary noises might perceive it as premium. Think of the various beeps when locking a car and arming the alarm, the startup sound that some EVs' infotainment systems play, or the "Twinkle, Twinkle, Little Star" jingle a fancy rice cooker plays.
This project appears to make use of both vLLM and Inference Gateway (an official Kubernetes extension to the Gateway resource). The contribution of llm-d itself seems to be mostly a scheduling algorithm for load balancing across vLLM instances.
What drive is this, and does it need a TRIM? Not all NVMe devices are created equal, especially consumer drives. In a previous role I was responsible for qualifying drives. Any datacenter or enterprise-class drive that showed that sort of latency in direct I/O write benchmarks after proper preconditioning would have failed our validation.
Unfortunately, this data is harder to find than it should be. For instance, just looking at Kioxia, which I've found to be very performant, their datasheets for the CD series drives don't mention write latency at all. Blocks and Files[1] mentions that they claim <255us average, so they must have published that somewhere. This is why we would extensively test multiple units ourselves, following proper preconditioning as defined by SNIA. Averaging 250us for direct writes is pretty good.
I have the original M1 Air that I got the day it was released in 2020. In a typical week, I let the battery discharge to less than 10% twice and recharge it to 100%. I've logged 458 cycles and lost 11% of capacity. Not too bad.
We used to have a pub rate of about 200k msgs/s, from about 400 producers all to a single exchange and had similar issues. However, we were able to mitigate this by using lazy queues.
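For anyone unfamiliar with lazy queues: they're enabled with a declare-time argument (or a policy) that tells RabbitMQ to page messages to disk rather than holding them in memory. A minimal sketch of the declare arguments, shown as plain dicts so it stands alone without a broker; the queue name "events" is hypothetical, and with pika you'd pass these to `channel.queue_declare`. (On RabbitMQ 3.12+, classic queues behave lazily by default and the flag is ignored.)

```python
# Declare-time arguments for a lazy classic queue. "x-queue-mode": "lazy"
# is the RabbitMQ argument that keeps messages on disk instead of in RAM,
# which is what mitigated our memory pressure under a high publish rate.
LAZY_ARGUMENTS = {
    "x-queue-mode": "lazy",
}

def declare_args(queue_name: str) -> dict:
    """Keyword arguments you'd pass to queue_declare for a durable lazy queue."""
    return {
        "queue": queue_name,
        "durable": True,
        "arguments": LAZY_ARGUMENTS,
    }
```

The same effect can be applied cluster-wide with a policy (`queue-mode: lazy`) instead of per-queue arguments, which is easier to roll back.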
This worked fine until things got behind, at which point we couldn't keep up. Since all operations for a queue happen in the same event loop, any backlog led to publish and consume operations fighting for CPU time. We worked around that with a hashed exchange that spread messages across 4 queues, hashing on a timestamp inserted by a timestamp plugin. Spreading the load across 4 queues gave us 4x the CPU capacity for that particular exchange. With 2000 queues you probably didn't run into that issue very often.
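The spreading scheme above can be sketched without a broker: hash each message's timestamp and route it deterministically to one of 4 queues. In RabbitMQ this routing is done server-side by the consistent-hash exchange plugin; the queue names and the choice of MD5 here are illustrative, not what the plugin necessarily uses internally.

```python
import hashlib

N_QUEUES = 4
QUEUES = [f"metrics.shard.{i}" for i in range(N_QUEUES)]

def route(timestamp_ms: int) -> str:
    """Map a message timestamp (as stamped by the timestamp plugin)
    to one of the queues. Each queue runs in its own Erlang process,
    so this spreads the work across roughly 4x the CPU capacity."""
    digest = hashlib.md5(str(timestamp_ms).encode()).hexdigest()
    return QUEUES[int(digest, 16) % N_QUEUES]
```

Because consecutive messages carry different timestamps, the hash distributes them roughly evenly, and consumers simply subscribe to all four shard queues.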