Without measuring quality of output, this seems irrelevant to me.
My use of CLAUDE.md is to get Claude to avoid making stupid mistakes that will require subsequent refactoring or cleanup passes.
Performance is not a consideration.
If anything, beyond CLAUDE.md I add agent harnesses that often increase the time and tokens used many times over, because my time is more expensive than the agents.
CLAUDE.md isn't a silver bullet either; I've had it lose context a couple of questions deep. I do like GSD[1] though, it's been a great addition to the stack. I also use multiple different LLMs as judges for PRs, which catches a load of issues too.
In this context, "performance" means "does it do what we want it to do" not "does it do it quickly". Quality of output is what they're measuring, speed is not a consideration.
The point is that whether it does what you tell it in a single iteration is less important than whether it avoids stupid mistakes. Any serious use will put it in a harness.
I did work on a supervised fine-tuning project for one of the major providers a while back, and the documentation for the project was exceedingly clear about the extent to which they would not tolerate the model responding as if it was a person.
Some of the labs might be less worried about this, but they're by no means homogeneous.
Prodigy ran online ads starting in the 1980s. AOL did as well.
HotWired (Wired's first online venture) sold their first banner ads in 1994.
DoubleClick was founded in 1995.
Neither was limited to '90s hardware:
Web browsers were available for machines like the Amiga, launched in 1985, and today you can find people who have made simple browsers run on 8-bit home computers like the C64.
I don't see how this contradicts any of what they said, unless they've edited their comment.
You're right that we had graphical apps, but we also had very little video. CU-SeeMe existed, so video conferencing would still have been a thing, but with limited resolution due to bandwidth constraints. Video in general was an awful low-res mess and would have remained so if most people were limited to ISDN speeds.
While there were images on the web, graphical flourishes were still heavily bandwidth-limited.
The bandwidth limit they proposed would be a big deal even if CPU speeds continued to increase (better compression could only mitigate so much).
Because humans also make stupid random mistakes, and if your test suite and defensive practices don't catch it, the only difference is the rate of errors.
It may be that you've done the risk management, and deemed the risk acceptable (accepting the risk, in risk management terms) with human developers and that vibecoding changes the maths.
But that is still an admission that your test suite has gaping holes. If that's been allowed to happen consciously, recorded in your risk register, and you all understand the consequences, that can be entirely fine.
But then the problem isn't vibe coding; it's a risk management choice you made to paper over test suite holes with an assumed level of human diligence.
If the failure mode is invisible, that is a huge risk with human developers too.
Where vibecoding is a risk, it generally is a risk because it exposes a systemic risk that was always there but has so far been successfully hidden, and reveals failing risk management.
I agree, and it's strange that this failure mode continually gets lumped onto AI. The whole point of long-term software engineering was to make it so that the context inside a particular person's head doesn't limit a new employee's ability to contribute to a codebase. It turns out everything we do to make sure that is the case for a human also works for an agent.
As far as I can tell, the only reason AI agents currently fail is that they don't have access to the undocumented context inside people's heads, and if we can properly put that down in text somewhere, there will be no problems.
The failure mode is getting lumped onto AI because AI is a lot more likely to fail.
We've done this with Neural Networks v1, Expert Systems, Neural Networks v2, SVMs, etc., etc. It was only a matter of time before we figured it out with deep neural networks. Clearly we're getting closer with every cycle, but there's no telling how many cycles we have left, because there is no sound theoretical framework.
At the same time, we have spent a large part of the existence of civilisation figuring out organisational structures and methods to create resilient processes using unreliable humans, and it turns out a lot of those methods also work on agents. People just often seem miffed that they have to apply them on computers too.
Even if the models stopped getting better today, we'd still see many years of improvement from better harnesses and a better understanding of how to use them. Most people just talk to their agent, and don't, for example, use sub-agents to make the agent iterate and cross-check outcomes. Most people who use AI would see a drastic improvement in outcomes just by experimenting with the "/agents" command in Claude Code (and equivalent elsewhere). Much more so with a well thought out agent framework.
A simple plan -> task breakdown + test plan -> execute -> review -> revise (w/optional loops) pipeline of agents will drastically cut down on the amount of manual intervention needed, but most people jump straight to the execute step, and do that step manually, task by task while babysitting their agent.
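To make that concrete, here's a minimal sketch of that kind of pipeline in Python. Everything here is illustrative: call_agent is a hypothetical placeholder for however you invoke your model (a Claude Code sub-agent, an API call, etc.), and the stage prompts and the "LGTM" approval convention are assumptions, not anyone's actual framework.

```python
# Hypothetical plan -> execute -> review -> revise pipeline (sketch only).

def call_agent(role: str, prompt: str) -> str:
    # Placeholder: wire this up to your real agent or model invocation.
    return f"[{role} output for: {prompt[:60]}...]"

def run_pipeline(task: str, max_revisions: int = 3) -> str:
    # Plan + test plan first, so the executor and reviewer share a reference.
    plan = call_agent("planner", f"Break this task into steps with a test plan:\n{task}")
    result = call_agent("executor", f"Carry out this plan:\n{plan}")

    # Review/revise loop: the reviewer checks against the plan, the executor revises.
    for _ in range(max_revisions):
        review = call_agent(
            "reviewer",
            f"Review the result against the plan and tests.\nPlan:\n{plan}\nResult:\n{result}",
        )
        if "LGTM" in review:  # assumed approval signal from the reviewer agent
            break
        result = call_agent(
            "executor",
            f"Revise per this review:\n{review}\nPrevious result:\n{result}",
        )
    return result

if __name__ == "__main__":
    print(run_pipeline("Add input validation to the signup endpoint"))
```

The point isn't the specific stages; it's that the loop, not the human, does the babysitting between steps.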