The interview example is funny but it hits on something real. I've watched teams spend months building internal tools that Google Sheets with a couple of formulas would have handled fine. Not because Sheets is always the right answer, but because nobody stopped to ask whether the problem actually needed custom software.
The career incentive thing is spot on too. Nobody writes "migrated team from a complex internal system to a shared spreadsheet" on their CV. But "architected and shipped a real-time collaborative data platform" looks great, even if it does the same thing worse.
The token budget angle is what makes this a real architectural decision rather than a philosophical one.
I've been using both approaches in projects and the pattern I've landed on: MCP for anything stateful (db connections, authenticated sessions, browser automation) and CLI for stateless operations where the output is predictable. The reason is simple - MCP tool definitions sit in context permanently, so you're paying tokens whether you use them or not. A CLI you can invoke on demand and forget.
The discovery aspect is underrated though. With MCP the model knows what tools exist and what arguments they take without you writing elaborate system prompts. With CLI the model either needs to already know the tool (grep, git, curl) or you end up describing it anyway, which is basically reinventing tool definitions.
Honestly the whole debate feels like REST vs GraphQL circa 2017. Both work, the answer depends on your constraints, and in two years we'll probably have something that obsoletes both.
Yup. I’ve been using CLIs with skills that define some common workflows I use, and then just tell Claude to use --help to understand how to use them. Works perfectly, and I end up writing the documentation the way I would for any other developer.
The real problem isn't just the .env file — it's that secrets leak through so many channels. I run a Node app with OAuth integrations for multiple accounting platforms and the .env is honestly the least of my worries. Secrets end up in error stack traces, in debug logs when a token refresh fails at 3am, in the pg connection string that gets dumped when the pool dies.
The surrogate credentials + proxy approach mentioned above is probably the most robust pattern. Give the agent a token that maps to the real one at the boundary. That way even if the agent leaks it, the surrogate token is scoped and revocable.
For local dev with AI coding assistants, I've settled on just keeping the .env out of the project root entirely and loading from a path that's not in the working directory. Not bulletproof but it means the agent has to actively go looking rather than stumbling across it.
I've had similar concerns with letting agents view any credentials, or logs which could include sensitive data.
Which has left me feeling torn between two worlds. I use agents to assist me in writing and reviewing code. But when I am troubleshooting a production issue, I am not using agents. Now troubleshooting to me feels slow and tedious compared to developing.
I've solved this in my homelab by building a service which does three main things:
1. exposes tools to agents via MCP (e.g. 'fetch errors and metrics in the last 15min')
2. coordinates storage/retrieval of credentials from a Vault (e.g. DataDog API Key)
3. sanitizes logs/traces returned (e.g. secrets, PII, network topology details, etc.) and passes back a tokenized substitution
This sets up a trust boundary between the agent and production data. The agent never sees credentials or other sensitive data. But from the sanitized data, an agent is still very helpful in uncovering error patterns and then root causing them from the source code. It works well!
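The sanitize-and-tokenize step could look something like the sketch below (this is an illustration of the idea, not the service's actual code, and the patterns are deliberately incomplete). The key property is that the same secret always maps to the same token, so the agent can still see patterns ("TOKEN_1 failed three times") without ever seeing values.

```javascript
// Rough sketch of the sanitize step: replace secret-shaped values with
// stable placeholder tokens so structure survives but values don't.
// Patterns here are illustrative, not exhaustive.
const PATTERNS = [
  /sk_live_[A-Za-z0-9]+/g,                             // Stripe-style keys
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,   // emails (PII)
  /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,                      // IPv4 (topology)
];

function sanitize(text, vault = new Map()) {
  let out = text;
  for (const re of PATTERNS) {
    out = out.replace(re, (match) => {
      // Same secret -> same token, so error patterns remain visible
      if (!vault.has(match)) vault.set(match, `TOKEN_${vault.size + 1}`);
      return vault.get(match);
    });
  }
  return out;
}
```

The `vault` map is the piece that would live server-side, behind the trust boundary, so tokens can be mapped back to real values only at the edge.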
I'm actively re-writing this as a production-grade service. If this is interesting to you or anyone else in this thread, you can sign up for updates here: https://ferrex.dev/ (marketing is not my strength, I fear!).
Generally how are others dealing with the tension between agents for development, but more 'manual' processes for troubleshooting production issues? Are folks similarly adopting strict gates around what credentials/data they let agents see, or are they adopting a more 'YOLO' disposition? I imagine the answer might have to do with your org's maturity, but I am curious!
This matches what I've seen. The .env file is one vector, but the more common pattern with AI coding tools is secrets ending up directly in source code that never touch .env at all.
The ones that come up most often:
- Hardcoded keys: const STRIPE_KEY = "sk_live_..."
- Fallback patterns: process.env.SECRET || "sk_live_abc123" (the AI helpfully provides a default)
- NEXT_PUBLIC_ prefix on server-only secrets, exposing them to the client bundle
- Secrets inside console.log or error responses that end up in production logs
These pass type-checks and look correct in review. I built a static analysis tool that catches them automatically: https://github.com/prodlint/prodlint
It checks for these patterns plus related issues like missing auth on API routes, unvalidated server actions, and hallucinated imports. No LLM, just AST parsing + pattern matching, runs in under 100ms.
gitleaks and trufflehog are great for scanning git history for leaked secrets but that's one of 52 rules. prodlint catches the structural patterns AI coding tools specifically create: hallucinated npm packages that don't exist, server actions with no auth or validation, NEXT_PUBLIC_ on server-only env vars, missing rate limiting, empty catch blocks, and more. It's closer to a vibe-coding-aware ESLint than a secrets scanner.
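To make the fallback pattern above concrete, here's a toy line-based check for it — to be clear, this is not prodlint's code (prodlint uses AST parsing, which avoids false hits in comments and strings); it just shows why the pattern is mechanically catchable.

```javascript
// Toy illustration of catching the "process.env.X || 'literal'" fallback
// and hardcoded live keys with regexes. A real tool (like prodlint)
// would use an AST pass instead; this is just the shape of the check.
const FALLBACK = /process\.env\.\w+\s*\|\|\s*["'`][^"'`]+["'`]/;
const HARDCODED = /["'`]sk_live_[A-Za-z0-9]+["'`]/;

function flagSuspiciousLines(source) {
  return source.split("\n").flatMap((line, i) =>
    FALLBACK.test(line) || HARDCODED.test(line)
      ? [{ line: i + 1, text: line.trim() }]
      : []
  );
}
```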
Can't say it's a perfect solution but one way I've tried to prevent this is by wrapping secrets in a class (Java backend) where we override the toString() method to just print "***".
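The comment above describes a Java class, but the same trick works in JS for anyone on a Node stack — wrap the secret and override the coercion hooks so an accidental `console.log` or `JSON.stringify` prints `***` instead of the value. A sketch:

```javascript
// JS analogue of the Java toString() override described above: the
// value only escapes through an explicit, greppable reveal() call.
class Secret {
  #value;
  constructor(value) { this.#value = value; }
  reveal() { return this.#value; }          // explicit access only
  toString() { return "***"; }              // string interpolation
  toJSON() { return "***"; }                // JSON.stringify
  [Symbol.for("nodejs.util.inspect.custom")]() { return "***"; } // console.log
}
```

Same caveat as the parent: not perfect (anything that calls `reveal()` and then logs still leaks), but it turns the common accidental channels into safe ones.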
We've been exposing tools via MCP and the biggest lesson so far: the tool description is basically a meta tag. It's the only thing the model reads before deciding whether to call your tool.
Two things that surprised us: (1) being explicit about what the tool doesn't do matters as much as what it does - vague descriptions get hallucinated calls constantly, and (2) inline examples in the description beat external documentation every time. The agent won't browse to your docs page.
The schema side matters too - clean parameter names, sensible defaults, clear required vs optional. It's basically UX design for machines rather than humans. Different models do have different calling patterns (Claude is more conservative, will ask before guessing; others just fire and hope) so your descriptions need to work for both styles.
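Putting those lessons together, a tool definition might look like the sketch below. Everything here is invented for illustration (the tool name, the schema fields), but it shows the three points: negative scope in the description, an inline call example, and explicit required-vs-optional with a default.

```javascript
// Hypothetical MCP-style tool definition illustrating the points above:
// the description states what the tool does NOT do and carries an
// inline example, since the agent won't browse external docs.
const fetchErrorsTool = {
  name: "fetch_errors",
  description:
    "Fetch application error logs for a time window. " +
    "Does NOT return metrics or traces (use fetch_metrics for those). " +
    'Example: fetch_errors({ service: "billing", minutes: 15 })',
  inputSchema: {
    type: "object",
    properties: {
      service: { type: "string", description: "Service name, e.g. 'billing'" },
      minutes: { type: "integer", description: "Look-back window", default: 15 },
    },
    required: ["service"], // minutes is optional with a sensible default
  },
};
```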
> inline examples in the description beat external documentation every time. The agent won't browse to your docs page.
That seems... surprising, and if necessary something that could easily be corrected on the harness side.
> The schema side matters too - clean parameter names, sensible defaults, clear required vs optional. It's basically UX design for machines rather than humans.
I don't follow. Wouldn't you do all those things to design for humans anyway?
Dependabot works when you have a team that reviews PRs promptly and CI that catches breaking changes. For solo founders and tiny teams, those automated PRs pile up into noise and you stop reviewing them entirely. Then you've got 30 unmerged dependency bumps you're too scared to batch-merge.
What I do instead: monthly calendar reminder, run npm audit, update things that actually matter (security patches, breaking bugs), ignore patch bumps on stable deps. The goal isn't "every dep is always current" - it's "nothing in production has a known vulnerability". Very different targets.
I build accounting automation tools and this resonates hard. The codebase has ~60 backend services handling things like pattern matching, VAT classification, invoice reconciliation - stuff where a subtle bug doesn't crash anything, it just silently posts the wrong number to someone's accounts.
Vibe coding would be catastrophic here. Not because the AI can't write the code - it usually can - but because the failure mode is invisible. A hallucinated edge case in a tax calculation doesn't throw an error. It just produces a slightly wrong number that gets posted to a real accounting platform and nobody notices until the accountant does their review.
Where I've found AI genuinely useful is as a sophisticated autocomplete. I write the architecture, define the interfaces, handle the domain logic myself. Then I'll use it to fill in boilerplate, write test scaffolding, or explore an API I'm not familiar with. The moment I hand it the steering wheel on anything domain-specific, things go sideways fast.
The article's point about understanding your codebase is spot on. When something breaks at 2am in production, "the AI wrote that part" isn't an answer. You need to be able to trace through the logic yourself.
> Vibe coding would be catastrophic here. Not because the AI can't write the code - it usually can - but because the failure mode is invisible. A hallucinated edge case in a tax calculation doesn't throw an error. It just produces a slightly wrong number that gets posted to a real accounting platform and nobody notices until the accountant does their review.
How is that different from handwritten code? Sounds like stuff you deal with architecturally (auditable, with review/rollback) and with tests.
It’s shocking to me that people even ask this type of question. How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human that will logic through things and find the correct answer.
Because I've seen the results? The failure modes of LLMs are unintuitive, and their ability to grasp the big picture is limited (mostly by context, I'd say), but I find CC follows instructions better than 80% of the people I've worked with. And consider the mental stamina it would take to grok that much context even when you know the system, versus what these systems can do in minutes.
As for the hallucinations - you're there to keep the system grounded. Well, the compiler is, then tests, then you. It works surprisingly well if you monitor the process and don't let the LLM wander off when it gets confused.
Because humans also make stupid random mistakes, and if your test suite and defensive practices don't catch it, the only difference is the rate of errors.
It may be that you've done the risk management, and deemed the risk acceptable (accepting the risk, in risk management terms) with human developers and that vibecoding changes the maths.
But that is still an admission that your test suite has gaping holes. If that's been allowed to happen consciously, recorded in your risk register, and you all understand the consequences, that can be entirely fine.
But then the problem isn't vibe coding itself; it's a risk management choice you made to paper over test suite holes with an assumed level of human diligence.
> How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human...
Your claim here is that humans can't hallucinate something random. Clearly they can and do.
> ... that will logic through things and find the correct answer.
But humans do not find the correct answer 100% of the time.
The way that we address human fallibility is to create a system that does not accept the input of a single human as "truth". Even these systems only achieve "very high probability" but not 100% correctness. We can employ these same systems with AI.
Almost all current software engineering practices and projects rely on humans doing ongoing "informal" verification. The engineers' knowledge is an integral part of it, and using LLMs exposes this "vulnerability" (if you want to call it that). Making LLMs usable would require such a degree of formalization (of which integration and end-to-end tests are a part) that entire software categories would become unviable. Nobody would pay for an accounting suite that cost 10-20x more.
Which interestingly is the meat of this article. The key points aren’t that “vibe coding is bad” but that the design and experience of these tools is actively blinding and seductive in a way that impairs ability to judge effectiveness.
Basically, instead of developers developing, they've been half-elevated to the management class where they manage really dumb but really fast interns (LLM's).
But they dont get the management pay, and they are 100% responsible for the LLMs under them. Whereas real managers get paid more and can lay blame and fire people under them.
Humans who fail to do so find the list of tasks they’re allowed to do suddenly curtailed. I’m sure there is a degree of this with LLMs but the fanboys haven’t started admitting it yet.
> It’s shocking to me that people even ask this type of question. How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human that will logic through things and find the correct answer.
I would like to work with the humans you describe who, implicitly from your description, don't hallucinate something random when they don't know the answer.
I mean, I only recently finished dealing with around 18 months of an entire customer service department full of people who couldn't comprehend that they'd put a non-existent postal address and the wrong person on the bills they were sending, and this was therefore their own fault the bills weren't getting paid, and that other people in their own team had already admitted this, apologised to me, promised they'd fixed it, while actually still continuing to send letters to the same non-existent address.
Don't get me wrong, I'm not saying AI is magic (at best it's just one more pair of eyes no matter how many models you use), but humans are also not magic.
Humans are accountable to each other. Humans can be shamed in a code review and reprimanded and threatened with consequences for sloppy work. Most humans, once reprimanded, will not make the same kind of mistake twice.
> Humans can be shamed in a code review and reprimanded and threatened with consequences for sloppy work.
I had to not merely threaten to involve the Ombudsman, but actually involve the Ombudsman.
That was after I had already escalated several times and gotten as far as raising it with the Data Protection Officer of their parent company.
> Most humans, once reprimanded, will not make the same kind of mistake twice.
To quote myself:
other people in their own team had already admitted this, apologised to me, promised they'd fixed it, while actually still continuing to send letters to the same non-existent address.
> How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human that will logic through things and find the correct answer.
I see this argument over and over again when it comes to LLMs and vibe coding. I find it a laughable one, having worked in software for 20 years. I am 100% certain humans are just as capable, if not better, than LLMs at generating spaghetti code, bugs, and nonsensical errors.
It's shocking to me that people make this claim, as if humans, especially in some legacy accounting system, would somehow be much better at (1) recognizing their mistakes, and (2) even when they don't, not fudge-fingering their implementation. The criticisms of agents are valid, but the incredulity that they will ever be used in production or high-risk systems is just as incredible. Of course they will - where is Opus 4.6 compared to Sonnet 4? We've hit an inflection point where replacing hand coding with an agent and interacting only via prompt is not only doable; highly skilled people are already routinely doing it. Companies are already _requiring_ that people do it. We will soon hit a point where the incredulity at using agents even in the highest-stakes applications will age really, really poorly. Let's see!
Your point is the speculative one, though. We know humans can and have built incredibly complex and reliable systems. We do not have the same level of proof for LLMs.
Claims like yours should wait at least 2-3 years, if not 5.
That is also speculative. Well let's just wait and see :) but the writing is on the wall. If your criticism is where we're at _now_ and whether or not _today_ you should be vibe coding in highly complex systems I would say: why not? as long as you hold that code to the same standard as human written code, what is the problem? If you say "well reviews don't catch everything" ok but the same is true for humans. Yes large teams of people (and maybe smaller teams of highly skilled people) have built wonderfully complex systems far out of reach of today's coding agents. But your median programmer is not going to be able to do that.
Your comment is shocking to me. AI coding works. I have seen it with my own eyes last week and today.
I can therefore only assume that you have not coded with the latest models. If your experience is with GPT-4o or earlier, or you have only used the mini or light models, then I can totally understand where you're coming from. Those models can do a lot, but they aren't good enough to run on their own.
The latest models absolutely are; I have seen it with my own eyes. AI moves fast.
I think the point he is trying to make is that you can't outsource your thinking to an automated process and also trust it to make the right decisions at the same time.
In places where a number, fraction, or non-binary outcome is involved, there is an aspect of growing the code base over time with human knowledge and failure.
You could argue that speed of writing code isn't everything; often being correct and stable matters more. E.g. a banking app doesn't have to be written and shipped fast, but it has to be done right. ECG machines, money, and meatspace safety automation all fall under this.
Replace LLM with employee in your argument - what changes? Unless everyone at your workplace owns the system they are working on - and this is a very high bar; maybe 50% of devs I've worked with are capable of owning a piece of non-trivial code, especially if they didn't write it.
Reality is, you don't solve these problems by relying on everyone to be perfect - everyone slips up. To achieve results consistently you need processes/systems to assure quality.
Safety-critical systems should be even better equipped to adopt this, because they already have the systems to promote correct outputs.
The problem is those systems weren't built for LLMs specifically so the unexpected failure cases and the volume might not be a perfect fit - but then you work on adapting the quality control system.
>> Replace LLM with employee in your argument - what changes?
I mentioned this part in my comment. You cannot trust an automated process to do a thing, and expect the same process to verify it did it right. This applies to any automated process, not just code.
This is not the same as manufacturing, as in manufacturing you make the same part thousands of times. In code the automated process makes a specific customised thing only once, and it has to be right.
>>The problem is those systems weren't built for LLMs specifically so the unexpected failure cases ...
We are not talking about outright failures. There is a space between success and failure that the LLM can slip into easily.
That's not what I get out of the comment you are replying to.
In the case being discussed here, one of code matching the tax code, perfection is likely possible; perfection is defined by the tax code. The SME on this should be writing the tests that demonstrate adherence to the tax code. Once they do that, it doesn't matter if they, or the AI, or a one-shot consultant write the implementation, as far as correctness goes.
If the resulting AI code has subtle bugs in it that pass the test, the SME likely didn't understand the corner cases of this part of the tax code as well as they thought, and quite possibly could have run into the same bugs.
That's what I get out of what you are replying to.
With handwritten code, the humans know what they don’t know. If you want some constants or some formula, you don’t invent or guess it, you ask the domain expert.
Let's put it this way: the human author is capable of doing so. The LLM is not. You can cultivate the human to learn to think in this way. You can for a brief period coerce an LLM to do so.
Humans make such mistakes slowly. It's much harder to catch the "drift" introduced by LLM because it happens so quickly and silently. By the time you notice something is wrong, it has already become the foundation for more code. You are then looking at a full rewrite.
The rate of the mistakes versus the rate of consumers and testers finding them was a ratio we could deal with and we don’t have the facilities to deal with the new ratio.
It is likely over time that AI code will necessitate the use of more elaborate canary systems that increase the cost per feature quite considerably. Particularly for small and mid sized orgs where those costs are difficult to amortize.
If the failure mode is invisible, that is a huge risk with human developers too.
Where vibecoding is a risk, it generally is a risk because it exposes a systemic risk that was always there but has so far been successfully hidden, and reveals failing risk management.
I agree, and it's strange that this failure mode continually gets lumped onto AI. The whole point of longer-term software engineering was to make it so that the context inside a particular person's head should not impact the ability of a new employee to contribute to a codebase. It turns out everything we do to make sure that is the case for a human also works for an agent.
As far as I can tell, the only reason AI agents currently fail is because they don't have access to the undocumented context inside of people's heads, and if we can just properly put that in text somewhere, there will be no problems.
The failure mode is getting lumped into AI because AI is a lot more likely to fail.
We've done this with Neural Networks v1, Expert Systems, Neural Networks v2, SVM, etc, etc. only a matter of time before we figured it out with deep neural networks. Clearly getting closer with every cycle, but no telling how many cycles we have left because there is no sound theoretical framework.
At the same time, we have spent a large part of the existence of civilisation figuring out organisational structures and methods to create resilient processes using unreliable humans, and it turns out a lot of those methods also work on agents. People just often seem miffed that they have to apply them on computers too.
It doesn't seem obvious that it's a problem for LLM coders to write their own tests (if we assume that their coding/testing abilities are up to snuff), given human coders do so routinely.
This thread is talking about vibe coding, not LLM-assisted human coding.
The defining feature of vibe coding is that the human prompter doesn't know or care what the actual code looks like. They don't even try to understand it.
You might instruct the LLM to add test cases, and even tell it what behavior to test. And it will very likely add something that passes, but you have to take the LLM's word that it properly tests what you want it to.
The issue I have with using LLMs is the test code review. Often the LLM will make a 30- or 40-line change to the application code, which I can easily review and comprehend. Then I have to look at the 400 lines of generated test code. While it may be easy to understand, there's a lot of it. Go through this cycle several times a day and I'm not convinced I'm doing a good review of the test code due to mental fatigue - who knows what I may be missing in the tests six hours into the work day?
> This thread is talking about vibe coding, not LLM-assisted human coding.
I was writing about vibe-coding. It seems these guys are vibe-coding (https://factory.strongdm.ai/) and their LLM coders write the tests.
I've seen this in action, though to dubious results: the coding (sub)agent writes tests, runs them (they fail), writes the implementation, runs tests (repeat this step and last until tests pass), then says it's done. Next, the reviewer agent looks at everything and says "this is bad and stupid and won't work, fix all of these things", and the coding agent tries again with the reviewer's feedback in mind.
Models are getting good enough that this seems to "compound correctness", per the post I linked. It is reasonable to think this is going somewhere. The hard parts seem to be specification and creativity.
Maybe it’s just the people I’m around but assuming you write good tests is a big assumption. It’s very easy to just test what you know works. It’s the human version of context collapse, becoming myopic around just what you’re doing in the moment, so I’d expect LLMs to suffer from it as well.
> the human version of context collapse, becoming myopic around just what you’re doing in the moment
The setups I've seen use subagents to handle coding and review, separately from each other and from the "parent" agent which is tasked with implementing the thing. The parent agent just hands a task off to a coding agent whose only purpose is to do the task, the review agent reviews and goes back and forth with the coding agent until the review agent is satisfied. Coding agents don't seem likely to suffer from this particular failure mode.
I have zero issues with things going sideways, even on the most complicated tasks. I don't understand why people struggle so much; it's easy to get it to do the right thing without hand-holding - you just need to be better at what you're asking for.
Not necessarily. Double entry bookkeeping catches errors in cases where an amount posted to one account does not have an equally offsetting post in another account or accounts (i.e., it catches errors when the books do not balance). It would not on its own catch errors where the original posted amount is incorrect due to a mistaken assumption, or if the offset balances but is allocated incorrectly.
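The distinction above is easy to show in a few lines. The sketch below (invented amounts and account names) implements the balance check double-entry gives you: it rejects an unbalanced journal, but it happily accepts a posting that is balanced and wrong.

```javascript
// Tiny illustration of the point above: a double-entry balance check
// accepts any journal whose entries net to zero - including one where
// the (perfectly balanced) amount itself is simply wrong.
function balances(entries) {
  return entries.reduce((sum, e) => sum + e.amount, 0) === 0;
}

// Correct posting: a 100 sale.
const right = [
  { account: "cash", amount: 100 },
  { account: "sales", amount: -100 },
];
// Mistaken amount, still perfectly balanced - the check can't tell.
const wrong = [
  { account: "cash", amount: 1000 },
  { account: "sales", amount: -1000 },
];
```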
The bit about "we have automated coding, but not software engineering" matches my experience. LLMs are good at writing individual functions but terrible at deciding which functions should exist.
My project has a C++ matching engine, Node.js orchestration, Python for ML inference, and a JS frontend. No LLM suggested that architecture - it came from hitting real bottlenecks. The LLMs helped write a lot of the implementation once I knew what shape it needed to be.
Where I've found AI most dangerous is the "dark flow" the article describes. I caught myself approving a generated function that looked correct but had a subtle fallback to rate-matching instead of explicit code mapping. Two different tax codes both had an effective rate of 0, so the rate-match picked the wrong one every time. That kind of domain bug won't get caught by an LLM because it doesn't understand your data model.
Architecture decisions and domain knowledge are still entirely on you. The typing is faster though.
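The rate-matching bug described above boils down to something like this (tax codes and rates are invented for illustration): matching by effective rate is ambiguous the moment two codes share a rate, so iteration order silently decides the winner, whereas an explicit code lookup can't be ambiguous.

```javascript
// Boiled-down version of the failure described above: two codes share
// an effective rate of 0, so a rate-based lookup always returns
// whichever happens to come first.
const taxCodes = [
  { code: "ZERO_RATED", rate: 0 },
  { code: "EXEMPT", rate: 0 },   // same effective rate, different meaning
  { code: "STANDARD", rate: 0.2 },
];

function byRate(rate) {
  // Ambiguous: for rate 0 this always picks ZERO_RATED, even when
  // the transaction should have been EXEMPT.
  return taxCodes.find((t) => t.rate === rate)?.code;
}

function byExplicitCode(code) {
  return taxCodes.find((t) => t.code === code)?.code;
}
```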
> LLMs are good at writing individual functions but terrible at deciding which functions should exist.
Have you tried explicitly asking them about the latter? If you just tell them to code, they aren't going to work on figuring out the software engineering part: it's not part of the goal that was directly reinforced by the prompt. They aren't really all that smart.
I think this continued anthropomorphism "Have you tried asking about..." is a real problem.
I get it. It quacks like a duck, so it seems like if you feed it peas it should get bigger. But it's not a duck.
There's a distinction between "I need to tell my LLM friend what I want" and "I need to adjust the context for my statistical LLM tool and provide guardrails in the form of linting etc".
It's not that adding a prose description doesn't shift the context - but it assumes a wrong model of what is going on, which I think is ultimately limiting.
There's a mundane version of this that hits small businesses every day. Platform terms of service pages, API documentation, pricing policies, even the terms you agreed to when you signed up for a SaaS product - these all live at URLs that change or vanish.
I've been building tools that integrate with accounting platforms and the number of times a platform's API docs or published rate limits have simply disappeared between when I built something and when a user reports it broken is genuinely frustrating. You can't file a support ticket saying "your docs said X" when the docs no longer say anything because they've been restructured.
For compliance specifically - HMRC guidance in the UK changes constantly, and the old versions are often just gone. If you made a business decision based on published guidance that later changes, good luck proving what the guidance actually said at the time. The Wayback Machine has saved me more than once trying to verify what a platform's published API behaviour was supposed to be versus what it actually does.
The SOC 2 / audit trail point upthread is spot on. I'd add that for smaller businesses, it's not just formal compliance frameworks - it's basic record keeping. When your payment processor's fee schedule was a webpage instead of a PDF and that webpage no longer exists, you can't reconcile why your fees changed.
I build automation tools for bookkeepers and accountants. The thing I keep seeing firsthand is that automation doesn't eliminate the job - it eliminates the boring part of the job, and then the job description shifts.
Before our tools: a bookkeeper spends 80% of their time on data entry and transaction categorisation, 20% on actually thinking about the numbers. After: those ratios flip. The bookkeeper is still there, still needed, but now they're doing the part that actually requires judgment.
The catch nobody talks about is the transition period. The people who were really good at the mechanical part (fast data entry, memorised category codes) suddenly find their competitive advantage has evaporated. And the people who were good at the thinking part but slow at data entry are suddenly the most valuable people in the room. That's a real disruption for real humans even if the total number of jobs stays roughly the same.
I think the "AI won't take your job" framing misses this nuance. It's not about headcount. It's about which specific skills get devalued and how quickly people can retool. In accounting at least, the answer is "slowly" because the profession moves at glacial speed.
You’re describing task reallocation, but the bigger second-order effect is where the firm can now source the remaining human judgment.
AI reduces the penalty for weak domain context. Once the work is packaged like that, the “thinking part” becomes far easier to offshore because:
- Training time drops as you’re not teaching the whole craft, you’re teaching exception-handling around an AI-driven pipeline.
- Quality becomes more auditable because outputs can be checked with automated review layers.
- Communication overhead shrinks with fewer back-and-forth cycles when AI pre-fills and structures the work.
- Labor arbitrage expands and the limiting factor stops being “can we find someone locally who knows our messy process” and becomes “who is cheapest who can supervise and resolve exceptions.”
So yeah, the jobs mostly remain and some people become more valuable. But the clearing price for that labor moves toward the global minimum faster than it used to.
The impact won’t show up as “no jobs,” it is already showing up as stagnant or declining Western salaries, thinner career ladders, and more of the value captured by the firms that own the workflows rather than the people doing the work.
Isn't that what a well-run company does when creating a process? Bureaucracy and process reduce the penalty of weak domain context, and in fact are designed to obviate that need. They "diffuse" the domain knowledge into a set of specifications, documents, and processes. AI may accelerate or subsume that bureaucracy. But since when has the limiting factor been "finding someone locally who knows the process"? Once you document a process, the power of computing means you can outsource any of it you want, no? Again, AI may subsume all the back-office or bureaucratic work. Perhaps it will totally restructure the way humans organize labor, run companies, and coordinate. But that system will have to select for a different set of skills than "filling out n forms quickly and accurately." The wage stagnation etc. predates AI and might be due to other structural factors.
Not necessarily. That's the old "I made Twitter in a weekend" joke.
It's not because you can technically replicate a product that your company will be successful. What makes a company successful are sales forces, internal processes, and luck. All three are extremely difficult to replicate: sales forces are based on a human network you have to build, internal processes are either organic or kept secret, and luck can only be provoked by staying alive long enough, which means you need money.
I think something around that scale (say maybe 20 employees, but definitely not hundreds) was possible even before LLMs got popular, but the people involved needed to be talented and focused. I'm not sure if AI will really change that though.
The salary compression point is the one I find hardest to push back on. Accounting BPO to the Philippines was already growing fast pre-AI - firms like TOA Global were scaling rapidly. With AI reducing the training overhead for domain-specific work, that arbitrage gets even easier. The remaining barrier is local regulatory knowledge (UK tax law, Companies House requirements, etc.) but even that erodes when you're mostly supervising exceptions rather than doing the full work yourself.
"it is already showing up as stagnant or declining Western salaries"
Real median salary and real median wages have both been rising for the last couple of years. Maybe they would have risen faster if there were no AI, but I don't think you can say there has been a discernible impact yet.
I’d like a source for that. College graduates are no longer at an employment advantage compared to their uneducated peers. The average age of a new hire increased by 2 years over the past 4 years.
Young people in the west have definitely seen declining salaries, if only by virtue of the fact that they’re not being offered at all.
I don't think that's true, if you trust Gemini at least: "In 2025, U.S. software engineer pay is barely keeping pace with inflation, with median compensation growing 2.67% year-over-year compared to 2.7% inflation. While salaries held steady or increased during the 2021-2023 inflationary period, many professionals reported that real purchasing power remained stagnant or dipped, making it difficult to get ahead."
This is why (personal experience) I am seeing a lot of full-stack jobs compared to specialized backend, FE, and ops roles. AI does 90% of the job of a senior engineer (or so the CEOs believe), and companies now want someone who can do the full "100," not just supply the missing "10." So that remaining 90 is now coming from an amalgamation of other responsibilities.
In my mind we will have a bimodal set of skills in software development: something like a product engineer (an engineer who is also a product manager -- this person conceptualizes features and systemically considers the software as a whole in terms of ergonomics, business sense, and the delight in building something used by others) and something like a deep-in-the-weeds engineer (an engineer who innovates on the margins of high performance, tuning, and deep improvements to libraries and other things of that nature). The former will need skill in rapid context switching, keeping the full model of the customer journey in mind while executing with enough technical rigor to prevent inefficiencies. The latter will need to dive extremely deeply into nuanced subjects like fine-tuning the garbage collector, compiler, network performance, or internal parts of the DOM or OS.
I would expect a lot of product engineering to specialize further into domains like healthtech, fintech, adtech, etc. While the in-the-weeds engineering will be platform, infra, and embedded systems type folks.
Actually, ideally I'd love to dig deep into and specialize in database management systems internals. I think data engineering in general is the underspoken but fundamental necessity to any sort of application, AI or otherwise, but especially any concept of a data warehouse.
> automation tools ... eliminates the boring part of the job, and then the job description shifts.
But the job had better take fewer people, or the automation is not justified.
There's also a tradeoff between automation flexibility and cost. If you need an LLM for each transaction, your costs will be much higher than if some simple CRUD server does it.
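To make that cost point concrete, here is a back-of-envelope sketch. Both numbers are invented for illustration (the per-transaction token cost and the flat server cost are assumptions, not measurements):

```python
# Hypothetical cost comparison: paying an LLM per transaction vs. a
# flat-rate CRUD service that handles any volume. All numbers invented.
llm_cost_per_txn = 0.01    # assumed: ~$0.01 of tokens per transaction
crud_monthly_cost = 50.0   # assumed: one small server plus database

for txns_per_month in (1_000, 100_000, 10_000_000):
    llm_total = llm_cost_per_txn * txns_per_month
    cheaper = "LLM" if llm_total < crud_monthly_cost else "CRUD"
    print(f"{txns_per_month:>10,} txns/mo: "
          f"LLM ${llm_total:>12,.2f} vs CRUD ${crud_monthly_cost:.2f} "
          f"-> {cheaper} wins")
```

Under these assumptions the per-call pricing dominates at any serious volume, which is the point: reserve the LLM for the flexible exceptions and let deterministic code handle the bulk.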
Here's a nice example from a more physical business - sandwich making.
Start with the Nala Sandwich Bot.[1] This is a single robot arm emulating a human making sandwiches. Humans have to do all the prep, and all the cleaning. It's slow, maybe one sandwich per minute. If they have any commercial installations, they're not showing them.
This is cool, but ineffective.
Next is a Raptor/JLS robotic sandwich assembly line.[2] This is a dozen robots and many conveyors assembling sandwiches. It's reasonably fast, at 100 sandwiches per minute. This system could be reconfigured to make a variety of sandwich-format food products, but it would take a fair amount of downtime and adjustment.
Not new robots, just different tooling. Everything is stainless steel or food grade plastic, so it can be routinely hosed down with hot soapy water. This is modern automation. Quite practical and in wide use.
Finally, there's the Weber automated sandwich line.[3] Now this is classic single-purpose automation, like 1950s Detroit engine lines. There are barely any robots at all; it's all special-purpose hardware. You get 600 or more sandwiches per minute. Not only is everything stainless or food-grade plastic, it has a built-in self-cleaning system.
Staff is minimal. But changing to a product with a slightly different form factor requires major modifications and skills not normally present in the plant. Only useful if you have a market for several hundred identical sandwiches per minute.
These three examples show why automation hasn't taken over. To get the most economical production, you need extreme product standardization. Sometimes you can get this. There are food plants which turn out Oreos or Twinkies in vast quantities at low cost with consistent quality. But if you want product variations, productivity goes way, way down.
> But the job had better take fewer people, or the automation is not justified.
In many cases, this is a fallacy.
Much like programming, there is often essentially an infinite amount of (in this case) bookkeeping tasks that need to be done. The folks employed to do them work on the top X number of them. By removing a lot of the scut work, second order tasks can be done (like verification, clarification, etc.) or can be done more thoroughly.
Source: Me. I have worked waaaay too much on cleaning up the innards of less-than-perfect accounting processes.
Well said. It’s like they think that the only thing automation is good for is cutting costs. You can keep the same staff size but increase output instead, creating more value.
"They" don't think the only thing automation is good for is cutting costs. Management thinks the only thing worth doing, at all, using any means, is cutting costs.
> The firm simply assumes that if the top X was sufficient in the past, it is still sufficient now.
> From the perspective of modern management, there's really no reason to keep people if you can automate them away.
These are examples of how bad management thinks, or at best, how management at dying companies think.
Frankly, this take on “modern management” is absurd reductionist thinking.
Just a few points about how managers in successful companies think:
- Good employees are hard to find. You don’t let good people go just because you can. Retraining a good employee from a redundant role into a needed role is often cheaper than trying to hire a new person.
- That said, in any sufficiently large organization, there is usually dead weight that can be cut. AI will be a bright light that exposes the least valuable employees, imho.
- There is a difference between threshold levels of compliance (e.g., docs that have to be filed for legal reasons) and optimal functioning. In accounting, a good team will pay for themselves many times if they have the time to work on the right things (e.g., identifying fraud and waste, streamlining purchasing processes, negotiating payment terms, etc.). Businesses that optimize for making money rather than getting a random VP their next promotion via cost-cutting will embrace the enhanced capability.
Yes, AI will bring about significant changes to how we work.
Yes, there will be some turmoil as the labor market adjusts (which it will).
No, AI will not lead to a labor doomsday scenario.
> - Good employees are hard to find. You don’t let good people go just because you can. Retraining a good employee from a redundant role into a needed role is often cheaper than trying to hire a new person.
Your best employees at a given price though.
Part of firm behavior is to let go of their most expensive workers when they decide to tighten belts.
Unless your employees are unable to negotiate, lacking the information and leverage to be paid the market rate for their ability, your best employees will be your most expensive, senior employees.
Everything is at a certain price. Firing your best employee when you can get the job done with cheaper, or you can make do with cheaper, is also a common and rational move.
While I agree that a labour doomsday scenario is unlikely, I think an underemployment scenario is highly likely. Offshoring ended up decimating many cities and local economies, as factory foremen found new roles as burger flippers.
Nor do people retrain into new domains and roles easily. The more senior you are, the harder it is to recover into a commensurately well-paying role.
AI promises to reduce the demand for the people in the prime age to earn money, in the few high paying roles that remain.
Not the apocalypse as people fear, but not that great either.
> Is Microsoft a "dying company"? The stock market certainly thinks otherwise.
This is the entire sentence that I wrote that you seem to be referring to:
“These are examples of how bad management thinks, or at best, how management at dying companies think.”
MS falls under the first part — bad management. Let literacy be your friend.
To elaborate, yes, I think that MS is managed incredibly poorly, and they succeed despite their management norms and culture, not because of it. They should be embarrassed by their management culture, but their success in other areas of the company allows the bad management culture to persist.
For a full cart, I expect a cashier to be available.
If I have 3-5 items, I’d rather do it myself than wait.
That said, even 20-30 years ago, long before self-checkout, at places like Walmart, one could wait 15-20 minutes in line. They had employees but were too cheap to have enough. They really didn't care.
I don’t even understand how that math works. I might have kept going there if they had a few extra lowly paid cashiers around.
> But the job had better take fewer people, or the automation is not justified.
Not necessarily. Automation may also just result in higher quality output because it eliminates mistakes (less the case with "AI" automation though) and frees up time for the humans to actually quality control. This might require the people on average to be more skilled though.
Even if it only results in higher output volume you often have the effect that demand grows also because the price goes down.
There's a classic book on this, "Chapters on Machinery and Labor" (1926). [1]
They show three cases of what happened when a process was mechanized.
The "good case" was the Linotype. Typesetting became cheaper and the number of works printed went up, so printers did better.
The "medium case" was glassblowing of bottles. Bottle making was a skilled trade, with about five people working as a practiced team to make bottles. Once bottle-making was mechanized, there was no longer a need for such teams. But bottles became cheaper, so there were still a lot of bottlemakers. But they were lower paid, because tending a bottle-making machine is not a high skill job.
The "bad case" was the stone planer. The big application for planed stone was door and window lintels for brick buildings. This had been done by lots of big guys with hammers and chisels. Steam powered stone planers replaced them. Because lintels are a minor part of buildings, this didn't cause more buildings to be built, so employment in stone planing went way down.
Those are still the three basic cases. If the market size is limited by a non-price factor, higher productivity makes wages go down.
I think this is probably the trajectory for software development, because while people claim there is potentially unlimited demand, that demand really only materializes at rock-bottom prices.
In many cases you can saturate the market. The stone planer examples is an early case. Cheaper lintels don't mean more windows, because they are a minor part of the cost. Cheaper doorknobs do not generate demand for more doorknobs, because the market size is the number of doors. Cheap potatoes, soy, corn, and cheese have saturated their markets - people can only eat so much.
This might also be true of web analytics. At some point, more data will not improve profitability.
No? You don’t only gain justification for automation by cutting costs. You can gain justification by increasing profits. You can keep the same amount of people but use them more efficiently and you create more total value. The fact you didn’t consider this worries me.
Also the statement "show why automation hasn't taken over" is truly, hysterically wrong. Yeah, sure, no automation has taken over since the Industrial Revolution.
You can increase profits by cutting costs. It is remarkably easier to do in the short term. And even if you choose not to downsize you can drop/stagnate wages to gain from the fact everyone else is downsizing.
The Nala bot reminded me of the guys at Felipe's in Cambridge MA. When they're building burritos during dinner rush, you'd swear to god that multiple different ingredients were following a ballistic trajectory toward the tortilla at any given time. If there was a salsa radar it would show multiple inbounds like the Russkies were finally nuking us.
ETA: It didn't remind me of this because the robot is good at what it does. It reminded me of just how far away from human capabilities SOTA robotic systems are.
Thank you. Having automation means process control, which means handling sources of variation for a defined standard/spec.
The claims of all jobs being done by AI end up also assuming that we will end up with factories running automated assembly lines of thought.
I have been losing my mind looking at the output of LLMs and having to nail variability down.
I recently did a contract at medium sized business with a large retail and online business that had a CFO and several accountants / bookkeepers. You're describing a situation where that CFO only needs two or three accountants and bookkeepers to run the business and would lay off two or three people.
Fair enough - I'm probably biased because I mostly see small practices (1-3 people) where headcount can't really shrink further. In that context it's about throughput per person. But you're right that in a larger org with a CFO making staffing decisions, the efficiency gains get captured as cost savings rather than more clients served. The 5-to-3 scenario you describe is realistic and happening now.
I keep seeing that small teams or individuals are getting most of the productivity gains from new ai.
Small teams or individuals that learn to use ai well can outpace larger teams, even if the larger teams also use ai, because communication / coordination overhead grows faster than team size. Tasks that before needed large teams to get done, can now be done by smaller teams.
Large Knowledge work teams have lost their competitive advantage.
I see this as a business opportunity for small actors. Every large knowledge work team that doesn't quickly adapt and downsize itself, is now something you can disrupt as a small team or individual.
Another component or view of this is that automating the rote work is "eliminating the boring parts" (I love this and have worked extensively on this) but it is also eliminating the less cognitively demanding work.
Once you have automated extensively, all of the remaining work is cognitively demanding and doing 8 hours of that work every day is exhausting.
Systems engineering is an extremely hard computer science domain with few engineers either interested in it, or good at it.
Building dashboards is tedious and requires organizational structure to deliver on. This is the bread and butter of what agents are good at building right now. You still need organization and communication skills in your company to direct the coding agents toward the dashboard you want and need. Until you hit an implementation wall and someone needs to spend time trying to understand some of the code. At least with dashboards, you can probably just start over from scratch.
It's arguably more work to prompt in english to an AI agent to assist you in hard systems problems, and the signals the agent would need to add value aren't readily available (yet?!). Plus, there's no way systems engineers would feel comfortable taking generated code at face-value. So they definitely will spend the extra mental energy to read what is output.
So I don't know. I think we're going to keep marching forward, because that's what we do, but I also don't think this "vibe-coded" automated code generator phase we're in right now will ultimately last. It'll likely fall apart and the pieces we put back together will likely return us to some new kind of normal, but we'll all still need to know how to be damn good software engineers.
I understand where you're coming from, and think there is something missing in your final paragraph that I'm curious to understand. If LLMs do end up improving productivity, what would make them go away? I think automated code generators are here until something more performant supersedes them. So, what in your mind might be possibilities of that thing?
Well I guess I no longer believe that long term, all this code generation would make us more productive. At least not how the fan favorite claude-code currently does it.
I've found some power use cases with LLMs, like "explore," but everyone seems misty-eyed that these coding agents can one-shot entire features. I suspect it'll be fine until it's not, and people will get burned by what is essentially trusting these black boxes to barf out entire implementations, leaving trails of code soup.
Worse is that junior engineers can say they're "more productive" but it's now at the expense of understanding what it is they just contributed.
So, sure, more productive, but in the same way that 2010s move fast and break things philosophy was, "more productive." This will all come back to bite us eventually.
>> The thing I keep seeing firsthand is that automation doesn't eliminate the job - it eliminates the boring part of the job, and then the job description shifts.
No, not necessarily. There are different kinds of automation.
Earlier in my career I sold and implemented enterprise automation solutions for large clients. Think document scanning, intelligent data extraction and indexing and automatic routing. The C-level buyers overwhelmingly had one goal: to reduce headcount. And that was almost always the result. Retraining redundant staff for other roles was rare. It was only done in contexts where retaining accumulated institutional knowledge was important and worth the expense.
Here's the thing though: to overcome objections from those staff, whom we had to interview to understand the processes we were automating, we told them your story: you aren't being replaced, you're being repurposed for higher-level work. Wouldn't it be nice if the computer did the boring and tedious parts of your job so that you can focus on more important things? Most of them were convinced. Some, particularly those who had been around the block, weren't.
Ultimately, technologies like AI will have the same impact. They aren't quite there yet, but I think it's just a matter of time.
This is exactly why I'm not that worried. I've noticed that AI is great at the parts of software engineering that I'm bad at, like implementing a new unfamiliar library, deploy pipelines, infra configuration, knowing specific technical details and standard patterns.
It's bad at the stuff I'm good at: thinking about the wider context, architecture, how to structure the code in an elegant, maintainable way, debugging complex issues, figuring out complex algorithms. I've tried using AI for those things, but it sucks at them. But I've also used it to solve configuration problems that I doubt I'd have been able to figure out on my own.
One reason I started enjoying programming less and less was that I felt I was spending 95% of the time on the problems you described, which felt more or less the same over the years and weren't complicated, just annoying. Unfortunately or fortunately, after coding for over 15 years, for the past 4 months I've only been prompting and reading the outputted code. It never really feels like writing something would be faster than just prompting, so now I prompt 2-3 projects at the same time and play a game on the side to fill in the time while waiting for the prompts to finish. It's nice, since I'm still judged as if it's taking the time to do it manually, but if this ever becomes the norm and expectations rise, it would become horribly draining. Mentally managing the increased speed of adding complexity is very taxing for me. I no longer have periods where I deep-dive into a problem for hours or do some nice refactoring, which feels like it's massaging my brain. Now all I do is make big decisions.
This is also my experience. I am personally really happy about it. I never cared about the typing part of programming. I got into programming for the thinking about hard problems part. I now think hard more than ever. It's hard work, but it feels much more fulfilling to me.
I miss the deep dives. I make time for them again. A month or two ago, I was working on a really complex problem where I relied way too much on AI, and that reliance kept my thinking about the problem relatively shallow, which meant that while I understood the big picture of the problem, I didn't really understand the intricacies. And the AI didn't either; I must have wasted about a week just trying to get the AI to solve it.
Eventually, I switched. I stopped using the AI in my IDE, and instead used a standalone Copilot app to which I had to actually explain the problem. That forced me to understand it, and that helped me solve it. It demoted the AI to an interactive rubber duck (which is a great use for AI). That moment when I finally started to understand the real problem, that was great. That's the stuff I love about this work, and I won't let the AI take that away from me again.
I would imagine, in this example, that the fact that you put in the numbers yourself gives you a mental map of where the numbers are and how they relate to each other, that having AI do it for you doesn't give you.
You could stare at a large sheet of numbers for a long time, and perhaps never get the kind of context you gained by entering them.
Additionally, if there was a mistake, it may not be as noticeable.
> The bookkeeper is still there, still needed, but now they're doing the part that actually requires judgment.
The argument might be fundamentally sound, but now we're automating the part that requires judgment. So if the accountants aren't doing the mechanical part or the judgment part, where exactly is the role going? Formalised reading of an AI-provided printout?
It seems quite reasonable to predict that humans just won't be able to make a living doing anything that involves screens or thinking, and we go back to manual labour as basically what humans do.
Even manual labor is uncertain. Nothing in principle prevents a robot from being a mass-producible, relatively cheap, 24/7 manual worker.
We've presumably all seen the progress of humanoid robotics; they're currently far from emulating human manual dexterity, but in the last few years they've gotten pretty skilled at rapid locomotion. And robots will likely end up with a different skill profile at manual tasks than humans, simply due to being made of different materials via a more modular process. It could be a similar story to the rise of the practical skills of chatbots.
In theory we could produce a utopia for humans, automating all the bad labor. But I have little optimism left in my bones.
By what logic are the "manual labor" jobs available? And if you're right and they somehow are, isn't that just another way of saying humanity is enslaving itself to the machines?
You’re not taking into account that a successful bookkeeper may have hired someone like a new grad to take the drudgery off of their hands and now they can just do it themselves.
I'd imagine that when the 80% of less productive time is automated, the market doesn't respond by demanding 80% more output. There's just 20% as much work, either making this a part-time job or, more likely, greatly shrinking the workforce as the number of man-hours demanded by the market drops.
Good accounting teams will have more time and resources to do things like identify fraud, waste, duplicated processes, etc. They will also have time to streamline/optimize existing practices.
Good teams will earn many multiples of their cost in terms of savings or increased earnings.
There may be increased competition for the low-cost “just meet the legal compliance requirements” offerings, but any business that makes money and wants to make more will gladly spend more than the minimum for better service.
He does 100 units of product per 100 units of time:
- 80 units of time on data entry
- 20 units of time on "thinking"
We now automate the task so the ratios flip: data entry drops to 20 units of time per 100 products. Assume the same 20 units of thinking as before, so we now use 40 units of time to produce 100 units of product.
Now assume linear growth:
- Spend 40 units of time on each task (80 total) and we produce 200 units of product.
- Spend 50 units of time on each task (100 total, the same time as before) and we produce 250.
You either work 40 and produce the same, or work the same and produce 250. NOT THE SAME.
The desktop PC was the same - everyone said that it was going to wipe out jobs, when the main thing it wiped out was filing cabinets.
AI commentators seem to overlook that one of the primary functions of capitalism is to keep people in busywork: what David Graeber called Bullshit Jobs. So AI is going to automate most of the bullshit away, but the bullshit employees will keep working, because there wasn't much need for them in the first place.
You are describing cases where small businesses have little headcount and can't shrink any further.
But in the much bigger picture, AI is akin to what Excel did to a building full of people doing accounting and bookkeeping. Except at the time there were plenty of opportunities for those people to do different things in the market. Something that economists constantly burp about.
I don't see this now. For whatever reason the economy has far more bullshit jobs than in those days; despite computers and technology we have far more administrative hurdles and employees than before. And 70% of those will go away in the next 5 years once we automate that needless complexity. It isn't clear to me, in a world where many jobs are specialised, that there is enough time and room for people to relearn the skills required for other jobs, if there are even that many openings to absorb the ones who were laid off.
Accountants will still exist, but we'll need fewer of them at any given time. In your example of flipping the 80/20 ratio, you are implying that each accountant would be able to (theoretically) handle a 5x workload with AI making up the gap.
Perhaps in reality more like a 3x advantage, due to human inefficiencies and the overhead of scaling the business to handle more clients.
Given that, a 3x increase in productivity implies we either need 1/3 the accountants, or the increased supply of accountancy brings down prices and more clients start hiring accountants due to affordability.
If AI tools worked, they would eliminate the bookkeepers. Their job is data entry and validation.
But bookkeeping is extremely important. Bad bookkeeping has killed more companies than bad accounting. Without proper books, the accounting, finance, and tax teams are just cosplaying.
> And the people who were good at the thinking part but slow at data entry are suddenly the most valuable people in the room.
No, they aren't. They are now competing with everyone - the slow thinkers, the barely-conscious thinkers, the erratic thinkers, the "unable to reach a conclusion" thinkers as well as the people quick at "data entry", with the caveat that the people quick at "data entry" are almost certainly going to be better thinkers than those that weren't quick at data entry.
IOW, you think AI isn't coming for some specific class of programmers, but you are wrong. You and the "other types" will continue this debate in the soup kitchen.
Yeah bro, it's been three years. We are just beginning. We will replace the vast majority of professional service workers in 10 years, including lawyers, as AI shifts to local and moves away from the cloud.
If we wipe out the vast majority of white collar jobs in just 10 years, we’re talking complete economic collapse.
No society can possibly absorb that kind of disruption over such a short time.
Also even assuming AI could completely replace lawyers. Lawyers control the legislature. They may not be able to stop your local model from telling you how to do something, but they can stop you from actually doing it without a lawyer.
Even subway train operators in NYC, whose job can be safely automated away, and has been for like 20 years, were able to legally mandate their jobs. I bet lawyers will, too. But the numbers of junior partners, and of paralegals, will dwindle.
Correct, which is why we will have the first worldwide revolution as people realize their democracies are fake, they are simply enslaved by capitalists; which is exactly what they told us Commies would do.
The chances of all of those revolutions not touching off World War 3 and decimating infrastructure and trade to the point that we can't produce the chips to run AI are what now?
I'm glad we have intelligent, mature, uncorrupted politicians who will be able to work together to make sure that this doesn't cause a depression so profound that the entire economy ceases to be viable.
Solo founder here building B2B SaaS in a niche vertical (accounting automation). My take: your moat at pre-seed isn't IP, it's domain depth.
I decided early not to patent anything. Not because our technical approach isn't novel - we have some genuinely interesting pattern matching pipelines - but because the defensibility comes from accumulated domain knowledge that's almost impossible to replicate quickly. We've hand-built logic for 16 different VAT code classifications, edge cases in how platforms handle API fields differently, quirks in how bank statement descriptions vary across hundreds of merchants. None of that is patentable. All of it is incredibly hard to reproduce.
A competitor could read a patent filing and build a workaround in months. They can't shortcut two years of discovering that one accounting platform silently ignores a field that every other platform respects, or that merchant descriptions follow completely different patterns at different transaction amounts.
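To illustrate the flavor of that accumulated logic, here is a toy sketch. Every merchant pattern, VAT code, and amount threshold below is invented for illustration; none of it comes from a real system, and the real value is in the hundreds of such rules discovered case by case:

```python
import re

# Hand-accumulated rules pairing a merchant-description regex with a
# (hypothetical) VAT treatment. Note the amount-dependent branch: the
# same merchant can warrant different handling at different sizes.
RULES = [
    (re.compile(r"^AMZN MKTP"), lambda amt: "standard_rate"),
    (re.compile(r"^TFL TRAVEL"), lambda amt: "zero_rate"),  # public transport
    (re.compile(r"^ACME SUPPLIES"),
     lambda amt: "standard_rate" if amt < 1000 else "reverse_charge"),
]

def classify(description: str, amount: float) -> str:
    """Return a VAT classification, falling back to manual review."""
    for pattern, rule in RULES:
        if pattern.search(description):
            return rule(amount)
    return "needs_review"  # the judgment part stays human

print(classify("AMZN MKTP UK*123", 42.0))      # standard_rate
print(classify("ACME SUPPLIES 9981", 2500.0))  # reverse_charge
print(classify("UNKNOWN VENDOR", 10.0))        # needs_review
```

Nothing here is patentable, and reading it tells a competitor nothing; the moat is the long tail of rules like these and the two years it takes to find them.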
The practical concern with provisionals is the 12-month clock. If you file one, you've committed to either converting to a full patent (expensive, time-consuming) or letting it lapse. At pre-seed, that's a decision you're forcing on your future self at potentially the worst possible time.
If your "core technical approaches" are truly about AI model architecture, that's a different conversation - but if it's mostly about how you apply models to a specific domain, I'd argue your time is better spent going deeper into the domain than drafting patent claims.