Most software work is maintaining "legacy" code, that is, older systems that have been around for a long time and get a lot of use. I find Claude Code in particular is great at grokking old code bases and making changes to them. I work on one of those old code bases, and my productivity increased 10x, mostly due to Claude Code's ability to research a large code base, make sense of it, answer questions, and make careful, surgical changes to it. It also helps with testing and debugging, which is a huge productivity boost. It's not about its ability to churn out lots of code quickly: it's an extra set of eyes/brain that works much faster than a human developer.
I have the opposite experience. Claude can't fit it all in the context window and makes changes that completely break something on the other side of the program.
Granted that's because the program is incredibly poorly written, but still, the context window will remain a huge barrier for quite some time.
Between yours and GP's comments, I find echoes of my experience:
> Most software work is maintaining "legacy" code, that is, older systems that have been around for a long time and get a lot of use.
> Granted that's because the program is incredibly poorly written
LLMs can't fix big, shitty legacy codebases. That is where most maintenance work (in terms of hours) is, and where it will remain.
I would take it one step further and argue that LLMs and vibe-coding will compound into more big, shitty legacy codebases over time, and therefore, in the long arc, nothing will really change.
It has ever been thus. There are multi-million dollar businesses propped up by .NET applications on a foundation of shunted-around files, and at best, SQL used as APIs/queues. "Working" code is, in the long run, a liability outside the hands of those doing real engineering.
I want to voice the same bad experience; I tried Claude and several others, actually. I could get AI to understand some things, but it quickly went off the rails trying to comprehend larger complexities, and its suggested changes would have ranged from worse to detrimental had I allowed them to be committed.
Can it though? I thought it was most useful for writing new code, but have so far never had it correctly refactor existing code. Its refactoring attempts usually change behavior / logic, and sometimes even leave the code in a state where it's even harder to read.
I've found this as well. In some cases we aren't fully authorised to use the AI tools for actual coding but even just asking "how would you make this change" or "where would you look to resolve this bug" or "give me an overview of how this process works" is amazingly helpful.
> In some cases we aren't fully authorised to use the AI tools for actual coding but even just asking "how would you make this change" [...]
Isn't the logical endpoint of this equivalent to printing out a Stackoverflow answer and manually typing it into your computer instead of copy-and-pasting?
Nitpicks aside, I agree that contemporary AIs can be great for quickly getting up to speed with a code base. Both a new library or language you want to be using, and your own organisation's legacy code.
One of the biggest advantages of using an established ecosystem was that Stack Overflow had a robust repository of already answered questions (and you could also buy books on it). With AI you can immediately cook up your own Stack Overflow community equivalent that provides answers promptly instead of closing your question as off-topic.
And I pick Stack Overflow deliberately: it's a great resource, but not reliable enough to use blindly. I feel we are in a similar situation with AI at the moment. This will change gradually as the models become better. Just like Stack Overflow required less expertise to use than attending a university course. (And a university course requires less expertise than coming up with QuickSort in the first place.)
> Isn't the logical endpoint of this equivalent to printing out a Stackoverflow answer and manually typing it into your computer instead of copy-and-pasting?
Not in my case (I never used SO like that, anyway). I use it almost exactly like SO, except much more quickly and interactively (and without the inference that I’m “lazy,” or “stupid,” for not already knowing the answer).
I have found that ChatGPT gives me better code than Claude (I write Swift); even learning my coding and documentation style.
I still need to review all the code it gives me, and I have yet to use it verbatim, but it’s getting close.
The most valuable thing, is I get an error, and I can ask it “Here’s the symptoms and the code. What do you think is going on?”. It usually gives me a good starting point.
I could definitely figure it out on my own, but it might take half an hour. ChatGPT will give me a solid lead in about half a minute.
The problem is most likely not writing the actual code, but rather understanding an old, fairly large codebase and how it’s stitched together.
SO is (was?) great when you were thinking about how nicely a recursive reduce function could replace the mess you've just cobbled together, but language X just didn't yet flow naturally for you.
>Isn't the logical endpoint of this equivalent to printing out a Stackoverflow answer and manually typing it into your computer instead of copy-and-pasting?
When AI works well it is superior to Stack Overflow, because what it replaces is not "look up the answer on SO, copy, paste." It replaces looking up several different things on SO that relate to the problem you are trying to solve (for which there is no exact, definitive solution posted anywhere) and copying them together into a bit of code, which you will probably just refactor a bit, in less time than doing all the SO lookups yourself. When it works it can turn 2 hours of research into 2 minutes.
The problems are:
AI also sometimes replicates the following process: a dev who doesn't understand all parts of the solution or the requirements copies bits of code from various answers into something that sort of works but is inefficient and has underlying problems.
Even with a correctly working solution, your developer does not get in those 2 minutes what they used to get in the two hours before: an understanding of the problem space and how the parts of the solution hang together. This is the reason it is more useful for seniors than juniors; part of looking through SO for what you want is light education.
>Isn't the logical endpoint of this equivalent to printing out a Stackoverflow answer and manually typing it into your computer instead of copy-and-pasting?
Isn't the answer on SO the result of a human intelligence writing it in the first place, and then being voted to the top by several human intelligences? If an LLM were merely an automated "equivalent" to that, that would already be a good thing!
But in general, the LLM answer you appear to dismiss amounts to a lot more:
Having a close-to-good-human-level programmer
understand your existing codebase
answer questions about your existing codebase
answer questions about changes you want to make
on demand (not confined to copying SO answers)
interactively
and even being able to go in and make the changes
That amounts to "manually typing an SO answer" about as much as a pickup truck amounts to a horse carriage.
Or, to put it another way, isn't "the logical endpoint" of hiring another programmer and asking them to fix X "equivalent to printing out a Stackoverflow answer and manually typing it into their computer"?
>And I pick Stack Overflow deliberately: it's a great resource, but not reliable enough to use blindly. I feel we are in a similar situation with AI at the moment.
Well, we shouldn't be using either blindly anyway. Not even the input of another human programmer (that's why we do PR reviews).
> Isn't the answer on SO the result of a human intelligence writing it in the first place, and then being voted to the top by several human intelligences? If an LLM were merely an automated "equivalent" to that, that would already be a good thing!
The word "merely" is doing all of the heavy lifting here. Having human intelligence in the loop providing and evaluating answers is what made it valuable. Without that intelligence you just have a machine that mimics the process yet produces garbage.
I've been building things with Claude while looking at, say, less than 5% of the code it produces. What I've built are tools I want to use myself and... well, they work. So somebody can say that I can't do it, but on the other hand I've wanted to build several kinds of ducks, and the things I've built look like ducks and quack like ducks, so...
I've found it's a lot better at evaluating code than producing it, so what you do is tell it to write some code, then tell it to give you the top 10 things wrong with the code, then tell it to fix the five of them that are valid and important. That is a much different flow than going on an expedition to find an SO solution to an obscure problem.
A good quality metric for your code is to ask an LLM to find the ten worst things about it; if all of those are stupid, then your code is pretty good. I did this recently on a codebase and its number 1 complaint was that the name I had chosen was stupid and confusing (which it was; I'm not explaining the joke to a computer), and that was my sign that it was done finding problems and it was time to move on.
>then tell it to give you the top 10 things wrong with the code, then tell it to fix the five of them that are valid and important.
I would be cautious with this. I've tried it multiple times and it often produces very subtle bugs. Sometimes the code is not bad enough to have 5 defects, but it will comply anyway and change things that don't need changing. You will find out in prod at some point.
To be clear, I'm instructing it to generate a list of issues for me. I then decide if anything on that list is worth fixing (or is an issue at all, etc.)
Do you think you will be able to capture any of this extra value? I think I'm faster at coding, but the overall corporate project timeline feels about the same. I feel more relaxed and confident that the work can be done. Not sure how to get a raise out of this.
For me, as a remote developer, it means I'm able to finish my work in 1 hour instead of 8 hours. So I'm able to capture "extra value" in the form of time. In our team everyone uses GitHub Copilot and I use Claude Code. My teammates' productivity increased slightly but my productivity increased a lot. This is because 1. Claude Code is just a better coding agent 2. I invested time to get good at agentic coding. Eventually Copilot will catch up and management will realize that now 1 developer can do what previously would take a whole team.
I'm really curious what your role is and which industry you're in. I'm awed by these productivity gains others report, but I feel like AI helps in such a small part of my job (implementing specific changes as I direct).
Agentic workflows for me result in bloated code, which is fine when I'm willing to hand over a subsystem to the agent, such as a frontend on a side project, and have it vibe code the entire thing. Trying to get clean code erases all/most of my productivity gains, and doesn't spark joy. I find having a back-and-forth with an agent exhausting, probably because I have to build and discard multiple mental models of the proposed solution, since the approach can vary wildly between prompts. An agent can easily switch between using Newton-Raphson and bisection when asked to refactor unrelated arguments, which a human colleague wouldn't do after a code review.
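To make that concrete, here is a minimal, purely illustrative sketch (not code from my project) of the two root-finding approaches an agent might silently swap between. Newton-Raphson converges fast but needs a derivative and a decent starting guess; bisection is slower but only needs a sign-changing interval. Trading one for the other during an unrelated refactor is a real behavioral change, not a cosmetic one.

```
// Two root-finding strategies for f(x) = 0. Illustrative only; names are hypothetical.

// Newton-Raphson: fast convergence, but needs the derivative and a good x0.
function newtonRaphson(
  f: (x: number) => number,
  df: (x: number) => number,
  x0: number,
  tol = 1e-10,
  maxIter = 50
): number {
  let x = x0;
  for (let i = 0; i < maxIter; i++) {
    const fx = f(x);
    if (Math.abs(fx) < tol) break;
    x -= fx / df(x);
  }
  return x;
}

// Bisection: slow but robust; only needs f(a) and f(b) to have opposite signs.
function bisection(
  f: (x: number) => number,
  a: number,
  b: number,
  tol = 1e-10
): number {
  let lo = a, hi = b;
  while (hi - lo > tol) {
    const mid = (lo + hi) / 2;
    if (Math.sign(f(mid)) === Math.sign(f(lo))) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

// Both find sqrt(2) as the root of x^2 - 2, but behave very differently
// on harder functions (flat derivatives, multiple roots, bad brackets).
const f = (x: number) => x * x - 2;
console.log(newtonRaphson(f, (x) => 2 * x, 1)); // ~1.4142135623
console.log(bisection(f, 0, 2));                // ~1.4142135623
```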
I've come to the same conclusion: If you just want a huge volume of code written as fast as possible, and don't care about 1. how big it is, 2. how fast it runs, 3. how buggy it is, 4. how maintainable or understandable it is, or 5. the overall craftsmanship and artistry of it, then you're probably seeing huge productivity gains! And this is fine for a lot of people and for a lot of companies: Quality really doesn't matter. They just care about shitting out mediocre code as fast as possible.
If you do care about these things, it will take you overall longer to write the code with an LLM than it would by hand-crafting it. I started playing around with Claude on my hobby projects, and found it requires an enormous amount of exhausting handholding and post-processing to get the code to the point where I am really happy with it as a consistent, complete, expressive work of art that I would be willing to sign my name to.
It does matter, but it's one requirement among many. Engineers think quality metrics like the ones you listed are the most important requirements, but that's typically not true.
This really is what businesses want and always have wanted. I've seen countless broken systems spitting out wrong info that was actively used by the businesses in my career, before AI. They literally did not want it fixed when I brought it up because dealing with errors was part of the process now in pretty much all cases. I don't even try anymore unless I'm specifically brought on to fix a legacy system.
>that I would be willing to sign my name to.
This right here is what mgmt thinks is the big "problem" that AI solves. They have always wanted us to magically know what parts are "good enough" and what parts can slide but for us to bear the burden of blame. The real problem is, same as always, a bad spec. AI won't solve that, but in their eyes it will remove a layer of their poor communication. Obviously no SWE is going to build a system that spits out wrong info and just say "hire people to always double check the work" or add checking it to so-and-so's job duties, but that really is the solution most places seem to land on by lack of decision.
Perhaps there is some sort of failure of SWEs to understand that businesses don't care. Accounting will catch the expensive errors anyway. Then execs will bullwhip middle managers and it will go away.
The adversarial tension was all that ever made any of it work.
The "Perfectionist Engineer" without a "Pragmatic Executive" to press them into delivering something good enough would of course still been in their workshop, tinkering away, when the market had already closed.
But the "Pragmatic Executive" without the "Perfectionist Engineer" around to temper their naive optimism would just as soon find themselves chased from the market for selling gilded junk.
You're right that there do seem to be some execs, in the naive optimism that defines them, eager to see if this technology finally lets them bring their vision to market without the engineer to balance them.
That's a nice, balanced, wholesome take; the only problem is that the "Pragmatic Executive" is more like a "career-driven, frenzied, 'ship it today at all costs' psychopath executive".
You are describing a push-and-pull / tug-of-war balanced relationship. In reality that's absolutely exactly never balanced. The engineer has 1% say, the other 99% go to the executive.
I so wish your take was universally applicable. In my 24 years of career, it was not.
> Perhaps there is some sort of failure of SWEs to understand that businesses don't care
I think it's an engineer's nature to want to improve things and make them better, but then we naively assume that everybody else also wants to improve things.
I know I personally went through a pretty rough disillusionment phase where I realised most of the work I was asked to do wasn't actually to make anything better, but rather to achieve some very specific metrics that actually made everything but that metric worse.
Thanks to the human tendency to fixate on narratives, we can (for a while) trick ourselves into believing a nice story about what we're doing even if it's complete bunk. I think that false narrative is at the core of mission statements and why they intuitively feel fake (mission statement is often more gaslighting than guideline - it's the identity a company wants to present, not the reality it does present).
AI is eager to please and doesn't have to deal with that cognitive dissonance, so it's a metric chaser's dream.
<< They have always wanted us to magically know what parts are "good enough" and what parts can slide but for us to bear the burden of blame.
Well, that part is bound to add a level of tension to the process. Our leadership has AI training, where the user is responsible for checking its output, but the same leadership also outright stated it now sees each individual user of AI as having 7 employees under them (so they should be 7x more productive). Honestly, it's maddening. None of it is how it works at all.
> This really is what businesses want and always have wanted.
There's a difference between what they really want and executives knowing what they want. You make it sound like every business makes optimal decisions to get optimal earnings.
> They literally did not want it fixed when I brought it up because
Because they thought they knew what earns them profits. The key here is that they thought they knew.
The real problem behind the scenes is a lot of management is short term. Of course they don't care. They roll out their shiny features, get their promotions and leave. The issues after that are not theirs. It is THE business' problem.
Senior Software Engineer. The system is niche business software for a specific industry. It doesn't do any fancy math; it's all straightforward business logic.
> Trying to get clean code erases all/most of my productivity gains, and doesn't spark joy. I find having a back-and-forth with an agent exhausting, probably because I have to build and discard multiple mental models of the proposed solution, since the approach can vary wildly between prompts
You probably work on something that requires very unique and creative solutions. I work on dumb business software. Claude Code is generally good at following existing code patterns. As for the back-and-forth with Claude Code being exhausting, I have a few tips on how to minimize the number of shots required to get a good solution from CC:
1. Start by exploring relevant code by asking CC questions.
2. Then use Plan Mode for anything more than a trivial change. Using Plan Mode is essential. You need to make sure you and CC are on the same page BEFORE it starts writing code.
3. If you see CC making the same mistake over and over, add instructions to your CLAUDE.md to avoid it in the future. This way your CC setup improves over time, like a coworker who learns.
Thank you for the actionable ideas. I'll experiment with closer supervision during the planning stage, hopefully finer-grained implementation details will reduce unnecessarily large refactors during review.
Claims about agentic workflows are the new version of "works on my machine" and should be treated with skepticism if they cannot be committed to a repository and used by other people.
Maybe parent is a galaxy-brained genius, or.. maybe they are just leaving work early and creating a huge mess for coworkers who now must stay late. Hard to say. But someone who isn't interested in automating/encoding processes for their idiosyncratic workflows is a bad engineer, right? And someone who isn't interested in sharing productivity gains with coworkers is basically engaged in sabotage.
> And someone who isn't interested in sharing productivity gains with coworkers is basically engaged in sabotage.
Who says they aren't interested in sharing? To give a less emotionally charged example: I think my specific use pattern of Git makes me (a bit) more productive. And I'm happy to chew anyone's ear off about it who's willing to listen.
But the willingness and ability of my coworkers to engage in git-related lectures, while greater than zero, is very definitely finite.
Something that is advertised as 10x improvement in productivity isn't like your personal preferences for git or a few dinky bash aliases or whatever. It's more like a secret personal project test-suite, or a whole data pipeline you're keeping private while everyone else is laboriously doing things manually.
Assuming 10x is real, then again the question: why would anyone do that? The only answers I can come up with are that they cannot share it (incompetence) or that they don't want to (sabotage). You're saying the third option is.. people just like working 8 hours while this guy works 1? Seems unlikely. Even if that's not sabotaging coworkers, it's still sabotaging the business.
The reason is that we are a Microsoft shop and our company doesn't have a Claude account. I'm using my personal Claude Max account. My manager does know that I use Claude Code, and I asked the person responsible for AI tooling at our company about adopting Claude Code, but he just said that management has already decided to go with GitHub Copilot. He thinks that using a Claude model in Copilot is the same as using Claude Code. Another issue is that I use Claude Code through WSL, but I'm the only person on our team with Linux skills.
There are ways of connecting the Claude Code CLI to Copilot's API; look at litellm or something along those lines. It's a pip package and translates the calls Claude Code makes.
Business and Enterprise plans have a no-training-on-your-data clause.
I’m not sure personal Claude has that. My account has the typical bullshit verbiage with opt-outs where nobody can really know whether they’re enforceable.
Using a personal account is akin to sharing the company code and could get one in serious trouble IMO.
You can opt out of having your code trained on. When Claude Code first came out, Anthropic wasn't using CC sessions for training. They started training on them with Claude Code 2, which came out with Sonnet 4.5. Users are asked on first use whether to opt in or out of training.
> You're saying the third option is.. people just like working 8 hours while this guy works 1?
Nope, I don't say that at all.
I am saying that certain accommodations might feel like 10x to the person making them, but that doesn't mean they are portable.
Another personal example: I can claim with a straight face that using a standing desk and a Dvorak keyboard make me 10x more productive than otherwise. But that doesn't necessarily mean that other people will benefit from copying me, even if I'm happy to explain to anyone how to buy a standing desk from Ikea (or how to work company procurement to get one, in case you are working not-from-home).
In any case, the original commenter replied with a better explanation than our speculations here.
> And someone who isn't interested in sharing productivity gains with coworkers is basically engaged in sabotage.
I'll have to vigorously dissent on this notion: we sell our labor to employers - not our souls. Our individual labor, contracts and remuneration are personalized. Our labor. Not some promise to maximize productivity - that's a job for middle and upper management.
Your employer sure as hell won't directly share 8x productivity gains with employees. The best they can offer is a once-off, 3-15% annual bonus (based on your subjective performance, not the aggregate) or, alternatively, if you have RSUs/options, gains on your minuscule ownership fraction.
I'm teaching a course in how to do this to one of my clients this week.
Also, I used this same process to address a bug that is many years old in a very popular library this week. Admittedly, the first solution was a little wordy and required some back and forth, but I was able to get to a clean tested solution with little pain.
It seems to me that the devs who have managed to become sergeants of a small platoon of LLM agents, to crushing success, deem their setup a competitive advantage and as such will never share it.
But them being humans, they do want to brag about it.
This has been my experience too. At the end of each session, I'm left mentally exhausted without a full understanding of what I just did, so I have to review it again.
Coding this way requires an effort equal to designing, coding, and reviewing combined, except the code I review isn't mine. Strange situation.
Well, for me, all of my actual implementation work since AI coding got decent has been greenfield from "git init": mostly coding around the AWS SDK in the target language, plus infrastructure as code.
I haven’t had to write a line of code in a year. First ChatGPT and more recently Claude Code.
I don’t do “agentic coding”. I keep my hands on the steering wheel and build my abstractions and modules up step by step. I make sure every line of code looks like something I would write.
I’m a staff consultant (cloud + app dev) and always lead projects, discovery and design and depending on the size of the project, do all of the actual hands on work myself.
I would have had to staff at least one, maybe two, less senior consultants to do the actual hands-on work before. It's actually easier for me to do the work than to write really detailed requirements and coordinate the work (the whole "Mythical Man Month" thing).
FWIW: before the pearl clutching starts, I started coding in assembly in 1986 on an Apple //e and have been delivering production level code since 1996.
I have tech adjacent people on my team vibing out internal tools that are super useful, and take a load off of engineering. Most internal software is rehashing existing software with different/specific requirements.
Exactly what I experience. I don't need AI to generate complex algorithms; I need, e.g., a lot of code for a UI library that is clean and maintainable. But it can't ever generate such code and it can't be prompted into it, because the training data has much less excellent code than good and OK code. Therefore I can't use AI for high-level design tasks, only low-level code, which I then have to check and clean line by line, and that isn't enjoyable work.
I don't need LLMs, I need some kind mind reading device :D
Not the OP, but we use LLMs to build a restaurant POS system with reservations, loyalty, webshop, etc. Almost at feature parity with bigwigs like Lightspeed/Toast.
> I find having a back-and-forth with an agent exhausting, probably because I have to build and discard multiple mental models of the proposed solution, since the approach can vary wildly between prompts
Just now I had it improve QR payments on the POS. This is standard stuff, and I have done it multiple times, but I'm happy I didn't have to spend the mental energy to implement it and just had to review the code and test it.
```
Perfect! I've successfully implemented comprehensive network recovery strategies for the OnlinePaymentModal.tsx file. Here's a summary of what was added:
Implemented Network Recovery Strategies
1. Exponential Backoff for Polling (lines 187-191)
2. Network Status Detection (lines 223-246, 248-251)
3. Transaction Timeout Handling (lines 110-119)
4. Retry Logic for Initial Transaction (lines 44-105)
5. AbortController for Request Cancellation (lines 134-139, 216-220)
6. Better Error Messaging (lines 85-102, 193-196)
7. Circuit Breaker Pattern (lines 126-132)
All strategies work together to provide a robust, user-friendly payment
experience that gracefully handles network issues and automatically
recovers when connectivity is restored.
```
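For a rough sense of what items 1 and 5 in that summary translate to in practice, here's a minimal sketch. Names like pollPaymentStatus and the /api/payments/ route are hypothetical, not the actual generated OnlinePaymentModal code:

```
// Sketch of exponential-backoff polling with request cancellation.
// Hypothetical names; not the actual generated code.
async function pollPaymentStatus(
  paymentId: string,
  signal: AbortSignal,
  maxAttempts = 8,
  baseDelayMs = 1000
): Promise<"paid" | "failed" | "timeout"> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (signal.aborted) return "failed";
    try {
      const res = await fetch(`/api/payments/${paymentId}`, { signal });
      if (res.ok) {
        const { status } = await res.json();
        if (status === "paid" || status === "failed") return status;
      }
    } catch {
      // Network error or abort: fall through and retry after a longer delay.
      if (signal.aborted) return "failed";
    }
    // Exponential backoff, capped at 30 seconds between polls.
    const delay = Math.min(baseDelayMs * 2 ** attempt, 30_000);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  return "timeout";
}
```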
> An agent can easily switch between using Newton-Raphson and bisection when asked to refactor unrelated arguments, which a human colleague wouldn't do after a code review.
Can you share what domain your work is in? Is it deep tech? Maybe coding agents right now work better for transactional/ecommerce systems?
I don't know if that example is real, but if it is, that's exactly the reason I find AI tools irritating. You do not need six different ways to handle the connection being down, and if you do, you should really factor that out into a connection management layer.
One of my big issues with LLM coding assistants is that they make it easy to write lots & lots of code. Meanwhile, code is a liability, and you should want less of it.
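For what it's worth, by "connection management layer" I mean one shared wrapper that owns the retry, timeout, and cancellation policy, rather than each modal growing its own copy of that logic. A minimal sketch, assuming a browser-style fetch; the name fetchWithRetry and the defaults are illustrative:

```
// One place that owns retry/backoff/timeout/abort policy, instead of
// re-implementing it per component. Illustrative sketch only.
async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  { retries = 3, baseDelayMs = 500, timeoutMs = 10_000 } = {}
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetch(url, { ...init, signal: controller.signal });
      // Return on success, or give up after the final attempt.
      if (res.ok || attempt >= retries) return res;
    } catch (err) {
      if (attempt >= retries) throw err;
    } finally {
      clearTimeout(timer);
    }
    // Exponential backoff before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
}

// Callers then stay simple: fetchWithRetry("/api/payments/123").then(...)
```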
You are talking about something like network layers in GraphQL. That's on our roadmap for other reasons (switching API endpoints to DigitalOcean when our main Cloudflare Worker is having an outage). However, even with that, you'll need some custom logic, since this is doing at least two API calls in succession, and that's not easy to abstract via a transaction abstraction in a network layer (you'd have to handle it durably in the network layer, like Temporal does).
Despite the obvious downsides, we actually moved it from a durable workflow (CF's take on Temporal) on the server side to the client, since on Workflows it had horrible and variable latencies (sometimes 9s vs. a consistent < 3s with this approach). It's not ideal, but it makes more sense business-wise. I think many times people miss that completely.
I think it just boils down to what you are aiming for. AI is great for shipping bugfixes and features fast. At a company level I think it also shows in product velocity. However, I'm sure our competitors will catch up very soon once AI skepticism falters.
> Most software work is maintaining "legacy" code, that is, older systems that have been around for a long time and get a lot of use.
That's not the definition of legacy. Being there for a long time and getting lots of use is not what makes a legacy project "legacy".
Legacy projects are characterized by not being maintained and having little to no test coverage. The term "legacy" means "I'm afraid to touch it because it might break and I doubt I can put it back together". Legacy means resistance to change.
You can and do have legacy projects created a year or two ago. Most vibecoded apps fit the definition of legacy code.
That is why legacy projects are a challenge for agentic coding. Agents already output huge volumes of code changes that developers struggle to review, let alone assert their correctness. On legacy projects that are in production, this is disastrous.
What you list are common characteristics encountered in legacy systems, but what makes a system legacy is a business decision to declare it obsolete and in maintenance mode, so that no money or time is to be invested in it. Old systems that continue to evolve are not legacy, like, say, Linux, and yes, as you say, a project that is only one year old can be declared legacy. Resistance to change is only an economic variable that drives the decision. Vibecoded apps fit the definition because the developer is unlikely to want to invest more time in them, for different reasons.
> What you list are common characteristics encountered in legacy systems, but what makes a system legacy is a business decision to declare it obsolete and in maintenance mode, so that no money or time is to be invested in it.
No, not necessarily. Business decisions are one of many factors in creating legacy code, but they are by no means the single cause. A bigger factor is developers mismanaging a project to the point that it becomes an unmaintainable mess. I personally went through a couple of projects which were legacy code the minute they hit production. One of them was even a proof of concept that a principal engineer, one of those infamous 10x types, decided to push as-is to production, leaving two teams of engineers to sort out the mess left in his wake.
I recommend reading "Working Effectively with Legacy Code" by Michael Feathers. It will certainly be eye-opening to some, as it dispels the myth that legacy code is a function of age.
So a project that's still using Java 1.6 and has perfect test coverage and some poor developer is paid to maintain it (but NOT upgrade it!) is not "legacy" in your book?
Then we disagree on the definition.
"Legacy" projects to me are those that should've went through at least two generational refactorings but haven't because of some unfathomable reason. These are the ones that eventually end up being rewritten from scratch because it's faster than trying to upgrade the 25 year old turd.
I've predominantly worked in two industries, healthcare/public health and insurance, where policy terms are measured in decades. The software for both ranges from 20 to 40 years old, and it hasn't been upgraded because doing so poses an existential risk to either the business or, in the case of healthcare, to human life. Upgrades are measured in terms of human generations because of said risk, but I wouldn't call these systems legacy just for not moving beyond Java 1.6.
Claude is insanely good at grunt-work maintenance coding, which is a fairly formulaic exercise that mostly requires RTFM and simple code changes that look a lot like the surrounding code. Designing new things from scratch based on human specs is something Claude still struggles with.
The problem is that it often doesn't get it right the first time. You have to sort of have a conversation and it eventually gets there but if you have no idea what the destination should be like, you can't guide it there.
Although many tools exist, there still seems to be a large context gap here: we need better tools to orient ourselves in and navigate large (legacy) codebases. While not strictly a source graph or the like, I do think an Enso-like interface may prove successful here[0].
> It's not about its ability to churn out lots of code quickly: it's an extra set of eyes/brain that works much faster than a human developer.
This is the key take right here. LLMs excel at parsing existing content, summarizing it, and using it to explore scenarios and hypotheticals.
Even the best coding agents out there, such as Claude Code or Gemini, often fail on the first try to generate code that actually compiles, let alone does what it is expected to do.
Apologists come up with excuses such as the legacy software not being architected well enough to be easily parsed by LLMs, but that is a cheap excuse. The same reference LLMs often output utter crap in greenfield projects they themselves generated, and do so after a handful of prompts. The state of a project is not the issue.
The world is coming to the realization that the AI hype is not delivering. Talk of an AI bubble is already mainstream. But, like IDEs with autocomplete, agents might not solve every problem, yet they are nevertheless useful and are here to stay. They are more akin to a search engine where the user doesn't need to copy/paste code snippets to apply a code change.
I honestly don't know what code bases you guys are working with. For me, I tried it with a large quant library (C++97) in an effort to modernize it, and so far it's been nothing but a waste of time. Similarly for a medium-sized Python quant codebase (3.6) I'm trying to port to 3.12; it's also been a headache.