
In my experience, AIs can generate perfectly good code for relatively easy things, the kind you might as well copy&paste from stackoverflow, and they'll very confidently generate subtly wrong code for anything that's non-trivial for an experienced programmer to write. How do people deal with this? I simply don't understand the value proposition. Does Google now have 25% subtly wrong code? Or do they have 25% trivial code? Or do all their engineers babysit the AI and bugfix the subtly wrong code? Or are all their engineers so junior that an AI is such a substantial help?

Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?



> the kind you might as well copy&paste from stackoverflow

This bothers me. I completely understand the conversational aspect - "what approach might work for this?", "how could we reduce the crud in this function?" - it worked a lot for me last year when I tried learning C.

But the vast majority of AI use that I see is...not that. It's just glorified, very expensive search. We are willing to burn far, far more fuel than necessary because we've decided we can't be bothered with traditional search.

A lot of enterprise software is poorly cobbled together using stackoverflow gathered code as it is. It's part of the reason why MS Teams makes your laptop run so hot. We've decided that power-inefficient software is the best approach. Now we want to amplify that effect by burning more fuel to get the same answers, but from an LLM.

It's frustrating. It should be snowing where I am now, but it's not. Because we want to frivolously chase false convenience and burn gallons and gallons of fuel to do it. LLM usage is a part of that.


What I can't wrap my head around is that making good, efficient software doesn't (by and large) take significantly longer than making bloated, inefficient enterprise spaghetti. The problem is finding people to do it with who care enough to think rigorously about what they're going to do before they start doing it. There's this bizarre misconception popular among bigtech managers that there's some tunable tradeoff between quality and development speed. But it doesn't actually work that way at all. I can't even count anymore how many times I've had to explain how taking this or that locally optimal shortcut will make it take longer overall to complete the project.

In other words, it's a skill issue. LLMs can only make this worse. Hiring unskilled programmers and giving them a machine for generating garbage isn't the way. Instead, train them, and reject low quality work.


> What I can't wrap my head around is that making good, efficient software doesn't (by and large) take significantly longer than making bloated, inefficient enterprise spaghetti. The problem is finding people to do it with who care enough to think rigorously about what they're going to do before they start doing it.

I don't think finding such programmers is really difficult. What is difficult is finding such people if you expect them to be docile to incompetent managers and other incompetent people involved in the project who, for example, got their position not by merit and competence, but by playing political games.


"What I can't wrap my head around is that making good, efficient software doesn't (by and large) take significantly longer than making bloated, inefficient enterprise spaghetti."

In my opinion the reason we get enterprise spaghetti is largely due to requirement issues and scope creep. It's nearly impossible to create a streamlined system without knowing what it should look like. And once the system gets to a certain size, it's impossible to get business buy-in to rearchitect or refactor to the degree that is necessary. Plus the full requirements are usually poorly documented and long forgotten by that time.


When scopes creep and requirements change, simply refactor. Where is it written in The Law that you have to accrue technical debt? EDIT: I'm gonna double down on this one. The fact that your organization thinks they can demand of you that you can magically weathervane your codebase to their changeable whims is evidence that you have failed to realistically communicate to them what is actually possible to do well. The fact that they think it's a move you can make to creep the scope, or change the requirements, is the problem. Every time that happens it should be studied within the organization as a major, costly failure--like an outage or similar.

> it's impossible to get business buy-in to rearchitect or refactor to the degree that is necessary

That's a choice. There are some other options:

- Simply don't get business buy-in. Do without. Form a terrorist cell within your organization. You'll likely outpace them. Or you'll get fired, which means you'll get severance, unemployment, a vacation, and the opportunity to apply to a job at a better company.

- Fight viciously for engineering independence. You business people can do the businessing, but us engineers are going to do the engineering. We'll tell you how we'll do it, not the other way around.

- Build companies around a culture of doing good, consistent work instead of taking expedient shortcuts. They're rare, but they exist!


> Fight viciously for engineering independence.

Or simply find a position in an industry or department where you commonly have more independence. In my opinion this fight is not worth it - looking for another position instead is typically easier.


>When scopes creep and requirements change, simply refactor.

Congratulations, you just refactored out a use case which was documented in a knowledge base which has been replaced by 3 newer ones since then, happens once every 18 months and makes the company go bankrupt if it isn't carried out promptly.

The type of junior devs who think that making code tidy is fixing the application are the type of dev who you don't let near the heart of the code base, and incidentally the type who are best replaced with code gen AI.


Refactoring is improving the design of existing code. It shouldn't change behavior.

And regardless, the way you prevent loss of important functionality isn't by hoping people read docs that no longer exist. It's by writing coarse-grained tests that make sure the software does the important things. If a programmer wants to change something that breaks a test like that, they go ask a product manager (or whatever you call yours) if that feature still matters.
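
As a rough illustration of what I mean by a coarse-grained test -- a minimal Python/pytest-style sketch, with a made-up billing feature and made-up names:

    # The "important thing" being pinned down: overdue invoices generate a
    # dunning notice. A refactor that silently drops this breaks the test.
    class BillingApp:
        def __init__(self):
            self.invoices, self.outbox = [], []

        def create_invoice(self, customer, amount, days_overdue):
            self.invoices.append((customer, amount, days_overdue))

        def run_nightly_jobs(self):
            for customer, _, days_overdue in self.invoices:
                if days_overdue > 0:
                    self.outbox.append(("dunning_notice", customer))

    def test_overdue_invoice_triggers_dunning_notice():
        app = BillingApp()
        app.create_invoice("acme", 100, days_overdue=30)
        app.run_nightly_jobs()
        assert ("dunning_notice", "acme") in app.outbox

The test says nothing about how the nightly job is implemented, only that the behavior the business cares about still happens.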

And if nobody can say whether a feature still matters, the organization doesn't have a software problem, it has a serious management problem. Not all the coding techniques in the world can fix that.


If you don't understand your systems well enough to comfortably refactor them, you're losing the war. I probably should have put "simply" in scare quotes, it isn't simple--and that's the point. Responding to unreasonable demands, like completely changing course at the 11th hour, shouldn't come at a low price.


It's a market for lemons.

Without redoing their work or finding a way to have deep trust (which is possible, but uncommon at a bigcorp), it's hard to tell who is earnest and who is faking it (or buying their own baloney) when it comes to propositions like "investing in this piece of tech debt will pay off big time".

As a result, if managers tend to believe such plans, bad ideas drive out good and you end up investing in a tech debt proposal that just wastes time. Burned managers therefore cope by undervaluing any such proposals and preferring the crappy car that at least you know is crappy over the car that allegedly has a brand new 0 mile motor on it but you have no way of distinguishing from a car with a rolled back odometer. They take the locally optimal path because it's the best they can do.

It's taken me 15 years of working in the field and thinking about this to figure it out.

The only way out is an organization where everyone is trusted, competent, and worthy of trust, which, again, is hard to do at most random bigcorps.

This is my current theory anyway. It's sad, but I think it kind of makes sense.


Soviet vs NATO. The Soviet management style is micromanaging exactly how to do everything from the rear. The NATO style is delegating to the front line ranks.

Being good at the NATO style of management means focusing on the big picture--what, when, why--and leaving how to the people actually doing it.


Agreed.

The way I explain this to managers is that software development is unlike most work. If I'm making widgets and I fuck up, that widget goes out the door never to be seen again. But in software, today's outputs are tomorrow's raw materials. You can trade quality for speed in the very short term at the cost of future productivity, so you're really trading speed for speed.

I should add, though, that one can do the rigorous thinking before or after the doing, and ideally one should do both. That was the key insight behind Martin Fowler's "Refactoring: Improving the Design of Existing Code". Think up front if you can, but the best designs are based on the most information, and there's a lot of information that is not available until later in a project. So you'll want to think as information comes in and adjust designs as you go.

That's something an LLM absolutely can't do, because it doesn't have access to that flow of information and it can't think about where the system should be going.


> the best designs are based on the most information, and there's a lot of information that is not available until later in a project

This is an important point. I don't remember where I read it, but someone said something similar about taking a loss on your first few customers as an early stage startup--basically, the idea is you're buying information about how well or poorly your product meets a need.

Where it goes wrong is if you choose not to act on that information.


For sure. Or, worse, choose to run a company in such a way that anybody making choices is insulated from that information.


It's relatively easy to find programmers who can realize enterprise project X; it's hard to find programmers who care about X. Throwing an increased requirement like speed at it makes this worse because it usually ends up burning out both ends of the equation.


> The problem is finding people to do it with who care enough to think rigorously

> ...

> train them, and reject low quality work.

I agree very strongly with both of these points.

But I've observed a truth about each of them over the last decade-plus of building software.

1) very few people approach the field of software engineering with anything remotely resembling rigor, and

2) there is often little incentive to train juniors and reject subpar output (move fast and break things, etc.)

I don't know where this takes us as an industry? But I feel your comment on a deep level.


> 1) very few people approach the field of software engineering with anything remotely resembling rigor

This is a huge problem. I don't know where it comes from, I think maybe sort of learned helplessness? Like, if systems are so complex that you don't believe a single person can understand it then why bother trying anyway? I think it's possible to inspire people to not accept not understanding. That motivation to figure out what's actually happening and how things actually work is the carrot. The stick is thorough, critical (but kind and fair) code--and, crucially, design--review, and demanding things be re-done when they're not up to par. I've been extremely lucky in my career to have had senior engineers apply both of these tools excellently in my general direction.

> 2) there is often little incentive to train juniors and reject subpar output (move fast and break things, etc.)

One problem is our current (well, for years now) corporate culture is this kind of gig-adjacent economy where you're only expected to stick around for a few years at most, and therefore in order to be worth your comp package you need to be productive on your first day. Companies even advertise this as a good thing: "you'll push code to prod on your first day!" It reminds me of those scammy books from when I was a kid in the late 90s: "Learn C In 10 Days!".


> This is a huge problem. I don't know where it comes from

I think it's a bunch of things, but one legitimate issue is that software is stupidly complex these days. I had the advantage of starting when computers were pretty simple and have had a chance to grow along with it. (And my dad started when you could still lift up the hood and look at each bit. [1])

When I'm working with junior engineers I have a hard time even summing up how many layers lie beneath what they're working on. And so much of what they have to know is historically contingent. Just the other day I had to explain what LF and CR mean and how it relates to physical machinery that they probably won't see outside of a museum: https://sfba.social/@williampietri/113387049693365012

So I get how junior engineers struggle to develop a belief that they can sort it all out. Especially when so many people end up working on garbage code, where little sense is to be had. It's no wonder so many turn to cargo culting and other superstitious rituals.

[1] https://en.wikipedia.org/wiki/Magnetic-core_memory


I agree as well. These are actually things that bother me a lot about the industry. I’d love to write software that should run problem-free in 2035, but the reality is almost no one cares.

I’ve had the good fortune of getting to write some firmware that will likely work well for a long time to come, but I find most things being written on computers are written with (or very close to) the minimum care possible in order to get the product out. Clean up is intended but rarely occurs.

I think we’d see real benefits from doing a better job, but like many things, we fail to invest early and crave immediate gratification.


> very few people approach the field of software engineering with anything remotely resembling rigor, and

I have this one opinion which I would not say at work:

In software development it's easy to feel smart because what you made "works" and you can show "effects".

- Does it wrap every failable condition in `except Exception`? Uhh, but look, it works.

- Does it define a class hierarchy for what should be a dictionary lookup? It works great tho!

- Does it create a cyclic graph of objects calling each other's methods to create more objects holding references to the objects that created them? And for what, to produce a flat dictionary of data at the end of the day? But see, it works.

this is getting boring, maybe just skip past the list

- Does it stuff what should be local variables and parameters in self, creating a big stateful blob of an object where every attribute is optional and methods need to be called in the right order, otherwise you get an exception? Yes, but it works.

- Does it embed a browser engine? But it works!
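
To make the "stateful blob" item above concrete -- a contrived Python sketch, not code from any real codebase:

    # Everything lives on self, every attribute starts as None, and the methods
    # only work if called in the right order.
    class ReportBuilder:
        def __init__(self):
            self.rows = None
            self.total = None

        def load(self, rows):
            self.rows = rows

        def compute(self):
            self.total = sum(self.rows)    # TypeError if load() wasn't called first

        def render(self):
            return f"total: {self.total}"  # "total: None" if compute() was skipped

    # ...where a plain function with parameters and a return value would do:
    def render_report(rows):
        return f"total: {sum(rows)}"

But hey, as long as you call the methods in the right order, it works.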

The programmer, positively affirmed, continues spewing out crap, while the seniors keep fighting fires to keep things running, all while insulating the programmer from the taste of their own medicine.

But more generally, it's hard to expect people to learn how to solve problems simply if they're given gigantic OO languages with all the features and no apparent cost to any of them. People learn how to write classes and then never get good at writing code with a clear data flow.

Even very bright people can fall for this trap because engineering isn't just about being smart but about using intelligence and experience to solve a problem while minmaxing correctly chosen properties. Those properties should generally be: dev time, complexity (state/flow), correctness, test coverage, ease of change, performance (anything else?). Anyway, "affirming one's opinions about how things should be done" isn't one of them.


The whole one about the stateful blob of an object with all optional attributes got me real good. Been fighting that for years. But the dev that writes this produces code faster than me and understands parts of the system no one else does and doesn't speak great English, so it continues. And the company is still afloat. So who's right in the end? And does it matter?


I don't know who's right but I know that it's the ergonomics of programming languages that make producing stateful blobs fast and easy that are in the wrong.


You know it's a problem when you have to read a book of a couple hundred pages to learn how to hold it right ;)


<< Instead, train them, and reject low quality work.

Ahh, well, in order to save money, training is done via an online class with multiple choice questions, or, if your company is like mine and really committed to making sure that you know they take your training seriously, they put portions of a generic book on 'tech Z' in PDF form spread over DRM-ridden web pages.

As for code, that is reviewed, commented on, and rejected by LLMs as well. It used to be turtles. Now it truly is LLMs all the way down.

That said, in a sane world, this is what should be happening for a company that actually wants to get good results over time.


> The problem is finding people to do it with who care enough to think rigorously about what they're going to do before they start doing it.

There is no incentive to do it. I worked that way, focused on quality and testing, and none of my changes blew up in production. My manager opined that this approach is too slow and that it was ok to have minor breakages as long as they are fixed soon. When things break though, it's a blame game all around. Loads of hypocrisy.


"Slow is smooth and smooth is fast"


It's true every single time.


> we've decided we can't be bothered with traditional search

Traditional search (at least on the web) is dying. The entire edifice is drowning under a rapidly rising tide of spam and scam sites. No one, including Google, knows what to do about it so we're punting on the whole project and hoping AI will swoop in like deus ex machina and save the day.


Maybe it is naive but I think search would probably work again if they could roll back code to 10 or 15 years ago and just make search engines look for text in webpages.

Google wasn’t crushed by spam; they decided to stop doing text search and instead build search bubbles that are user-specific and location-specific, to surface pages that mention search terms in metadata instead of in text users might read, etc. Oh yeah, and about a decade before LLMs were actually usable, they started to sabotage simple substring searches and kind of force this more conversational interface. That’s when simple search terms stopped working very well, and you had to instead ask yourself “hmm, how would a very old person or a small child phrase this question for a magic oracle?”

This is how we get stuff like: Did you mean “when did Shakespeare die near my location”? If anyone at google cared more about quality than printing money, that thirsty gambit would at least be at the bottom of the page instead of the top.


I remember in like 5th grade in rural PA schools learning about Boolean operators in search engines and falling in love with them. For context, they were presenting AltaVista and Yahoo kids search as the most popular, with Google being a "simple but effective new search platform" we might want to check out.

By the time I graduated high school you already couldn't trust that Boolean operators would be treated literally. By the time I graduated college, they basically didn't seem to do anything, at best a weak suggestion.

Nowadays quotes don't even seem to be consistently honored.


Even though I miss using boolean operators in search, I doubt that it was ever sustainable outside of specialized search engines. Very few people seem to think in those terms. Many of those who do would still have difficulty forming complex queries.

I suspect the real problem is that search engines ceased being search engines when they stopped taking things literally and started trying to interpret what people mean. Then they became some sort of poor man's AI. Now that we have LLMs, of course they are going to replace the poor excuse for search engines that exists today. We were heading down that road already, and an LLM actually summarizes what is out there.


People were learning. Just like with mice and menus, people are capable of learning new skills and querying search engines was one. I remember when it was considered a really "n00b" thing to type a full question into a search engine.

Then Google decided to start enforcing that, because they had this idea that they would be able to divine your "intent" from a "natural question" rather than just matching documents including your search terms.


> just make search engines look for text in webpages.

Google’s verbatim search option roughly does that for me (plus an ad blocker that removes ads from the results page). I have it activated by default as a search shortcut.

(To activate it, one can add “tbs=li:1” as a query parameter to the Google search URL.)
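
For example, a browser search-keyword entry pointing at something like the following (an illustrative URL, with %s as the usual placeholder for the query) keeps verbatim mode on by default:

    https://www.google.com/search?tbs=li:1&q=%s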


To me the stupidest thing was the removal of things like + and -. You can say it's because of Google+, but annoyingly DuckDuckGo doesn't seem to honor them either. Kagi seems to, and I hope they don't follow the others down the road of stupid.


> ?tbs=li:1

Thank you, this is almost life-alteringly good to know.


Funny, I can’t even test this because I’d need to know another neat trick to get my browser to let me actually edit the URL.

Seems that Firefox on mobile allows editing the url for most pages, but on google search results pages, the url bar magically turns into a did-you-mean alternate search selector where I cannot see nor edit a url. Surprised but not surprised.

Sure, there’s a work around for this too, somehow. But I don’t want to spend my life collecting and constantly updating a huge list of temporary hacks to fix things that others have intentionally broken.


You can select verbatim search manually on the Google results page under Search tools > All results > Verbatim. You can also have a bookmark with a dummy search activating it, so you can then type your search terms into the Google search field instead of into the address bar.

Yes, it’s annoying that you can’t set it as the default on Google search itself.


Wow what? Thanks!


> Maybe it is naive but I think search would probably work again if they could roll back code to 10 or 15 years ago and just make search engines look for text in webpages.

Even more naive, but my personal preference: just ban all advertising. The fact that people will pay for ChatGPT implies people will also pay for good search if the free alternative goes away.


It's working for Kagi


Google results are not polluted with spam because Google doesn't know how to deal with it.

Google results are polluted with spam because it is more profitable for Google. This is a conscious decision they made five years ago.


> because it is more profitable for Google

Then why are DuckDuckGo results also (arguably even more so) polluted with spam/scam sites? I doubt DDG is making any profit from those sites since Google essentially owns the display ad business.


DDG is actually Bing. Search as a service.


And Bing is google.


If you own the largest ad network that spam sites use and own the traffic firehose, pointing the hose at the spam sites and ensuring people spend more time clicking multiple results that point to ad-filled sites will make you more money.

Google not only has multiple monopolies, but a cut and dry perverse incentive to produce lower quality results to make the whole session longer instead of short and effective.


I personally think a big problem with search is major search engines try to be all things to all people and hence suffer as a result.

For example: a beginner developer is possibly better served by some SEO-heavy tutorial blog post; an experienced developer would prefer results weighted towards the official docs, the project’s bug tracker and mailing list, etc. But since less technical and non-technical people vastly outnumber highly technical people, Google and Bing end up focusing on the needs of the former, at the cost of making search worse for the latter.

One positive about AI: if an AI is doing the search, it likely wants the more advanced material not the more beginner-focused one. It can take more advanced material and simplify it for the benefit of less experienced users. It is (I suspect) less likely to make mistakes if you ask it to simplify the more advanced material than if you just gave it more beginner-oriented material instead. So if AI starts to replace humans as the main clients of search, that may reverse some of the pressure to “dumb it down”.


> But since less technical and non-technical people vastly outnumber highly technical people, Google and Bing end up focusing on the needs of the former, at the cost of making search worse for the latter.

I mostly agree with your interesting comment, and I think your analysis basically jibes with my sibling comment.

But one thing I take issue with is the idea that this type of thing is a good faith effort, because it’s more like a convenient excuse. Explaining substring search or even include/exclude ops to children and grandparents is actually easy. Setting preferences for tutorials vs API docs would also be easy. But companies don’t really want user-directed behavior as much as they want to herd users to preferred content with algorithms, then convince the user it was their idea or at least the result of relatively static ranking processes.

The push towards more fuzzy semantic search and “related content” everywhere is not to cater to novice users but to blur the line between paid advertisement and organic user-directed discovery.

No need to give megacorp the benefit of the doubt on stuff like this, or make the underlying problems seem harder than they are. All platforms land in this place by convergent evolution wherein the driving forces are money and influence, not insurmountable technical difficulties or good intentions for usability.


> For example: a beginner developer is possibly better served by some SEO-heavy tutorial blog post

Good luck finding those; you end up with SEO spam and clone-page spam. These days you have to look for unobvious hidden meanings which only relate to your exact problem to find what you are looking for.

I have the strong feeling search these days is back to the AltaVista era. You'd have to use trickery to find what you were looking for back then as well. Too bad + no longer works in Google due to their stupid naming of a dead product (no, literal search is not the same and is no replacement).


Yeah, but this is just the name of the game. How can you even stop SEO-style gamification at this point? I’m sure even LLMs are vulnerable/have been trained on SEO bs. At the end of the day it takes an informed user. Remember back in the day? Don’t trust the internet? I think that mindset will become the main school of thought once again. Which, tbh, I think may be a good thing.


> Traditional search (at least on the web) is dying.

That's not my experience at all. While there are scammy sites, using the search engines as an index instead of an oracle still yields useful results. It only requires learning the keywords, which you can do by reading the relevant materials.


How do you read the relevant materials if you haven’t found them yet? It’s a chicken and egg problem. If your goal is to become an expert in a subject but you’re currently a novice, search can’t help you if it’s only giving you terrible results until you “crack the code.”


AI will make the problem of low quality, fake, fraudulent and arbitrage content way worse. I highly doubt it will improve searching for quality content at all.


But it can't save the day.

The problem with Google search is that it indexes all the web, and there's (as you say) a rising tide of scam and spam sites.

The problem with AI is that it scoops up all the web as training data, and there's a rising tide of scam and spam sites.


There's no way the search AI will beat out the spamgen AI.

Tailoring/retraining the main search AI will be so much more expensive than retraining the special-purpose spam AIs.


Without a usable web search index, AI will be in trouble eventually as well. There is no substitute for it.


>The entire edifice is drowning under a rapidly rising tide of spam and scam sites.

You make this claim with such confidence, but what is it based on?

There have always been hordes of spam and scam websites. Can you point to anything that actually indicates that the ratio is now getting worse?


> There have always been hordes of spam and scam websites. Can you point to anything that actually indicates that the ratio is now getting worse?

No, there haven't always been hordes of spam and scam websites. I remember the web of the 90s. When Google first arrived on the scene every site on the results page was a real site, not a spam/scam site.


That was PageRank flexing its capability. There were lots of sites with reams of honeypot text that caught the other search engines.


Google could fix the problem if they wanted to, but it’s not in their interests to fix it since the spam sites generally buy ads from Google and/or display Google ads on their spam websites. Google wants to maximize their income, so..


>> No one, including Google, knows what to do about it

I'm sure they could. But they have no incentive. Try to Google an item, and it will show you a perfect match of sponsored ads and some other not-so-relevant non-sponsored results.


AI will generate even more spam and scam sites more trivially.


What do you mean “will”, we are a few years past that point.


It took the scam/spam sites a few years to catch up to Google search. Just wait a bit, equilibrium will return.


If only Google were trying to solve search rather than maximize shareholder value.


Kagi has fixed traditional search for me.


Narrator: it did not, in fact, save the day.


Another frustration I have with these models is that it is yet another crutch and excuse for turning off your brain. I was tagged on a PR a couple days ago where a coworker had added a GIN index to a column in Postgres, courtesy of GPT-4o, of course.

He couldn't pronounce the name of the extension, apparently not noticing that trgm == trigram, or what that might even be. Copying the output from the LLM and pasting it into a PR didn't result in anything other than him checking off a box, moving a ticket in Jira, and then onto the next thing--not even a pretense of being curious about what any of it all meant. But look at those query times now!

It's been possible for a while to shut off your brain as a programmer and blindly copy-paste from StackOverflow etc., but the level of enablement that LLMs afford is staggering.


Out of curiosity- did it work though?


Doesn't this get to one of the fundamental issues though, that many of these frameworks and languages are poorly constructed in the first place? A lot of the times people turn to web searches, Stack Overflow, or AI is because they want to do X, and there's no quick, clear, and intuitive way to do X. I write cheat sheets for opaque parts of various frameworks myself. A lot of them aren't fundamentally difficult once you understand them, but they're constructed in an extremely convoluted way, and there's usually extremely poor documentation explaining how to actually use them.

In fact, I'd say I use AI more for documentation than I do for code itself, because AI generated documentation is often superior to official documentation.

In the end, these things shouldn't be necessary (or barely necessary) if we had well constructed languages, frameworks, libraries and documentation, but it appears like it's easier to build AI than to make things non-convoluted in the first place.


These models are simply much more powerful than a traditional search engine and stackoverflow, so many people use them for a reason. A friend of mine who never tried ChatGPT until very recently managed to solve a problem with GPT-4o that he couldn't find a solution to on stackoverflow; next time he's probably going to ask the model directly.


I don't know what your friend's prompts were, but this probably speaks to the conversational aspect. I've found success in using LLMs to "search" for things I don't know how to search for - a 'tip of my tongue' type scenario.

"How do I do a for loop" though is a waste of time and energy and should be put into a search engine. There is no need to use the inefficient power needs of an LLM to answer that question. The search engine will have cached the results of that question, leading to a much faster discovery of the answer, and less power draw to do it, whereas an LLM needs to ponder your question EVERY. SINGLE. TIME. A huge waste.

Stop using LLMs for simple things.


> because we've decided we can't be bothered with traditional search

Traditional search was only Google, and Google figured out that they don't need to improve their tools to make it better, because everyone will continue to use it out of force of habit (google is a verb!). Traditional search is being abandoned because traditional search isn't good enough for the kinds of search we need (also, while Google may claim their search is very useful, people rarely search stuff nowadays, instead preferring to be passively fed content via recommendation algorithms, which also use AI!).


Algolia, Marginalia, Kagi, Scopus, ConnectedPapers, Lense[0] all stick to more or less traditional search and yield consistently high-quality results. It shouldn't be one or the other, and I think the first one to combine both paradigms in a seamless fashion would be quite successful (it has been tried, I know, but it's still a niche in many cases).

[0]: https://www.lens.org/lens/search/


> But the vast majority of AI use that I see is...not that. It's just glorified, very expensive search.

Since the collapse of Internet search (rose-tinted hindsight - was it ever any good?) I have been using an LLM as my syntax advisor. I pay for my own tokens, and I can say it is astonishingly cheap.

It is also very good.


A human can't be trusted to not make memory safety bugs. At the same time we can trust AI with logic bugs.


Since LLMs are just based on human output, we should trust LLMs (at best) as much as we trust the average human coder. And in reality we should probably trust them less.


>We are willing to burn far, far more fuel than necessary because we've decided we can't be bothered with traditional search.

That's because traditional search fucking sucks balls.


I don't get it either. People will say all sorts of strange stuff about how it writes the code for them or whatever, but even using the new Claude 3.5 Sonnet or whatever variant of GPT4, the moment I ask it anything that isn't the most basic done-to-death boilerplate, it generates stuff that's wrong, and often subtly wrong. If you're not at least pretty knowledgeable about exactly what it's generating, you'll be stuck trying to troubleshoot bad code, and if you are it's often about as quick to just write it yourself. It's especially bad if you get away from Python, and try to make it do anything else. SQL especially, for whatever reason, I've seen all of the major players generate either stuff that's just junk or will cause problems (things that your run of the mill DBA will catch).

Honestly, I think it will become a better Intellisense but not much more. I'm a little excited because there's going to be so many people buying into this, generating so much bad code/bad architecture/etc. that will inevitably need someone to fix after the hype dies down and the rug is pulled, that I think there will continue to be employment opportunities.


Supermaven is an incredible intellisense. Most code IS trivial and I barely write trivial code anymore. My imports appear instantly, with high accuracy. I have lots of embedded SQL queries and it’s able to guess the structure of my database very accurately. As I’m writing a query the suggested joins are accurate probably 80% of the time. I’m significantly more productive and having to type much less. If this is as good as it ever gets I’m quite happy. I rarely use AI for non trivial code, but non trivial code is what I want to work on…


This is all about the tooling most companies choose when building software: things with so much boilerplate that most code is trivial. We can build tools that have far less triviality and more density, where the distance between the code we write and the business logic is very narrow... but then every line of code we write is hard, because it's meaningful, and that feels bad enough to many developers, so we end up with tools where we might not be more productive, but we might feel productive, even though most of that apparent productivity is trivially generated.

We also have the ceremonial layers of certain forms of corporate architecture, where nothing actually happens, but the steps must exist to match the holy box, box cylinder architecture. Ceremonial input massaging here, ceremonial data transformation over there, duplicated error checking... if it's easy for the LLM to do, maybe we shouldn't be doing it everywhere in the first place.


>but then every line of code we write is hard, because it's meaningful, and that feels bad enough to many developers,

I don't know that I've ever even met a developer who wants to be writing endless pools of trivial boilerplate instead of meaningful code. Even the people at work who are willing to say they don't want to deal with the ambiguity and high level design stuff and just want to be told what to do pretty clearly don't want endless drudgery.


That, but boilerplate stuff is also incredibly easy to understand. As compared to high density, high meaning code anyway. I prefer more low density low meaning code as it makes it much easier to reason about any part of the system.


So basically it’s a presentation problem.

We want to control code at the call site, boilerplate helps with that by being locally modifiable.

We also want to systematize chunks of code so that they don’t flicker around and mess with a reader.

We've wanted this since forever and no one does anything because anything above simple text completion is traditionally seen as overkill, not the true way, not Unix, etc. All sorts of stubborn arguments.

This can be solved by simply allowing code trees instead of lines of code (tree vs table). You drop a boilerplate into code marked as “boilerplate ‘foo’ {…}” and edit it as you see fit, which creates a boilerplate-local patch. Then you can instantly see diffs, find, update boilerplates, convert them to and from regular functions, merge best practices from boilerplate libraries, etc. Problem solved.

It feels like the development itself got collectively stuck in some stupid principles that no one dares to question. Everything that we invent stumbles upon the simple fact that we don’t have any sensible devtime structure, apart from this “file” and “import file” dullness.


In and of itself, it's usually easy to understand. But the more fluff you stuff between the nontrivial bits, the harder it is to take in all at once. I think large quantities of simple boilerplate make the overall project harder to understand and debug. Though that is in comparison to an imagined alternative that's somehow exactly the same but with all the glue removed, so maybe that's not entirely fair.


I think you just nailed the paradox of Go's popularity among developers; its popularity among managers is obvious.


I don't think that is the signal most people are hoping for here.

When I hear that most code is trivial, I think of this as a language design or a framework related issue making things harder than they should be.

Throwing AI or code generators at the problem just to claim that they fixed it is just frustrating.


> When I hear that most code is trivial, I think of this as a language design or a framework related issue making things harder than they should be.

This was one of my thoughts too. If the pain of using bad frameworks and clunky languages can be mitigated by AI, it seems like the popular but ugly/verbose languages will win out, since there's almost no point to better-designed languages/frameworks. I would rather have a good language/framework/etc. where it is just as easy to write the code directly. Similar time in implementation to an LLM prompt, but more deterministic.

If people don't feel the pain of AI slop why move to greener pastures? It almost encourages things to not improve at the code level.


I'm writing software independently, with an extremely barebones framework (just handles routing pretty much) and very lean architecture. Maybe I should re-phrase it, "a lot of characters in the code base are trivial". Imports, function declarations, variable declarations. Is this stuff code/logic? Barely, but it's completely unavoidable. It all takes time and it's now time I rarely have to spend.

Just as an example, I have "service" functions. They're incredibly simple, a higher order function where I can inject the DB handler, user permissions, config, etc. Every time I write one of these I have to import the ServiceDependencies type and declare which dependencies I need to write the service. I now spend close to zero time doing that and all my time focusing on the service logic. I don't see a downside to this.
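
Not their code, but the shape being described looks roughly like this (a Python sketch; ServiceDependencies and the other names here are stand-ins):

    from dataclasses import dataclass

    @dataclass
    class ServiceDependencies:
        db: object          # DB handler
        permissions: set    # user permissions
        config: dict

    # Higher-order "service" function: dependencies injected once up front,
    # the inner function carries the actual logic.
    def make_list_invoices(deps: ServiceDependencies):
        def list_invoices(customer_id):
            if "invoices:read" not in deps.permissions:
                raise PermissionError("not allowed")
            return deps.db.execute(
                "SELECT id, amount FROM invoices WHERE customer_id = %s",
                (customer_id,),
            )
        return list_invoices

The repetitive part -- importing ServiceDependencies and declaring which dependencies the service needs -- is exactly what the autocomplete fills in.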

Most of my business logic is done in raw SQL, which can be complex, but the autocomplete often helps there too. It's not helping me figure out the logic, it's simply cutting down on my typing. I don't know how anyone could be offered "do you want to type significantly fewer characters on your keyboard to get the same thing done?" and say "no thanks". The AI is almost NEVER coding for me, it's just typing for me and it's awesome.

I don't care how lean your system is, there will at least be repetition in how you declare things. There will be imports, there will be dependencies. You can remove 90% of this repetitive work for almost no cost...

I've tried to use ChatGPT to "code for me", and I agree with you that it's not a good option if you're trying to do anything remotely complex and want to avoid bugs. I never do this. But integrated code suggestions (with Supermaven, NOT CoPilot) are incredibly beneficial and maybe you should just try it instead of trying to come up with theoretical arguments. I was also a non-believer once.


Well, Google did design Go...


Interesting that you believe your subjective experience outweighs the claims of all others who report successfully using LLMs for coding. Wouldn't a more charitable interpretation be that it doesn't fit the stuff you're doing?


Why wouldn't someone's subjective experience outweigh someone else's subjective experience?

Regardless, I do wonder how accurate those successful reports are. Do people take LLM output, use it verbatim, not notice subtle bugs, and report that as success?


There's a big difference between "I've seen X" and "I've not seen X". The latter does not invalidate the former, unless you believe the person is lying or being delusional.


I'm not a Google employee but I've heard enough stories to know that a surprising amount of code changes at google are basically updating API interfaces.

The way google works, the person changing an interface is responsible for updating all dependent code. They create PRs which are then sent to code owners for approval. For lower-level dependencies, this can involve creating thousands of PRs across hundreds of projects.

Google has had tooling to help with these large-scale refactors for decades, generally taking the form of static analysis tools. However, these would be inherently limited in their capability. Manual PR authoring would still be required in many cases.

With this background, LLM code gen seems like a natural tool to augment Google's existing process.

I expect Google is currently executing a wave of newly-unblocked refactoring projects.

If anyone works/worked at google, feel free to correct me on this.


Do they have tooling for generating scaffolding for various things (like unit/integration tests)?

If we’re guessing what code is easiest to write and makes up the largest proportion of a codebase, my first guess would be test suites: lots of lines of repetitive patterns that AI is decent at dealing with.


Most programming is trivial. Lots of non-trivial programming tasks can be broken down into pure, trivial sections. Then, the non-trivial part becomes knowing how the entire system fits together.

I've been using LLMs for about a month now. It's a nice productivity gain. You do have to read generated code and understand it. Another useful strategy is pasting a buggy function and asking for revisions.

I think most programmers who claim that LLMs aren't useful are reacting emotionally. They don't want LLMs to be useful because, in their eyes, that would lower the status of programming. This is a silly insecurity: ultimately programmers are useful because they can think formally better than most people. For the foreseeable future, there's going to be massive demand for that, and people who can do it will be high status.


>I think most programmers who claim that LLMs aren't useful are reacting emotionally.

I don't think that's true. Most programmers I speak to have been keen to try it out and reap some benefits.

The almost universal experience has been that it works for trivial problems, starts injecting mistakes for harder problems and goes completely off the rails for anything really difficult.


> I don't think that's true. Most programmers I speak to have been keen to try it out and reap some benefits.

I’ve been seeing the complete opposite. So it’s out there.


> Most programming is trivial

That's a bold statement, and incorrect, in my opinion.

At a junior level software development can be about churning out trivial code in a previously defined box. I don't think it's fair to call that 'most programming'.


Probably overloading of the term "programming" is the issue here. Most "software engineering" is non-programming work. Most programming is not actually typing code.

Most of the time, when I am typing code, the code I am producing is trivial, however.


Think of all the menial stuff you must perform regardless of experience level. E.g. you change the return type of a function and now you have to unpack the results slightly differently. Traditional automated tools fail at this. But if you show some examples to Cursor, it quickly catches on to the pattern and starts autocompleting semi-automatically (semi because you still have to put the cursor in the right place, but then you can tab, tab, tab…).
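
A made-up Python example of the kind of mechanical edit meant here:

    USERS = {42: {"name": "Ada", "email": "ada@example.com"}}

    # Before, the function returned just a name:
    #     def lookup_user(user_id): return USERS[user_id]["name"]
    #     name = lookup_user(42)

    # After the return type becomes a tuple, every call site needs the same
    # small unpacking change -- show the tool one or two and let it repeat it.
    def lookup_user(user_id):
        user = USERS[user_id]
        return user["name"], user["email"]

    name, email = lookup_user(42)   # was: name = lookup_user(42)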


Don't misunderstand. I am not making an assertion that GenAI tools for development are useless.

I am just pointing out that the thread parent started his logical climb at a step one that is incorrect: 'Most programming is trivial'.

Given that they got it wrong on step one, how good do you think step ten is?


From my perspective, writing out the requirements for an AI to produce the code I want is just as easy as writing it myself. There are some types of boilerplate code that I can see being useful to produce with an LLM, but I don't write them often enough to warrant actually setting up the workflow.

Even with the debugging example, if I just read what I wrote I'll find the bug because I understand the language. For more complex bugs, I'd have to feed the LLM a large fraction of my codebase and at that point we're exceeding the level of understanding these things can have.

I would be pretty happy to see an AI that can do effective code reviews, but until that point I probably won't bother.


It's reasonable to say that LLMs are not completely useless. There is also a very valid case to make that LLMs are not good at generating production ready code. I have found asking LLMs to make me Nix flakes to be a very nice way to make use of Nix without learning the Nix language.

As an example of not being production ready: I recently tried to use ChatGPT-4 to provide me with a script to manage my gmail labels. The APIs for these are all online, I didn't want to read them. ChatGPT-4 gave me a workable PoC that was extremely slow because it was using inefficient APIs. It then lied to me about better APIs existing and I realized that when reading the docs. The "vibes" outcome of this is that it can produce working slop code. For the curious I discuss this in more specific detail at: https://er4hn.info/blog/2024.10.26-gmail-labels/#using-ai-to...


I find a recurring theme in these kinds of comments where people seem to blame their laziness on the tool. The problem is not that the tools are imperfect, it’s that you apparently use them in situations where you expect perfection.

Does a carpenter blame their hammer when it fails to drive in a screw?


I'd argue that a closer analogy is that I bought a laser-based measuring device. I point it at a distant point and it tells me the distance from the tip of the device to that point. Many people are excited that this tool will replace rulers and measuring tapes because of the ease of use.

However, this laser measuring tool is only accurate within a range. There are a lot of factors that affect its accuracy: time of day, how you hold it, the material you point it at, etc. Sometimes these accuracy errors are minimal, sometimes they are pretty big. You end up getting a lot of measurements that seem "close enough", but you still need to ask if each one is correct. "Measure Twice, Cut Once" begins to require one measurement with the laser tool and one with the conventional tool when accuracy matters.

One could have a convoluted analogy where the carpenter has an electric hammer that for some reason has a rounded head that does cause some number of nails to not go in cleanly, but I like my analogy better :)


>Does a carpenter blame their hammer when it fails to drive in a screw?

That's the exact problem. I have plenty of screwdrivers, but there's so much pressure from people not in carpentry telling me to use this shiny new Swiss Army knife contraption. Will it work? Probably, if I'm just screwing in a few screws. Would I readily abandon my precision-built, magnetic-tip, etc. screwdriver set for it? Definitely not.

I'm sure it's great for non-carpenters to have so many tools in so small a space. But I developed skills and tools already. My job isn't just to screw in a few screws a day and call it quits. People wanting to replace me, at a quarter of the cost, with this Swiss Army carpenter will quickly see a quality difference and realize why it's not a solution to everything.

Or in the software sense, maybe they are fine with unlevel shelves and hanging nails in carpet. It's certainly not work I'd find acceptable.


> I think most programmers who claim that LLMs aren't useful are reacting emotionally. They don't want LLMs to be useful because, in their eyes, that would lower the status of programming.

I think revealing the domain each programmer works in and asking in those domains would reveal obvious trends. I imagine if you work in Web dev you'll get workable enough AI-generated code, but something like High Performance Computing would get slop worse than copying and pasting the first result on Stackoverflow.

A model is only as good as its learning set, and not all types of code are readily indexable.


> Lots of non-trivial programming tasks can be broken down into pure, trivial sections. Then, the non-trivial part becomes knowing how the entire system fits together.

I think that’s exactly right. I used to have to create the puzzle pieces and then fit them together. Now, a lot of the time something else makes the piece and I’m just doing the fitting together part. Whether there will come a day when we just need to describe the completed puzzle remains to be seen.


Trivial is fine, but as you compound all the triviality the system starts to have a difficult time putting it together. I don't expect it to nail it, but then you have to unwind everything and figure out the issues, so it isn't all gravy - a fair bit of debugging.


It’s always harder to build a mental model of code written by someone else. No matter what, if you trust an LLM on small things, in the long run you’ll trust it for bigger things. And the more code the LLM writes, the harder it is to build this mental construct. In the end it’ll be "it worked in 90% of cases so we trust it". And who will debug 300 million lines of code written by a machine, that no one has read, on the basis of trust?


They are useful, but so far, I haven't seen LLMs being obviously more useful than stackoverflow. It might generate code closer to what I need than what I find already coded, but it also produces buggier code. Sometimes it will show me a function I wasn't aware of or approach I wouldn't have considered, but I have to balance that with all the other attempts that didn't produce something useful.


Yes. Productivity tools make programmer time more valuable, not less. This is basic economics. You’re now able to generate more value per hour than before.

(Or if you’re being paid to waste time, maybe consider coding in assembly?)

So don’t be afraid. Learn to use the tools. They’re not magic, so stop expecting that. It’s like anything else, good at some things and not others.


A good farmer isn’t likely to complain about getting a new tractor. But it might put a few horses out of work.


I would add that a lot of the time when I'm programming, I'm an expert on the problem domain but not the solution domain — that is, I know exactly what the pseudocode to solve my problem should look like; but I'm not necessarily fluent in the particular language and libraries/APIs I happen to have to use, in the particular codebase I'm working on, to operationalize that pseudocode.

LLMs are great at translating already-rigorously-thought-out pseudocode requirements, into a specific (non-esoteric) programming language, with calls to (popular) libraries/APIs of that language. They might make little mistakes — but so can human developers. If you're good at catching little mistakes, then this can still be faster!

For a concrete example of what I mean:

I hardly ever code in JavaScript; I'm mostly a backend developer. But sometimes I want to quickly fix a problem with our frontend that's preventing end-to-end testing; or I want to add a proof-of-concept frontend half to a new backend feature, to demonstrate to the frontend devs by example the way the frontend should be using the new API endpoint.

Now, I can sit down with a JS syntax + browser-DOM API cheat-sheet, and probably, eventually write correct code that doesn't accidentally e.g. incorrectly reject zero or empty strings because they're "false-y", or incorrectly interpolate the literal string "null" into a template string, or incorrectly try to call Element.setAttribute with a boolean true instead of an empty string (or any of JS's other thousand warts.) And I can do that because I have written some JS, and have been bitten by those things, just enough times now to recognize those JS code smells when I see them when reviewing code.

But just because I can recognize bad JS code, doesn't mean that I can instantly conjure to mind whole blocks of JS code that do everything right and avoid all those pitfalls. I know "the right way" exists, and I've probably even used it before, and I would know it if I saw it... but it's not "on the tip of my tongue" like it would be for languages I'm more familiar with. I'd probably need to look it up, or check-and-test in a REPL, or look at some other code in the codebase to verify how it's done.

With an LLM, though, I can just tell it the pseudocode (or equivalent code in a language I know better), get an initial attempt at the JS version of it out, immediately see whether it passes the "sniff test"; and if it doesn't, iterate just by pointing out my concerns in plain English — which will either result in code updated to solve the problem, or an explanation of why my concern isn't relevant. (Which, in the latter case, is a learning opportunity — but one to follow up in non-LLM sources.)

The product of this iteration process is basically the same JS code I would have written myself — the same code I wanted to write myself, but didn't remember exactly "how it went." But I didn't have to spend any time dredging my memory for "how it went." The LLM handled that part.

I would liken this to the difference between asking someone who knows anatomy but only ever does sculpture, to draw (rather than sculpt) someone's face; vs sitting the sculptor in front of a professional illustrator (who also knows anatomy), and having the sculptor describe the person's face to the illustrator in anatomical terms, with the sketch being iteratively improved through conversation and observation. The illustrator won't perfectly understand the requirements of the sculptor immediately — but the illustrator is still a lot more fluent in the medium than the sculptor is; and both parties have all the required knowledge of the domain (anatomy) to communicate efficiently about the sculptor's vision. So it still goes faster!


> people who can do it will be high status

They don't have high status even today, imagine in a world where they will be seen as just reviewers for AI code...


> They don't have high status even today

Try putting on a dating website that you work at Google vs you work in agriculture and tell us which yields more dates.


Does it matter? I imagine the tanned shirtless farmer would get more hits than the pasty million-dollar-salary Googler anyway. (no offense to Googlers).

With so many hits, it's about hitting all the checkmarks instead of minmaxing on one check.


You can't just arbitrarily change (confounding) variables like that for a proper experiment. All other factors (including physique) must remain the same while you change one thing only: occupation.


"confounding" implies occupancy doesn't influce other factors of your life. I'm sure everyone wants the supermodel millionaire genius who's perfectly in touch with the feelings of their parter. If that was the norm then sure, farmers would be in trouble.

My comment was more a critique on online dating culture and the values it weighs compared to in person meetups.


I think it’s possible to create 2 dating profiles with the same pictures and change occupation only. It doesn’t have to be real to measure the impact of occupation.


> Or do they have 25% trivial code?

We all have probably 25% or more trivial code. AI is great for that. I have X (table structure, model, data, etc) and I want to make Y with it. A lot of code is pretty much mindless shuffling data around.

The other thing it's good for is anything pretty standard. If I'm using a new technology and I just want to get started with whatever the best practice is, it's going to do that.

If I ever have to do PowerShell (I hate PowerShell), I can get AI to generate pretty much whatever I want and then I'm smart enough to fix any issues. But I really don't like starting from nothing in a tech I hate.


I’ve already had one job interview where the applicant seemed broadly knowledgeable about everything we asked them during lead-in questions before actual debugging. Then when they had to actually dig deeper or demonstrate understanding while solving some problem, they fell short.

I’m pretty sure they weren’t the first and there’ve been others we didn’t know about. So now I don’t ask lead-in questions anymore. Surprisingly, it doesn’t seem to make much of a difference and I don’t need to get burned again.


Yes but then it would be more logical to say "AI makes our devs 25% more efficient". This is not what he said, but imo you are obviously right.


Not necessarily. If 25% of the code is written by AI but that code isn't very interesting or difficult, it might not be making the devs 25% more efficient. It could even possibly be more but, either way, these are different metrics.


The benefit doesn't translate 1:1. The generated code has to be read and verified and might require small adaptations. (Partially that can be done by AI as well.)

But for me it massively improved all the boilerplate generic work. A lot of those things which are just annoying work, but not interesting.

Then I can focus on the bigger things, on the important parts.


> do they have 25% trivial code?

From what I've seen on Google Cloud, both as a user and from leaked source code, 25% of their code is probably just packing and unpacking of protobufs.


I'd bet at least 25% of code attributed to me in gitfarm at Amazon was generated by octane and/or bones.

God I miss that; thanks to the other person on HN for introducing me to projen. Yeoman wasn't cutting it.

These days I write a surprising amount of shell script and awk with LLMs. I review and adapt it, of course, but for short snippets of low context scripting it's been a huge time saver. I'm talking like 3-4, up to 20 lines of POSIX shell.

Idk. Some day I'll actually learn AWK, and while I've gotten decent with POSIX shell (and bash), it's definitely been more monkey see monkey do than me going over all the libraries and reference docs like I did for python and the cpp FAQ.


> isn't this announcement a terrible indictment

Of obviously flawed corporate structures. This CEO has no particular programming expertise, and most of his company's profits do not seem to flow from this activity. I strongly doubt he has a grip on the actual facts here and is uncritically repeating what was told to him in a meeting.

He should, given his position, have been the very _first_ person to ask the questions you've posed here.


An example:

I'm looking for a new job, so I've been grinding leetcode (oof). I'm an experienced engineer and have worked at multiple FAANGs, so I'm pretty good at leetcode.

Today I solved a leetcode problem 95% of the way to completion, but there was a subtle bug (maybe 10% of the test cases failing). I decided to see if Claude could help debug the code.

I put the problem and the code into Claude and asked it to debug. Over the course of the conversation, Claude managed to provide 5 or 6 totally plausible but also completely wrong "fixes". Luckily, I am experienced enough at leetcode, and leetcode problems are simple enough, that I could easily tell that Claude was mistaken. Note that I am also very experienced with prompt engineering, as I ran a startup that used prompt engineering very heavily. Maybe it's a skill issue (my company did fail, hence why I need a job), but somehow I doubt it.

Eventually, I found the bug on my own, without Claude's help. But leetcode problems are super simple, with known answers, and probably mostly in the training set! I can't imagine writing a big system and using an LLM heavily.

Similarly, the other day I was trying to learn about e-graphs (the data structure). I went to Claude for help. I noticed that the more I used Claude, the more confused I became. I found other sources, and as it turns out, Claude was subtly wrong about e-graphs, an uncommon but reasonably well-researched data structure! Once again, it's lucky I was able to recognize that something was up. If the problem wasn't limited in scope, I'd have been totally lost!

I use LLMs to help me code. I'm pro new technology. But when I see people bragging on Twitter about their fully automated coding solutions, or coding complex systems, or using LLMs for medical records or law or military or other highly critical domains, I seriously question their wisdom and experience.


At what point are people going to stop shitting on the code that Copilot or other LLM tools generate?

> how trivial the problems they solve are

A single line of code IS trivial. Simple code is good code. If I write the first 3 lines of a complex method and I let Copilot complete the 4th, that's 25% of my code written by an LLM.

These tools have exploded in popularity for good reason. If they were no good, people wouldn't be using them.

I can only assume people making such comments don't actually code on a daily basis and use these tools daily. Either that or you haven't figured out the knack of how to make it work properly for you.


> These tools have exploded in popularity for good reason. If they were no good, people wouldn't be using them.

You're saying anything that's ever been popular is popular for a good reason? You can't think of counter examples that disprove this?

You're saying anything that people decide to do is good, or else people wouldn't do it? People never act irrationally? People never blindly act on trends? People never sacrifice long-term results for short-term gain? You can't come up with any counter examples?


I don't really care to get into a philosophical debate about what's good or what people should use.

I use these tools daily and they help me immensely. If you prefer Googling for docs, browsing stack overflow, or even flicking through textbooks to find the answers/materials you need - that's great! Do what works for you. I value my time slightly more than that and prefer not to remain stuck in the past.

Perhaps you hold all the information you need in your head like an oracle and never need to learn new concepts or ever forget syntax? Wonderful. The rest of us aren't so naturally talented, so have found these new tools super helpful.


remembers Bitcoin et al


I haven't seen anybody use them and be more productive.

With C++, my experience is that the results are completely worthless. It saves you from writing a few keywords but nothing that really helps in a big way.

Yes Copilot CAN work, for example writing some JS or filter functions, but in my job these trivial snippets are rather uncommon.

I'd genuinely love to see some resources that show its usefulness that aren't just PR bs.


> I havent seen anybody use them and be more productive.

What does that even mean? What are you expecting to see?

I've seen people who can't code ship entire new applications which actually work, in a few days or so. That to me seems more productive?

I use these tools daily in a FAANG level SWE role and they help me debug issues quickly - all the time, especially with tech I'm new to and have no experience with. I really don't understand the hate - it's like skipping stack overflow and giving you the ideal answer a lot faster.

Nobody likes to shout that they're using these tools but most people are.


I'll just answer here, but this isn't about this post in particular. It's about all of them. I've been struggling with a team of junior devs for the past months. How would I describe the experience? It's easy: just take any of these posts, replace "AI" with "junior dev", done.

Except of course AI at least can do spelling. (Or at least I haven't encountered a problem in that regard.)

I'm highly skeptical regarding LLM-assisted development. But I must admit: it works. If paired with an experienced senior developer. IMHO it must not be used otherwise.


Isn't the whole point of hiring a junior dev that they will learn and become senior devs eventually?


Your mindset is sadly a decade out of touch. Companies have long since shifted to a churn mentality. They not only slashed retention perks, they actively expect people to move around every few years. So they don't bother stopping them or counter-offering unless they are a truly exceptional person.


> replace "AI" with "junior dev", done.

Damn, that’s a good way of putting it. But I’ll go one further:

replace "AI" with "junior dev who doesn't like reading documentation or googling how things work, so instead confidently types away while guessing the syntax and API so it kind of looks right"


I've been saying it's like an intern who has an incredible breadth of knowledge but very little depth, is excessively overconfident in their own abilities given the error rates they commit, and is anxious to the point they'll straight up lie to you rather than admit a mistake.

Currently, they don't learn skills as fast as a motivated intern. A stellar intern can go from no idea to "makes relevant contributions to our product with significant independence and low error rate" (hi Matt if you ever see this) in 3 months. LLMs, to my understanding, take significantly more attention from super smart people working long hours and an army of mechanical Turks, but won't be able to independently implement a feature and will still have a higher error rate in the same 3 months.

It's still super impressive what LLMs can do, but that same intern is going to keep growing at that faster rate in skills and competency as they go from jr->mid->sr. Sure the intern won't have as large of a competency pool, and takes longer to respond to any given question, but the scope of what they can implement is so much greater.


> To my experience, AIs can generate perfectly good code relatively easy things, the kind you might as well copy&paste from stackoverflow, and they'll very confidently generate subtly wrong code for anything that's non-trivial for an experienced programmer to write. How do people deal with this?

Well, just in the last 24 hours, ChatGPT gave me solutions to some relatively complex problems that turned out to be significantly wrong.

Did that mean it was a complete waste of my time? I’m not sure. Its broken code gave me a starting point for tinkering and exploring and trying to understand why it wasn’t working (even if superficially it looked like it should). I’m not convinced I lost anything by trying its suggestions. And I learned some things in the process (e.g. asyncio doesn’t play well together with Flask-Sock)


> To my experience, AIs can generate perfectly good code relatively easy things, the kind you might as well copy&paste from stackoverflow,

This, imho, is what is happening. In the olden days, when StackOverflow + Google used to magically find the exact problem from the exact domain you needed every time - even then you'd often need to sift through the answers (top voted one was increasingly not what you needed) to find what you needed, then modify it further to precisely fit whatever you were doing. This worked fine for me for a long time until search rendered itself worthless and the overall answer quality of StackOverflow has gone down (imo). So, we are here, essentially doing the exact same thing in a much more expensive way, as you said.

Regarding future employment opportunities - this rot is already happening and hires are coming from it, at least from what I'm seeing in my own domain.


I'd be terribly scared to use it in a language that isn't statically typed with many, many compile time error checks.

Unless you're the type of programmer that is writing sabots all day (connecting round pegs into square holes between two data sources) you've got to be very critical of what these things are spitting out.


I can't help but think that Go might be one of the better languages for AI to target - statically typed, verbose with a lot of repeated scaffolding, yet generally not that easy to shoot yourself in the foot. Which might explain why this is a thing at Google specifically.


It is way more scary to use it for C or C++ than Python imo.


If you use it as advanced IntelliSense/auto-complete, it's not any worse than with typed languages.

If you just let it generate and run the code... yeah, probably, since you won't catch the issues at compile time.


I decided to go into programming instead of becoming an Engineer because most Engineering jobs seemed systematic and boring. (Software Engineers weren't really a thing at the time.)

For most of my career, Software Engineering was a misnomer. The field was too young, and the tools used changed too quickly, for an appreciable amount of the work to be systematic and boring enough to consider it an Engineering discipline.

I think we're now at the point where Software Engineering is... actually Engineering. Particularly in the case of large established companies that take software seriously, like Google (as opposed to e.g. a bank).

Call it "trivial" and "boring" all you want, but at some point a road is just a road, and a train track is just a train track, and if it's not "trivial and boring" then you've probably fucked up pretty badly.


Since when is engineering boring? Strange ideas and claims you're making.

I'm an engineer who has been writing code for 20 years, and it's far from trivial. Maybe web dev for a simple webshop is. Elsewhere, software often has special requirements. Be they technical or domain-specific, both make the process complex and not simple, IMHO.


Boring is the opposite of exciting/dynamic.

Not all engineering is boring. Also, boring is not bad.

A lot of my career has been spent working to make software boring. To the extent that I've helped contribute to the status quo, where we can build certain types of software in a relatively secure fashion and on relatively predictable timelines, I am proud to have made the world more boring!

(Also, complexity can be extraordinarily boring. Some of the most complex things are also the most boring. Nothing more boring than a set of business rules that has an irreducible complexity coming in at 5,211 lines of if-else blocks wrapped in two while loops! Give me a simple set of partial differential equations any day -- much more exciting to work with those! If you're the type of person who enjoys reading tax code, then we just have different definitions of boring; and if you're the type of person doesn't think tax code is complex, then I'm just a dummy compared to you :))

But, e.g., in the early aughts, doing structural engineering work for residential new-build projects was certainly less engaging and exciting than building websites.

Most engineering work aims for repeatable and predictable outcomes. That's a good thing, and it's not easy to achieve! But if software has reached the point where the process of building certain types of software is "repeatable and predictable", and if Google needs a lot of that type of software, then if the main criticism of AI code assistants is "it's only good for repeatable and predictable", well, then the criticism isn't exactly the indictment that skeptics think it is.

There is nothing wrong with boring in the sense I'm using it. Boring can be tremendously intellectually demanding. Also, predictable and repeatable processes are incredibly important if you want quality work at scale. Engineering is a good thing. Maturing as a field is a good thing.

But if we're maturing from "wild west everything is a greenfield project" to "70% of things are pretty systematic and repeatable" then that says something about the criticism of AI coding assistants as being only good for the systematic and repeatable stuff, right?

Also: the AI coding assistant paradigm is coming for structural/mechanical/civil engineering next, and in a big way!


I was totally with you until "70% of things are pretty systematic and repeatable". This has not been my experience, and I think you acknowledged it yourself when you said "Google (as opposed to e.g. a bank)" - there are many more banks in the world than Googles. The main challenge will be transitioning all those "banks" into "Googles" and further still. They have 10y+ codebases written in 5 months by a single genius engineer (who later found his luck elsewhere), then hammered by multiple years of changing maintainers. That's the real "70% of things" :D


No, I think we agree! Google SWE roles will be automated faster than SWE roles in the financial sector :)


I have a whole "chop wood, carry water" speech born from leading corporate software teams. A lot of work at a company of sufficient size boils down to keeping up with software entropy while also chipping away at some initiative that rolls up to an OKR. It can be such a demotivating experience for the type of smart, passionate people that FAANGs like to hire.

There's even a buzzword for it: KTLO (keep the lights on). You don't want to be spending 100% of your time on KTLO work, but it's unrealistic to expect to do none of it. Most software engineers would gladly outsource this type of scutwork.


> KTLO (keep the lights on)

Some places also call this "RTB" for "run the business" type work. Nothing but respect for the engineers who enjoy that kind of approach, I work with several!


No, AI is generating a quarter of all characters. It's an autocomplete engine. You press tab, it finishes the line. Doesn't do any heavy lifting at all.

Source: I work there, see my previous comment.


> Or do they have 25% trivial code?

Surely yes.

I (not at Google) rarely use the LLM for anything more than two lines at a time, but it writes/autocompletes 25% of my code no problem.

I believe Google have character-level telemetry for measuring things like this, so they can easily count it in a way that can be called "writing 25% of the code".

Having plenty of "trivial code" isn't an indictment of the organisation. Every codebase has parts that are straightforward.
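For illustration only, a toy sketch of how a character-level metric like that might be computed -- the event shape is a pure guess on my part, not anything I actually know about Google's telemetry:

    // Toy model: every edit event either came from the keyboard or from an
    // accepted completion; report the fraction of characters the AI produced.
    type EditEvent =
      | { kind: "typed"; text: string }
      | { kind: "completionAccepted"; text: string };

    function aiCharFraction(events: EditEvent[]): number {
      let aiChars = 0;
      let totalChars = 0;
      for (const e of events) {
        totalChars += e.text.length;
        if (e.kind === "completionAccepted") aiChars += e.text.length;
      }
      return totalChars === 0 ? 0 : aiChars / totalChars;
    }

    // A result of 0.25 here is what would get reported as
    // "25% of new code is written by AI" -- no heavy lifting implied.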


I wouldn't call it an indictment necessarily, because so much is dependent upon circumstances. They can't all be "deep problems" in the real world. Projects tend to have two components: "deep" work, which is difficult, requires high skill, and cannot be made up for by throwing masses of inexperienced people at it; and "shallow" work, where being skilled doesn't really help, or doesn't help much compared to throwing more bodies at the problem. To use an example, it is like advanced accounting vs just counting up sales receipts.

Even if their engineers were inexperienced, that wouldn't be an indictment in itself so long as they had a sufficient amount of shallow work. Using all experienced engineers to do shallow work is just inefficient, like having brain surgeons removing bunions. Automation is basically a way to transform deep work into a producer of "free" shallow work.

That said, the really impressive thing with code isn't its creation but the ability to losslessly delete code while maintaining or improving functionality.


25% trivial code sounds like a reasonable guess.


This seems reasonable - but I'm interpreting this as most junior-level coding needs will end and be replaced with AI.


And the non-junior developers will then just magically appear from the aether! With 10 years' experience in a four-year-old stack.


> and they'll very confidently generate subtly wrong code for anything that's non-trivial for an experienced programmer to write

Thankfully I don't find it subtle but plain wrong for anything but trivial stuff. I use it (and pay for an AI subscription) for things where a false positive won't ruin the day, like parameter validation.
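For example, the kind of low-stakes validation boilerplate I mean (a made-up sketch; the names are hypothetical):

    // The sort of thing I'd happily let the model draft and then skim:
    // simple checks where a false positive just means an extra error
    // message, not a ruined day.
    interface CreateUserParams {
      email: string;
      age: number;
    }

    function validateCreateUser(p: CreateUserParams): string[] {
      const errors: string[] = [];
      if (!p.email.includes("@")) errors.push("email looks invalid");
      if (!Number.isInteger(p.age) || p.age < 0 || p.age > 150) {
        errors.push("age must be an integer between 0 and 150");
      }
      return errors; // empty array means the params look fine
    }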

But for anything advanced, it's pretty hopeless.

I've talked with lawyers: same thing. With doctors: same thing.

Which ain't no surprise, seeing how these things work.

> Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?

Probably lots of highly repetitive boilerplate stuff everywhere. Which in itself is quite horrifying if you think about it.


> Does Google now have 25% subtly wrong code?

How do you quantify "new code" - is it by lines of code or number of PRs/changesets generated? I can easily see it being the latter - if an AI workflow suggests 1 naming-change/cleanup commit to your PR made of 3 other human-authored commits, has it authored 25% of code? Arguably, yes - but it's trivial code that ought to be reviewed by humans. Dependabot is responsible for a good chunk of PRs already.

Having a monorepo brings plenty of opportunities for automation when refactoring - whether it's AI, AST manipulation or even good old grep. The trick is not to merge the code directly, but to have humans in the loop to approve, or take over and correct the code first.


Google's internal codebase is nicer and more structured than the average open source code base.

Their internal AI tools are presumably trained on their code, and it wouldn't surprise me if the AI is capable of much more internally than public coding AIs are.


> Like, isn't this announcement a terrible indictment of how inexperienced their engineers are..

Well, Rob Pike said the same thing about experience, and that seemed to piss a lot of people off endlessly.

However, I don't think of it as an indictment. It just seems very reasonable to me. In fact, 25% seems to be on the lower end. Amazon seems to have thousands of software engineers who are doing API-calling-API-calling-API kind of crap. Now their annual income might be more than my lifetime earnings. But to think that all these highly paid engineers are doing highly complex work that needs high skill seems just a myth that is useful to boost the egos of engineers and their employers alike.


> Or do they have 25% trivial code?

If anything that's probably an underestimate. Not to downplay the complexity in much of what Google does but I'm sure they also do an absolute ton of tedious, boring CRUD operations that an AI could write.


In my experience, that was always the case with GPT-3.5, most times the case with GPT-4, and sometimes the case with the latest Sonnet. It's getting better FAST, and the kind of code they can handle is increasing fast too.


A better analogy is a self driving car where you need to keep your hands on the wheel in case something goes wrong.

For the most part, it drives itself.

Yes, the majority of my code is trivial. But I've also had ai iterate on some very non trivial work including writing the test suite.

It's basically autocomplete on steroids that predicts your next change in the file, not just the next change on the line.

The copy-paste-from-Stack-Overflow trope is a bit weird; I haven't done that in ten years, and I don't think the code it produces is that low quality either. Copy-paste from an open source repo on GitHub, maybe?


> Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?

Or maybe there's a KPI around lines of code or commits.


> Does Google now have 25% subtly wrong code?

maybe the ai generates 100% of the company's new code, and then by the time the programmers have fixed it, only 25% is left of the AI's ship of Theseus


> Does Google now have 25% subtly wrong code?

I think you underestimate the amount of boiler-plate code that a typical job at Google requires. I found it soul-crushingly boring (though their pay is insane).


By definition, "trivial" code should make up a significant portion of any code base, so perhaps the 25% is precisely the bit that is trivial and easily automated.


I don't think the word "definition" means what you think it means!


If their sales and stock depend on saying that the new shiny thing is changing the world, then they have to say so, and say how it is changing their world.

It is not Netflix or Airbnb or Stripe etc. making this claim; Google managers have a vested interest in this.

If this metric were meaningful, one of two things should have happened: Google should have fired 25% of its developers or built 25% more product.

Either of those would be visible in their financial reporting, and neither has happened.

Metrics like this depend on how you count, which is easily gamed and can be made to show any percentage between 0 and 99 you want. Off the top of my head:

- I could count all AI generated code used for training as new code

- consider compiler output to assembly as AI code by adding some meaningless AI step in it

- code generated from boilerplate, perhaps even boilerplate generated by an LLM now

- mix autocomplete with LLM prompts, and so on

The number only needs to be believable. 25 is believable now; it is not true, but you would believe it. >50 has psychological significance and makes for bad PR about machines replacing human jobs; less than 10 is bad for AI sales. 25 works; all the commenters in this thread are testament to that.


I can generate POJO classes or their accessor methods in Eclipse. I can let Maven build entire packages from, say, XSDs (I know I am talking old boring tech, just giving an example). I can copy&paste half the code (if not more) from stack overflow.

Now replace all this and much more with 'AI'. If they said AI helped them increase, say, ad effectiveness by 3-5%, I'd start paying attention.


> Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?

There is a third possibility as well: having spent a huge chunk of change on these techniques, why not overhype it (not outright lie about it) and hope to somewhat recoup the cost from the unsuspecting masses?


Depends if they include test code in this metric. I have found AI most valuable in generating test code. I usually want to keep tests as simple as possible, so I prefer some repetition over abstraction to make sure there are no issues with the test logic itself; AI makes this somewhat verbose process very easy and efficient.
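For example, the repetition-over-abstraction style I mean (a made-up sketch in TypeScript, with a hypothetical slugify under test):

    // Each case is spelled out rather than hidden behind a clever
    // parameterized helper whose own logic I'd then have to verify.
    import { strict as assert } from "node:assert";
    import { test } from "node:test";

    // Hypothetical function under test.
    function slugify(s: string): string {
      return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
    }

    test("slugify keeps lowercase words", () => {
      assert.equal(slugify("hello world"), "hello-world");
    });

    test("slugify strips punctuation", () => {
      assert.equal(slugify("hello, world!"), "hello-world");
    });

    test("slugify collapses repeated whitespace", () => {
      assert.equal(slugify("hello   world"), "hello-world");
    });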


I guess the obvious response would be - yes, they have _at least_ 25% trivial code (as any other enterprise), and yes, they should have lots of engineers 'babysitting' (aka generating training data). So in another year or two there will be no manpower at all needed for the trivial tasks.


trivial code could very easily include the vast majority of most apps we're building these days. Most of it's just glue, and AI can probably stitch together a bunch of API calls and some UI as well as a human. It could also be a lot of non-product code, tooling, one-time things, etc.


You're quick to jump to the assertion that AI only generates SO-style utility code to do X, but it can also be used to generate boring mapping code (e.g. to/from SQL datasets). I heard one ex-Google dev say that most of his job was fiddling with Protobuf definitions and payloads.
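For example, the kind of mapping boilerplate I mean (a hypothetical sketch, not tied to any particular ORM or schema):

    // Mechanical, tedious, and easy to review: mapping a raw SQL row to a
    // typed object and back. The table and field names are made up.
    interface UserRow {
      id: number;
      email: string;
      created_at: string; // ISO timestamp as stored in the DB
    }

    interface User {
      id: number;
      email: string;
      createdAt: Date;
    }

    function fromRow(row: UserRow): User {
      return { id: row.id, email: row.email, createdAt: new Date(row.created_at) };
    }

    function toRow(user: User): UserRow {
      return { id: user.id, email: user.email, created_at: user.createdAt.toISOString() };
    }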


It's been a while since I was really fully in the trenches, but not that long.

How people deal with this is they start by writing the test case.

Once they have that, debugging that 25% comes relatively easily, and after that it's basically packaging up the PR.


I suspect that a lot of the hard, google-scale stuff has already been done and packaged as an internal service or library - and just gets re-used. So the AIs are probably churning out new settings dialogs and the like.


How would you react to a tech firm that in 2018, proudly announced that 25% of their code was generated by IntelliJ/Resharper/Visual Studio's codegen and autocomplete and refactoring tools?


They probably have AI that scans existing human-written code and auto-generates patches and fixes to improve performance or security. The 25% is just a top-level stat with no real meaning without context.


Maybe the trick is to hide vetted correct code, of whatever origin, behind function calls for documented functions, thereby iteratively simplifying the work a later-trained LLM would need to do?


I've suspected for a while now that the people who find value in AI-generated code don't actually have hard problems to solve. I wonder how else they might justify their salary.


This subtly wrong thing happens maybe 10% of the time in my experience and asking it to generate unit tests or writing your own ahead of time almost completely eliminates it.


To your point, I don't buy the truth of the statement. I work in big tech and am convinced that 25% of the code being written is not coming from AI.


Yes 25% of code is trivial; certainly for companies like Google that have always been a bit NIH.


Does the figure include unit tests?


Or perhaps that even for excellent engineers and complicated problems a quarter of the code one writes is stupid almost copy-pasteable boilerplate which is now an excellent target for the magic lArge text Interpolator


You're doing circular reasoning based on your initial concern actually being a problem in practice. In my experience it's not, which makes all your other speculations inherently incorrect.


Or alternatively you don't know how to use AI to help you code and are in the 2020s equivalent of the 'Why do I need google when I have the yellow pages?' phase a lot of adults went through in the 2000s.

This is not a bad thing since you can improve, but constantly dismissing something that a lot of people are finding an amazing productivity boost should give you some pause.


It's like blockchain right now. I'm sure there is some killer feature that can justify its problem space.

But as of now the field is full of swamps. Of grifters, of people with a solution looking for a problem. Of outright scams of questionable legality being challenged as we speak.

I'll wait until the swamp works itself out before evaluating an LLM workflow.


Blockchain was always a solution looking for a problem.

LLMs are being used right now by a lot of people, myself included, to do tasks which we would have never bothered with before.

Again, if you don't know how to use them you can learn.


And the same was said about the last fad, when blockchain was all investors wanted to hear about ("Big Data" I suppose). It's all a pattern.

It's a legal nightmare in my domain as of now, so I'll make sure the Sam Breaker-Friends are weeded out. If it's really all the hype it won't be going anywhere in 5 years.


It's been 5 years since GPT2. I'm really struggling to understand the amount of negativity towards the biggest breakthrough in computing since the WWW.


If you're unaware of the general mood towards big tech in the 2020s, the downward trend of the economy, the extreme speculation across the tech sector over AI (which, again, is not new), and the dozens of ethical quandaries around how LLMs obtain their data sets, then yes, I can see why you're struggling to understand. There's so much literature on each point that I will only implore you to research these things on your own time if you care to.

In a purely technical vacuum though: it is truly amazing tech. I will give it that. Although it both excites and alarms me that the power demands predicted for leveraging this at scale apparently have tech companies considering investments in nuclear power.


Yes it's wonderful that AI will solve global warming as a side effect.


This is kind of why I'm skeptical of AI. When supposed tech experts are wearing rose tinted lens and missing the red flags, it's either because they want to wear them or because their livelihood depends on wearing them.

I won't blame people for that latter, I'd love a good quick way out of traditional work as well (gives me more time to hack on stuff without money troubles). But it's not a good model for curiosity and scrutiny. Again, I'll wait it out. Take care.


The rose-tinted glasses belong to everyone expecting batteries to become a major part of the grid so we don't have to shut it down when the sun isn't shining.

Investing trillions in carbon free energy for AI is the most benign form of bubble I can imagine. If the bubble pops we have enough base load for the next century and don't die from climate change. If it doesn't we have the expertise to keep building large nuclear power plants.



