
Why are we still building stuff in TS/npm? Given that LLMs can code in any language, I'd expect people to use "better" languages at this point.

So, I typically follow blogs from people I already knew (online) pre-2022. In that regard, I'm confident about the quality of those places.

I don't have social media accounts (does HN count as one?), so whatever happens on IG, YT, Twitter, or Facebook: I simply don't give a fuck.

I don't really follow more of the internet, tbh: I don't use Reddit or read the news... I don't even have an adblocker (which reveals that I get bothered very little by ads on the sites I frequently use).

I read a bunch of ebooks, though (but again, they're all pre-2022... there's so much to read out there).

Less is more!


$25K. Really? They make $65 million a day, so they pay you what they earn in about 33 seconds for a critical vulnerability. WTF

Well they lose $100M a day, so...

> On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.

Yeah, agree. I think it's the first time I'm asking myself: OK, so this new cool tech, what is it good for? Like, in terms of art, it's discarded (art is about humans). In terms of assets: sure, but people are getting tired of AI-generated images (and even if we cannot tell if an image is AI-generated, we can know if companies are using AI to generate images in general, so the appeal is decreasing). Ads? C'mon, that's depressing.

What else? In general, I think people are starting to realize that things generated without effort are not worth spending time with (e.g., no one is going to read your 30-page draft generated by AI; no one is going to review your 500-file PR generated by AI; no one is going to be impressed by the images you generate with AI; same goes for music and everything). I think we are gonna see a Renaissance of "human-generated" sooner rather than later. I see it already at work (colleagues writing in Slack "I swear the next message is not AI generated" and the like).


> I think it's the first time I'm asking myself: Ok, so this new cool tech, what is it good for?

I feel like this is something people in the industry should be thinking about a lot, all the time. Too many social ills today are downstream of the 2000s culture of mainstream absolute technoöptimism.

Vide Kranzberg's first law: "Technology is neither good nor bad; nor is it neutral."


Completely unrelated, but I am curious about your keyboard layout, since you mistyped ö instead of -: these two symbols are side by side in the Icelandic layout, and ö is where the - is in the English (US) layout. As such, this is a common type-o for people who regularly switch between the Icelandic and the English (US) layouts (source: I am that person). I am curious whether there are more layouts where that could be common.

This is also a stylistic choice that the New Yorker magazine uses for words with double vowels where you pronounce each one separately, like coöperate, reëlect, preëminent, and naïve. So possibly intentional.

Yes, this is exactly correct, and I will die on this hill. Additionally, I don't like the way a hyphenated "techno-optimism" looks and "technOOPtimism" is a bit too on-the-nose.

That makes sense[1] but it prompts the obvious question: does this style write it as typeö then?

1: Though personally I hate it, I just cannot not read those as completely different vowels (in particular ï → [i:] or the ee in need; ë → [je:] or the first e here; and ö → [ø] or the e in her)


No. Firstly because it is spelled “typo.” Secondly you typically use the diaeresis to tell the reader to not confuse it with a similarly spelled sound or diphthong. So it tells a reader that “reëlect” is not pronounced REEL-ect, “coöperate” is not COOP-uh-ray-t, and “naïve” is not NAY-v.

Because written English makes so much sense normally. God forbid someone has to figure out the ambiguous pronunciation of those particular words. To me, it seems like a silly thing to provide extra guidance on.

I suspect the diaeresis was intentional, in "New Yorker" style.

https://www.arrantpedantry.com/2020/03/24/umlauts-diaereses-...


I can’t design wallpapers/stickers/icons/…, but I can describe what I want to an image generation model verbally or with a source photo, and the new ones yield pretty good results.

For icons in particular, this opens up a completely new way of customizing my home screen and shortcuts.

Not necessary for the survival of society, maybe, but I enjoy this new capability.


So we get a fresh new cheap way to spread propaganda and lies and erode trust all across society while cementing power and control for a few at the top, and in return we get a few measly icons (as if there weren't literally thousands of them freely available already) and silly images for momentary amusement?

What a rotten exchange.


I wonder what will happen to the entire legal system. It used to be fairly difficult to create convincing photos and videos.

AI can probably fool most court judges now. Or the defense can refute legitimate evidence by saying “it’s AI / false”. How would that be refuted?


For better or worse, the only admissible evidence going forward will probably be either completely physical or originated in attestation-capable recording devices, i.e. something like a "forensics grade" camera with a signing key in trusted hardware issued by somebody deemed trustworthy.

Given the obvious personal safety upsell ("our phone/dashcam/... produces court-admissible evidence!"), I think we'll even see this in consumer devices before too long.
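For concreteness, the core of such an attestation scheme is just a signature over the image data made at capture time. A minimal sketch, not any real device's API: the Python cryptography package and the key handling here are stand-ins for what would actually live in a secure element.

    # Sketch only: the camera signs a hash of each frame with a key that, in a
    # real device, would be generated and kept inside trusted hardware.
    from hashlib import sha256
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    device_key = Ed25519PrivateKey.generate()  # hypothetical: lives in the secure element
    public_key = device_key.public_key()       # certified/published by the manufacturer

    def sign_frame(frame: bytes) -> bytes:
        # Sign the frame's hash at the moment of capture.
        return device_key.sign(sha256(frame).digest())

    def verify_frame(frame: bytes, signature: bytes) -> bool:
        # Later, a court (or anyone) can check the frame wasn't altered since capture.
        try:
            public_key.verify(signature, sha256(frame).digest())
            return True
        except InvalidSignature:
            return False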


By having people also testify to authenticity and coming down like the hand of God on fakers, the same way we make sure evidence is real now.

Yes, that is a major worry of mine, too. CCTV evidence is worth nil now (it could be generated in whole or in part), and even eye-witness testimony can no longer be trusted (sure, a witness may think they saw the alleged perpetrator, but perhaps they just saw an AI-generated video/projection of someone).

Trials have rules for evidence. You can't just pull some footage out of nowhere. Where did it come from? From what camera? What was the chain of custody on its footage? Etc.

If it means anything, I have a 1990 almanac from an old encyclopedia that warns about the exact same thing with digital photo manipulation. I don't think it really matters at this point.

MS13 was literally tattooed on his knuckles!

Multiple data sources, considering the trustworthiness of the source of the information, and accountability for lying.

You might generate an AI video of me committing a crime, but the CCTV on the street didn't show it happening and my phone's cell tower logs show I was at home. For the legal system, I don't think this is going to be the biggest problem. It's going to be social media that is hit hardest, where a fake video can go viral far faster than fact-checking can keep up.


AI can also be used to fight propaganda; for instance, BiasScanner makes you aware of potentially manipulative news: https://biasscanner.org

So that makes AI a "dual good", like a kitchen knife: you can cut your tomato or kill your neighbor with it; it's entirely up to the "user". Not all users are good, so we'll see an intense amplification of both good and bad.


It's more work to fight bullshit than it is to generate it, though. Saying "Use AI to fight it" is inherently a losing strategy when the other side also has an AI that is just as powerful.

And no amount of BS detecting tells you what is true. The challenge that I see a lot of people have is they really don't have a framework to incorporate new information into.

They're adrift, every new "fact" (whether true or false) blows them in a new direction. Often they get led in terrible directions from statements that are entirely true (but missing important context).

A lot of financial cons work that way, a long string of true statements that seem to lead to a particular conclusion. I know that if someone is offering me 20% APY there will usually be some risk or fee that offsets those market-beating gains (it may be a worthwhile risk or a well earned fee, but that number needs to trigger further investigation).

We need people to be equipped with that sort of framework in as many areas as possible, but we seem to be moving backwards in that area.


AI is certainly a dual good but I think the project is misguided at best.

I put in one of the driest descriptions of the Holocaust I could find and it got a very high score for bias, calling a factual description of a massacre emotional sensationalism because it inevitably contains a lot of loaded words.

It also doesn't differentiate between reporting, commentary, poetry, or anything else. It takes text and spits out a number, which is a very shallow analysis.


Don’t blame the tools. Stalin, Mao and Hitler didn’t need AI.

That pro forma response grows oh so very tiresome.

For the nth time: scale, easiness, and access, matter. AI puts propaganda abilities far beyond the reach of those men into the hands of many more people. Do you not understand the difference between one man with a revolver and an army with machine guns? They are not the same.

Nowhere in my comment am I “blaming the tools”. I’ll ask you to engage with the argument honestly instead of simply parroting what you already believe absent reading.


Did you do a net benefit calculation? If not, all these knee jerk anti-AI comments are tiresome and predictable (see luddites).

> I’ll ask you to engage with the argument honestly instead of simply parroting what you already believe absent reading

I did engage with the argument. The argument is a tiresome old argument that is knee-jerk anti-tech. You seem to be the thoughtless one in this discourse, repeating for the nth time an anti-tech position that assumes the net negatives massively outweigh the positives.

Also, why attack me instead of the argument? Did I touch a logical sore point? I believe so.

> For the nth time: scale, easiness, and access, matter.

By that logic, was the printing press evil? Remember, Mao/Stalin/Hitler used presses to spread their propaganda.

Also, for the (n+1)th time, and using your own style, don't be lazy:

1. Come up with a net benefit calculation for AI. What? You can't? Then, don't try to claim this is all net negative.

2. Explain how AI is different from other tech like the printing press, that also had scale, easiness, and access.


Is that worth the cost of this technology? Both in terms of financial shenanigans and its environmental cost?

Are you asking if the 10 seconds it takes AI to generate an image is more costly to the environment than a commissioned graphics artist using a laptop for 5-6 hours, or a painter who uses physical media sourced from all over the world?

In short, yes.

A modern laptop is running almost fanless, like a 486 from the days of yore.

A single H200 pumps out 700W continuously in a data center, and you run thousands of them.

Also, don't forget the training and fine tuning runs required for the models.

Mass transportation / global logistics can be very efficient and cheap.

Before the pandemic, it was cheaper to import fresh tomatoes from half-world away rather than growing them locally in some cases. A single container of painting supplies is nothing in the grand scheme of things, esp. when compared with what data centers are consuming and emitting.


This argument is so flawed that its conclusion almost loops back around to being correct again:

No, in terms of unit economics, I'm almost certain that the painting supplies have a bigger ecological/resource footprint than an LLM per icon generated, and I'm pretty sure the cost of shipping tomatoes does not decrease that footprint, even if it possibly dwarfs it.

But yes, due to Jevons' paradox, the total resource use might well increase despite all that. I, for example, would never have commissioned a professional icon for my silly little iOS shortcuts on my home screen, so my silly icon-related carbon footprint went from exactly zero to slightly above that.


These are unfair comparisons. It's not just a single laptop running all day, it's all the graphic designer laptops that get replaced. It's not a single container of painting supplies, it's all of them (which are toxic, by the way).

So if power were plentiful and environmentally friendly, you'd be on board with it?


> These are unfair comparisons. It's not just a single laptop running all day, it's all the graphic designer laptops that get replaced. It's not a single container of painting supplies, it's all of them (which are toxic, by the way).

Please see my other comment about energy consumption and connect the dots with how open loop DLC systems are harmful to fresh water supplies (which is another comment of mine).

> So if power were plentiful and environmentally friendly, you'd be on board with it?

This is a pretty loaded way to ask this. Let me put this straight. I'm not against AI. I'm against how this thing is built. Namely:

    - Use of copyrighted and copylefted materials to train models while hiding behind "fair use" to exploit people.
      - Moreover, belittling the people who create things with their blood, sweat, and tears, and poorly imitating their art just for kicks or quick bucks.
    - Playing fast and loose with the environment and energy consumption, without trying to make things efficient and sustainable, to reduce initial costs and time to market.
    - Gaslighting users and the general community about how these things are built and how much of it is theater, again to make people use this and offload their thinking, atrophying their skills and making them dependent on it.
I work in HPC. I support AI workloads and projects, but the projects we tackle have real benefits, like ecosystem monitoring, long-term climate science, water level warning and prediction systems, etc., which have real tangible benefits for the future of humanity. Moreover, there are other projects trying to minimize the environmental impact of computation, which we're also part of.

So it's pretty nuanced, and the AI iceberg goes well below OpenAI/Anthropic/Mistral trio.


> I support AI workloads and projects, but the projects we tackle have real benefits [...]

As opposed to the illusory/fake/immoral benefits of using LLMs for entertainment purposes (leaving aside all other applications for now)?

How do you feel about Hollywood, or even your local theater production? I bet the environmental unit economics don't look great on those either, yet I wouldn't be so quick to pass moral judgement.

Why not just focus on the environmental impact instead of moralizing about the utility? It seems hard to impossible to get consensus there, and the impact should be able to speak for itself if it's concerning.


This is a plainly dishonest comparison. A single H200 does not need to run continuously for you to generate a dozen pictures. And then you immediately pivot to comparing the paint usage against "the grand scheme of things": 700W is nothing in the grand scheme of things.

In fact it's pretty fair.

Many people think that when a piece of hardware is idle, its power consumption becomes irrelevant, and that's true for home appliances and personal computers.

However, the picture is pretty different for datacenter hardware.

Looking now, an idle V100 (I don't have an idle H200 at hand) uses 40 watts, at minimum. That's more than the TDP of many modern consumer laptops and systems. A MacBook Air uses a 35W power supply to charge itself, and it charges pretty quickly even when it's under relatively high stress.

I want to clarify some more things. A modern GPU server houses 4-8 high-end GPUs. This means 3 kW to 5 kW of maximum power consumption per server. A single rack runs around 75-100 kW, and you house hundreds of these racks. So we're talking about megawatts of power draw. CERN's main power line on the Swiss side had a capacity of around 10 MW, to put things in perspective.

Let's assume an H200 uses 60W when it's idle. This means ~500W of wasted power per server for sitting around. If a complete rack is idle, that's ~10 kW. So you're wasting the power draw of 3-5 houses just by sitting and doing nothing.

This computation only considers the GPUs. The rest of the server hardware adds around 40% to these numbers. Go figure. This is wasting a lot for cat pictures.
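To make the arithmetic explicit, here's a toy sketch using only the assumed figures above (the servers-per-rack count is my assumption to land near the ~10 kW figure; none of these are measurements):

    # Back-of-envelope idle-power waste, using the assumed numbers from above.
    IDLE_W_PER_GPU   = 60    # assumed idle draw of one H200
    GPUS_PER_SERVER  = 8
    SERVERS_PER_RACK = 20    # assumption that gets a rack to roughly 10 kW idle
    OVERHEAD         = 1.4   # rest of the server hardware adds ~40%

    idle_per_server = IDLE_W_PER_GPU * GPUS_PER_SERVER           # ~480 W, the "~500 W" above
    idle_per_rack   = idle_per_server * SERVERS_PER_RACK / 1000  # ~9.6 kW, the "~10 kW" above
    print(f"per idle rack incl. overhead: {idle_per_rack * OVERHEAD:.1f} kW")  # ~13.4 kW doing nothing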

And, these "small" numbers add up to a lot.


Definitely worth considering in a world in which there are any H200s idling in data centers.

Now that's one fine No True Scotsman.

    A: GPUs use a lot of power!
    B: Not all of them are running 100% continuously, eh?
    A: They waste too much power when they're idle, too!
    C: None of the H200s are sitting idle, you knob!
I mean, they are either wasting energy sitting idle or doing barely useful work. I don't know what to say anymore.

We'll cook ourselves, anyway. Why bother? Enjoy the sauna. ¯\_(ツ)_/¯


B is supposed to be me? I said the H200 doesn't need to be running continuously to generate a dozen images. If a million people generate a dozen images, it no longer makes sense to compare to the costs of a single artist for 6 hours. I really don't understand why this is hard and that makes this feel very uncharitable.

I'm not saying that this isn't "true idling", I'm saying that idling H200s simply don't exist, i.e., I disagree with B. Do you, A, even disagree?

> they are either wasting energy sitting idle or doing barely useful work

Now here's a true (inverse) Scotsman, or more accurately, a moved goalpost: work on things you don't deem valuable is basically the same thing as idling?

> We'll cook ourselves, anyway. Why bother? Enjoy the sauna. ¯\_(ツ)_/¯

I'm very concerned about that too, but I don't think we'll avoid the sauna with fatalism or logically unsound appeals to morality about resource consumption.


Cheaper/faster tech increases overall consumption though. Without the friction of commissioning a graphics artist to design something, a user can generate thousands of images (and iterate on those images multiple times to achieve what they want), resulting in way more images overall.

I'm not really well versed on the environmental cost, more just (neutrally) pointing out that comparing a single 10s image to a 5-6 hour commission ignores the fact that the majority of these images probably would never have existed in the first place without AI.


Also, ignoring training when talking about the environmental costs is bad faith. Without training, this image would not exist, and if nobody were generating images like these, the training would not happen. So we should really count the 10 seconds it took for inference plus the weeks or months of high-intensity compute it took to train the model.

You'd want to compare against the fraction of training attributable to the image
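As a toy illustration of that amortization, with every number below invented purely to show the shape of the calculation, not claimed for any real model:

    # Hypothetical amortization sketch; none of these figures are real.
    TRAINING_KWH    = 1_000_000    # invented total training energy
    LIFETIME_IMAGES = 500_000_000  # invented number of images served over the model's life
    INFERENCE_KWH   = 0.002        # invented energy per generated image

    per_image = INFERENCE_KWH + TRAINING_KWH / LIFETIME_IMAGES
    print(f"{per_image:.4f} kWh per image with training amortized")  # 0.0040 with these inputs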

Wow, do you hold a degree in false dichotomies?

The environmental cost is significantly overblown, especially water usage.

I work with direct liquid cooled systems. If the datacenter is working with open DLC systems (most AI datacenters in the US in fact do), a lot of water is being wasted, 24/7/365.

A mid-tier Top500 system (think #250-#325) consumes about 0.75 MW of power. AI data centers consume orders of magnitude more. To cool that behemoth you need to pump tons of water per minute through the inner loop.

Outer loop might be slower, but it's a lot of heated water at the end of the day.

To prevent water wastage, you can go closed loop (for both inner and outer loops), but you can't escape the heat you generate and pump to the atmosphere.

So, the environmental cost is overblown, as in Chernobyl or fallout from a nuclear bomb is overblown.

So, it's not.


It's not that it doesn't use water; it's that water is not scarce unless you live in a desert.

As a country, we use 322 billion gallons of water per day. A few million gallons for a datacenter is nothing.


The problem is you don't just use that water and give it back.

The water gets contaminated and heated, making it unsuitable for organisms to live in, or to be processed and used again.

In short, when you pump back that water to the river, you're both poisoning and cooking the river at the same time, destroying the ecosystem at the same time too.

Talk about multi-threaded destruction.


No, you're making that up. Datacenters do not poison rivers.

To reiterate, I work in a closed loop DLC datacenter.

Pipes rust; you can't stop that. That rust seeps into the water. That's inevitable. Moreover, if moss or other stuff starts to take over your pipes, you may need to inject chemicals into your outer loop to clean them.

Inner loops already use biocides and other chemicals to keep them clean.

Look at how nuclear power plants fight organism contamination in their outer cooling loops, where they circulate lake/river water.

Same thing.


Dude, you can’t fight Dunning-Kruger. They all think they’re experts in everything now.

Just because some countries waste a lot at present time does not mean it's available as a resource indefinitely.

The environmental cost of Chernobyl is indeed often overblown. Nature in the exclusion zone is arguably much better off now than before!

The cost to humans living in affected areas was massive and high profile, but it’s very questionable whether it was higher than that of an equivalent amount of coal-burning plants. Fortunately, it's not a tradeoff we have to debate anymore, since there are now renewables with far fewer downsides and externalities.

Nuclear bombs (at least those being actually used) by design kill people, so I’m not sure what the externalities even are if the main utility is already to intentionally cause harm.


Depends on whether you believe it will ever become cheaper: either the hardware, inspiring more efficient smaller models, or energy itself. The techno-optimist believes that that is the inevitable and investable future. But on what horizon, and will it get “Zip-drived” before then?

absolutely without a doubt it is

If that energy is used for research, maybe. If used to answer customer questions or generate Studio Ghibli knock-offs, it's not worth it, even a bit.

What’s the difference between those two? How can you say one has more value than the other?

One is trying to save the future of the planet and humanity with science; the other is mocking a man who devoted his whole life to his art, a man willing to spend years perfecting a three-second sequence, for kicks and monies.

If you see no difference between them, I can't continue to discuss this with you, sorry.


To you. Fortunately nobody elected you chief resource allocator of the planet.

And I say that as somebody that also finds Ghibli knock-off avatars used by AI bros in incredibly bad taste (or, arguably an even worse crime against taste, a dated 2025 vibe).


Thanks for your personal jab. Another nice comment to frame and hang on my wall.

I like your discussion style.


Passing moral judgement about other people's value preferences seems pretty preposterous to me as well, so I was being a bit glib, but to be clear:

I don't want to live in a world in which people get to decide what others can and can't do with their share of resources (after properly accounting for all externalities, including pollution, the potential future value of non-renewable present resources etc. – this is where today's reality often and massively misses that ideal) based on their subjective moral criteria.

Not even just for ethical/moral reasons, but also for practical ones: It’s infinitely harder to get everybody to additionally agree on value of use than on fairness of allocation alone.

After thoroughly mixing these two quite distinct concerns, you'll also have a very hard time convincing me that your concerns for river pollution etc. (which I take very seriously as potentially unaccounted negative externalities, if they exist) are completely free from motivated reasoning about "immoral usage".


This is where I’m at. If you can’t be bothered to write/make it, why would I be bothered to read or review it?

Because I'm not an artist and can't afford to pay one for whatever business I have? This idea that only experts are allowed to do things is just crazy to me. A band poster doesn't have to be a labor of love artisanal thing. Were you mad when people made band posters with MS Word instead of hiring a fucking typesetter? I just don't get it.

I dunno, I have some band posters that are pretty cool pieces of art that obviously had a lot of thought put into them (pre-AI era stuff). I don't think I'd hang up an AI generated band poster, even if it was cool; I'd feel weird and tacky about it.

I was hosting a karaoke event in my town and really went out of my way to ensure my promotional poster looked nothing like AI. I really, really, really did not want my townsfolk thinking I would use AI to design a poster.

My design rules were: no gradients; no purple; prefer muted colors; plenty of sharp corners and overlapping shapes; use the Boba Milky font face.



I mean: https://imgur.com/a/BYikxEI

The difference is very stark:

- The AI has a hard time making the geometric shapes regular. You see the stars have different-size arms at different intervals in the AI version. It would take a human artist longer to make it look this much worse.

- The 5-point stars are still a little rounded in the AI version.

- There is way too much text in the AI version (a human designer might make that mistake, but it is very typical of AI).

- The orange 10-point star on the right with the text “you are the star” still has a gradient (AI really can’t help itself).

- The borders around the title text “Karaoke night!” bleed into the borders of the orange (gradient) 10-point star on the right, but only halfway. This is very sloppy; a human designer would fix that.

- The font face is not Milky Boba but some sort of AI hybrid of Milky Boba, Boba Milky, and Comic Sans.

- And finally, the QR code has obvious AI artifacts in them.

The point I’m making is that it is very hard to prompt your way out of making a poster look like AI, especially when the design is intentionally made to not look like AI.


I hear what you’re saying and at the same time I don’t agree with some of your criticisms. The gradient, yep, it slipped one in. The imperfect stars? I have seen artists do this forever, presumably intentional flair. The few real “glitches” would be trivial to fix in Photoshop.

But they are very different certainly. ChatGPT generated a poster with a very sleek, “produced” style that apes corporate posters whereas you went with a much more personal touch. You are correct that yours does not look like typical AI.

My point is certainly not that the AI poster is better, only that it’s capable of producing surprising results. With minimal guidance it can also generate different styles: https://imgur.com/a/zXfOZaf

I think the trend to intentionally make stuff look “non-AI” is doomed to fail as AI gets better and better. A year or two ago the poster would have been full of nonsense letters.

> And finally, the QR code has obvious AI artifacts in them.

I wonder if this is intentional, to prevent AI from regurgitating someone’s real QR codes.

ETA: Actually, I wonder how much of the “flair” on human-drawn stars is to avoid looking like they are drag-and-drop from a program like Word. Ironic if we’ve circled back around to stars that look perfect to avoid looking like a different computer generated star.


My point is not that the AI version looks bad (although it does); it is that I hate AI, and so do many people around me. And I hate AI so much, and I know so many people around me hate AI just as much, that I am consciously altering my designs to be as far away from AI as I can. This is the creative-design equivalent of moving from Seattle to Florida after a divorce.

About the stars: I know designers paint imperfect stars. I even did that in my design; in particular, I stretched it and rotated it slightly. A more ambitious designer might go further and drag a couple of vertices around to exaggerate them relative to the others. But usually there is some balance in their decisions. AI, however, just puts the vertices wherever, and it is ugly and unbalanced. A regular geometric shape with a couple of oddities is a normal design choice, but a geometric shape which is all oddities is a lot of work for an ugly design. Humans tend not to do that.


> I am consciously altering my designs to be as far away from AI as I can

I don’t think this is a productive choice, but it’s certainly yours to make.

> but a geometric shape which is all oddities is a lot of work for an ugly design. Humans tend not to do that

I find this such an odd thing to say. It’s way easier to draw a wonky star than a symmetrical one. Unless “drawing” here means using a mouse to drag and drop a star that a program draws for you.

Vintage illustrations are full of nonsymmetrical shapes. The classic Batman “POW” and similar were hand drawn and rarely close to symmetrical.


I draw mine in Inkscape (because I like open source more than my sanity), and Inkscape has special tools to draw regular geometric shapes. You don‘t need to use those tools; you can use the free-draw pen, or the Bézier curve tool, or even hand-code the <path d="M43,32l5.34-2.43l3.54-0.53" />, etc. But using these other tools is suboptimal compared to the regular geometric tool.

Apart from me, my partner also does graphic design, and unlike me she values her sanity more than open source, so she uses Illustrator for her designs. In Adobe’s walled-garden world of proprietary software it is still the same story: you generally use the specific tools to get regular shapes (or patterns) and then alter them after they are drawn. You don‘t draw them from scratch. If you are familiar with modular analog synthesizers, this is like starting with a square wave and then subtracting to modulate the signal into a more natural-sounding form.
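For what it's worth, the reason the dedicated tool wins is that a regular star is trivial to compute but tedious to hand-place. A rough sketch of the geometry (my own illustration, not Inkscape's actual implementation):

    # Regular star: alternate between an outer and an inner radius at evenly
    # spaced angles, then emit the points as an SVG path.
    from math import cos, sin, pi

    def star_path(cx, cy, r_outer, r_inner, points=5):
        d = []
        for i in range(points * 2):
            r = r_outer if i % 2 == 0 else r_inner
            a = pi * i / points - pi / 2   # first vertex points straight up
            d.append(f"{'M' if i == 0 else 'L'}{cx + r * cos(a):.2f},{cy + r * sin(a):.2f}")
        return " ".join(d) + " Z"

    print(f'<path d="{star_path(50, 50, 40, 16)}" />')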


> I think the trend to intentionally make stuff look “non-AI” is doomed to fail as AI gets better and better.

> What’s the mechanism that makes an AI ‘better’ at looking non-AI? Training on non-AI trend images? It’s not following prompts more closely. Even if that image had no gradients or pointier shapes, it still doesn’t look like it was made by an individual.

To your counterpoints, notice that you are apologizing for the AI by finding humans that may have done something, sometime, that the AI just did. Of course! It’s trained on their art. To be non-AI, art needs to counter all averages and trends that the models are trained on.


> What’s the mechanism that makes an AI ‘better’ at looking non-AI?

I don’t know. Better training data? More training data? The difference over the past year or two is stark so something is improving it.

> Even if that image had no gradients or pointier shapes, it still doesn’t look like it was made by an individual.

The fact that humans are actively trying to make art that does not look like AI makes it clear that AI is not so obvious as many would like to pretend. If it were obvious, no one would need to try to avoid their art looking like AI.

> To your counterpoints, notice that you are apologizing for the AI by finding humans that may have done something, sometime, that the AI just did. Of course! It’s trained on their art.

Obviously.

> To be non-AI, art needs to counter all averages and trends that the models are trained on.

So in order to not look like AI, art just has to be so unique that it’s unlike any training data. That’s a high bar. Tough time to be an artist.


I don't know why you're downvoted, I think that's a reasonable use of AI and it looks pretty good.

Edit: I think I misread what you were saying, but I do think it's a nice poster! I get that design is going to have to avoid doing things that AI does, which is kind of unfortunate, because AI is likely trained on a lot of things that are generally good ideas.


> can't afford to pay one for whatever business I have

At small scales what "art" does your business need? If you can't afford to hire an artist (which is completely fine, I couldn't for my business!) do you really need the art or are you trying to make your "brand" look more polished than it actually is? Leverage your small scale while you can because there isn't as much of an expectation for polish.

And no, a band poster doesn't have to be a labor of love. But it also doesn't have to be some big showy art either. If I saw a small band with a clearly AI generated poster it would make me question the sources for their music as well.


> band poster doesn't have to be a labor of love artisanal thing

Very few bands would agree with that statement.


I think you're misunderstanding - most people's beef with AI art isn't that it "isn't made by experts", it's that

1) it's made from copyrighted works, and the original authors receive no credit;
2) it is (typically) low-effort;
3) there are numerous negative environmental effects of the AI industry in general;
4) there are numerous negative social effects of AI in general, and more specifically AI generated imagery is used a lot for spreading misinformation;
5) there are numerous negative economic effects of AI, and specifically with art, it means real human artists are being replaced by AI slop, which is of significantly lower quality than the equivalent human output. Also, instead of supporting multiple different artists, you're siphoning your money to a few billion dollar companies (this is terrible for the economy)

As a side note, if you have a business which truly cannot afford to pay any artists, there are a lot of cheap (sometimes free!) pre-paid art bundles that are much less morally dubious than AI. Plus, then you're not siphoning all of your cash to tech oligarchs.


> Because I'm not an artist and can't afford to pay one for whatever business I have?

If your business can't afford to spend $5 on Fiverr, it's not a business. It's not even panhandling.


Why is that better? They're going to use AI anyway. It's Fiverr.

No one is saying that only experts can do things; that's a totally inaccurate reading of the argument and the post.

People are saying, very clearly, that they're not willing to put effort into something produced by someone who put no effort in.


What, a music band's poster, 'typeset' in Microsoft Word? I cannot imagine bothering to go to such a band's concert.

<joke>What's your rock band called, "SEC Form 10-K"?</joke>


I agree, and who's to say your life experience isn't as valid as someone with fewer years but more time at just the traditional tools? I'd think either extreme could produce real art if the tools moat were reduced with AI.

I actually love MS Word posters. It's a million times more authentic and enjoyable than a slop generation. If a band put up an AI poster I'd assume they lack any kind of taste, which is the whole reason I'd want to listen to a band anyway.

I know this is controversial in tech spaces. But most people, particularly those in art spaces like music actually appreciate creativity, taste, effort, and personal connection. Not just ruthless efficiency creating a poster for the lowest cost and fastest time possible.


It's just as low effort. This is silly.

How about going without? I can’t afford an artist, either, so I don’t have art. Don’t foist slop on people because you are trying to be something that you aren’t.

I would rather see an MS Word poster than be lied to.

Nobody can be bothered to make my cat out of Lego at the size of Mount Everest, but if an AI did, I'd sure love to see it.

Your quip is pithy but meaningless.


I'm not saying it's worthless for yourself, it's worthless to me as a viewer. AI content is great for your own usage, but there is no point posting and distributing AI generation.

I could have generated my own content, so just send the prompt rather than the output to save everyone time.


And when the distilled knowledge/product is the result of multiple prompts, revisions, and reiterations? Shall we send all 30+ of those as well so as to reproduce each step along the way?

Maybe reread my comment. Would you not want to see a Mount Everest-sized Lego cat? Even if it were my cat?

Again - your quip sounds good but when you think about it, it's flatly wrong.


This doesn't make sense; if I want to see a Lego-cat slop image I can just prompt a model myself (and have it be of my own cat). There's no reason for you to be involved in any part of that process, because the point of this stuff is that you are not doing anything.

The claim is that people don't / shouldn't want to see something if humans can't be bothered to make it. I provided a counter example. So the claim is nonsense.

Exactly how I feel. There is already more art, movies, music, books, video games and more made by human beings than I can experience in my lifetime. Why should I waste any time on content generated by the word guessing machine?

The issue is that the signalling makes sense when human generated work is better than AI generated. Soon AI generated work will be better across the board with the rare exception of stuff the top X% of humans put a lot of bespoke highly personalized effort into. Preferring human work will be luxury status-signalling just like it is for clothing, food, etc.

I'm probably in a weird subgroup that isn't representative of the general public, but I've found myself preferring "rough" art/logos/images/etc, basically because it signals a human put time into it. Or maybe not preferring, but at least noticing it more than the generally highly refined/polished AI artwork that I've been seeing.

There’s no reason to think people broadly want “better” writing, images, whatever. Look at the indie game scene, it’s been booming for years despite simpler graphics, lower fidelity assets, etc. Same for retro music, slam poetry, local coffee shops, ugly farmers market produce, etc.

There is a mass, bland appeal to “better” things but it’s not ubiquitously desired and there will always be people looking outside of that purely because “better” is entirely subjective and means nothing at all.


I think "better" is doing a lot of heavy lifting in this argument. Better how?

Is an AI generated photo of your app/site going to be more accurate than a screenshot? Or is an AI generated image of your product going to convey the quality of it more than a photo would?

I think Sora also showed that the novelty of generating just "content" is pretty fleeting.

I would be interested to see if any of the next round of ChatGPT advertisements use AI generated images. Because if not, they don’t even believe in their own product.


The issue being, it's not an expression of anything. It's merely a random sensation, maybe with some readable intent, but generic in execution, and not about anything in the way that even corporate art should be about something. Are we going to give up on art altogether?

Edit: One of the possible outcomes may be living in a world like in "They Live", with the glasses on. Since no expression has any meaning anymore, the message is just there as a signal of some kind. (Generic "BUY" + associated brand name in small print, etc.)


Can't the expression come from the person prompting the AI and sometimes taking hours inpainting or tweaking the prompt to try to get the exact image/expression they had in their mind? A good use I've found is being able to turn scenes from a dream you had into an image. If that's not an expression of something, then I'm not sure anything is.

Notably, this process of struggle is meant to go away, to make room for instant satisfaction. This is really about some kind of expression consumerism. (And what will be lost along the way is meaning.)

I always find this argument to ring hollow. Maybe it's because I've been through it with too many technologies already. Digital photography took out the art of film photography. CGI took out the wonder of practical effects. Digital art takes out the important brush strokes of someone actually painting. The real answer always is the mediums can coexist and each will be good for expression in their own way.

I'm not sure you immediately lose meaning if someone can make a highly personalized version of something easily. The % of completely meaningless video after YouTube and tiktok came about has skyrocketed. The amount of good stuff to watch has gone up as well though.


Only novel art is interesting. AI can't really do novel. It's a prediction algorithm; it imitates. You can add noise, but that mostly just makes it worse. It can be used to facilitate original stuff though.

But so many people want to make art, and it's so cheap to distribute it, that art is already commoditized. If people prefer human-created art, satisfying that preference is practically free.


AI can be novel; there is nothing in the transformer architecture which prohibits novelty. It's just that structurally it much prefers pattern-matching.

But the idea of novelty is a misnomer I think. Any random number generator can arbitrarily create a "novel" output that a human has never seen before. The issue is whether something is both novel and useful, which is hard for even humans to do consistently.


Anthropic recently changed their take-home test specifically to be more “out-of-distribution” and therefore more resistant to AI so they can assess humans.

I’m so tired of “there’s nothing preventing” and “humans do that too”. Modern AI is just not there. It’s not like humans, and it has difficulty adapting to novelty.

Whether transformers can overcome that remains to be seen, but it is not a guarantee. We’ve been dealing with these same issues for decades and AI still struggles with them.


There are lots of things that are novel to you without necessarily being novel to the universe.

"Artisanal art" as it were.

The goal of art isn't to be perfect or as realistic as possible. The goal of art is to express, and enjoy that unique expression.

> Preferring human work will be luxury status-signalling just like it is for clothing, food, etc.

What? Those items are luxuries when made by humans because they are physical goods where every single item comes with a production and distribution cost.


Here’s one example:

I just recently used image generation to design my balcony.

It was a great way to see design ideas imagined in place and decide what to do.

There are many cases where people would hire an artist to illustrate an idea or early prototype. AI-generated images make that something you can do by yourself, or 10x faster than a few years ago.


Did the same for my front garden.

Notwithstanding a few code violations, it generated some good ideas we were then able to tweak. The main thing was that we had no idea what we wanted to do, but seeing a lot of possibilities overlaid on the existing non-garden got us going. We were then able to extend the theme to other parts of the yard.


100%. A picture is worth a thousand words only when it conveys something. I love to see the pictures from my family even when they are taken with no care to quality or composition but I would look at someone else’s (as in gallery/exhibitions) only when they are stunning and captured beautifully. The medium is only a channel to communicate.

Also, this can’t be real. How many publications did they train this stuff on, and why is there no acknowledgment, even if just to say “we partnered with xyz manga house to make our model smarter at manga”? Like, what’s wrong with this company?


I'm working on an edutech game. Before, I would've had much less of a product because I don't have the budget to hire an artist, and it would've been much less interactive. But because of this I'm able to build a much more engaging experience, so that's one thing. For what it's worth.

We need to flip the script. AI is trying to do marketing: adding “illegal usage will lead to X” is a gateway to spark curiosity. There is this saying that censoring games for young adults makes sure that they will buy them like crazy by circumventing the restrictions, because danger is cool.

There is nothing that cannot harm: knives, cars, alcohol, drugs. A society needs to balance risks and benefits. Word can be used to do harm, email, anything; it depends on the intention and its type.


I see your point, but reconsider: we will see, and we need to see. Time will tell, and this is simply economics: useful? Yes or no.

I started being totally indifferent after thinking about my spending habits, checking for unnecessary stuff, after watching world championships for niche sports. For some, this is a calling; for others, waste. It is a numbers game then.


The technically (in both senses) astonishing and amazing output is not far off from some of the qualities of real advertising: staged, attention-grabbing, artificially created, superficially demanded, commercially attractive. These align, and lots of similarities in the functions and outcomes of these two spheres come to mind.

> and even if we cannot tell if an image is AI-generated, we can know if companies are using AI to generate images in general, so the appeal is decreasing

Is that true? I don't think I'd get tired of images that are as good as human-made ones just because I know/suspect there may have been AI involved.


I think there's real value to be had in using this for diagrams.

Visual explanations are useful, but most people don't have the talent and/or the time to produce them.

This new model (and Nano Banana Pro before it) has tipped across the quality boundary where it actually can produce a visual explanation that moves beyond space-filling slop and helps people understand a concept.

I've never used an AI-generated image in a presentation or document before, but I'm teetering on the edge of considering it now provided it genuinely elevates the material and helps explain a concept that otherwise wouldn't be clear.


Are there any models that are specifically trained to produce diagrams as SVG? I'd much prefer that to diffusion-based raster image generation models for a few reasons:

- The usual advantages of vector graphics: resolution-independence, zoom without jagged edges, etc.

- As a consequence of the above, vector graphics (particularly SVG) can more easily be converted to useful tactile graphics for blind people.

- Vector graphics can more practically be edited.


You can get them to produce Mermaid diagrams, but you can also generate those yourself from text.

This is the key point. In my view it's just like anything else, if AI can help humans create better work, it's a good thing.

I think what we'll find is that visual design is no longer as much of a moat for expressing concepts, branding, etc. In a way, AI-generated design opens the door for more competition on merits, not just those who can afford the top tier design firm.


Yeah, I'm not sure I agree that we can hand-wave away assets and ads as entire classes of valuable content.

I tend to share your view. But is there really a line like you describe? Maybe AI just needs to get a few iterations better and we'll all love what it generates. And how's it really any different from any Photoshop computer output from the past?

> In general, I think people are starting to realize that things generated without effort are not worth spending time with

Agreed mostly, BUT

I'm building tools for myself. The end goal isn't the intermediate tool, they're enabling other things. I have a suspicion that I could sell the tools, I don't particularly want to. There's a gap between "does everything I want it to" and "polished enough to justify sale", and that gap doesn't excite me.

They're definitely not generated without effort... but they are generated with 1% of the human effort they would require.

I feel very much empowered by AI to do the things I've always wanted to do. (when I mention this there's always someone who comes out effectively calling me delusional for being satisfied with something built with LLMs)


> What else?

I used to have an assistant make little index-card-sized agendas for get-togethers when folks were in town or I was organising a holiday or offsite. They used to be physical; now it's a cute thing I can text around so everyone knows when they should be up by (and, if they've slept in, by when they can go back to bed). AI has been good at making these. They don't need to be works of art, just cute and silly and maybe embedded with an inside joke.


I'm not seeing how it takes more than 5 minutes to type up an itinerary. If you want to make it cute and silly, just change up the font and color and add some clip art.

If this is the best use case that exists for AI image generation, I'm only further convinced the tech is at best largely useless.


> not seeing how it takes more than 5 minutes to type up an itinerary

Because I’ll then spend hours playing with the typography (because it’s fun) and making it look like whatever design style I’ve most recently read about (again, because it’s fun) and then fighting Word or Latex because I don’t actually know what I’m doing (less fun). Outsourcing it is the right move, particularly if someone else is handling requests for schedules to be adjusted. An AI handles that outsourcing quicker for low-value (but frequent) tasks.

> If this is the best use case that exists for AI image generation

I’ve also had good luck sketching a map or diagram and then having the AI turn it into something that looks clean.

Look, 99% of my use cases are e.g. making my cat gnaw on the Tetons or making a concert of lobsters watching Lady Gaga singing “I do it for the claws” or whatever so I can send two friends something stupid at 1AM. But there does appear to be a veneer of productivity there, and worst case it makes the world look a bit nicer.


You might not be able to tell how bad the AI slop looks, but I guarantee some of your friends can. AI is awful at maps and diagrams.

I’m not giving my friends AI maps and diagrams. And yes, they don’t look great. But they work. If I want to communicate something spatial, I can spend an hour in R or five minutes in Claude. The point is to communicate that information, and for a quick task, AI means the other person gets a map versus a block of text they have to reason through.

I don't care how many times you write "cute," having my vacation time programmed with that level of granularity and imposed obligation sounds like the definition of "dystopian."

If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself.

Edit: I'm not an outlier here. There have even been sitcom episodes about overbearing hosts over-programming their guests' visits, going back at least to the Brady Bunch.


> If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself

Okay. I'd be confused why you didn't voice up while we were planning everything as a group, but those people absolutely exist. (Unless it's someone's, read: a best friend or my partner's, birthday. Then I'm a dictator and nobody gets a choice over or preview of anything.)

I like to have a group activity planned on most days. If we're going to drive out to get an afternoon hike in before a dinner reservation (and if I have 6+ people in town, I need a dinner reservation, because no, I'm not cooking every single evening), or if I've paid for a snowmobile tour or a friend is bringing out their telescope for stargazing, there are hard no-later-than departure times, either to not miss the activity or to be respectful of others' time.

My family used to resolve that by constantly reminding everyone the day before and morning of, followed by constantly shouting at each other in the hours and minutes preceding and–inevitably–through that deadline. I prefer the way I've found. If someone wants to fuck off from an activity, myself included, that's also perfectly fine.

(I also grew up in a family that overplanned vacations. And I've since recovered from the rebound instinct, which involves not planning anything and leaving everything to serendipity. It works gorgeously, sometimes. But a lot of other times I wonder why I didn't bother googling the cool festival one town over beforehand, or regretted sleeping in through a parade.)

> There have even been sitcom episodes about overbearing hosts over-programming their guests' visits

Sure. And different groups have different strokes. When it comes to my friends and me, generally speaking, a scheduled activity every other day with dinners planned in advance (they all get hangry, every single fucking one of them) works best.


You are kidding, right?

It's good that my friends don't make a coffee date feel like a board meeting (with an agenda shared by post 14 working days ahead of the meeting, form for proxy voting attached).


> Like, in terms of art, it's discarded (art is about humans)

I dunno how long this is going to hold up. In 50 years, when OpenAI has long become a memory, post-bubble burst, and a half-century of bitrot has claimed much of what was generated in this era, how valuable do you think an AI image file from 2023 - with provenance - might be, as an emblem and artifact of our current cultural moment, of those first few years when a human could tell a computer, "Hey, make this," and it did? And many of the early tools are gone; you can't use them anymore.

Consider: there will never be another DALL-E 2 image generation. Ever.


While I agree with you, the Hacker News audience is not in the middle of the bell curve.

I get that this sounds elitist, but a tremendous percentage of the population is happily and eagerly engaging with fake religious images, funny AI videos, horrible AI memes, etc. Trying to mention that this video of a puppy is completely AI-generated results in vicious defense and mansplaining about why the video is totally real (I love it when the video has, e.g., Sora watermarks... This does not stop the defenders).

I agree with you that human connection and artist intent are what I'm looking for in art, music, video games, etc... But gawd, the lowest common denominator is, and always has been, SO much lower than we want to admit to ourselves.

Very few people want thoughtful analysis that contradicts their world view, very few people care about privacy or rights or future or using the right tool, very few people are interested in moral frameworks or ethical philosophy, and very few people care about real and verifiable human connection in their "content" :-/


HN is absolutely not more critical of AI output than the norm.

It's been true for various technologies that HN (and tech audiences in general) have a more nuanced view, but AI flips the script on that entirely. It's the tech world who are amazed by this, producing and being delighted by endless blogposts and 7-second concept trailers.


I think we are conflating usage vs consumption.

I think HN probably uses GenAI more than average population.

But I think HN consumes less GenAI content than average population.

Look at Facebook, Instagram, YouTube, TikTok, etc. All I see is my non-techie friends being amazed and mesmerized by cute animals, creepy animals, political events, jokes, comedy, outrage, events, and speeches that never ever happened. As if we don't have actual real puppies that are cute, my acquaintances and family are oooing and awwwing at fake howling huskies, fake animals being jump-scared by fake surprises.

HN may be amazed by the potential of AI output to improve the world more than the average person is. But hustlers are laughing their way to the bank as they actually use AI to make a ridiculous, and I do mean ridiculous, amount of "content" for cheap, that is, absolutely is, being consumed at a prodigious rate with no sign of stopping. This is not 7-second trailers and concepts for some future year - this is mega-years of actual content being liked, shared, engaged with, and consumed, right now. This is what OP is hoping the tide will turn against, and this is where I see no sign of rejection in my non-techie/non-geeky circles :(


You're on a site where the commenters read AI-generated articles about how they can generate new images to include in their generated websites that they themselves generate more articles about.

Sure, the weird cat-people adverts aren't aimed at HN's commentariat, but every 'democratise art and build that game you've dreamt of' pitch is. Every breathless paean to AI assistants/companions/partners is targeted at the users here.

Usage is a form of consumption; thinking of yourself as a creator while you consume doesn't mean you consume less.

Non-tech users are being fed fake images when they browse idly. Tech users are restructuring their entire lives around these tools.


I recently shoulder-surfed a family member scrolling away on their social media feed, and every single image was obvious AI slop. But it didn't matter. She loved every single one, watched videos all the way through, liked and commented on them... just total zombie-consumption mode and it was all 100% AI generated. I've tried in the past pointing out that it's all AI generated and nothing is real, and they simply don't care. People are just pac-man gobbling up "content". It's pretty sad/scary.

I'd be a bit more humble rather than terrified, because I enjoy some AI slop too, especially funny animals that remind me of my old pets' antics. There are levels of slop. But tasteless stuff with crap graphics plastered all over, loud edits, or badly calibrated TTS voices was already all over Reels/TikTok long before AI, and people still liked that.

The unsettling thing on social media is the mind hijacking with the recommendation algo and scrolling motion that resembles a slot machine, more than the content itself.


Seems good enough to generate 2D sprites. If that means a wave of pixel-art games I count it as a net win.

I don't think gamers hate AI; it's just a vocal minority imo. What most people dislike is sloppy work, as they should, but that can happen with or without AI. The industry has been using AI for textures, voices and more for over a decade.


> Seems good enough to generate 2D sprites.

It’s really not. That's actually a pet peeve of mine as someone who used to spend a lot of time messing with pixel art in Aseprite.

Nobody takes the time to understand that the style of pixel art is not the same thing as actual pixel art. So you end up with these high-definition, high-resolution images that people try to pass off as pixel art, but if you zoom in even a tiny bit, you see all this terrible fringing and fraying.

That happens because the palette is way outside the bounds of what pixel art should use, where proper pixel art is generally limited to maybe 8 to 32 colors.

There are plenty of ways to post-process generative images to make them look more like real pixel art (square grid alignment, palette reduction, etc.), but it does require a bit more manual finesse [1], and unfortunately most people just can’t be bothered.

[1] - https://github.com/jenissimo/unfake.js
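
For anyone curious, that kind of post-processing can be sketched in a few lines. This is just a rough illustration with Pillow (the grid size, color count, and file names are made up), not what unfake.js actually does:

```python
# Snap an AI-generated "pixel-art style" image to an actual pixel grid
# and reduce its palette. Grid size and color count are illustrative.
from PIL import Image

def pixelify(path: str, out_path: str, grid: int = 64, colors: int = 16) -> None:
    img = Image.open(path).convert("RGB")

    # Downscale to the target grid with nearest-neighbor sampling so each
    # cell collapses to a single hard-edged pixel (no fringing or anti-aliasing).
    small = img.resize((grid, grid), Image.Resampling.NEAREST)

    # Reduce the palette to something a real pixel artist might use (8-32 colors).
    small = small.quantize(colors=colors).convert("RGB")

    # Scale back up with nearest-neighbor so the pixels stay crisp squares.
    small.resize((grid * 8, grid * 8), Image.Resampling.NEAREST).save(out_path)

pixelify("gen_sprite.png", "clean_sprite.png")
```

It won't rescue a fundamentally non-grid image, but it fixes the fringing and the sprawling palette, which are the two giveaways mentioned above.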


There are already more games being released on Steam than anyone can keep up with, I'm not sure how adding another "wave" on top of it helps.

AI for textures for over a decade? What AI?

Efros–Leung, PatchMatch? Nearest neighbours was "AI" before diffusion models.

Don't you think it's a huge stretch to compare those to modern generative AI in this context? Those don't raise any of the questions that make current usage questionable.

Are you kidding? I think I see more vitriol for AI in gaming communities than anywhere else. To the point where Steam now requires you to disclose its usage.

Crimson Desert failed to disclose on release and (almost) nobody cared, gamers kept buying it.

> Like, in terms of art, it's discarded (art is about humans)

If a work of art is good, then it's good. It doesn't matter if it came from a human, a neanderthal, AI, or monkeys randomly typing.


The connection with the artist, directly, or across space and time, is a critical part of any artwork. It is one human attempting to communicate some emotional experience to another human.

When I watch a Lynch film I feel some connection to the man David Lynch. When I see an AI artwork, there is nothing to connect with, no emotional experience is being communicated, it is just empty. Its highest aspiration is elevator music, just being something vaguely stimulating in the background.


I don't agree. If a poem is moving, it's moving. It doesn't matter who wrote it.

I understand these are fundamental questions about aesthetics that people differ over. But that's how it works for me. However, ultimately, I think people will realize that I'm right around the time that AI does start generating good art.


Provenance is part of the work. If a roomful of monkeys banged out something that looked like anything, I'd absolutely hang it on my wall. I would not say the same for 99% of AI generated art.

Whether art is considered good is in practice highly contextual. One of those contexts is who (what) made it.

My only actual use of image or video AI tools is self-entertainment. I like to give it prompts and see the results it gives me.

That's it. I can't think of a single actual use case outside of this that isn't deliberately manipulative and harmful.


The Human Renaissance is something I've been thinking of too and I hope it comes to pass. Of course, I feel like societally, things are gonna get worse for a lot of folks. You already see it in entire towns losing water or their water becoming polluted.

You'd think the kickbacks the leaders of these towns are getting for allowing data centers to be built would go towards improving infrastructure, but hah, that's unrealistic.

WTF is that unrealistic? SMH


>You already see it in entire towns losing water or their water becoming polluted

Do you have any references for such cases? I have seen talk of such a thing being a risk, but I am unaware of any specific instances of it occurring.


I know I've seen such a story on HN before, you can probably find it by searching for "water" and "data center/AI."

The closest match I found was https://news.ycombinator.com/item?id=44562052

The article tries to play sleight of hand with the specific instance that they cite but it seems that the loss of water is alleged to be caused by sediment from construction rather than water use.

It's not great that it happened and it is something local government should take action on, but it is also something that could have been caused by any form of industrial construction. I suspect there are already laws in place that cover this. If they are not being enforced that's another issue entirely.


That's exactly the article I was thinking of.

Data center construction exposing weaknesses in local infrastructure is a double-edged sword; you wanna know if things need upgrading but you don't wanna be negatively affected by it.

Maybe there should be some clause in these contracts that mandate tech companies foot the bill for local infrastructure improvements.


In that case it does not depict the scenario you suggested.

This is not a data center issue at all, it is a construction issue, that it was a data center being constructed was incidental.

I believe there are regulations that cover things like this already.

To characterise it as representative of or specific to data centers is at best disingenuous.


I didn't write the article, man.

I completely disagree; this replaces art as a job. Why does human art need monetary feedback to be shared? If people require a paycheck to make art, then it was never anything different from what AI-generated images are.

As for advertising being depressing - it's a little late to get up on the anti-ads high horse for tech after 2 decades of ad-based technology dominating everything. Go outside, see all those bright shiny glittery lights; those aren't society-created images to embolden the spirit and dazzle the senses, those are ads.

North Korea looks weird and depressing because they don't have ads. Welcome to the west.


AI loopidity rearing its head. Just send the bullet points that we all want anyway, right?! Stop sending globs of text and other generated content!

Porn and memes. Obviously. This is all that Stable Diffusion has been used for since it was released.

We dropped Claude. It's pretty clear this is a race to the bottom, and we don't want a hard dependency on another multi-billion dollar company just to write software.

We'll be keeping an eye on open models (of which we already make good use). I think that's the way forward. Actually it would be great if everybody put more focus on open models; perhaps we can come up with something like the "linux/postgres/git/http/etc" of LLMs: something we can all benefit from without it being monopolized by a single billionaire company. Wouldn't it be nice if we didn't need to pay for tokens? Paying for infra (servers, electricity) is already expensive enough.


>we don't want a hard dependency on another multi-billion dollar company just to write software

One of two main reasons why I'm wary of LLMs. The other is fear of skill atrophy. These two problems compound. Skill atrophy is less bad if the replacement for the previous skill does not depend on a potentially less-than-friendly party.


I was worried about skill atrophy. I recently started a new job, and from day 1 I've been using Claude. 90+% of the code I've written has been with Claude. One of the earlier tickets I was given was to update the documentation for one of our pipelines. I used Claude entirely, starting with having it generate a very long and thorough document, then opening up new contexts and getting it to fact check until it stopped finding issues, and then having it cut out anything that was granular/one query away. And then I read what it had produced.

It was an experiment to see if I could enter a mature codebase I had zero knowledge of, look at it entirely through an AI, and come to understand it.

And it worked! Even though I've only worked on the codebase through Claude, whenever I pick up a ticket nowadays I know what file I'll be editing and how it relates to the rest of the code. If anything, I have a significantly better understanding of the codebase than I would without AI at this point in my onboarding.


Yeah, +1. I will never be working on unsolved problems anyhow. Skill atrophy is not happening if you stay curious and responsible.

I have never learned so quickly in my entire life as when I post a forum thread in its entirety into an extended-thinking LLM and then get to ask free-form questions for 2 hours straight if I want to. Having my questions answered NOW is so important for me to learn. Back in the day, by the time I found the answer online I had forgotten the question.

Same. I work in the film industry, but I’ve always been interested in computers and have enjoyed tinkering with them since I was about 5. However, coding has always been this insurmountably complicated thing - every time I make an effort to learn, I’m confronted with concepts that are difficult for me to understand and process.

I’ve been 90% vibe coding for a year or so now, and I’ve learned so much about networking just from spinning up a bunch of docker containers and helping GPT or Claude fix niggling issues.

I essentially have an expert (well, maybe not an expert but an entity far more capable than I am on my own) whose shoulder I can look over and ask as many questions as I want, and who will explain every step of the process to me if I want.

I’m finally able to create things on my computer that I’ve been dreaming about for years.


I pivoted from the film industry into AI 10 years ago. My end game is to replace movie magic.

Some people talk like skill atrophy is inevitable when you use LLMs, which strikes me as pretty absurd given that you are talking about a tool that will answer an infinite number of questions with infinite patience.

I usually learn way more by having Claude do a task and then quizzing it about what it did than by figuring out how to do it myself. When I have to figure out how to do the thing, it takes much more time, so when I'm done I have to move on immediately. When Claude does the task in ten minutes I now have several hours I can dedicate entirely to understanding.


You lose some, you win some. The short-term win could be much higher; however, imagine that the new tool suddenly gets rug-pulled from under your feet. What do you do then? Do you still know how to handle it the old way, or do you run into skill atrophy issues? I’m using Claude/Codex as well, but I’m a little worried that the environment we work in will become a lot more bumpy and shifty.

> however, imagine that the new tool suddenly gets rug-pulled from under your feet

When you have a headache, do you avoid taking ibuprofen because one day it may not be available anymore? Two hundred years ago, if you gave someone ibuprofen and told them it was the solution for 99% of the cases where they felt some kind of pain, they might be suspicious. Surely that's too good to be true.

But it's not. Ibuprofen really is a free lunch, and so is AI. It's weird to experience, but these kinds of technologies come around pretty often, they just become ubiquitous so quickly that we forget how we got by without them.


> the new tool suddenly gets rug-pulled from under your feet

If that happened at this point, it would be after societal collapse.


I don’t even wanna think about that scenario; maybe it gets averted somehow.

The "infinite patience" thing I find particularly interesting.

Every now and then I pause before I ask an LLM to undo something it just did or answer something I know it answered already, somewhere. And then I remember oh yeah, it's an LLM, it's not going to get upset.


Asking infinite questions about something does not make you good at “doing” that thing; it just makes you pretty good at asking questions.

Understanding is not learning. Zero effort gives zero reward: I ask Claude plenty of things, and I get answers but not learning.

I used to speak Russian like I was born in Russia. I stopped speaking Russian … every day I am curious and responsible, but I can hardly say 10 words in Russian today. If you don’t use it (not just stay curious and responsible) you will lose it - period.

Programming language is not just syntax, keywords and standard libraries, but also: processes, best practices and design principles. The latter group I guess is more difficult to learn and harder to forget.

I respectfully completely disagree. Not only will you just as easily lose the processes, best practices and design principles, but they will also be changing over time (what was best practice when I got my first gig in 1997 is not a best practice today - even just 4-5 years ago, not to go all the way back to the 90’s). All of that is super easy to both forget and lose unless you live it daily.

Forget, yes; lose, no. Like it would be much easier for you to relearn Russian - especially compared with someone who only knows English.

A fairer comparison would be writing/talking about the Russian language in English. That way you'd still focus on Russian. It's the same with programming - it's not like you stop seeing any code. So why should you forget it?

Are you sure you would know if it didn't work? I use Claude extensively myself, so I'm not saying this from a "hater" angle, but I had 2 people last week who believe themselves to be in your shoes send me pull requests which made absolutely no sense in the context of the codebase.

That’s always been the case, AI or not.

No, it hasn't. I did not have a problem before AI with people sending in gigantic pull requests that made absolutely no sense, and justifying them with generated responses that they clearly did not understand. This is not a thing that used to happen. That's not to say people wouldn't have done it if it were possible, but there was a barrier to submitting a pull request that no longer exists.

In my experience, the people sending me garbage PRs with Claude are the same ones who wrote garbage code beforehand. Now there's just 10x more of it.

It just happens to be a lot worse now. Confidence through ignorance has come into the spotlight with the commoditization of LLMs.

Yeah, I test everything myself.

I have also found LLMs are a great tool for understanding a new code base, but it's not clear to me what your comment has to do with skill atrophy.

Well ultimately the skill I care about is understanding software, changing it, and making more of it. And clearly that isn't atrophying.

My syntax writing skills may well be atrophying, but I'll just do a leetcode by hand once in a while.


It's good that it's working for you but I'm not sure what this has to do with skill atrophy. It sounds like you never had this skill (in this case, working with that particular system) to begin with.

>I have a significantly better understanding of the codebase than I would without AI at this point in my onboarding

One of the pitfalls of using AI to learn is the same as I'd see students doing pre-AI with tutoring services. They'd have tutors explain the homework to them and even work through the problems with them. Thing is, any time you see a problem or concept solved, your brain is tricked into thinking you understand the topic enough to do it yourself. It's why people think their job interview questions are much easier than they really are; things just seem obvious when you've thought about the solution. Anyone who's read a tutorial, felt like they understood it well, and then struggled for a while to actually start using the tool to make something new knows the feeling very well. That Todo List app in the tutorial seemed so simple, but the author was making a bunch of decisions constantly that you didn't have to think about as you read it.

So I guess my question would be: If you were on a plane flight with no wifi, and you wanted to do some dev work locally on your laptop, how comfortable would you be vs if you had done all that work yourself rather than via Claude?


> If you were on a plane flight with no wifi, and you wanted to do some dev work locally on your laptop, how comfortable would you be vs if you had done all that work yourself rather than via Claude?

Probably about as comfortable as I would be if I also didn't have my laptop and instead had to sketch out the codebase in a notebook. There's no sense preparing for a scenario where AI isn't available - local models are progressing so quickly that some kind of AI is always going to be available.


So then the argument isn't so much that skill decay isn't an issue but rather that the skill is inherently worthless moving forward. I'm not sure I agree, but I also got a compsci education because I have loved doing it since childhood rather than because I just wanted to make money, and I can see how the latter group would vehemently disagree with me.

What do you mean “cut out anything that was granular/one query away”? This was a very cool workflow to hear about—I will be applying it myself

For example, Claude was very eager to include function names, implementation details, and the exact variables that are passed between services. But all the info I need for a particular process is the names of the services involved, the files involved, and a one-sentence summary of what happens. If I want to know more, I can tell Claude to read the doc and find out more with a single query (or I can just check for myself).

Not so much atrophy as apathy.

I've worked with people who will look at code they don't understand, say "llm says this", and express zero intention of learning something. Might even push back. Be proud of their ignorance.

It's like, why even review that PR in the first place if you don't even know what you're working with?


I cringed when I saw a dev literally copy and paste an AI's response to a concern. The concern was one that had layers and implications to it, but instead of getting an answer as to why it was done a certain way and to allay any potential issues, that dev got a two paragraph lecture on how something worked on the surface of it, wrapped in em dashes and joviality.

A good dev would've read deeper into the concern and maybe noticed potential flaws, and if he had his own doubts about what the concern was about, would have asked for more clarification. Not just feed the concern into AI and fling the answer back. Like please, in this day and age of AI, give people the benefit of the doubt: someone raising a concern would already have checked it with AI himself if he had any doubts about it...


Is this the same subset of people who copy/paste code directly from stack overflow without understanding ? I’m not sure this is a new problem.

It's a new problem in the sense that now executive management at many (if not most) software companies is pushing for all employees to work this way as much as possible. Those same people probably don't know what stack overflow even is.

In my experience, no - I think the ability to build more complete features with less/little/no effort, rather than isolated functions, is (more) appealing to (more) developers.

I don't think so. I'll spend a ton of time and effort thinking through, revising, and planning out the approach, but I let the agent take the wheel when it comes to transpiling that to code. I don't actually care about the code so long as it's secure and works.

I spent years cultivating expertise in C++ and .NET. And I found that time both valuable and enjoyable. But that's because it was a path to solve problems for my team, give guidance, and do so with both breadth and depth.

Now I focus on problems at a higher level of abstraction. I am certain there's still value in understanding ownership semantics and using reflection effectively, but they're broadly less relevant concerns.


It's difficult to copy & paste an entire app from Stack Overflow

Copied and pasted without noting the license that Stack Overflow has on code published there, no doubt.

Hey. I resemble that remark sometimes!! quit being a hater (sarcasm) :P

We've had such developers around, long before LLMs.

They're so much louder now, though.

It’s a lot like someone bragging that they’re bad at math tossing around equations.

If I wanted to know what the LLM says, I would have asked it myself, thanks…

What is it in the broader culture that's causing this?

People who got into the job who don’t really like programming

I like programming, but I don’t like the job.

Then why are you letting Claude do the fun part?

Obviously, the fun part is delivering value for the shareholders.

These people have always existed. Hell, they are here, too. Now they have a new thing to delegate responsibility to.

And no, I don't understand them at all. Taking responsibility for something, improving it, and stewarding it into production is a fantastic feeling, and much better than reading the comment section. :)


You can argue that you will have skill atrophy by not using LLMs.

We have gone to multi-cloud disaster recovery on our infrastructure. Something I would not have done yet, had we not had LLMs.

I am learning at an incredible rate with LLMs.


I kind of feel the same. I’m learning things and doing things in areas that I would just skip due to lack of time or fear.

But I’m so much more detached from the code; I don’t feel that ‘deep neural connection’ from actually spending days locked in a refactor or debugging a really complex issue.

I don’t know how I feel about it.


I strongly agree on the refactor, but for debugging I have another perspective: I think debugging is changing for the better, so it looks different.

Sure, you don't know the code by heart, but people debugging code translated to assembly already do that.

The big difference is being able to unleash scripts that invalidate an enormous number of hypotheses very fast and that can analyze the data.

Doing that by hand used to take hours, so it would be a last-resort approach. Now it's very cheap, so validating many hypotheses is way cheaper!

I feel like my "debugging ability" in terms of value delivered has gone way up. The skill itself is changing - I cannot tell exactly how - but the value I am delivering in debugging sessions has gone way up.


As someone who's switched from mobile to web dev professionally for the last 6 months now. If you care about code quality, you'll develop that neural connection after some time.

But if you don't and there's no PR process (side projects), the motivation to form that connection is quite low.


> If you care about code quality, you'll develop that neural connection after some time.

No, because you can get LLMs to produce high quality code that has gone through an infinite number of refinement/polish cycles and is far more exhaustive than the code you would have written yourself.

Once you hit that point, you find yourself in a directional/steering position divorced from the code since no matter what direction you take, you'll get high quality code.


Only if you never find opportunities to simplify the code it's writing and you don't review the code at all.

> no matter what direction you take, you'll get high quality code

This is not the case today. You get medium-quality, sometimes over-engineered code 10x faster.


Yes, you certainly can argue that, but you'd be wrong. The primary selling point of LLMs is that they solve the problem of needing skill to get things done.

That is not the entire selling point - so you are very wrong.

You very much decide how you employ LLMs.

Nobody is holding a gun to your head to use them in a certain way.

So if you use them in a way that increases your inherent risk, then you are incredibly wrong.


I suggest you read the sales pitches that these products have been making. Again, when I say that this is the selling point, I mean it: This is why management is buying them.

I've read the sales pitches, and they're not about replacing the need for skill. The Claude Design announcement from yesterday (https://www.anthropic.com/news/claude-design-anthropic-labs) is pretty typical in my experience. The pitch is that this is good for designers, because it will allow them to explore a much broader range of ideas and collaborate on them with counterparties more easily. The tool will give you cool little sliders to set the city size and arc width, but it doesn't explain why you would want to adjust these parameters or how to determine the correct values; that's your job.

I understand why a designer might read this post and not be happy about it. If you don't think your management values or appreciates design skill, you'd worry they're going to glaze over the bullet points about design productivity, and jump straight to the one where PMs and marketers can build prototypes and ignore you. But that's not what the sales pitch is focused on.


The majority of examples in the document you linked describe 'a person without <skill> can do a thing needing <skill>'. It's very much selling 'more output, less skill'.

Sales pitches don't mean jack, WTF are you talking about?

Sales pitches are literally the same thing as "the selling point".

Neither of those is necessarily a synonym for why you personally use them


They purportedly solve the problem of needing skill to get things done. IME, this is usually repeated by VC backed LLM companies or people who haven’t knowingly had to deal with other people’s bad results.

This all bumps up against the fact that most people default to “you use the tool wrong” and/or “you should only use it to do things where you already have firm grasp or at least foundational knowledge.”

It also bumps against the fact that the average person is using LLMs as a replacement for standard Google search.


I see it completely the opposite way: you use an LLM and correct all its mistakes, and it allows you to deliver a rough solution very quickly and then refine it in combination with the AI, though it still gets completely lost and stuck on basic things. It’s a very useful companion that you can’t trust, but it’s made me 4-5x more productive and certainly less frustrated by the legacy codebase I work on.

Yeah, I wholeheartedly disagree with this. Because I understand the basics of coding, I can understand where the model gets stuck and prompt it in other directions.

If you don't know what's going on through the whole process, good luck with the end product.


You're learning at your standard rate of learning, you're just feeding yourself over-confidence on how much you're absorbing vs what the LLM is facilitating you rolling out.

This is such a weird statement in so many levels.

The latent assumption here is that learning is zero sum.

That you can take a 30-year-old from 1856, bring them into the present day, and they will learn whatever subject as fast as a present-day 20-year-old.

That teachers doesn't matter.

That engagement doesn't matter.

Learning is not zero sum. Some cultural background makes learning easier, some mentoring makes it easier, and some techniques increase engagement in ways that increase learning speed.


> I am learning at an incredible rate with LLMs

Could you do it again without the help of an LLM?

If no, then can you really claim to have learned anything?


The question is not whether you could do all of it without AI, but whether you can now do any of it that you couldn't before.

Not everyone learns at the same pace and not everyone has the same fault tolerance threshold. In my experience some people are what I call "Japanese learners", perfecting by watching. They will learn with AI but would never do it themselves out of fear of getting something wrong, even though they understand most of it; others, who I call "western learners", will start right away and "get their hands dirty" without much knowledge and also get it wrong right away. Both are valid learning strategies fitting different personalities.


I could definitely maintain the infrastructure without an llm. Albeit much slower.

And yes. If LLMs disappear, then we need to hire a lot of people to maintain the infrastructure.

Which naturally is a part of the risk modeling.


> I could definitely maintain the infrastructure without an llm

Not what I asked, but thanks for playing.


You literally asked that question

> Could you do it again without the help of an LLM?


And the question you answered was "could you maintain it without the help of an LLM"

So, you haven't really learned anything from any teacher if you could not do it again without them?

> So, you havent really learned anything from any teacher if you could not do it again without them?

Well, yes?

What do you think "learning" means? If you cannot do something without the teacher, you haven't learned that thing.


That would be the definition of learning something, yes.

I mean...yeah?

If your child says they've learned their multiplication tables but they can't actually multiply any numbers you give them do they actually know how to do multiplication? I would say no.


For some reason people are perfectly able to understand this in the context of, say, cursive, calculator use, etc., but when it comes to their own skillset somehow it's going to be really different.

Yes that's exactly right.

Yes.

I think this is a bit dismissive.

It’s quite possible to be deep into solving a problem with an LLM guiding you where you’re reading and learning from what it says. This is not really that different from googling random blogs and learning from Stack Overflow.

Assuming everyone just sits there dribbling whilst Claude is in YOLO mode isn’t always correct.


>> I am learning a new skill with instructor at an incredible rate

> Could you do it again on your own?

Can you see how nonsensical your stance is? You're straight up accusing GP of lying about learning something at an increased rate, OR suggesting that if they couldn't learn it, presumably at the same rate, on their own, they're not learning anything.

It's not very wise to project your own experiences onto others.


Actually, it’s much like taking a physics or engineering course, and after the class being fully able to explain the class that day, and yet realize later when you are doing the homework that you did not actually fully understand like you thought you did.

>I am learning at an incredible rate with LLMs.

I don't believe it. Having something else do the work for you is not learning, no matter how much you tell yourself it is.


If you've seen further it's only because you've stood on the shoulders of giants.

Having other people do work for you is how people get to focus on things they actually care about.

Do you use a compiler you didn't write yourself? If so can you really say you've ever learned anything about computers?


You have to build a computer to learn about computers!

I would argue that if you've just watched videos about building computers and haven't sat down and done one yourself, then yeah I don't see any evidence that you've learned how to build a computer.

And, so the anti-LLM argument goes, if you've not built the computer you can't learn anything about what computers could be used for.

That's not the anti-LLM argument, that's a brand new argument you made up.

Did you not read the comment thread you replied to? That's the exact argument that I_love_retros made above.

That is in fact the anti LLM argument you've ostensibly been discussing. If you want to talk to the person who made it up I'm not your guy.


It is easy to not believe if you only apply an incredibly narrow world view.

Open your eyes, and you might become a believer.


What is this, some sort of cult?

You mean the cult of "I can't see the viruses therefore they don't exist"? As in "I can't imagine something so it means it's a lie"?

Indeed, quite weird and no imagination.


No, it is an equally snarky response to a person being snarky about the usefulness of AI agents.

It does seem like there is a cult of people who categorically see LLMs as being poor at anything, without it being founded in any experience other than the one afternoon in 2023 they spent playing around with them.


Who cares? Why are people so invested in trying to “convert” others to see the light?

Can’t you be satisfied with outcompeting “non believers”? What motivates you to argue on the internet about it? Deep down are you insecure about your reliance on these tools or something, and want everyone else to be as well?


Why do people invest so hard in interjecting themselves into conversations about AI to tell people it doesn't work?

It feels so off to be rebuilding serious SaaS apps for production in days, only to be told it is not possible.


Who here said ai “doesn’t work”?

> We have gone multi cloud disaster recovery on our infrastructure. Something I would not have done yet, had we not had LLMs.

That’s product atrophy, not skill atrophy.


Using LLMs as a learning tool isn’t what causes skill atrophy. It’s using them to solve entire problems without understanding what they’ve done.

And not even just understanding, but verifying that they’ve implemented the optimal solution.


It's partly that, but also reading and surface-level understanding of something vs. generating it yourself are different skills with different depths. If you're learning a language, for example, you can get good at listening without getting good at speaking.

Also AI could help you pick those skills up again faster, although you wouldn’t need to ever pick those skills up again unless AI ceased to exist.

What an interesting paradox-like situation.


I believe some professor warned us about being over-reliant on Google/Reddit etc: the “how would you be productive if the internet went down” dilemma.

Well, if the internet is down, so is our revenue, buddy. Engineering throughput would be the last of our concerns.


https://hex.ooo/library/power.html

When future humans rediscover mathematics.


Yeah I am worried about skill atrophy too. Everyone uses a compiler these days instead of writing assembly. Like who the heck is going to do all the work when people forget how to use the low level tools and a compiler has a bug or something?

And don’t get me started on memory management. Nobody even knows how to use malloc(), let alone brk()/mmap(). Everything is relying on automatic memory management.

I mean when was the last time you actually used your magnetized needle? I know I am pretty rusty with mine.


> an LLM is exactly like a compiler if a compiler was a black box hosted in a proprietary cloud and metered per symbol

Yeah, exactly.


Snark aside, this is an actual problem for a lot of developers to varying degrees; not understanding anything about the layers below makes for terrible layers above in very many situations.

Another aspect I haven’t seen discussed too much: if your competitor is 10x more productive with AI, and to stay relevant you also use AI and become 10x more productive, does the business actually grow enough to justify the extra expense? Or are you pretty much in the same state as you were without AI, but you are both paying an AI tax to stay relevant?

This is the “ad tax” reasoning, but ultimately I think the answer is greater efficiency. So there is a real value, even if all competitors use the tools.

It’s like saying clothing manufacturers are paying the “loom tax” when they could have been weaving by hand…


Software development is not a production line, the relationship between code output and revenue is extremely non-linear.

Where producing 2x the t-shirts will get you ~2x the revenue, it's quite unlikely that 10x the code will get you even close to 2x revenue.

With how much of this industry operates on 'Vendor Lock-in' there's a very real chance the multiplier ends up 0x. AI doesn't add anything when you can already 10x the prices on the grounds of "Fuck you. What are you gonna do about it?"


Yep and in a vendor lock in scenario, fixing deep bugs or making additions in surgical ways is where the value is. And Claude helps you do that, by giving you more information, analyzing options, but it doesn’t let you make that decision 10x faster.

We already know how to multiply the efficiency of human intelligence to produce better quality than LLMs and nearly match their productivity - open source - in fact coding LLMs wouldn't even exist without it.

Open source libraries and projects together with open source AI is the only way to avoid the existential risks of closed source AI.


Where's the evidence of competitors being 10x more productive? So far, everyone is simply bragging about how much code they have shipped last week, but that has zero relevance when it comes to productivity

I work at a 20-year-old mid-sized SaaS company. As long as the company has been around, product managers have longed for more engineers and strategies for engineers to ship features faster. As of around February, those same product managers across the org are complaining that they can't keep up with the pace at which engineers are shipping their features. This isn't just lines of code. This is the entire company trying to figure out how to help the PMs because engineers suddenly stopped being the bottleneck.

I don't know about 10x, but this could only happen if PMs suddenly got really lazy or the engineers actually got at least 1.5x faster. My gut says it's way more because we're now also consistently up to date on our dependencies and completing massive refactors we were putting off for years.

There are lots of reasons this could be the case. Quality suddenly changed, the nature of the work changed, engineers leveled up... But for this to have happened consistently across a bunch of engineering teams is quite the coincidence if not this one thing we are all talking about.


Read it as just a given rate. The number doesn’t matter too much here; if company B believes company A's claims that they are N times more productive, that’s enough to force B to adopt the same tooling.

I feel like a lot of the AI advocacy today is like the Cloud advocacy of a few years ago or the Agile advocacy before that. It's this season's silver bullet to make us all 10x more effective according to metrics that somehow never translate into adding actually useful functionality and quality 10x as fast.

The evangelists told us 20 years ago that if we weren't doing TDD then we weren't really professional programmers at all. The evangelists told us 10 years ago that if we were still running stuff locally then we must be paying a fortune for IT admin or not spending our time on the work that mattered. The evangelists this week tell us that we need to be using agents to write all our code or we'll get left in the dust by our competitors who are.

I'm still waiting for my flying car. Would settle for some graphics software on Linux that matches the state of the art on Windows or even reliable high-quality video calls and online chat rooms that don't make continental drift look fast.


The alternative is probably also true. If your F500 competitor is also handicapped by AI somehow, then you're all stagnant, maybe at different levels. Meanwhile Anthropic is scooping up software engineers it supposedly made irrelevant with Mythos and moving into literally 2+ new categories per quarter

Either the business grows, or the market participants shed human headcount to find the optimal profit margin. Isn’t that the great unknown: what professions are going to see headcount reduction because demand can’t grow that fast (like we’ve seen in agriculture), and which will actually see headcount stay the same or even expand, because the market has enough demand to keep up with the productivity gains of AI? Increasingly I think software writ large is the latter, but individual segments in software probably are the former.

It's worse than a tie. 10x everyone and you just flood the market and tank the per-unit price. You pay the AI tax and your output is worth less.

> your competitor is 10x more productive with AI

This doesn't happen. Literally zero evidence of this.


The actual rate isn’t relevant for the discussion

Well it might.

If the actual rate is .9x then it matters a lot.

Or even if it's like 1.1x, is the cost worth the return?


The cost is so small relative to the increase. The whining about cost on HN is bizarre to me. Feels like everyone here is on an individual plan and has no understanding of what margins look like for an actual business.

Meta pays $750k+ TC and makes far more profit/eng, do you think they care about $5k/eng/mo in inference? A 1.1x increase would be so significant that it would justify the cost easily, especially when you can just compress comps to make up for it
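
Back-of-the-envelope, using the figures as quoted above (not verified):

```python
# Rough break-even check on the numbers quoted in the comment above.
tc_per_year = 750_000            # quoted total comp per engineer ($/yr)
inference_per_year = 5_000 * 12  # quoted $5k/eng/month of inference

break_even_gain = inference_per_year / tc_per_year
print(f"Inference pays for itself at a {break_even_gain:.1%} productivity gain")
# -> 8.0%, so on these (quoted, unverified) numbers a 1.1x improvement
#    would already more than cover the spend.
```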


Nobody is whining here.

What? You don't think businesses do financial planning and calculations for profit margins?

Do you really think they go on vibes - "welp, this AI thing seems to improve developer performance, I guess. Heck, what's an extra 5k per developer anyways, amirite".

Well, maybe they really do in your neck of the woods. Explains a lot, I guess.


Yes, most companies do in fact operate like this. There are tens of thousands of companies that will pay more for the best thing and leave it at that, because the cost is dwarfed by what even marginal gains in quality unlock for the business.

> the cost is dwarfed by what even marginal gains in quality

That is just, like, your opinion, man.

Also, I doubt these kinds of companies have "quality" of anything, never mind "gains in quality".


What if the rate is negative?

Would it matter?


It would matter but would be a different discussion than the one I was going for

If the business doesn’t grow then you shed costs like employees

Open models keep closing the eval gap for many tasks, and local inference continues to be increasingly viable. What's missing isn't technical capability, but productized convenience that makes the API path feel like the only realistic option.

Frontier labs are incentivized to keep it that way, and they're investing billions to make AI = API the default. But that's a business model, not a technical inevitability.


I'm hoping and praying that local inference finds its way to some sort of baseline for what we're all depending on Claude for here. That would help shape hardware designs on personal devices, probably something in the direction of what Apple has been doing.

I've had to tune out of the LLM scene because it's just a huge mess. It feels impossible to actually get benchmarks, it's insanely hard to get a grasp on what everyone is talking about, there are bots galore championing whatever model; it's just way too much craze and hype and misinformation. What I do know is we can't keep draining lakes with datacenters and letting companies that are willing to heel-turn on a whim basically control the output of all companies. That's not going to work; we collectively have to find a way to make local inference the path forward.

Everyone's foot is on the gas. All orgs, all execs, all people working jobs. There's no putting this stuff down, and it's exhausting, but we have to be using Claude like _right now_. Pretty much every company is already completely locked in to OpenAI/Gemini/Claude - and for some unfortunate ones, Copilot. This was a utility vendor lock-in capture that happened faster than anything I've ever seen in my life, and I am already desperate for a way to get my org out of this.


I'm frustrated that there's no "solid" instructional tooling. I either see people just saying "keep trying different prompts and switching models until you get lucky" or building huge cantilevered toolchains that seem incredibly brittle, and even then, how well do they really work?

I get choice paralysis when you show me a prompt box-- I don't know what I can reasonably ask for and how to best phrase it, so I just panic. It doesn't help when we see articles saying people are getting better outcomes by adding things like "and no bugs plz owo"

I'm sure this is by design-- anything with clear boundaries and best practices would discourage gacha style experimentation. Can you trust anyone who sells you a metered service to give you good guidance on how to use it efficiently?


Yeah, that is probably the worst part of these techs becoming mainstream services and local-LLM'ing taking off in general: working with them at many points in any architecture no longer feels... deterministic, I guess. Way too fucking much "here's what I use" but no real best practices yet, just a lot of vague gray area; everyone is still in discovery mode on how best to find some level of determinism or workflow, and the way we are benchmarking is seriously a moving target. Everyone has their own branded take on what the technology is and their own branded approach on how to use it, and it's probably the murkiest and foggiest time to be in technology fields that I've ever seen :\ It seems like weekly/monthly something is outdated - not just the models but the tooling people are parroting as the current best tooling to use. Incredibly frustrating. There's simply too much ground to cover for any one person to have any absolute takes on any of it, and because a handful of entities are currently leading the charge, draining lakes and trying to compete for every person's and every business's money, there are zero organized frameworks at the top to make some sense of this. They are all banking on their secret sauce, and I _really_ want us all to get away from this. Local inference has to succeed imo, but goddamn, there needs to be some collective working together to rally behind some common strategies/frameworks here. I'm sure countless committees have already been established to try to get in front of this, but even that's messy.

I don't know how else to phrase it: this feels like such an unstable landscape. "Beta" software/services are running rampant in every industry/company/org/etc, and there's absolutely no single resource we can turn to to help stay ahead of and plan for the rapidly evolving landscape. Every company - and I mean every company - is incredibly irresponsible for using this stuff, including my own. Once again though, the cat's already out of the bag. Now we fight for our lives trying to contain it and ensure things are well understood and implemented properly... which seems to be the steepest uphill battle of my life.


I'm hopeful that new efficiencies in training (DeepSeek et al.), the impressive performance of smaller models enhanced through distillation, and a glut of past-their-prime-but-functioning GPUs all converge to make good-enough open/libre models cheap, ubiquitous, and less resource-intensive to train and run.

> we don't want a hard dependency on another multi-billion dollar company just to write software

My manager doesn't even want us to use Copilot locally. Now we are supposed to only use the GitHub Copilot cloud agent. One shot from prompt to PR. With people like that selling vendor lock-in for them, companies like GitHub, OpenAI, Anthropic etc. don't even need sales and marketing departments!


You are aware that using e.g. GitHub Copilot is not one-shot? It will start an agentic loop.

Unnecessary nitpicking

Why?

One-shotting has a very specific meaning, and agentic workflows are not it?

What is the implied meaning I should understand from them using one shot?

They might refer to the lack of humans in the loop.


You give a prompt, you get a PR. If it is ready to merge with the first attempt, that’s a one shot. The agentic loop is a detail in their context

The lock-in is so incredibly weak. I could switch to whatever provider in minutes.

But it requires that one does not do something stupid.

Eg. For recurring tasks: keep the task specification in the source code and just ask Claude to execute it.

The same with all documentation, etc.


What open models are truly competing with both Claude Code and Opus 4.7 (xhigh) at this stage?

Spent a lot of time with "open models." None of them come close. They are benchmaxxed. But you won't hear many of the open model fans on HN admit this.

The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?

No one working on a serious project at a serious company is downgrading their agent's intelligence for a marginal cost saving. Downgrading your model is like downgrading the toilet paper on your yacht.


> The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?

I agree that people who claim that open models are as good as claude/openai/z are lying, delusional, or not doing very much. I've tried them all, including GLM 5.1.

GLM is not bad but the hardware needed will never recoup the ROI vs just using a commercial provider through its API.

That being said, you're being reductive here. For many use cases local models offer advantages that can't be obtained through a commercial API: privacy, ownership of the entire stack, predictability. They can't be rug-pulled, they can't snitch on you. They will not give you a 503.

Those advantages are very valuable for things like a local assistant, as an agent, for data extraction, for translations, for games (role playing and whatnot), etc.

That being said I know that many people are like you, they don't give a second thought about privacy. They'd plug Anthropic to their brain if they could. So I understand the sentiment. I just think that you should in turn try to understand why someone would use an open model.


GLM 5.1 getting 5% on ARC-AGI 2 private is all anyone needs to know.

I've had a good experience with GLM-5.1. Sure it doesn't match xhigh but comes close to 4.6 at 1/3rd the cost

1/3? Try 2/13 :P

5.1 is like $4 / 1M output tokens, Opus 4.6 is $25. GPT 5.4 Pro is $270 with large contexts :O


GLM 5.1 competes with Sonnet. I'm not confident about Opus, though they claim it matches that too.

I have it as failover to Opus 4.6 in a Claude proxy internally. People don't notice a thing when it triggers, maybe a failed tool call here and there (harness remains CC not OC) or a context window that has gone over 200k tokens or an image attachment that GLM does not handle, otherwise hunky-dory all the way. I would also use it as permanent replacement for haiku at this proxy to lower Claude costs but have not tried it yet. Opus 4.7 has shaken our setup badly and we might look into moving to Codex 100% (GLM could remain useful there too).
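
For the curious, the failover pattern is roughly this. A minimal sketch, assuming OpenAI/Anthropic-style chat endpoints on both sides; the URLs, model names, and error handling are illustrative, not our actual proxy.

```python
# Sketch of a primary/fallback chat proxy: try the primary provider,
# fall back to a GLM endpoint on errors. All URLs and model names are
# illustrative assumptions, not the internal setup described above.
import requests

PRIMARY  = {"url": "https://llm-proxy.internal/anthropic/v1/messages", "model": "claude-opus"}
FALLBACK = {"url": "https://llm-proxy.internal/glm/v1/messages", "model": "glm-4.6"}

def chat(messages: list[dict], timeout: int = 120) -> dict:
    """Send a chat request to the primary model, falling back on failure."""
    for target in (PRIMARY, FALLBACK):
        try:
            resp = requests.post(
                target["url"],
                json={"model": target["model"], "messages": messages},
                timeout=timeout,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            # Rate limit, outage, or a request the fallback can't serve
            # (e.g. >200k-token contexts or image attachments for GLM):
            # move on to the next target.
            continue
    raise RuntimeError("all providers failed")
```

The harness on top (Claude Code in our case) never knows which backend answered, which is exactly why nobody notices the switch apart from the occasional failed tool call.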

That's a lame attitude. There are local models that are last year's SOTA, but that's not good enough because this year's SOTA is even better yet still...

I've said it before and I'll say it again, local models are "there" in terms of true productive usage for complex coding tasks. Like, for real, there.

The issue right now is that buying the compute to run the top end local models is absurdly unaffordable. Both in general but also because you're outbidding LLM companies for limited hardware resources.

If you have a $10K budget, you can legit run last year's SOTA agentic models locally and do hard things well. But most people don't or won't, nor does it make cost-effective sense vs. currently subsidized API costs.


I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can. I would love to be able to say I have X technique for compensating for the model shortfall, but my experience so far has been that bigger, later models outperform older, smaller ones. I genuinely hope this changes though. I understand the investment that it has taken to get us to this point, but intelligence doesn't seem like it's something that should be gated.

Right; but every major generation has had diminishing returns on the last. Two years ago the difference was HUGE between major releases, and now we're discussing Opus 4.6 vs. 4.7 and people cannot seem to agree if it is an improvement or regression (and even their data in the card shows regressions).

So my point is: If you have the attitude that unless it is the bleeding edge, it may have well not exist, then local models are never going to be good enough. But truth is they're now well exceeding what they need to be to be huge productivity tools, and would have been bleeding edge fairly recently.


I feel like I'm going to have to try the next model. For a few cycles yet. My opinion is that Opus 4.7 is performing worse for my current workflow, but 4.6 was a significant step up, and I'd be getting worse results and shipping slower if I'd stuck with 4.5. The providers are always going to swear that the latest is the greatest. Demis Hassabis recently said in an interview that he thinks the better funded projects will continue to find significant gains through advanced techniques, but that open source models figure out what was changed after about 6 months or so. We'll see I guess. Don't get me wrong, I'd love to settle down with one model and I'd love it to be something I could self host for free.

> I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can.

Don't you understand that by choosing the best model we can, we are, collectively, step by step devaluing what our time is worth? Do you really think we can all keep our fancy paychecks while we keep using AI?


Do you think if you or I stopped using AI that everyone else would too? We're still what we always were - problem solvers who have gained the ability to learn and understand systems better than the general population, and communicate clearly (to humans and now AIs). Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever. As the amount of software grows, so will the need for people who know how to manage the complexity that comes with it.

> Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever.

There were always jobs that required those "many more skills" but didn't require any programming skills.

We call those people Business Analysts and you could have been doing it for decades now. You didn't, because those jobs paid half what a decent/average programmer made.

Now you are willingly jumping into that position without realising that the lag between your pay and that role's value (i.e. half your salary, or less) will eventually disappear.


I guess we will need to wait and see if AI can remove ALL of the complexity that requires a software engineer over a business analyst. I can't currently believe that it will. BA's I've worked with vary in technical capability from 'having coded before and understanding DB schema basics and network architecture' to 'I know how the business works but nothing about computers'. If we got to the point in the future where every computer system ran on the same frameworks in the same way, and AI understood it perfectly, then maybe. But while AI is a probabilistic technology manipulating deterministic systems, we will always need people to understand whats going on, and whether they write a lot of code or not, they will be engineers, not analysts. Whether it's more or less of those people, we will see.

> If we got to the point in the future where every computer system ran on the same frameworks in the same way, and AI understood it perfectly, then maybe.

They don't need to all run on the same frameworks, they just need to run on documented frameworks.

What possible value can you bring to a BA?

The system topology (say, if the backend was microservices vs Lambda vs something-else)? The LLM can explain to the BA what their options are, and the impact of those options.

The framework being used (Vue, or React, or something else)? The AI can directly twiddle that for the BA.

Solving a problem? If the observability is set up, the LLM can pinpoint almost all the problems too, and with a separate UAT or failover-type replica, can repro, edit, build, deploy and test faster than you can.

Like I already said, if[1] you're now able to build or enhance a system without actually needing programming skills, why are you excited about that? You could always do that. It's just that it pays half of what programming skills get you.

You (and many others who boast about not writing code since $DATE) appear to be willingly moving to a role that already pays less, and will pay even less once the candidates for that role double (because now all you programmers are shifting towards it).

It's supply and demand, that's all.

--------------

[1] That's a very big "If", I think. However, the programmers who are so glad to not program appear to believe that it's a very small "If", because they're the ones explaining just how far the capabilities have come in just a year, and expect the trend to continue. Of course, if the SOTA models never get better than what we have now, then, sure - your argument holds - you'll still provide value.


First, making sure to offer an upvote here. I happen to be VERY enthusiastic about local models, but I've found them to be incredibly hard to host, incredibly hard to harness, and, despite everything, remarkably powerful if you are willing to suffer really poor token/second performance...

> that are last year's SOTA

Early last year or late last year?

opus 4.5 was quite a leap


$10k is a lot of tokens.

At the rate it's consuming now, I'd probably blow through $10k in a month, easily.

>perhaps we can come up with something like the "linux/postgres/git/http/etc" of the LLMs

I fear that this may not be feasible in the long term. The open-model free ride is not guaranteed to continue forever; some labs offer them for free for publicity after receiving millions in VC money, but that's not a sustainable business model. Models cost millions or billions in infrastructure to train. It's not like open-source software where people can just volunteer their time for free; here we are talking about spending real money upfront, for something that will become obsolete in months.

Current AI model "production" is more akin to an industrial endeavor than open-source arrangements we saw in the past. Until we see some breakthrough, I'm bearish on "open models will eventually save us from reliance on big companies".


"get obsolete in months"

If you mean obsolete in the sense of "no longer fit for purpose" I don't think that's true. They may become obsolete in terms of "can't do hottest new thing" but that's true of pretty much any technology. A capable local model that can do X will always be able to do X, it just may not be able to do Y. But if X is good enough to solve your problem, why is a newer better model needed?

I think if we were able to achieve ~Opus 4.6 level quality in a local model that would probably be "good enough" for a vast number of tasks. I think it's debatable whether newer models are always better - 4.7 seems to be somewhat of a regression for example.


I can recommend this stack. It works well with the existing Claude skills I had in my code repos:

1. Opencode

2. Fireworks AI: GLM 5.1

And it is SIGNIFICANTLY cheaper than Claude. I'm waiting eagerly for something new from Deepseek. They are going to really show us magic.


it is also significantly less capable than claude

That's fine. When the "best of the best" is offered only by a couple of companies that are not looking out for our best interests, then we can discard them.

Any recommendations on good open ones? What are you using primarily?

LMArena actually has a nice Pareto frontier plot of ELO vs price for this

  model                        elo   $/M
  ---------------------------------------
  glm-5.1                      1538  2.60
  glm-4.7                      1440  1.41
  minimax-m2.7                 1422  0.97
  minimax-m2.1-preview         1392  0.78
  minimax-m2.5                 1386  0.77
  deepseek-v3.2-thinking       1369  0.38
  mimo-v2-flash (non-thinking) 1337  0.24
https://arena.ai/leaderboard/code?viewBy=plot&license=open-s...

LMArena isn't very useful as a benchmark, however I can vouch for the fact that GLM 5.1 is astonishingly good. Several people I know who have a $100/mo Claude Code subscription are considering cancelling it and going all in on GLM, because it's finally gotten (for them) comparable to Opus 4.5/6. I don't use Opus myself, but I can definitely say that the jump from the (imvho) previous best open weight model Kimi K2.5 to this is otherworldly — and K2.5 was already a huge jump itself!

qwen3.5/3.6 (30B) works well, locally, with opencode

Mind you, a 30B model (3B active) is not going to be comparable to Opus. There are open models that are near-SOTA but they are ~750B-1T total params. That's going to require substantial infrastructure if you want to use them agentically, scaled up even further if you expect quick real-time response for at least some fraction of that work. (Your only hope of getting reasonable utilization out of local hardware in single-user or few-users scenarios is to always have something useful cranking in the background during downtime.)

For a business with ten or more engineers/people-using-ai, it might still make sense to set this up. For an individual though, I can’t imagine you’d make it through to positive ROI before the hardware ages out.

It's hard to tell for sure because the local inference engines/frameworks we have today are not really that capable. We have barely started exploring the implications of SSD offload, saving KV-caches to storage for reuse, setting up distributed inference in multi-GPU setups or over the network, making use of specialty hardware such as NPUs etc. All of these can reuse fairly ordinary, run-of-the-mill hardware.

Since you need at least a few pieces of H100-class hardware, I guess you need at least a few tens of coders to justify the costs.

I see the 512GB Mac Studios aren’t for sale anymore but that was a much cheaper path

I'm backing up a big dataset onto tapes, so I wanted to automate it. I have an idle 64GB VRAM setup in my basement, so I decided to experiment and tasked it with writing an LTFS implementation. LTFS is an open standard for filesystems for tapes, and there's an implementation in C that can be used as the baseline.

In the last 2 days, Qwen 3.6 has created a functionally equivalent Golang implementation that works against the flat file backend. I'm extremely impressed.


It is surprisingly competent. It's not Opus 4.6 but it works well for well structured tasks.

What near SOTA open models are you referring to?

I want to bump this more than just a +1 by recommending everyone try out OpenCode. It can still run on a Codex subscription so you aren’t in fully unfamiliar territory but unlocks a lot of options.

The Codex TUI harness is also open source and you can use open models with it, so you can stay in even more familiar territory.

pi-coding-agent (pi.dev) is also great. I've been using it with Gemma 4 and Qwen 3.6.

The thing I dislike about OpenCode is the limited capabilities of its editor. It's also resource intensive: for some reason, on a VM it chokes every 30 minutes and I need to discard all sessions, commits, etc.

I don't know if it is Bun related, but in Task Manager it is almost always at the top of CPU usage. It turns out that, for me, Bun is not production ready at all.

Wish Zed editor had something like BigPickle which is free to use without limits.


> turns out for me, bun is not production ready

What issue did you run into?


Is this sort of setup tenable on a consumer MBP or similar?

Qwen’s 30B models run great on my MBP (M4, 48GB) but the issue I have is cooling - the fan exhaust is straight onto the screen, which I can’t help thinking will eventually degrade it, given the thermal cycling it would go through. A Mac Studio makes far more sense for local inference just for this reason alone.

For a 30B model, you want at least 20GB of VRAM and a 24GB MBP can’t quite allocate that much of it to VRAM. So you’d want at least a 32GB MBP.
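A rough back-of-envelope, as a sketch only (the quantization level, layer count, and context length below are assumptions, not measurements):

  # Rough memory estimate for a quantized local model plus its KV cache.
  def weights_gb(params_b: float, bits: int = 4) -> float:
      return params_b * 1e9 * bits / 8 / 1e9

  def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, ctx: int, bytes_per_val: int = 2) -> float:
      # 2x for keys and values
      return 2 * layers * kv_heads * head_dim * ctx * bytes_per_val / 1e9

  print(f"30B weights @ 4-bit: ~{weights_gb(30):.0f} GB")
  # Assumed architecture figures, for illustration only:
  print(f"KV cache (48 layers, 8 KV heads x 128 dim, 32k ctx): ~{kv_cache_gb(48, 8, 128, 32768):.1f} GB")

Which is roughly why 24GB of unified memory ends up right on the edge once the OS takes its share.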

I have 24GB VRAM available and haven't yet found a decent model or combination. The last one I tried was Qwen with Continue; I guess I need to spend more time on this.

Is there any model that practically compares to Sonnet 4.6 in code and vision and runs on home-grade (12G-24G) cards?

I'm currently running a custom Gemma 4 26B MoE model on my 24GB M2... super fast, and it beat DeepSeek, ChatGPT, and Gemini in 3 different puzzles/code challenges I tested it on. The issue now is the low context... I can only do 2048 tokens with my VRAM... the gap with the frontier models is slowly closing

It's a MoE model so I'd assume a cheaper MBP would simply result in some experts staying on CPU? And those would still have a sizeable fraction of the unified memory bandwidth available.

I haven't tried this myself yet, but you would still need enough non-VRAM RAM available to the CPU to offload to the CPU, right? This is a fully novice question; I have not ever tried it.

You're correct. If you don't have enough RAM for the model, it can still run but most of it will run on the CPU and be continuously reloaded from the SSD (through mmap).

A medium MoE like 35B can still achieve usable speeds in that setup, mind you, depending on what you're doing.
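For what it's worth, this is roughly what partial offload looks like with llama-cpp-python; the model filename and layer count here are assumptions for whatever fits your hardware:

  # pip install llama-cpp-python
  from llama_cpp import Llama

  llm = Llama(
      model_path="models/qwen-30b-a3b-q4_k_m.gguf",  # hypothetical local GGUF file
      n_gpu_layers=24,  # offload what fits in VRAM; remaining layers run on CPU
      use_mmap=True,    # weights are paged in from disk on demand
      n_ctx=8192,
  )
  out = llm("Summarise what an LTFS index file contains.", max_tokens=256)
  print(out["choices"][0]["text"])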


The Mac Minis (probably 64GB RAM) are the most cost effective.

How are you running it with opencode, any tips/pointers on the setup?

GLM 5.1 via an infra provider. Running a competent coding capable model yourself isn't viable unless your standards are quite low.

What infra providers are there?

There's DeepInfra. There's also OpenRouter where you can find several providers.
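In case it helps, here's a minimal sketch of pointing the stock OpenAI client at OpenRouter's OpenAI-compatible endpoint; the model slug and the env var name are assumptions, so check the provider's model list:

  # pip install openai
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://openrouter.ai/api/v1",
      api_key=os.environ["OPENROUTER_API_KEY"],  # assumed env var name
  )

  resp = client.chat.completions.create(
      model="z-ai/glm-5.1",  # hypothetical slug; check the provider's model list
      messages=[{"role": "user", "content": "Refactor this function to remove the global state."}],
  )
  print(resp.choices[0].message.content)

Agent harnesses like OpenCode can generally be pointed at the same kind of OpenAI-compatible endpoint.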

I am using GLM 5.1 and MiniMax 2.7.

> open models

Google just released Gemma 4, perhaps that'd be worth a try?


I'm increasingly thinking the same as our spend on tokens goes up.

If you have HPC or Supercompute already, you have much of the expertise on staff already to run models locally, and between Apple Silicon and Exo there are some amazing solutions out there.

Now, if only the rumors about Exo expanding to Nvidia are true..


>perhaps we can come up with something like the "linux/postgres/git/http/etc" of the LLMs: something we all can benefit from while it not being monopolized by a single billionarie company

Training and inference cost money, so we would have to pay for them.


Developing linux/postgres/git also costs, and so do the computers and electricity they use.

My understanding is that the major part of the cost of a given model is the training - so open models depend on the training that was done for frontier models? I find it hard to imagine (e.g.) RLHF being fundable through a free-software-type arrangement.

No, the training between proprietary and open models is completely different. The speculation that open models might be "distilled" from proprietary ones is just that, speculation, and a large portion of it is outright nonsense. It's physically possible to train on chat logs from another model but that's not "distilling" anything, and it's not even eliciting any real fraction of the other model's overall knowledge.

I don't know what to make of it, I am skeptical of OpenAI/Anthropic claims about distillation, but I did notice DeepSeek started sounding a lot like Claude recently.

This is part of the reason why I'm really worried that this is all going to result in a greater economic collapse than I think people are realizing.

I think companies that are shelling out the money for these enterprise accounts could honestly just buy some H100 GPUs and host the models themselves on premises. GitHub Copilot Enterprise charges $40 per user per month (this can vary depending on your plan of course), but at this price for 1000 users that comes out to $480,000 a year. Maybe I'm missing something, but that's roughly what you're going to be spending to get a full-fledged hosting setup for LLMs.
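The arithmetic, with the hardware side as a loose assumption (H100 prices vary a lot by vendor and over time):

  users = 1000
  per_user_month = 40
  print(f"Subscription: ${users * per_user_month * 12:,}/yr")  # $480,000/yr

  # Assumed GPU list price, for comparison only -- excludes servers, power, and staff.
  h100_price, gpus = 30_000, 8
  print(f"One 8xH100 node (GPUs only): ~${h100_price * gpus:,}")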


Most companies don't want to host it themselves. They want someone to do it for them, and they are happy to pay for it. If it makes their lives easier and does not add complexity, then it has a lot of value.

Out of curiosity, how many concurrent users could you get with a hosting setup at that price? If let's say 10% of those 1000 users were using it at the same time would it handle it? What about 30% or 100%?

You made a good point that I didn't think through fully. It's the concurrent user aspect that heavily impacts things. Currently, you'd probably need quite a bit more investment to the point of having a mini data center to do what I'm proposing.

However, we've been seeing advancements in compressing context and in the capabilities of smaller models, so I don't think it'd be too far off to see something like what I'm talking about within the next 5 years.
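For what it's worth, a toy capacity model makes the concurrency question concrete; every number below is an assumption for illustration, not a benchmark:

  node_tokens_per_sec = 3000   # assumed aggregate decode throughput of one 8-GPU node, batched
  tokens_per_session_sec = 30  # assumed burn rate of one interactive coding session
  sessions_per_node = node_tokens_per_sec // tokens_per_session_sec  # ~100

  users = 1000
  for active in (0.10, 0.30, 1.00):
      nodes_needed = users * active / sessions_per_node
      print(f"{active:.0%} concurrently active -> ~{nodes_needed:.0f} node(s)")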


Is that why they are racing to release so many products? It feels to me like they want to suck up the profits from every software vertical.

Yeah it seems so. Anthropic has entered the enshittification phase. They got people hooked onto their SOTAs so it's now time to keep releasing marginal performance increase models at 40% higher token price. The problem is that both Anthropic and OpenAI have no other income other than AI. Can't Google just drown them out with cheaper prices over the long run? It seems like an attrition battle to me.

yep!! had similar thoughts on the "linux/postgres/git/http/etc" of the LLMs

made a HN post of my X article on the lock-in factor and how we should embrace the modular unix philosophy as a way out: https://news.ycombinator.com/item?id=47774312


Who’s your “we,” if you don’t mind sharing? I’m curious to learn more about companies/organizations with this perspective.

I’m imagining a (private/restricted) tracker style system where contributors “seed” compute and users “leech”.

> I think that's the way forward. Actually it would be great if everybody would put more focus on open models,

I'm still surprised top CS schools are not investing in having their students build models. I know some are, but when's the last time we talked about a model made not by some company but by a college or university, maintained by the university and useful for all?

It's disgusting that OpenAI still calls itself "Open AI" when they aren't truly open.


Open models are only near SOTA because of distillation from closed models.

Opencode with open models is pretty good

or just use codex

I struggle to understand the "hackers" in HN vouching for proprietary LLMs. Like, we have so much good open source software that is top notch, like linux, git, postgres, http, tcp/ip, and a long etc., and now we have these billionaires trying to make us use LLMs for coding at a hefty price.

I understand it from people like PG and the like, but real hackers? C'mon people


Honest question: if you're using multiple agents, it's usually to produce not a dozen lines of code. It's to produce a big enough feature spanning multiple files, modules and entry points, with tests and all. So far so good. But once that feature is written by the agents... wouldn't you review it? Like reading line by line what's going on and detecting if something is off? And wouldn't that part, the manual reviewing, take an enormous amount of time compared to the time it took the agents to produce it? (you know, it's more difficult to read other people's/machine code than to write it yourself)... meaning all the productivity gained is thrown out the door.

Unless you don't review every generated line manually, and instead rely on, let's say, UI e2e testing, or perhaps unit testing (that the agents also wrote). I don't know, perhaps we are past the phase of "double check what agents write" and are now in the phase of "ship it. if it breaks, let agents fix it, no manual debugging needed!" ?


Here's what I suggest:

Serious planning. The plans should include constraints, scope, escalation criteria, completion criteria, test and documentation plan.

Enforce single responsibility, cqrs, domain segregation, etc. Make the code as easy for you to reason about as possible. Enforce domain naming and function / variable naming conventions to make the code as easy to talk about as possible.

Use code review bots (Sourcery, CodeRabbit, and Codescene). They catch the small things (violations of contract, antipatterns, etc.) and the large (ux concerns, architectural flaws, etc.).

Go all in on linting. Make the rules as strict as possible, and tell the review bots to call out rule subversions. Write your own lints for the things the review bots are complaining about regularly that aren't caught by lints.
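As an illustration of "write your own lints", a project-specific check can be a few dozen lines of AST walking wired into CI; the forbidden call here is just a stand-in:

  # lint_no_utcnow.py -- fail CI if anyone calls datetime.utcnow()
  import ast
  import sys

  def violations(path: str):
      tree = ast.parse(open(path, encoding="utf-8").read(), filename=path)
      for node in ast.walk(tree):
          if (isinstance(node, ast.Call)
                  and isinstance(node.func, ast.Attribute)
                  and node.func.attr == "utcnow"):
              yield f"{path}:{node.lineno}: use timezone-aware datetimes instead of utcnow()"

  if __name__ == "__main__":
      found = [v for f in sys.argv[1:] for v in violations(f)]
      print("\n".join(found))
      sys.exit(1 if found else 0)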

Use BDD alongside unit tests, read the .feature files before the build and give feedback. Use property testing as part of your normal testing strategy. Snapshot testing, e2e testing with mitm proxies, etc. For functions of any non-trivial complexity, consider bounded or unbounded proofs, model checking or undefined behaviour testing.
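To make the property-testing point concrete, here's a sketch with Hypothesis; the function under test is hypothetical, and the point is that the test states an invariant rather than a handful of examples:

  # pip install hypothesis pytest
  from hypothesis import given, strategies as st

  from myapp.pricing import apply_discount  # hypothetical function under test

  @given(price=st.decimals(min_value=0, max_value=10_000, places=2),
         pct=st.integers(min_value=0, max_value=100))
  def test_discount_never_increases_price(price, pct):
      # A discounted price must stay between zero and the original price.
      discounted = apply_discount(price, pct)
      assert 0 <= discounted <= price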

I'm looking into mutation testing and fuzzing too, but I am still learning.

Pause for frequent code audits. Ask an agent to audit for code duplication, redundancy, poor assumptions, architectural or domain violations, TOCTOU violations. Give yourself maintenance sprints where you pay down debt before resuming new features.

The beauty of agentic coding is, suddenly you have time for all of this.


> Serious planning. The plans should include constraints, scope, escalation criteria, completion criteria, test and documentation plan.

I feel like I am a bit stupid for not being able to do this. My process is more iterative. I start working on a feature, then I discover some other function that's slightly related, go refactor into common code, then proceed with the original task. Sometimes I stop midway and see if this can be done with a library somewhere and go look at an example. I take many detours like these. I am never working on a single task like a robot, and I don't want Claude to work like that either. That seems so opposite to how my brain works.

What am I missing?


Again, here's what works for me.

When I get an idea for something I want to build, I will usually spend time talking to ChatGPT about it. I'll request deep research on existing implementations, relevant technologies and algorithms, and a survey of literature. I find NotebookLM helps a lot at this point, as does Elevenreader (I tend to listen to these reports while walking or doing the dishes or what have you). I feed all of those into ChatGPT Deep Research along with my own thoughts about the direction of the system, and ask it to produce a design document.

That gets me something like this:

https://github.com/leynos/spycatcher-harness/blob/main/docs/...

If I need further revisions, I'll ask Codex or Claude Code to do those.

Finally, I break that down into a roadmap of phases, steps and achievable tasks using a prompt that defines what I want from each of those.

That gets me this:

https://github.com/leynos/spycatcher-harness/blob/main/docs/...

Then I use an adapted version of OpenAI's execplans recipe to plan out each task (https://github.com/leynos/agent-helper-scripts/blob/main/ski...).

The task plans end up looking like this:

https://github.com/leynos/spycatcher-harness/blob/main/docs/...

At the moment, I use Opus or GPT-5.4 on high to generate those plans, and Sonnet or GPT-5.4 medium to implement.

The roadmap and the design are definitely not set in stone. Each step is a learning opportunity, and I'll often change the direction of the project based on what I learn during the planning and implementation. And of course, this is just what works for me. The fun of the last few months has been everyone finding out what works for them.


You seem to work a lot like how I do. If that is being stupid, then well, count me in too. To be honest, if I had to go through all the work of planning, scope, escalation criteria, etc., then I would probably be better off just writing the damn code myself at that point.


I see lots of posts, like the Stripe minions one, where they just type a feature into Slack chat and an agent goes and does it. That doesn't make any sense to me.


To be devil's advocate:

Many of those tools are overpowered unless you have a very complex project that many people depend on.

The AI tools will catch the most obvious issues, but will not help you with the most important aspects (e.g. whether your project is useful, or the UX is good).

In fact, having this complexity from the start may kneecap you (the "code is a liability" cliché).

You may be "shipping a lot of PRs" and "implementing solid engineering practices", but how do you know if that is getting closer to what you value?

How do you know that this is not actually slowing you down?


It depends a lot on what kind of company you are working at. For my work, the product concerns are taken care of by other people; I'm responsible for technical feasibility, alignment, and design, but not for what features should be built, validating whether they are useful and add value, etc. Product people take care of that.

If you are solo or in a small company you apply the complexity you need, you can even do it incrementally when you see a pattern of issues repeating to address those over time, hardening the process from lessons learnt.

Ultimately the product discussion is separate from the engineering concerns on how to wrangle these tools, and they should meet in the middle so overbearing engineering practices don't kneecap what it is supposed to do: deliver value to the product.

I don't think there's a hard set of rules that can be applied broadly, the engineering job is to also find technical approaches that balance both needs, and adapt those when circumstances change.


On the one side I reject that product and engineering concerns are separated: Sometimes you want to avoid a feature due to the way it will limit you in the future, even if the AI can churn it in 2 minutes today.

On the other side perhaps your company, like most, does not know how to measure overengineering, cognitive complexity, lack of understanding, balancing speed/quality, morale, etc. but they surely suffer the effects of it.

I suspect that unless we get fully automated engineering / AGI soon, companies that value engineers with good taste will thrive, while those that double down into "ticket factory" mode will stagnate.


> On the one side I reject that product and engineering concerns are separated: Sometimes you want to avoid a feature due to the way it will limit you in the future, even if the AI can churn it in 2 minutes today.

That is exactly not what I meant, I'm sorry if it wasn't clear but your assumption about how my job works is absolutely wrong.

I even mention that the product discussion is separate only on "how to wrangle these tools":

> Ultimately the product discussion is separate from the engineering concerns on how to wrangle these tools, and they should meet in the middle so overbearing engineering practices don't kneecap what it is supposed to do: deliver value to the product.

Delivering value, which means also avoiding a feature that will limit or entrap you in the future.

> On the other side perhaps your company, like most, does not know how to measure overengineering, cognitive complexity, lack of understanding, balancing speed/quality, morale, etc. but they surely suffer the effects of it.

We do measure those and are quite strict about it, most of my design documents are about the trade-offs in all of those dimensions. We are very critical about proposals that don't consider future impacts over time, and mostly reject workarounds unless absolutely necessary (and those require a phase-out timeline for a more robust solution that will be accounted for as part of the initiative, so the cost of the technical debt is embedded from the get-go).

I believe I wasn't clear and/or you misunderstood what I said. I agree with you on all these points, and the company I work for is very much the opposite of a "ticket factory". Rejecting work over concerns about its overall cross-boundary impact is very much praised, and invited.

My comment was focused on how to wrangle these tools for engineering purposes being a separate discussion to the product/feature delivery, it's about tool usage in the most technical sense, which doesn't happen together with product.

We on the engineering side determine how to best apply these tools for the product we are tasked on delivering, the measuring of value delivered is outside and orthogonal to the technical practices since we already account for the trade-offs during proposal, not development time. This measurement already existed pre-AI and is still what we use to validate if a feature should be built or not, its impact and value delivered afterwards, and the cost of maintaining it vs value delivered. All of that includes the whole technical assessment as we already did before.

Determining if a feature should be built or not is ultimately a pairing of engineering and product, taking into account everything you mentioned.

Determining the pipeline of potential future non-technical features at my job is not part of engineering, except for side-projects/hack ideas that have potential to be further developed as part of the product pipeline.


Sorry, I think you're right that I misinterpreted your comment. I still had in mind OP's example (BDD, mutational testing, all that jazz). I apologize!

Reading your comment, it looks like you work for a pretty nice company that takes those things seriously. I envy you!

My concern was that for companies unlike yours that don't have well established engineering practices, it _feels_ like with AI you can go much faster, and in fact it's a great excuse to dismantle any remaining practices. But in reality they are either doing busywork or building the wrong thing. My guess is that those companies are going to learn that this is a bad idea in the future, when they already have a mess to deal with.

To put what I mean into perspective... if you browse OP's profile you can find absolutely gigantic PRs like https://github.com/leynos/weaver/pull/76. I can not review any PR like that in good faith, period.


Can't upvote you enough. This is the way. You aren't vibe coding slop; you have built an engineering process that works even if the tools aren't always reliable. This is the same way you build out a functioning and highly effective team of humans.

The only obvious bit you didn't cover was extensive documentation including historical records of various investigations, debug sessions and technical decisions.


Documentation is only useful if it is read. I have found it impossible to get many humans to read the documentation I write.


Building a fancy-looking process doesn't mean the output isn't slop. Vibecoders on Reddit have even more insane "engineering" processes. The parent comment has all of these:

Architecture & Design Principles: Single Responsibility Principle (SRP), CQRS (Command Query Responsibility Segregation), domain segregation, domain-driven naming conventions, clear function/variable naming standards, architectural constraint definition, scope definition, escalation criteria design, completion criteria definition.

Planning & Process: formal upfront planning, constraint-based design, defined scope management, escalation protocols, completion criteria tracking, maintenance sprints (technical debt paydown), frequent code audits.

AI / Agentic Development Practices: agent-assisted code audits, agent-based feedback loops (e.g., reading .feature files pre-build), agent-driven reasoning optimization (code clarity for AI), continuous automated review cycles.

Code Review & Static Analysis: code review bots (Sourcery, CodeRabbit, CodeScene); automated detection of anti-patterns, contract violations, UX concerns, architectural flaws.

Linting & Code Quality Enforcement: strict linting rules, custom lint rules, enforcement of lint compliance via bots, detection of lint rule subversion.

Testing Strategies:

Core testing: unit testing, BDD (Behavior-Driven Development), .feature file validation before build.

Advanced testing: property-based testing, snapshot testing, end-to-end (E2E) testing with MITM (man-in-the-middle) proxies.

Formal / heavyweight testing: model checking, bounded proofs, unbounded proofs, undefined behavior testing.

Emerging / exploratory: mutation testing, fuzzing.

Code Quality & Auditing: code duplication detection, redundancy analysis, assumption validation, architectural compliance checks, domain boundary validation, TOCTOU (time-of-check to time-of-use) vulnerability analysis.

Development Workflow Enhancements: continuous audit cycles, debt-first maintenance phases, feedback-driven iteration, pre-build validation workflows.

Security & Reliability Considerations: TOCTOU vulnerability detection, MITM-based E2E testing, undefined behavior analysis, fuzz testing (planned).


And here I am, just drawing diagrams on a whiteboard and designing UI in Balsamiq.


You are probably shipping, so that puts you ahead of most people still setting up their perfect process.


This is the biggest bottleneck for me. What's worse is that LLMs have a bad habit of being very verbose and rewriting things that don't need to be touched, so the surface area for change is much larger.


Not only that, but LLMs do a disservice to themselves by writing verbose code and decorating lines with redundant comments, which wastes their own context the next time they work with it.


I have had good luck asking my agent "now review this change: is it a good design, does it solve the problem, are there excessive comments, is there anything else a reviewer would point out". I'm still working on what prompt to use, but that is about right.


It's kind of weird; I jumped on the vibe-coding OpenCode bandwagon, but using a local 395+ w/128 and Qwen Coder. Now, it takes a bit to get the first tokens flowing, and the cache works well enough to get it going, but it's not fast enough to just set it and forget it, and it's clear when it goes in an absurd direction and either deviates from my intention or simply loads some context where it should have followed a pattern, whatever.

I'm sure these larger models are both faster and more cogent, but it's also clear that what matters is managing their side tracks and cutting them short. Then I started seeing the deeper problematic pattern.

Agents aren't there to increase multifactor productivity; their real purpose is to shorten context to manageable levels. In effect, they're basically trying to reduce the odds of longer-context poisoning.

So, if we boil down the probability of any given token triggering the wrong subcontext, it's clear that the greater the context, the greater the odds of a poison substitution.

Then that's really the problematic issue every model is going to contend with, because there's zero reality in which a single model is good enough. So now you're onto agents, breaking a problem into more manageable subcontexts and trying to put that back into the larger context gracefully, etc.

Then that fails, because there's zero consistent determinism, so you end up at the harness, trying to herd the cats. This is all before you realize that these businesses can't just keep throwing GPUs at everything, because the problem isn't compute-bound; it's contextual/DAG-limited the same way a brain is limited.

We all have intelligence and use several orders of magnitude less energy, doing mostly the same thing.


I highly recommend adding `/simplify` to your workflow. It walks back over-engineerings quite often for me.


It’s a blend. There are plenty of changes in a production system that don’t necessarily need human review. Adding a help link. Fixing a typo. Maybe upgrades with strong CI/CD or simple ui improvements or safe experiments.

There are features you can safely skip reviewing when they sit behind feature flags or staged releases. As you push further, you'll find that with the right tooling it can cover a lot.
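Part of why flags are such a cheap safety net is that a percentage rollout is only a few lines; a sketch, not tied to any particular flag service:

  import hashlib

  def flag_enabled(flag: str, user_id: str, rollout_pct: int) -> bool:
      # Deterministic bucketing: the same user always lands in the same bucket,
      # so ramping 1% -> 10% -> 100% only ever adds users.
      bucket = int(hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
      return bucket < rollout_pct

  if flag_enabled("new-help-link", user_id="u_123", rollout_pct=10):
      ...  # serve the agent-written change to ~10% of users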

If you break it down, often quite a bit can be deployed safely with minimal human intervention (it depends naturally on the domain, but this holds for a lot of systems).

I'm aiming to revamp the whole process - I wrote a little on it here: https://jonathannen.com/building-towards-100-prs-a-day/


I use coding agents to produce a lot of code that I don’t ship. But I do ship the output of the code.


Yep. In many cases I am just reviewing test cases it generated now.

> if it breaks, let agents fix it, no manual debugging needed!" ?

Pretty trivial to have every Sentry issue have an immediate first pass by AI now to attempt to solve the bug.
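A minimal sketch of that wiring, assuming a generic issue-alert webhook; the payload field names and the triage helper are assumptions, not Sentry's actual schema:

  # pip install fastapi uvicorn
  from fastapi import FastAPI, Request

  app = FastAPI()

  def triage_with_llm(title: str, stacktrace: str) -> None:
      # Hypothetical helper: hand the issue to a coding agent for a first-pass diagnosis.
      print(f"Would ask the agent to triage: {title}\n{stacktrace[:500]}")

  @app.post("/hooks/error-tracker")
  async def error_hook(request: Request):
      payload = await request.json()
      # Field names below are assumptions -- adapt them to the real webhook schema.
      triage_with_llm(
          title=str(payload.get("message", "unknown error")),
          stacktrace=str(payload.get("stacktrace", "")),
      )
      return {"ok": True}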


> you know, it's more difficult to read other people's/machine code than to write it yourself

Not at all, it's just a skill that gets easier with practice. Generally if you're in the position to review a lot of PR's, you get proficient at it pretty quickly. It's even easier when you know the context of what the code is trying to do, which is almost always the case when e.g. reviewing your team-mates' PR's or the code you asked the AI to write.

As I've said before (e.g. https://news.ycombinator.com/item?id=47401494), I find reviewing AI-generated code very lightweight because I tend to decompose tasks to a level where I know what the code should look like, and so the rare issues that crop up quickly stand out. I also rely on comprehensive tests and I review the test cases more closely than the code.

That is still a huge amount of time savings, especially as the scope of tasks has gone from single functions to entire modules.

That said, I'm not slinging multiple agents at a time, so my throughput with AI is way higher than without AI, but not nearly as much as some credible reports I've heard. I'm not sure they personally review the code (e.g. they have agents review it?) but they do have strategies for correctness.


I'll often run 4 or 5 agents in parallel. I review all the code.

Some agents will be developing plans for the next feature, but there can sometimes be up to 4 coding.

These are typically a mix between trivial bug fixes and 2 larger but non-overlapping features. For very deep refactoring I'll only have a single agent run.

Code reviews are generally simple since nothing of any significance is done without a plan. First I run the new code to see if it works. Then I glance at diffs and can quickly ignore the trivial var/class renames, new class attributes, etc leaving me to focus on new significant code.

If I'm reviewing feature A I'll ignore feature B code at this point. Merge what I can of feature A then repeat for feature B, etc.

This is all backed by a test suite I spot check and linters for eg required security classes.

Periodically we'll review the codebase for vulnerabilities (eg incorrectly scoped db queries, etc), and redundant/cheating tests.

But the keys to multiple concurrent agents are plans where you're in control ("use the existing mixin", "nonsense, do it like this" etc) and non-overlapping tasks. This makes reviewing PRs feasible.


Are you kidding? What else would managers get credit from? They don't produce anything the company is interested in. They steer, they manage, and so if the ones being managed produce the thing the company is interested in, then sure all the credit goes to the team (including the manager!). As it usually happens, getting credit means nothing if not accompanied by a salary bump or something like that. And as it usually happens, not the whole team can get a salary bump. So the ones who get the bump are usually one or two seniors on the team, plus the manager of course... because the manager is the gatekeeper between upper management (the ones who approve salary bumps) and the ICs... and no sane manager would sacrifice a salary bump for themselves just to give it away to an IC. And that's not being a bad manager, that's simply being human. Also if you think about it, if the team succeeded in delivering "the thing", then the manager would think it's partially because of their managing, and so he/she would believe a salary bump is deserved

When things go south, no one is penalized. A simple "post-mortem" is written in Confluence and people write "action items". So, yeah, no need for the manager to take the blame.

It's all very shitty, but it's always been like that.


I don't understand the "being more productive" part. Like, sure, LLMs make us iterate faster but our managers know we're using them! They don't naively think we suddenly became 10x engineers. Companies pay for these tools and every engineer has access to them. So if everyone is equally productive, the baseline just shifted up... same as always, no?

Mentioning LLM usage as a distinction is like bragging about using a modern compiler instead of writing assembly. Yeah it's faster, but so is everyone else's code... Besides, I wouldn't brag about being more productive with LLMs because it's a double-edged sword: it's very easy to use them, and nobody is reviewing all the lines of code you are pushing to prod (really, when was the last time you reviewed a PR generated by AI that changed 20+ files and added/removed thousands of lines of code?), so you don't know the long-term effects of your changes; they seem to work now, but who knows how it will turn out later?


Sometimes outcomes and achievements and work product are useful beyond just... stack ranking yourself against your peers. Seems so odd to me that this is your mentality unless you're earlier in your career.


Fair enough. I've been in software longer than I would like to admit. And the more I'm in, the less I care about achievements in a work environment. All I care about is that the company pays me every month, because companies don't care about me (they care about my output per hour/week/month). So it's essential to rank yourself high against your peers (ethically and the like, of course), otherwise you are out in the next layoff. I know not every company is like this, but the vast majority of tech companies are.

Outside of work, yeah, everything is fine and there's nothing but the pure pursuit of knowledge and joy.


People would really be better off seeing themselves as mercenaries with health benefits. You are nothing more. You learn, you make friends, but your job is ephemeral. Do it, but don't get attached TO it.


The key there is "vast majority of tech companies". And I agree with you.

I think the next big movement in tech will be ALL companies becoming tech companies. Right now there are hundreds of thousands of "small" companies with big enough budgets to pay for a CTO to modernize their stack and lead them into the 21st century.

The problem is they don't know they have this problem and so they aren't actively hiring for a CTO. You've got to go find them and insert yourself as the solution.


All companies are like this. Some just have better HR/PR.


Usually hedonic adaptation ends up catching up, and then it’s just the new baseline.


> like bragging about using a modern compiler instead of writing assembly.

Yet people look at me like I'm the odd one out when I say I am more productive with a modern compiler like GHC.


It's not just about gas pricing, it's also about housing. E.g., why live in Paris, Madrid, Barcelona, or Milan if you can live in a cheaper (and way less populated) city? Going back to the office, even if it's 2 days/week, completely defeats the decentralization of housing in most of Europe.

