Hacker News | cautiouscat's comments

> I have no concrete scientific evidence of this - my own personal vibe metric of “is a model good enough” is, “do I have to double-check it against an API model”, and GPT-OSS was the first one where I started doing that a lot less often.

The good old butt dyno!

I’ve been eyeing local models more and more with Anthropic squeezing more and more on the subscriptions. A few comments on HN had me waiting until they improved more but this article makes me wonder if I should reconsider that.

I’ve been doing some pretty niche development using a game and a script extender for said game. If these models can handle that, I’d feel good about switching.


Maybe it’s just me but it’s still hard. Writing code wasn’t hard before. Honestly putting up guard rails is harder than writing it yourself. It just may be faster now.

Getting proper requirements, knowing what to make, update, the domain knowledge, satisfying customers was and still is the hard part.


I don't disagree, and I've heard this said a bunch, but to some degree it feels like cope. Writing code wasn't generally the barrier to success, but it was still a barrier. And for sure, it's not gone entirely, but in a way it was one of the fun parts, and it does feel like a big part of it is gone now.

Many of us have nostalgic memories about staying up late, in the zone, cranking out code until you manage to get something working. "Getting requirements", figuring out "what to make", "satisfying customers"... all as important as they always have been, but they just aren't fun in the same way.

As for domain knowledge, well, to a degree that's one of the parts that used to be hard and isn't anymore. Like the article says, many of us prided ourselves on having in-depth knowledge of network protocols, OS internals, whatever. Now you can just ask a bot for that stuff and, while it's sure not perfect yet, stand a solid chance of getting a good answer.


I don’t think writing code was ever hard. It’s basically the same level of difficulty as learning to write sheet music, a foreign language, or mathematical formula. Same with basic computer concepts.

The hardest part is the formal logic, recursive reasoning, and how to abstract. It’s a thinking mode that some find difficult to adopt.

As for domain knowledge, I don’t think that has ever been difficult to obtain. Just behind me, I have the CLRS algorithms book, and that has pretty much everything you may need in that regard. Same with various other types of knowledge. And with YouTube, you can easily find visualizations if books do not work for you.

I’ve taught people how to code and they can grasp concepts quite easily. It’s the thinking aspect that they have trouble with. Meticulously thinking about every computation path, categorizing errors and handling them is not something a lot of people like.


Interesting - I’ve been thinking that when people say “writing code” they mean figuring out abstractions, logic, reasoning.

Without that, what’s left? The syntax?

Realizing we might not all have the same thing in mind when someone says “writing code was never hard/easy” is making a lot of other comments I’ve read here make more sense!


> Without that, what’s left? The syntax?

I ran a small experiment on co-workers around a decade ago and learned that often enough they don't read code, more like they decipher it. All of a sudden the obsession with code linters and formatters made sense (though I still don't like them). And if they were unable to read code, I have to guess writing it was difficult too.

Afterwards one of them said it never even crossed his mind it was possible to read code like that.


Writing code, IMO, is mostly the typing. AKA the syntax and whatever libraries/platforms you’re using. They are incidental to creating a software solution to whatever problems. It requires rigor, but it’s not particularly difficult.

Abstractions, logic, and reasoning are not helped by AI tools. Yes, they can give you a ready-made solution, but you’ll need to exercise judgement to see if it really fits the problem. And doing the latter can be as hard as just doing it yourself.


I would agree with that. Whenever I’ve delegated more than what feels like ~80% to the agent (“vibe coding”, I suppose), it starts feeling bloated, messy, and bug-prone.

On the flip side, having a design written ahead of time, ideally with reviews by peers and AI and as little AI “content” as possible, seems to almost always go well.

Obviously very subjective!


I don’t disagree with you. When building with Xcode or Android Studio, something like 90 percent of the code is generated or copied from examples in the docs. Same with HTML when using a CSS framework. I learned vim just to be efficient at copy-pasting :) So far be it from me to throw shade at anyone using tools.

My argument is more about throwing slop at fellow collaborators, not ensuring correctness of the code, and the various claims that try to justify those behaviors. “The agent wrote it” is no more of an excuse than “It was the accepted answer on Stack Overflow”.


Oh for sure, I think we’re on the same page. I wasn’t trying to change your mind on anything, just adding my two cents :)

> It’s basically the same level of difficulty as learning to write sheet music, a foreign language, or mathematical formula.

I mean, all of those are pretty dang hard. Maybe you're just particularly skilled (genuinely, no shade intended), I certainly couldn't do any of those without a significant level of effort.

I'd also personally consider "formal logic, recursive reasoning, and how to abstract" as parts of writing code, as the other commenter said. And while AI certainly isn't at the point of "solving" those yet, it's a heck of a lot closer than we were a few years ago.

And sure, you can always obtain domain knowledge, but the whole point of knowing it is that you can see approaches others might not, answer questions quickly, etc. And a lot of this is still relevant post-AI, but it does feel like a lot of it has been lost. It feels like implying that search engines weren't a major upgrade to research because you could always just go to the library and look through books to find your answer. Sure, but googling a question is a lot easier! And chatbots just feel like an upgrade from that.


> It feels like implying that search engines weren't a major upgrade to research because you could always just go to the library and look through books to find your answer

I do agree with you on that point, as local libraries (where I am) and the internet have been a true treasure of information for me. And yes, AI is way easier for accessing information (even with the high risk of hallucinations). What I tend to argue against is statements that basically said that before AI, it was a dark age of information.


> What I tend to argue against is statements that basically said that before AI, it was a dark age of information.

For sure, agreed entirely on that.


Maybe the recent final update to Destiny has already taken over my brain but if Marko is a Destiny fan, he has a great GitHub username.

This is an extremely detailed article on every level and I can’t wait to deep dive into it. Marko really nailed the “old” look but it still looks fresh and new.


I assume consumers aren’t a big part of their bottom line. I’m not actually very sure about that, just an assumption.

What I wonder however is if these tools will become something I use at work only. $100/month is already a massive stretch budget wise. If these models keep devouring tokens there’s no way I’d get the same usage time out of them for $100 in usage credits.

I just don’t think I’d use them much at all at home.


In the automotive world we have benchmarks in HP/torque with the dyno. That’s expensive though, so many depend on their “butt dyno” to judge if their fresh new parts and tune made a difference.

I’m curious how this will feel to my code “butt dyno”. I haven’t noticed much between Opus and Sonnet. I’m comparing this difference to the early days of Claude in 2025. It does what I need and both need a little bit of correction and whatnot. Benchmarks are nice, but I want to see how this feels. Looking forward to trying it later tonight.


I have a similar question.

I think most software projects have reached the point where capturing real information about what the winner's circle looks like, and therefore what the program should be, is many orders of magnitude slower than generating code in the wrong direction.

I'd need to measure these new models on well understood but complex problems that are relatively easy to validate to get a sense if they are 'better'; on the other hand, the real impact in daily life may be marginal since generating code is not the biggest problem at the moment.


It’s an interesting thing to bring up because it’s this classic thing we’ve seen for decades now.

The ramifications go beyond the individual which is why I assume they mentioned it. They don’t need to use it/not use it for it to have interesting implications.


so it'd be preferable if they didn't include the model at all?


I didn’t say that and I don’t have a feeling on that either way. But this is a limited time trial and calling it out as such is valid.

Is it nice we get the trial? Sure. Is it also a common play in the playbook of tech companies? Yes.


I agree but there’s definitely room for nuance. I follow a lot of artists because I genuinely like seeing their work. I follow a lot of miniature painters for their tips and tricks. I follow my close friends to see what they’re up to.

I think the folks you’re talking about are influencers. In that case I wholeheartedly agree with your take.


That’s just advertising. Yes, mom and pop stores can advertise “just like” the multinational corporations can. Guess who gets the lion’s share of airtime and guess who has armies of men+machines crafting the most convincing messaging.


How is someone showing a 3D render with no products or services to buy from advertising to me? In addition, why does that matter if I enjoy the content?

It’s not “just” advertising. Again this is nuanced.


> I don't know if y'all have tried it, but it now produces really good stuff.

Does it? It produces passable stuff that is fine. However the lack of passion and care completely disinterests me.


Passable and fine is the hallmark of capitalism.


I’m more of an “AI centrist”, as I think the topic is extremely nuanced. As with most tech hype, there tends to be a black and white “AI good” or “AI bad”. I think reality is somewhere in the middle, personally.

> Let’s face it: by the time I manually ship version 1.0 of a product, the AI-assisted version could have been deployed 10x faster. By then, enough real-world feedback would have surfaced to identify the major issues, and tools like Claude Code would make it possible to fix and ship version 2.0 at an incredible pace.

It’s takes like this that remove the nuance completely and ignore so many facets of the debate. That being said, let’s assume this is true because I think vibecoding a CRUD app does make this realistic on the face of it. When I say vibecoding I mean prompting and dropping, not reading the code.

You do your adversarial reviews with multiple agents, you have your UX agent look over it, your security agent etc. Under the hood there are architectural issues. The code is probably passable, but rough.

You release, customers start using it for their business that they depend on for income. Issues start cropping up, you burn more and more tokens to fix the issues as they come up. Expedience starts sacrificing quality even still, architecture (if there was any) starts being violated and it degrades more and more.

I consider myself a professional, and I would never want to end up in a situation like this for mission-critical products. So, what do I do? I read the output, I make sure I understand it. Why? I care about my customers and, secondarily, I’m the one with the pager when something breaks down.

Now, for some fun hobby project to track my hobby paints for Warhammer… who cares? I agree. I have used Claude for such projects and not really cared. But your statement does not hold up in the enterprise world with mission-critical software.

> At some point, execution speed starts to matter more than the elegance of the code.

This is reductive. You’re assuming people’s concern is “elegance”. It isn’t solely elegance. It’s domain understanding. It’s quality overall. It’s being a professional.

Writing the code was never the slowdown for large-scale enterprise products.


There are medical devices and stuff that will always require slow, deliberate testing and changing.

If you have an application that real people use in everyday work, and they depend on stuff not moving under their feet, execution speed is not something you aim for, because you have to adjust to the speed at which users adopt changes.

As a startup owner, of course, you can’t really care about those people using your software, because you have to pivot as quickly as possible to the investment that pays the most, and a boring everyday app is not going to be a unicorn.

There is another option at the far end of the spectrum. You aim at a posthuman world where humans are not required and your software doesn’t have to bother with earning money or with users that are slow or have emotions.


> Problem is once we got them, we realized they are not all that.

Isn't this just the hype cycle? [1]

Fake edit: I know it's not a perfect model.

1: https://www.gartner.com/en/research/methodologies/gartner-hy...

