Hacker News | eunoia's comments

This is real. I’ve seen some baffling bugs in prompt-based stop hook behavior.

When I investigated I found the docs and implementation are completely out of sync, but the implementation doesn’t work anyway. Then I went poking on GitHub and found a vibed fix diff that changed the behavior in a totally new direction (it did not update the documentation).

Seems like everyone over there is vibing and no one is rationalizing the whole.


I’m happy to throw an LLM at our projects but we also spend time refactoring and reviewing each other’s code. When I look at the AI-generated code I can visualize the direction it’s headed in—lots of copy-pasted code with tedious manual checks for specific error conditions and little thought about how somebody reading it could be confident that the code is correct.

I can’t understand how people would run agents 24/7. The agent is producing mediocre code and is bottlenecked on my review & fixes. I think I’m only marginally faster than I was without LLMs.


> with tedious manual checks for specific error conditions

And specifically: Lots of checks for impossible error conditions - often then supplying an incorrect "default value" in the case of those error conditions which would result in completely wrong behavior that would be really hard to debug if a future change ever makes those branches actually reachable.
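A minimal sketch of the pattern being described (names are hypothetical): a check on a branch that is unreachable today, papered over with a made-up default that would silently produce wrong answers if a refactor ever made it reachable.

```rust
// Hypothetical example of a "defensive" check on an impossible condition.
fn percent_complete(done: usize, total: usize) -> f64 {
    if total == 0 {
        // Callers guarantee total >= 1, so this branch is dead code today.
        // The invented default of 100.0 is the dangerous part: if a future
        // change ever makes this reachable, the program reports "done"
        // instead of failing loudly, and nothing points back here.
        return 100.0;
    }
    done as f64 / total as f64 * 100.0
}

fn main() {
    println!("{}", percent_complete(3, 4)); // prints 75
}
```

Panicking (or returning an error) on the impossible branch would at least make a future bug announce itself instead of hiding behind a plausible-looking number.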


I always thought that the vast majority of your codebase, the right thing to do with an error is to propagate it. Either blindly, or by wrapping it with a bit of context info.

I don’t know where the LLMs are picking up this paranoid tendency to handle every single error case. It’s worth knowing about the error cases, but it requires a lot more knowledge and reasoning about the current state of the program to think about how they should be handled. Not something you can figure out just by looking at a snippet.
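The "propagate, wrapping with a bit of context" default can be sketched like this (the path and function names are made-up examples): only the top-level caller, which actually has the context, decides how to react.

```rust
use std::fs;

// Sketch of blind propagation with a little context attached. No
// speculative handling of each failure mode here -- the error string
// carries the path, and the caller decides what to do.
fn read_config(path: &str) -> Result<String, String> {
    fs::read_to_string(path).map_err(|e| format!("reading config {path}: {e}"))
}

fn main() {
    // main() is the one place with enough context to choose a reaction.
    match read_config("/nonexistent/example.conf") {
        Ok(text) => println!("{} bytes", text.len()),
        Err(msg) => eprintln!("error: {msg}"),
    }
}
```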


Training data from junior programmers and introductory programming material. No matter how carefully one labels data, the combination of programming’s subjectivity (which blunts the effectiveness of human labeling and reinforcement at filtering this out) and the sheer volume of low-experience code in the input corpus makes this outcome basically inevitable.


Garbage in garbage out as they say. I will be the first to admit that Claude enables me to do certain things that I simply could not do before without investing a significant amount of time and energy.

At the same time, the amount of anti-patterns the LLM generates is higher than I am able to manage. No, Claude.md and Skills.md have not fixed the issue.

Building a production-grade system using Claude has been a fool's errand for me. Whatever time/energy I save by not writing code, I end up paying back when I read code that I did not write and fix anti-patterns left and right.

I rationalized it a bit, deflecting by saying this is the AI's code, not mine. But no - this is my code and it's bad.


> At the same time, the amount of anti-patterns the LLM generates is higher than I am able to manage. No, Claude.md and Skills.md have not fixed the issue.

This is starting to drive me insane. I was working on a Rust CLI that depends on Docker, and Opus decided to just… keep the CLI going with a warning “Docker is not installed” before jumping into a pile of garbage code that looks like it was written by a lobotomized kangaroo, because it tries to use an Option<Docker> everywhere instead of making sure it’s installed and quitting with an error if it isn’t.

What do I even write in a CLAUDE.md file? The behavior is so stupid I don’t even know how to prompt against it.
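For what it's worth, the contrast reads something like this sketch (`Docker` here is a stand-in unit struct, not a real client library, and the availability probe is faked):

```rust
// Stand-in for a real Docker client type.
struct Docker;

// Stand-in for an actual `docker version` probe; hard-coded for the sketch.
fn docker_available() -> bool {
    false
}

// Anti-pattern: every downstream call site now has to unwrap an Option
// and invent some behavior for the None case.
fn connect_optional() -> Option<Docker> {
    if docker_available() { Some(Docker) } else { None }
}

// Fail-fast: check the precondition once at startup, exit with a clear
// error, and the rest of the program works with a plain `Docker`.
#[allow(dead_code)]
fn connect_or_die() -> Docker {
    if !docker_available() {
        eprintln!("error: Docker is not installed or not running");
        std::process::exit(1);
    }
    Docker
}

fn main() {
    if connect_optional().is_none() {
        println!("probe says Docker is unavailable");
    }
}
```

One precondition check at the entry point keeps every downstream signature simple; the Option<Docker> shape smears the same check across the whole codebase.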


> I don’t know where the LLMs are picking up this paranoid tendency to handle every single error case.

Think about it: they have to work in a very limited context window. Like, just the immediate file where the change is taking place, essentially. Broader knowledge of how the application deals with particular errors (catch them here and wrap? Let them bubble up? Catch and log but don't bubble up?) is outside their purview.

I can hear it now: "well, just codify those rules in CLAUDE.md." Yeah, but there are always edge cases to the edge cases, and you're using English, with all the drawbacks that entails.


I have encoded rules against this in CLAUDE.md. Claude routinely ignores those rules until I ask "how can this branch be reached?" and it responds "it can't. So according to <rule> I should crash instead" and goes and does that.


The answer (as usual) is reinforcement learning. They gave ten idiots some code snippets, and all of them went for the "belt and braces" approach. So now that's all we get, ever. It's like the previous versions that spammed emojis everywhere despite that not being a thing whatsoever in their training data. I don't think they ever fixed that, just put a "spare us the emojis" band-aid in the system prompt.


This is my biggest frustration with the code they generate (but it does make it easy to check whether my students have even looked at the generated code). I don't want it to fail silently or hard-code an error message; it creates a pile of lies to work through in future debugging.


Writing bad tests and bad error handling have been the worst parts of Claude's performance for me.

In particular writing tests that do nothing, writing tests and then skipping them to resolve test failures, and everybody's favorite: writing a test that greps the source code for a string (which is just insane, how did it get this idea?)
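The grep-the-source anti-test, reduced to a self-contained sketch (the string constant stands in for a file read off disk):

```rust
// Stand-in for reading the implementation file off disk.
const SOURCE: &str = "fn add(a: i32, b: i32) -> i32 { a + b }";

fn add(a: i32, b: i32) -> i32 { a + b }

fn main() {
    // Anti-pattern: asserts the *text* of the implementation. It would
    // still pass if `add` computed a - b, and it breaks on any rename.
    assert!(SOURCE.contains("fn add"));

    // Behavioral test: actually exercises the code under test.
    assert_eq!(add(2, 2), 4);

    println!("both pass, but only one means anything");
}
```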


Seriously. Maybe 60% of the time I use Claude for tests, the "fix" for the failing tests is also to change the application code so the test passes (in some cases it will want to make massive architecture changes to accommodate the test, even if there's an easy way to adapt the test to better fit the architecture). Maybe half the time that's the right thing to do, but the other half of the time it most definitely is not. It's a high enough error rate that it's only borderline useful.


Usually you want to fix the code that's failing a test.

The assumption is that your test is right. That's TDD. Then you write your code to conform to the tests. Otherwise what's the point of the tests if you're just trying to rewrite them until they pass?


Or deleting the test files to make all tests pass. It’s my personal favorite.


>Seems like everyone over there is vibing and no one is rationalizing the whole.

The Claude Code creator literally brags about running 10 agents in parallel 24/7. It doesn't just seem like it; they've confirmed it, as if it were the most positive thing ever.


It's software engineering crack. Starting a project feels amazing, features are shipping, a complex feature in the afternoon - ezpz. But AI lacks permanence: for every feature you start over from scratch, except there is more of the codebase now and the context window is still the same. So there is drift, the codebase randomizes, edge cases proliferate, and implementation velocity slows down.

Full disclosure - I am a heavy Codex user and I review and understand every line of code. I manually fight the spurious tests it tries to add by pointing out that a similar one already exists and we can get coverage with +1 LOC vs +50. It's exhausting, but personal productivity is still way up.

I think the future is bright because training / fine-tuning taste, dialing down agentic frameworks, introducing adversarial agents, and increasing model context windows all seem attainable and stackable.


I usually have multiple agents up working on a codebase. But it's typically 1 agent building out features and 1 or 2 agents code reviewing, finding code smells, bad architecture, duplicated code, stale/dead code, etc.

I'm definitely faster, but there's a lot of LLM overhead to get things done right. I think if you're just using a single agent/session you're missing out on some of the speed gains.

I think a lot of the gains I get using an LLM is because I can have the multiple different agent sessions work on different projects at the same time.


I think that the current test suite is far too small. For the Claude Code codebase, a sensible next step would be to generate thousands of tests. Without that kind of coverage, regressions are likely, and the existing checks and review process do not appear sufficient to reliably prevent them. My request is that an entirely LLM-written feature should only be eligible for merge once all of those generated tests pass, so we have objective evidence that the change preserves existing behavior.


I know at least one of the companies behind a coding agent we've all heard of has called in human experts to clean up the vibe-coded IaC mess it created in the last year.


I switched to OpenCode, away from Claude-Code, because Claude-Code is _so_ buggy.


> When I investigated I found the docs and implementation are completely out of sync, but the implementation doesn’t work anyway.

That is not an uncommon occurrence in human-written code as well :-\


Someone said it best after one of those AWS outages from a fat-fingered config change:

> Automation doesn't just allow you to create/fix things faster. It also allows you to break things faster.

https://news.ycombinator.com/item?id=13775966

Edit: found the original comment from NikolaeVarius


What else could they do? If they don't vibecode Claude Code it is a bad look.


omg are you me? I had this exact same problem last week


It was never principled opposition to anything, just a power fantasy that the current admin lets them live even more viscerally.


The “don’t tread on me” folks converted to “comply or die” shockingly quickly.


"Don't tread on me, tread on them"


I think you’re fundamentally right. Trump is obviously the worst we’ve seen yet, but power has been accumulating unchecked in the executive branch’s hands for decades now.

Trump is merely a symptom of the problem that is the Imperial Presidency. If we can’t tackle the problem itself we’ll get another politician doing the exact same shit after Trump.


Most of which is downstream of 9/11 and the War on Terror. That provided lots of bipartisan support for state sponsored killings.


It's been going on since Reagan. Chief Justice John Roberts is in the "unitary executive" camp and has been working to expand presidential power his whole career.


Not sure why this is being downvoted when it's quite an obvious trend in American politics. The executive has been getting stronger and Congress has been getting weaker and more dysfunctional for many years.

We have been setting the stage and preparing the throne for an American dictator or emperor for at least 50 years, just waiting for one to decide to sit in the chair and wield the power we've laid at their feet. The only thing that stopped this from happening sooner is that none of the prior administrations truly wanted to do this.

Bush, in particular, could have become dictator easily after 9/11. I dislike George W. pretty strongly but I do give him a little credit here.


They raided Trump's wife's underwear drawer too so Trump is a victim of this FBI overreach as well.


It's not really overreach if they get a warrant and find the things they were looking for.


[flagged]


Are you referring to the defense’s assertion that even though great care was taken by the FBI to ensure that no individual file would move from one box to another, the ordering within an individual box was not preserved and constituted a “spoliation” of the evidence?

Or are you referring to something else that I’m not yet aware of?

Edit with an article link, to set a standard for the quality of discourse: https://thehill.com/policy/national-security/4648830-mar-a-l...


Fabricated by whom? Like, out of whole cloth? Did the files not exist in the bathroom?


Citations needed.


The difference here is that the journalist got passed information by people who were committing the same crime Trump did. Trump directly committed his crime. The two are not equal.

Furthermore, Trump is no journalist, nor did he steal the secret files for journalistic purposes.


They moved them from a secure closet to a secure bathroom?


Indeed, don't blame the individual (although the individual has plenty of individual blame going their way, rightfully so), blame the system.

Unless the system changes, it'll continue to let people misuse it to their own gain. Trump was hardly the first one, and depending on how things will go, he might be the last, but "last" in a good way or in a bad way remains to be seen.


I have an ongoing debate (argument? fight?) with my father about this. He recalls a time when it felt as if there were 'good guys' in politics, and can't understand why it is that I'm so hard on the Democrats (this has begun to shift in recent months as Chucklefuck and Aipac Shakur have consistently disappointed him). Besides the obvious issue of Republicans being a lost cause, it's policies like too-big-to-fail and Dodd-Frank and NAFTA that created the conditions for our current mess, all the while expanding and allowing basic, obvious bad policy to persist (presidential pardons, executive order powers, life terms on the Supreme Court).

A five year old can see the problems with a lot of this stuff, which once upon a time you'd defend with vague notions of a self-policing culture or the ghost of ethics in governance. Those kinds of non-safeguards can work fine in a stable system, but they inherently rely on foreknowledge of future conditions not changing in unpredictable ways.

The self-reinforcing recursive loop underlying all this is that the systems of governance can only be changed by the governors. I'm becoming increasingly convinced that democracy will fail so long as it's representative - the incentives to fix the system itself are simply not there because any inefficiency is exploitable for personal gain (so why fix it?) The doomsday proposition that comes out of that though is that the system cannot be changed - only replaced once it decisively breaks. Maybe that's what all this is. I would hate to find another bottom but I fear there's more to go before we get there.


Government is of course the quintessential multi-agent coordination problem.

It has big problems when the people running it don't embody the values that it depends on.


I absolutely blame the individual.

Who is responsible for the system if not the individual - and the collective thereof?

The fundamental problem is the citizen not being educated about, or caring enough about, their own independence and state of being in the framework of a global economy and sovereign nation state.


I would highly recommend the book Amusing Ourselves to Death if you are looking to understand how the populace got to the point where truth is irrelevant and nothing really matters.

It helped my mental model a lot at the very least.


I’m extremely aware of Amusing Ourselves to Death.

I think we came away with very different conclusions.

To me it is abject proof that individuals do not have the mental, emotional, or other capacity to actually behave in the modern world such that they retain their mental independence and develop a sense of personal epistemology.

Humans are way too dumb and prone to propaganda to actually have a coherent society at the scale needed so that we don’t collectively kill each other through poorly identified and attributed externalities.


Unfortunately I think we actually agree on this and did come away with the same conclusions.


I believe, but cannot prove, that our malleability was an evolutionary advantage. It enabled homo sapiens to gather in ever larger social groups.

Media, from obelisks to tiktok, enables exploitation of our evolutionary quirk.


Similar to how we investigate and figure out airplane crashes: the system should not allow you to get into those situations in the first place. That's the solution that works across time, instead of just for individual situations.

For example, how is someone who led/incited an insurrection against the government able to become head of said government? Already there, something is gravely wrong. You don't let undemocratic leaders lead a democratic society. So the system is broken, and the current administration is proof of that.

Otherwise what other commentators said will happen, someone who might even be worse than Trump will eventually lead the country.


So then again my question is who ultimately audits and holds the system accountable such that if the system needs to be fixed it gets fixed?

The only answer to that is the people who form the citizenry.

If the citizens cannot influence the system such that they can actually effect change on it, then they are irrelevant in it and the system needs to be replaced

As long as they continue to fail to organize then they will continue to be dominated by it

That’s just reality

There is no alternative organization that can counter the global capitalist system currently


We must blame the system and the individual, otherwise the system will never change.


His full Lord of the Rings audiobook is likewise incredible.


How dare you quote me!


Why is it impossible for most people (more specifically Americans in my experience) to act in good faith during political discussion? You can't even admit wrongdoing or poor phrasing without them twisting your words or deliberately misunderstanding you.


Eh, not sure why OP chose Boulder; it is a great EV town. I commute into Boulder in an EV every day I need to be in the office. I almost exclusively charge my EV in Boulder at the office. The proportion of EVs I see driving in and out of Boulder is very high.

I also road trip around Colorado in my EV and it works great.


Read the first two this year. Absolutely incredible and now some of my all time favorite sci-fi.


It does now with M series chips. iirc Apple made a point of demoing the quick wake in the announcement too.


We have extremely old words for this kind of behavior: greed, avarice. Traditionally they have not been considered good things.

