I got their z.ai plan to test alongside my Claude subscription; it feels like it sits somewhere between Sonnet 4.0 and Sonnet 4.5. It's definitely a few steps below current-day Claude, but it's very capable.
Dumbness usually comes from a lack of information; humans are the same way. The difference from other LLMs is that if Opus has the information, it has ridiculously high accuracy on tasks.
z.ai (Zhipu AI) is a Chinese-run entity, so presumably China's National Intelligence Law, put in place in 2017, which requires data exfiltration back to the government, would apply to the use of this. I wouldn't feel comfortable using any service that has that fundamental requirement.
Google, OpenAI, Anthropic and Y Combinator are US-run entities, so presumably the CLOUD Act and FISA, which require data exfiltration back to the government when asked, on top of all the "Room 641A"s where the NSA directly taps into the ISP interconnects, would apply to the use of them. I wouldn't feel comfortable using any service that has that fundamental requirement.
I wouldn't use any provider: z.ai, Claude, OpenAI, ... if I were concerned about the government obtaining my prompts. If you're doing something where this is a legitimate concern (as opposed to my open source stuff), you should get a local LLM or put a lot of effort into anonymizing yourself and your prompts.
I've had the $20/month plan for a few months alongside a max subscription to Claude; the cheap Codex plan goes a really long way. I use it a few times a day for debugging, finding bugs, and reviewing my work. I've run out of usage a couple of times, but only when I lean on it way more than I should.
I only ever use it on the high reasoning mode, for what it's worth. I'm sure it's even less of a problem if you turn it down.
Dafny and similar languages use SMT; their semantics need to be such that you're giving the solver enough information for your proof to verify in reasonable time, otherwise you'll be waiting a very long time, or the proof is effectively undecidable.
I'm not sure about benchmarks comparing languages, but Dafny goes through a lot of tweaking to make the process faster.
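For a rough picture of what's going on under the hood, here's a minimal sketch using Z3's Python bindings (assuming the z3-solver package is installed; the property and variable name are made up for illustration). Dafny compiles each proof obligation into a query roughly like this: to verify a claim, it asks the solver whether the negation has any counterexample.

    # Rough sketch of SMT-backed verification (assumes `pip install z3-solver`).
    from z3 import Int, Solver, Not, Implies, unsat

    x = Int('x')

    # Hypothetical proof obligation: for all x, x > 2 implies x + x > 4.
    claim = Implies(x > 2, x + x > 4)

    s = Solver()
    s.add(Not(claim))  # look for a counterexample to the claim

    # unsat means no counterexample exists, i.e. the obligation holds.
    # Harder obligations (quantifiers, nonlinear arithmetic) are where the
    # solver can churn for a long time or give up with "unknown".
    print("verified" if s.check() == unsat else "not verified")

The "enough information" part is about closing the gap between what you wrote and what the solver can decide on its own; in Dafny that usually means adding asserts, lemmas, or decreases clauses so each obligation stays cheap.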
Is that interesting? Computers accomplish all sorts of tasks that require thinking from humans, without thinking. Chess engines have been much better than me at chess for a long time, but I can't say there's much thinking involved.
I admit that when reading the description of your relationship (I don't mean to be disrespectful, for what it's worth) I can't help but wonder how it can possibly be consistent with "a relationship between two people can be basically whatever they want it to be." It really reads like the relationship is whatever _she_ wants it to be.
If you had come into the relationship with the understanding that you'd both date/have sex with other people, then great; it doesn't matter what other people think. However, when you say that it was hard for you to accept her being with other men, and that you're lucky that "she has never fallen in love and wanted to run away with one of em", damn. My first instinct is that you should take your own advice: find or design a relationship where you don't have to accept this.
I realize that some of my knee-jerk reaction might just be instinct/cultural values; I mean no disrespect.
Calling things "slop" is just begging the question. The real differentiating factor is that, in the past, "human-generated slop" at least took effort to produce. Perhaps, in the process of producing it, the human notices what's happening and reconsiders (or, even better, improves it such that it's no longer "slop"). Claude has no such inhibitions. So, when you look at a big bunch of code that you haven't read yet, are you more or less confident when you find out an LLM wrote it?
If you try to one-shot it, sure. But if you question Claude, point out the error of its ways, tell it to refactor and ultrathink, or point out that two things have similar functionality and could be merged, you get much better results. It can write unhinged code with duplicate, unused variable definitions that don't work, and it'll fix it up if you call it out, or you can just do it yourself. (Cue questions of whether, in that case, it would just be faster to do it yourself.)
I have a Claude max subscription. When I think of bad Claude code, I'm not thinking about unused variable definitions. I'm thinking about the times you turn on ultrathink, allow it to access tools and negotiate its solution, and it still churns out an overcomplicated yet partially correct solution that breaks. I totally trust Claude to fix linting errors.
It's hard to really discuss in the abstract though. Why was the generated code overly complicated? (I mean, I believe you when you say it was, but it doesn't leave much room for discussion.) Similarly, what's partially correct about it? How many additional prompts does it take before you a) use it as a starting point, b) use it because it works, c) don't use any of it and just throw it away, or d) post about why it was lousy to all of the Internet reachable from your local ASN?
I've read your questions a few times and I'm a bit perplexed. What kind of answers are you expecting me to give you here? Surely if you use Claude Code or other tools you'd know that the answers are so varying and situation specific it's not really possible for me to give you solid answers.
However much you're comfortable sharing! Obviously the ideal would be the full source for the "overly complicated" solution, but naturally that's a no-go, so even just more words than the two-word phrase "overly complicated". Was it complicated because it used 17 classes with no inheritance when 5 would have done it? Was it overly complicated because it didn't use functions and so has the same logic implemented in 5 different places?
I'm not asking you, generically, what bad code LLMs produce. It sounds like you used Claude Code in a specific situation and found the generated code lacking. I'm not questioning that it happened to you; I'm curious what specifically was bad about it in your situation, beyond "overly complicated". How was it overly complicated?
Even if you can't answer that, maybe you could help me reword the phrasing of my original comment so it's less perplexing?
You're proposing a truism: if you don't get a good result, it's either because your query is bad or because the LLM isn't good enough to provide a good result.
Yes, that is how this works. I'm talking about the case where you're providing a good query and getting poor results. Claiming that this can be solved by more LLM conversations and ultrathink is cope.
I've claimed neither. I actually prefer restarting or rolling back quickly rather than trying to re-work suboptimal outputs - less chance of being rabbit holed. Just add what I've learned to the original ticket/prompt.
I have pretty much the same amount of confidence when I receive AI-generated or non-AI-generated code to review: my confidence is based on the person guiding the LLM, and their ability to do that.
Much more so than before, I'll comfortably reject a PR that is hard to follow, for any reason, including size. IMHO, the biggest change that LLMs have brought to the table is that clean code and refactoring are no longer expensive, and should no longer be bargained for, neglected or given the lip service that they have received throughout most of my career. Test suites and documentation, too.
(Given the nature of working with LLMs, I also suspect that clean, idiomatic code is more important than ever, since LLMs have presumably been trained on it, but this is just a personal superstition that is probably increasingly false, and also feels harmless.)
The only time I think it is appropriate to land a large amount of code at once is if it is a single act of entirely brain dead refactoring, doing nothing new, such as renaming a single variable across an entire codebase, or moving/breaking/consolidating a single module or file. And there better be tests. Otherwise, get an LLM to break things up and make things easier for me to understand, for crying out loud: there are precious few reasons left not to make reviewing PRs as easy as possible.
So, I posit that the emotional reaction from certain audiences is still the largest, most exhausting difference.
The code I've seen generated by others has been pretty terrible in aggregate, particularly over time as the lack of understanding and coherent thought starts to show. Quite happy without it, thanks; haven't seen it adding value yet.
Or is the bad code you've seen generated by others pretty terrible, but the good code you've seen generated by others blends in as human-written?
My last major PR included a bunch of tests written completely by AI with some minor tweaking by hand, and it was praised with, "love this approach to testing."
I think maybe there's another step too - breaking the design up into small enough pieces that the LLM can follow it, and you can understand the output.
So do all the hard work yourself and let the AI do some of the typing, which you'll then have to spend extra time reviewing closely in case its RNG factor made it change an important detail. And with all the extra up-front design, planning, instructions, and context you need to provide to the LLM, I'm not sure I'm saving on typing. A lot of people recommend going meta and having LLMs generate a good prompt and sequence of steps to follow, but I've only seen that kinda sorta work for the most trivial tasks.
Unless you're doing something fabulously unique (at which point I'm jealous you get to work on such a thing), they're pretty good at cribbing the design of things if it's something that's been well documented online (canonically, a CRUD SaaS app, with minor UI modification to support your chosen niche).
And if you are doing something fabulously unique, the LLM can still write all the code around it, likely help with many of the components, give you at least a first pass at tests, and enable rapid, meaningful refactors after each feature PR.
I don't really understand your point. It reads like you're saying "I like good code, it doesn't matter if it comes from a person or an LLM. If a person is good at using an LLM, it's fine." Sure, but the problem people have with LLMs is their _propensity_ to create slop in comparison to humans. Dismissing other people's observations as purely an emotional reaction just makes it seem like you haven't carefully thought about other people's experiences.
My point is that, if I can do it right, others can too. If someone's LLM is outputting slop, they are obviously doing something different: I'm using the same LLMs.
All the LLM hate here isn't observation; it's sour grapes. Complaining about slop and poor-quality code output is confessing that you haven't taken the time to understand what is reasonable to ask for, and that you aren't educating your junior engineers on how to interact with LLMs.
> Perhaps, in the process of producing it, the human notices what's happening and reconsiders (or even better, improves it such that it's no longer "slop".)
Given the same ridiculously large and complex change: if it is handwritten, only a seriously insensitive and arrogant crackpot could, knowing what's inside, submit it with any expectation that you accept it without a long and painful process, instead of improving it to the best of their ability; on the other hand, using LLM assistance, even a mildly incompetent but valuable colleague or contributor, someone you care about, might underestimate the complexity and cost of what they didn't actually write and believe that there is nothing to improve.
There's probably a difference in degree, however. Alopecia Areata is much less common, while regular male pattern baldness is very common.
There's also the fact that Alopecia Areata is actually more common in women, which I imagine exacerbates the distress compared to the more run-of-the-mill MPB.
I realize you didn't mean to use a study on Alopecia Areata, but the difference in degree could be quite large.
It's also possible that people taking finasteride are a more self-selected group of people who are distressed about hair loss, and are therefore more likely to exhibit depression, etc. As in, if people with androgenetic alopecia are more likely to be depressed, people who take finasteride may be a sampling of those who are distressed enough to seek out and maintain treatment.
Additionally, the kind of person who would reach for prescription medication vs accepting hair loss may be predisposed to depression. I.e. this may be selecting for people who struggle with self-acceptance generally.
I also wonder whether there's some degree of nocebo effect going on. Patients know finasteride is anti-androgenic; perhaps when they inevitably experience some symptoms associated with hypogonadism, they assume the worst and lament the choice between having hair and feeling youthful. This would also explain why many who get off finasteride don't notice their symptoms improve.
Personal bias: I've taken finasteride for years with no side effects.
This is exactly why people thought isotretinoin (brand name Accutane) caused suicides (and why it was kept behind huge access hurdles for years). It turns out that people suffering from physical disfigurements, such as acne, are more prone to suicide than the general population. Not sure if this is also true of androgenetic alopecia, but it would hardly be surprising.
I don't think we're saying different things. People who are distressed about their appearance are more likely to be depressed, people who seek medicine and surgeries are probably more distressed still and therefore more likely to be depressed, and so on.
It did jump out at me that the paper repeatedly cites studies that found a correlation between finasteride and psychological side effects, and then talks about them as though they're evidence of causation.