This can't be understated. I started using it heavily earlier this summer and it...

data-ottawa · 2025-09-09T16:00:24 1757433624

I don’t think you’re crazy; something is off in their models.

As an example I’ve been using an MCP tool to provide table schemas to Claude for months.

In early August there was a point where it stopped recognizing the tool unless I mentioned it. Maybe that’s related to their degraded quality issue.

This morning, after pulling the correct schema info, Sonnet started hallucinating columns (from Shopify’s API docs) and adding them to my query.

That’s something I’ve been doing daily for months, and in the last few weeks it has gone from consistent and low-supervision to flaky and low quality.

I don’t know what’s going on. Sonnet has definitely felt worse, and the timeline matches their status page incident, but it’s definitely not resolved.

Opus 4.1 also feels flaky; it seems less consistent than 4.0 about recalling earlier prompt details.

I personally am frustrated that there’s no refund or anything after a month of degraded performance, and they’ve had a lot of downtime.

reissbaker · 2025-09-09T18:58:42 1757444322

FWIW I strongly recommend using some of the recent, good Chinese OSS models. I love GLM-4.5, and Kimi K2 0905 is quite good as well.

jimbo808 · 2025-09-09T19:38:44 1757446724

I'd like to give these a try - what's your way of using them? I mostly use Claude because of Claude Code. Not sure what agentic coding tools people are using these days with OSS models. I'm not a big fan of manually uploading files into a web UI.

reissbaker · 2025-09-09T20:19:27 1757449167

The most private way is to use them on your own machine; a Mac Studio maxed out to 512GB RAM can run GLM-4.5 at FP8 with fairly long context, for example.
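A rough back-of-the-envelope check of why 512GB is enough at FP8. It assumes GLM-4.5 has about 355B total parameters (that figure comes from the model's release, not from this thread) and ignores runtime overhead and whatever memory macOS keeps for itself:

```python
# Assumption: GLM-4.5 has about 355B total parameters. It is a mixture-of-experts
# model, so only a fraction are active per token, but all weights must be resident.
TOTAL_PARAMS = 355e9
BYTES_PER_PARAM_FP8 = 1                 # FP8 stores each weight in one byte
UNIFIED_MEMORY_BYTES = 512 * 2**30      # "512GB" Mac Studio, read as 512 GiB


def weights_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    memory_gb = UNIFIED_MEMORY_BYTES / 1e9
    for label, width in [("FP8", BYTES_PER_PARAM_FP8), ("BF16", 2)]:
        size = weights_gb(TOTAL_PARAMS, width)
        print(f"{label}: weights ~{size:.0f} GB of ~{memory_gb:.0f} GB, "
              f"headroom ~{memory_gb - size:.0f} GB")
```

Under those assumptions the FP8 weights take roughly 355 GB, leaving on the order of 190 GB for the KV cache (which is what long context consumes), while BF16 weights alone would not fit.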

If you don't have the hardware to run it locally, let me shill my own company for a minute: Synthetic [1] has a $20/month subscription to most of the good open-weight coding LLMs, with higher rate limits than Claude's $20/month sub. And our $60/month sub has higher rate limits than the $200/month maxed-out version of the Claude Max plan.

You can still use Claude Code by using LiteLLM or similar tools that convert Anthropic-style API requests to OpenAI-style API requests; once you have one of those running locally, you override the ANTHROPIC_BASE_URL env var to point to your locally-running proxy. We'll also be shipping an Anthropic-compatible API this week to work with Claude Code directly. Some other good agentic tools you could use instead include Cline, Roo Code, KiloCode, OpenCode, or Octofriend (the last of which we maintain).

1: https://synthetic.new
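A minimal sketch of the ANTHROPIC_BASE_URL override described above, assuming a LiteLLM-style proxy is already listening on `http://localhost:4000` (the port is an assumption; use whatever your proxy binds to). The shell equivalent is `ANTHROPIC_BASE_URL=http://localhost:4000 claude`. Depending on how the proxy is configured you may also need to supply a key, which is not shown here.

```python
import os
import shutil
import subprocess

# Assumption: a local proxy that translates Anthropic-style requests is running here.
PROXY_URL = "http://localhost:4000"


def claude_env(base_url: str = PROXY_URL) -> dict[str, str]:
    """Copy the current environment and point Claude Code at the proxy."""
    env = os.environ.copy()
    # Claude Code sends its Anthropic-style API requests to ANTHROPIC_BASE_URL.
    env["ANTHROPIC_BASE_URL"] = base_url
    return env


def launch_claude(base_url: str = PROXY_URL) -> int:
    """Start the `claude` CLI with the overridden base URL, if it is installed."""
    if shutil.which("claude") is None:
        raise FileNotFoundError("the `claude` CLI is not on PATH")
    return subprocess.run(["claude"], env=claude_env(base_url)).returncode


if __name__ == "__main__":
    print(claude_env()["ANTHROPIC_BASE_URL"])
```

Copying the environment rather than mutating `os.environ` keeps the override scoped to the child process.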

sheepscreek · 2025-09-09T21:37:28 1757453848

Very impressed with what you're doing. It's not immediately clear how the prompts and data are used on the site. Your terms mention a 14-day API retention period, but it's not clear whether that applies to Octo/the CLI agent and other forms of subscription usage (i.e. not through the UI).

If you can find a way to secure the requests even during the 14-day period, or anonymize them while still allowing the developers to do their job, you can have my money today. Privacy/data security is the #1 concern for me, especially if the agents will be supporting me in all kinds of personal tasks.

reissbaker · 2025-09-10T19:08:52 1757531332

FWIW the 14 day retention is just to cover accidental log statements being deployed — we don't intentionally store API request prompts or completions after processing at all. We'll probably change our stated policy to no-store since in practice that's what we do (and we get this feedback a lot!)

IgorPartola · 2025-09-09T23:09:56 1757459396

Is there a possibility of my work leaking to others? Does your staff have the ability to view prompts and responses? Is tenancy shared with other users, or with entities other than your company?

This looks really promising since I have also been having all sorts of issues with Claude.

reissbaker · 2025-09-10T19:16:18 1757531778

We never train on your prompts or completions, and for the API we don't store longer than 14 days (in fact, we don't ever intentionally store API prompts or completions at all, the 14 day policy was originally just to cover accidental log statements being deployed; we'll probably change it to no-store since it's confusing to say 14 days when we actually don't intentionally store). For the web UI we do have to store, since otherwise we couldn't show you your message history.

In terms of tenancy: we have our own dedicated VMs for our Kubernetes cluster via Azure, although I suspect a VM is not equivalent to an entire hardware node. We use Supabase for our Postgres DB, and Redis for ephemeral data; while we don't share access to those with any other company, we don't create a new DB for every user of our service, so there is user multitenancy there. Similarly, the same GPUs may serve many customers; otherwise we'd need to charge enormous amounts for inference. But the requests themselves aren't intermingled, i.e. if you make a request, it doesn't affect someone else's.

AlecSchueler · 2025-09-09T21:59:36 1757455176

How do you store/view the data I send you?

reissbaker · 2025-09-10T19:18:14 1757531894

For API prompts or completions, we don't store after we return the completion to your prompt (our privacy policy allows us to store for a maximum of 14 days, just to cover accidental log statement deploys). For the web UI we store them in Postgres, since the web UI lets you view your message history and we wouldn't be able to serve that to you without storing it.

AlecSchueler · 2025-09-11T09:09:41 1757581781

https://developer.mozilla.org/en-US/docs/Web/API/Window/loca...

reissbaker · 2025-09-11T16:44:24 1757609064

Yeah, localStorage-only doesn't do things like sync across devices or persist if you lose your phone. But since we expose an OpenAI-compatible endpoint, if you don't care about those things there are plenty of LLM clients that will keep your data 100% on-device that you can use instead of the web UI.
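For anyone who wants the on-device option, a minimal sketch of a client that keeps its history in a local JSON file and speaks the generic OpenAI-style chat completions format. `BASE_URL`, `MODEL`, and the file name are hypothetical placeholders, not Synthetic's actual values; the provider still receives each prompt, and only the history stays local.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical placeholders: substitute a real OpenAI-compatible base URL and model.
BASE_URL = "https://api.example.com/v1"
MODEL = "example-model"


def load_history(path: Path) -> list[dict]:
    """Read the chat history from a local JSON file (empty if none exists yet)."""
    return json.loads(path.read_text()) if path.exists() else []


def save_history(path: Path, messages: list[dict]) -> None:
    """Persist the history on this device only."""
    path.write_text(json.dumps(messages, indent=2))


def build_request(messages: list[dict], model: str = MODEL) -> dict:
    """Body for an OpenAI-style POST to {BASE_URL}/chat/completions."""
    return {"model": model, "messages": messages}


def send(body: dict, api_key: str) -> dict:
    """POST the request; the provider sees the prompt, but history lives locally."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```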

billyjobob · 2025-09-09T19:50:38 1757447438

Both of those models have Anthropic API-compatible endpoints, so you just set an environment variable pointing to them before you run Claude Code.

8note · 2025-09-09T19:00:38 1757444438

I've been thinking it's that my company's MCP server has blown up in context size, but even using Claude without Claude Code, I get context window overflows constantly now.

Another possibility: a system prompt change that made it too long?

data-ottawa · 2025-09-09T21:12:28 1757452348

I think that’s because of the Artifacts feature and how it works. For me after a few revisions it uses a ton of tokens.

As a baseline from a real conversation, 270 lines of SQL is ~2,500 tokens. Every language will be different; this is just what I have open.

When Claude edits an artifact it seems to keep the revisions in the chat context, plus it’s doing multiple changes per revision.

After 10 iterations on a 1k loc artifact (10k tokens) you’re at 100k tokens.

claude.ai has a 200k token window according to their docs (not sure if that’s accurate though).

Depending on how Claude is doing those in-place edits, that could be the whole budget right there.
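The arithmetic above as a small sketch. It assumes the worst case described in the comment, where every revision keeps a full copy of the artifact in context, and ignores the chat messages themselves:

```python
# Figures from the comment: 270 lines of SQL came to about 2,500 tokens, and
# claude.ai's docs state a 200k-token context window.
TOKENS_PER_LINE = 2500 / 270   # roughly 9.3 tokens per line of SQL
CONTEXT_WINDOW = 200_000


def artifact_tokens(lines_of_code: int) -> int:
    """Estimate how many tokens one full copy of an artifact costs."""
    return round(lines_of_code * TOKENS_PER_LINE)


def cumulative_context(lines_of_code: int, revisions: int) -> int:
    """Tokens used if every revision keeps a full copy of the artifact in context."""
    return artifact_tokens(lines_of_code) * revisions


if __name__ == "__main__":
    used = cumulative_context(1000, 10)
    print(f"{used} tokens, {used / CONTEXT_WINDOW:.0%} of the window")
```

With the measured ratio a 1,000-line artifact is closer to 9.3k tokens than 10k, so 10 revisions come to about 93k tokens, and under this assumption the 22nd revision would push past the 200k window.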

dingnuts · 2025-09-09T18:28:13 1757442493

I have read so many anecdotes about so many models that "were great" and aren't now.

I actually think this is psychological bias. It got a few things right early on, and that's what you remember. As time passes, the errors add up, until the memory doesn't match reality. The "new shiny" feeling goes away, and you perceive it for what it really is: a kind of shitty slot machine.

> personally am frustrated that there’s no refund or anything after a month of degraded performance

lol, LMAO. A company operates a shitty slot machine at a loss and you're surprised they have "issues" that reduce your usage?

I'm not paying for any of this shit until these companies figure out how to align incentives. If they make more by applying limits, or charge me when the machine makes errors, that's good for them and bad for me! Why should I continue to pay to pull on the slot machine lever?

It's a waste of time and money. I'll be richer and more productive if I just write the code myself, and the result will be better too.

mordymoop · 2025-09-09T21:05:47 1757451947

I think you’re onto something, but it works the opposite way too. When you first start using a new model you are more forgiving because, almost by definition, you were using a worse model before. You give it the sorts of problems the old model couldn’t do, and the new model can do them; you see only success, and the places where it fails, well, you can’t have it all.

Then after using the new model for a few months you get used to it, you feel like you know what it should be able to do, and when it can’t do that, you’re annoyed. You feel like it got worse. But what happened is your expectations crept up. You’re now constantly riding it at 95% of its capabilities and hitting more edge cases where it messes up. You think you’re doing everything consistently, but you’re not, you’ve dramatically dialed up your expectations and demands relative to what you were doing months ago. I don’t mean “you,” I mean the royal “you”, this is what we all do. If you think your expectations haven’t risen, go back and look at your commits from six months ago and tell me I’m wrong.

adonese · 2025-09-09T18:47:37 1757443657

Claude has been constantly terrible for the last couple of weeks. You must have seen this, but just in case: https://x.com/claudeai/status/1965208247302029728

lacy_tinpot · 2025-09-09T18:55:19 1757444119

Except this is a verifiable thing that actually is acknowledged and even tracked by people.

throwaway314155 · 2025-09-09T23:04:43 1757459083

Go on then. Verify and track it. Or at least cite a source that does.

lacy_tinpot · 2025-09-18T08:26:01 1758183961

https://www.anthropic.com/engineering/a-postmortem-of-three-...

fragmede · 2025-09-10T08:50:41 1757494241

https://x.com/claudeai/status/1965208247302029728

holoduke · 2025-09-09T21:46:08 1757454368

So you're saying you write mock data and boilerplate code all yourself? I seriously don't believe that. LLMs are already much, much faster at these tasks. There is no going back there.

reactordev · 2025-09-09T18:33:39 1757442819

This is equivalent to people reminiscing about WoW or EverQuest saying gaming peaked back then…

I think you’re right. I think it’s complete bias with a little bit of “it does more tasks now” so it might behave a bit differently to the same prompt.

I also think you’re right that there’s an incentive to dumb it down so you pull the lever more. Just 2 more $1 spins and maybe you’ll hit jackpot.

Really it’s the enshittification of the SOTA for profits and glory.

pc86 · 2025-09-09T15:25:50 1757431550

I hesitate to use phrases like "bait and switch" but it seems like every model gets released and is borderline awe-inspiring, then as adoption increases, and load increases, it's like it gets hit in the head with a hammer and is basically useless for anything beyond a multi-step google search.

dingnuts · 2025-09-09T18:31:35 1757442695

I think it's a psychological bias of some sort. When the feeling of newness wears off and you realize the model is still kind of shit, you have an imperfect memory of the first few uses when you were excited, and you have repressed the failures from that period. As the hype wears off you become more critical and correctly evaluate the model.

Uehreka · 2025-09-09T18:59:29 1757444369

I get that it’s fun and stylish to tell people they aren’t aware of their own cognitive biases, but it’s also a difficult take to falsify, which is why I generally have a high bar for people to clear when they want to assert that something is all in people’s heads.

People seem to turn to this a lot when the suspicion many people have is difficult to verify. And while I don’t trust a suspicion just because it’s held by a lot of people, I also won’t allow myself to embrace the comforting certainty of “it’s surely false and it’s psychological bias”.

Sometimes we just need to not be sure what’s going on.

ewoodrich · 2025-09-09T20:28:55 1757449735

Doesn't this go both ways? A random selection of commenters online out of hundreds of thousands of devs using LLMs reporting degraded capability based on personal perception isn't exactly statistically meaningful data.

I've seen the cycle of claims going from "10x multiplier, like a team of junior devs" to "nerfed" for so many model/tool releases at this point it's hard for me not to believe there's an element of perceptual bias going on, but how much that contributes vs real variability on the backend is impossible to know for sure.

lacy_tinpot · 2025-09-09T18:56:14 1757444174

It's not, because it's actually tracked and even acknowledged by the companies themselves.

otabdeveloper4 · 2025-09-09T17:14:38 1757438078

No, that's just the normal slope of the hype curve as you start figuring out how the man behind the curtain operates.

citizenAlex · 2025-09-09T23:01:16 1757458876

I think the models deteriorate over time with more inputs. I think the noise increases, like photocopies of photocopies.

mh- · 2025-09-10T01:21:20 1757467280

If you mean within an individual context window, yes, that's a known phenomenon.

If you mean over the lifetime of a model being deployed, no, that's not how these models are trained.

rootnod3 · 2025-09-09T15:37:56 1757432276

AI is not useful in the long term; it is unsustainable. News at 11.

j45 · 2025-09-09T17:18:47 1757438327

It’s important to jump on new models super early, while the rails are still being put in.

Anyone remember GPT4 the day it launched? :)

trunnell · 2025-09-09T17:30:42 1757439042

https://status.anthropic.com/incidents/72f99lh1cj2c

They recently resolved two bugs affecting model quality, one of which was in production Aug 5-Sep 4. They also wrote:

  Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs.

Sibling comments are claiming the opposite, attributing malice where the company itself says it was a screw-up. Perhaps we should take Anthropic at its word, and also recognize that model performance will follow a probability distribution even for similar tasks, even without bugs making things worse.

kiratp · 2025-09-09T17:54:46 1757440486

> Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs.

Things they could do that would not technically contradict that:

- Quantize KV cache

- Data-aware model quantization, where their own evals show "equivalent perf" but overall model quality suffers.

Simple fact is that it takes longer to deploy physical compute but somehow they are able to serve more and more inference from a slowly growing pool of hardware. Something has to give...
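To make the first bullet concrete, a toy sketch of symmetric per-tensor int8 quantization, the general kind of trick "quantize KV cache" refers to. This only illustrates the technique; nothing in the thread establishes that Anthropic does it, and real inference stacks quantize per head or per channel inside fused kernels. The cached values below are made up.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp in case of floating-point overshoot.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    keys = [0.013, -0.42, 1.7, 0.0009, -2.3]   # made-up cached key values
    codes, scale = quantize_int8(keys)
    for original, approx in zip(keys, dequantize_int8(codes, scale)):
        print(f"{original:+.4f} -> {approx:+.4f} (error {abs(original - approx):.4f})")
```

Each value moves by at most half the scale, but small entries such as 0.0009 collapse to zero. That per-value loss is the sort of thing aggregate evals could plausibly average away.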

cj · 2025-09-09T18:11:22 1757441482

> Something has to give...

Is training compute interchangeable with inference compute or does training vs. inference have significantly different hardware requirements?

If training and inference hardware is pooled together, I could imagine a model where training simply fills in any unused compute at any given time (?)

kiratp · 2025-09-09T18:38:25 1757443105

Hardware can be the same but scheduling is a whole different beast.

Also, if you pull too many resources from training your next model to make inference revenue today, you’ll fall behind in the larger race.

mh- · 2025-09-09T17:34:24 1757439264

The problem is twofold:

- They're reporting that only impacted Haiku 3.5 and Sonnet 4. I used neither model during the time period I'm concerned with.

- It took them a month to publicly acknowledge that issue, so now we lack confidence there isn't another underlying issue going undetected (or undisclosed, less charitably) that affects Opus.

trunnell · 2025-09-09T17:38:39 1757439519

> now we lack confidence there isn't another underlying issue

You can be confident there is a non-zero rate of errors and defects in any complex service that's moving as fast as the frontier model providers!

mh- · 2025-09-09T17:42:52 1757439772

Of course. Totally agree, and that's why (I think) I'm being as charitable as possible in this thread.

criemen · 2025-09-09T20:26:38 1757449598

They posted

> We are continuing to monitor for any ongoing quality issues, including reports of degradation for Claude Opus 4.1.

I take that as acknowledgment that there might be an issue with Opus 4.1 (granted, undetected still), but not undisclosed, and they're actively looking for it? I'd not jump to "they must be hiding things" yet. They're building, deploying and scaling their service at incredible pace, they, as we all, are bound to get some things wrong.

mh- · 2025-09-10T01:23:44 1757467424

To be clear, I'm not one of the people suggesting they're doing something nefarious. As I said elsewhere, I don't know what my expectations are of them at this point. I'd like early disclosure of known performance drops, I guess. But from a business POV, I understand why they're not going to be updating a status page to say "things are worsening but we're not exactly sure why".

I'm also a realist, though, and have built a career on building/operating large systems. There's obviously capability to dynamically shed load built into the system somewhere, there's just no other responsible way to engineer it. I'd prefer they slowed response times rather than harmed response quality, personally.

claude_ya_ · 2025-09-09T17:45:32 1757439932

Does anyone know if this also affected Claude Sonnet models running in AWS Bedrock, or if it was just when using the model via Anthropic’s API?

pqdbr · 2025-09-09T15:01:23 1757430083

Same here. Even with Opus in Claude Code I'm getting terrible results, sometimes feeling like we went back to the GPT-3.5 era. And it seems they are implementing heavy token-saving measures: the model does not read context anymore unless you force it to, and makes up method calls as it goes.

mh- · 2025-09-09T15:08:14 1757430494

The simplest thing I frequently ask of regular Claude (not Code) in the desktop app:

"Use your web search tool to find me the go-to component for doing xyz in $language $framework. Always link the GitHub repo in your response."

Previously Sonnet 4 would return a good answer to this at least 80% of the time.

Now even Opus 4.1 with extended thinking frequently ignores my ask for it to use the search tool, which allows it to hallucinate a component in a library. Or maybe an entire repo.

It's gone backwards severely.

(If someone from Anthropic sees this, feel free to reach out for chat IDs/share links. I have dozens.)

spicybright · 2025-09-09T15:54:11 1757433251

Glad I'm not crazy. I actually noticed both 4 models are just garbage. I started running my prompts through those, and Sonnet 3.7 comparing the results. Sonnet 3.7 is way better at everything.

idonotknowwhy · 2025-09-10T15:05:38 1757516738

You're not crazy, and this isn't new for Anthropic. Something is off with Opus 4.1. I actually saw it make 2 "typos" last week (I've never seen a model like this make a dumb "typo" before). And it's missing details that it understood last month (you can easily test this if you have some chats in OpenWebUI or LibreChat: just go in and hit regenerate).

Sonnet 3.5 did this last year a few times, it'd have days where it wasn't working properly, and sure enough, I'd jump online and see "Claude's been lobotomized again".

They also experiment with injecting hidden system prompts from time to time. E.g. if you ask for a story about some IP, it'll interrupt your prompt and remind the model not to infringe copyright. (We could see this via the API with prompt engineering, adding a "!repeat" "debug prompt" that revealed it, though they seem to have patched that now.)

> I started running my prompts through those, and Sonnet 3.7 comparing the results. Sonnet 3.7 is way better at everything.

Same here. And on API, the old Opus 3 is also unaffected (though that model is too old for coding).

dingnuts · 2025-09-09T18:41:48 1757443308

How is this better/faster than typing "xyz language framework site:github.com" into Kagi?

IDK about you but I find it faster to type a few keywords and click the first result than to wait for "extended thinking" to warm up a cup of hot water only to ignore "your ask" (it's a "request," not an "ask," unless you're talking to a Product Manager with corporate brain damage) to search and then outputs bullshit.

I can only assume after you waste $0.10 asking Claude and reading the bullshit, you use normal search.

Truly revolutionary technology

j45 · 2025-09-09T17:20:33 1757438433

I’m running into this as well.

Might be Claude optimizing for general use cases compared to code and that affecting the code side?

It feels strange, because the Claude API isn’t the same as the web tool, so I didn’t expect Claude Code to be affected the same way.

It might be a case of having to learn to read Claude best practice docs and keep up with them. Normally I’d have Claude read them itself and update an approach to use. Not sure that works as well anymore.

OtomotO · 2025-09-09T16:06:39 1757433999

This, so much this...

I signed up for Claude over a week ago and I totally regret it!

Previously I was using it and some ChatGPT here and there (also had a subscription in the past) and I felt like Claude added some more value.

But it's getting so unstable. It generates code, I see it doing that, and then it throws the code away and gives me the previous version of something 1:1 as a new version.

And then I have to waste CO2 to tell it to please don't do that and then sometimes it generates what I want, sometimes it just generates it again, just to throw it away immediately...

This is soooooooo annoying and the reason I canceled my subscription!

brandon272 · 2025-09-09T16:31:31 1757435491

> But it's getting so unstable. It generates code, I see it doing that, and then it throws the code away and gives me the previous version of something 1:1 as a new version.

I've had the same experience. Totally unreliable.

actsasbuffoon · 2025-09-09T16:54:38 1757436878

I regularly have this happen:

1. Ask Claude to fix something

2. It fails to fix the issue

3. I tell it that the fix didn’t work

4. It reverts its failed fix and tells me everything is working now.

This is like finding a decapitated body, trying to bring it back to life by smooshing the severed head against the neck, realizing that didn’t bring them back to life, dropping the head back on the ground, and saying, “There; I’ve saved them now.”

johnisgood · 2025-09-09T18:17:55 1757441875

Gosh, can't we get back to Sonnet 3.5 or whichever was the version around a year ago? It worked so well for me.

jononor · 2025-09-09T19:40:54 1757446854

This happens to me a lot. Almost once per session now, and not even when things are complicated. The model also thinks it has made the changes. So it seems like a UI/state bug, not something on the model side.

mh- · 2025-09-10T01:27:59 1757467679

I believe this is an issue with tool calling, similar to my complaint above about it refusing to use its search tool (or claiming that it did when I can see that it did not.)

yumraj · 2025-09-09T20:50:37 1757451037

I even posted an Ask HN asking whether people had experienced issues with Claude Code, since for me it has slowed down substantially: it frequently just pauses and takes much longer. I have a Claude Max 5x plan.

I've been running ccusage to monitor it, and my usage in $ terms has dropped to a third of what it was a few weeks ago. Some of that could be due to how I'm using it, but a drop of 60-70% can't be attributed to that alone; I think it's partly due to the performance.

To add: frequently, as in almost every time, 1) it'll start doing something and go silent for a long time, and 2) pressing Esc to interrupt takes a long time to have any effect, since it's probably stuck doing something. Interrupting via Esc used to be almost instantaneous.

So, I still like it, but with my measured usage down to a third, I'm almost tempted to go back to Pro and see if that meets my needs.

alvis · 2025-09-09T16:04:05 1757433845

And lest we forget opus was accidentally dumber last week! https://status.anthropic.com/incidents/72f99lh1cj2c

allisdust · 2025-09-09T17:22:32 1757438552

Yup. Opus 4.1 has been feeling like absolute dog shit and it made me give up in frustration several times. They really did downgrade their models. The Max plan is a joke now. I'm barely using Pro-level tokens since it's a net negative on my productivity. Enshittification is now truly in place.

gjvc · 2025-09-09T15:41:18 1757432478

"can't be overstated", you mean

mh- · 2025-09-09T16:08:59 1757434139

You're absolutely right! I should have used the correct word when writing the Hacker News comment.

(lol, yes, thank you.)

glenstein · 2025-09-09T20:33:28 1757450008

This one is interesting because I have seen a fair amount of "can't be understated" on reddit also. Interesting case of linguistic drift.

mh- · 2025-09-10T01:29:01 1757467741

In my case it was just me straight up using the wrong word by accident. Parent commenter caught it inside the edit window but I left it alone so their comment wasn't out of context. :)

gjvc · 2025-09-10T05:00:42 1757480442

linguistic drift my ass

teknologist · 2025-09-09T19:19:19 1757445559

Here's a useful tracker for how "stupid" the models are now and over some preset time periods: https://aistupidlevel.info

bongodongobob · 2025-09-09T18:25:49 1757442349

Thanks for the confirmation. Lately it's been telling me it has made edits or written code yet it's nowhere to be seen. It's been messing up extremely simple tasks like "move this knob from the bottom of the screen to the right". Over and over it will insist it made the changes but it hasn't. Getting confused about completely different sections of code and files.

I picked up Claude at the beginning of the summer and have had the same experience.

fuomag9 · 2025-09-09T16:45:03 1757436303

I felt like the model degraded lately as well. I've been using Claude every day for months now.

j45 · 2025-09-09T17:22:03 1757438523

I’m considering trying the API directly with Claude Code for a bit to compare, but I need a test suite first to compare all 3.

probably_wrong · 2025-09-09T15:28:03 1757431683

Have you considered perhaps that you are, indeed, out of your mind? Or more precisely, that you could be rationalizing what is essentially a random process?

Based on the discussions here it seems that every model is either about to be great or was great in the past but now is not. Sucks for those of us who are stuck in the now, though.

tofuahdude · 2025-09-09T15:55:24 1757433324

Anthropic literally stated yesterday that they suffered degraded model performance over the last month due to bugs:

https://status.anthropic.com/incidents/72f99lh1cj2c

Suggesting people are "out of their mind" is not really appropriate on this forum, especially so in this circumstance.

probably_wrong · 2025-09-09T16:55:54 1757436954

The first comment claims that Anthropic "are having to quantise the models to keep up with demand", to which the parent comment agrees with "This can't be understated". So based on this discussion so far Anthropic has [1] great models, [2] models that used to be great but now aren't due to quantization, [3] models that used to be great but now aren't due to a bug, and [4] models that constantly feel like a "bait and switch".

This most definitely feels like people analyzing the output of a random process - at this point I am feeling like I'm losing my mind.

(As for the phrasing I was quoting the OP, who I believe took it in the spirit in which it was meant)

[1] https://news.ycombinator.com/item?id=45183587

[2] https://news.ycombinator.com/item?id=45182714

[3] https://news.ycombinator.com/item?id=45183820

[4] https://news.ycombinator.com/item?id=45183281

qaq · 2025-09-09T17:41:03 1757439663

I am not sure why you are losing your mind. Anthropic dynamically adjusts knobs based on capacity and load. Those knobs can range from something as simple as reducing usage limits to more advanced measures, like switching to more optimized paths that use anything from more aggressive caching to more optimized models. Bugs are a factor in the quality of any service.

mh- · 2025-09-09T17:36:33 1757439393

The part I was saying I agree with is:

> New features like this feel pointless when the underlying model is becoming unusable.

I recognize I could have been clearer.

And for what it's worth, yes, your comment's phrasing didn't bother me at all.

wasabi991011 · 2025-09-09T18:43:32 1757443412

> Suggesting people are "out of their mind" is not really appropriate on this forum, especially so in this circumstance.

They were wrong, but not inappropriate. They re-used the "out of their mind" phrase from the parent comment to cheekily refer to the possibility of a cognitive bias.

mh- · 2025-09-10T01:30:13 1757467813

Yeah, I (parent commenter) had a laugh reading and writing the reply. Didn't offend me.

mh- · 2025-09-09T15:31:02 1757431862

> Have you considered perhaps that you are, indeed, out of your mind?

Yes, but I'll revisit.

hkt · 2025-09-09T15:41:37 1757432497

It seems plausible enough that they're trying to squeeze as much out of their hardware as possible and getting the balance wrong. As prices for hardware capable of running local LLMs drop and local models improve, this will become less prevalent and the option of running your own will become more widespread, probably killing this kind of service outside of enterprise. Even if it doesn't kill that service, it'll be _considerably_ better to be operating your own as you have control over what is actually running.

On that note, I strongly recommend qwen3:4b. It is _bonkers_ how good it is, especially considering how relatively tiny it is.

j45 · 2025-09-09T17:15:00 1757438100

Thanks. Mind sharing which kinds of Claude tasks you are able to run on qwen3:4b?

j45 · 2025-09-09T17:14:02 1757438042

Just because one can’t conceive of something being possible doesn’t mean it’s not possible.

groby_b · 2025-09-09T15:42:28 1757432548

"that every model is either about to be great or was great in the past but now is not"

FWIW, Codex CLI w/ GPT-5 medium is great right now. Objectively accelerating me. Not a coding god like some posters would have it, but overall freeing up time for me. Observably.

Assuming I haven't had since-cured delusions, the same was true for Claude Code, but isn't any more.

Concrete supporting evidence: from time to time, I have coding CLIs port older projects of varying (but small-ish) sizes from JS to TS. Claude Code used to do well on that. Repeatedly. I did another test last Sunday, and it dug a momentous hole for itself that even a liberal sprinkling of 'as unknown' everywhere couldn't solve. Codex managed the port from scratch and was also able to dig itself out of the massive hole CC had abandoned mid-port.

So I'd say the evidence points somewhat against random process, given repeated testing shows clear signal both of past capability and of recent loss of capability.

The idea that it's a "random" process is misguided.

jus3sixty · 2025-09-09T18:13:41 1757441621

I was going to tell you a joke about a broken pencil, but there's no point.

eatsyourtacos · 2025-09-09T15:50:26 1757433026

>Or more precisely, that you could be rationalizing what is essentially a random process?

You mean like our human brains and our entire bodies? We are the result of random processes.

>Sucks for those of us who are stuck in the now, though

I don't know what you are doing- but GPT5 is incredible. I literally spent 3 hours last night going back and forth on a project where I loaded some files for a somewhat complicated and tedious conversion between two data formats. And I was able to keep going back and forth and making the improvements incrementally and have AI do 90% of the actual tedious work.

To me it's incredible that people don't seem to understand the CURRENT value. It has literally replaced a junior developer for me. I am 100% better off working with AI for all these tedious tasks than passing them off to someone else. We can argue all day about whether that's good for the world (it's not), but in terms of the current state of AI, it's already incredible.

mattbettinson · 2025-09-09T16:36:03 1757435763

But would you have hired a junior dev for that work if AI hadn't 'replaced' it?

j45 · 2025-09-09T17:18:03 1757438283

Not a valid response in all cases.

It might not be a junior dev tool. Senior devs are using AI quite differently to magnify themselves not help them manage juniors with developing ceilings.

otabdeveloper4 · 2025-09-09T17:10:42 1757437842

Congrats, you grew up. It's not Claude's fault.