Hacker News | jryio's comments

If anyone was wondering ... it's racist

Unsurprisingly, the texts written up until that time were dominated by such individuals, which is tragic for LLM training if you think about it.

The voiceless groups or fringe opinions which we take as normative today do not appear.

Does this encourage us to write in the present such that we influence the models in perpetuity?


Voiceless groups do not appear in the training data? How could they? They are voiceless. Do you think voiceless people are represented in today's training data? They can't be; they are voiceless.

Nothing tragic about using data from a time period.

Common words used in the 1900s are labeled racist now. I doubt anyone was wondering whether they filtered those words for modern safe words.


>The voiceless groups or fringe opinions which we take as normative today do not appear.

Times are different. Anybody with an internet connection can "publish" their thoughts and perspective online. LLMs scrape all of this. Modern datasets like CommonCrawl capture a vastly wider spectrum of humanity than a printing press ever could. The pre-1930 model acts as a time capsule of "gatekept publishing", but modern LLMs are trained on the democratized web.

>Does this encourage us to write in the present such that we influence the models in perpetuity?

I noticed a bunch of LLM-powered Reddit accounts praising products/services in dead threads. Or one bot posting a setup question, then a few other bots responding with praise / questions about a specific product in response. I don't know why they're doing this but I'm beginning to suspect it's something like this (get this positive sentiment into the datasets for the next generation of LLMs).


I'd be more worried if words from that era were fully aligned with present day notions of morality. Wouldn't that indicate a certain stagnation & lack of progress?

Let us hope, 100 years from now, there will be people who look back unkindly on us.


As Proudhon said, "I dream of a society where I would be guillotined as a reactionary."

10 years ago people might have cared about your whining, not anymore (thank god)

one day we'll have SOTA models trained like this one and there's nothing you can do about it :^)

> OpenAI has contracted to purchase an incremental $250B of Azure services, and Microsoft will no longer have a right of first refusal to be OpenAI’s compute provider.

Azure is effectively OpenAI's personal compute cluster at this scale.


What fraction of Azure compute does OpenAI represent? (Does the $250bn commitment have a time period? Is it legally binding?)

Azure did $75B last quarter.

That article doesn't give a timeframe, but most of these use 10 years as a placeholder. I would also imagine it's not a requirement for them to spend it evenly over the 10 years, so it could be back-loaded.

OpenAI is a large customer, but this is not making Azure their personal cluster.
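Back-of-the-envelope arithmetic supports this, assuming (as above) an even spread over a 10-year term; the 10-year figure is a placeholder, not something the article confirms:

```python
# Rough estimate: OpenAI's committed Azure spend vs Azure's run rate.
# Assumes the $250B is spread evenly over 10 years (the article gives no
# timeframe, and the spend could well be back-loaded).
commitment = 250e9            # total committed spend, USD
years = 10                    # assumed contract length
azure_quarterly = 75e9        # Azure revenue last quarter, USD

openai_annual = commitment / years     # $25B/yr
azure_annual = azure_quarterly * 4     # $300B/yr annualized
share = openai_annual / azure_annual
print(f"OpenAI share of Azure run rate: {share:.1%}")  # → 8.3%
```

Under those assumptions OpenAI is under a tenth of Azure's revenue run rate: a very large customer, but not the whole cluster.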


I wonder how this figure was settled. Is it based on consumer pricing? Can't Microsoft and OpenAI just make a number up, aside from a minimum to cover operating costs? When is the number just a marketing ploy to make it seem huge, important and inevitable (and too big to fail)?

I find it strange that you've anthropomorphized Claude but not ChatGPT seemingly based on one having a human name and the other not

Exactly - cooperation is not incentivized properly

Just another disposable piece of software maintained by a single person that does 80% of what other apps do but worse.

Max lifespan 2 years


Please cut this out. You really don't want to live in a world where individuals are discouraged from trying to build things that are good.

If you want something to stick around: you have to use and pay for it.


You're right. We should absolutely only rely on "Ask sales for price" closed-source software from megacorps, that get worse on every release, and get sunset anyway when the funding runs out.

I hAvE a FeW qUaLmS wItH tHiS aPp

https://news.ycombinator.com/item?id=9224


But if they ever choose to decommission it, they have the chance to do the funniest thing:

https://scryfall.com/card/plst/INV-156/obliterate


unacceptable comment. hacker news is misunderstood as a toxic community because of fellas like you. have some dignity.

Of all the things to judge this on, you chose the most ridiculous one. Why shouldn’t a project like this exist just because there are “bigger” alternatives out there?

If you're gonna shut this one down, at the very least do it for the right reasons, such as the fact that this is a web wrapper. Absolutely disgusting: either go native or don't bother shoving your webpage into a browser container and calling it what it is not (an app).


Some people...

You do realize that would have once described GCC and Linux, right?

Of Linux, yes. Of GCC, no. From the very beginning there were multiple authors, and the project was a mishmash of several other projects.

This feels like an unethical release of a model. They've opened a can of worms without investing in defense first.

Anthropic announced their capabilities in advance, issued a private release, then put up $100M in credits to Fortune 500 companies and OSS projects to secure themselves.

OpenAI sees that, makes a model equally capable of exploiting vulnerabilities, then releases it to the public with no equivalent program [1]

[1]: https://www.anthropic.com/glasswing


Their 'Preparedness Framework'[1] is 20 pages and looks ChatGPT-generated; I don't feel prepared after reading it.

https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbdde...


1. They changed the default in March from high to medium, however Claude Code still showed high (took 1 month 3 days to notice and remediate)

2. Old sessions had the thinking tokens stripped, resuming the session made Claude stupid (took 15 days to notice and remediate)

3. System prompt to make Claude less verbose reducing coding quality (4 days - better)

All this to say: the experience of suspecting a model is getting worse while Anthropic publicly gaslights its user base ("we never degrade model performance") is frustrating.

Yes, models are complex and deploying them at scale given their usage uptick is hard. It's clear they are playing with too many independent variables simultaneously.

However, you are obligated to communicate honestly with your users to manage expectations. Am I being A/B tested? When was the date of the last system prompt change? I don't need to know what changed, just that it did, etc.

Doing this proactively would certainly match expectations for a fast-moving product like this.


> 2. Old sessions had the thinking tokens stripped, resuming the session made Claude stupid (took 15 days to notice and remediate)

This one was egregious: after a one-hour user pause, apparently they cleared the cache and then continued to apply "forgetting" for the rest of the session after the resume!

Seems like a very basic software engineering error that would be caught by normal unit testing.
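A minimal sketch of the kind of regression test meant here, against a hypothetical session API (the `Session` class and all names are illustrative stand-ins, not Anthropic's actual code):

```python
# Hypothetical regression test: resuming a session after the cache TTL
# expires must not drop "thinking" blocks from prior turns.

class Session:
    CACHE_TTL = 3600  # cache lifetime in seconds (illustrative)

    def __init__(self):
        self.turns = []

    def add_turn(self, blocks):
        self.turns.append(blocks)

    def resume(self, idle_seconds):
        # Correct behavior: the prompt cache may have expired, but the
        # full turn history (thinking blocks included) is re-sent.
        # idle_seconds is shown for the scenario; history must not
        # depend on it.
        return list(self.turns)

def test_resume_keeps_thinking_blocks():
    s = Session()
    s.add_turn([{"type": "thinking", "text": "scratch work"},
                {"type": "text", "text": "4"}])
    resumed = s.resume(idle_seconds=Session.CACHE_TTL + 1)
    assert any(b["type"] == "thinking" for turn in resumed for b in turn)
```

The buggy behavior described above is exactly what this assertion would catch: a resume path that silently filters thinking blocks once the pause exceeds the TTL.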


To be fair to Anthropic, they did not intentionally degrade performance.

To take the opposite side, this is the quality of software you get atm when your org is all in on vibe coding everything.


Are you saying dropping cache after 1 hour is not intentionally degrading performance?

Yes. Caching is a cost optimization not a response quality metric.

But it still degrades performance.

It's unfortunate that the word "performance" is overloaded, and ML folks have a specific definition... that isn't what the rest of CS uses, but I understand Anthropic to mean response quality when they say this, not any other dimension you could measure performance on.

You can argue they're lying, but I think this is just folks misunderstanding what Anthropic is saying.


They didn't just drop cache. They elided thinking blocks even if you recache. That permanently degraded the model output for the rest of the session, even ignoring the bug, if you waited 60 minutes instead of 59.

None of these problems equate to degrading model performance. Completely different team. Degraded CC harness, sure.

Sure, but it gives the impression of degraded model performance. Especially when the interface is still saying the model is operating on "high", the same as it did yesterday, yet it is in "medium" -- it just looks like the model got hobbled.

Oh, absolutely. Though changes in how the model is used are eminently more fixable than the model itself.

Yes, but for many users, CC is the product. Especially since I'm not allowed(?) to use my own harness with my sub.

> Anthropic publicly gaslights their user-base: "we never degrade model performance" is frustrating.

They're not gaslighting anyone here: they're very clear that the model itself, as in Opus 4.7, was not degraded in any way (i.e. if you take them at their word, they do not drop to lower quantisations of Claude during peak load).

However, the infrastructure around it - Claude Code, etc - is very much subject to change, and I agree that they should manage these changes better and ensure that they are well-communicated.


Degrading model performance at inference in a data center vs. stripping thinking tokens: they are effectively the same.

Sure, they didn't change the GPUs they're running, or the quantization, but if valuable information is removed, leading to models performing worse, then performance was degraded.

In the same way uptime doesn't care about the incident cause... if you're down you're down no one cares that it was 'technically DNS'.


I thought these days thinking tokens sent by the model (as opposed to used internally) were just for the user's benefit. When you send the conversation back, you have to strip the thinking stuff for the next turn. Or is that just local models?
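For what it's worth, the pattern being described looks something like this sketch (the content-block shape is loosely modeled on Anthropic-style messages; `strip_thinking` and the exact field names are illustrative assumptions, not a specific API):

```python
# Hypothetical sketch: dropping "thinking" content blocks from a
# conversation history before resending it for the next turn.
def strip_thinking(messages):
    cleaned = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):  # structured content blocks
            content = [b for b in content if b.get("type") != "thinking"]
        cleaned.append({**msg, "content": content})
    return cleaned

history = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "text": "Simple arithmetic."},
        {"type": "text", "text": "4"},
    ]},
]
print(strip_thinking(history)[1]["content"])
# → [{'type': 'text', 'text': '4'}]
```

Whether a provider requires this, tolerates it, or (as in the bug above) does it to you mid-session is exactly the point of contention.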

Claude Code is not infra; the model is the infra. They changed settings to make their models faster and probably cheaper to run, too. Honestly, with adaptive thinking it no longer matters what model it is if you can dynamically make it do less or more work.

Notion did it first and arguably better[1]. Shared agents benefit from shared context.

The hardest part is ensuring that shared context is maintained and it converges on a representation of reality and the people in the company.

[1] https://www.notion.com/help/custom-agents


Notion, as any other thin-AI product out there, is now in Anthropic/OpenAI/Google's crosshairs. Unless one has a moat the size of SharePoint or Google Docs or OneDrive, it's just a feature away.

I really like Notion's UI. I wish they would focus only on that and let me access my Notion DB as .md files with Claude.

Take a look at Outline! I use it almost exactly like a cloud based Obsidian vault. And they have been very responsive for MCP feature requests

I don't think they have added an Obsidian Bases / Notion Database-like feature yet, right? Saw some discussion of adding a NocoDB integration, but also didn't see that happen yet.

I know this is probably out of scope, but I'd love it as well if Notion could slowly accrete the features of Airtable... at least expose some form of programmatic access to tables!

Yes, please. Their MCP suuuuuuuucks

How does it suck? I use it almost daily and love their Notion MCP

I was probably a bit harsh.

It works, but models seem to produce these insanely long traces to do the most basic things. I had to create a couple of skills so they know how to properly use the thing without breaking, so they don't always try to pass the wrong parameters to it.

It also doesn't let us change a couple of things (like icons). Or, if it does, not even Opus 4.6 can figure out how to do it.


Can't limit access easily. You can do per-workspace permissions and that's about it.

At promptql, our solution to this was a wiki. You get knowledge-graph/relations for free through page links.

New knowledge additions are proposed when agents decide it would be relevant to retain, humans confirm/deny or create wiki modifications themselves.


It's funny how adding AI to Notion actually made it a lot more usable. Most products force it on you, but here I feel like it's actually a massive benefit. It was hard finding content, and using the filters felt clunky. (And the whole UI, either in a browser or their app, feels buggy and slow.) But with their Notion AI / MCP it's gotten super easy to get information in and out.

In demo videos, it shows Memory under Files, so I assume it holds learnings and shared context.

Yeah, the memory is cool, just a file store that you can instruct the agent to use however you see fit.

Software engineering is certainly not engineering, even at the highest levels. Real engineering has infinitely more complex interactions with the physical world than symbolic instructions for machines.

That's right, no need to understand anything other than symbols on a machine. No people involved. No reality to model. No economics to think about. Nothing like real engineering. That's for the big boys and girls.
