Voiceless groups do not appear in the training data? How could they? They are voiceless. You think voiceless people are represented in today's training data? They can't be; they are voiceless.
Nothing tragic about using data from a time period.
Common words used in the 1900s are labeled racist now. I doubt anyone was wondering whether they filtered those words against modern safe-word lists.
>The voiceless groups or fringe opinions which we take as normative today do not appear.
Times are different. Anybody with an internet connection can "publish" their thoughts and perspective online. LLMs scrape all of this. Modern datasets like CommonCrawl capture a vastly wider spectrum of humanity than a printing press ever could.
The pre-1930 model acts as a time capsule of "gatekept publishing", but modern LLMs are trained on the democratized web.
>Does this encourage us to write in the present such that we influence the models in perpetuity?
I noticed a bunch of LLM-powered Reddit accounts praising products/services in dead threads. Or one bot posting a setup question, then a few other bots responding with praise or questions about a specific product.
I don't know why they're doing this, but I'm beginning to suspect it's something like that: getting positive sentiment into the datasets for the next generation of LLMs.
I'd be more worried if words from that era were fully aligned with present-day notions of morality. Wouldn't that indicate a certain stagnation & lack of progress?
Let us hope, 100 years from now, there will be people who look back unkindly on us.
> OpenAI has contracted to purchase an incremental $250B of Azure services, and Microsoft will no longer have a right of first refusal to be OpenAI’s compute provider.
Azure is effectively OpenAI's personal compute cluster at this scale.
That article doesn't give a timeframe, but most of these deals use 10 years as a placeholder. Spread evenly, $250B over 10 years is $25B a year, but I'd also imagine they're not required to spend it evenly, so it could be back-loaded.
OpenAI is a large customer, but this is not making Azure their personal cluster.
I wonder how this figure was settled on. Is it based on consumer pricing? Can't Microsoft and OpenAI just make a number up, aside from a minimum to cover operating costs? At what point is the number just a marketing ploy to make it seem huge, important, and inevitable (and too big to fail)?
You're right. We should absolutely only rely on "Ask sales for price" closed-source software from megacorps, which gets worse with every release and gets sunset anyway when the funding runs out.
Of all the things to judge this on, you chose the most ridiculous one. Why shouldn’t a project like this exist just because there are “bigger” alternatives out there?
If you're going to shut this one down, at least do it for the right reasons, such as the fact that this is a web wrapper. Absolutely disgusting: either go native or don't bother shoving your webpage into a browser container and calling it what it is not (an app).
This feels like an unethical release of a model. They've opened a can of worms without investing in defense first.
Anthropic announced their capabilities in advance, did a private release, then put up $100M in credits for Fortune 500 companies and OSS projects to secure themselves.
OpenAI saw that, made a model equally capable of exploiting vulnerabilities, then released it to the public with no equivalent program [1]
1. They changed the default in March from high to medium; however, Claude Code still showed high (took 1 month and 3 days to notice and remediate)
2. Old sessions had their thinking tokens stripped, so resuming the session made Claude stupid (took 15 days to notice and remediate)
3. A system prompt change meant to make Claude less verbose reduced coding quality (4 days, better)
All this to say... the experience of suspecting a model is getting worse while Anthropic publicly gaslights its user base ("we never degrade model performance") is frustrating.
Yes, models are complex and deploying them at scale given their usage uptick is hard. It's clear they are playing with too many independent variables simultaneously.
However, you are obligated to communicate honestly with your users to manage expectations. Am I being A/B tested? When was the last system prompt change? I don't need to know what changed, just that it did, etc.
Doing this proactively would certainly match expectations for a fast-moving product like this.
> 2. Old sessions had their thinking tokens stripped, so resuming the session made Claude stupid (took 15 days to notice and remediate)
This one was egregious: after a one-hour user pause, they apparently cleared the cache and then kept applying the "forgetting" for the rest of the session after the resume!
Seems like a very basic software engineering error that would be caught by normal unit testing.
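For illustration, here's roughly the shape of a regression test that could have caught it. Everything in it is hypothetical (new_session, resume_session, the message schema); it's a sketch of the invariant, not Anthropic's actual internals:

    # Invariant: resuming a session after the cache TTL expires
    # must not silently drop assistant "thinking" blocks.
    def test_resume_preserves_thinking_blocks():
        session = new_session()  # hypothetical test harness
        session.send("Refactor this function to be tail-recursive.")
        before = [block for msg in session.messages
                  for block in msg.content if block.type == "thinking"]
        # Resume just past the assumed 60-minute cache TTL.
        resumed = resume_session(session.id, idle_seconds=3601)
        after = [block for msg in resumed.messages
                 for block in msg.content if block.type == "thinking"]
        assert after == before, "thinking blocks were elided on resume"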
It's unfortunate that the word "performance" is overloaded and ML folks have a specific definition that isn't what the rest of CS uses, but I understand Anthropic to mean response quality when they say this, not any other dimension you could measure performance on.
You can argue they're lying, but I think this is just folks misunderstanding what Anthropic is saying.
They didn't just drop the cache. They elided thinking blocks even if you recached. That permanently degraded the model's output for the rest of the session, even ignoring the bug, if you waited 60 minutes instead of 59.
Sure, but it gives the impression of degraded model performance. Especially when the interface is still saying the model is operating on "high", the same as it did yesterday, yet it is in "medium" -- it just looks like the model got hobbled.
> Anthropic publicly gaslights its user base ("we never degrade model performance") is frustrating.
They're not gaslighting anyone here: they're very clear that the model itself, as in Opus 4.7, was not degraded in any way (i.e. if you take them at their word, they do not drop to lower quantisations of Claude during peak load).
However, the infrastructure around it - Claude Code, etc - is very much subject to change, and I agree that they should manage these changes better and ensure that they are well-communicated.
Degrading model performance at inference in a data center vs. stripping thinking tokens: the two are effectively the same.
Sure, they didn't change the GPUs they're running or the quantization, but if valuable information is removed and the model performs worse, then performance was degraded.
In the same way, uptime doesn't care about the incident's cause... if you're down, you're down; no one cares that it was "technically DNS".
I thought these days the thinking tokens sent by the model (as opposed to used internally) were just for the user's benefit. When you send the convo back, you have to strip the thinking stuff for the next turn. Or is that just local models?
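For what it's worth, the client-side strip usually looks something like this. This is a generic sketch over block-structured message dicts, not any particular vendor's schema:

    # Drop "thinking" blocks from prior assistant turns before
    # sending the conversation back for the next turn.
    def strip_thinking(messages):
        cleaned = []
        for msg in messages:
            content = msg.get("content")
            if msg.get("role") == "assistant" and isinstance(content, list):
                kept = [b for b in content if b.get("type") != "thinking"]
                cleaned.append({**msg, "content": kept})
            else:
                cleaned.append(msg)
        return cleaned

    # e.g. client.send(strip_thinking(history) + [new_user_turn])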
Claude Code is not infra; the model is the infra. They changed settings to make their models faster and probably cheaper to run too. Honestly, with adaptive thinking it no longer matters what model it is if you can dynamically make it do less or more work.
Notion, like any other thin AI product out there, is now in Anthropic/OpenAI/Google's crosshairs. Unless one has a moat the size of SharePoint, Google Docs, or OneDrive, it's just a feature away.
I don't think they have added an Obsidian Bases / Notion Database-like feature yet, right? I saw some discussion of adding a NocoDB integration, but didn't see that happen yet either.
I know this is probably out of scope, but I'd love it as well if Notion could slowly accrete the features of Airtable... at least expose some form of programmatic access to tables!
It works, but models seem to produce these insanely long traces to do the most basic things. I had to create a couple of skills so they know how to use the thing properly without breaking, and don't always try to pass the wrong parameters to it.
It also doesn't let us change a couple of things (like icons). Or, if it does, not even Opus 4.6 can figure out how to do it.
It's funny how adding AI to Notion actually made it a lot more usable. Most products force it on you, but here I feel like it's actually a massive benefit.
It was hard to find content, and using the filters felt clunky. (And the whole UI, whether in a browser or their app, feels buggy and slow.) But with their Notion AI / MCP it's gotten super easy to get information in and out.
Software engineering is certainly not engineering, even at the highest levels. Real engineering has infinitely more complex interactions with the physical world than symbolic instructions for machines.
That's right, no need to understand anything other than symbols on a machine. No people involved. No reality to model. No economics to think about. Nothing like real engineering. That's for the big boys and girls.
Unsurprisingly, the texts written up until that time were dominated by such individuals, which is tragic for LLM training if you think about it.
The voiceless groups or fringe opinions which we take as normative today do not appear.
Does this encourage us to write in the present such that we influence the models in perpetuity?