More

johnfn · 2026-03-27T20:43:16 1774644196

The section on "artificially low costs" does not make a lot of sense to me. If anything I feel like the costs are inflated for the frontier models, not "artificially low". Easy proof: GLM-5 costs about 1/10 as much as Opus. I'm not going to tell you it's as good as Opus 4.6 -- it's not -- but it performs comparably to where frontier models were 6 months ago. (It's on par with Sonnet 4.5 on leaderboards, though in practice it's probably closer to Sonnet 4.0.)

If I can switch to an open source model today, run it myself, and spend 1/10 as much as Opus, and get to about where frontier models were 6 months ago, fear-mongering about how we'll have to weather "orders-of-magnitude price hikes" and arguing that that one shouldn't even bother to learn how to use AI at all seems disconnected from reality. Who cares about the "shady accounting" OpenAI is doing, or that AI labs are "wildly unprofitable"? I can run GLM 5 right now, forever, for cheap.

piker · 2026-03-27T21:01:58 1774645318

The post is factoring in training costs, not just inference.

dwohnitmok · 2026-03-27T23:12:46 1774653166

No it's not. Otherwise this part doesn't make sense

> in fact, they actually compound the problem by encouraging significantly more usage

because if eliminating training costs makes running the model above cost, the problem is helped by significantly more usage not compounded.

More usage compounds the problem only if inference is unprofitable.

(the article briefly mentions training but that's later).

piker · 2026-03-27T23:42:11 1774654931

It made sense to me understanding that you can have a unit-profitable API but lose money on loss-leading campaigns like Code subscriptions. Those losses are amplified by encouraging usage. Perhaps I'm mistaken.

dwohnitmok · 2026-03-29T04:49:32 1774759772

Again, that is a statement about inference time costs, not training costs.

johnfn · 2026-03-27T21:03:17 1774645397

But I don't need to pay training costs to use GLM-5?

piker · 2026-03-27T21:08:00 1774645680

Sure, but somebody needs to pay for GLM-6 unless you're happy to stop here.

InsideOutSanta · 2026-03-27T21:32:28 1774647148

If everybody stopped training models today and Anthropic and OpenAI were deleted from the universe, I'd be happy to just keep using GLM-5 at its current inference cost. The article's author assumes that there will be a point where we will no longer have access to good models at reasonable cost because current models are subsidized, but GLM-5 disproves that.

johnfn · 2026-03-28T00:00:46 1774656046

Even in this hypothetical future, I will continue to use frontier models until they become "orders of magnitude more expensive", at which point I'll just fall back to the best open source model, which will still only be about 6 months behind. I don't see where the issue is?

johnfn · 2026-03-25T19:23:43 1774466623

I think Sora is an excellent way to see how people's beliefs clash with reality. Even in this post, I see people likening Sora to unveiling "a weapon", it filling them with "bland dread", or comparing it to creating "killing robots". But now that Sora is being shut down, what impact did Sora actually have on society, other than getting a couple of people to waste their time making some funny meme videos? Did any of those negative externalities actually play out?

If you are autistic, I feel that it causes you to see reality a more accurately than most here on this thread.

gordonhart · 2026-03-25T19:43:54 1774467834

At least according to the Head of Product at X, Sora was by far the most widely used tool to create fake war videos[0] aiming to push various false narratives. Given how popular fake content is at Meta I can only imagine what they see there (if they even have anybody looking at this kind of thing).

[0] https://x.com/nikitabier/status/2029024577624650041

heavyset_go · 2026-03-25T22:11:34 1774476694

On X, viewing actual war footage was locked behind age-gating and identity verification, while any idiots' fake war footage was uncensored and consumable by anyone.

johnfn · 2026-03-26T06:22:05 1774506125

I understand that misinformation is a bad thing, and your point is taken that I was probably too quick to brush off the worst thing that Sora did as 'some funny memes'. But still. Photoshop is used to make a lot of misinformation, probably 1000x to 10,000x as much as Sora did, or even more than that. Does anyone say the latest version of Photoshop is like unveiling a weapon? Does anyone say that AI driven generative fill in Photoshop is like creating killing robots?

toraway · 2026-03-25T22:13:26 1774476806

Sora was one of the earliest demos of a "wow okay that is good enough to be mistaken for real" GenAI model, which is what that comment was referencing with the "weapon" reference (the tech behind it not just Sora™ Videos).

Sure, by the time they productized it, Sora was no longer SOTA thanks to the AI arms race. And ultimately positioned as a TikTok for Slop with an annoying watermark so didn't take the world by storm on its own.

But since it was unveiled GenAI videos as a whole have become commonplace everywhere else on the internet, with plenty of negative impact already in terms of spam or manipulation, and we're barely in year 2 so far.

johnfn · 2026-03-25T01:51:14 1774403474

As someone who generally liked the products that OpenAI puts out, I think Sora was their first product that I really didn't like. I liked GPT primarily because I felt like it respected me: I never felt like it was trying to distract me from my work or get me to waste time doomscrolling. It's primary value proposition to keep me using it wasn't to trick me with addictive content, but to get me high quality answers as fast as possible. And I felt like OpenAI's other products, like Deep Research, agent mode, etc, were the same way. Even Atlas, although I suspect it will be equally ill-fated, attempts to follow this same pattern. It really felt like OpenAI was separating themselves from the common popular apps like Tiktok, Reddit, Instagram, etc, which seemed to exist entirely to distract me from things I care about and waste my time.

Sora was the first product OpenAI shipped where I felt that fell into that second category, and for that I was very disappointed. You have all those GPUs, and the most incredible technology in the world, and the most brilliant engineers, and all you can think to do with them is to make an app that just makes meme videos? I mean, c'mon!

Still, I am mystified by how rapidly Sora went from launch to shutdown. Does anyone have any guess what happened there? Even if Sora wasn't a spectacular success, it seems to me like subsequent model improvements could have moved the needle - shutting it down so soon seems premature. I mean, what if this is the equivalent of making ChatGPT with GPT 3?

greenie_beans · 2026-03-25T14:07:25 1774447645

> I liked GPT primarily because I felt like it respected me: I never felt like it was trying to distract me from my work or get me to waste time doomscrolling

i recently used gpt for the first time in several months (i'm a daily claude user) and didn't find this at all. it is most certainly trying to pull you into engagement with how it ends each response. "if you want, i could tell you about this thing that's relevant to what you are discussing and tease just enough so that you addictively answer yes"

nananana9 · 2026-03-25T06:28:26 1774420106

What happened is that they make no money, because people use it an masse to generate videos that they then post on TikTok and Instagram, nobody actually doomscrolls Sora.

mortsnort · 2026-03-25T02:35:40 1774406140

Hosting videos is really expensive. AI video generation inference is really expensive. I'd love to see how much money this experiment cost.

rblatz · 2026-03-25T03:14:14 1774408454

So much that they walked away from a billion dollar deal with Disney by dropping Sora.

riffraff · 2026-03-25T06:49:08 1774421348

It's not clear to me what that billion dollar meant.

To me it seems it was "Disney gets shares and we get to use their characters in Sora".

Even if Sora breaks even, why would you gift Disney stock? It's not like they actual gave 1B to openai.

lossyalgo · 2026-03-25T14:58:36 1774450716

I don't think anyone outside of Disney/ClosedAI knows what deal was actually made. Maybe they just shut down public use of Sora but Disney will still be able to use it internally? Maybe they never even signed anything, as is too often the case with AI deals, especially big ones, how we read about signed/inked deals but then it turns out it was all just words spoken. Maybe they took the cash, then shut Sora down to save money? Could be any number of things that happened which we might never know.

karel-3d · 2026-03-25T07:24:09 1774423449

Hosting videos is not that expensive, compared to generation and inference costs. It's not cheap but it's not that horrible

imankulov · 2026-03-25T13:03:28 1774443808

> I liked GPT primarily because I felt like it respected me: I never felt like it was trying to distract me from my work or get me to waste time doomscrolling.

Not about Sora, but about ChatGPT. I felt the same way for quite a while until I noticed that its response pattern has changed, apparently aiming for higher engagement. Someone aggressively pursued a metric.

At some point, ChatGPT started leaving annoying cliffhangers in its every response, like "Do you want me to share a little-known secret of X that professionals often use?" Like, come on!

mvdtnz · 2026-03-25T18:42:59 1774464179

> I liked GPT primarily because I felt like it respected me: I never felt like it was trying to distract me from my work or get me to waste time doomscrolling. It's primary value proposition to keep me using it wasn't to trick me with addictive content, but to get me high quality answers as fast as possible.

I'm curious if you still feel this way about current iterations of ChatGPT? It seems like it's now primed to engagement bait the user, especially when used through the web UI. You can ask it a simple question with a straight forward answer and it will still try to get you to follow up with more.

> What is the minimum thickness for Shimano M8100 disc brake rotors?

> For Shimano XT M8100-series rotors (like RT-MT800 / RT-MT900 commonly used with M8100 brakes), the minimum thickness is 1.5 mm. If the rotor measures 1.5 mm or thinner, Shimano says it should be replaced.

> (a bunch of pointless details in bullet points)

> If you want, tell me the exact rotor model (e.g., RT-MT800, RT-MT900, size), and I can confirm the spec for that specific one and what typical wear looks like.

The entire query could have been answered with "1.5mm". The "if you want" follow ups are so annoying.

cess11 · 2026-03-25T08:42:07 1774428127

"I am mystified by how rapidly Sora went from launch to shutdown"

I suspect they promised synthetic movies but it quickly became clear that they were never going to be able to deliver on this.

Slick fifteen second lulz-clips, sure, but I don't think they can make several of them consistent enough to fit into a larger video narrative without the audience finding it jarring and incoherent.

Perhaps legal at Disney also concluded that the output wouldn't be possible to copyright, which is their core business.

DrewADesign · 2026-03-28T17:59:55 1774720795

Every studio that made video content using AI video generation — think those all come commercials— basically just generated and regenerated the same few-second clips until they got an acceptable one. Hundreds and hundreds of times. I would be astonished if it would have been cheaper than actual CGI had the generation not been so heavily subsidized, and the product sucked.

DrewADesign · 2026-03-29T03:06:10 1774753570

*Coke commercials

iAMkenough · 2026-03-25T02:17:38 1774405058

> Still, I am mystified by how rapidly Sora went from launch to shutdown. Does anyone have any guess what happened there?

My guess is they over committed server/energy resources, since they were generating ~30 images per frame of 1 second of video for results that may be discarded and then tried again.

Now that energy costs are increasingly less predictable because of the war, they're prioritizing what is sustainable. Willing to blow up the $1 billion Disney deal for Sora, because that's a popular IP that would have increased discarded server time.

iAMkenough · 2026-03-25T02:42:19 1774406539

I'm also curious if Sora has been used by Iran to generate those Lego propaganda videos critical of the President. Given how close Sam Altman is with the current administration, I wouldn't be surprised if Sora is now reserved for U.S. government propaganda only.

Might be why the latest Iran propaganda video could be created in PowerPoint: https://bsky.app/profile/rachelbitecofer.bsky.social/post/3m...

pjc50 · 2026-03-25T07:10:16 1774422616

Are there known tells that could be used to determine which model the video came from?

(This sort of question, and the Grok sexual abuse, is why I'd like to see mandatory invisible watermarks on generated images/video)

torginus · 2026-03-25T02:57:40 1774407460

I don't think so. There are tons of self hosted models for video (they are smaller and easier to run).

Most people serious about this stuff usually have their own pipelines.

iAMkenough · 2026-03-25T03:01:46 1774407706

I'm not sure, but you could be right. Sora is/was the top-of-the-line platform for video generation, and the Lego IP videos were polished. Makes sense to outsource when your own energy grid is being destroyed. Anyone with an account and VPN could utilize the platform.

I'd like to know what self hosted models they've been using, if any, and who provided them, trained on Lego IP.

iAMkenough · 2026-03-25T03:22:47 1774408967

Since you seem to be better informed, I'm also interested in what self hosted models for video you recommend for creating my own Lego movie clips now that Sora is no longer an option for a paid service. There's tons, right?

pavlov · 2026-03-25T09:38:28 1774431508

Look up Wan and Hunyan for starters.

These are open weight models, so you can fine tune them on Lego content… But presumably they already have enough training data since they were made by Chinese companies who don’t give a shit about Western IP rights.

hbn · 2026-03-25T14:22:55 1774448575

> Still, I am mystified by how rapidly Sora went from launch to shutdown

I think if you had to foot the bill for generating a bajillion gigabytes of slop with no real utility, you wouldn't be too mystified.

They showed off their technology and proved it was impressive. That's all it had to do.

AussieWog93 · 2026-03-25T03:20:21 1774408821

For me, Sora changed the way I viewed Sam Altman as a person.

I really thought he wasn't like the previous generations of tech leaders - as you mentioned OpenAI (with him in charge) seemed to be genuine about making a product that could improve people's lives.

He'd go on podcasts and quite convincingly talk about how ChatGPT could prevent real world harm like suicide, and possibly even contribute to helping disease too.

Then they drop this and it just doesn't gel. So much of what they've done since has just doubled down on the Zuck-esque scumminess and greed too.

Part of me still sees Dario as genuine in the way that Sama seemed back in 2024, but I'm sure once he has enough investor pressure he'll cave the same way too.

kergonath · 2026-03-25T10:16:41 1774433801

> He'd go on podcasts and quite convincingly talk about how ChatGPT could prevent real world harm like suicide, and possibly even contribute to helping disease too.

He is a con man. Of course he’s charming and convincing, that’s how he ended up where he is. But he’s just as full of it as Musk when he was waxing lyrical about saving the world and going to Mars. They lie very convincingly.

username223 · 2026-03-25T05:16:37 1774415797

Sam Altman made his stake at the table with a shady and failed location data harvesting app (https://en.wikipedia.org/wiki/Loopt). That's who he is, that's what he does, and we're all better off paying less attention to the sounds he emits, and more to the things he does.

waterproof · 2026-03-25T05:41:08 1774417268

> the things he does.

The things he does is convince investors to give him billions of dollars to build what he wants. Where exactly does that leave us?

rustystump · 2026-03-25T06:33:57 1774420437

A fool and his money shall soon be parted. Sam is a face. If it wasnt him, it would be someone else.

Eufrat · 2026-03-25T05:36:34 1774416994

Multiple people have attested that Sam Altman is extremely charming (especially in more casual, intimate settings) and talks very nobly about his goals, but his actual work is just…all kinds of awful. And I think that charm only goes so far as it seems clear that people are starting to demand that OpenAI actually match its words with work it cannot produce.

I think his board fight within OpenAI where essentially lied to the board, his obsession with retinal scanning everyone for his biometric cryptocurrency (Worldcoin), how he left Y Combinator are just evidence that he’s not very heroic. Most cringe to me is that he and many others seem aware that what their are doing is corrosive and harmful to society on some level as Altman has admitted to having a bunker somewhere around Big Sur [0]. Which…WTF.

[0] https://www.newyorker.com/magazine/2016/10/10/sam-altmans-ma...

morpheuskafka · 2026-03-25T07:37:42 1774424262

> how he left Y Combinator

Not too familiar with that history, but he still is listed as a courtesy credit/reviewer at the end of PG's blog entries, so I assume he didn't have too much of a bad exit?

Eufrat · 2026-03-25T07:46:04 1774424764

We’ll never know exactly what exactly transpired, but I think the existing evidence is clear that as President of Y Combinator he should not have been also as involved in OpenAI as he was.

This is a conflict of interest and I think one a very obvious one. He tried to have it both ways and was forced to choose in the end. I think putting himself in that situation rather than resigning up front to pursue OpenAI ambitions says a lot about his character.

aaa_aaa · 2026-03-25T12:23:52 1774441432

He is a conman, and potentially a terrible person (look for it)

presbyterian · 2026-03-25T16:12:32 1774455152

> ChatGPT could prevent real world harm like suicide

It could prevent suicide, maybe, but we know that it does cause suicides, at least in some cases. Seems like a poor value proposition.

sfn42 · 2026-03-25T08:24:18 1774427058

I haven't followed him much as I really don't care, but the one clip I've seen of him that really stands out to me (I've seen more but this is the one I remember) is one where he's talking to some guy who doubts the LLMs genius, and Sam says something like "what if ChatGPT solved quantum gravity, would you be convinced then?"

To me, this just came off as pathetic. It hasn't solved anything and there's no reason to believe it ever will. The whole question is completely pointless except to put the idea in viewers heads that ChatGPT will soon revolutionize science, with no actual substance behind it. It's not even a question, there's only one possible answer. He's holding the guy verbally hostage just to manipulate dumb viewers.

So anyway that's the only memorable clip I've seen of Sam Altman, and based on that alone, fuck that guy.

piva00 · 2026-03-25T10:22:16 1774434136

The most memorable clip I've seen of him was the Brad Gerstner's podcast one (an investor of OpenAI), Gerstner questioned Altman about the financials of OAI, how could it have committed to spend so much given the revenue, it's a decent question and it's been up in the air for a while across the media.

Altman's reaction was very telling of the kind of person he is, just immediately lashing out at Gerstner in a childish way, asking if Gerstner wanted to sell his shares because he could find a buyer in no time.

It was a pathetically immature reaction, I wouldn't expect that from any kind of professional, even less someone who has held positions as Altman has and now sits at the top of the leadership for a company sucking hundreds of billions of investment.

Apart from that clip there's also the whole saga of sama @ Reddit, full of lies, deceptions, and the same kind of immature attitude peppered across Reddit itself.

Hendrikto · 2026-03-25T11:25:18 1774437918

> Gerstner questioned Altman about the financials of OAI

After glazing OpenAI and Sam personally for 45 minutes straight. But as soon as Sam was questioned in the slightest, he exploded.

mvdtnz · 2026-03-25T18:49:44 1774464584

My most memorable clip was when he was interviewed about the "suicide" of an ex-employee and Sama lied through his teeth. I can't understand people who say this snake is "charming"... he's a bad liar and has sub-zero charisma.

https://www.youtube.com/watch?v=zrgEZ8FeZEc

heavyset_go · 2026-03-25T21:35:20 1774474520

> It was a pathetically immature reaction, I wouldn't expect that from any kind of professional, even less someone who has held positions as Altman has and now sits at the top of the leadership for a company sucking hundreds of billions of investment.

If you're familiar with nepobaby brats and narcissists, this is not surprising.

Gooblebrai · 2026-03-25T22:32:19 1774477939

> He's holding the guy verbally hostage just to manipulate dumb viewers.

Why? The other person can say "Yes". That doesn't mean ChatGPT has the capability to do it?

sfn42 · 2026-03-26T08:56:27 1774515387

That's the point. The other guy can only say yes - if chatgpt solved a hard problem and improved our understanding of the universe there would be no discussion as to its capability to do so.

"No" is not a reasonable answer to the question. It's like asking an atheist "if god and Jesus and all the angels came to earth and showed themselves for all to see, would you believe in god then?" Well yes of course, I believe in all the things we can all see. The lack of evidence is the whole point.

So asking "if there was evidence would you think differently?" Is either a fundamental misunderstanding of the persons position, or just a cheap ploy to manipulate people. In Sam's case I'm thinking it was the latter. He's a clever guy, he knows he's on camera. He asked that question just to plant the idea in people's minds - not the guy he was talking to, that guy didn't even need to answer the question because as already said there's only one answer to it. But to everyone watching, Sam basically just put it out there that ChatGPT solving quantum gravity is within the realm of possibility. Which it probably isn't.

Gooblebrai · 2026-03-26T09:27:05 1774517225

Fair, thanks for explaining

Lionga · 2026-03-25T07:27:59 1774423679

Thinking that Scam Altman of Worldcoin etc. fame was "genuine about making a product that could improve people's lives" seems like a strange kind of delusion.

johnfn · 2026-03-24T03:47:31 1774324051

I like to imagine that the number of consumed tokens before a solution is found is a proxy for how difficult a problem is, and it looks like Opus 4.6 consumed around 250k tokens. That means that a tricky React refactor I did earlier today at work was about half as hard as an open problem in mathematics! :)

chromacity · 2026-03-24T05:25:27 1774329927

You're kidding, but it could be true? Many areas of mathematics are, first and foremost, incredibly esoteric and inaccessible (even to other mathematicians). For this one, the author stated that there might be 5-10 people who have ever made any effort to solve it. Further, the author believed it's a solvable problem if you're qualified and grind for a bit.

In software engineering, if only 5-10 people in the world have ever toyed with an idea for a specific program, it wouldn't be surprising that the implementation doesn't exist, almost independent of complexity. There's a lot of software I haven't finished simply because I wasn't all that motivated and got distracted by something else.

Of course, it's still miraculous that we have a system that can crank out code / solve math in this way.

kuschku · 2026-03-24T07:34:13 1774337653

If only 5-10 people have ever tried to solve something in programming, every LLM will start regurgitating your own decade-old attempt again and again, sometimes even with the exact comments you wrote back then (good to know it trained on my GitHub repos...), but you can spend upwards of 100mio tokens in gemini-cli or claude code and still not make any progress.

It's afterall still a remix machine, it can only interpolate between that which already exists. Which is good for a lot of things, considering everything is a remix, but it can't do truly new tasks.

rfw300 · 2026-03-25T01:25:54 1774401954

What is a "truly new task"? Does there exist such a thing? What's an example of one?

Everything we do builds on top of what's already been done. When I write a new program, I'm composing a bunch of heuristics and tricks I've learned from previous programs. When a mathematician approaches an open problem, they use the tactics they've developed from their experience. When Newton derived the laws of physics, he stood on the shoulders of giants. Sure, some approaches are more or less novel, but it's a difference in degree, not kind. There's no magical firebreak to separate what AI is doing or will do, and the things the most talented humans do.

kuschku · 2026-03-25T01:37:20 1774402640

That highlighted phrase "everything is a remix" was for a good reason, there's a documentary of that same name, and I can certainly recommend it.

At the same time, there are things that are truly novel, even if the idea is based on combining two common approaches, the implementation might need to be truly novel, with new formulas and new questions that arise from those. AI can't belp there, speaking from experience.

nextaccountic · 2026-03-24T05:15:40 1774329340

That's why context management is so important. AI not only get more expensive if you waste tokens like that, it may perform worse too

Even as context sizes get larger, this will likely be relevant. Specially since AI providers may jack up the price per token at any time.

stabbles · 2026-03-24T09:31:25 1774344685

You're glancing over the fact that mathematics uses only one token per variable `x = ...`, whereas software engineering best practices demand an excessive number of tokens per variable for clarity.

VierScar · 2026-03-24T09:47:35 1774345655

It's also a pretty silly thing to say difficulty = tokens. We all know line counts don't tell you much, and it shows in their own example.

Even if you did have Math-like tokenisation, refactoring a thousand lines of "X=..." to "Y=..." isnt a difficult problem even though it would be at least a thousand tokens. And if you could come up with E=mc^2 in a thousand tokens, does not make the two tasks remotely comparable difficulty.

ozozozd · 2026-03-24T04:45:01 1774327501

Try the refactor again tomorrow. It might have gotten easier or more difficult.

locknitpicker · 2026-03-24T07:20:26 1774336826

> I like to imagine that the number of consumed tokens before a solution is found is a proxy for how difficult a problem is (...)

The number of tokens required to get to an output is a function of the sequence of inputs/prompts, and how a model was trained.

You have LLMs quite capable of accomplishing complex software engineering work that struggle with translating valid text from english to some other languages. The translations can be improved with additional prompting but that doesn't mean the problem is more challenging.

gf000 · 2026-03-24T07:49:15 1774338555

I think it's more of a data vs intelligence thing.

They are separate dimensions. There are problems that don't require any data, just "thinking" (many parts of math sit here), and there are others where data is the significant part (e.g. some simple causality for which we have a bunch of data).

Certain problems are in-between the two (probably a react refactor sits there). So no, tokens are probably no good proxy for complexity, data heavy problems will trivially outgrow the former category.

pks016 · 2026-03-24T18:34:17 1774377257

I don't think so. I went through the output of Opus 4.6 vs GPT 5.4 pro. Both are given different directions/prompts. Opus 4.6 was asked to test and verify many things. Opus 4.6 tried in many different ways and the chain of thoughts are more interesting to me.

sublinear · 2026-03-24T03:51:25 1774324285

You might be joking, but you're probably also not that far off from reality.

I think more people should question all this nonsense about AI "solving" math problems. The details about human involvement are always hazy and the significance of the problems are opaque to most.

We are very far away from the sensationalized and strongly implied idea that we are doing something miraculous here.

johnfn · 2026-03-24T03:54:59 1774324499

I am kind of joking, but I actually don't know where the flaw in my logic is. It's like one of those math proofs that 1 + 1 = 3.

If I were to hazard a guess, I think that tokens spent thinking through hard math problems probably correspond to harder human thought than tokens spend thinking through React issues. I mean, LLMs have to expend hundreds of tokens to count the number of r's in strawberry. You can't tell me that if I count the number of r's in strawberry 1000 times I have done the mental equivalent of solving an open math problem.

throw310822 · 2026-03-24T04:08:20 1774325300

You can spend countless "tokens" solving minesweeper or sudoku. This doesn't mean that you solved difficult problems: just that the solutions are very long and, while each step requires reasoning, the difficulty of that reasoning is capped.

p1necone · 2026-03-25T00:38:13 1774399093

A lot of math problems/proofs are like minesweeper or sudoku in a way though. They're a long series of individually kinda simple logical deductions that eventually result in a solution. Some really hard problems are only really hard because each one of those "simple" deductions requires you to have expert knowledge in some disparate area to make that leap.

gpm · 2026-03-24T04:19:57 1774325997

Some thoughts.

1. LLMs aren't "efficient", they seem to be as happy to spin in circles describing trivial things repeatedly as they are to spin in circles iterating on complicated things.

2. LLMs aren't "efficient", they use the same amount of compute for each token but sometimes all that compute is making an interesting decision about which token is the next one and sometimes there's really only one follow up to the phrase "and sometimes there's really only" and that compute is clearly unnecessary.

3. A (theoretical) efficient LLM still needs to emit tokens to tell the tools to do the obviously right things like "copy this giant file nearly verbatim except with every `if foo` replaced with `for foo in foo`. An efficient LLM might use less compute for those trivial tokens where it isn't making meaningful decisions, but if your metric is "tokens" and not "compute" that's never going to show up.

Until we get reasonably efficient LLMs that don't waste compute quite so freely I don't think there's any real point in trying to estimate task complexity by how long it takes an LLM.

refulgentis · 2026-03-24T05:44:12 1774331052

I fear that under those constraints, the only optimal output is “42”

pinkmuffinere · 2026-03-24T04:10:39 1774325439

This is interesting, I like the thought about "what makes something difficult". Focusing just on that, my guess is that there are significant portions of work that we commonly miss in our evaluations:

1. Knowing how to state the problem. Ie, go from the vague problem of "I don't like this, but I do like this", to the more specific problem of "I desire property A". In math a lot of open problems are already precisely stated, but then the user has to do the work of _understanding_ what the precise stating is.

2. Verifying that the proposed solution actually is a full solution.

This math problem actually illustrates them both really well to me. I read the post, but I still couldn't do _either_ of the steps above, because there's a ton of background work to be done. Even if I was very familiar with the problem space, verifying the solution requires work -- manually looking at it, writing it up in coq, something like that. I think this is similar to the saying "it takes 10 years to become an overnight success"

famouswaffles · 2026-03-24T04:13:49 1774325629

>The details about human involvement are always hazy and the significance of the problems are opaque to most.

Not really. You're just in denial and are not really all that interested in the details. This very post has the transcript of the chat of the solution.

typs · 2026-03-24T03:56:21 1774324581

I mean the details are in the post. You can see the conversation history and the mathematician survey on the problem

johnfn · 2026-03-22T00:37:50 1774139870

> This is arguably their defining HN characteristic: they are one of the most vocal, persistent AI optimists on the platform. They claim ~90-95% of their shipped code is AI-generated, report 5-10x productivity gains, and have built a detailed methodology around it — using Playwright for visual verification, static typechecking as a hallucination filter, and e2e test suites as automated validation harnesses

Wow, I sound really annoying. Sorry about that everyone!

hypercube33 · 2026-03-22T00:48:03 1774140483

I sound like an annoying old people I guess so I think I'm worse. Either way I forgive you. (GPT called me a wiring closet gremlin)

giardini · 2026-03-22T04:13:42 1774152822

"wiring closet gremlin"!?

I knew the original!8-)) Hope you aren't him!

johnfn · 2026-03-21T18:51:02 1774119062

I mean, you are painting it as some moralistic judgement, but if you’re asking me for on one hand listening to some annoying music, and on the other hand having some chance (however slight) of bodily injury, knife wound, or whatever… I know which one I am going to choose.

johnfn · 2026-03-20T17:37:11 1774028231

It’s hard to imagine a slow, overworked, somewhat inept, bureaucratic school board, with a thousand other things it wants to care about, managing to stay ahead of thousands of crafty and highly motivated teens.

johnfn · 2026-03-19T06:59:32 1773903572

The BSBench is such a fantastic resource - thank you for sharing.

We should really be citing rather than anecdata every time someone brings up hallucinations.

johnfn · 2026-03-18T06:28:20 1773815300

> The concept of congregating in walled gardens owned by pedophilic fascist speed freaks

Are we really calling everyone we don't like a pedophilic fascist now? I honestly had really hoped that this sort of polarized, low-quality content wouldn't make it onto HN. :(

speedgoose · 2026-03-18T06:31:13 1773815473

I thought it was pretty factual.

rudhdb773b · 2026-03-18T07:07:49 1773817669

So which walled garden owner regularly has sex with prepubescent children and is a heavy meth user?

johnfn · 2026-03-18T06:41:07 1773816067

If you think that everyone who works on a website that is a walled garden is a "pedophile fascist", I don't know what to say to you -- I don't think we live in the same reality.

speedgoose · 2026-03-18T06:48:24 1773816504

It’s not what is written?

johnfn · 2026-03-18T07:06:15 1773817575

You should tell me your interpretation of the quote I excerpted then.

speedgoose · 2026-03-18T07:13:49 1773818029

It’s about the common and popular walled garden American social medias owned by people that are close and supporting their current elected government.

johnfn · 2026-03-18T07:20:23 1773818423

The quote says "walled gardens owned by pedophilic fascist speed freaks". Not "owned by people close to", "owned by".

speedgoose · 2026-03-18T07:26:52 1773818812

If one supports pedophiles and promote fascist speeches…

johnfn · 2026-03-18T07:36:17 1773819377

It is not "factual" to call these people pedophiles. Maybe you think they are bad for society. Maybe you think their websites are terrible. Maybe you don't like them. Those are all fine things, and you are free to say them! But to say they are factually a pedophile without evidence is not true. It only diminishes the quality of conversation.

mlrtime · 2026-03-18T11:41:47 1773834107

I'm reading this line of conversations and I can tell you, you're wasting your time.

There is NO convincing these people of anything else, they will move the goal posts every time. I've been in these same conversations and it goes nowhere.

If you continue, it will move all the way to "If you're not out protesting, voting for X, you are in fact a fascist pedo yourself".

Even the mere fact that you question such line of thought... makes you a facist pedo.

pacifika · 2026-03-18T07:16:48 1773818208

A bad apple spoils the bunch

johnfn · 2026-03-18T06:12:51 1773814371

Rationalists were talking about AI decades before anyone else were talking about it. They were also early on COVID and crypto. They are only "aggressively wrong" about "everything" if you are, ironically, not thinking rationally about it.