Davidzheng's comments | Hacker News

Three M5 Mac Ultras with 512GB each? idk
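For a rough sense of scale, a back-of-the-envelope sketch, assuming "512" means 512GB of unified memory per machine and the question is about fitting model weights locally (my assumptions, not from the thread):

```python
# Back-of-the-envelope sketch: three machines with 512GB unified memory
# each, weights stored at 8-bit quantization (1 byte per parameter),
# ignoring KV cache, activations, and runtime overhead.
machines = 3
mem_per_machine_gb = 512
bytes_per_param = 1  # 8-bit quantization

total_gb = machines * mem_per_machine_gb        # 1536 GB across the cluster
max_params_b = total_gb / bytes_per_param       # ~1.5T parameters
print(f"{total_gb} GB total -> roughly {max_params_b:.0f}B params at 8-bit")
```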

I don't understand what including the timeframe of "4 years" adds to your argument here. I don't think anyone is arguing that the usefulness of these AIs for real projects started at GPT 3.5/4. Do you think the capabilities of current AIs are about the same as GPT 3.5/4 were 4 years ago? (Actually, I think the SOTA 4 years ago today might have been LaMDA, as GPT 3.5 wasn't out yet.)

> I don't think anyone is arguing that the usefulness of these AIs for real projects started at GPT 3.5/4

Only not in retrospect. The "if you're not using AI you're being left behind" arguments did not depend on how people in 2026 would feel about those tools retrospectively. Cursor is 3 years old, and OK, 4 years might be an exaggeration, but I've definitely been seeing these arguments for 2-3 years.


Yeah. I started integrating AI into my daily workflows in December 2024. I would say AI didn't become genuinely useful until around September 2025, when Sonnet 4.5 came out. The Opus 4.5 release in November was the real event horizon.

Honestly, for research-level math, the reasoning level of Gemini 3 is well below GPT 5.2 in my experience, but I think most of the failure is accounted for by Gemini pretending to solve problems it in fact failed to solve, versus GPT 5.2 gracefully admitting that it failed to find a proof in general.

Have you tried Deep Think? You only get access with the Ultra tier or better... but wow. It's MUCH smarter than GPT 5.2, even on xhigh. Its math skills are a bit scary, actually. Although it does tend to think for 20-40 minutes.

I tried Gemini 2.5 Deep Think and was not very impressed... too many hallucinations. In comparison, GPT 5.2 with extended time hallucinates maybe <25% of the time, and if you ask another copy to proofread, it goes even lower.
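For what it's worth, the proofreading pass I mean is just a second, independent call. A minimal sketch using the OpenAI Python client (the model name is taken from the comment above as an assumption; substitute whatever you have access to):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    # One independent call per step, so the proofreader shares no state
    # with the first attempt.
    resp = client.chat.completions.create(
        model="gpt-5.2",  # model name as used in the comment; an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

proof = ask("Prove the following claim, or state clearly that you cannot: ...")
review = ask(
    "Proofread this attempted proof. Flag any gap, unjustified step, "
    "or claim asserted without justification:\n\n" + proof
)
```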

I never tried 2.5. Three is pretty solid though, at least for my use case.

If there's a specific query you want me to run through it for comparison I'm happy to give it a go.


Makes you wonder how automatable this babysitter role is...

That was my reaction.

Really interested in what the brain does when it "loads" the context for something it's familiar with but that is currently out of working memory. Does it mostly try to align some internal state, or does it just load memories into fast access?

How do you take over the world if you have access to 1000 normal people? AGI, by the original definition (long forgotten by now), means surpassing the MEDIAN human at almost all tasks. How the rebranding of ASI into AGI happened without anyone noticing is kind of insane.

"People living in less advanced economies will do OK, but the rest of us not so much" how is this possible? are the less advanced economies protected from outside influences? are they also protected from immigration?

Not OP, but assuming I'm following the argument correctly, I think the parent is referring to something else. Advanced economies have participants who function well in that environment and are shaped by it to a large degree. As a result, if you asked them to get food in an environment where it is not as easily accessible as it is today, they might stumble. On the other hand, in the old country, a lot of people I knew tended to keep a little garden, hunt every so often, forage for mushrooms, and so on. In other words, more individuals may be able to survive in less developed economies precisely because those economies are less developed and less reliant on the conveniences of today.

ah makes a lot of sense! thanks!

Sure, but in pure mathematics there are a lot of well-specified problems which no one can solve.

Mathematics is indeed one of those rare fields where intimate knowledge of human nature is not paramount. But even there, I don't expect LLMs to replace top-level researchers. The same evolutionary "baggage" which makes simulating humans and automating them away impossible is also what enables (some of) us to have deep insight into the most abstract regions of maths. In the end it all relies on the same skills, developed through millions of years of tuning into the subtleties of 3D geometry, physics, psychology and so on.

I find it unbelievable that people can't settle this question for themselves, without posting this, simply by asking the AI enough novel questions. I myself have little doubt that they can solve at least some novel questions (of course, similarity of proofs is a spectrum, so it's hard to draw the line at how original they are).

I settle this question for myself every month: I try asking ChatGPT and Gemini for help, but in my domains they fail miserably at anything that looks new. But YMMV; that's just the experience of one professional mathematician.

"New" doesn't have to mean "the hardest thing yet", but for humans who have mastered a subdomain, the two are often the same.

Trust, but verify, no? No one benefits from refusing to experiment and test.

Even if your argument is correct here, it would only mean that this particular method of replacement doesn't immediately work for this job.

Thanks for the reply. I'm not sure I understand. Why does replacement matter so much? Where do these anxieties come from?

I'm saying that this type of employment is a necessary diffusion layer for making decisions, and isn't about "productivity". Payroll for these kinds of jobs is even considered capex. Efficiency misses the point entirely and is tripping over dollars to pick up pennies.

