Hacker News | qsort's comments

This is unfortunately the problem. The level of the public debate is abysmal: most politicians push unbelievably stupid shit about immigration and other identitarian nonsense, and budget gets spent to ensure cheese and wine have the proper AOC certifications on them. Honestly, up to a point I even understand it: many people don't see themselves as having a meaningful identity as EU citizens and you can't force it upon them.

Asking for sensible AI policy is like asking for a base on mars.


> many people don't see themselves as having a meaningful identity as EU citizens

I sometimes wonder if the citizens of the United States (of America) even comprehend that the EU is not itself a sovereign nation (unlike the states in say, the USA, or Australia) and is just a union of sovereign polities.

Nobody in the EU is an EU Citizen unless they are a citizen of one of the member states.


[flagged]


> reducing unskilled and hateful immigration is the democratic thing do

Hateful, sure, though just like all those on the right who say the first word of "hate crime" is redundant, I'd argue home-grown hate's just as dangerous as imported.

But "unskilled" immigration? When the topic is AI? If this stuff works as advertised, *nobody's skills matter any more*. If it doesn't at least render many of our skills obsolete, why build it? If you make an AI which can't automate anything, how is this not a waste of money?

Even without that, I've not seen anyone who knows about Baumol's cost disease opine either way about migration, high or low skilled.


Actually yea, it has everything to do with it.

I am open to the idea that we should handle immigration differently, but I want a plan and specifics, not slogans. What we want to achieve, and by what mechanisms you plan to get there. Open any newspaper: are you more likely to find careful and considerate opinions or racist screeds?

And that is the problem. Time and energy and money and political capital are routinely spent on inconsequential electoral poliTICS rather than substantial poliCY.


Agreed, I don't know if it's going to stay like this forever, but right now, if anything, the difference is amplified. You can make unbelievable stuff happen through the sheer power of knowing what you're doing.

Yeah, GPT 5.5 + Fable beating either individually is believable, but 2x Opus > Fable is what makes me a bit dubious about the whole thing. They might be measuring skills that are too specific or benefit a lot from more tokens being thrown at them. Also, Claude Code (the harness) is not the best at the moment; that might be part of it as well?

What throws me off is DeepSeek beating both Opus 4.8 and GPT 5.5.

That definitely doesn't sound right.


Not all translations are the same. Literary translations are often works of art in and of themselves, and automating them would be missing the point entirely, like automating homework or weightlifting at the gym. I don't really know what the state of the art is, but I do buy that, on the other hand, translating toaster manuals or generic copy could soon be automatic.

Yup. If you are bilingual, you quickly realize how some translations are very bad. How some translations are very good. And how hard it is to translate. With dry, simple text, it might be easy. But when it involves art? Some jokes don't translate directly. There are puns. Sounds of words. Double meanings. Ambiguity. Cultural background. The creation of new words.

It can be reasonably argued that some poetry is impossible to translate from some languages to others. A poem might be explained, but only by a lengthy, dissecting explanation that completely loses the point of it.


Or if you compare a poetic translation to a literal one, or different translations of the same work into the same language to each other.

When it's one one-hundredth the cost, "good enough" is generally good enough.

It's almost certainly a reference to Lovecraft actually:

https://en.wikipedia.org/wiki/Cthulhu_Mythos

Hopefully future models will be kind enough not to behave like malevolent gods.


The word mythos means roughly the same as "myth" and dates to 1753.

Why do you think that?

If you're a "React person", as the article puts it, friendly reminder that you can render components to HTML and serve that to the user.

I have done exactly that on a project that was under similar constraints. The UI models live in .tsx files and the browser gets pure HTML with zero JS by default.
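For anyone who hasn't seen this done, here is a rough sketch of the shape of it. This is not that project's code, and it deliberately avoids depending on React so it runs on its own: `h`, `renderToHtml` and `CommentList` are made-up names for illustration. In an actual React setup, `renderToStaticMarkup` from `react-dom/server` does the job that `renderToHtml` does here.

```typescript
// Dependency-free stand-in for server-side rendering: components are plain
// functions that return a tree, and the server turns that tree into a string.
// All names below are hypothetical and only illustrate the idea.

type Child = VNode | string;

interface VNode {
  tag: string;
  attrs: Record<string, string>;
  children: Child[];
}

// Helper to build a node, similar in spirit to React.createElement.
function h(tag: string, attrs: Record<string, string> = {}, ...children: Child[]): VNode {
  return { tag, attrs, children };
}

// Escape text so data cannot inject markup into the page.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Walk the tree and emit static HTML: no script tags, no hydration.
// Void elements such as <br> are not special-cased in this sketch.
function renderToHtml(node: Child): string {
  if (typeof node === "string") return escapeHtml(node);
  const attrs = Object.keys(node.attrs)
    .map((name) => ` ${name}="${escapeHtml(node.attrs[name])}"`)
    .join("");
  const inner = node.children.map(renderToHtml).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

// A "component": a function from props to a tree.
function CommentList(props: { author: string; comments: string[] }): VNode {
  return h(
    "section",
    { class: "comments" },
    h("h2", {}, `Comments by ${props.author}`),
    h("ul", {}, ...props.comments.map((c) => h("li", {}, c)))
  );
}

// On the server, this string is the whole response body.
const html = renderToHtml(CommentList({ author: "qsort", comments: ["a < b", "ok"] }));
console.log(html);
```

The point is only that the component model and the delivery format are separable: you keep writing components, and the browser never has to run any of it.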


These are the results from the website they link in the paper:

https://math.sciencebench.ai/benchmarks

I take the "2 unsolved" claim to mean "not solved by any model in any configuration in any stage with any number of attempts"; the "benchmark results" are much lower. To be clear: it's extremely impressive. I still remember being in utter disbelief when models started solving AIME problems, and this is obviously several levels above that.

It's also interesting that OpenAI models perform that much better on math and math-adjacent stuff. I assume this comes down to differences in post-training?


If you're trying to compare what the models are good at, important to note that the different models did not run with the same settings. In one case they also retried with GPT until it answered all the problems but did not retry with the other models.

GPT has 5 effort settings and they picked the highest (xhigh). Claude has 5 and they picked the middle one to avoid having to retry when it timed out. Gemini has medium or high effort and they picked medium.


The difference between GPT and Gemini concerning the "retry until..." can almost be ignored. I did rerun GPT a few times, but the number of reruns was still way below the number of problems Gemini was not able to answer at all.

Look, I've never been someone who mindlessly hypes AI companies, as a matter of fact I think they have serious leadership problems across the board, but you people are straw-manning them so badly it actually makes me sympathize with them.

They aren't saying they have fully automated luxury AGI, they specifically list the ways models fall short of that bar and caution against people taking the 8x figure as the actual uplift number. At the same time they recognize that 80% of new code is now AI-authored, when two years ago those models were little more than toys. And frankly that checks out: if two years ago you told me we'd have something like Opus 4.8/GPT 5.5 I would have rolled to disbelieve.


> At the same time they recognize that 80% of new code is now AI-authored

I can set up a loop that will write a trillion lines of code automatically; how much of it is actually useful? Or are we back to counting LoC because there's no other metric for these systems that anyone can rely on?


It's 80% of the new code they shipped that is AI-authored.

Would you ship pointless code?

I do tend to agree though, it could be that AI solves problems with more code than a human would. What you need to measure is the value the code brings and how much of that is done by AI, hard to get an objective measure of that though.


> Would you ship pointless code?

I wouldn't, no. I don't see evidence that the engineers at Anthropic are similarly cautious, however. They describe Claude Code as "basically a game engine" when it's literally a TUI app, and it eats memory for no apparent reason. I fully believe that Anthropic would ship pointless and garbage code. Especially if it's being written by an LLM.


I could write a bash script that copies a codebase repeatedly in the pre-AI past as well, but I didn't do that because I wasn't stupid. More than 80% of my code is now AI-generated, and trust me I'm still not stupid. It was 0% only a year ago.

Who says LoC is the only metric we should rely on? A software product should first and foremost meet user requirements, functionality and performance. Judging from the sensational rise of Anthropic's user base and revenue, I think we can safely say they're in that ballpark.


I'm dumb as a rock and I don't have a PhD, but since ~1 year ago I started forcing myself to do small bits of coding and math manually.

I'm not noticing a "cognitive decline" per se, but I do see I'm a lot "lazier": even stuff that used to be routine when I started coding now feels heavy.


>I'm not noticing a "cognitive decline" per se

The funny thing is, maybe not noticing one can be the actual sign of it :)


Yes, precisely. Assessing your own cognitive skills is dubious. I’m pretty certain I’m less clever than I was when younger but if I find a problem tough now maybe 25 yo me would also have struggled?

That’s the most important thing. If we keep reading, maybe we can hold our own.

>even stuff that used to be routine when I started coding now feels heavy.

The same weight feeling heavier is a sign that your muscles are weaker :)

There are many areas in life where we look back a few decades and think "people used to do it that awkwardly?" And yet results were better. I think the process of removing friction has just served to destroy our ability to concentrate and tolerate difficulty.


> but I do see I'm a lot "lazier", even stuff that used to be routine when I started coding now feels heavy.

Not getting that quick dopamine hit the LLMs give you..

Some say you can re-train your system to get back the dopamine hits you used to get from other things, like the enjoyment of the "old fashioned" manual coding and math. Getting there is hard work. And YMMV.


I just do things manually and ask LLMs to check my work. That seems to be working great for me.

I had the most Russian of Russian bosses when I was in college. My first day on the job he so eloquently stated, "I am not your mother. Do not come to me with problems. Come to me with solutions. I want to know what you tried and what did not work."

His advice has served me well in many areas of life too. I try my best to treat LLMs no differently for domains I care about (not one-off little questions here and there).


What I would like to do is double model post-check, with a form of "debate", to better catch edge cases.

Unfortunately, I haven't found a way to set that up as I envisioned it, for the time being.


Absolutely this, I'm the same as you.

And I'm just afraid this is what cognitive decline feels like from inside the deteriorating mind.


> I'm not noticing a "cognitive decline" per se, but I do see I'm a lot "lazier"

These are correlated - it just hasn’t happened in a large enough amount for you to have clearly noticed it yet.


I do a similar version of this, where if I notice a mistake in generated code, I fix it manually (or at least attempt to) instead of telling Claude to fix it.

This is the right balance for me as well.

I use an agent to generate a first-pass attempt, and then (deadlines willing), I manually read every line at least once so I understand what the code actually does.

Then I manually fix the inevitable slop that is mixed in with the good stuff, and only once the code is up to my personal standards do I send it.

This probably reduces my “AI performance boost” to 30-50% instead of the huge gains reported by others. But I retain the ability to reason about the codebase and use AI much more precisely when I’m trying to troubleshoot production outages or subtle bugs — something I notice the rest of my team struggles with, since adopting “agentic workflows” everywhere.

I think actively working to retain some cognitive flexibility and “muscle memory” around coding tasks is going to be rather advantageous in the long run.


Pure copium, but what can you do with the deadlines.

Same, but also because it feels like it takes longer for an LLM to do it. I think that's something people who are into gathering personal metrics should do - measure how long it takes to type a prompt / have the LLM fix things vs just doing it yourself.

LLMs are making me smarter. I have more code to read!

The object-level discussion is interesting, but I disagree with the premise to such an extent it feels like a moot point. It feels like the article doesn't play out the line to its logical conclusion.

Why would agents want GUIs made for humans? It's already the case that, like everyone who's good at computers, agents want a terminal and good APIs, not some ad-ridden crap.

If anything, AI is a reason why it will never be the year of the Linux desktop, but it also doesn't matter anymore: if the higher-order bit of productivity is defined by AI, then my tmux+vim is as good as your Visual Studio.


  > tmux+vim is as good as your Visual Studio.
You probably don't need tmux. The utility is really when you're remoting into machines and want to keep your session (or are too lazy to use nohup or disown)

Your terminal should do split panes and tabs. Ghostty is my preferred one, but use whatever. And fwiw, even if your terminal sucks, vim can do all of this for you too (:term), so you don't even need to leave vim.

  > vim is better than your Visual Studio.
FTFY ;)

Side note: just because you can live in the terminal on Linux doesn't mean GUIs can't exist or are even second class citizens. The real beauty is being able to have both. You can have a platform that is usable for most people while not fucking over power users. Wild concept, I know


<strongly offended noises>

Everybody needs tmux, especially locally.

My terminal splits panes (which I don't use), but what if I want to open two terminals that share the same set of splits? Can't. But tmux can!

What if I want to SSH back into my desktop (because I'm on a laptop or whatever) and grab something from my desktop terminal? Can't. But tmux can!

Vim splits and the vim terminal are poorly implemented. Technically, yes, they work. But you'll run into a lot of issues. I know, because a few years ago I went down the same path: Why do I need tmux, when I have vim!? ... I quickly learned why I needed tmux.

I agree with your side note: plasma+kitty+tmux and a few support scripts:

(please don't criticize my scripts; these were never meant to be shared, and it's a disaster, but it works for me)

I have this script (https://doc.xn0.org/tmuxedkitty-newwindow.sh) bound to WIN+T; it opens kitty, and either creates a new tmux session if there isn't one or attaches to the existing session and creates a new pane.

Then, I have my insane (I understand I am insane, but it works for me!) tmux config file: https://doc.xn0.org/.tmux.conf

Then, I have my insane zshrc that auto-titles my tmux windows: https://doc.xn0.org/.zshrc

Using titles from: https://doc.xn0.org/tmux-window-titles

I have put way too much thought and time into this...


  > but what if I want to open two terminals that share the same set of splits?
You want clones? I'll admit most terminals can't do this (some can), but I'm struggling to see the use case. What's the advantage of having 2 windows displaying the same information?

  > What if I want to SSH back into my desktop
Agreed! That was the explicitly stated use case I said tmux was for[0]

  > Vim splits and the vim terminal are poorly implemented.
Completely fair, and I avoid them for exactly those reasons. But they're still handy in a pinch and they're good to know about

BUT tmux is also poorly implemented. Start trying to use sixel (or kitty graphics) in your fzf previews, yazi, or whatever you're displaying things with. This is a big pain point.

  > please don't criticize my scripts
Do you want friendly comments? All code sucks, so I'm not going to call you dumb or anything. But do upload it somewhere so I don't have to download it; 0x0.st is perfect for this use case.

  > Using titles
Your terminal doesn't do titles? What terminal are you using?

[0] I'll also admit Claude Code is another use case. But that is because it is so poorly written, not because the terminals suck. I absolutely believe Dario when he says Claude does most of the coding... it shows...


Just because tmux doesn't work for you doesn't mean it can't be useful for someone else. I for one really appreciate having the same interface and keybinds across several devices and I've never felt a need to look elsewhere.


  > having the same interface and keybinds across several devices 
I'm a bit lost. I use my dotfiles for this.

If it is a machine I control: I control the terminal so there's no issues.

If it's a machine I'm SSH'd into: that's my explicitly stated tmux case right there.

If it's a machine I don't control: well I can't do anything anyways, so conversation is moot. This situation is exceptionally rare though (where I can't even do local installs)

I agree that you should use what works for you, but I'm curious what you're getting that isn't already offered by your system

