
I don’t think any goalposts need to be redecorated. The “inner monologue” isn’t a reliable witness to o3’s internal processing; it’s at best a post-hoc estimation of what a human inner monologue might be in this circumstance. So its “testimony” about what it is doing is unreliable, and therefore it doesn’t move the needle on whether or not this is “real reasoning” for some value of that phrase.

In short, it’s still anthropomorphism and apophenia locked in a feedback loop.



Devil's advocate: as with most LLM issues, this applies to the meatbags that generated the source material as well. A quick example is asking someone to describe their favorite music and why they like it, and noting the probable lack of reasoning on the `this is what I listened to as a teenager` axis.


Something as inherently subjective as personal preference doesn't seem like an ideal example to make that point. How could you expect to objectively evaluate something like "I enjoy songs in a minor scale" or "I hate country"?


The point is to illustrate the disconnect between stated reasoning and proximate cause.

Consider your typical country music enjoyer. Their fondness for the art, as it were, is far more a function of cultural coding during their formative years than a deliberate personal choice to savor the melodic twangs of a corncob banjo. The same goes for people who like classic rock, rap, etc. The people who "hate" country are likewise far more likely to do so out of oppositional cultural contempt, same as people who hate rap or those in the not so distant past who couldn't stand rock & roll.

This of course fails to account for higher-agency individuals who have developed their musical tastes, but that's a relatively small subset of the population at large.


Good point. When we try to explain why we're attracted to something or someone, what we do seems closer to modeling what we like to think about ourselves. At the extreme, we're just telling stories about an estimation we like to think is true.


I largely agree! Humans are notoriously bad at doing what we call reasoning.

I also agree with the cousin comment that (paraphrased) “reasoning is the wrong question, we should be asking about how it adapts to novelty.” But most cybernetic systems meet that bar.


I don't think the inner monologue is evidence of reasoning at all, but doing a task which can only be accomplished by reasoning is.


Geoguessr is not a task that can only be accomplished by reasoning. Famously, it took less than a day of compute time in 2011 to SLAM together a bunch of pictures of Rome (https://grail.cs.washington.edu/rome/).


Such as? Geoguessing certainly isn't that.


> it’s at best a post-hoc estimation of what a human inner monologue might be in this circumstance

Nope. It's not autoregressive training on examples of human inner monologue. It's reinforcement learning on the results of generated chains of thought.


"It's reinforcement learning on the results of generated chains of thoughts."

No, that's not how LLMs work.



Base models are trained using autoregressive learning. "Reasoning models" are base models (maybe with some modifications) that were additionally trained using reinforcement learning.
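To make the distinction concrete, here's a toy sketch of the second stage: reward a sampled "chain of thought" only when it ends in the right answer, and nudge the policy toward chains that do. Everything here is an illustrative assumption (a three-chain tabular policy, REINFORCE-style updates, made-up reward), not how o3 or any real reasoning model is actually trained.

```python
import math
import random

# Toy sketch of RL on generated chains of thought (NOT a real training setup).
# A "model" with logits over three candidate chains for one question;
# chains 0 and 1 happen to reach the correct answer, chain 2 does not.
random.seed(0)
logits = [0.0, 0.0, 0.0]
reaches_correct_answer = [True, True, False]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs):
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

lr = 0.5
for step in range(200):
    probs = softmax(logits)
    i = sample(probs)  # "generate" a chain of thought
    reward = 1.0 if reaches_correct_answer[i] else 0.0  # score only the outcome
    # REINFORCE update: grad of log pi(i) is one_hot(i) - probs
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * reward * grad

final = softmax(logits)
print(final)  # probability mass shifts toward chains that end correctly
```

The point of the toy: nothing in the reward ever checks whether the chain is a faithful trace of a computation, only whether the final answer scored well, which is roughly why the thread above disputes treating the monologue as testimony.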



