Hacker News | alexop's comments

Yes, exactly. What I meant is that a human would also try every "tool" available. In the case of o3, the only tools it had were Python and Bing.

But you are right. It does not actually understand anything. It is just a next-token predictor that happens to have access to Python and Bing.


Yes, I agree. Like I said, in the end it did what a human would do: google for the answer. Still, it was interesting to see how the reasoning unfolded. Normally, humans train on these kinds of puzzles until they become pure pattern recognition. That's why you can't become a grandmaster if you only start learning chess as an adult — you need to be a kid and see thousands of these problems early on, until recognizing them becomes second nature. It's something humans are naturally very good at.


I am a human, and I figured this puzzle out in under a minute just by trying the small set of possible moves until I found the right one. I am not a serious chess player. I would have expected it to at least try the possible moves. I think this lends some credence to the idea that these models aren’t actually reasoning but are doing a great job of mimicking what we think humans do.


Oh cool, I wonder how good o3 will be. While using o3, I noticed something funny: sometimes I gave it a screenshot without any position data. It ended up using Python and spent 10 minutes just trying to figure out where exactly the pieces were.


yes


It's funny how video games are the hardest benchmark humanity has for AI.


They're not the hardest problems we have, they are just very nice benchmark tools because by definition they already run on a computer and you can fairly easily interface an AI with them.

There's probably also a distorting factor in that much of the AI research into stock market and military applications doesn't get published, so video game AI seems to make up a much larger share of research than it actually does.


A video game is a very well-defined problem, and usually comes with simple metrics for success – health, time, or in Factorio’s case, ultimately science per minute (or per minute played, for AIs?). Real-world problems are much harder to define: they are embedded in a very complex ecosystem, and it’s not at all clear what to optimize for.


It is "hardest" in the context of problems where the AI actually has a chance.

Nothing stops us from asking an AI for the blueprints to a working faster-than-light spaceship; we just already know it will fail, and the way it fails provides no useful information.


I'd love to see a Baba Is You or Stephen's Sausage Roll LLM environment to gauge spatial reasoning. Stephen's Sausage Roll in particular could be very interesting because its mechanics are incredibly simple, yet the puzzles are very challenging.


DeepMind went from playing Pong to protein folding in only a few years. There are much harder things for AI to do than playing video games. See also: self-driving cars.


This looks nice. I also played around with Vue and Transformers.js over the weekend to build the embeddings locally. See https://github.com/alexanderop/vue-vector-search
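For anyone curious what the search step boils down to once every text has an embedding: a minimal brute-force approach scores each document against the query with cosine similarity and keeps the best few. This is only a sketch under assumptions. It takes the embeddings as already computed (it does not call Transformers.js), and `Doc`, `topK`, and the toy vectors are hypothetical names for illustration, not code from the repo.

```typescript
// Minimal sketch of brute-force vector search over precomputed embeddings.
// Assumption: embeddings were produced elsewhere (e.g. by a model running
// in the browser) and are plain number arrays of equal length.

type Doc = { id: string; embedding: number[] }; // hypothetical shape

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Undefined for zero vectors; treat that case as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: score every document and return the k best matches.
function topK(query: number[], docs: Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
const docs: Doc[] = [
  { id: "cats", embedding: [0.9, 0.1, 0.0] },
  { id: "dogs", embedding: [0.7, 0.3, 0.1] },
  { id: "stocks", embedding: [0.0, 0.2, 0.9] },
];

const results = topK([1, 0, 0], docs, 2);
console.log(results);
```

With these toy vectors, "cats" ranks first and "dogs" second. A linear scan like this is fine for a few thousand documents in the browser; larger collections usually call for an approximate nearest-neighbor index.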


Thank you for the good feedback. I've tried to improve it. I wrote the blog post mainly for myself, to understand cosine similarity, which is probably why it's a bit repetitive, but that's the best way for me to learn something. I get your point, and next time I'll write it better. Good feedback, I love that.


Ha, when you put it that way, I can totally see why it read like that!

It looks super great now. What you have here leaves an entirely different impression, and a stylish one!

Two last suggestions:

* Now I'm thinking the "Why Cosine Similarity Matters for Modern Web Development" section belongs at the top, right after your intro.

* The angle indicator in the diagram is still a bit wonky. I might even take "Direction-only mode" out entirely, since, as you point out, cosine similarity is invariant to changes in magnitude.
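To illustrate the magnitude point: cos(a, b) = a·b / (|a| |b|), so scaling a by any k > 0 multiplies both the numerator and |a| by k, and the factor cancels. Here is a minimal TypeScript sketch with made-up 2D vectors; the helpers `dot`, `norm`, and `scale` are hypothetical, not taken from the post.

```typescript
// Sketch: cosine similarity depends only on direction, not magnitude.

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

function norm(a: number[]): number {
  return Math.sqrt(dot(a, a));
}

function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (norm(a) * norm(b));
}

// Hypothetical helper: multiply every component by a scalar k.
function scale(a: number[], k: number): number[] {
  return a.map((x) => x * k);
}

const a = [3, 4];
const b = [4, 3];

const original = cosineSimilarity(a, b);             // 24 / (5 * 5) = 0.96
const stretched = cosineSimilarity(scale(a, 10), b); // same direction, 10x longer
const flipped = cosineSimilarity(scale(a, -1), b);   // opposite direction

console.log(original, stretched, flipped);
```

Only positive scaling is a no-op: multiplying by a negative number reverses the direction, so the score flips sign (here from 0.96 to -0.96).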


I think the web animation is really useful for people new to the concept.


Oh, certainly, I meant remove the "Direction-only mode" toggle, not the whole animation!

