Hacker News

Such a dramatic name for such a boring set of tests. We need to test whether it can come up with a Nobel Prize-winning scientific breakthrough, a Booker/Pulitzer-worthy novel, Ken Thompson-level code that solves a real problem, or a proof of Fermat’s Last Theorem.


This makes me wonder whether you could train an LLM without any references to Wiles’ work and see if it can complete a proof of Fermat’s Last Theorem.


None of those are easily verifiable.


Then they probably shouldn't have called it "Humanity's last exam." Kinda lame, if you think about it.



