
Only until they start incorporating this test into their training data.


Dataset contamination alone won't get them good-looking SVG pelicans on bicycles, though. They'll have to either cheat on this particular question specifically or train the model to make vector illustrations in general, at which point the test can easily be swapped for another problem that wasn't in the data.


I like this one as an alternative, which also requires using a special representation to achieve a visual result: https://voxelbench.ai

What's more, it doesn't benchmark just a single prompt.


They can have some cheap workers make about 10 pelicans by hand in SVG, fuzz them to generate thousands of variations, and throw them into their training pool. They don't need to 'get good at SVGs' by any means.
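
For anyone wondering what "fuzz them" could look like in practice, here is a rough sketch, not anything a lab is known to do. It assumes the simplest kind of fuzzing: jittering numeric geometry attributes of a hand-drawn seed. The seed drawing, the attribute list, and the names `fuzz_svg` and `make_variations` are all made up for illustration.

```python
import random
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep output free of ns0: prefixes

# Hypothetical stand-in for a hand-drawn seed: two wheels and a body.
# A real seed would be a full pelican-on-a-bicycle drawing.
SEED_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<circle cx="30" cy="70" r="12" fill="none" stroke="black"/>'
    '<circle cx="70" cy="70" r="12" fill="none" stroke="black"/>'
    '<ellipse cx="50" cy="40" rx="14" ry="9" fill="white" stroke="black"/>'
    '</svg>'
)

# Assumed set of attributes worth perturbing; paths and transforms are ignored.
NUMERIC_ATTRS = ("cx", "cy", "r", "rx", "ry", "x", "y", "width", "height")


def fuzz_svg(svg_text, rng, jitter=2.0):
    """Return a copy of the SVG with numeric geometry attributes nudged
    by up to +/- jitter units. Does not guard against negative radii."""
    root = ET.fromstring(svg_text)
    for elem in root.iter():
        for name in NUMERIC_ATTRS:
            value = elem.get(name)
            if value is None:
                continue
            try:
                number = float(value)
            except ValueError:
                continue  # skip values like "50%" or "auto"
            elem.set(name, f"{number + rng.uniform(-jitter, jitter):.2f}")
    return ET.tostring(root, encoding="unicode")


def make_variations(svg_text, count, seed=0):
    """Generate `count` fuzzed copies from one seed drawing."""
    rng = random.Random(seed)
    return [fuzz_svg(svg_text, rng) for _ in range(count)]


variants = make_variations(SEED_SVG, 1000)
```

Variations produced this way stay very close to the seed, which is the point of the comment above: this kind of augmentation targets one prompt rather than teaching vector illustration in general.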



