Only until they start incorporating this test into their training data.

orbital-decay · 2025-11-06T22:32:34

Dataset contamination alone won't get them good-looking SVG pelicans on bicycles, though. They'll have to either cheat on this particular question or train the model to make vector illustrations in general, at which point the test can easily be swapped for another problem that wasn't in the data.

jug · 2025-11-07T00:07:01

I like this one as an alternative; it also requires using a special representation to achieve a visual result: https://voxelbench.ai

What's more, this doesn't benchmark a single prompt.

nwienert · 2025-11-07T01:21:30

They can have some cheap workers make about 10 pelicans by hand in SVG, fuzz them to generate thousands of variations, and throw those into their training pool. They don't need to 'get good at SVGs' by any means.
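For a sense of how cheap the "fuzz them" step could be, here is a rough Python sketch that jitters the coordinates in each `<path d="...">` attribute of a hand-made SVG to produce many slightly different copies. The names `fuzz_svg` and `make_variants` are hypothetical, and it assumes the paths use simple commands (M/L/C/Q/Z and the like) with no arc (`A`) commands, whose 0/1 flags must not be perturbed, and no exponent notation like `1e-5`. Colors, `viewBox`, and other attributes are left untouched.

```python
import random
import re
from typing import List, Optional

# Hypothetical sketch: perturb only the numbers inside path "d" attributes.
PATH_DATA = re.compile(r'(\sd=")([^"]*)(")')
# Signed integers or decimals, e.g. "12", "-3.5", ".25" (no exponents).
NUMBER = re.compile(r"-?\d*\.?\d+")


def fuzz_svg(svg: str, jitter: float = 2.0, seed: Optional[int] = None) -> str:
    """Return a copy of `svg` with every path coordinate nudged by up to +/-jitter."""
    rng = random.Random(seed)

    def nudge_number(match: re.Match) -> str:
        value = float(match.group()) + rng.uniform(-jitter, jitter)
        # Leading space keeps adjacent numbers separated, e.g. "30-5" -> " 30.41 -4.87".
        return f" {value:.2f}"

    def fuzz_path(match: re.Match) -> str:
        return match.group(1) + NUMBER.sub(nudge_number, match.group(2)) + match.group(3)

    return PATH_DATA.sub(fuzz_path, svg)


def make_variants(svg: str, n: int, jitter: float = 2.0) -> List[str]:
    """Generate n reproducible variants of one hand-drawn SVG (seeds 0..n-1)."""
    return [fuzz_svg(svg, jitter, seed=i) for i in range(n)]
```

Ten hand-drawn pelicans passed through `make_variants(svg, 1000)` would yield ten thousand near-duplicates. A real pipeline would presumably also vary colors, stroke widths, and layout, but even this naive version shows the effort is trivial compared to teaching a model vector illustration in general.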