
Phone cameras make megapixel images easy to get, but virtually all vision models in common use take 224x224 images as input, or maybe 384x384. Anything higher resolution just gets resampled down. It seems you're better off spending your compute budget on a bigger "brain" than on better "eyes," for now.
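For reference, that downsampling is baked into the standard preprocessing pipeline. A minimal sketch using torchvision's usual ImageNet-style transforms (the filename is a placeholder):

  from torchvision import transforms
  from PIL import Image

  # Standard ImageNet-style preprocessing: whatever the source
  # resolution, the model only ever sees 224x224.
  preprocess = transforms.Compose([
      transforms.Resize(256),       # short side -> 256
      transforms.CenterCrop(224),   # crop to 224x224
      transforms.ToTensor(),
      transforms.Normalize(mean=[0.485, 0.456, 0.406],
                           std=[0.229, 0.224, 0.225]),
  ])

  img = Image.open("photo.jpg")     # e.g. a 4000x3000 phone photo
  x = preprocess(img).unsqueeze(0)  # shape: [1, 3, 224, 224]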


I don't think that's current. Object detection models certainly work on bigger images, and the datasets they're pretrained on, e.g. COCO, are not 224x224. Standard classification models pretrained on ImageNet, like the ResNets, usually have everything resized to 224x224, which is why they favor that kind of scaling.
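To illustrate the detection side, a sketch with torchvision's COCO-pretrained Faster R-CNN (assumes torchvision >= 0.13 for the weights argument; older versions use pretrained=True):

  import torch
  from torchvision.models.detection import fasterrcnn_resnet50_fpn

  # COCO-pretrained detector; there is no fixed 224x224 input.
  model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

  # Detection models take a list of arbitrarily sized images and
  # rescale them internally (default: short side ~800px, not 224).
  img = torch.rand(3, 1080, 1920)  # a full-HD frame
  with torch.no_grad():
      preds = model([img])
  print(preds[0]["boxes"].shape)   # [N, 4] predicted boxes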



