It's very odd that they don't do linear probing on the model's features and measure ImageNet validation accuracy. In general, this paper is missing comparisons to existing vision models on existing benchmarks.
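For context, linear probing just means freezing the backbone, extracting features for each image, and fitting a single linear classifier on top, then reporting top-1 accuracy on the ImageNet validation split. A minimal sketch is below; the feature arrays (`train_feats`, etc.) are hypothetical placeholders standing in for whatever pooled embeddings the model actually produces.

```python
# Minimal linear-probe sketch: fit a linear classifier on frozen features.
# `*_feats` are [N, D] float arrays, `*_labels` are [N] int arrays --
# placeholders here, to be replaced with features from the frozen backbone.
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe(train_feats, train_labels, val_feats, val_labels):
    # Multinomial logistic regression == a single linear layer,
    # which is the standard linear-probing protocol.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_feats, train_labels)
    # Top-1 accuracy on the held-out (e.g. ImageNet val) split.
    return clf.score(val_feats, val_labels)

# Example run with random placeholder features:
rng = np.random.default_rng(0)
acc = linear_probe(rng.normal(size=(1000, 512)), rng.integers(0, 10, 1000),
                   rng.normal(size=(200, 512)), rng.integers(0, 10, 200))
print(f"linear-probe top-1 accuracy: {acc:.3f}")
```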
The visual prompting is a neat trick, and it's great to see scale continue to work, but without comparisons to other models or a release of the code/weights, I don't think this strategy is going to be competitive.