Is that not just traditional OCR applied on top of LLM? | Hacker News


sushid on July 11, 2024 | on: Vision language models are blind

Is that not just traditional OCR applied on top of LLM?

energy123 on July 11, 2024

It's possible they have a software layer that does that. But I was assuming they don't, because the open source multimodal models don't.

maxlamb on July 11, 2024

No, it’s not. It’s a multimodal transformer model.
