
Is that not just traditional OCR applied on top of an LLM?


It's possible they have a software layer that does that. But I was assuming they don't, because the open-source multimodal models don't.


No, it’s not; it’s a multimodal transformer model.



