
There's been a large push toward server-side rendering for web pages, which means many companies no longer expose a publicly facing API for the data they display on their websites.

Parsing the rendered HTML is the only way to extract the data you need.
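For anyone who hasn't done this, the basic pattern is just fetch-and-parse; a minimal sketch (the URL and CSS selectors here are made up for illustration):

    import requests
    from bs4 import BeautifulSoup

    # Hypothetical server-rendered page with no public JSON API behind it.
    resp = requests.get("https://example.com/products", timeout=10)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")
    # Selector is illustrative; real sites need site-specific selectors,
    # which tend to break whenever the markup changes.
    for row in soup.select("table.products tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            print(cells)

The fragility of those selectors is a big part of why screenshot-based approaches are appealing.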



I've had good success running Playwright screenshots through EasyOCR, so parsing the DOM isn't the only way to do it. Granted, tables end up pretty messy...
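If anyone wants to try it, the pipeline is just two steps; a rough sketch assuming Playwright and EasyOCR are installed (pip install playwright easyocr, then playwright install chromium):

    from playwright.sync_api import sync_playwright
    import easyocr

    # Step 1: render the page in a headless browser and screenshot it.
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com")  # placeholder URL
        page.screenshot(path="page.png", full_page=True)
        browser.close()

    # Step 2: OCR the screenshot. readtext returns (bbox, text, confidence)
    # tuples; EasyOCR downloads its models on first run.
    reader = easyocr.Reader(["en"])
    for bbox, text, conf in reader.readtext("page.png"):
        print(f"{conf:.2f}  {text}")

The bounding boxes are what you'd use to reconstruct table layout, which is where it gets messy.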


We've been doing something similar for VLM Run [1]. A lot of websites with obfuscated HTML/JS or rendered charts/tables are hard to parse through the DOM. Taking screenshots is definitely more reliable and future-proof, since these webpages are built for humans to interact with.

That said, the costs can be high, as the OP says, but we're building cheaper and more specialized models for web screenshot -> JSON parsing.
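To make that concrete, here's roughly what the screenshot -> JSON pattern looks like with a general-purpose vision model standing in (this uses the OpenAI API purely as an illustration, not VLM Run's actual API, and the prompt/model names are just examples):

    import base64
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    with open("page.png", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    # Ask the model to transcribe the screenshot into structured JSON.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract every product name and price visible "
                         "in this screenshot as a JSON array."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)

A specialized model mostly buys you lower cost per page and more reliable adherence to a JSON schema than a general chat model.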

Also, it turns out you can do a lot more than just web-scraping [2].

[1] https://vlm.run

[2] https://docs.vlm.run/introduction



