Hacker News | cpeffer's comments

If you’re looking for an open-core version of this, check out firecrawl.dev


Very cool. We posted yesterday about a similar tool we built:

https://www.firecrawl.dev/

It also crawls (although you can scrape single pages as well).


It crawls webpages (finds subdirectories), handles JS blocking with fallbacks to headless browsers, and does this all concurrently.

If only that script worked for every website. But, alas, it does not.
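For anyone curious what "fallbacks, done concurrently" could look like, here is a minimal sketch. It is not Firecrawl's actual code. The names `fetch_with_fallback` and `fetch_all` are made up for illustration, and the fetchers are plain callables you supply yourself (for example a simple HTTP GET first, then a headless-browser fetch).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional

# A fetcher takes a URL and returns HTML, or None/raises if it fails.
# Hypothetical type alias; the real fetchers are up to the caller.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(url: str, fetchers: Iterable[Fetcher]) -> Optional[str]:
    """Try each fetcher in order and return the first non-empty result."""
    for fetch in fetchers:
        try:
            html = fetch(url)
        except Exception:
            continue  # this strategy failed (e.g. blocked); try the next one
        if html:
            return html
    return None  # every strategy failed


def fetch_all(
    urls: Iterable[str], fetchers: Iterable[Fetcher], max_workers: int = 8
) -> Dict[str, Optional[str]]:
    """Fetch many URLs concurrently, applying the same fallback chain to each."""
    url_list: List[str] = list(urls)
    chain: List[Fetcher] = list(fetchers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda u: fetch_with_fallback(u, chain), url_list)
        return dict(zip(url_list, results))
```

You would call it as `fetch_all(urls, [plain_get, headless_get])`, putting the cheap fetcher first so the headless browser only runs for pages where the cheap one fails.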


* Creator here - that's the goal!


And do you honor or ignore robots.txt?


It wasn't in our initial version (we didn't plan on launching today), but we are pushing an update to do so now.
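For reference, a robots.txt check does not need much. This is only a sketch using Python's standard `urllib.robotparser`, not the update we are shipping. The helper name `build_robots_checker` and the user-agent string are assumptions for illustration.

```python
from typing import Callable
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str) -> Callable[[str], bool]:
    """Return a function that says whether `user_agent` may fetch a given URL.

    `robots_txt` is the already-downloaded text of the site's robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed


# Example rules: everything under /private/ is off limits to all agents.
example_rules = "User-agent: *\nDisallow: /private/\n"
is_allowed = build_robots_checker(example_rules, "example-crawler")
```

A crawler would call `is_allowed(url)` before queueing each URL and skip the ones that come back `False`.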

