
> The crawler is hybrid, using async python requests and puppeteer with uBlock Origin. The way detection works is we count the number of uBO blocked requests on the page, and if too many (threshold is set to 5), we kick it out, leaving only "clean" pages in the index.

Fascinating; cnn.com reports 47 on the front page, npr.org is at 16, and developer.hashicorp.com is at 9. I don't think that metric is doing what they think it is, or rather, maybe they're trying to target only savannah.gnu.org-style sites or something.
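
For illustration, here's roughly what that filter does to the three counts above. This is only a sketch, not the crawler's actual code: the `is_clean` helper and the `observed` dict are made-up names, and I'm assuming "too many" means strictly more than the threshold of 5 (the quote doesn't say whether exactly 5 passes).

```python
# Hypothetical sketch of the quoted filter; not the crawler's real code.
BLOCKED_REQUEST_THRESHOLD = 5  # threshold stated in the quoted description


def is_clean(blocked_requests: int, threshold: int = BLOCKED_REQUEST_THRESHOLD) -> bool:
    """Keep a page only if uBO blocked at most `threshold` requests.

    Assumption: "too many" means strictly greater than the threshold.
    """
    return blocked_requests <= threshold


# uBO blocked-request counts reported in the comment above.
observed = {
    "cnn.com": 47,
    "npr.org": 16,
    "developer.hashicorp.com": 9,
}

# Sites that would survive the filter and stay in the index.
kept = [site for site, count in observed.items() if is_clean(count)]
print(kept)  # → []
```

All three get kicked out, including a documentation site, which is why the threshold looks too aggressive to me.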


