
I'd be surprised to see a mass-scraping bot behind a NAT gateway. They're probably using public Lambdas, where they can't even control the egress IPs (unless something has changed in the 6 months since I last looked), and sending results to a queue or bucket somewhere.

What I'd do is block the AWS AP range at the edge (unless there's something else there that needs access to your site). You can find regularly updated JSON-formatted lists around the internet. Alternatively, have something match the bot's fingerprint and send it heaps of garbage, like the zip bombs others have suggested. It could be a recursive "you're abusing my site - go away" or what-have-you. You could also do some kind of grey-listing, where you limit the speed to a crawl so that each connection just consumes crawler resources and gets little content. If they are tracking this, they'll see the performance issues and maybe adjust.
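For the range-matching part, here's a rough sketch of how it could work, not a drop-in edge rule. AWS publishes its ranges as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json, and the field names below (`prefixes`, `ip_prefix`, `ipv6_prefixes`, `ipv6_prefix`, `region`) follow that file. Everything else is my assumption: I'm reading "AP" as the regions whose names start with `ap-`, the function names are made up, and the sample data uses documentation-only addresses rather than real AWS prefixes.

```python
import ipaddress
import json

# Made-up stand-in for the published file (same shape, documentation ranges).
# In practice you would download and cache the real JSON on a schedule.
SAMPLE = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "192.0.2.0/24", "region": "ap-southeast-1", "service": "AMAZON"},
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "AMAZON"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2001:db8::/32", "region": "ap-northeast-1", "service": "AMAZON"}
  ]
}
""")


def load_networks(doc, region_prefix="ap-"):
    """Collect the IPv4 and IPv6 networks whose region starts with region_prefix."""
    networks = []
    for entry in doc.get("prefixes", []):
        if entry["region"].startswith(region_prefix):
            networks.append(ipaddress.ip_network(entry["ip_prefix"]))
    for entry in doc.get("ipv6_prefixes", []):
        if entry["region"].startswith(region_prefix):
            networks.append(ipaddress.ip_network(entry["ipv6_prefix"]))
    return networks


def is_listed(addr, networks):
    """Return True if the client address falls inside any collected network."""
    ip = ipaddress.ip_address(addr)
    # Membership is False when the address and network versions differ.
    return any(ip in net for net in networks)


if __name__ == "__main__":
    ap_networks = load_networks(SAMPLE)
    for client in ("192.0.2.10", "198.51.100.7", "2001:db8::1"):
        action = "block or throttle" if is_listed(client, ap_networks) else "allow"
        print(client, "->", action)
```

A linear scan like this is fine for a demo. With the full list you'd more likely export the prefixes into whatever your edge or firewall supports natively, and pass a different `region_prefix` (or an empty string for everything) if "ap-" isn't what you want to match.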


