Hacker News

Disguise your scraper as a browser by including an Internet Explorer or Firefox User-Agent string. Randomize the times between scrapes so it looks more like a human being doing it, and make sure the scraper takes "coffee breaks" every now and then. Run the service from several servers at once, if you have them. I would guess your program is fairly low overhead (mine always have been), so contact friends and ask to use their servers or home PCs. Extra credit for designing a cloud-like infrastructure where PCs can come and go without missing any data :)
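A minimal sketch of the first two suggestions, assuming Python: a hypothetical pool of browser User-Agent strings, randomized delays between requests, and an occasional long "coffee break". The URL and timing parameters are illustrative, not from the comment.

```python
import random
import time
import urllib.request

# Assumed pool of common browser User-Agent strings (illustrative examples).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)",
]

def polite_delay(base=5.0, jitter=10.0, break_chance=0.02, break_seconds=600):
    """Return a human-looking pause: base + random jitter, plus an
    occasional long 'coffee break' with probability break_chance."""
    delay = base + random.uniform(0, jitter)
    if random.random() < break_chance:
        delay += break_seconds
    return delay

def fetch(url):
    """Fetch a page while presenting a randomly chosen browser User-Agent."""
    req = urllib.request.Request(
        url, headers={"User-Agent": random.choice(USER_AGENTS)}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Usage sketch: sleep a randomized interval between each scrape.
# html = fetch("https://example.com/page")  # hypothetical target
# time.sleep(polite_delay())
```

Scheduling the sleep from `polite_delay` between requests gives a jittered, bursty pattern rather than the fixed interval that makes automated scrapers easy to spot.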


Might be easier just to use Tor than multiple servers. I think the only thing that will matter is the number of requests per IP address per time period. It won't be a human looking at the logs, it'll be an automated process, so randomizing the time between scrapes is unlikely to matter.
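Routing requests through Tor can be done via its local SOCKS proxy. A sketch, assuming a Tor client running on its default port 9050 and the third-party `requests` and PySocks packages (`pip install requests[socks]`), neither of which the comment specifies:

```python
# Assumes a local Tor client listening on its default SOCKS port 9050.
# "socks5h" (rather than "socks5") makes DNS resolution happen inside
# Tor as well, so lookups don't leak your real IP.
TOR_SOCKS = "socks5h://127.0.0.1:9050"

def tor_proxies(socks_url=TOR_SOCKS):
    """Proxy mapping that routes both HTTP and HTTPS through Tor."""
    return {"http": socks_url, "https": socks_url}

# Usage sketch (requires requests + PySocks installed):
# import requests
# resp = requests.get("https://example.com",  # hypothetical target
#                     proxies=tor_proxies(), timeout=30)
```

Each new Tor circuit gives a different exit IP, which addresses the requests-per-IP limit the comment describes without renting extra servers.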


botnet = free cloud computing :)



