bmetz's comments

My favorite thing was to identify bots and, instead of blocking them, switch to a slightly scrambled data set to make the scrape useless but look good to the developer who stole it. It was a ton of fun as a side project. I'd also suggest you add some innocent fake data to your real site and then set up Google Alerts for all of the above to catch sites republishing it. About 50% of sites would respond positively to an email when you showed them they were hosting fake data. About 90% would take my data down if that was followed up with a stronger C&D. One key is to catch them fast, while they're still a little nervous about showing off their stolen data online.
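For anyone who wants to try this, here is a rough sketch of both ideas in Python, not the setup described above. Everything in it is an assumption made for illustration: the record fields (`name`, `phone`, `price`), the `SECRET` value, the canary entry, and the `is_bot` flag (how you detect bots is out of scope here). Scrambling is seeded per record so a bot that scrapes twice sees the same plausible-looking wrong values.

```python
import hashlib
import random

# Hypothetical secret used to seed the scrambling; not from the original comment.
SECRET = "change-me"

# Hypothetical canary records: harmless fake entries mixed into the real site.
# Their names are what you would set up search alerts for.
CANARY_RECORDS = [
    {"name": "Quillon Ardmore Bakery", "phone": "555-0142", "price": 12.50},
]


def scramble_record(record, secret=SECRET):
    """Return a copy with the price nudged and the last four phone digits shuffled.

    The random generator is seeded from the secret and the record name, so the
    same record always scrambles to the same output.
    """
    digest = hashlib.sha256((secret + record["name"]).encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))
    scrambled = dict(record)
    # Move the price by up to 15% in either direction.
    scrambled["price"] = round(record["price"] * rng.uniform(0.85, 1.15), 2)
    # Shuffle the last four characters of the phone number.
    tail = list(record["phone"][-4:])
    rng.shuffle(tail)
    scrambled["phone"] = record["phone"][:-4] + "".join(tail)
    return scrambled


def serve(records, is_bot):
    """Bots get the scrambled set; everyone else gets real data plus canaries."""
    if is_bot:
        return [scramble_record(r) for r in records]
    return list(records) + CANARY_RECORDS


def find_canaries(scraped_names):
    """Return the canary names that show up in someone else's data."""
    canary_names = {r["name"] for r in CANARY_RECORDS}
    return sorted(canary_names & set(scraped_names))
```

If `find_canaries` returns anything for a competitor's listing, that is the evidence you attach to the first email.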


This is like the concept of a 'trap street': https://en.wikipedia.org/wiki/Trap_street


This is what we used to do. Then we'd send a large zip file with screenshots and other data to the lawyers to handle the contact. Shortly after, the scraping usually stopped. Contacting them to sell access wasn't an option because it was competitors taking the data.


I did some scraping for a lawyer back in like '01 from other lawyers. He got a C&D and told me to turn it off (we were done anyway).

Funny part was the lawyer on the other side wanted us to return all of the content on disk. Not show what we had copied but literally return it. My lawyer laughed about it. The other lawyer was smart/savvy enough to be effectively using the internet in '01 but didn't really understand the tech.

Other funny part is if he had generalized his site outside of law he would have had a major business these days.


> Funny part was the lawyer on the other side wanted us to return all of the content on disk.

I actually had a client who asked me to record a screenshot of me deleting their files from my computer. (And this was actually a developer making the request.)


And this is why it's an excellent idea to always scrape behind proxies. Never scrape from your own IP, or one easily traced to you.


I'm gung-ho on solar but the article clearly says there is a low penalty for going over budget. They may just be guessing where prices will be by the time they have to pay for the panels.


And those "numerous integration points" are?


Mike from IG here. Some early wins are integrations with spam fighting systems, logging infrastructure, and FB's Hive infrastructure.


Still can't change my login email ID.


Tried to write him a note encouraging him to check us out. The comment box doesn't work. Designers, shrug. Seems like a pretty cool person!

