Has anyone else noticed Stack Overflow clones in Google search results? They come up frequently for me. I can't help but wonder who's behind these. It can't be helping Stack Overflow's SEO.
So far I have saved 5 different domains, and it looks like 2 have vanished.
- [dead] http://www.codeitive.com/0izVUjjXVP/selective-foreign-key-usage-in-django-maybe-with-limitchoicesto-argument.html
- [dead] http://www.codedisqus.com/0QmqWVgjgg/hide-label-in-django-admin-fieldset-readonly-field.html
- http://w3facility.org/question/image-servingurl-and-google-storage-blobkey-not-working-on-development-server/
- http://goobbe.com/questions/3109325/how-can-i-disable-a-third-party-api-when-executing-django-unit-tests
- http://www.ciiycode.com/0HyN6eQxgjXP/django-admin-inline-popups
I actually wrote Stack Overflow support about this in April 2015, but so far nothing has changed. Here's the thread:
Me: "Hello,
There are lots of spam results on Google. As a web developer, I am frequently googling for how to resolve some programming issue. Often, I get these spoofing sites that link to Stack Overflow.
I would suggest adding a stricter robots.txt or perhaps blocking some of these bots that are scraping your site.
Here is an example.
http://www.ciiycode.com/0HyN6eQxgjXP/django-admin-inline-popups
Thank you."
Response: "Hello,
Thank you for reporting this content. I've passed the information along to the person at our company who handles such issues. It's the diligence of users like you that helps us stay valuable!
Please note, bringing these sites into compliance (or getting them to no longer serve our content) is often a long and arduous process. You may not see immediate results. However, rest assured that we're working on it.
Thank you again,
Stack Exchange Team"
Thoughts?
Google could go by when it first saw a piece of content, but that won't work against fast scrapers, which can get a copy crawled before the original is. For that, you need trusted timestamps.
One solution to this would be to have a few time-stamping services. You send in a string, probably a hash, and it adds a timestamp, signs it, and sends back a signed result. Then provide a WordPress plug-in to use this service, hashing and time-stamping each blog entry, and putting the result in the HTML in some standard way. (Perhaps <span signed-provenance-timestamp-hash="xxxxx"> blog entry </span>). A few mutually mistrustful services for that would help; blogs with serious forgery problems could use multiple time-stamping services.
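To make the idea concrete, here is a rough sketch of what such a service might do. Everything in it is an assumption for illustration: the names `hash_entry`, `issue_timestamp`, and `verify_timestamp` are made up, and the HMAC with a shared demo key is only a stdlib stand-in for a real signature. An actual service would sign with a private key so that anyone could verify the result with the public key.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical secret held by the time-stamping service (demo value only).
# HMAC stands in for a signature here; a real service would use public-key
# signing so that verifiers never need the secret.
SERVICE_KEY = b"demo-key-not-for-production"


def hash_entry(text: str) -> str:
    """Blog side: hash the entry so only the digest is sent to the service."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _sign(digest: str, issued_at: int) -> str:
    """Compute the service's tag over the digest and the timestamp together."""
    payload = json.dumps({"hash": digest, "time": issued_at}, sort_keys=True)
    return hmac.new(SERVICE_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_timestamp(digest: str, now: Optional[int] = None) -> dict:
    """Service side: attach a timestamp to the digest and sign both."""
    issued_at = int(time.time()) if now is None else now
    return {"hash": digest, "time": issued_at, "sig": _sign(digest, issued_at)}


def verify_timestamp(text: str, token: dict) -> bool:
    """Check that the token matches this text and was signed by the service."""
    if hash_entry(text) != token["hash"]:
        return False
    expected = _sign(token["hash"], token["time"])
    return hmac.compare_digest(expected, token["sig"])
```

A plug-in would call `hash_entry` on each post, send the digest to one or more services, and embed the returned token in the page. A scraper can copy the token along with the text, but it cannot get an earlier timestamp for the same content.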
Search engines would then need to treat these timestamps as a ranking signal. If two results are very similar, the one with the earliest timestamp wins.
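On the search engine side, the "earliest one wins" rule could look something like the sketch below. It is only an illustration: `pick_originals` is a made-up name, the 0.9 similarity threshold is arbitrary, and `difflib` is a crude stand-in for whatever near-duplicate detection a real search engine uses. It also assumes the timestamps have already been verified.

```python
from difflib import SequenceMatcher


def pick_originals(results, threshold=0.9):
    """Keep the earliest-timestamped result from each group of near-duplicates.

    `results` is a list of (url, text, verified_time) tuples, where
    verified_time is a timestamp whose signature has already been checked.
    """
    kept = []
    # Walk results oldest first, so the first copy seen of any text is the earliest.
    for url, text, stamp in sorted(results, key=lambda r: r[2]):
        is_copy = any(
            SequenceMatcher(None, text, kept_text).ratio() >= threshold
            for _, kept_text, _ in kept
        )
        if not is_copy:
            kept.append((url, text, stamp))
    return kept
```

Comparing every result against every kept result is quadratic, so this only makes sense on a small candidate set, such as the top results for a single query.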