Do you really want raw, unmoderated user generated content?


Yes! They can provide both filtered & unfiltered data.


I imagine there would be some serious legal concerns with releasing the raw data. (I work for Google, but have no special insight into this project.)


If you're not gonna release the data, then don't do the project in the first place (especially not under the implication that it'll be shared with everyone). Saying in hindsight "oh, we can't release all the data due to sensitivity issues" is just weaseling your way into keeping the most valuable data to yourselves, as if the stated issues couldn't have been predicted.


Why? Users volunteer the data. Just ask them if they're ok with it being public.


I think that approach is akin to the honor system, and based on my experiences on the internet I fear that it won't scale well. For some types of images, just because the uploader is ok with the file being shared doesn't mean it's a good idea to redistribute it. For a bland example, think of a photo where the uploader doesn't hold the copyright. I'm sure you can imagine what would happen if someone on the seedier parts of the internet said "hey, if you upload your images to this website, Google will host them for free forever!"


One negative, I guess, would be uncovering the moderation algorithm so a malicious user could circumvent it.

Another negative would be release of Bad Words or illegal content submitted by malicious users. Depends on the task.

But the actual raw data would be of more use to researchers than data that has already been cleaned by algorithms. Perhaps there could be a program for educational researchers?



