Food compare: Compare the characteristics of two food images.
Response rating: Evaluate the naturalness of a bot response.
Audio donation: Record your voice to improve speech technology.
Food facts: Tell us if a food dish has particular characteristics.
Food labeller: Tell us what food an image contains.
Semantic similarity: Judge whether two phrases have the same meaning.
Chart understanding: Judge whether charts are understandable and trustworthy.
Glide type: Glide your fingers on the keyboard to type the text that you see.
Audio validation: Listen to a short audio clip and determine if the pronunciation sounds natural in your language.
Image label verification: Tell us if images are tagged correctly.
Image capture: Collect and share photos of your part of the world.
Translation: Translate phrases and words into different languages.
Translation validation: Select which phrases are translated correctly.
Handwriting recognition: Look at handwriting and type the text that you see.
Sentiment evaluation: Decide if a sentence in your language is positive, negative or neutral.
Smart camera (Android Lollipop 5.0+ required): Point at an object and see if the camera can guess what it is.
Even for those 3 datasets, Google does not disclose what proportion of the crowdsourced contributions is released publicly. I would not contribute to Crowdsource with the expectation that my contributions would help build a freely licensed dataset.
I expected to be pessimistic, and I am: the datasets that actually matter probably won't be released, and there is no guarantee that even the current trend will continue. Please don't trust the giant.
Yeah, most of what's on the 'Google Research Datasets' GitHub account is super boring datasets. There's no way they'll help out with the actually interesting ones (this is just PR).
If you're not gonna release the data, then don't do the project in the first place (especially not under the implication that it'll be shared with everyone). Saying in hindsight "oh, we can't release all the data due to sensitivity issues" is just a weaselly way of keeping the most valuable data to yourselves, as if the stated issues couldn't have been predicted.
I think that approach is akin to the honor system, and based on my experiences on the internet I fear it won't scale well. For some types of images, just because the uploader is OK with the file being shared doesn't mean it's a good idea to redistribute it. For a bland example, think of a photo where the uploader doesn't hold the copyright. I'm sure you can imagine what would happen if someone on the seedier parts of the internet says "hey, if you upload your images to this website, Google will host them for free forever!"
One negative, I guess, would be revealing the moderation algorithm, so a malicious user could circumvent it.
Another negative would be the release of offensive language or illegal content submitted by malicious users. Depends on the task.
But the actual raw data would be of more use to researchers than a version cleaned by the output of those algorithms. Perhaps there could be a program for educational researchers?
Btw, what do those 'data cards' actually do? Can you get sued for going against them? Do they conflict with the permissive license, or does that take precedence?
https://research.google/tools/datasets/open-images-extended-...
https://github.com/google-research-datasets/hiertext