1. We would prefer to be funded with donations like Wikipedia.
2. I don't think we can avoid it completely, but perhaps volunteers could help us determine the trustworthiness of websites. Do you have any suggestions?
3. I think programmers and people with experience raising money for nonprofits could help the most right now. But if you see some other way you would want to contribute, please let us know!
Regarding raising money, I wouldn't be surprised if, given the current state of things in the EU, you could manage to get some funding. I have no experience with it, but there are companies that specialize in helping write grant proposals.
1. Yes, any structured data could definitely help improve the results, I personally like the Wikidata dataset. It's just a matter of time and resources :)
2. The first step will probably be to handle this in our "post processing". We query several servers when doing a search and often get many more results than we need, so in this step we could quite easily remove identical results.
3. The ranking is currently heavily based on links (same as Google), so we will have similar issues. But hopefully we will find some ways to better determine which sites are actually trustworthy, perhaps with more manually verified sites if enough people want to contribute.
4. I think Gigablast and Marginalia Search are really cool, and it's interesting to see how much can be done with a very small team.
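The dedup step in point 2 could be sketched roughly like this. This is only an illustration, not the project's actual code; the field names (`url`, `score`) and the merge-by-URL strategy are assumptions:

```python
# Hypothetical sketch of the "post processing" step described above:
# merge ranked results from several index servers, then drop identical
# results, keeping the best-scoring copy of each URL.

def merge_and_dedup(result_lists):
    """Merge result lists from several servers and deduplicate by URL."""
    best = {}
    for results in result_lists:
        for result in results:
            url = result["url"]
            # Keep only the highest-scoring occurrence of each URL.
            if url not in best or result["score"] > best[url]["score"]:
                best[url] = result
    # Re-rank the merged, deduplicated results by score.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)

server_a = [{"url": "https://example.org/a", "score": 0.9},
            {"url": "https://example.org/b", "score": 0.5}]
server_b = [{"url": "https://example.org/a", "score": 0.7},
            {"url": "https://example.org/c", "score": 0.6}]

merged = merge_and_dedup([server_a, server_b])
```

A real implementation would likely also catch near-duplicates (same content under different URLs), which needs content hashing or shingling rather than exact URL matching.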
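For point 3, "heavily based on links (same as Google)" presumably means something in the PageRank family. A minimal, self-contained sketch of that idea, with an invented three-page link graph and the conventional 0.85 damping factor (both assumptions for illustration):

```python
# Minimal PageRank-style sketch: a page's rank is the chance a random
# surfer lands on it, following links with probability `damping` and
# jumping to a random page otherwise.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Base probability of arriving via a random jump.
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                # A page shares its rank equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Here "c" ends up ranked above "b" because both "a" and "b" link to it. The trust problem mentioned above is exactly that this rewards being linked to, not being trustworthy, which is why manually verified seed sites could help.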
> Yes, any structured data could definitely help improve the results
Which syntaxes and vocabularies do you prefer? Microformats, as well as schema.org vocabularies represented as Microdata or JSON-LD, seem to be the most common according to the latest Web Data Commons Extraction Report[0]. The report is also powered by the Common Crawl.
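Of those syntaxes, JSON-LD is probably the easiest to consume from a crawl. A hedged, stdlib-only sketch of pulling schema.org JSON-LD blocks out of a crawled page; a real pipeline (like the Web Data Commons extraction over Common Crawl) also handles Microdata, RDFa, and microformats, which this does not cover:

```python
# Extract <script type="application/ld+json"> blocks from an HTML page
# using only the standard library.
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            try:
                self.blocks.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is common in the wild

html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "name": "Example"}
</script>
</head><body>Page body</body></html>"""

parser = JsonLdExtractor()
parser.feed(html)
```

After `feed()`, `parser.blocks` holds the parsed schema.org objects, which could then be attached to the indexed document for ranking or rich result display.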
At the moment we primarily need help with development and funding. But if you have suggestions or want to help in some other ways, please let us know!