As someone studying this intensely, it's quite the opposite. Basic ML can be (and has been) commodified with good toolkits and APIs. Additionally, much of practical ML is just the application of already-invented algorithms to fields that haven't seen them yet.
But that said, the deeper message is in interpretation and discovery from data: large data, small data, highly structured data, or just regularized DB pulls. The heart of it is statistical pattern recognition, and that has only begun to be broached (even academically) in the last 25 years.
I respectfully disagree. Tools like Weka, nltk, etc. are fine for exploratory data analysis, but it's risky to rely on them for problems that need to scale, problems that differ from the norm, or homegrown solutions built around data that does not yet exist. Because a large portion of HN users are interested in bringing their ideas to life, I'd suspect that the latter case particularly resonates with them.
The problem facing people who intend to work with data that does not yet exist becomes one of feature selection: what data matters, and how do we use it? For NLP tasks, does stemming matter? What about part-of-speech tagging? Some classification problems are not linearly separable, which rules out plain linear classifiers unless you apply (and know to apply) the kernel trick.
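As a concrete illustration of that last point, here is a minimal sketch. scikit-learn and the synthetic XOR-style data are my own illustrative choices, not anything from the thread: a linear SVM sits near chance on data that is not linearly separable, while the same learner with an RBF kernel (the kernel trick) separates it easily.

    # Minimal sketch of the kernel trick; scikit-learn and the synthetic
    # XOR-style data are illustrative choices, not anything from the thread.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    X = rng.uniform(-1, 1, size=(400, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like labels: not linearly separable

    linear = SVC(kernel="linear").fit(X, y)
    rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

    print("linear kernel accuracy:", linear.score(X, y))  # roughly chance level
    print("RBF kernel accuracy:   ", rbf.score(X, y))     # close to 1.0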
In the end, I think my reply here is tautological: ML is too complex to be transformed into a set of APIs a la Google Maps and Google Search.
The problem facing people who intend to work with data that does not yet exist becomes one of feature selection: what data matters and how do we use it? For NLP tasks, does stemming matter? What about part-of-speech tagging?
Indeed, I worked on machine learning in NLP (fluency ranking, parse disambiguation). As a general rule, roughly 90% of the improvement in a model comes from clever feature engineering and from exploiting the underlying system to get more interesting information that improves classification; the remaining 10% comes from using cleverer machine learning techniques than, say, a standard maxent learner with a Gaussian prior (for linearly separable data).
For instance, the last relatively large boosts in the accuracy of the parser developed by our research group came from feature engineering.
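To make that 90/10 split concrete, here is a hedged sketch. scikit-learn's LogisticRegression with an L2 penalty stands in for a maxent learner with a Gaussian prior; the token task, labels, and feature templates are invented for illustration and are not the parser features referred to above. The point is only that a richer, hand-designed template generalizes to unseen words where the identity-only template cannot, with the learner held fixed.

    # Hedged sketch: richer hand-designed features vs. the same fixed learner.
    # LogisticRegression with an L2 penalty plays the role of a maxent learner
    # with a Gaussian prior; the task and feature templates are invented.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def baseline_features(token):
        # Minimal template: the word identity only.
        return {"word=" + token.lower(): 1}

    def engineered_features(token):
        # Same template plus cheap, hand-designed cues.
        feats = baseline_features(token)
        feats["suffix2=" + token[-2:].lower()] = 1
        if token[0].isupper():
            feats["is_capitalized"] = 1
        return feats

    # Toy task: is the token a likely past-tense verb? (labels are illustrative)
    train_tokens = ["walked", "Paris", "jumped", "table", "analyzed",
                    "Berlin", "parsed", "window", "ranked", "chair"]
    train_labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
    test_tokens = ["climbed", "London", "sorted", "bottle"]
    test_labels = [1, 0, 1, 0]

    for extract in (baseline_features, engineered_features):
        model = make_pipeline(DictVectorizer(), LogisticRegression(C=1.0))
        model.fit([extract(t) for t in train_tokens], train_labels)
        acc = model.score([extract(t) for t in test_tokens], test_labels)
        print(extract.__name__, "held-out accuracy:", acc)

The word-identity template sees only unknown words at test time and falls back to the intercept, while the suffix and capitalization cues carry over to words never seen in training.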
I think we just disagree on what "basic" ML means. I think a lot of real problems have solutions which involve very simple applications of poorly tuned ML algorithms.
Engineering even a basic ML solution is challenging; feature engineering especially.
Actually, the Google Prediction API is very simple and it already covers supervised learning (regression and classification). I can imagine very simple extensions (of the API itself; the algorithms would be completely different) to cover a lot of the unsupervised and semi-supervised ground as well.
The algorithms are not disclosed, but the docs hint that they are properly regularized, so throwing more features at them shouldn't hurt.
You still need to be able to reformulate the problems so that they fit a standard ML setting and then know how to tune things, but it looks like the API can get you pretty far.
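For what "reformulate the problem so it fits a standard ML setting" looks like in code, here is a rough sketch. It does not show the Prediction API's own calls; scikit-learn stands in for whatever hosted learner you would hand the data to, and the churn scenario, field names, and records are all invented.

    # Hedged sketch of reformulating a raw "DB pull" into a standard
    # supervised-learning problem. scikit-learn stands in for a hosted
    # service; the churn scenario and every field name are invented.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # One dict per customer, plus the outcome we want to predict.
    records = [
        {"plan": "free", "logins_last_30d": 2, "tickets": 3},
        {"plan": "pro", "logins_last_30d": 25, "tickets": 0},
        {"plan": "free", "logins_last_30d": 1, "tickets": 5},
        {"plan": "pro", "logins_last_30d": 18, "tickets": 1},
    ]
    churned = [1, 0, 1, 0]  # supervised labels: 1 = churned

    # Reformulation: categorical fields become one-hot features, numeric
    # fields pass through unchanged; once the data is in this shape, the
    # learner behind the API is largely interchangeable.
    model = make_pipeline(DictVectorizer(), LogisticRegression())
    model.fit(records, churned)

    print(model.predict([{"plan": "free", "logins_last_30d": 0, "tickets": 4}]))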