The biggest problem with machine learning occurs when people subscribe to the be...

megaframe · on Feb 8, 2012

Agree: I had a dataset at work that no one had yet been able to use to separate two effects (one category made up 98% of the data). The values looked too "Gaussian normal," with everything mixed together, and they could not be separated directly. By combining an SVM with in-depth knowledge of the source of the data, I was able to find a generalized model that correctly categorized parts in the small class 80%+ of the time without misclassifying the other 98%. Every other methodology had failed up to that point, and a blind approach with linear regression or SVMs gave at best 70% accuracy on all categories. That is not very good, and not implementable in a production setting: it means that even for the bulk of cases, the 98%, I was correct only 70% of the time.
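To make the imbalance point concrete, here is a minimal sketch of why a single accuracy number hides what happens to a 2% class, and why per-class recall is the more useful measure. The labels, counts, and the two "models" are hypothetical and only mirror the 98%/2% split described above; they are not the actual data or model.

```python
def accuracy(y_true, y_pred):
    """Fraction of all examples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's examples predicted correctly}."""
    totals, hits = {}, {}
    for truth, guess in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == guess:
            hits[truth] = hits.get(truth, 0) + 1
    return {label: hits.get(label, 0) / totals[label] for label in totals}


# Hypothetical data: 980 "common" parts and 20 "rare" parts (a 98% / 2% split).
y_true = ["common"] * 980 + ["rare"] * 20

# A degenerate model that always predicts the majority class.
always_common = ["common"] * 1000

# A hypothetical model that catches 16 of the 20 rare parts
# and never mislabels a common one.
selective = ["common"] * 980 + ["rare"] * 16 + ["common"] * 4

for name, y_pred in [("always_common", always_common), ("selective", selective)]:
    print(name, accuracy(y_true, y_pred), per_class_recall(y_true, y_pred))
```

The always-majority model scores 98% accuracy while finding none of the rare parts, so overall accuracy alone cannot distinguish it from a model that finds most of them.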

cageface · on Feb 8, 2012

I can certainly see a role for somebody who understands the tradeoffs of each of these algorithms and knows how to properly select and prepare datasets. But I wonder how many people will really need to be able to actually implement these algorithms.

tel · on Feb 8, 2012

They're brutally simple to implement much of the time. The difficulty comes in two places:

    1. Derivation of slight variations on the basic principles 
    2. Scaling.

Both are very difficult.
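As one illustration of the "simple to implement" half of this, a brute-force 1-nearest-neighbor classifier fits in a few lines. The choice of algorithm and the toy data are assumptions made for the example, not something named in the comment above.

```python
import math


def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to query (Euclidean distance)."""
    best_index = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    return train_labels[best_index]


# Hypothetical toy data: two small clusters in 2-D.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
labels = ["a", "a", "b", "b"]

print(nearest_neighbor_predict(points, labels, (0.3, 0.1)))
print(nearest_neighbor_predict(points, labels, (4.8, 5.1)))
```

This version scans every training point for every query, which is fine for four points and is exactly where the scaling difficulty starts to show with millions.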

ramblerman · on Feb 8, 2012

That was also the whole point of the Stanford ML course: to teach exactly that skillset.

Sure, we did some basic implementations in Octave; it helps to have some idea of the internals. But that wasn't the goal of the course.

Drbble · on Feb 9, 2012

Well, no. ml-class was a series of demos. Plugging in a one-line formula is not developing an algorithm. The original cs229 is more like what you describe.

Drbble · on Feb 9, 2012

Well, no. ml-class was a series of demos. The data was all curated in advance and the models pre-selected appropriately.

tel · on Feb 8, 2012

I think the regress you're talking about is super important---black box AI only goes so far---but I also think there's great benefit to just applying the first layer of broken, incorrectly paired ML to a new field.

My prediction is that even the most black box ML, creatively applied, is and will be an incredible skill. Increasing levels of sophistication will continually kill off the current practices of black box ML, but the willingness to apply statistical pattern recognition to new and interesting areas can't help but be incredible.