The gap that I see in current machine learning is that everyone is learning how to use the popular models, but almost no one knows how to construct a new model that solves a new problem. So everyone can download word vectors and use them for what they're good at, but the second you get off the beaten track, almost all machine learning practitioners fall flat. I really don't think this is due to how new the field is; rather, very few people have the mathematical maturity and experience to actually use optimization theory and linear algebra to construct new, highly specific language models. There is simply no information available about doing this. You can learn about the underpinnings of matrix factorization and how that relates to word vectors, take that further and read about eigenvectors, but still, it's too thin.
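For readers who want the matrix factorization connection made concrete, here is a minimal sketch, not how any production word-vector model is actually trained. It builds a co-occurrence matrix from a toy corpus and factorizes it with a truncated SVD; the left singular vectors of a matrix M are eigenvectors of M Mᵀ, which is where the eigenvector link comes from. The corpus, the window size, the log damping, and the function names are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus, chosen only for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

def cooccurrence_matrix(sentences, window=2):
    """Count how often each pair of words appears within `window` positions."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[words[j]]] += 1.0
    return vocab, counts

def svd_word_vectors(counts, dim=2):
    """Factorize the damped count matrix and keep the top `dim` components."""
    # log1p damping is an assumption; other weightings (e.g. PMI) are common.
    u, s, _ = np.linalg.svd(np.log1p(counts), full_matrices=False)
    return u[:, :dim] * s[:dim]

vocab, counts = cooccurrence_matrix(corpus)
vectors = svd_word_vectors(counts, dim=2)  # one row per vocabulary word
```

Each row of `vectors` is a low-dimensional embedding for the corresponding entry of `vocab`. Getting from a sketch like this to a model tailored to a specific problem is exactly the step the comment above says is poorly documented.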
One needs to form an experimental design with an ability to detect the challenge, understand its properties, and come up with an appropriate computational solution to that challenge. This isn't just for machine learning but for any kind of algorithm you develop.
What mathematical maturity went into designing the most popular models? What mathematical maturity led to using ReLU instead of tanh for activation functions, for example?
As far as I know, a lot of (most?) advances in the field are just trying new ideas that happen to work. Is this correct?
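On the ReLU versus tanh question, one commonly cited argument is about gradient flow: the derivative of tanh shrinks toward zero as the input grows, while the derivative of ReLU is exactly 1 for any positive input. The sketch below illustrates only that argument under simplifying assumptions (the same pre-activation value at every layer, weights ignored); it does not claim this is how the choice was historically made. The value `pre_activation = 2.0` and `depth = 10` are hypothetical.

```python
import numpy as np

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return np.where(np.asarray(x) > 0, 1.0, 0.0)

# Hypothetical setup: every layer sees the same pre-activation value.
pre_activation = 2.0
depth = 10

# Backpropagation multiplies one activation derivative per layer, so the
# contribution of the activations alone is the derivative raised to `depth`.
tanh_factor = tanh_grad(pre_activation) ** depth
relu_factor = relu_grad(pre_activation) ** depth

print(f"tanh factor after {depth} layers: {float(tanh_factor):.3e}")
print(f"relu factor after {depth} layers: {float(relu_factor):.3e}")
```

Under these assumptions the ReLU factor stays at exactly 1.0 while the tanh factor collapses by many orders of magnitude, which is the usual vanishing-gradient argument. Whether that reasoning preceded the empirical results or followed them is the open question in this thread.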
Sure, there's lots of trial and error. But consider something like Universal Sentence Encoder (USE) versus Facebook's InferSent. USE is superior, mostly because it's newer, but it still underperforms InferSent in a few areas. This is the kind of thing where actually building a more specialized model for question answering or textual similarity could mean huge performance boosts for companies, but nobody is doing it. If they are, it's under lock and key. Anyone looking to perform these tasks is mostly just pulling the code, tweaking the data, messing with the heads of the networks, and then calling it a day.
EDIT:
copying and pasting this from another answer:
I think it's way different than that; those would just be precursors, and in cases like real analysis, superfluous. Instead it would look something like this: I have a Universal Sentence Encoder architecture, but it's not performing well on my data. Aside from tweaking the training set, how can I take this architecture and change it to work better with my individual problem? Assume here that the problem one is trying to solve is very similar to the model's original use case. The number of people on the planet who can do this successfully, without wasting months of time messing around with TensorFlow, is extremely small. But this is where the value is. These massive catch-all models only work for the people creating them; just jamming them into any NLP model will always produce subpar and probably unusable results.
I believe Google's AutoML is attempting to answer these types of questions. It's obviously internal-only so others can't fork the research...but it has helped them invent new specific networks like "EfficientNet for EdgeTPU" [0].
I think humans can still invent new macro structures like CNNs...but humans are inherently shit at analyzing "what if we removed one neuron in the 2nd hidden layer?". The subtle tweaking is really best left to an automated search process.
Humans are better at seeing/inventing macro structures, such as adapting the unidirectional GPT to a bidirectional ELMo/BERT. After the invention, humans are generally pretty good at determining "whether" a network can be used to solve a particular task, although not infallible [1: Can BERT generate sentences from a prompt like GPT?].
But computers are once again often better at quickly determining which of ELMo, BERT, or GPT performs better on a particular task for which they are all at least feasibly suited.
Sure, but AutoML has not panned out in most cases. The ML name for AutoML is neural architecture search (NAS), which is mostly useless these days. NAS has been shown to be no better than standard random search across a neural network architecture space. I do not say this to disparage Google's results, only to point out that they came up with the networks they did by expending huge amounts of computational power.
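To make the "random search" baseline concrete, here is a minimal sketch of what random search over an architecture space looks like. Everything in it is an assumption for illustration: the `SEARCH_SPACE` options are hypothetical, and `evaluate` is a made-up stand-in score, where a real run would train each candidate and return a validation metric.

```python
import random

# Hypothetical search space; real ones are far larger.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "hidden_units": [64, 128, 256],
    "activation": ["relu", "tanh"],
}

def sample_architecture(rng):
    """Pick one option per dimension, uniformly at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Placeholder score (higher is better), NOT a real training run.

    In practice this would train the candidate network and return a
    validation metric, which is where all the compute goes.
    """
    score = -abs(arch["num_layers"] - 4) - abs(arch["hidden_units"] - 128) / 64
    if arch["activation"] == "relu":
        score += 0.5
    return score

def random_search(num_trials, seed=0):
    """Evaluate `num_trials` random architectures and keep the best one."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

best_arch, best_score = random_search(num_trials=20)
print(best_arch, best_score)
```

The loop itself is trivial; the cost sits entirely inside `evaluate`. That is consistent with the point above that the resulting networks were found by spending a lot of compute, whatever the search strategy.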
I feel like there's about 1000 hours of high quality AI lectures available for free on the internet and while I do believe in a certain amount of selflessness in education, I am skeptical that any of that is providing more than a glimpse of what you need to know to be productive at it. In other words, there's a thousand hours of material out there, which probably takes 10000 hours to actually get into so you might as well put all that effort into properly studying it at a university, getting the knowledge not covered in the lectures alone and a degree to prove it in the end.
If you're just "curious" about AI, a really good half hour lecture should get you up to speed.
> I feel like there's about 1000 hours of high quality AI lectures available for free on the internet and while I do believe in a certain amount of selflessness in education, I am skeptical that any of that is providing more than a glimpse of what you need to know to be productive at it.
Why? Whether you go to the lectures in a university or watch them online, it makes no difference. I prefer and recommend textbooks over videos though.
> In other words, there's a thousand hours of material out there, which probably takes 10000 hours to actually get into so you might as well put all that effort into properly studying it at a university, getting the knowledge not covered in the lectures alone and a degree to prove it in the end.
What knowledge "not covered in lectures alone" or books do you need? Why do you think having a signal of pedigree somehow confers this knowledge onto people?
> If you're just "curious" about AI, a really good half hour lecture should get you up to speed.
You don't have to be either an expert or a complete layman. This gatekeeping is ridiculous. I've worked with many PhDs, and most of them are not even close to as competent as folks who are naturally gifted and put in the work to pick up the topics.
Studying it at a university means moving and having a certain background. While I agree that these resources are hard to break down into a curriculum, there's nothing stopping you from copying a university curriculum at home and doing the work on your own...
Yet most people don't. Same for learning an instrument, carpentry, etc. While you could in principle self-study lots of things, study groups, structure, people to talk to and discuss with, and even just "we meet every Thursday at noon to ..." are not negligible.
ML is a very applied subject. There is very little theory people need to know. In fact the most impressive vision/NLP architectures are indeed uninterpretable alchemy. It would make very little difference to study it at a university. Unless of course you're going for probability theory.
Are you defining "need to know" as the state-of-the-art methods of today? The problem is that the next stage after mature technology is commodity. We'll be getting daily spam from India offering to fulfill all our "A.I. design" needs, just as with web design. If all you have to do is follow a blog post or video on how to use a prepackaged framework to get a job done, then everyone else can do it too.
These lectures are for university students as well. Anecdotally, MIT lectures were a vital part of my education while at university; they perfectly complemented our own. Some professors even endorsed them.
Notably, these are all winter courses, taught over the course of three weeks in January.
They don't have the formality and institutional support of a semester-long course. Often they are taught by students. MIT has a ton of people working in this area, and I'm not sure this particular group of people is representative.
So Lex Fridman curated this page and is the subject of many videos within it. Seems like Missy Cummings and Filip Piekniewski's assumption that MIT has scrubbed him from the site is unfounded.
Looks like an overview to me. Guys like Francois Chollet aren't going to spend their time doing an in-depth tutorial, and even if he spent the whole 90-minute lecture walking you through the specific thing he invented (Keras), it probably wouldn't be enough time to get a naive programmer up to competency. Each week is a different lecturer.
A lot of MIT subdomains are hosted by students. An MIT subdomain getting enough traffic to bring it down isn't surprising in itself. It's that this one is (seemingly) hosted by the administration itself that makes it notable.