Hacker News

That may all be true, but it sidesteps my point. For 15 years people have been saying that "technology X" will make truckers obsolete and put ubiquitous self-driving cars on the road, billions (if not trillions) of dollars have been poured into it, and it has not come to fruition (yet). I do think trucking will eventually be automated, but your comment says we have "at best 2 years left as a software engineer," and that seems very naive to me given that we have seen your exact same argument for the past 15 years. Let's even imagine the tech gets to the point where software engineers can be fully automated; as you mentioned above, regulatory hurdles will need to be cleared even for office jobs, and I just don't see that happening in 2 years. I do think it will happen, but if self-driving is any indicator, it will take at least a couple of decades for the technical and regulatory hurdles to be overcome.


The difference between our opinions is that I see 2024 as sitting just before the knee of the exponential progress curve, while you see the knee as still a long way off. I realize how saying "this time it's different" might sound. But I do think this time it's different.

I remember that when I read http://karpathy.github.io/2015/05/21/rnn-effectiveness back in 2015, I became convinced that these models are scalable. I remember thinking that if only we could find a way to train a really big RNN/CNN hybrid on a lot of video data to predict the next video frame, we would eventually force it to develop an understanding of the world. Predicting what happens in the next video frame is a lot harder than predicting the next word (just ask LeCun), but it turned out that even just predicting the next word is extremely effective, and GPT-4 feels like the first model that finally "understands" the world. To me, this was the hard part: developing the proof of concept that we can get there simply by scaling. The next step is video prediction, and we have a lot of room for further scaling to get there. There is a lot of video training data, and we can scale our models much further. Progress is mainly limited by available hardware processing power; there is no lack of good ideas to try.
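To make "predicting the next word" concrete: the training objective is just cross-entropy on the next token, and a model gets better by making the observed next token less surprising. Here is a toy sketch using a smoothed character-level bigram counter rather than a neural net; the function names (`train_bigram`, `avg_cross_entropy`) are my own, and this is purely to illustrate the objective, not how GPT-style models are actually trained.

```python
import math
from collections import Counter, defaultdict

def train_bigram(text):
    """Count next-character frequencies for each character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def avg_cross_entropy(counts, text, vocab_size, alpha=1.0):
    """Average bits needed per next-token prediction (lower = better).
    Add-alpha smoothing gives unseen transitions nonzero probability."""
    total, n = 0.0, 0
    for prev, nxt in zip(text, text[1:]):
        c = counts[prev]
        p = (c[nxt] + alpha) / (sum(c.values()) + alpha * vocab_size)
        total += -math.log2(p)
        n += 1
    return total / n

corpus = "the cat sat on the mat. the cat sat."
model = train_bigram(corpus)
vocab = len(set(corpus))

in_domain = avg_cross_entropy(model, "the cat sat", vocab)
random_text = avg_cross_entropy(model, "zqxj vwk", vocab)
print(in_domain < random_text)  # prints True: familiar text is more predictable
```

Scaling up replaces the counting table with a neural network and characters with tokens, but the loss being minimized is the same quantity this sketch computes.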

In a way, 2024 feels like 2012, when deep learning took the ML world by storm. The same thing is happening now with multi-modal foundation models. GPT-4 is like AlexNet: a culmination of many years of gradual improvements, a combination of unprecedented scale and various tricks. Think about every improvement starting with GPT-1, which established the state of the art in language modeling using a simple, universal, and scalable model architecture. GPT-2 could generate a high-quality page of text. Funnily enough, that doesn't even sound impressive now, but at the time it was absolutely mind-blowing. GPT-3 demonstrated incredible generalization capabilities and significantly raised the quality and reliability of the generated output. GPT-4 took it to another level, achieving human-level reasoning capabilities. Every single one of these breakthroughs took me by surprise, and I do deep learning research for a living. I have absolutely no reason to believe we have reached a saturation point in the quality of these models. So what's next? Where do we go from the already near-human capabilities of GPT-4?

What do you expect from GPT-5? In what ways do you think it will be better than GPT-4, and what will its main limitations be? Which aspects of software engineering do you think it will excel at, and which will still need humans? Would those challenging aspects remain unsolved in GPT-6, assuming another significant improvement in quality over GPT-5? I will not be surprised if GPT-6 is designed by GPT-5, with some help from humans. What does your timeline of AI progress look like?




