The LLM algorithms seem pretty clunky to me, like a hack designed for text translation that surprised everyone by getting quite smart. The reason the human brain is so much more energy efficient is quite likely better design/algorithms. I was watching a video comparing brain and LLM function and was almost tempted to try building something myself (https://youtu.be/3SUqBUGlyh8). I'm sure there are many more competent people looking at similar things.
Everyone says "those autoregressive transformer LLMs are obviously flawed", and then fails to come up with anything that outperforms them.
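For anyone unfamiliar with the jargon: "autoregressive" just means the model predicts one token at a time and feeds its own output back in as input. A minimal sketch of that decoding loop, assuming a hypothetical `model` callable that returns next-token logits (not any particular library's API):

```python
import numpy as np

def generate(model, prompt_ids, max_new_tokens=32):
    """Greedy autoregressive decoding: repeatedly feed the whole
    sequence back in and append the most likely next token.
    `model` is a hypothetical callable returning next-token logits."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(np.array(ids))      # shape: (vocab_size,)
        ids.append(int(np.argmax(logits))) # pick the argmax token
    return ids
```

Every proposed replacement (diffusion LMs, state-space models, etc.) so far trades away something this loop does well.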
I'm not too bullish on architectural gains. There are efficiencies to be had, but far closer to "+5% a year" than "+5000% in a single breakthrough".
You can try to build a novel AI architecture, at a small scale. Just be ready. This field will kick your teeth in. ML doesn't like grand ideas and isn't kind to high aspirations.
Physics is obviously incomplete and yet nobody can solve quantum gravity. Being obviously flawed doesn't mean the solution is obvious. That's the whole problem.
I think in this case, people tend to underrate just how capable and flexible the basic LLM architecture is, and also underrate how many gains there are in better training vs. better architecture.
Most people are not ML researchers. Most of the AI industry is not AI researchers. Most of the AI spending is not going to AI researchers.
AI researchers came up with an architectural improvement that made a lot of previously impossible stuff barely possible. Then industry ran with it, scaling that particular trick to its limits by throwing as much raw compute and data at it as humanly possible.
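The "architectural improvement" here is presumably the transformer's attention mechanism, given the "autoregressive transformer" framing upthread. A toy numpy sketch of scaled dot-product attention, the core operation that turned out to scale so well (single head, no masking or learned projections, so a simplification of the real thing):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V: (seq_len, d_k) arrays. Real models add multiple
    heads, causal masking, and learned input/output projections."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of value vectors

# Toy usage: self-attention over 4 tokens with 8-dim embeddings
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)
```

The operation is simple and embarrassingly parallel, which is exactly why throwing compute at it worked.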
You don't need to be an AI expert to know that there are probably more advances to be had and that funding foundational research is the way to get them.