I guess I should know better than to contradict LeCun on a public forum, but his math doesn't really work out. It only works if there is a unique correct answer to any question, in which case e = 1/dict_size, which is clearly false: even humans can just put "Let me think..." in front of anything else and get logically "the same" answer; after how many such filler words do you deem it "wrong"? Even setting colloquial speech aside, in math there is apparently a book that collects 367 different proofs of the Pythagorean theorem; that tree of correct answers is certainly quite complicated. You can't approximate it with a fixed per-token probability and take the exponential: the number of possibly correct next tokens varies enormously depending on the previous string.
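A quick back-of-the-envelope sketch of why this matters, in Python. I'm assuming the estimate being criticized is roughly P(correct answer of length n) ≈ (1 − e)^n with a fixed per-token error probability e; the numbers below are made up purely for illustration.

    import math

    def p_correct_fixed(e, n):
        # Probability of a fully correct n-token answer if every token
        # independently goes wrong with the same probability e.
        return (1 - e) ** n

    # With a constant e, correctness decays exponentially in answer length:
    print(p_correct_fixed(0.05, 200))   # ~3.5e-05 -- looks hopeless

    def p_correct_varying(errors):
        # Same product, but with a context-dependent error rate per token.
        return math.prod(1 - e for e in errors)

    # Hypothetical 200-token answer: 10 genuinely constrained tokens (e = 0.05)
    # and 190 "free" tokens where many continuations are acceptable (e = 0.001).
    errors = [0.05] * 10 + [0.001] * 190
    print(p_correct_varying(errors))    # ~0.5 -- nowhere near hopeless

The exponential collapse only appears if you treat e as identical at every position, which is exactly the approximation I'm objecting to.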
LLMs and autoregression are very good at avoiding the 99.999[...]% of strings that are simple gibberish. If you were to generate a string of 20 random tokens from the GPT-2 tokenizer, you would get something like:
"automakersGrand carries liberties Occupations ongoingOULDessing heartbeat Pillar intrigued Trotskymediatelyearable founding examinations lavAg redesign folds"
and of course any half-decent language model does much better than that. If the "paths to truth" were as unlikely as LeCun makes them out to be, there would be no hope.
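For what it's worth, the experiment above is easy to reproduce with something like the snippet below; this assumes the Hugging Face transformers package, and since the token IDs are drawn uniformly at random the exact output will of course differ from the string quoted above.

    import random
    from transformers import GPT2TokenizerFast

    # Draw 20 token IDs uniformly from the GPT-2 vocabulary and decode them;
    # the result is the kind of gibberish quoted above.
    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    ids = [random.randrange(tok.vocab_size) for _ in range(20)]
    print(tok.decode(ids))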
A non-autoregressive model would certainly be "better" in the sense that it would be faster, which is where language models started (BERT & co.); it just doesn't seem to work as well... similarly to how a human sometimes needs to write things down and only arrives at the correct answer to a complicated question along the way.
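To make the speed point concrete, here is a toy contrast between the two decoding styles; model, predict_next and fill_masks are hypothetical stand-ins for illustration, not a real API.

    def generate_autoregressive(model, prompt_ids, n_new):
        # One forward pass per generated token: inherently sequential.
        ids = list(prompt_ids)
        for _ in range(n_new):
            ids.append(model.predict_next(ids))
        return ids

    def generate_non_autoregressive(model, prompt_ids, n_new):
        # BERT-style: append placeholders and fill every position in
        # one (parallelizable) forward pass.
        MASK = -1  # placeholder id, just for illustration
        return model.fill_masks(list(prompt_ids) + [MASK] * n_new)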
If anything, we'd need to allow LLMs to notice their mistakes and correct themselves, i.e. make the generation non-linear. If you ask GPT-4 something complicated (like math), it's not rare at all for it to logically contradict itself within its answer. I would be surprised if, somewhere deep in the model, it doesn't "realize" this, but it can't fix it, so it falls back on what humans do in an exam or interview that started badly: try to bullshit their way out of it, sweeping the inconsistency under the carpet, unless you explicitly point it out to them (and often even after that, for both GPT-4 and humans).
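One naive way to make generation non-linear in that sense is a generate–check–revise loop, sketched below. generate() is a hypothetical stand-in for whatever LLM call you have, not an existing API, and a single self-check is obviously no guarantee the contradiction actually gets caught.

    def generate(prompt):
        # Hypothetical stand-in for an actual LLM call.
        raise NotImplementedError("plug in a real model call here")

    def answer_with_self_check(question, max_rounds=3):
        answer = generate(question)
        for _ in range(max_rounds):
            verdict = generate(
                "Does the following answer contradict itself? "
                "Reply CONSISTENT or describe the contradiction.\n\n"
                f"Question: {question}\nAnswer: {answer}"
            )
            if verdict.strip().upper().startswith("CONSISTENT"):
                break
            # Feed the detected inconsistency back instead of letting the
            # model sweep it under the carpet, then try again.
            answer = generate(
                f"Question: {question}\nPrevious answer: {answer}\n"
                f"Problem found: {verdict}\nWrite a corrected answer."
            )
        return answer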
P.S. Mathematician rant: who on Earth calls a probability "e"??