Here is an example of a task that I do not believe this generation of LLMs can ever do, but that is possible for an average human: designing a functional trivia app.
There, you don't need to invoke Turing or compiler bootstrapping. You just need one example of a use case where the accuracy of responses is mission critical.
Have you personally verified that the answers are not hallucinations and that they are indeed factually true?
Oh, you just asked it to make a trivia app that feeds on JSON. Cute, but that's not what I meant. The web is full of tutorials for basic stuff like that.
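For context, here is a minimal sketch of the kind of JSON-fed quiz shell I mean (the questions.json filename and its question/answer schema are assumptions for illustration):

```python
import json

# Load quiz data from a local file; the schema here
# (a list of {"question", "answer"} objects) is illustrative.
with open("questions.json") as f:
    questions = json.load(f)

score = 0
for q in questions:
    guess = input(q["question"] + " ")
    if guess.strip().lower() == q["answer"].strip().lower():
        score += 1
        print("Correct!")
    else:
        print(f"Wrong, the answer is {q['answer']}")

print(f"Final score: {score}/{len(questions)}")
```

The shell is the trivial part; the whole argument hinges on whether the facts inside questions.json are actually correct.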
To be clear, I meant that LLMs can't write the trivia questions and answers themselves, which shows that they can't produce trustworthy outputs.
And a trivia app is a toy (one might even say... a trivial example), but it's a useful demonstration of why you wouldn't put an LLM into a system on which lives depend, let alone invest billions in it.
If you don't trust my words, just go back to fiddling with your models and ask them to write a trivia quiz about a topic you know very well. Like a TV show.