> Even the most stupid people can usually ask questions and correct their answers. LLMs are incapable of that. They can regurgitate data and spew a lot of generated bullshit, some of which is correct. Doesn't make them intelligent.
The way the current interface for most models works can result in this kind of output. The quality of the output, even with the latest models, doesn't necessarily reflect the fidelity of the world model inside the LLM or the level of insight it can have about a given topic ("what is the etymology of the word cat").
The current usual approach is "one shot": you've got one shot at the prompt, then return your output, no second thoughts allowed, no recursion at all. I think this could be a trade-off to get the cheapest feasible good answer, mostly because the models output reasonably good answers most of the time. But then you get a percentage of hallucinations and made-up stuff.
That kind of output could, in a lab, be fully absent. Did you notice that the prompt interfaces never give an empty or half-empty answer? "I don't know", "I don't know for sure", "I kind of know, but the answer is probably a bit shaky", or "I could answer this, but I'd need to google some additional data first", etc.
There's another thing: you almost never get asked a question back by the model, even though the models can actually chat with you about complex topics related to your prompt. It's obvious when you're chatting with a chatbot, but not that obvious when you're asking it for a specific answer on a complex topic.
In a lab, with recursion enabled, the models could probably get the true answer most of the time, including the fabulous "I don't know". They could also be allowed to ask back as a valid answer, requesting additional human input and relying on live RLHF right there. That's quite feasible technically, but it wouldn't make much economic sense to expose a prompt interface like that to the public, with a GUI facing input from the whole planet.
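To make the idea concrete, here is a rough sketch of what I mean by a loop with second thoughts, an allowed "I don't know", and the option to ask back. Everything in it is hypothetical: `Draft`, `answer_with_review`, the self-reported `confidence` field and the `stub_model` stand-in are names I made up for illustration, not any real model API, and a real model does not hand you a calibrated confidence number like this.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    """One model output (hypothetical shape, not a real API)."""
    text: str
    confidence: float               # assumed self-reported score in [0, 1]
    question: Optional[str] = None  # optional question back to the user


def answer_with_review(
    prompt: str,
    model: Callable[[str], Draft],
    ask_user: Callable[[str], str],
    max_rounds: int = 3,
    threshold: float = 0.8,
) -> str:
    """Retry up to max_rounds; abstain instead of guessing."""
    context = prompt
    for _ in range(max_rounds):
        draft = model(context)
        if draft.confidence >= threshold:
            return draft.text
        # Low confidence: ask the human if the model has a question.
        reply = ask_user(draft.question) if draft.question else ""
        if reply:
            context += "\nUser clarification: " + reply
        else:
            # No extra input: feed the shaky draft back for a second pass.
            context += "\nPrevious draft (low confidence): " + draft.text
    return "I don't know"


def stub_model(context: str) -> Draft:
    """Toy stand-in for a model: confident only after a clarification."""
    if "domestic" in context:
        return Draft("CONFIDENT ANSWER", 0.9)
    return Draft("Not sure.", 0.3, question="Do you mean the domestic cat?")


print(answer_with_review("etymology of cat?", stub_model, lambda q: "domestic"))
print(answer_with_review("etymology of cat?", stub_model, lambda q: ""))
```

The first call gets a clarification and returns the confident answer; the second never does and falls back to "I don't know" after three rounds. The cost is visible too: every extra round is another model call, and possibly a human waiting on the other end.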
I think it could also have a really heavy impact on public opinion if people got to see a model that never makes a mistake, because it can answer "I don't know" or ask you back for extra details about your prompt. So there you have another reason not to build prompt interfaces that way.
> The current usual approach is "one shot": you've got one shot at the prompt, then return your output, no second thoughts allowed, no recursion at all.
We've had the models for a while and still no one has shown this mythical lab where this regurgitation machine reasons about things and makes no mistakes.
Moreover, since it already has so much knowledge stored, why does it still hallucinate even in specific cases where the answer is known, such as the case I linked?
>We've had the models for a while and still no one has shown this mythical lab where this regurgitation machine reasons about things and makes no mistakes.
It would be a good experiment to interact with the unfiltered, not-yet-RLHFed interfaces provided to the initial trainers (Nigerian folks/gals?).
Or maybe the - lightly filtered - interfaces used privately in demos for CEOs.
So the claim that LLMs are intelligent is predicated on the belief that there are labs running unfiltered output and that there are some secret demos only CEOs see.