Same as human problems. Regardless of their inherent intelligence, humans perform well only when given decent context and clear specifications/data. If you place a brilliant executive into a scenario without meaningful context, say an unfamiliar board meeting where they have no idea of the company’s history, prior strategic discussions, current issues, personnel dynamics, expectations, etc., they will struggle just as a model does, surely. They may still manage something reasonably insightful by leveraging general priors, common sense, and inferential reasoning, but their performance will never match their potential had they been fully informed of all context and given clear data/objectives. I think context is the primary primitive property of intelligent systems in general?
A human will struggle, but they will recognize the things they need to know, and seek out people who may have the relevant information. If asked "how are things going" they will reliably be able to say "badly, I don't have anything I need".
That's context the person goes and gets themselves. If a model could do that, we wouldn't need a human to drive it. Basically every human is self-driving in that way; you don't need to go and pick them up because they got stuck in a loop of unknowns at the grocery store, etc.
I really like this analogy! Many real-world tasks that we'd like to use AI for seem infinitely more complex than can be captured in a simple question/prompt. The main challenge going forward, in my opinion, is how to let LLMs ask the right questions – query for the right information – given a task to perform. Tool use with MCPs might be a good start, though it still feels hacky to have to define custom tools for LLMs first, as opposed to how humans effectively browse and skim lots of documentation to find actually relevant bits.
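To make the "hacky" part concrete, here is roughly what pre-defining a custom tool for an LLM looks like today, in the generic function-calling style. The names and fields below are illustrative, not any specific MCP server's schema:

```python
# Illustrative sketch of a hand-written tool definition an LLM can call.
# Names and fields are assumptions in the generic function-calling style,
# not a specific MCP server's schema.

search_docs_tool = {
    "name": "search_docs",
    "description": "Search internal documentation and return relevant snippets.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What to look for."},
            "max_results": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

# The hacky part: every source of context needs its own hand-written
# definition like this before the model can query it, whereas a human
# would just open the docs and skim for the relevant bits.
```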
> I think context is the primary primitive property of intelligent systems in general?
What do you mean by 'context' in this context? As written, I believe that I could knock down your claim by pointing out that there exist humans who would do catastrophically poorly at a task that other humans would excel at, even if both humans have been fully informed of all of the same context.
> I think wood is the primary primitive property of sawmills in general.
An obvious observation would be that it is dreadfully difficult to produce the expected product of a sawmill without tools to cut or sand or otherwise shape the wood into the desired shapes.
One might also notice that while a sawmill with no wood to work on will not produce any output, a sawmill with wood but without woodworking tools is vanishingly unlikely to produce any output... and any it does manage to produce is not going to be good enough for any real industrial purpose.
My perspective ("context as primary primitive") was about context as the foundational prerequisite of intelligent performance. I'm discussing a scenario with the minimum conditions for any intelligent action, whether small scale or large scale. At the risk of talking past each other over nuance (methinks), and I'm a bit too lazy to think it through properly, but... I think there is something in saw vs. sawmill? Like a scale thing? Either way, I wasn't trying to be profound or anything; I was just saying I think the ability to use context is likely the first prerequisite for any minimally intelligent thing (maybe I shouldn't have used the word "system" in my original comment).
This comparison may make sense on short-horizon tasks for which there is no possibility of preparation. Given some weeks to prepare, a good human executive will get the context, while today's best AI systems will completely fail to do so.
Today’s AI systems probably won’t excel, but they won’t completely fail either.
Basically give the LLM a computer to do all kinds of stuff against the real world, and kick it off with a high-level goal like “build a startup”.
The key is to instruct it to manage its own memory on its computer, and when the context limit inevitably approaches, programmatically interrupt the LLM loop and instruct it to jot down everything it has for its future self.
It already kinda works today, and I believe AI systems a year from now will excel at this:
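A minimal sketch of that loop, assuming a placeholder `call_llm` in place of a real model API, a crude character-based token counter, and illustrative constants:

```python
# Sketch of the self-managed-memory loop described above. All names and
# numbers are illustrative assumptions; call_llm stands in for a real
# model API, and count_tokens for a real tokenizer.

CONTEXT_LIMIT = 200       # tokens; real limits are far larger
HANDOFF_THRESHOLD = 0.8   # interrupt at 80% of the limit

def count_tokens(messages):
    # Crude stand-in for a real tokenizer: roughly 1 token per 4 characters.
    return sum(len(m["content"]) for m in messages) // 4

def call_llm(messages):
    # Placeholder for an actual model call.
    return "model output chunk."

def agent_loop(goal, steps):
    notes = ""  # the "jotted down" state the model hands to its future self
    messages = [{"role": "user", "content": f"Goal: {goal}\nNotes:\n{notes}"}]
    for _ in range(steps):
        if count_tokens(messages) > CONTEXT_LIMIT * HANDOFF_THRESHOLD:
            # Programmatic interrupt: ask the model to write down everything
            # its future self will need, then restart with a fresh context
            # seeded only from those notes.
            messages.append({"role": "user", "content":
                             "Context limit near. Jot down all state "
                             "your future self needs."})
            notes = call_llm(messages)
            messages = [{"role": "user",
                         "content": f"Goal: {goal}\nNotes:\n{notes}"}]
        messages.append({"role": "assistant", "content": call_llm(messages)})
    return messages, notes
```

The design choice is that the interrupt lives outside the model: the harness watches token counts and forces the handoff, so the model never has to notice its own context filling up.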