You are not wrong, but after having started working with LLMs, I have this feeling that many humans are simply autocomplete engines too. So LLMs might be actually close to AGI, if you define "general" as "more than 50% of the population".
Humans are absolutely auto-complete engines, and regularly produce incorrect statements and actions with full confidence in it being precisely correct.
Just think about how many thousands of times you've heard "good morning" after noon both with and without the subsequent "or I guess I should say good afternoon" auto-correct.
In Queensland, Australia we have solar powered e-paper displays [1][2] at some bus stops that are very similar to this (much bigger than a kindle screen, though).
Mercury v1 focused on autocomplete and next-edit prediction. Mercury 2 extends that into reasoning and agent-style workflows, and we have editor integrations available (docs linked from the blog). I’d encourage folks to try the models!
You are right edited my post (twice actually). Missed the chat first time around (though its hard to see it as a reasoning model when chain of thought is hidden, or not obvious. I guess this is the new normal), and also missed the reasoning table because text is pretty small on mobile and I thought its another speed benchmark.
I tried their chat demo again, and if you set reasoning effort to "High", you sometimes see the chain of thought before the answer (click the "Thought for n seconds" text to expand it).
That being said, the chain is pretty basic. It's possible that they don't disclose the full follow-up prompt list.
> Suddenly, smart AI-enabled juniors can easily match the productivity of traditional (or conscientious) seniors, so why hire seniors at all?
I guess we'll see, but so far the flattening curve of LLM capabilities suggest otherwise. They are still very effective with simpler tasks, but they can't crack the hardest problems like a senior developer does.
That's a great opportunity for a controlled study! You should do it. If you can send me the draft publication after doing the study, I can give feedback on it.
I don't think there is a need for a new study as Cognitive Reflection Tests are a well-researched subject [1]. I am actually surprised that I got downvoted, as I thought this would be common knowledge.
The version numbers are mostly irrelevant as afaik price per token doesn't change between versions.
reply