My working theory is that all models are approximately the same, and the variance in quality mostly depends on how long they think for.
So the trick is to always set the thinking budget to max, and then begin every task with “this is an extremely complex task, do not complete it without extensive deep thinking and research” or whatever.
You’re basically fighting a battle to make the model think more, against the defaults getting more and more nerfed to save costs.
My experience has been that this isn’t generally true, mainly because worse models chase red herrings or get confused and stuck. A better model will get to the correct solution in fewer tokens, and my surface-level understanding of how RL works supports this.
It is not a tricky problem because it has a simple and obvious solution: do not filter or block usage just because the input includes a word like "gun".
I expected this to become less necessary over time as models got faster, but the opposite has happened. It feels like Claude has actually gotten slower (but in fairness does more per prompt), meaning worktrees are even more essential now.
It’s weirder than that. There is a surge of companies working on how to provide automated access to things like payments, email, signup flows, etc. to *Claw.
> There is no equivalent of the network effects seen at everything from Windows to Google Search to iOS to Instagram, where market share was self-reinforcing and no amount of money and effort was enough for someone else to break in or catch up.
The main direct network effect is that Google uses usage data from its users (e.g. which links they click, whether someone returns to Google quickly after clicking a result) to improve its search rankings.
Other factors that favor Google at scale:
- Sites often allow only the biggest search engines’ crawlers and block every other bot to prevent scraping (see the robots.txt sketch after this list). This has been going on for more than a decade, and it is especially true now with AI crawlers everywhere.
- Google earns more per search than competitors because its ad network is more mature and it can afford far more engineering effort to improve ad revenue. It can also simply serve more relevant ads, since its ad network is bigger.
- Google can simply share costs (e.g. index maintenance) among many more users.
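On the crawler point: here is a minimal sketch of the kind of robots.txt policy such sites tend to publish, assuming a site that only wants the two biggest search engines indexing it (the user-agent names are real crawlers; the specific rule set is just illustrative):

```
# Allow the dominant search engine crawlers.
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Explicitly block a well-known AI crawler.
User-agent: GPTBot
Disallow: /

# Everything else is blocked by default.
User-agent: *
Disallow: /
```

robots.txt is only advisory, so in practice the same policy is usually also enforced with user-agent and IP-range rules at the CDN/WAF layer, which is what actually keeps out crawlers that don’t respect it.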