Just make sure you never let Claude do any calculation without using a programming language. It can be somewhat trusted to make fewer mistakes than a human when it comes to filling in the right numbers, but it's still not great at math.
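A minimal illustration of why: arithmetic that looks trivial in text is exactly where answering "from memory" goes wrong, while running the code shows what the machine actually computes.

```python
# Don't trust remembered arithmetic; run it. Classic float pitfall:
print(0.1 + 0.2)         # not exactly 0.3
print(0.1 + 0.2 == 0.3)  # False

# For exact math, use the stdlib instead of mental arithmetic:
from fractions import Fraction
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
```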
I also created a CLI. I had to fork the project and remove one unused import, which allowed me to drop a lot of unused ML libraries: https://github.com/Mic92/puss-say
MCP tool calls are usually sequential and therefore waste a lot of tokens. There is some research from Anthropic (I think there was also a blog post from Cloudflare) on how code sandboxes are actually a more efficient interface for LLM agents, because they are really good at writing code and at combining multiple "calls" into one piece of code. Another data point is that code is more deterministic and reliable, so you reduce LLM hallucination.
What do the calls being sequential have to do with tokens? Do you just mean that the LLM has to think every time it gets a response (as opposed to being able to compose them)?
LLMs can use CLI interfaces to compose multiple tool calls, filter the outputs, etc., instead of polluting their own context with a full response they know they won't care about. Command-line access ends up being cleaner than the usual MCP-and-tool-calls workflow. It's not just Anthropic; the Moltbot folks found this to be the case too.
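A sketch of the difference, with hypothetical tool functions standing in for real MCP tools (the names and data are mine, purely illustrative): called sequentially, every full response lands in the model's context; composed in code, only the final filtered value does.

```python
# Hypothetical stand-ins for tools an agent might otherwise call via MCP.
def list_issues():
    return [{"id": 1, "state": "open", "title": "crash on start"},
            {"id": 2, "state": "closed", "title": "typo"},
            {"id": 3, "state": "open", "title": "slow query"}]

def get_issue(issue_id):
    return next(i for i in list_issues() if i["id"] == issue_id)

# Sequential tool-call style: each intermediate response above would be
# serialized into the model's context before the next call is decided.
#
# Code-composition style: the agent writes one snippet, and only the
# final, already-filtered result needs to enter its context:
open_titles = [i["title"] for i in list_issues() if i["state"] == "open"]
print(open_titles)  # ['crash on start', 'slow query']
```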
That makes sense! The only flaw here imo is that sometimes that thinking is useful. Sub-agents for tool calls imo make a nice sort of middle ground, where you get both flexibility and context savings. Maybe we need some tool-call composing feature, a la io_uring :)
Do you expose the accessibility tree of a website to LLMs? What do you do with websites that lack one? Some agents I saw use screenshots, but that also seems kind of wasteful. Something in between would be interesting.
I actually do use cross-platform accessibility shenanigans, but for websites this is rarely as good as just doing something like two passes over the DOM; it even figures out hard stuff like Google search (where ids/classes are mangled).
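A toy sketch of what one "pass over the DOM" could look like, using only the stdlib HTML parser; a real implementation would run in the page, and which elements and attributes to keep is my assumption, not the parent's actual heuristic. The idea: keep interactive elements and whatever label survives, drop everything else.

```python
from html.parser import HTMLParser

# Toy single pass over HTML: collect interactive elements and a label
# (aria-label, name, or id; visible text is skipped here for brevity).
class InteractivePass(HTMLParser):
    INTERACTIVE = {"a", "button", "input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE:
            a = dict(attrs)
            label = a.get("aria-label") or a.get("name") or a.get("id") or ""
            self.found.append((tag, label))

page = '<div><input name="q" aria-label="Search"><button id="go">Go</button></div>'
p = InteractivePass()
p.feed(page)
print(p.found)  # [('input', 'Search'), ('button', 'go')]
```

Even this crude filter shrinks a page to a handful of (element, label) pairs, which is far cheaper than a screenshot or the full DOM.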
I am sure they measured the difference, but I am wondering why reading screenshots + coordinates is more efficient than selecting ARIA labels: https://github.com/Mic92/mics-skills/blob/main/skills/browse.... The JavaScript snippets should at least be more reusable if you want to semi-automate websites with memory files.
Well, this compiler was written to build Linux as a proof of concept. You don't need a libc to build the kernel. Was it claimed anywhere that it is a fully compliant C compiler?
I skipped over the first few and haven't seen critical ones. Hardcoded OAuth client secrets are present in basically any open-source or commercial app that is distributed to end users. It doesn't break the security of end users; it mainly allows other apps to impersonate this app, i.e. present themselves as Clawdbot, which is a moot point given anyone can just change/inject code into it.
I remember that LLM agents often store those in cleartext files (I think Claude Code being one of them). Many of the CLIs I use have better secret hygiene than that, i.e. they allow password commands or use secret APIs.
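The password-command pattern, as a minimal sketch (mine, not any specific CLI's implementation): instead of reading a token from a cleartext file, run a user-configured command and use its stdout, so the secret lives in whatever manager the user already trusts.

```python
import subprocess

def secret_from_command(cmd):
    """Run a user-configured command (e.g. a password manager CLI)
    and use its stdout as the secret, instead of a cleartext file."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.strip()

# Hypothetical config entry: token_command = ["pass", "show", "api/token"]
# For illustration, substitute a harmless command:
print(secret_from_command(["echo", "s3cret"]))  # s3cret
```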