esperent's comments | Hacker News

Right, but that's a short-term moat. If they pause their incredible levels of spending for even 6 months, someone else will take over, having spent only a tiny fraction of what they did. They might get taken over anyway.

> someone else will take over having spent only a tiny fraction of what they did

How? By magic? You fell for 'Deepseek V3 is as good as SOTA'?


By reverse engineering, sheer stupidity from the competition, corporate espionage, ‘stealing’ engineers, and sometimes a stroke of genius, the same as it’s always been.

For the UI comparisons, making the shadcn/Material UI elements darker/lower contrast is highly dishonest.

Compare like with like, not a badly colored and low contrast version of the competition against yours.


But my versions are the lower contrast ones.

Edit: wait, are you on light mode or dark mode? I work in light mode, where mine are lower contrast, but I swapped to dark and now it's reversed.


I'm on dark mode. Maybe this is my error. But still, just compare them side by side with the exact same colors for a fair comparison.

So it's basically useless then. Even with Claude Max I have to manage my usage when doing TDD, and using the ccusage tool I've seen that I'd frequently hit $200 per day if I were on the API. At 6x cost you'll burn through $50 in about 20 minutes. I wish that was hyperbole.

I tried casually using it for two hours and it burned $100 at the current 50% discounted rate, so your guess is pretty accurate...

I still don't get why Claude is so expensive.

Because we all prefer it over Gemini and Codex. Anthropic knows that and needs to get as much out of it as possible while they can. Not saying the others will catch up soon. But at some point other models will be as capable as Opus and Sonnet are now, and then it's easier to let price guide the choice of provider.

Most MCPs I've seen could be:

1. A CLI script or small collection of scripts

2. A very short markdown file explaining how it works and when to use it.

3. Optionally, some other reference markdown files

Context use is tiny, nearly everything is loaded on demand.

And as I'm writing this, I realize it's exactly what skills are.
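
To make that concrete, here's a minimal sketch of the kind of CLI script I mean. Everything in it is hypothetical (the name, the flags, the CI example); the point is just that `--help` plus a short markdown note is usually all the "tool definition" an agent needs:

    #!/usr/bin/env python3
    # Hypothetical example of a small, well-scoped CLI an agent could call
    # instead of an MCP tool. Name, flags, and the CI use case are made up.
    import argparse
    import json

    def main():
        parser = argparse.ArgumentParser(
            description="Fetch a CI job's log, trimmed so it doesn't flood the agent's context."
        )
        parser.add_argument("job_id", help="CI job identifier")
        parser.add_argument("--max-lines", type=int, default=200,
                            help="Maximum number of log lines to return")
        args = parser.parse_args()

        # A real script would call the CI provider's API here; this placeholder
        # keeps the sketch self-contained and runnable.
        print(json.dumps({"job": args.job_id, "lines": [], "truncated": False}))

    if __name__ == "__main__":
        main()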

Can anyone give an example of something that this wouldn't work for, and which would require MCP instead?


But this is entirely beside the point. The point of MCP is bundling those exact things into a standardized plugin that’s easy for people to share with others.

MCP is useful because I can add one in a single click for an external service (say, my CI provider). And it gives the provider some control over how the agent accesses resources (for example, more efficient/compressed, agent-oriented log retrieval vs the full log dump a human wants). And it can set up the auth token when you install it.

So yeah, the agent could write some of those queries manually (might need me to point it to the docs), and I could write helpers… or I could just one-click install the plugin and be done with it.

I don’t get why people get worked up over MCP, it’s just a (perhaps temporary) tool to help us get more context into agents in a more standard way than everyone writing a million different markdown files and helper scripts.


"The point of MCP is bundling those exact things into a standardized plugin that’s easy for people to share with others." Like... a CLI/API?

"MCP is useful because I can add one in a single click for an external service" Like... a CLI/API? [edit: sorry, not click, single 'uv' or 'brew' command]

"So yeah, the agent could write some those queries manually" Or, you could have a high-level CLI/API instead of a raw one?

"I don’t get why people get worked up over MCP" Because we tried them and got burned?

"to help us get more context into agents in a more standard way than everyone writing a million different markdown files and helper scripts." Agreed it's slightly annoying to add 'make sure to use this CLI/API for this purpose' in AGENTS.md but really not much. It's not a million markdown files tho. I think you're missing some existing pattern here.

Again, I fail to see how most MCPs are not lazy tools that could be well-scoped, discoverable, safe-to-use CLIs/APIs.


That's literally what they are. It's a dead simple, self-describing JSON-RPC API that you can understand if you spend 5 seconds looking at it. I don't get why people get so worked up over it as if it's some big over-engineered spec.

I can run an MCP server on my local machine and connect it to an LLM frontend in a browser.

I can use the GitHub MCP without installing anything on my machine at all.

I can run agents as root in a VM and give them access to things via an MCP running outside of the VM without giving them access to secrets.

It's an objectively better solution than just giving it CLIs.


All true except that CLI tools are composable and don't pollute your context when run via a script. The missing link for MCP would be a CLI utility to invoke it.

How does the agent know what CLIs/tools it has available? If there's an `mcpcli --help` that dumps the tool calls, we've just moved the problem.

The composition argument is compelling, though. Instead of CLIs, what if the agent could write code where the tools are made available as functions?

   tools.get_foo(tools.get_bar())
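
As a rough sketch of that idea (everything here is hypothetical, including the local "toolrunner" command; this is not an existing MCP API):

    # Hypothetical sketch: tool calls exposed as plain Python functions that the
    # agent's generated code can compose, so only the final result has to go back
    # into the model's context. "toolrunner" is a made-up local command.
    import json
    import subprocess

    def _call(tool, **args):
        out = subprocess.run(
            ["toolrunner", tool],
            input=json.dumps(args),
            capture_output=True, text=True, check=True,
        )
        return json.loads(out.stdout)

    def get_bar():
        return _call("get_bar")

    def get_foo(bar):
        return _call("get_foo", bar=bar)

    # Composition happens in code rather than via repeated round-trips through the model.
    print(get_foo(get_bar()))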

> what if the agent could write code where the tools are made available as functions?

Exactly, that would be of great help.

> If there's an `mcpcli --help` that dumps the tool calls, we've just moved the problem.

I see I worded my comment completely wrong... My bad. Indeed, MCP tool definitions should probably be in context. What I dislike about MCP is that the I/O immediately goes into context in the AI agents I've seen.

Example: very early on, when Cursor had just received beta MCP support, I tried a Google Maps MCP from somewhere on the net and asked Cursor "Find me boxing gyms in Amsterdam". The MCP call then dumped a massive HATEOAS-annotated JSON response, causing Cursor to run out of context immediately. If it had been a CLI tool instead, Cursor could have wrapped it in, say, `jq` to keep the context clean(er).
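
For illustration, the kind of post-filtering I mean, sketched in Python rather than jq. The field names ("results", "name", "formatted_address") are assumptions about a Places-style payload, not the exact schema:

    # Sketch: trim a huge API response before it ever reaches the model's context.
    # Usage (hypothetical): python filter_gyms.py < raw_response.json
    import json
    import sys

    payload = json.load(sys.stdin)  # the raw, massive JSON dump
    gyms = [
        {"name": r.get("name"), "address": r.get("formatted_address")}
        for r in payload.get("results", [])
    ]
    json.dump(gyms, sys.stdout, indent=2)  # only this short list goes to the agent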


I mean, what was keeping Cursor from running jq there? It's just a matter of poor integration, which is largely why there was a rethink of "we just made this harder on ourselves, let's accomplish this with skills instead".

I'm with you because we get to specify our context more precisely.

I mean, one could argue skills are sort of MCP 2.0 fixing some of the mistakes.

The big pluses for MCPs are when:

1. They live remotely and update themselves

2. You install the skill and the scripts it uses together locally, so it can be more convenient packaging

MCPs aren't really all that complicated inherently; a lot of the mistakes around them happened because they came early.


If you're using Claude, try the hookify plugin and ask it to block commits unless the rules pass.

Except for dependency-cruiser, which I hadn't heard of, this is almost exactly what I've built up over the past few weeks.

For the pre-commit hook, I assume you run it on just the files changed?
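
Mine is roughly this shape, for what it's worth (a sketch only; "npx eslint" stands in for whatever checks your project actually runs):

    #!/usr/bin/env python3
    # Sketch of a pre-commit hook that only checks files staged for this commit.
    # The eslint command is just an example; substitute your project's own checks.
    import subprocess
    import sys

    def staged_files():
        out = subprocess.run(
            ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
            capture_output=True, text=True, check=True,
        )
        return [f for f in out.stdout.splitlines() if f.endswith((".ts", ".tsx"))]

    files = staged_files()
    if files:
        result = subprocess.run(["npx", "eslint", *files])
        sys.exit(result.returncode)  # a non-zero exit blocks the commit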

> Custom script to ensure shared/util directories are not over stuffed (built this using dependency-cruiser as a library rather than an exec)

Would you share this?


> everybody only uses 20% of a given program's features, but the problem is that everyone is using a different 20%

This is a phrase that gets repeated and it sounds clever. But it's completely at odds with statistics, specifically the normal distribution.

We should say instead that people use 80-90% of the same features, and then there's a tail of less common features that only some people use but that are very important to them.

This is why plugin systems for apps are so important. You can build an app that supports the 80% with a tightly designed set of core features, and if someone needs to go outside of those they can use/build a plugin.


The problem is that if you're using subagents, the only way to interject is often to press escape multiple times, which kills all the running subagents. All I wanted to do was add a minor steering guideline.

This might be better with the new teams feature.


They actually made a change a few weeks ago that made subagents more steerable

When they ask for approval for a tool call, press down until the selector is on "No" and press tab; then you can add any extra instructions.


That is so annoying too because it basically throws away all the work the subagent did.

Another thing that annoys me is that the subagents never output durable findings unless you explicitly tell their parent to prompt the subagent to “write their output to a file for later reuse” (or something like that, anyway).

I have no idea how, but there needs to be a way to backtrack on context while somehow also maintaining the “future context”…


The key is a well-defined task with strong guardrails. You can add these to your agents file over time, or you can probably just find someone's online to copy the basics from. Any time you find it doing something you didn't expect or don't like, add guardrails to prevent that in the future. Claude hooks are also useful here, along with the hookify plugin to create them for you based on the current conversation.

I have started using openspec for this. I find it works far better to have a proposal and a list of tasks; the AI stays more focused.

https://openspec.dev/


> If the models don't get to the point where they can correct fixes on their own

Depending on what you're working on, they are already at that point. I'm not into any kind of AI maximalist "I don't read code" BS (I read a lot of code), but I've been building a fairly extensive web app to manage my business using Astro + React and I have yet to find any bug or usability issue that Claude Code can't fix much faster than I would have (+). I've been able to build out, in a month, a fully TDD app that would have conservatively taken me a year by myself.

(+) Except for making the UI beautiful. It's crap at that.

The key that made it click is exactly what the person describes here: using specs that describe the key architecture and use cases of each section. So I have docs/specs with files like layout.md (overall site shell info), ui-components.md, auth.md, database.md, data.md, and lots more for each section of functionality in the app. If I'm doing work that touches UI, I reference layout and ui-components so that the agent doesn't invent a custom button component. If I'm doing database work, I reference database.md so that it knows we're using drizzle + libsql, etc.
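
Roughly, the spec folder ends up looking like this (the inline comments are mine, and anything beyond the files named above varies per project):

    docs/specs/
        layout.md           # overall site shell info
        ui-components.md    # shared components, so the agent doesn't invent its own button
        auth.md
        database.md         # drizzle + libsql conventions
        data.md
        ...                 # one file per section of functionality in the app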

This extends up to higher level components where the spec also briefly explains the actual goal.

Then each feature-building session follows a pattern: brainstorm and create a design doc + initial spec (updates or new files) -> write a technical plan clearly following TDD, designed for batches of parallel subagents to work on -> have Claude implement the technical plan -> manual testing (often, I'll identify problems and request changes here) -> automated testing (much stricter linting, knip, etc. than I would use for myself) -> finally, update the spec docs again based on the actual work that was done.

My role is less about writing code and more about providing strict guardrails. The spec docs are an important part of that.

