noodletheworld's comments | Hacker News

Is it just me, or do skills seem enormously similar to MCP?

…including, apparently, the clueless enthusiasm for people to “share” skills.

MCP is also perfectly fine when you run your own MCP locally. It’s bad when you install some arbitrary MCP from some random person. It fails when you have too many installed.

Same for skills.

It’s only a matter of time (maybe it already exists?) until someone makes a “package manager” for skills that has all of the stupid of MCP.


I don’t feel they’re similar at all and I don’t get why people compare them.

MCP is giving the agents a bunch of functions/tools they can use to interact with some other piece of infrastructure or technology through abstraction. More like a toolbox full of screwdrivers and hammers for different purposes, or a high-level API interface that a program can use.

Skills are more similar to a stack of manuals/books in a library that teach an agent how to do something, without polluting the main context. For example, a guide to using `git` on the CLI: the agent can read the manual when it needs to use `git`, but it doesn't need to carry the knowledge of how to use `git` in its brain when it's not relevant.
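
To make that concrete, here's a rough sketch of how I picture the skill side working (the file layout and one-line-summary format are just my assumptions, not any official spec): the agent only ever sees names and short summaries until it decides it needs the full manual.

    import pathlib

    SKILL_DIR = pathlib.Path("skills")  # hypothetical layout: skills/<name>.md

    def skill_index():
        # Only names and one-line summaries go into the prompt.
        index = []
        for path in sorted(SKILL_DIR.glob("*.md")):
            first_line = path.read_text().splitlines()[0]  # e.g. "# Using git on the CLI"
            index.append({"name": path.stem, "summary": first_line.lstrip("# ")})
        return index

    def load_skill(name):
        # The full manual is pulled into context only on demand.
        return (SKILL_DIR / f"{name}.md").read_text()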


> MCP is giving the agents a bunch of functions/tools

A directory of skills... same thing

You can use MCP the same way as skills with a different interface. There are no rules on what goes into them.

They both need descriptions and instructions around them, and they both have to be presented and indexed/introduced to the agent dynamically, so we can tell them what they have access to without polluting the context.

See the Anthropic post on moving MCP servers to a search function. Once you have enough skills, you are going to require the same optimization.
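
To illustrate (purely a sketch, not how Anthropic actually implements it): whether the entries come from an MCP tool listing or a skill directory, you end up writing the same kind of search over descriptions so only the top few get injected into context.

    def search_descriptions(query, items, top_k=3):
        # items: [{"name": ..., "summary": ...}] from an MCP tool listing or a skill index
        words = set(query.lower().split())
        scored = []
        for item in items:
            overlap = len(words & set(item["summary"].lower().split()))
            scored.append((overlap, item["name"]))
        scored.sort(reverse=True)
        # Only the best-matching names get surfaced to the agent.
        return [name for score, name in scored[:top_k] if score > 0]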

I separate things in a different way:

1. What things do I force into context (agents.md, "tools" index, files)

2. What things can the agent discover (MCP, skills, search)


It is conceptually different. Skills were created to address the context-rot problem: you pull the right skill from the deck when you hit a challenge, figuring out the best one just by reading the title and description.

That's the point. It was supposed to be a simpler, more efficient way of doing the same things as MCP but agents turned out not to like them as much.

It's mostly just static/dynamic content behind descriptive names.

> Is it just me, or do skills seem enormously similar to MCP?

Ok I'm glad I'm not the only one who wondered this. This seems like simplified MCP; so why not just have it be part of an MCP server?


For one thing, it’s a text file and not a server. That makes it simpler.

Sure, but in an MCP server the endpoints provide a description of how to use the resource. I guess a text file is nice too but it seems like a stepping stone to what will eventually be necessary.

Vibe Engineering. Automatic Programming. “We need to get beyond the arguments of slop vs sophistication..."

Everyone seems to want to invent a new word for 'programming with AI' because 'vibe coding' seems to have come to equate to 'being rubbish and writing AI slop'.

...buuuut, it doesn't really matter what you call it does it?

If the result is slop, no amount of branding is going to make it not slop.

People are not stupid. When I say "I vibe coded this shit" I do not mean, "I used good engineering practices to...". I mean... I was lazy and slapped out some stupid thing that sort of worked.

/shrug

When AI assisted programming is generally good enough not to be called slop, we will simply call it 'programming'.

Until then, it's slop.

There is programming, and there is vibe coding. People know what they mean.

We don't need new words.


That's kind of Salvatore's point though; programming without some kind of AI contribution will become rare over time, like people writing assembly by hand is rare now. So the distinction becomes meaningless.

There is no perfect black or perfect white, so the distinction is meaningless, everything is gray.

...but it didn't develop ways of doing that, did it?

Any idiot can have cursor run for 2 weeks and produce a pile of crap that doesn't compile.

You know the brilliant insight they came out with?

> A surprising amount of the system's behavior comes down to how we prompt the agents. Getting them to coordinate well, avoid pathological behaviors, and maintain focus over long periods required extensive experimentation. The harness and models matter, but the prompts matter more.

i.e. It's kind of hard and we didn't really come up with a better solution than 'make sure you write good prompts'.

Wellll, geeeeeeeee! Thanks for that insight guys!

Come on. This was complete BS. Planners and workers. Cool. Details? Any details? Annnnnnnyyyyy way to replicate it? What sort of prompts did you use? How did you solve the pathological behaviours?

Nope. The vagueness in this post... it's not an experiment. It's just fund raising hype.


IMHO, this whole thing could be read with "human" instead of "agent" and would make the exact same amount of sense.

"We put 200 human in a room and gave them instructions how to build a browser. They coded for hours, resolving merge conflicts and producing code that did not build in the end without intervention of seniors []. We think, giving them better instructions leads to better results"

So they actually invented humans? And will it come down to either "managing humans" or "managing agents"? One of the two will be more reliable, more predictable and more convenient to work with. And my guess is, it is not an agent...

Judging by the git log, something is weird.


> Can I run a local LLM that allows me to control Home Assistant with natural language? Some basic stuff like timers, to do/shopping lists etc would be nice etc.

No. Get the larger Pi recommended by the article.

Quote from the article:

> So power holds it back, but the 8 gigs of RAM holds back the LLM use case (vs just running on the Pi's CPU) the most. The Pi 5 can be bought in up to a 16 GB configuration. That's as much as you get in decent consumer graphics cards.

> Because of that, many quantized medium-size models target 10-12 GB of RAM usage (leaving space for context, which eats up another 2+ GB of RAM).

> 8 GB of RAM is useful, but it's not quite enough to give this HAT an advantage over just paying for the bigger 16GB Pi with more RAM, which will be more flexible and run models faster.

The model specs shown for this device in the article are small, and not fit for purpose even for the relatively trivial use case you mentioned.
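
Rough back-of-envelope, as a rule of thumb only (real numbers vary by runtime, quantization scheme and context length): weight memory is roughly parameter count × bits per weight / 8, plus a couple of GB for context/KV cache.

    def approx_model_ram_gb(params_billions, bits_per_weight=4, kv_cache_gb=2.0):
        # Weights only; ignores runtime overhead and activations.
        weights_gb = params_billions * bits_per_weight / 8
        return weights_gb + kv_cache_gb

    print(approx_model_ram_gb(14))  # ~9.0 GB -- a 14B model at 4-bit already blows past 8 GB
    print(approx_model_ram_gb(7))   # ~5.5 GB -- a 7B model fits, with little headroom

Which is exactly why the article keeps pointing you at the 16 GB Pi instead.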

I mean, look, lots of people have lots of opinions about this (many of them wrong); it’s cheap, you can buy one and try… but, look. The OP really gave it a shot, and results were kind of shit. The article is pretty clear.

Don’t bother.

You want a device with more memory to mess around with for what you want to do.


It is neat, and at 32GB it might be useful.

Almost nothing useful runs in 8.

This is the problem with this gen of “external AI boards” floating around. 8, 16, even 24 GB is not really enough to run much that's useful, and even then (i.e. offloading to disk) they're so impractically slow.

Forget running a serious foundation model, or any kind of realtime thing.

The blunt reality is that the fast, high-memory GPU systems you actually need to self-host are really, really expensive.

These devices are more optics and dreams (“it'd be great if…”) than practical hacker toys.


Vision models are way smaller and run decently on these things.


> I don't think you can just catch up in a few weeks, and I do think that the risk of falling behind isn't being taken seriously enough by much of the developer population.

This is nonsense.

This field moves so fast the things you did more than a year ago aren't relevant anymore.

Claude code came out last year.

Anyone using random shit from before that is not using it any more. It is completely obsolete in all but a handful of cases.

To make matters worse “intuition” about models is wasted learning, because they change, significantly, often.

Stop spreading FUD.

You can be significantly less harmful to people who are trying to learn by sharing what you actually do instead of nebulously hand waving about magical BS.

Dear readers: ignore this irritating post.

Go and watch Armin Ronacher on YouTube if you want to see what a real developer doing this looks like, and why it's hard.


You're accusing me of spreading harmful advice here, when you're the one telling people that they don't need to worry about not investing in their skills because "This field moves so fast the things you did more than a year ago aren't relevant anymore."

One of us is right here. I hope for your sake and the people that listen to you that it's you. I don't think it is.


Simon, you're literally fear mongering.

You're making wild claims, and absolutely failing to back them up with evidence.

That is FUD.

People should invest, they should try things. …but it's far, faaaaaar less clear cut that dropping everything and focusing on AI right now is so absolutely important.

The difference between what you get prompting and a totally naïve user of claude code gets is marginal.

People are not being left behind if they try it a bit, find it's OK, not great, and come back later.

It is not a deep topic.

Writing AI tools is a deep topic, but most people aren't doing that.

You're in the wrong here.

Stop making people scared.

I quote antirez here, since you clearly aren't interested in listening to me:

> I have a single suggestion for you, my friend. Whatever you believe about what the Right Thing should be, you can't control it by refusing what is happening right now.

> Skipping AI is not going to help you or your career. Think about it. Test these new tools, with care, with weeks of work, not in a five minutes test where you can just reinforce your own beliefs.

> Find a way to multiply yourself, and if it does not work for you, try again every few months.


I'm currently sounding the alarm, because I sincerely believe that the "it doesn't work, and even if it did you can catch up easily" message is no longer credible as of November 2025.

If I didn't sincerely believe that I wouldn't say it.

It's true that you will always be able to catch up eventually - the industry has newbies entering it all the time and I believe they will continue to make it to the point where they can contribute effectively.

But if I'm right and it does take 6-12 months for most developers to get proficient there's a real career risk involved now in listening to people who say it's all hype and no substance and you should keep on sitting it out.


It might scale.

So far, I'm not convinced, but let's take a look at what's fundamentally happening and why humans > agents > LLMs.

At its heart, programming is a constraint satisfaction problem.

The more constraints (requirements, syntax, standards, etc) you have, the harder it is to solve them all simultaneously.

New projects with few contributors have fewer constraints.

The process of “any change” is therefore simpler.
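
A toy illustration of that point (not a claim about how agents work internally, just the shape of the problem): with brute-force generate-and-check, every constraint you add shrinks the set of acceptable solutions, so a blind generator lands on a valid one less and less often.

    import itertools

    domain = range(10)   # toy "program": an assignment of values to 4 variables
    variables = 4

    constraints = [
        lambda a: a[0] < a[1],       # "requirements"
        lambda a: sum(a) % 3 == 0,   # "standards"
        lambda a: a[2] != a[3],      # "consistency"
        lambda a: a[1] + a[2] < 8,   # "security"
    ]

    for n in range(len(constraints) + 1):
        valid = sum(all(c(a) for c in constraints[:n])
                    for a in itertools.product(domain, repeat=variables))
        print(f"{n} constraints -> {valid} valid assignments out of {10**variables}")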

Now, undeniably:

1) Agents have improved the ability to solve constraints by iterating (e.g. generate, test, modify) over raw LLM output.

2) There is an upper bound (context size, model capability) on how many simultaneous constraints they can solve.

3) Most people have a better ability to do this than agents (including claude code using opus 4.5).

So, if you're seeing good results from agents, you probably have a smaller set of constraints than other people.

Similarly, if you're getting bad results, you can probably improve them by relaxing some of the constraints (consistent UI, number of contributors, requirements, standards, security requirements, splitting code into well-defined packages).

This will make both agents and humans more productive.

The open question is: will models continue to improve enough to approach or exceed human level ability in this?

Are humans willing to relax the constraints enough for it to be plausible?

I would say currently people clamoring about the end of human developers are cluelessly deceived by the “appearance of complexity”, which does not match the “reality of constraints” in larger applications.

Opus 4.5 cannot do the work of a human on code bases I've worked on. Hell, talented humans struggle to work on some of them.

…but that doesn't mean it doesn't work.

Just that, right now, the constraint set it can solve is not large enough to be useful in those situations.

…and increasingly we see low quality software where people care only about speed of delivery; again, lowering the bar in terms of requirements.

So… you know. Watch this space. I'm not counting on having a dev job in 10 years. If I do, it might be making a pile of barely working garbage.

…but I have one now, and anyone who thinks that this year people will be largely replaced by AI is probably poorly informed and has misunderstood the capabilities of these models.

There's only so low you can go in terms of quality.


No one said anything about lying.

Their job is to make the company successful. Part of success is raising funds and boosting share price.

That is their job, and how do you imagine they can do that?

Sound kind of glum and down about the company prospects?

Do not make me laugh.

Even if the company is literally haemorrhaging cash and has less than a week of runway left, senior executives are often so far up their own asses and surrounded by yes-men that they honestly believe they can turn things around.

It's often not about willfully lying.

It's just delusional belief and faith in something that is very unlikely.

(Last-minute turnarounds and DSA do exist, but like lottery players, seeing the very few people who do win and mimicking them does not make you a winner, most of the time.)


> What's the metric?

Language model capability at generating text output.

The model progress this year has been a lot of:

- “We added multimodal”

- “We added a lot of non-AI tooling” (i.e. agents)

- “We put more compute into inference” (i.e. thinking mode)

So yes, there is still rapid progress, but these ^ make it clear, at least to me, that next gen models are significantly harder to build.

Simultaneously we see a distinct narrowing between players (openai, deepseek, mistral, google, anthropic) in their offerings.

That's usually a signal that the rate of progress is slowing.

Remind me what was so great about gpt 5? How about gpt4 from gpt 3?

Do you even remember the releases? Yeah. I don't. I had to look it up.

Just another model with more or less the same capabilities.

“Mixed reception”

That is not what exponential progress looks like, by any measure.

The progress this year has been in the tooling around the models, and in smaller, faster models with similar capabilities. Multimodal add-ons that no one asked for, because it's easier to add image and audio processing than to improve text handling.

That may still be on a path to AGI, but it is not an exponential path to it.


> Language model capability at generating text output.

That's not a metric, that's a vague non-operationalized concept that could be operationalized into an infinite number of different metrics. And an improvement that was linear in one of those possible metrics would be exponential in another one (well, actually, one that was linear in one would also be linear in an infinite number of others, as well as being exponential in an infinite number of others).

That’s why you have to define an actual metric, not simply describe a vague concept of a kind of capacity of interest, before you can meaningfully discuss whether improvement is exponential. Because the answer is necessarily entirely dependent on the specific construction of the metric.
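
A made-up example of how much the construction matters: take a model whose error rate halves every generation. The very same trajectory looks like it's plateauing, growing linearly, or growing exponentially depending purely on which transformation you plot.

    import math

    for t in range(1, 6):
        error = 0.5 ** t              # error rate halves each generation
        accuracy = 1 - error          # looks like it's flattening out
        nines = -math.log10(error)    # "number of nines": grows linearly
        inv_error = 1 / error         # "tasks per mistake": grows exponentially
        print(f"gen {t}: acc={accuracy:.3f}  nines={nines:.2f}  1/err={inv_error:.0f}")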


I don't think the path was ever exponential, but your claim here is almost as if the slowdown hit an asymptote-like wall.

Most of the improvements are intangible. Can we truly say how much more reliable the models are? We barely have quantitative measurements on this so it’s all vibes and feels. We don’t even have a baseline metric for what AGI is and we invalidated the Turing test also based on vibes and feels.

So my argument is that part of the slowdown is itself a hallucination, because the improvement is not actually measurable or definable outside of vibes.


I kind of agree in principle, but there are a multitude of clever benchmarks that try to measure lots of different aspects like robustness, knowledge, understanding, hallucinations, tool use effectiveness, coding performance, multimodal reasoning and generation, etc. All of these have lots of limitations, but they all paint a pretty compelling picture that complements the “vibes”, which are also important.


> Language model capability at generating text output.

How would you put this on a graph?


> Language model capability at generating text output.

That's not a quantifiable sentence. Unless you put it in numbers, anyone can argue exponential/not.

> next gen models are significantly harder to build.

That's not how we judge capability progress though.

> Remind me what was so great about gpt 5? How about gpt4 from from gpt 3?

> Do you even remember the releases?

At gpt 3 level we could generate some reasonable code blocks / tiny features. (An example shown around at the time was "explain what this function does" for a "fib(n)") At gpt 4, we could build features and tiny apps. At gpt 5, you can often one-shot build whole apps from a vague description. The difference between them is massive for coding capabilities. Sorry, but if you can't remember that massive change... why are you making claims about the progress in capabilities?

> Multimodal add ons that no one asked for

Not only does multimodal input training improve the model overall, it's useful for (for example) feeding back screenshots during development.


Exactly, gpt5 was unimpressive not because of its leap from GPT4 but because of expectations based on the string of releases since GPT4 (especially the reasoning models). The leap from 4->5 was actually massive.


Next-gen models are always hard to build; they are by definition pushing the frontier. Every generation of CPU was hard to build, but we still had Moore's law.

> Simultaneously we see a distinct narrowing between players (openai, deepseek, mistral, google, anthropic) in their offerings. Thats usually a signal that the rate of progress is slowing.

I agree with you on the fact in the first part, but not the second part…why would convergence of performance indicate anything about the absolute performance improvements of frontier models?

> Remind me what was so great about gpt 5? How about gpt4 from gpt 3? Do you even remember the releases? Yeah. I don't. I had to look it up.

3 -> 4 -> 5 were extraordinary leaps…not sure how one would be able to say anything else

> Just another model with more or less the same capabilities.

5 is absolutely not a model with more or less the same capabilities as gpt 4, what could you mean by this?

> “Mixed reception”

A mixed reception is an indication of model performance against a backdrop of market expectations, not against gpt 4…

> That is not what exponential progress looks like, by any measure.

Sure it is…exponential is a constant % improvement per year. We’re absolutely in that regime by a lot of measures

> The progress this year has been in the tooling around the models, smaller faster

Effective tool use is not some trivial add-on; it is a core capability for which we are on an exponential progress curve.

> models with similar capabilities. Multimodal add ons that no one asked for, because its easier to add image and audio processing than improve text handling.

This is definitely a personal feeling of yours; multimodal models are not something no one asked for…they are absolutely essential. Text data is essential and data curation is non-trivial and continually improving, and we are also hitting the ceiling of internet text data. And yet we use an incredible amount of synthetic data for RL, and this continues to grow…you guessed it, exponentially. And multimodal data is incredibly information rich. Adding multimodality lifts all boats and provides core capabilities necessary for open-world reasoning and even better text data (e.g. understanding charts and image context for text).


> exponential is a constant % improvement per year

I suppose if you pick a low enough exponent then the exponential graph is flat for a long time, and you're right: zero progress is “exponential” if you cherry-pick your growth rate to be low enough.

Generally though, people understand “exponential growth” as “getting better/bigger faster and faster in an obvious way”.
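
Quick numbers on that (growth rates picked arbitrarily, just to show the point):

    for rate in (0.01, 1.0):
        total = (1 + rate) ** 10
        print(f"{rate:.0%}/year over 10 years -> x{total:,.2f}")
    # 1%/year   -> x1.10    (technically exponential, looks flat)
    # 100%/year -> x1,024   (what people usually mean by "exponential")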

> 3 -> 4 -> 5 were extraordinary leaps…not sure how one would be able to say anything else

They objectively were not.

The metrics and reception to them was very clear and overwhelming.

You're spitting some meaningless revisionist BS here.

You're wrong.

That's all there is to it.


Doesn't sound like you're really interested in any sort of rational dialogue. Metrics were "objectively" not better? What are you talking about? Of course they were; have you even looked at the benchmark progression for every benchmark we have?

You don't understand what an exponential is, or apparently what the benchmark numbers even are, or possibly even how we actually measure model performance and the very real challenges and nuances involved, yet I'm "spitting some revisionist BS". You have cited zero sources and are calling measured numbers "revisionist".

You are also citing reception to models as some sort of indication of their performance, which is yet another confusing part of your reasoning.

I do agree that "metrics were very clear"; it just seems you don't happen to understand what they are or what they mean.


I know it seems like forever ago, but claude code only came out in 2025.

It's very difficult to argue against the point that claude code:

1) was a paradigm shift in terms of functionality, despite, to be fair, at best, incremental improvements in the underlying models.

2) The results are, I estimate, an order of magnitude better in terms of output.

I think it's very fair to distill “AI progress 2025” to: you can get better results (up to a point; better than raw output anyway; scaling to multiple agents has not worked) without better models, just with clever tools and loops. (…and video/image slop infests everything :p).


Did more software ship in 2025 than in 2024? I'm still looking for some actual indication of output here. I get that people feel more productive but the actual metrics don't seem to agree.


I'm still waiting for the Linux drivers to be written because of all the 20x improvements that AI hypers are touting. I would even settle for Apple M3 and M4 computers to be supported by Asahi.


I am not making any argument about productivity about using AI vs. not using AI.

My point is purely that, compared to 2024, the quality of the code produced by LLM inference agent systems is better.

To say that 2025 was a nothing burger is objectively incorrect.

Will it scale? Is it good enough to use professionally? Is this like self driving cars where the best they ever get is stuck with an odd shaped traffic cone? Is it actually more productive?

Who knows?

I'm just saying… LLM coding in 2024 sucked. 2025 was a big year.

