Hacker News | flail's comments

Security is an even bigger issue than it looks at first glance. While security risk by omission has always been a thing (AI or not), we now face a whole new level of risk, from prompt injection to malicious libraries crafted to be picked up by coding agents: https://garymarcus.substack.com/p/llms-coding-agents-securit...

The most shallow layer of security, however, gets easier. You can now run an automated AI security audit every day for (basically) free, without hiring specialists to run pen tests.

Which makes the whole thing even more challenging: safe on the surface while vulnerable in the details creates a false sense of safety.

Yet all of this would be a concern only once a product is at all successful. Once it is, hypothetically, the company behind it should have the money to fix the vulnerabilities (I know, "hypothetically"). The maintenance cost hits way earlier than that. It kicks in even for a personal pet project that is isolated from the broader internet. So I treat it as an early filter that will reduce the enthusiasm of wannabe founders.


The automated audit only covers static analysis. When the agent actually runs, hitting MCP servers, making HTTP calls, getting responses back, that's where the real problems show up. Prompt injection through tool responses, malicious libraries that exfiltrate env vars, SSRF from agents that blindly follow redirects. Code audits miss all of it because this is a runtime and network problem, not a code quality problem.

Built Pipelock for this actually. It's a network proxy that sits between the agent and everything it talks to. Still early but the gap is real. https://github.com/luckyPipewrench/pipelock
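To make the redirect case concrete, here's a minimal sketch of the kind of check an egress proxy can apply before letting an agent follow a Location header. The helper name and structure are hypothetical illustrations, not Pipelock's actual API:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_private_target(url: str) -> bool:
    """Return True if the URL points at a private/loopback/link-local
    address -- the classic SSRF pivot when an agent blindly follows
    redirects (e.g. to http://169.254.169.254/ metadata endpoints)."""
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable -> refuse
    try:
        # Literal IPs pass through unchanged; hostnames get resolved.
        infos = socket.getaddrinfo(host, None)
        addrs = {info[4][0].split("%")[0] for info in infos}
    except socket.gaierror:
        return True  # unresolvable -> refuse
    return any(
        ipaddress.ip_address(a).is_private
        or ipaddress.ip_address(a).is_loopback
        or ipaddress.ip_address(a).is_link_local
        for a in addrs
    )

# An egress proxy would run this on every redirect target:
# if is_private_target(location): block the request.
```

The same check would have to apply to the original request URL too, not just redirects, and DNS rebinding needs extra care (resolve once, then connect to the address you actually checked).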


Yes. And the more autonomously we create code, the more of these (and not only these) vulnerabilities we'll be adding. Combine that with AI-automated attacks, and you have an all-out security mess.

It's like a Petri dish for inventing new angles of attack.

Oh, and let's not forget that coding agents are non-deterministic: the same prompt will yield a different result each time, especially for more complex tasks. So it's probably enough to wait until a vibe-coded product "slips." Ultimately, as a black hat hacker, I don't need all products to be vulnerable. I can work with the few that are.


Agreed. The non-determinism makes traditional testing basically useless here. You can't write a test suite for "the agent decided to do something unexpected this time." Logging and runtime checks are the only way to catch the weird edge cases.
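To illustrate what such a runtime check might look like, a minimal sketch (a hypothetical wrapper, not any particular framework's API): log every tool call and reject anything off an allowlist, so "the agent did something unexpected" at least leaves a trace and hits a guardrail:

```python
import logging
from typing import Any, Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-runtime")

class ToolGuard:
    """Wraps agent tool calls: records each invocation and blocks any
    tool not on the allowlist, regardless of what the model decided."""

    def __init__(self, allowlist: dict[str, Callable[..., Any]]):
        self.allowlist = allowlist
        self.audit_trail: list[tuple[str, dict]] = []

    def call(self, tool: str, **kwargs: Any) -> Any:
        self.audit_trail.append((tool, kwargs))  # always record first
        if tool not in self.allowlist:
            log.warning("blocked unexpected tool call: %s(%r)", tool, kwargs)
            raise PermissionError(f"tool {tool!r} is not allowlisted")
        log.info("tool call: %s(%r)", tool, kwargs)
        return self.allowlist[tool](**kwargs)

# A stand-in tool for illustration; a real agent would register real ones.
guard = ToolGuard({"read_file": lambda path: f"<contents of {path}>"})
```

The point isn't that this catches the weirdness in advance; it's that every run, expected or not, ends up in the audit trail you can actually inspect afterwards.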

The question is not whether we like or want subscriptions, but rather whether we're used to them. And the answer is yes.

Given the choice, we'd be using Spotifys and Netflixes for free, and have ad-free Google. I don't expect that choice to be given to us.

AI tools won't change anything on that account. At best, we'll swap one subscription for another, except the latter will add a bill for the tokens we use.


There's a huge difference between nurses or teachers and Ivy League students: the former roles are not remotely as prestigious. I highly doubt there are 20 candidates for each nursing or teaching job.

Affirmative action happens when we discuss privileged positions. Spots at Ivy League colleges definitely are positions of privilege.

So if the situation under consideration were nursing, there wouldn't be such a discussion because there wouldn't be affirmative action in place.


> do Altman and Andreesen really believe that, or is it just a marketing and investment pitch?

As for Andreessen, I don't think he even cares. As the author writes:

"for the venture capitalists that have driven so much of field, scaling, even if it fails, has been a great run: it’s been a way to take their 2% management fee investing someone else’s money on plausible-ish sounding bets that were truly massive, which makes them rich no matter how things turn out"

VCs win every time. Even if it's a bubble and it bursts, they still win. In fact, they are the only party that wins.

Heck, the bigger the bubble, the more money is poured in, and the bigger the fees. So VCs have an interest in pumping it up.


> Have LLMs learned to say "I don't know" yet?

Can they, fundamentally, do that? That is, given the current technology.

Architecturally, they have no concept of "not knowing." They can say "I don't know," but only because that was the most likely answer given the training data.

A perfect example: an LLM citing chess rules and still making an illegal move: https://garymarcus.substack.com/p/generative-ais-crippling-a...

Heck, it can even say the move would have been illegal. And it would still make it.


If the current technology does not allow them to sincerely say "I don't know, I am now checking it out," then they are not AGI; that was my original point.

I am aware that the LLM companies are starting to integrate this quality -- and I strongly approve. But again, being self-critical, and as such having some self-awareness, is one of the qualities I would ascribe to an AGI.


> We've got something that seems to be general and seems to be more intelligent than an average human.

We've got something that occasionally sounds as if it were more intelligent than an average human. However, if we stick to the areas of interest of that average human, the human will beat the machine in reasoning, critical assessment, etc.

And in just about any area, an average human will beat the machine wherever a world model is required, i.e., a generalized understanding of how the world works.

This is not to criticize the usefulness of LLMs. But broad statements that an LLM is more intelligent than the average Joe are necessarily misleading.

I like how Simon Wardley assesses how good the most recent models are. He asks them to summarize an article or a book he's deeply familiar with (his own or someone else's). It's like a test of trust. If he can't trust the summary of stuff he knows, he can't trust the summary of stuff that's foreign to him either.


What's the lifecycle of a GPU? 2-4 years? By the time the OpenAIs and Anthropics pivot, many GPUs will be past their half-life. I doubt there would be many takers for that infrastructure.

Especially given the humongous scale of infrastructure the current approach requires. Is there another line of technology that would need remotely as much?

Note, I'm not saying there can't be. It's just that I don't think there are obvious shots at that target.


> I stopped reading here, which is at the very start of the article (...)

> (...) this article is low quality and honestly full of basic errors.

Just curious: How do you know it's full of errors, given that you stopped reading at the very start?


One more interesting aspect: the infrastructure doesn't age that well. We basically need to renew all that infrastructure every, like, 2-4 years or so? (And I think I'm being optimistic here.)


I don't think FB was an outlier. I can't be sure, but I don't think there were many (any?) companies that took more than 10 years to profitability pre-2015.

I think Twitter took 11 years, and it was 2017.

Uber is actually a good counterexample for more reasons than just how long it took to reach profitability. It also raised a lot of money, $13B+ (compared to Facebook's ~$2B and Twitter's ~$3.5B), plus ~$8B from its IPO (another interesting fact: an IPO while bleeding money).

However, that rather makes Uber an outlier, not the norm. I guess Tesla and SpaceX fall into the "Uber" bucket, too (SpaceX was actually profitable pre-2015, right?). How many others can you list?

So yes, we have extending timelines, but pouring money into a leaky bucket for 10 years is still predominantly a losing bet. For each that eventually made it, you would have Foursquare, WeWork, Better Place, Jawbone, Theranos (!), Fisker Automotive, etc.

And for each of those, you would have dozens that are even more forgotten because investors pulled the plug after just a few years (anyone remember fab.com perchance?). I would put Groupons of this world in the same bucket.

But even if we treated Uber and Tesla as the norm, OpenAI has already beaten them all in terms of funding raised (and Anthropic is on its way there, too). Both show no signs of profitability around the corner, with an absurd burn rate that no single customer group can carry (and I'm already treating their geography as global).

That's why corporate sales are so important: corporate customers can afford to pay a premium. ChatGPT users will not.

So even among the wildest outliers, AI companies are extreme outliers.

