As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week’s release of Firefox 150 includes fixes for 271 vulnerabilities identified during this initial evaluation.
As these capabilities reach the hands of more defenders, many other teams are now experiencing the same vertigo we did when the findings first came into focus. For a hardened target, just one such bug would have been a red alert in 2025, and so many at once makes you stop and wonder whether it’s even possible to keep up.
There are three things happening simultaneously: (1) a new model, codenamed "Mythos"; (2) a lightweight harness built for finding vulnerabilities; and (3) a push by Anthropic to collaborate with various open source projects and companies to use (1) and (2) to find vulnerabilities.
We know that the combination of all three results in finding lots of security vulnerabilities. That's what Mozilla is talking about. The quote from the curl story states that (2) and (3) alone, with regular SotA models, would have produced very similar results.
Which is really the crux of all this hype around Mythos: would the results really be different if they used Claude Opus instead of Claude Mythos? How much is the model, how much the harness, and how much is just because Anthropic is running a big campaign systematically trying to find vulnerabilities?
Not to discredit anything that was said in any particular blog post.
Folks also need to remember that a lot of blog posts are written by engineers or managers who have their own agendas and careers, and external blog posts can often be a form of self-marketing, or marketing for an idea that an engineer or director has been pushing internally.
I have no idea if this happened in Mozilla's case, but the person who wrote it seemed to talk about their own internal harness / fuzz testing framework quite a bit, and I imagine it was probably a big part of that person's scope / accomplishments and will probably show up in their end-of-year review and on their resume.
Also, the people at Mozilla who helped achieve a highly visible collaboration with the hottest AI company in the zeitgeist that included a lot of expensive data center time to harden their flagship product are definitely going to be happy/excited/proud about pulling it off successfully.
There's a lot of knee-jerk "so you're accusing Mozilla of a conspiracy to boost Anthropic?", which is an overly simplistic lens, particularly when it involves groups of individual humans with different motivations and emotional investment in their own contributions to the collaboration.
Okay so supposing everybody is acting in a benign manner, following their incentives and passions, not meaning to mislead anybody. Do you think that this results in writing a misleading blog post? Because the blog post makes Mythos out to be a big friggin deal. (It had certainly convinced me).
It is difficult to compare these two accounts since Daniel Stenberg didn't get access to Mythos himself, and we have no information about how it was run compared to the other AI models that have been used on curl. It is possible that Mythos is not much better than these other models, but it is also possible that the curl team simply made better use of the other models.
Part of what made Mythos so effective for Mozilla was the integrated agentic workflow where it not only looked for bugs, but then created an exploit to demonstrate them, and ran that exploit while dynamic analysis was enabled verifying that invalid memory access occurred. In this case it is hard to know how much of their success was because they put more effort into the harness compared to previous tools (we know they did), or if Mythos was more suitable for this sort of workflow to begin with.
Not many apples-to-apples comparisons to be made with Mythos at this point.
> then created an exploit to demonstrate them, and ran that exploit while dynamic analysis was enabled verifying that invalid memory access occurred
Four years ago that would have sounded like science fiction. Right now, I think that even Gemini Flash might be able to do that, given a couple of attempts.
I'll wear the dunce cap: how are you so certain this is co-marketing? I'm not saying you are wrong, but it doesn't seem obviously like marketing copy to me (which is of course what they'd want but that's nevertheless not in any way evidence one way or the other).
It starts with the words "As part of our continued collaboration with Anthropic"
Once these words are used, you can assume there is a contract stating how that collaboration works, and that it includes some sentences about how much each side is allowed or required to say about it.
So you claim that Mozilla entered into a contract with Anthropic, and said contract requires Mozilla to advertise for Anthropic on their blog. I hope Mozilla is getting a good payday out of this.
I think it's more that the cost of finding a vulnerability has dropped significantly, not that the vulnerability couldn't have been found before. But that cost matters tremendously, because someone has to fund the effort to find the bugs. The same economics also apply to attackers.
Read this as: "we get discounts, rate limit increases, a direct line to responsible product managers; in exchange we participate in friendly marketing." It's extremely common in this line of business - typical of database vendors, software tool companies, etc.
In many countries it is mandatory to mark any form of compensated advertising as such. If your claim is true they might be breaking some laws here & there…
GrapheneOS users (and really any citizens who care) in the EU should complain to the DMA team [1]. As with everything: the more people complain, the higher priority it gets.
I recommend that every EU citizen do this. Don't send a canned message or an LLM-generated message. Write your own story and how Google (and Apple) are destroying competition and freedom for you as an EU citizen.
Even if you are a GMS Android user, they are going to make installing apps outside the Play Store much more annoying and these attestation-backed verifications are going to further deanonymize you.
Compilers are a layer of abstraction that we can ask another human about. Some human is there taking care of it. Until we get to the point where we trust AI with our survival it would be good to be able to audit the entire stack.
I agree that the problem is volume, even more so than correctness.
All that LLMs and other generative models have done is enable an order of magnitude more stuff to be created cheaply. This then puts the onus and cost on the consumer of that output, which is why everyone is exhausted after a day of work that just involves looking over output. This volume will cause people to stop reviewing all of it and just trust the randomly generated code, and in time the quality will suffer.
I'm just saying that I already see that people are outsourcing all the thinking to the models - not only code generation and reviews, but even design - the part that "senior engineers" without imagination think only they are capable of doing.
It's worrying how much trust is being put in those systems. And my worry is not about the job anymore, but our future in general.
I think those of us who have years of experience under our belt are safe. If we're older, the knowledge is ingrained, and atrophy of this knowledge will be limited because it's already "imprinted" onto our brains.
Our futures are safe in this sense; in fact, it's even beneficial, as we may be the last generation to have these skills. Humanity's future, on the other hand, is another open question.
It's a bit of a weird place to be in as a senior engineer who has spent 2 decades perfecting his craft.
So, on one hand, I'm also kinda sad at how quickly we've thrown the guardrails away, but on the other -- it's... Well. It's just work.
Turns out, no one ever really cared how elegant or robust our code was and how clever we were to think up some design or other, or that we had an eye on the future; just that it worked well enough to enable X business process / sale / whatever.
And now we're basically commoditised, even if the quality isn't great, more people can solve these problems. So, being honest, I think a lot of my pushback is just a kinda internal rebellion against admitting that actually, we're not all that special after all.
I'm just glad I got to spend 20 years doing my hobby professionally, got paid really well for it, and often times was forced to solve complicated problems no one else could -- that kept me from boredom.
I think the shift we are seeing now, as 'previously' knowledge workers, is that work is becoming a lot more like manual labour than what we've really been doing up until now. When there's no 'I don't know' anymore, then you're not really doing knowledge work, right?
I guess I'll just ride the wave, spew out LLM crap at work, and save the craft for some personal projects. I'll certainly have the capacity now that work is a no-op.
Yeah, but the thing is, it's not "just work". Software now has really big impact on the world and actual lives.
In the corporate world, we are typically detached from real-world consequences, and looking at people around me, people really don't think about such things - but I do. And I really care, because "relaxed" standards might result in errors that amount to stuff like identity theft, or stolen money, shit like this, even on the smallest scale.
Obviously we can't prevent everything, but it seems like we, as an industry, decided to collectively YOLO and stop giving a shit at all. And personally I don't like that it is me who is losing sleep over this, while people who happily delegate all their thinking to LLMs sleep better than ever now.
Yeah that's a tough spot to be in; I think though, your responsibility really ends with you at work, unless you're very high up on the management chain.
Keep it simple, right? In everything you do, make things a bit better than you found them. It's enough. You're never going to win the fight to get everyone (or maybe even ANYONE, depending on how messed up your org is) to care, so why lose sleep over things you can't change?
At least, that's what I started doing some years ago, having lost lots of those fights by then, and I'm sleeping fine again.
You could say the same thing about compiled code; actually, it's worse, because anything a compiler spits out is very hard to understand even for those who understand assembly.
You don't need to look at the entire program at the assembly level to figure out parts that you want to optimise or prove for correctness. You do need to look at all the code the LLM generates in order to understand it.
You can learn to understand the patterns that compilers spit out and there are many tools out there to aid in that understanding. You can't learn to understand what an LLM spits out because by design it is non-deterministic and will vary in form and function for each pull of the lever.
You can learn to understand how high level concepts in code map down to assembly language and how compilers transform constructs in one language to another. You can't know that about LLMs because they generate non-deterministic output based on processing of huge low-precision tables.
I dunno, I'd rather proofread (or better yet just test) LLM-generated code than have to reason about assembly. You can't just look at part of the assembly to prove that the rest is right, especially if it's hand-written, or maybe just -O3. But anyway compilers are not what come to mind when someone mentions LLM coding.
This is an illusion: reasoning about a typical NPM-based project with hundreds of dependencies that you will NEVER dig into is not at all easier; it's just that most people completely give up on this and base their "reasoning" not on facts, but on made-up stories about what those things supposedly do.
Have you tried to sift through a whole lot of vibe-coded slop? It’s really mentally draining to see all of the really bad techniques they fall back on just to brute-force a solution.
It's not just about being friendly. If they surround themselves with a bubble of employees, true believers, and people too afraid to speak out, that chills the free expression of criticism; the truth has trouble getting out, which hurts trust.
Maybe true, but the flip side is that sometimes what is called an attack is actually criticism. That's how it appears to a lot of us from the outside.
GrapheneOS wants to post more positive things, rather than just defensive replies. But they have very little choice in the matter. If the inhumane levels of attacks weren't happening, they would have more time to discuss future features, how they choose to approach features, etc. But ignoring the attacks only makes it worse. The suggestions to ignore it, even if genuine, aren't helpful.
It may be the case that Daniel and the project are so under siege that they need to take a hostile attitude toward some of the people they interact with as a matter of self preservation. They may have no other option. But taking this posture while also being fair to all of the people around them (i.e. some people who aren't actually attacking them) may be difficult or even impossible. I can see this behavior in myself sometimes. I just don't have the energy to be fair. "F U".
I wouldn't want to see friendly corporate slop either. I appreciate how down to nuts and bolts the communiques are on Mastodon and how deadly seriously they take everything. That part of the communication style makes me trust them more.
I think a good step in the right direction might be acknowledging that being defensive necessarily leads to erring on the side of assuming bad faith rather than good, which leads to some misjudgements. So far you've said that GrapheneOS is open to all criticism, which (though I haven't followed the space very recently, so my memory on specifics is hazy) just does not seem to match my interpretation. I think that if we were having this conversation on Twitter or Mastodon, Daniel would have blocked me by now (if he hadn't already blocked me years ago).
People can accidentally be spreading attacks with loaded/presumptuous statements even when their intentions are pure. Unfortunately, pure intentions can still cause harm that needs to be countered.
Take your reply as an example: the GrapheneOS accounts are managed by multiple people, so the fixation on one specific project member may not even be accurate to the discussion. Having one's character attacked is immensely harmful on its own, but being attacked for something one may not even be doing is also immensely harmful.
The unfortunate reality is that people tend to believe the first thing they read, and without something countering it, will roll with it, intentionally or otherwise. So countering misinfo efficiently and quickly is vital.
It has modern features. It stores message history. It has a fairly unique feature of letting you create ad-hoc "topics" (that go under a "Channel") that make it easier to manage the flood of conversation.
The NSA can't break GPG assuming everything is working properly. This blog post (which to be fair I only skimmed) explains that GPG is a mess which could lead to things not working properly, and also gives real life examples. You may also want to see https://gpg.fail (you can tell they're from the ivory tower by the cat ears). The blog post also mentions bad UX, which you and I can directly appreciate (if anything I might expect ivory tower types to dismiss UX issues).
I am well familiar with that presentation at CCC. Yes, the presenters are people who live in the low-stakes world of theoreticals, as you can tell by the cat ears.
Maybe the site is overloaded. But as for the "brb, were on it!!!!" - this page had the live stream of the talk when it was happening. Hopefully they'll replace it with the recording when media.ccc.de posts it, which should be within a couple hours.