IMO it is valuable because it suggests the primary value was in the harness and not the LLM.
That's not too surprising for those of us who have been working with these things, either. All kinds of simpler use cases are manageable with harnesses but not reliably by LLMs on their own.
What if Mythos didn't need the narrowing harness? That's still the burning question that has yet to be answered. Anthropic suggested very strongly that Mythos did not need it.
What if you could boil a pot of water with an F-16 jet engine?
The harness discussion is relevant because it might be possible to achieve the same results at 1/20th the cost. If that's the case, these trillion-dollar companies are worth less than is currently understood.
It's a lot easier to research harness optimizations without having to raise a billion dollars.
I'm personally very interested to know the answer. There are a lot of resources being expended (and a lot of big bets being placed) on running and training these frontier models.
I don't think it matters. Even if it didn't need it, all that implies is that it better handles a larger context window. A larger context window is not necessary to solve the problem.
We're being told that Mythos is such a big step change in capability that it needs to be kept secret and carefully controlled because a wide release could threaten cybersecurity everywhere. That does not really hold water if a barely simpler harness can do the same stuff at a lower price and is available to all of us.
The burning question to me, at least, is how many false positives each approach generated, and the degree of their falseness (e.g. "valid but not exploitable" vs. "not valid"). It's not super useful if it's generating way more noise than signal.
It can't do the same stuff and the fact that you think it can means that you aren't reading past the headlines of these posts!
Anthropic's own blogpost mentioned that Opus found many of the vulnerabilities as well. The difference is that Mythos developed working exploits end to end, autonomously.
> But if you put any more information than that in the prompt, like chunk focuses, line numbers, or hints on what the vulnerability is: You're acting in bad faith
I think you're misrepresenting what they're doing here.
The Mythos findings themselves were produced with a harness that split the work by file, as you noted. The harness from OP split each file into chunks, and had the LLM review each chunk individually.
That's just a difference in the harness. We don't yet have full details about the harness Mythos used, but using a different harness is totally fair game. I think you're inferring that they pointed it directly at the vulnerability, and in a sense they did, but only in the same way the Mythos harness did. Both approaches chunk the codebase into smaller parts and have the LLM analyze each piece individually.
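To make the comparison concrete, the shared idea in both harnesses is roughly this (a hedged sketch only: `ask_llm`, the chunk size, and the file glob are made-up placeholders, not details from either harness):

```python
from pathlib import Path

CHUNK_LINES = 200  # assumed chunk size, purely illustrative


def ask_llm(prompt: str) -> str:
    """Placeholder for whatever model call the harness makes."""
    raise NotImplementedError


def chunks(path: Path, size: int = CHUNK_LINES):
    """Yield fixed-size line chunks of a single source file."""
    lines = path.read_text().splitlines()
    for i in range(0, len(lines), size):
        yield "\n".join(lines[i:i + size])


def review(repo: Path) -> list[str]:
    findings = []
    for f in sorted(repo.rglob("*.c")):
        for chunk in chunks(f):
            # Each chunk is reviewed in isolation; the harness, not the
            # model, decides how the codebase gets narrowed down.
            findings.append(ask_llm(f"Find vulnerabilities:\n{chunk}"))
    return findings
```

Whether you split by file or by chunk-of-file, the narrowing is being done by the harness either way; that's the point.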
I switched my home ISP from cable (which supported IPv6) to fiber (which doesn't) and I've had a nagging disappointment ever since. But I guess consumers aren't really demanding it enough.
And the higher level libraries mostly do it for you, too, even if you directly specify IPv4 addresses in your code (due to NAT64 [1]). I think it only even requires special work from you as a developer if you're using low-level or non-standard libraries.
The problematic low-level libraries are standard, and effectively impossible to fully deprecate since they're decades old and part of the socket API.
I think currently Apple still helps you with these via "bump in the stack" (i.e. they can translate internal v4 structures and addresses into NAT64-prefixed v6 at the kernel level), but they probably don't want to commit to doing that forever.
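For the curious, that kernel-level translation boils down to RFC 6052 address mapping: embed the 32-bit IPv4 address in the low bits of a /96 IPv6 prefix. A rough sketch using the NAT64 well-known prefix `64:ff9b::/96` (the standard default, nothing Apple-specific):

```python
import ipaddress

# RFC 6052 well-known NAT64 prefix; real deployments may use a
# network-specific prefix learned via DNS64 or PREF64 instead.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of the /96 prefix."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))


def extract_ipv4(v6: str) -> ipaddress.IPv4Address:
    """Recover the original IPv4 address from a NAT64-mapped address."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(v6)) & 0xFFFFFFFF)
```

So `192.0.2.1` maps to `64:ff9b::c000:201`, and the translator on the path rewrites packets between the two forms.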
I used Little Snitch on Mac a few years ago and liked it, though I wasn't a fan of how (necessarily) deep it had to be in the OS to work. It felt like one of those things where, the moment you have any kind of network connectivity issue, it's the first thing you need to disable to troubleshoot because it's the weirdest thing you're doing.
I guess what I'd really like is a middleware box or something that I could put on my home network, but would then still give the same user experience as the normal app. I don't want to have to log into some web interface and manually add firewall rules after I find something not working. I like the pop-ups that tell you exactly when you're trying to do something that is blocked, and allow you to either add a rule or not.
I'm probably straddling some gray area between consumer-focused and enterprise-focused feature sets, but it would be neat.
I am the same: I used Little Snitch for a few years, from around 2010 until a few years back when I moved full-time to Linux. Back then, my parents had an iMac and I was the designated "IT" person to keep it running efficiently. My siblings had a bad habit of installing games and hack software on it. I ended up purchasing a license, and after the first few hours/days of configuring allow/block lists, it worked pretty well. It earned the label of "Little B*ch" from them since it would stop their game-hacking apps from connecting and wreaking havoc. Eventually I learned to keep them on a standard user account, with a separate admin account for installing software.
Long story you didn't ask for. Like I said, I haven't used Little Snitch in a while; I'll give this a whirl this weekend. What I have done over the past few years is run AdGuard Home on a mini home server. This has helped keep ads under control in our household, and I have an easy "turn off AdGuard for 10 mins" button in Home Assistant for the wife so she can do some shopping online, since it can occasionally break some sites. Overall they tolerate AdGuard and think it's a good middle ground. I have a few block lists, nothing too crazy or strict, to avoid breaking most sites. On the desktops/laptops, they all run Firefox with uBlock Origin.
How deep it was in the OS was exactly what I liked about it. I only wished it were open source so I know what exactly is happening with that level of access.
Agreed. Another important lesson I learned when delivering code to a non-profit org, and them asking me to convert my final invoices into "donations" to the org.
Shop vac tube would be gross fast and need regular maintenance. Dog poop bag is entirely disposable. Throw it behind the spacecraft and use it as propulsion.
I read somewhere that the reason they don't typically use IT networking cables / tech is because normal IT infrastructure is a lot less strict with things like packet loss. It's actually not a huge deal to drop packets here and there, especially if any given component is at capacity. But in a car, some devices are super chatty and you can't be dropping packets much at all.
That said, I'm sure there's gotta be a better way to solve it with less copper. And I think they did something like that with CyberTruck.
> ...in a car, some devices are super chatty and you can't be dropping packets much at all. ...there's gotta be a better way to solve it with less copper.
I know CAN has been a thing for a while now, and in the aviation world they have Ethernet-derived standards like AFDX, etc. But for some reason cables abound.