Very true. So many regulated/government security contexts use "critical" or "high" sev ratings as synonymous with "you can't declare this unexploitable in context or write up a preexisting-mitigations blurb, you must take action and make the scanner stop detecting this", which leads to really stupid prioritization and silliness.
At a previous job, we had to refactor our entire front-end build system from Rollup (I believe it was) to a custom Webpack build because of this attitude. Our FE process was completely disconnected from the code on the site, existing entirely in our Azure pipeline and on developer machines. The actually, theoretically exploitable aspects were in third-party APIs and our .NET ecosystems, which we obviously fixed. I wrote like 3 different documents and presented multiple times to their security team on how this wasn't necessary and we didn't want to take their money needlessly. $20,000 or so later (with a year of support for the system baked in) we shut up Dependabot. Money well spent!
Very early in my career I'd take these vulnerability reports as a personal challenge and spend my day/evening proving it wasn't actually exploitable in our environment. And I was often totally correct, it wasn't.
But... I spent a bunch of hours on that. For each one.
These days we just fix every reported vulnerable library, turns out that is far less work. And at some point we'd upgrade anyway so might as well.
Only if an upgrade causes problems (incompatibilities, regressions) do we look at it, analyze exploitability, and make judgment calls. Over the last several years we've only had to do that for about 0.12% of the vulnerabilities we've handled.
Yep. And cloud providers could eat any slippage cost (enforcing, say, every 5 minutes by stopping service) without even a rounding error on their balance sheets.
The fact that they don’t indicates that there’s no market reason to support small spenders who get mad about runaway overages, not that it’s technically or financially hard to do so.
> Initially, we anticipated that the edge case would have minimal impact, given Prometheus’s widespread adoption and proven reliability in diverse environments. However, as we migrated more users, we started seeing this issue more frequently, and it stalled migration.
That's a very professional way of saying "Wait, everyone just lives with this? What the fuck?!"
Sure. There are plenty of theoretical ways to do it, and even examples of small communities that have put them into practice.
Looks very similar to the situation of provably correct code: it just never reached mass adoption and fails to win at scale when crappier alternatives can propagate faster and occupy the ecological niche, which then alters the ecosystem in ways that make it even less likely the sounder approach could gain enough traction and momentum to scale.
I'm doubtful. Which small communities that did this are you referring to? And is the thing that made them successful something that's just hard, or is it something innate to their being very small?
If it's the latter, I don't think that checks out; I interpreted "we know how to build societies that don't do this" as "we know how to build large-scale human systems that avoid these trends; systems that could exist at scale on earth today".
Otherwise the claim just ends up being "we know how to do this if we start tabula rasa" (fun thought experiment, can't happen) or "we know how to do this if we get rid of 99.9% of the population and go back to village-scale economies" (not worth it, and the process of getting there would be exploited).
I'm a novice in this area, but my understanding is that LLM parameters ("neurons", roughly?), when processed, encode a probability for token selection/generation that is much more complex and many:one than "parameter A is used in layer B, therefore suggest token C", and not a specific "if activated then do X" outcome. Given that, how would this work?
The key part of the article is that token structure interpretation is a training-time concern, not just an input/output processing concern (which still leads to plenty of inconsistency and fragmentation on its own!). That has two consequences. First, training stakeholders at model-development shops need to be closely involved in the tool/syntax development process, which creates friction and slowdowns. Second, any current improvements/standardizations in the way we do structured LLM I/O will necessarily be adopted on the training side only after a months-to-years lag, given the time it takes to do new-model dev and training.
That makes for a pretty thorny mess ... and that's before we get into disincentives for standardization (standardization risks big AI labs' moat/lockin).
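A toy sketch of the fragmentation problem mentioned above (the tokenizations below are hypothetical, not drawn from any real model's vocabulary): the same logical JSON prefix can split into different subword sequences depending on incidental surrounding characters, so structure is never atomic to the model unless training accounted for it.

```python
# Hypothetical subword tokenizations of the same logical JSON key "name".
# (Illustrative only -- real BPE vocabularies differ per model.)
tokenizations = {
    '{"name":':  ['{"', 'name', '":'],       # key fused with brace/quote
    '{ "name":': ['{', ' "', 'name', '":'],  # a leading space changes the split
    '{"name" :': ['{"', 'name', '"', ' :'],  # a space before the colon changes it again
}

# The three surface strings are logically equivalent JSON prefixes, but the
# model receives three distinct token sequences -- which is why structured-I/O
# conventions end up baked in (or not) at training time.
unique_sequences = {tuple(toks) for toks in tokenizations.values()}
print(len(unique_sequences))  # 3 distinct sequences for one logical structure
```

Whitespace and quoting choices that a JSON parser normalizes away are exactly the kind of thing a subword tokenizer does not.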
I'm curious why this was downvoted--I'm not complaining or trying to go against HN guidelines; I'm genuinely unclear as to why the first-party source for the article clarifying the question in GP was marked dead. Bad actors? Misinterpretation? Other?
No idea, I thought it was a valid question and we go to great lengths in our methodology for this reason. The audits we supply for enterprise are highly specific as to cookie purpose for this reason: https://webxray.ai
We've also made progress as a species towards banning and reducing other things that have in-group upsides and really bad externalities: off-the-shelf sale of broad-spectrum antibiotics; chattel slavery; human organ trafficking; some damaging recreational drugs.
The prohibitions aren't perfect, of course (and not without their own negative externalities in some cases). But all of those things are much more accessible to people than nuclear weapons, and we've still had successes in banning/reducing them. So maybe there's hope yet.
This is an entertaining (and often exasperating) decades-old trend in competitive U.S. college debate, as well.
A common advantageous strategy is to take the randomly-selected topic, however unrelated, and invent a chain of logic that claims that taking a given side/action leads to an infinitesimal risk of nuclear extinction/massive harms. This results in people arguing that e.g. "building more mass transit networks" is a bad idea because it leads to a tiny increase in the risk of extinction--via chains as silly as "mass transit expansion needs energy, increased energy production leads to more EM radiation, evil aliens--if they exist--are very marginally more likely to notice us due to increased radiation and wipe out the human race". That's not a made-up example.
The strategy is just like the LessWrongers' one: if you can put your opponent in the position of trying to reduce P(doom), you can argue that unless it's reduced to actual zero, the magnitude of the potential negative consequence is so severe as to overwhelm any consideration of its probability.
In competitive debate, this is a strong strategy. Not a cheat-code--there are plenty of ways around it--but common and enduring for many years.
As an aside: "debate", as practiced competitively, often bears little relation to "debate" as understood by the general public. There are two main families of competitive debate: one is more outward-facing and oriented towards rhetorical/communication/persuasion practice; the other is more ingrown and oriented towards persuading other debaters, in debate-community-specific terms, of which side should win. There's overlap, but the two tend to be practiced/judged by separate groups, according to different rubrics, and/or in different spaces or events. That second family is what I'm referring to above.