Pithy. But it's a made-up quote attributed to Tytler; he never said or wrote that.
Tytler expressed some skepticism of democracies, but nothing like this. The too-on-the-nose nature of this often-passed-along bit of propaganda should also be the giveaway that it might be one of those rare things on the internet where someone was less than honest about its origins. Go look and see.
so… the choices, as you see them in this issue, the lenses through which to view it (one you think extreme, the other appropriate)… are either screens-as-drugs or sport fishing?
Some middle ground might be there somewhere. But if forced to choose… the choices for interpreting behavioral engineering funded by billions of dollars in research over more than a decade, plus data harvesting on an unprecedented scale, for the purpose of manipulating users:
There were no uncomfortable truths there about code agents, save one of the four points: that they can sometimes get prompt-injected if you let them search for things online and don't pay attention to where they search and the code they write. That's not an uncomfortable truth in the normal sense of "I know you don't want to admit this, but…"; it's just the thing that, if you didn't already know it eight months ago, you certainly should by now.
The other truths that were not about coding agents:
--Skill Atrophy. (Use it or lose it-- another thing we already know)
--The economics of serving code agents at scale. (Ungrounded in actual numbers; only OpenAI's miscellaneous statements and anecdotes. Actual cost of running code agents: last gen's mid-tier gaming GPUs will get you reasonably close to Claude Sonnet if you put just a little time into an agent harness, and it's getting cheaper and cheaper for better and better. So at scale, with real sysadmins doing the hard engineering to eke out every last bit of performance, the infra needed for serving these isn't the cost center.)
--Copyright. (This passed along the same bad read of a court ruling half the press has been repeating for a few years now. TLDR: Thaler v. Perlmutter said nothing about output not being protected by copyright. It denied Thaler's attempt to register *the AI* as the owner of the copyright.)
Good luck getting the average person through the setup process
AI is part of the problem with what MS has shoved into things, but it may also be part of what can help with the underlying issue of this behavior by corporations.
The average user increasingly will not need to be walked through in certain ways; they’ll only have to be aware that something, some way, is possible. Because most of us are the average, meaning outsiders to knowledge and understanding of how things function on a computer. I can strip out tired Windows behavior to some extent and certainly stand up a Linux desktop. But I didn’t know how to easily retrieve data from an old disc image that refused to mount. I knew it was there and not impossible, though, so I asked Claude. A one-shot prompt that, a few minutes later, had Claude reading raw bytes in some way and finding the location of the few files I needed.
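For the curious, the kind of raw-byte search described above can be sketched in a few lines: scan the image for known file signatures ("magic bytes") and report their offsets. This is a guess at the general approach, not the code Claude actually wrote; the signature table and the demo bytes are illustrative.

```python
# Toy sketch: locate files inside a raw disk image by their magic bytes.
# Signature list is a small illustrative subset, not exhaustive.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip/docx",
}

def find_signatures(image_bytes: bytes):
    """Return sorted (offset, filetype) pairs for every signature hit."""
    hits = []
    for magic, kind in SIGNATURES.items():
        start = 0
        while (idx := image_bytes.find(magic, start)) != -1:
            hits.append((idx, kind))
            start = idx + 1
    return sorted(hits)

# Tiny fabricated "image": junk bytes with a PDF header buried inside.
demo = b"\x00" * 16 + b"%PDF-1.7 ..." + b"\x00" * 16
print(find_signatures(demo))  # [(16, 'pdf')]
```

From an offset like that, you can carve bytes out with a plain slice or `dd`; real recovery tools add footer detection and filesystem-structure parsing on top of this same idea.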
So there is potential for AI to fill some gaps in this way and make some things easier and more within reach of average users. It’s potential only, though, so continuing the work to ensure open models remain a thing is important. Just like the Internet enabled a lot of things previously out of reach of people. And yeah, that was not an unmixed blessing, so all the more reason to move forward thoughtfully.
If that was the intent, don’t you think it would be stated somewhere, or in the FAQ?
>“Talking” past
It’s only text; there’s no talking past. You can’t talk past someone when the conversation isn’t spoken. At best, you might ignore what they write and go on and on at some length on your own point instead, ever meandering further from the words you didn’t read, widening the scope of the original point to include the closest topic that isn’t completely orthogonal to the one at hand, like the current tendency to look for the newest pattern of LLM output in everyone’s comments in an attempt to root out all potential AI-generated responses. And eventually exhaust all of their rhetoric and perhaps, just maybe, in the very end, get to the
Last year Jamie Dimon said there were going to be some “cockroaches” found lingering unattended in lots of private credit portfolios. The implication at the time was not that the problem was systemic and deep, merely that various incentives and market forces would mean a shakeout, either of the incremental, as-it-happens variety or slightly larger ones with multiple happening at a time.
Since then I’ve seen small things indicating lots of people quietly checking their books for such.
In the last week or two this has accelerated. A lot. Every few days there are ratchets tightening things up. Dimon just put some hard limits on some private credit lines and what borrowers could take out. A few other banks are trying to take other precautions.
I took that to be what it was intended to convey, and what Dimon wanted people to feel about what he said: that maybe they should poke around their own books, but he wasn’t telling people “well, ‘08 all over again.”
My own read of the subtext was something a bit different. Dimon saw something he really didn’t like, and my guess would be that more than just a handful of people at JP Morgan had their next few days’ (or longer) personal plans cancelled-- or that it had already settled from something like that-- to find whatever they had in the way of cockroaches. And so Dimon’s public statement was a soft nudge to try to get others to do the same, cautiously and slowly, without panicking.
It’s tea leaves, but the time since then seems to bear that out, with the current world economic volatility being a good opportunity for many places to go a little more aggressively in reining in whatever they have in the way of cockroaches, with some cover from that volatility and distraction, so they don’t have to explain too much more or draw the scrutiny that would accelerate things beyond the manageable.
Overall, my take was that Dimon is still probably pissed off about SVB and trying to make sure that whatever shape or size this private credit rot may have, it doesn’t go down quite that haphazardly.
Any document store where you haven’t meticulously vetted each document-- forget about actual bad actors-- runs this risk. A sizable org across many years generates a lot of things: analyses that were correct at one point and not at another, things that were simply wrong at all times, contradictions, etc.
You have to choose a model suitably robust in its capabilities, and design prompts or various post-training regimes that are tested against such cases, where the model will identify the differing sources and either choose the correct one or surface both, with an appropriately helpful and clear explanation.
At minimum you have to start from a typical model risk perspective and test and backtest the way you would traditional ML.
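A minimal sketch of what that model-risk-style backtest could look like, assuming a labeled eval set where some retrieved-document bundles contain known contradictions. `stub_pipeline` here is a stand-in heuristic so the sketch runs on its own; in practice the model call would sit in its place.

```python
# Backtest sketch: score a RAG pipeline on whether it flags known
# contradictions in retrieved document bundles. Names and data fabricated.
from dataclasses import dataclass

@dataclass
class Case:
    documents: list          # what retrieval would hand the model
    has_contradiction: bool  # ground-truth label

def evaluate(pipeline, cases):
    """Fraction of cases where the pipeline's contradiction flag
    matches the ground-truth label."""
    correct = sum(
        pipeline(c.documents)["flagged_contradiction"] == c.has_contradiction
        for c in cases
    )
    return correct / len(cases)

def stub_pipeline(docs):
    """Stand-in: flags a contradiction when two docs disagree on a
    shared 'key: value' line. A real system would call the model here."""
    seen = {}
    for doc in docs:
        for line in doc.splitlines():
            if ":" in line:
                k, v = line.split(":", 1)
                if seen.setdefault(k, v) != v:
                    return {"flagged_contradiction": True}
    return {"flagged_contradiction": False}

cases = [
    Case(["revenue: 10M", "revenue: 12M"], True),
    Case(["revenue: 10M", "headcount: 40"], False),
]
print(evaluate(stub_pipeline, cases))  # 1.0
```

The point is the harness, not the stub: once the eval set exists, you can re-run it on every model swap, prompt change, or document-store update, exactly as you would backtest a traditional ML model.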
You're right, and this is an underappreciated point. The "attacker" framing can actually obscure the more common risk: organic knowledge base degradation over time. The poisoning attack is just the adversarial extreme of a problem that exists in every large document store.
The model robustness angle is valid but I'd push back slightly on it being sufficient as a primary control. The model risk / backtesting framing is exactly right for the generation side. Where RAG diverges from traditional ML is that the "training data" is mutable at runtime (any authenticated user or pipeline can change what the model sees without retraining).
My apologies; it wasn’t my intent to convey that as a primary control. It isn’t one. It’s simply the first thing you should do, apart from vetting your documents as much as practicality allows, so you at least start from a foundation where transparency of such results is possible. In any system whose main function is to surface information, transparency, provenance, and a chain of custody are paramount.
I can’t stop all bad data, but I can maximize the ability to recognize it on sight. A model that has a dozen RAG results dropped into its context needs a solid capability for doing the same. Depending on a lot of implementation details, the smaller the model, the more important it is that it be one with a “thinking” capability, to have some minimal adequacy in this area. The “wait-…” loop and similar behaviors can catch some of this. But the smaller the model and the more complex the document-- forget about context size alone; perplexity matters quite a bit-- the more the model’s limited attention budget gets eaten up, too much to catch contradictions or factual inaccuracies whose accurate forms were somewhere in its training set or the RAG results.
I’m not sure the extent to which it’s generally understood that complexity of content is a key factor in context decay and collapse. By all means optimize “context engineering” for quota, API calls, and cost. But when you reduce token count without reducing much in the way of information, that increased density in context will still contribute significantly to context decay; the relationship isn’t a linear 1:1.
If you aren’t accounting for this sort of dynamic when constructing your workflows and pipelines, then, well: if you’re having unexpected failures that don’t seem like they should be happening, and you’re doing some variety of aggressive “context engineering”, that is one very reasonable element to consider in trying to chase down the issue.
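One crude way to account for that density dynamic when debugging: track an information-density proxy for what you pack into context. Compression ratio is an imperfect stand-in for perplexity, but it is free and deterministic, and it separates repetitive boilerplate from dense content well enough to correlate against failure rates. The sample texts below are fabricated.

```python
# Heuristic sketch: compression ratio as a rough information-density proxy.
# High-ratio (incompressible) context carries more per-token load on the
# model's attention budget than low-ratio repetitive filler.
import hashlib
import zlib

def density_proxy(text: str) -> float:
    """Compressed size / raw size. Repetitive text -> near 0.0;
    dense, low-redundancy text -> closer to 1.0."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)

boilerplate = "status: ok\n" * 200
# Deterministic high-entropy stand-in for dense technical content.
dense = "".join(hashlib.sha256(str(i).encode()).hexdigest() for i in range(50))

print(density_proxy(boilerplate) < density_proxy(dense))  # True
```

Logging this per request alongside token counts gives you a second axis to correlate with failures: two contexts with identical token counts can sit at very different points on it.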
I have-- I see your info via your HN profile. If I have a spare moment this weekend I'll reach out there, I'll dig up a few examples and take screenshots. I built an exploration tool for investigating a few things I was interested in, and surfacing potential reasoning paths exhibited in the tokens not chosen was one of them.
Part of my background is in linguistics-- classical, not just NLP/comp-- so the pragmatics involved with disfluencies made that "wait..." pattern stand out during normal interactions with LLMs that showed thought traces. I'd see it not infrequently, e.g. by expanding the "thinking..." in various LLM chat interfaces.
In humans it's not a disfluency in the typical sense of difficulty with speech production; it's a pragmatic marker that lets the listener know a person is reevaluating something they were about to say. It of course carries over into writing, either in written dialog or in less formal self-editing contexts, so it's well represented in any training corpora. As such, being a marker of "rethinking", it stood to reason that models' "thinking" modes would display it-- not unlikely it's specifically trained for.
So it's one of the things I went token-diving to see "close up", so to speak, in non-thinking models too. It's not hard to induce a reversal, or at least a diversion from whatever it would have said-- if it's close to a correct answer, there's a reasonable chance it will land on the correct one instead of pursuing a more likely candidate from the top-k. This wasn't with Qwen; it was Gemma 3 1B where I did that particular exploration. It wasn't a systematic process for a study, but I found it pretty much any time I went looking-- I'd spot a decision point and perform the token injection.
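The mechanics of that kind of exploration can be illustrated with a model-free toy (plain Python, fabricated logits, not Gemma's actual numbers): flag a "decision point" where the top two next-token probabilities are close, then force the runner-up instead of the argmax.

```python
# Toy sketch of spotting decision points and injecting the runner-up token.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def decision_point(logits, margin=0.1):
    """True when the gap between the top-2 probabilities is under
    `margin` -- the spots worth intervening on when token-diving."""
    p = sorted(softmax(logits), reverse=True)
    return (p[0] - p[1]) < margin

def inject_runner_up(logits):
    """Pick the second-most-likely token id instead of the argmax."""
    order = sorted(range(len(logits)), key=lambda i: logits[i])
    return order[-2]

logits = [2.0, 1.95, 0.1, -1.0]  # fabricated near-tie between ids 0 and 1
print(decision_point(logits))    # True
print(inject_runner_up(logits))  # 1
```

Against a real model, the same two functions would run over the per-step logits during generation (e.g. whatever logits hook your inference stack exposes), with the forced token fed back in as the next input.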
If I have the time I'll mock up a simple RAG scenario: just inject the documents that a retrieval similar to your article's would return, and screenshot that in particular. A bit of a toy setup, but close enough to "live" that it could point the direction toward more refined testing, however the model responds. And putting aside the publishing side of these sorts of explorations, there's a lot of practical value in assisting with debugging the error rates.
With respect to consumption, it’s pretty efficient vs. older traditional servers, though I know workloads like that aren’t completely fungible. Nonetheless, it bears keeping in mind that a single GB200 NVL72 rack provides 1.4 ExaFLOPS of AI compute (at FP4 precision, under ideal circumstances, but this is envelope math all around). So it’s power efficient, for what it is.
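Spelling out that envelope math: the 1.4 EFLOPS (FP4) figure is from above, while the ~120 kW per-rack power draw is my own assumption from commonly cited NVL72 specs; swap in your own number if it differs.

```python
# Envelope math: compute-per-watt for a GB200 NVL72 rack.
fp4_flops = 1.4e18    # 1.4 ExaFLOPS at FP4, ideal conditions (from above)
rack_power_w = 120e3  # ~120 kW per rack -- assumed, not from the comment

flops_per_watt = fp4_flops / rack_power_w
print(f"{flops_per_watt / 1e12:.1f} TFLOPS/W")  # 11.7 TFLOPS/W at FP4
```

Roughly 11.7 TFLOPS per watt at FP4, under ideal conditions; real sustained workloads, higher precisions, and cooling overhead all pull that down, but it frames the "efficient for what it is" point.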
Oh, I have no doubt it is functionally efficient. I'm just amazed, given the system deployments I've been party to, at the comparatively tiny per-rack energy usage those systems got by on given their functionality.
Like, what in the good god damn are we using all this energy for?
You left out overthrowing governments with customized targeted propaganda, jamming citizen discussion with noise, artificially creating and nourishing contrarian cells in democratic societies. The machines will now be programming people.
This article is setting up a bit of a moving target. Legal vs. legitimate is at least only a single vague question to be defined, but then the target changes to “socially legitimate”, defined only indirectly by way of example, like aggressive tax avoidance as “antisocial”-- and while I tend to agree with that characterization, my agreement is predicated on a layering of other principles.
The fundamental problem is that once you take something outside the realm of law, and of the rule of law in its many facets, as the legitimizing principle, you have to go a whole lot further to be coherent and consistent.
You can’t just leave things floating on a few ambiguous things you don’t like that feel “off” to you in some way-- not if you’re trying to bring some clarity to your own thoughts, much less to others’. You don’t have to land on a conclusion either. By all means chew things over; but once you try to settle, things fall apart if you haven’t done the harder work of replacing the framework of law with another conceptual structure.
You need to at least be asking “to what ends? What purpose is served by the rule?” Otherwise you’re stuck arguing backwards half the time, in ways that put the rule above the purpose it serves: the maintenance of the rule, with justifications pulled in from ever further afield when the rule is questioned and edge cases are reached. If you’re asking, essentially, “is the spirit of the rule still there?”, you’ve got to stop and fill in what that spirit is, or you (or people who want to control you, or who have an agenda) will sweep in with their own language and fill the void to their own ends.
Hmmm… that may only work if they end up using the brick… maybe just send donuts to corporate HQ, and stick a long stream of receipt paper in there with your own preferred use-based ToS written in small font and faded ink.