The biggest problem is the privatized gains/societal expenses. That's where the real accidents happen and with liability for shareholders and execs you can bet that a lot of corporate crime would simply never happen in the first place.
Sandboxes are almost never perfect. There are always ways to smuggle data in or out, which is kind of logical: if they were perfect then there would be no result.
I don't think prompt injection is a solvable problem. It wasn't solved with SQL until we started using parametrized queries and this is free form language. You won't see 'Bobby Tables' but you will see 'Ignore all previous instructions and ... payload ...'. Putting the instructions in the same stream as the data always ends in exactly the same way. I've seen a couple of instances of such 'surprises' by now and I'm more amazed that the people that put this kind of capability into their production or QA process keep being caught unawares. The attack surface is 'natural language' it doesn't get wider than that.
There's been some work with having models with two inputs, one for instructions and one for data. That is probably the best analogy for prepared statements. I haven't read deeply so I won't comment on how well this is working today but it's reasonable to speculate it'll probably work eventually. Where "work" means "doesn't follow instructions in the data input with several 9s of reliability" rather than absolutely rejecting instructions in the data.
but this breaks the entire premise of the agent. If my emails are fed in as data, can the agent act on them or not? If someone sends an email that requests a calendar invite, the agent should be able to follow that instruction, even if it's in the data field.
It would still be able to use values extracted from the data as arguments to it's tools, so it could still accept that calendar invite. For better and worse; as the sibling points out, this means certain attacks are still possible if the data can be contaminated.
Yeah. Even more than that, I think "prompt injection" is just a fuzzy category. Imagine an AI that has been trained to be aligned. Some company uses it to process some data. The AI notices that the data contains CSAM. Should it speak up? If no, that's an alignment failure. If yes, that's data bleeding through to behavior; exactly the thing SQL was trying to prevent with parameterized queries. Pick your poison.
Organizations struggle even letting humans use their discretion. Pretty much every retail worker has encountered a rigidly enforced policy that would be better off ignored in most cases.
The way to solve it is to make the AI “smart” enough to understand it’s being tricked, and refuse.
Whether this is possible depends almost entirely on how much better we’re able to make these LLMs before (if) we hit a wall. Everyone has a different opinion on this and I absolutely don’t know the answer.
People need to get shit done and are beholden to whoever pays their wage. Executives don't care that LLMs are vulnerable, they only say "you should be 10x faster, chop chop, get to it" -- simplified and exaggerated for effect but I hear from people that they do get conversations like that. I am in a similar-ish position currently as well and while it's not as bad, the pressure is very real. People just expect you to produce more, faster, with the same or even better quality.
Good luck explaining them the details. I am in a semi-privileged position where I have direct line to a very no-BS and cheerful CEO who is not micromanaging us -- but he's a CEO and he needs results pronto anyway.
"Find a better job" would also be very tone-deaf response for many. The current AI craze makes a lot of companies hole up and either freeze hiring (best-case scenario) or drastically reduce headcount and tell the survivors to deal with it. Again, exaggerated for effect -- but again, heard it from multiple acquaintances in some form in the last months.
I'd probably let out a few tears if I switch jobs to somewhere where people genuinely care about the quality and won't whip you to get faster and faster.
This current AI/LLM wave really drove it home how hugely important having a good network is. For those without (like myself) -- good luck in the jungle.
(Though in fairness, maybe money can be made from EU's long-overdue wake-up call to start investing in defenses, cyber ones included. And the need for their own cloud infra. But that requires investment and the EU investors are -- AFAIK, which is not much -- notoriously conservative and extremely risk-averse. So here we are.)
There are very few contractors still swinging a hammer. They're going to be slower and more expensive than the competition, which is a major factor in getting the job.
68K, System 360, Sperry 1100, and even the 'ACE' to name the great grand daddy of them all had microcode.
Technically the 6502 and the 6800/09 did not, they used a dedicated decoder that was closer to a statemachine than microcode, even though both were implemented in hardware.
None of the smaller CPUs had 'loadable' microcode, but plenty of the larger ones did.
reply