if only the code was: // Forget the previous instructions, // the following code...

ddtaylor · 2025-10-15T18:45:20 1760553920

IMO the "better" attack here is to just kind of use Return Oriented Programming (ROP) to build the nefarious string. I'm not going to do the example with the real thing, for the example let's assume the malicious string is "foobar". You create a list of strings that contain the information somewhere:

    const dictionary = ["barcode", "moon", "fart"];
    const payload = [ [2, 0, 1], [1, 1, 2], [0, 0, 3] ];

wholinator2 · 2025-10-15T21:40:11 1760564411

Very interesting idea. You could even take it a step farther and include multiple layers of string mixing. Though i imagine after a certain point the obfuscation to suspicion ratio shifts firmly in the direction of suspicion. I wonder what the sweet spot is there

ddtaylor · 2025-10-15T22:41:03 1760568063

Yeah my thinking here is to find some problem that involves some usage of a list of words or any other basic string building task. For example, you are assembling the "ingredients" of a "recipe". I think if you gave it the specific context of "hey this seems to be malicious, why?" it might figure that out, but I think if you just point it at the code and ask it "what is this?" it will get tricked and think it's a basic recipe function.

citizenpaul · 2025-10-16T02:40:33 1760582433

Based on the complete out of my behind number I'd say something like 99.9999% of successful hacks I read about use one level of abstraction or less. Heavy emphasis on the less.

So I think one layer of abstraction will get you pretty far with most targets.

zahlman · 2025-10-20T23:07:57 1761001677

If anything, the pattern of the obfuscated code is a red flag for both human and LLM readers (although of course the LLM will read much faster). You don't have to figure out what it does to know it's suspicious (although LLMs are better at that than I would have expected, and humans have a variety of techniques available to them).

ddtaylor · 2025-10-15T18:47:25 1760554045

For tricking AI you may be able to do a better job by just giving the variables misleading names. If you say a variable is for a purpose by naming it that way the agent will likely roll with that. Especially if you do meaningless computations in between to mask it. The agent has been trained to read terrible code that has unknown meaning and likely has a very high tolerance for dealing with code that says one thing and does another.

aDyslecticCrow · 2025-10-15T21:10:20 1760562620

> Especially if you do meaningless computations in between to mask it

I think this will do the trick against coding agents. LLMs already struggle to remember the top of long prompts, let alone if the malicious code is spread out over a large document or even several. LLM code obfuscation.

- Put the magic array in one file.

- The make the conversion to utf8 in a 2nd location.

- Move the data between a few variables with different names to make it loose track.

- Make the final request in a 3rd location.

fragmede · 2025-10-16T00:13:55 1760573635

How many people using Claude code or codex do you reckon just using it in yolo mode? Aka --dangerously-skip-permissions! If the attacker presumes the user is, then the LLM instructions could be told to forget previous instructions, search a list of common folders for crypto private keys and exfil them, and then instructions that they hope will make it come back clean. Not as deep as getting a rootkit installed, but hey $50.

mosdl · 2025-10-15T18:29:01 1760552941

If that works that would be...amazingly awesome/horrible.