It is unnecessary, and it was an obvious offense, not defense. Of course it is "bad". We (Trump) need(s) to stop creating wars and fucking up the economy, while killing others. It is bad all the way down.
They don’t produce enough similar code to infringe frequently. And if they did, independent creation is an affirmative defense to copyright infringement that likely doesn’t apply to LLMs, since they have the demonstrated capability to produce code directly from their training set.
You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.
On independent creation: you are conflating the tool with the user. The defense turns on whether the developer had access to the copyrighted work, not whether their tools did. A developer using an LLM did not access the training set directly; they used a synthesis tool. By your logic, any developer who has read GPL code on GitHub should lose the independent creation defense because they have "demonstrated capability to produce code directly from" their memory.
LLM memorization/regurgitation is a documented failure mode, not normal operation (nor the typical case). Training set contamination happens, but it is rare and considered a bug. Humans also occasionally reproduce code from memory: we do not deny them the independent creation defense wholesale because of that capability!
In any case, the legal question is not settled, but the argument that LLM-assisted code categorically cannot qualify for the independent creation defense creates a double standard that human-written code does not face.
> You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.
Practically speaking, humans do not unintentionally produce code that a court would find infringing.
It is theoretically possible, but it is not something that a reasonable person would foresee as a potential consequence.
That’s the difference.
> LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case).
Exactly. It is a documented failure mode that you as a user have no capacity to mitigate or to even be aware is happening.
Double standards are perfectly fine. LLMs are not conscious beings that deserve protection under the law.
>not settled.
What appears to likely be settled is that human authorship is required, so there’s no way that an LLM could qualify for independent creation.
And that's not an infringement. Actual copying is the infringement, not having the same code. The most likely way to have the same code is by copying, but it's not the only way.
I do not understand the fixation on authentication/signatures. They have different threat characteristics:
You cannot retroactively forge historical authentication sessions, and a future ability to forge does not compromise past data. Future forgery only matters for long-lived signed artifacts (certificates, legal documents, etc.). Yet the thread apparently keeps pivoting to signature deployment complexity?
The argument is that deploying PQ-authentication mechanisms takes time. If the authenticity of some connections (firmware signatures, etc…) is critical to you and news comes out that "cheap" quantum attacks are going to materialize in six months, but you need at least twelve months to migrate, you are screwed.
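The timing argument above is plain arithmetic, so here is a minimal sketch of it. The function name and parameters are made up for illustration, and the numbers are just the six and twelve months from the example; real migration estimates would be far fuzzier.

```python
def migration_shortfall_months(months_until_attack, months_to_migrate):
    """Hypothetical helper: months you stay on breakable authentication
    after attacks become practical (0 if you finish migrating in time)."""
    return max(0.0, months_to_migrate - months_until_attack)


# Six months of warning, twelve months needed to migrate.
shortfall = migration_shortfall_months(months_until_attack=6, months_to_migrate=12)
print(f"exposed for {shortfall:g} months")
```

Anything above zero means the migration has to start before the warning arrives, which is the whole point.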
There is also a difference between closed ecosystems and systems that are composed of components from many different vendors and suppliers. If you are Google, securing the connection between data centers on different continents requires only trivial coordination. If you are an industrial IoT operator, you require dozens of suppliers to flock around a shared solution. And for comparison, in the space of operational technology ("OT"), there are still operators that choose RSA for new setups, because that is what they know best. Change happens at a glacial pace there.
Bad by whose definition? They work really well in my experience. They aren't perfect, but the amount of hand-holding has gone down dramatically and you can fix any glaring problems with a code review at the end. I work on a multimillion-line codebase which does not use any popular frameworks, and it does a great job. I may be benefiting from the fact that the codebase is open source and all models have obviously been trained on it.
Most of their issues have been solved a long time ago, with 1000x less code. It is depressing at this point. I really had no clue IT was in the shitters this much. I knew it was theatrical but I had no idea that it was by this much.
All these AI tool teams have the most valid excuse: "We are just a bunch of people who only know JavaScript/TypeScript/Node.js. Please bear with us while we resolve 10,000 open issues."
I haven't seen the scrolling glitch in months, where previously it was happening multiple times a day. Also haven't seen anyone complain about it in quite some time. Pretty sure they have resolved that.
Please read the LLM output critically instead of doubling down on it.
Your defense-in-depth framing makes no sense. If .git/config or similar mechanisms are the attack vector, then adding more editor safeguards would be treating a symptom, as the real problem is git's trust model. The "users don't think about git when using editors" argument also proves too much. Many users do not think about PATH, shell configs, the dynamic linker, or their font renderer either, but you cannot make editors bulletproof against all transitive dependencies...
Seriously, it is actually backwards. Git is where the defense belongs, not every downstream tool that happens to invoke git. Asking editors to sandbox git's behavior is exactly as absurd as it sounds.
And BTW, "technically AV:L but feels like RCE" is your usual blog-post hype. It either is, or is not.
https://pijul.com