Spyke

The title of the article is so extraordinarily wrong that it amounts to clickbait.

There is no "yes to copilot".

It is only a formalization of what Linux said before: All AI is fine but a human is ultimately responsible.

" AI agents cannot use the legally binding "Signed-off-by" tag, requiring instead a new "Assisted-by" tag for transparency"

The only mention of copilot was this:

"developers using Copilot or ChatGPT can't genuinely guarantee the provenance of what they are submitting"

This remains a problem that the new guidelines don't resolve. Because even using AI as a tool and having a human review it still means the code the LLM output could have come from non GPL sources.

marlowe221 reply

Yeah, that’s also my question. Partially because I am a former-lawyer-turned-software-developer… but, yeah. How are the kernel maintainers supposed to evaluate whether a particular PR contains non-GPL code?

Granted, this was potentially an issue before LLMs too, but nowhere near the scale it will be now.

(In the interests of full disclosure, my legal career had nothing to do with IP law or software licensing - I did public interest law).

Alex reply

They don't, just like they don't with human submitted stuff. The point of the Signed-off-by is the author attests they have the rights to submit the code.

ell1e reply

Which I'm guessing they cannot attest, if LLMs truly have the 2-10% plagiarism rate that multiple studies seem to claim. It's an absurd rule, if you ask me. (Not that I would know, I'm not a lawyer.)

Alex reply

Where are you seeing the 2-10% figure?

In my experience code generation is most affected by the local context (i.e. the codebase you are working on). On top of that a lot of code is purely mechanical - code generally has to have a degree of novelty to be protected by copyright.

ell1e reply

If you had a contributor that plagiarized at a 2-10% rate, would you really go "eh, it has to have a degree of novelty to be a problem" rather than just ban them? The different standards baffle me sometimes.

You can find various rates mentioned here: https://dl.acm.org/doi/10.1145/3543507.3583199 and here: https://www.theatlantic.com/technology/2026/01/ai-memorization-research/685552/

Alex reply

If the 2-10% is just boilerplate syscall number defines or trivial MIN/MAX macros then it's just the common way to do things.

ell1e reply

So do you want to legally review every line by an LLM to see if it meets the fair use criterion, since you have to assume it was probably stolen? And would you do this for a known plagiarizing human contributor too...?

Danquebec reply

Imagine how broken it would be otherwise. The first person to write a while loop in any given language would be the owner of it. Anyone else using the same concept would have to write an increasingly convoluted while loop with extra steps.

sloppy_diffuser reply

Anyone else using the same concept would have to write an increasingly convoluted while loop with extra steps.

Sounds like an origin story for recursion.

If it's flagged as "assisted by" then it's easy to identify where that code came from. If a commercial LLM is trained on proprietary code, that's on the AI company, not on the developer who used the LLM to write code. Unless they can somehow prove that the developer had access to said proprietary code and was able to personally exploit it.

If AI companies are claiming "fair use," and it holds up in court, then there's no way in hell open-source developers should be held accountable when closed-source snippets magically appear in AI-assisted code.

Granted, I am not a lawyer, and this is not legal advice. I think it's better to avoid using AI-written code in general. At most use it to generate boilerplate, and maybe add a layer to security audits (not as a replacement for what's already being done).

But if an LLM regurgitates closed-source code from its training data, I just can't see any way how that would be the developer's fault...

sem reply

piefed.blahaj.zone

Pretty convenient.

This is how copyleft code gets laundered into closed source programs.

All part of the plan.

How would they launder it? Just declare it their own property because a few lines of code look similar? When there's no established connection between the developers and anyone who has access to the closed-source code?

That makes no sense. Please tell me that wouldn't hold up in court.

lagoon8622 reply

Please tell me that wouldn't hold up in court.

First tell us how much money you have. Then we'll be able to predict whether the courts will find in your favor or not

Sad but true...

sem reply

piefed.blahaj.zone

First of all, who is going to discover the closed-source use of GPL code and file a lawsuit anyway?

Second, the LLM ingests the code, and then spits it back out, with maybe a few changes. That is how it benefits from copyleft code while stripping the license.

Maybe a human could do the same thing, but it would take much longer.

Wait, did you just move the goalposts? I thought the issue we were talking about was open-source developers who use LLM-generated code and unwittingly commit changes that contain allegedly closed-source snippets from the LLM's training data.

Now you want to talk about LLM training data that uses open-source code, and then closed-source developers commit changes that contain snippets of GPL code? That's fine. It's a change of topic, but we can talk about that too.

Just don't expect what I said before about the previous topic of discussion to apply to the new topic. If we're talking about something different now, I get to say different things. That's how it works.

sem reply

piefed.blahaj.zone

I was responding specifically to this part

But if an LLM regurgitates closed-source code from its training data, I just can't see any way how that would be the developer's fault...

showing what would happen when the LLM regurgitates open-source code into closed-source projects.

Sorry if you didn't like that.

ricecake reply

I believe what they're referring to is the training of models on open source code, which is then used to generate closed source code.
The break in connection you mention makes it not legally infringement, but now code derived from open source is closed source.

Because of the untested nature of the situation, it's unclear how it would unfold, likely hinging on how the request was formed.

We have similar precedent with reverse engineering, but a non-sentient tool doing it makes it complicated.

That makes sense. I see the problem with that, and I don't have a good solution for it. It is a divergence of topic though, as we were discussing open-source programmers using LLMs which are potentially trained on closed-source code.

Training LLMs on open-source code is worth its own discussion, but I don't see how it fits in this thread. The post isn't about closed-source programmers using LLMs.

Besides, closed-source code developers could've been stealing open-source code all along. They don't really need AI to do that.

Still, training LLMs on open-source code is a questionable practice for that reason, particularly when it comes to training commercial models on GPL code. But it's probably hard to prove what code was used in their datasets, since it's closed-source.

ricecake reply

I don't really see it as a divergence from the topic, since it's the other side of a developer not being responsible for the code the LLM produces, like you were saying.
In any case, it's not like conversations can't drift to adjacent topics.

Besides, closed-source code developers could've been stealing open-source code all along. They don't really need AI to do that.

Yes, but that's the point of laundering something. Before, if you put FOSS code in your commercial product, a human could be deposed in a lawsuit and make it public, and then there would be consequences. Now you can openly do so and point at the LLM.

People don't launder money so they can spend it, they launder money so they can spend it openly.

Regardless, it wasn't even my comment, I just understood what they were saying and I've already replied way out of proportion to how invested I am in the topic.

anarchiddy reply

Yup.

I would also just point out that this doesn't change the legal exposure of the Linux kernel to infringing submissions from before the advent of LLMs.

lechekaflan reply

The title of the article is so extraordinarily wrong that it amounts to clickbait.

It's the pain in the ass with some of those fucking tech/video/showbiz news outlets and then rules in some fora where you cannot make "editorialized" post titles, even though it's so tempting to correct the awful titling.

Fmstrat reply

Because even using AI as a tool and having a human review it still means the code the LLM output could have come from non GPL sources.

I get why they are passing this by though, since you don't know the provenance of that Stack Overflow snippet, either.

scarabic reply

That’s probably why they say “a human is responsible” not “a human must validate it.” I certainly agree that validation is not always possible. And this problem will get worse in time.

TheOctonaut reply

piefed.zip

the LLM output could have come from non-GPL sources

That's fundamentally not how LLMs work; it's not a database of code snippets.

BradleyUffner reply

"Derivative works"

theherk

Seems like a reasonable approach. Make people be accountable for the code they submit, no matter the tools used.

ell1e reply

If the accountability cannot be practically fulfilled, the reasonable policy becomes a ban.

What good is it to say "oh yeah you can submit LLM code, if you agree to be sued for it later instead of us"? I'm not a lawyer and this isn't legal advice, but sometimes I feel like that's what the Linux Foundation policy says.

ViatorOmnium reply

But this was already the case. When someone submitted code to Linux they always had to assume responsibility for the legality of the submitted code, that's one of the points of mandatory Signed-off-by.

badgermurphy reply

-14

Traister101 reply

lemmy.today

That's their problem. If they are using an LLM and cannot verify the output, they shouldn't be using an LLM.

jj4211 reply

Problem is that broadly most GenAI users don't take that risk seriously. So far no one can point to a court case where a rights holder successfully sued someone over LLM infringement.

The biggest chance is Getty and their case, with very blatantly obvious infringement. They lost in the UK, so that's not a good sign.

FauxLiving reply

Most GenAI users do not submit code to the Linux kernel project.

hperrin reply

So why invite them to?

hperrin reply

Nobody can verify that the output of an LLM isn’t from its training data except those with access to its training data.

ell1e reply

I agree. However, I think the natural conclusion is an LLM ban. See also here.

badgermurphy reply

-8

Traister101 reply

lemmy.today

Frankly, I expect the kernel dudes to be pretty good about this. Their style guides alone are quite strict, and any funny business in a PR that isn't marked correctly would, I think, likely mean a ban from making PRs at all. How it worked beforehand, as others have already stated, is that the author says "I promise this follows the rules" and that's basically the end of it. Giving an official avenue for generated code is a great way to reduce the negatives of it that'll happen anyway. We know this from decades of real-life experience trying to ban things like alcohol or drugs: time after time, providing a legal avenue with some rules makes things safer. Why wouldn't we see a similar effect here?

badgermurphy reply

I do think that some projects will fare better than others, particularly ones like you mentioned, where the team is robust and capable of handling the filtering of increased submissions from these new sources.

I believe we are going to end up having to see some new mechanism for project submissions to deal with the growing imbalance between submission volume and work hours available for review, as became necessary when viruses, malware, and spam first came into being. It has quickly become incredibly easy for anyone to make a PR, but not at all easier to review them, so something is going to have to give in the FOSS world.

scarabic reply

What accountability has there been for bad code by humans?

hperrin reply

No, it’s not a reasonable approach. Making people be the authors of the code they submit is reasonable, because then it can be released under the GPL. AI generated code is public domain.

theherk reply

I suppose there should be no code generators, assemblers, compilers, linkers, or LSPs then either? Just etching 1’s and 0’s?

hperrin reply

The copyright office has made it explicitly clear that those tools do not interfere with the traditional elements of authorship, and that the use of LLMs does. So, if you don’t want to take my word for it, take the US Copyright Office’s word for it.

theherk reply

As the agency overseeing the copyright registration system, the Office has extensive experience in evaluating works submitted for registration that contain human authorship combined with uncopyrightable material, including material generated by or with the assistance of technology. It begins by asking “whether the ‘work’ is basically one of human authorship, with the computer [or other device] merely being an assisting instrument, or whether the traditional elements of authorship in the work (literary, artistic, or musical expression or elements of selection, arrangement, etc.) were actually conceived and executed not by man but by a machine.” In the case of works containing AI-generated material, the Office will consider whether the AI contributions are the result of “mechanical reproduction” or instead of an author’s “own original mental conception, to which [the author] gave visible form.” The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work. This is necessarily a case-by-case inquiry.

If a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it. For example, when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the “traditional elements of authorship” are determined and executed by the technology—not the human user. Based on the Office’s understanding of the generative AI technologies currently available, users do not exercise ultimate creative control over how such systems interpret prompts and generate material. Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output. For example, if a user instructs a text-generating technology to “write a poem about copyright law in the style of William Shakespeare,” she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare’s style. But the technology will decide the rhyming pattern, the words in each line, and the structure of the text. When an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship. As a result, that material is not protected by copyright and must be disclaimed in a registration application.

In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that “the resulting work as a whole constitutes an original work of authorship.” Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are “independent of ” and do “not affect” the copyright status of the AI-generated material itself.

This policy does not mean that technological tools cannot be part of the creative process. Authors have long used such tools to create their works or to recast, transform, or adapt their expressive authorship. For example, a visual artist who uses Adobe Photoshop to edit an image remains the author of the modified image, and a musical artist may use effects such as guitar pedals when creating a sound recording. In each case, what matters is the extent to which the human had creative control over the work’s expression and “actually formed” the traditional elements of authorship.

— https://www.copyright.gov/ai/ai_policy_guidance.pdf

What this makes clear is that it certainly isn’t black or white as you say. Nevertheless, automation converting an input to an output simply cannot be the only mechanism used in determining authorship.

And that wouldn’t change my statement anyway, but rather supports it. The person submitting a patch must be accountable for its contents.

An outright ban would need to carefully define how an input gets converted to an output, and that may not be so clear. To be effectively clear, one would potentially have to end the use of many tools that have been used for many years in the kernel, including snippet generation, spelling and grammar correction, and IDE autocompletion. So such a reductive view simply will not suffice.

Additionally, copyrightability and licensability are wholly different questions. And it does not violate the GPL to include public domain content, since the license applies to the aggregate work.

hperrin reply

If a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it. For example, when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the “traditional elements of authorship” are determined and executed by the technology—not the human user. Based on the Office’s understanding of the generative AI technologies currently available, users do not exercise ultimate creative control over how such systems interpret prompts and generate material. Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output. For example, if a user instructs a text-generating technology to “write a poem about copyright law in the style of William Shakespeare,” she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare’s style. But the technology will decide the rhyming pattern, the words in each line, and the structure of the text. When an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship. As a result, that material is not protected by copyright and must be disclaimed in a registration application.

That seems very clear to me. Generative AI output is not human authored, and therefore not copyrighted.

The policy I use also makes very clear the definition of AI generated material:

https://sciactive.com/human-contribution-policy/#Definitions

I’m not exactly sure how you can possibly think there is an equivalence between a tool like a spelling and grammar checker and a generative AI, but there’s a reason the copyright office will register works that have been authored using spelling and grammar checkers, but not works that have been authored using LLMs.

theherk reply

Just read the next two paragraphs. Don’t just stop because you got to something that you like. The equivalence I draw is clear. You don’t like it, and that’s okay. But one would have to clarify exactly what the ban entails, and that wouldn’t be as clear as you might think. LLMs only? Transformers specifically? What about graph generation or other ML models? Is it just ML? If so, is that because a matrix lattice was used to get from input to output? Could other deterministic math functions trigger the same ban? What if a spell checker used an RNG to select the best replacement from a list of correct options? What if a compiler introduces an assembled output with an optimization not of the author’s writing?

Do you see why they say “The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work. This is necessarily a case-by-case inquiry”?

And that still affects copyrightability, not license compliance.

hperrin reply

Do you want to explain to me what, in those two paragraphs, means that the use of spell checkers and LLMs is equivalent with regard to copyrightability? It seems like those paragraphs make it clear that the use of spell checkers is not the same as LLMs.

The policy I use bans “generative AI model” output. Generative AI is a pretty well defined term:

https://en.wikipedia.org/wiki/Generative_AI

https://www.merriam-webster.com/dictionary/generative%20AI

If you have trouble determining whether something is a generative AI model, you can usually just look up how it is described in the promotional materials or on Wikipedia.

Type: Large language model, Generative pre-trained transformer

- https://en.wikipedia.org/wiki/Claude_(language_model)

I never said it violates GPL to include public domain code. I’m not sure where you got that from. What I said is that public domain code can’t really be released under the GPL. You can try, but it’s not enforceable. As in, you can release it under that license, but I can still do whatever I want with it, license be damned, because it’s public domain.

I did that with this vibe coded project:

https://github.com/hperrin/gnata

I just took it and rereleased it as public domain, because that’s what it is anyway.

scarabic reply

Also, having buttons on your clothes is an abomination. Hooks and eyes only.

ziproot reply

Isn’t that the rule? The author has to be a human?

The new guidelines mandate that AI agents cannot use the legally binding "Signed-off-by" tag, requiring instead a new "Assisted-by" tag for transparency. Ultimately, the policy legally anchors every single line of AI-generated code and any resulting bugs or security flaws firmly onto the shoulders of the human submitting it.
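
For anyone wondering what checking those tags could look like mechanically, here is a minimal sketch in Python. It is not the kernel's actual tooling. The tag names "Signed-off-by" and "Assisted-by" come from the quoted article; the value format, the example names, the `ai_tools` list, and the helper functions `parse_trailers` and `check_tags` are all assumptions made up for illustration.

```python
import re

# Matches "Key: value" trailer lines, e.g. "Signed-off-by: Jane <jane@example.org>".
TRAILER_RE = re.compile(r"^(?P<key>[A-Za-z][A-Za-z-]*):\s*(?P<value>.+)$")


def parse_trailers(message: str) -> dict[str, list[str]]:
    """Collect "Key: value" lines from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    trailers: dict[str, list[str]] = {}
    if not paragraphs:
        return trailers
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            key = match["key"].lower()
            trailers.setdefault(key, []).append(match["value"].strip())
    return trailers


def check_tags(message: str, ai_tools: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the tags look plausible.

    ai_tools is a hypothetical set of lowercase tool names that must never
    appear in a Signed-off-by line.
    """
    trailers = parse_trailers(message)
    problems = []
    sign_offs = trailers.get("signed-off-by", [])
    if not sign_offs:
        problems.append("missing Signed-off-by from a human submitter")
    for value in sign_offs:
        if any(tool in value.lower() for tool in ai_tools):
            problems.append(f"Signed-off-by names an AI tool: {value}")
    return problems


# Hypothetical commit message; the tool name and the person are invented.
message = """mm: fix off-by-one in example helper

Explain the change here.

Assisted-by: ExampleLLM
Signed-off-by: Jane Developer <jane@example.org>
"""

print(check_tags(message, {"examplellm"}))
```

A check like this only confirms that the tags are present and that no known tool name signed off. It says nothing about where the code came from, which is the part a human still has to answer for.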

hperrin reply

If the author is an LLM, then the author is not a human.

scarabic reply

There is a difference between authoring and submitting, right?

sonofearth

I am the c/fuck_ai person, but at this point I have made peace with the fact that we can't avoid it. I still don't want it to do artsy stuff (image gen, video gen), and I don't want it used blindly in critical stuff, because humans are the ones that should be doing that or at least providing constant oversight. I think the team's logic is correct here, because there is no way to know if the code is from an LLM or a human unless something there screams LLM or the contributor explicitly mentions it. Mandating the latter seems like a reasonable move for now.

DaleGribble88 reply

I consider myself to be more pro AI than not, but I'm certainly not a zealot and mostly agree with the take that it shouldn't be used in artistic pursuits. However, I love using AI to help me create art. It can give great critiques, often good advice on how to improve, and is great for rapid experimentation and prototyping. I actually used it this weekend to see what a D&D mini might look like with different color schemes before painting it. I could have done the same with Gimp, but it would have taken much longer for worse results, and it was ultimately just for a brainstorming session. How do you feel about my AI usage from your perspective? I suppose from an energy conservation perspective, all of it was bad, but I'm more interested in a less trivial take.

sonofearth reply

Yes the energy consumption is bad. My main gripe about LLM generated art is that it will not be original. It will use its training data from uncredited artworks to generate it. Art usually is made by humans to express something or convey something in a creative way. LLMs fail at that. What LLMs can actually be helpful at is making learning art more accessible to everyone. Art schools or private art classes can be expensive. This lowers the barrier to entry.

As for you using generated art: it might be really beautiful, but it will be very difficult to maintain that style and even more difficult to convince anyone that it is your style. The artist doesn’t get much recognition with LLM generated art. Using it as a critic also seems stupid, because LLMs will always try to give an objective view rather than a subjective one. Your art won’t trigger an emotion in it, and it might say it is bad or “do this to make it more understandable”, and that’s where you lose as an artist.

My mom likes to paint as a hobby. What she does is search for stuff on Pinterest (which is mostly LLM generated). She uses it as inspiration to do it in her own style and maybe give it some spin. She keeps all of it for herself.

MeekerThanBeaker reply

I'm a writer. I got paid to write on a few things here and there, but mostly there are just huge barriers for people without connections.

I plan on using AI to turn my writing into a visual animated format for people to consume. I don't much care about the style of art, I just want my work to be seen. I can't afford to pay for artists. If I could, I would. But at least, this would give me an opportunity to show my work without some execs saying no a hundred times.

When I look at the art for cartoons in the 70s/80s, there is so much crap animation with mistakes and duplications, you would think it's "a.i. slop." I understand that these were done overseas, pumped out quickly so quality control was overlooked for speed... but it wasn't the animation I was interested in, it was the stories and characters.

I still think original artists will continue to exist. A.I. is just another tool. People will get bored of the same old stuff and want originality. I really hope it'll make our lives better in the long run, but we're just in the weird middle stage of A.I. crawling before running.

sonofearth reply

I can’t afford to pay for artists

You can afford LLMs right now because all of the LLM companies are losing money on it. If they decide they want to make a profit, they will raise their prices significantly. So you still end up in the same situation. You don’t have much control over what an LLM spits out, while with manual animation you have total control, or can at least sit with an actual animator to make it look how you envision it.

I plan on using AI to turn my writing into a visual animated format for people to consume.

What makes you think that people will respond the same way and in the same numbers to LLM generated animation as they would if it were crafted by an artist? I reckon the response will be much lower. I see it on YouTube constantly. I watched a video about a topic, then I got recommended something related to it from a different channel. Guess what? The script and the animation were so damn similar, and the shit they were spewing wasn’t even true in the end. Everything that both channels made was slop. Sure, they spit out more content than conventional methods, got a few thousand views on each video, and made decent money on it. But they aren’t gonna last long if they want audience retention.

Since then I have been more mindful of which videos I click on, going to the extent of disabling recommendations and watch history.

MeekerThanBeaker reply

I have downloaded my own LLM that can be used on my own computer... So the only cost is electricity since I upgraded my computer before the prices went to shit. Newegg even gave me free RAM with the purchase of a motherboard so I lucked out on that. Storage is not an issue too since I got that back in 2024 knowing Trump would fuck everything up.

And no, people might not respond the same way to my work, but then again I'm not taking any work away from anyone else because then it would not even exist. If you want to fund me and the artist for our work, then okay. Show me the money.

One thing I've noticed is that I see many more people complain about slop than slop itself. It's so annoying at this point that it's making me go in the opposite direction. Hey everyone, slop here... Microsoft slop here... Use Linux Linux Linux. Slop slop slop. Sloppy joes. It's like candlestick makers complaining to Nikola Tesla.

Cataphract reply

Another great example of how AI is just wreaking havoc on people's brains.

Wants to show an enticing product to execs, doesn't want to invest in paying an artist
realizes they have to have connections but doesn't want to network
wants recognition of their hard work, hasn't sought out a community or collaboration but states "show me the money"

AI will fix everything for me! Slop doesn't exist! (ignores the very article we're in, any platform algorithm feed, the US president shitposting, all the slop that gets presented here). Go get em Nik, don't let haters stop your brilliance.

MeekerThanBeaker reply

A very extreme takeaway, but okay.

sonofearth reply

my own LLM that can be used on my own computer

May I ask how many billion parameters it has? Because the paradox over here is:

If it is weak, then you will be getting much, much worse results than even the big models the corpos have (we don't even know how much, tbh), let alone the quality of an actual artist.
If you have a respectably powerful model, then your PC might cost thousands of dollars (even ignoring the price hikes), which eliminates the excuse for not paying an actual artist.

catlover

I'd still be highly sceptical about pull requests with code created by LLMs. Personally, what I've noticed is that the author of such a PR doesn't even read the code, and I have to go through all the slop.

kcuf reply

Ya, I'm finding myself being the bad code generator at work. I'm scattered across so many things at the moment due to attrition, and AI can do a lot of the boilerplate work, but it's such a time and energy sink to fully review what it generates. Others have caught basic things I missed, which shows the sloppiness. I usually take pride in my code, but I have no attachment to what's generated, and that's exposing issues with trying to scale out using this.

Repple (she/her) reply

Same. There’s reduction in workforce, pressure to move faster, and no good way to do that without sloppiness. I have never been this down on the industry before; it was never great, but now it’s terrible.

Danitos reply

reddthat.com

A thought I had the other day: LLMs are supposed to make us more productive, say by 20%. Have you gotten a 20% pay rise since you adopted them? I haven't.

NotEasyBeingGreen reply

slrpnk.net

Increases in productivity go to the owners, not the workers. Even imaginary increases in productivity.

Feyd reply

Just fucking stop using it? Wtf? Tell your boss to pound sand! They're going to blame you when it goes south anyway, so you might as well stay honest.

jj4211 reply

I suspect the answer will be that such large requests as you frequently see with LLM codegen will just be rejected.

Already I see changes broken up and suggested bit by bit, so I presume the same best practice applies.

terabyterex reply

Did we all forget about Stack Overflow?

People blindly copy/pasted from there all the time.

Railcar8095 reply

Couple of years back I got a PR at work that used a block of code that read a CSV, used some stream method to covert it to binary to then feed it to pandas to make a dataframe. I don't remember the exact steps it did, but was just crazy when pd.read_csv existed.

On a hunch I pasted the code in google and found an exact match on overflow for a very weird use case on very early pandas.

I'm lucky, and if people send obvious shit at work I can just cc their manager, but I feel for the volunteers at large FOSS projects, or even paid employees.

Evotech reply

Yeah people have not understood their code for centuries now

CanIFishHere

AI is here, another tool to use...the correct way. Very reasonable approach from Torvalds.

Newsteinleo reply

I don't have a problem with LLMs as much as the way people use them. My boss has offloaded all of his thinking to LLMs to the point he can't fix a sentence in a slide deck without using an LLM.

It's the people that try to use LLMs for things outside their domain of expertise that really cause the problems.

InternetCitizen2 reply

This is a big point. People need to understand that the LLMs are more like a fancy graphing calculator; they are very good and handle multiple things, but it's on you to understand why the calculation is meaningful. At a certain point no one wants to see your long division or factorial. We want the results and for students and professionals to focus on the concept.

NekoKoneko reply

I get the metaphor but it's not a great one for AI in mathematics especially. A statistical word generator is not going to perform reliable math and woe to anyone who acts otherwise.

I would call it an autistic sycophantic savant with brain damage. It's able to perform apparent miraculous feats of memory and creativity but then be unable to tell reality from fiction, to tell if even the simplest response is valid, and likely will lie about it to make itself seem more competent to please you.

If you have a use for an assistant like that, then great. But a calculator - simple and cheap and reliable - it definitely is not.

NotMyOldRedditName reply

It's the people that try to use LLMs for things outside their domain of expertise that really cause the problems.

That seems too general. I'm a mobile developer and sometimes I need a simple script outside my knowledge area. I needed to scrape a website recently, not for anything serious, but to save me time. Claude wrote it and it works. It's probably trash code, but it works and it helped. But you wouldn't want me using Claude to do important work outside my specific area of focus either, or I'm sure I'd cause problems.

Newsteinleo reply

I'm talking about people that are accountants that now think they can create software. Or engineers who think they can now write legal briefs for court.

boraginoru reply

I'm also a mobile app dev and at my workplace they're having non-mobile devs submit code to my codebases totally vibed with no understanding behind it. It's absolutely causing problems, especially for me, who is one of the only lines of defense keeping stuff even remotely maintainable.

So yes basically you're right. If people only used it to learn and do initial code review passes and other reasonable things we'd be totally fine. But that's unfortunately not the reality 🙈

NotMyOldRedditName reply

It’s absolutely causing problems, especially for me, who is one of the only lines of defense keeping stuff even remotely maintainable.

The next step is, CEO, look at how good these non-mobile devs are, they're submitting 10x the commits to the mobile repo than boraginoru our mobile dev! We should fire him and just let the backend devs keep vibe coding it!

CanIFishHere reply

Very frustrating for sure. Like any tool, it's up to humans to know when the tool is useful.

filcuk reply

Partly a marketing issue.
Companies keep advertising their new AIs as destroyers of worlds, and something that's too dangerous to even release.
As with anything else, the average user will have only the most surface-level understanding of the tool.

null reply

Clickbait got me. No mention of "Yes copilot" which I assumed was a joke anyway.

oyzmo reply

👆🏻true

NewNewAugustEast

Copilot? You mean the AI with terms of service that are in bold and explicit: "for entertainment purposes only"?

Which is why it's in the title and not the article? EntertainBait?

Zacryon reply

I suppose GitHub Copilot is meant, which is a different thing.

Senal reply

Different how, isn't github owned by microsoft ?

lepinkainen reply

There are like 70 copilots

ThinkyMcThinkface reply

Up to 81 now

Diurnambule reply

The hell. How can they expect people to understand? They plan to sell 100 things under the same name and try to sell it as one big AI when it is hundreds of different, unrelated things?

Squizzy reply

Most of those are bundled; no one is buying Copilot for OneNote, they just get it when they get the rest of that suite.

JcbAzPx reply

They've never been good at naming things, but they now seem to be going out of their way to try to be the worst with the names of their software. For instance, they named the successor to the already generically named "remote desktop protocol" "windows app".

Diurnambule reply

This one is funny. Go google windows app commands. They just fucked sysadmins

Senal reply

Ok, so there are 70-81 copilots, github is one of them.

Why is github copilot a different thing in the context of the reply that was being responded to ?

lepinkainen reply

Copilot is the harness, Claude and GPT are the models

Copilot is by far the worst harness of all the major players

Senal reply

Yes, i get that, copilot is like opencode or cursor, though perhaps with less general access to models.

There was a reply

Copilot? You mean the AI with terms of service that are in bold and explicit: “for entertainment purposes only”?

followed by

I suppose GitHub Copilot is meant, which is a different thing.

i was asking why github copilot is different in that context.

dev_null reply

Different in that it's not an AI model, it's just a tool you can use to run AI models like Claude.

Senal reply

see my reply here

bss03 reply

Source: https://lifehacker.com/tech/microsoft-copilot-for-entertainment-purposes-only

Electricd reply

Just legal stuff. Making a huge deal of it is dumb

NewNewAugustEast reply

I disagree.

Legal stuff would be Use at your own risk, or answers may not be correct.

This is really strong language.

Katherine 🪴

Linux kernel being written by Microsoft's AI.

MoogleMaestro reply

Microsoft needs to try to ruin Linux somehow, it can't just hurt windows 11 with AI slop code, it needs to expand its efforts to other systems.

nutsack reply

which is trained on free and open source code

Sylvartas reply

That will definitely not introduce some weird things when it starts feeding on itself.

Seth Taylor

Bad actors submitting garbage code aren't going to read the documentation anyway, so the kernel should focus on holding human developers accountable rather than trying to police the software they run on their local machines.

"Guns don't kill people. People kill people"

Torvalds and the maintainers are acknowledging reality: developers are going to use AI tools to code faster, and trying to ban them is like trying to ban a specific brand of keyboard.

The author should elaborate on how exactly AI is like "a specific brand of keyboard". Last I checked a keyboard only enters what I type, without hallucinating 50 extra pages. And if AI, a tool that generates content, is like "a specific brand of keyboard", does that mean my brain is also a "specific brand of keyboard"?

I get their point. If you want to create good code by having AI create bad code and then spending twice the time to fix it, feel free to do that. But I'm in favor of a complete ban.

Simulation6 reply

The keyboard thing is sort of a parable, it is as difficult to determine if code was generated in part by AI as it is to determine what keyboard was used to create it.

Miaou reply

The (very obvious) point is that this cannot be enforced. So might as well deal with it upfront.

Shayeta reply

AI is a useful tool for coding as long as it's being used properly. The problem isn't the tool, the problem is the companies who scraped the entire internet, trained LLM models, and then put them behind paywalls with no options to download the weights so that they could be self-hosted. Brazen, unaccountable profiteering off of the goodwill of many open source projects without giving anything back.

If LLMs were community-trained on available, open-source code with weights freely available for anyone to host there wouldn't be nearly as much animosity against the tech itself. The enemy isn't the tool, but the ones who built the tool at the expense of everyone and are hogging all the benefits.

cartoon meme dog reply

There are hundreds of such LLMs with published training sets and weights available on places like HuggingFace. Lots of people run their own LLMs locally, it's not hard if you have enough vram and a bit of patience to wait longer for each reply.

Electricd reply

Eh, trust me, anti AI people don’t think this much about it

Also, there are a lot of open weight models out there that are pretty good

Electricd reply

You’re the one comparing AI and guns/killing people, and then saying their metaphorical comparison isn’t accurate? Lol

ede1998 reply

Last I checked a keyboard only enters what I type

I've had (broken) keyboard "hallucinate" extra keystrokes before, because of stuck keys. Or ignore keypresses. But yeah, that means the keyboard is broken.

BigPotato reply

Wooting and Razer had a macro function that allowed Counterstrike players to set up a function to always get counter strafe. Valve decided that was a bridge too far and banned "Hardware level" exploits.

So, Valve once banned a keyboard.

bassow reply

Torvalds and the maintainers are acknowledging reality: developers are going to use AI tools to code faster, and trying to ban them is like trying to ban a specific brand of keyboard.
The author should elaborate on how exactly AI is like “a specific brand of keyboard”. Last I checked a keyboard only enters what I type, without hallucinating 50 extra pages. And if AI, a tool that generates content, is like “a specific brand of keyboard”, does that mean my brain is also a “specific brand of keyboard”?

It's about the heritage of code not being visible from the surface. I don't know about your brain.

ziproot reply

Last I checked a keyboard only enters what I type

I’m assuming the author is talking about mobile keyboards, which have autocomplete and autocorrect.

alyth reply

-3

null

lemmy.org

Ah, the solution that recognizes there's no way to eliminate AI from the supply chain after it's already been introduced.

sunbeam60 reply

You make it sound as if there was another choice if just people had better principles. Pray tell us, what would you have done, now. Not in the past, now.

-3

null reply

lemmy.org

That wasn't my intent. This is me saying, "of course that's what they're going to do because there's nothing else they can do."

sunbeam60 reply

I completely misunderstood you. I’m sorry.

Feyd reply

You're agreeing with the comment you replied to. Why the fuck are you trying to be so smug???

0ndead

“Yes to Copilot, no to AI slop”

Pick One

truthfultemporarily reply

Where does slop start? If you use auto complete and it is just adding a semicolon or some braces, is it slop? Is producing character by character what you would have written yourself slop?

How about using it for debugging?

hperrin reply

You don’t need AI to autocomplete code. We’ve had autocomplete for over 30 years.

ell1e reply

If you would have written it yourself the same way, why not write it yourself? (And there was autocomplete before the age of LLMs, anyway.)

The big problems start with situations where it doesn't match what you would have written, but rather what somebody else has written, character by character.

BoxOfFeet reply

To me, it starts at anything beyond correcting spelling for individual words or adding punctuation. I don't even want it suggesting quick reply phrases.

Is producing character by character what you would have written yourself slop?

Yes.

badgermurphy reply

There's the rub. When establishing laws and guidelines, every term must be explicitly defined. Lack of specificity in these definitions is where bad-faith actors hide their misdeeds by technically obeying the letter of the law due to its vagueness, while flagrantly violating its spirit.

It's why today, in the USA, corporations are legally people when it's convenient, and not when it's not, and the expenditure of money is government-protected "free speech".

FauxLiving reply

There is a certain brand of user (who may or may not be a human) who draws the venn of 'AI slop' and 'AI output' as a circle.

They've taken the extremist position that AI should be uninvented and any use of AI is the worst thing that could possibly happen to any project and they'll have an entire grab bag of misinformation-based memes to shotgun at you. Engaging with these people is about as productive as trying to convince a vaccine denier that vaccines don't cause autism.

I'm not saying that the user you replied to believes this, but the comment they wrote is indistinguishable from the comments of such a user.

e: I'd also like to point out that these users are very much attracted to low-effort activism. This is why you see comments like mine being heavily downvoted but not many actual replies. They want to try to influence the discussion but don't have the capability or motivation to step into the ring, so to speak, and defend their opinions.

-8

ell1e reply

It's less extremist if you look at how easily these LLMs will just plagiarize 1:1, apparently:

https://github.com/mastodon/mastodon/issues/38072#issuecomment-4105681567

Some see "AI slop" as "identified by the immediate problems of it that I can identify right away".

Many others see "AI slop" as bringing many more problems beyond the immediate ones. Then seeing LLM output as anything but slop becomes difficult.

FauxLiving reply

It's extremist to take the fact that you CAN get plagiaristic output and to conclude that all other output is somehow tainted.

You personally CAN quote copyrighted music and screenplays. If you're an artist then you also CAN produce copyright violating works. None of these facts taint any of the other things that you produce that are not copyright-infringing or plagiarized.

In this situation, and in the current legal environment, the responsibility to not produce illegal and unlicensed code is on the human. The fact that the tool that they use has the capability to break the law does not mean that everything generated by it is tainted.

Photoshop can be used to plagiarize and violate copyright too. It would be just as absurd to declare all images created with Photoshop are somehow suspect or unusable because of the capability of the tool to violate copyright laws.

The fact that AI can, when specifically prompted, produce memorized segments of the training data has essentially no legal weight in any of the cases where it has been argued. It is a fact that is of interest to scientists who study how AI represent knowledge internally and not any kind of foundation for a legal argument against the use of AI.

badgermurphy reply

Sure, but if they can be demonstrated to ever plagiarize without attribution, and the default user behavior is to pencil-whip the output, which it is, then it becomes statistically certain that users are unwittingly plagiarizing other works.

It's like using a tool that usually bakes cookies, but every once in a great while, it knocks over the building it's in. It almost never does that, though.

FauxLiving reply

Plagiarism and copyright violation are two different things, one is ethical and the other is legal.

Copyright has a body of case law which helps determine when a work significantly infringes on the copyrighted work of another. Plagiarism has no body of law at all, it is an ethical construct and not a legal one.

You can plagiarize something that has no copyright protection and you can infringe on copyright protection without plagiarizing. They're not interchangeable concepts.

In your example, some institutions would not allow such a device to operate on their property but it would not be illegal to operate and the liability would be on the person and not on the oven.

To further strain the metaphor, Linus is saying that you can use (possibly) exploding ovens, because he isn't taking a moral stance on the topic, but you are responsible for the damages if they cause any because the legal systems require that this be the case.

-1

hperrin reply

According to the US Copyright Office, AI generated material cannot be copyrighted (unless of course it’s plagiarized copyrighted code). That’s reason enough to leave it out of the kernel. If the kernel’s license becomes unenforceable because of public domain code, the kernel is tainted.

Edit: I don’t know why people are downvoting this. It’s literally just the truth: https://sciactive.com/human-contribution-policy/#More-Information

FauxLiving reply

Copyright and license terms are two different categories of law. Copyright is an idea created and enforced by the laws of the country which has jurisdiction. A license is a contract between two parties and is covered by contract law.

A thing can be unable to be protected by copyright and also protected by the terms of the license that it is provided under. If a project contains copyrighted code that does not mean that you cannot be held to the terms of the license. Your use of licensed works is granted under the agreement that you follow the terms of the license. You cannot be held liable for copyright violations for using the code, but using the code in a manner that is not allowed by the license makes you liable for violation of the contract that is the license agreement.

hperrin reply

I think you’re misunderstanding what I’m saying. Any portions of the kernel that are public domain can be used by anyone for any purpose without following the terms of the GPL. AI generated code is public domain. To make sure all parts of the kernel are protected by the GPL, public domain code should not be accepted unless absolutely necessary.

FauxLiving reply

I don't see the problem. GPL protects all of the code that is copyrighted, i.e. 100% made by humans. Accepting a submission created with AI tools doesn't change this. It's not going to be a simple task for someone who has decided to violate the GPL license to only use the generated/uncopyrighted portions without using any other GPL code and thus being subject to GPL licensing terms.

These hypothetical GPL violating people will have a hard time using lines 27-38 of ./kernel/events/ring_buffer.c to do anything even if they technically can do so without releasing their code under the GPL. If they use any piece of GPL code, at all, anywhere, their entire project is required to follow the GPL. So while they could, technically, take 27-38 of ring_buffer.c and build an entire proprietary non-GPL Linux kernel... it is, in practice, not feasible even if it is technically possible.

hperrin reply

So what happens thirty years from now when 95% of the kernel code is AI generated? It’ll be a lot easier to rewrite the parts that aren’t, and have a fully closed source kernel that you can use without following the GPL.

femtek reply

I mean I don't use copilot but a self hosted Claude at work for debugging and creating templates. I still run thru and test it. I'm only doing crossplane, kyverno, kubernetes infra things though and I started without it so I have an understanding. Now I'm running someone else's crossplane composition written in Go. I asked him about an error and he just said to get the AI to fix it, which was worrying since his last day is next week.

chilicheeselies reply

It's only slop if you accept slop. What I mean is that it can and does generate perfectly fine code. It also generates code that is ok, but needs a human touch. It also generates verbose garbage.

It's only slop if you approve the slop. It's perfectly fine to let it generate the boilerplate of what you want, and tweak it. If it's prompted well enough, you get less slop.

Ultimately I am with Linus on this one. The genie is out of the bottle. Use it responsibly.

gandalf_der_12te

discuss.tchncs.de

I agree. If AI becomes outlawed, it will simply be used without other people knowing about it.

This approach, at least, means that people will label AI-generated code as such.
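For anyone wondering what that labeling could look like mechanically, here is a minimal sketch of a commit-message check. The tag names "Signed-off-by" and "Assisted-by" come from the article quoted at the top of the thread; everything else (the function names, the value format after "Assisted-by:", and the sample commit) is made up for illustration and is not the kernel's actual tooling.

```python
import re

# A trailer is assumed to be a "Key: value" line in the last paragraph.
TRAILER_RE = re.compile(r"^(?P<key>[A-Za-z][A-Za-z-]*):\s*(?P<value>.+)$")


def parse_trailers(message: str) -> dict[str, list[str]]:
    """Collect 'Key: value' lines from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    trailers: dict[str, list[str]] = {}
    if not paragraphs:
        return trailers
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            key = match.group("key").lower()
            trailers.setdefault(key, []).append(match.group("value").strip())
    return trailers


def check_commit(message: str, ai_used: bool) -> list[str]:
    """Return a list of problems; an empty list means the tags look fine.

    ai_used is self-declared by the submitter. Nothing here can detect
    AI-written code, which is the enforcement gap discussed in the thread.
    """
    problems = []
    trailers = parse_trailers(message)
    if "signed-off-by" not in trailers:
        problems.append("missing Signed-off-by from a human author")
    if ai_used and "assisted-by" not in trailers:
        problems.append("AI tooling was used but there is no Assisted-by tag")
    return problems


# Hypothetical commit message; the names and model string are invented.
example = """ring_buffer: fix off-by-one in wrap check

Longer description of the change goes here.

Assisted-by: ExampleModel v1
Signed-off-by: Jane Developer <jane@example.org>
"""

print(check_commit(example, ai_used=True))
```

The check only confirms that the tags are present. It still relies on the human being honest about whether a tool was used, so the responsibility stays with whoever signs off.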

emmy67 reply

Maybe. There's still strong disapproval around it. I can imagine many will still hide it.

Jankatarch

Maintainers' only responsibility is to ensure quality and shouldn't have to check for rogue AI submissions.

Tho I still miss consistent fucking weather so year of the netbsd?

Electricd reply

Ensuring you don’t approve garbage, either human or AI generated, is part of quality

peacefulpixel

ell1e

Ultimately, the policy legally anchors every single line of AI-generated code

How would that even be possible? Given the state of things:

https://dl.acm.org/doi/10.1145/3543507.3583199

Our results suggest that [...] three types of plagiarism widely exist in LMs beyond memorization, [...] Given that a majority of LMs’ training data is scraped from the Web without informing content owners, their reiteration of words, phrases, and even core ideas from training sets into generated texts has ethical implications. Their patterns are likely to exacerbate as both the size of LMs and their training data increase, [...] Plagiarized content can also contain individuals’ personal and sensitive information.

https://www.theatlantic.com/technology/2026/01/ai-memorization-research/685552/

Four popular large language models—OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok—have stored large portions of some of the books they’ve been trained on, and can reproduce long excerpts from those books. [...] This phenomenon has been called “memorization,” and AI companies have long denied that it happens on a large scale. [...] The Stanford study proves that there are such copies in AI models, and it is just the latest of several studies to do so.

https://www.twobirds.com/en/insights/2025/landmark-ruling-of-the-munich-regional-court-(gema-v-openai)-on-copyright-and-ai-training

The court confirmed that training large language models will generally fall within the scope of application of the text and data mining barriers, [...] the court found that the reproduction of the disputed song lyrics in the models does not constitute text and data mining, as text and data mining aims at the evaluation of information such as abstract syntactic regulations, common terms and semantic relationships, whereas the memorisation of the song lyrics at issue exceeds such an evaluation and is therefore not mere text and data mining

https://www.sciencedirect.com/science/article/pii/S2949719123000213#b7

In this work we explored the relationship between discourse quality and memorization for LLMs. We found that the models that consistently output the highest-quality text are also the ones that have the highest memorization rate.

https://arxiv.org/abs/2601.02671

recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models. However, it remains an open question if similar extraction is feasible for production LLMs, given the safety measures [...]. We investigate this question [...] our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs.

How does merely tagging the apparently stolen content make it less problematic, given I'm guessing it still won't have any attribution of the actual source (which for all we know, might often even be GPL incompatible)?

But I'm not a lawyer, so I guess what do I know. But even from a non-legal angle, what is this road the Linux Foundation seems to embrace of just ignoring the license of projects? Why even have the kernel be GPL then, rather than CC0?

I don't get it. And the article calling this "pragmatism" seems absurd to me.

FauxLiving reply

Given the research that you've done here I'm going to assume that you're looking for an answer and not simply taking us on a gish gallop.

Your premise, and what appears to be the primary source of confusion, is built on the idea that this is 'stolen' work which, from a legal point of view, is untrue. If you want to dig into why that is, look into the precedent setting case of Authors Guild, Inc. v. Google, Inc. (2015). The TL;DR is that training AI on copyrighted works falls under the Fair Use exemptions in copyright law. i.e. It is legal, not stealing.

The case you linked from Munich shows that other countries' legal systems are interpreting AI training in the same way. Training AI isn't about memorization and plagiarism of existing work, it's using existing work to learn the underlying patterns.

That isn't to say that memorization doesn't happen, but it is more of a point of interest to AI scientists that are working on understanding how AI represents knowledge internally than a point that lands in a courtroom.

We all memorize copyrighted data as part of our learning. You, too, can quote Disney movies or Stephen King novels if prompted in the right way. This doesn't make any work you create automatically become plagiarism, it just means that you have viewed copyrighted work as part of your learning process. In the same way, artists have the capability to create works which violate the copyright of others and they consumed copyrighted works as part of their learning process. These facts don't taint all of their work, either morally or legally... only the output that literally violates copyright laws.

The pragmatism here is recognizing that these tools exist and that people use them. The current legal landscape is such that the output of these tools is as if they were the output of the users. If an image generator generates a copyrighted image then the rightsholder can sue the person, not the software. If a code generator generates licensed code then the tool user is responsible.

This is much like how we don't restrict the usage of Photoshop despite the fact that it can be used to violate copyright. We, instead, put the burden on the person who operates the tool.

That's what is happening here. Linus isn't using his position to promote/enforce/encourage LLM use, nor is he using his position to prevent/restrict/disallow any AI use at all. He is recognizing that this is a tool that exists in the world in 2026 and that his project needs to have procedures that acknowledge this while also ensuring that a human is the one responsible for their submissions.

This is the definition of pragmatism (def: action or policy dictated by consideration of the immediate practical consequences rather than by theory or dogma).

e: precedent, not president (I'm blaming the AI/autocorrect on this one)

mimavox reply

Training AI isn't about memorization and plagiarism of existing work, it's using existing work to learn the underlying patterns.

Thank you. This is exactly what people misunderstand. LLMs aren't gigantic databases that just shuffle information that they've copied from the internet.

bss03 reply

The TL;DR is that training AI on copyrighted works falls under the Fair Use exemptions in copyright law

This judgement was reversed by the next federal judge that reviewed AI, in the Meta case.

It is far from legally settled whether training is fair use or not.

FauxLiving reply

Well, cynically, the Supreme Court will decide and Team AI has more money to buy RVs and luxury vacations.

anarchiddy reply

That's not really how copyright law works.

ell1e reply

Would you also say that to this lawyer reviewing Co-Pilot in 2026? https://github.com/mastodon/mastodon/issues/38072#issuecomment-4105681567

Disclaimer: this isn't legal advice.

anarchiddy reply

LLMs themselves being products of copyright isn't the legal question at issue, it's the downstream use of that product.

If I use a copyright-infringing work as a part of a new creative work, does that new work infringe copyright by default? Or does the new work need to be judged itself as to the question of infringing a copyrighted work?

And if it is judged as infringing, who is responsible for the damage done? Can I pass the damages back to the original infringing work? Or should I be held responsible for not performing due diligence?

FauxLiving reply

If I use a copyright-infringing work as a part of a new creative work, does that new work infringe copyright by default?

No, see reaction content, parody content, etc. They all undoubtedly use copyrighted work and they don't automatically infringe on copyright by default.

And if it is judged as infringing, who is responsible for the damage done? Can I pass the damages back to the original infringing work? Or should I be held responsible for not performing due diligence?

The infringing party is the human that used the tool which generated the infringing work. Everything after that is exactly the same application of copyright law just as if you were selling pictures of Mickey Mouse that you drew yourself. Disney can sue you, they can't sue the pencil manufacturer.

anarchiddy reply

Yup

People want to pretend as if everything that flows downstream from the creation of LLMs is illegal, but that's just not the reality.

hperrin reply

It is though. If you commit copyrighted code that was output by an LLM, you do have to follow the license of that code. If you don’t, that’s copyright infringement.

Even if the code isn’t copyrighted code, then it’s public domain code that can’t be copyrighted:

https://sciactive.com/human-contribution-policy/#More-Information

FauxLiving reply

You're confusing two separate legal issues.

Licenses are created and enforced by contract law.

You can violate a contract without violating a copyright and you can violate a copyright without agreeing to a license. You can also license works that are not able to be protected by a copyright because they are two separate categories of law.

hperrin reply

Sure, you can license them, but that license is unenforceable, because you don’t own the copyrights, so you can’t sue anyone for copyright infringement. And you’d have to be a fool to agree to a license for public domain material. You can do whatever you want with it, no license necessary.

FauxLiving reply

because you don’t own the copyrights, so you can’t sue anyone for copyright infringement.

You can't sue for copyright infringement.

You can, however, use content which is not able to be copyrighted and also still license (under contract law/EULAs) your product including terms prohibiting copying of the non-copyrightable information.

This was settled in: https://en.wikipedia.org/wiki/ProCD%2C_Inc._v._Zeidenberg

On Zeidenberg's copyright argument, the circuit court noted the 1991 Supreme Court precedent Feist Publications v. Rural Telephone Service, in which it was found that the information within a telephone directory (individual phone numbers) were facts that could not be copyrighted. For Zeidenberg's argument, the circuit court assumed that a database collecting the contents of one or more telephone directories was equally a collection of facts that could not be copyrighted. Thus, Zeidenberg's copyright argument was valid. However, this did not lead to a victory for Zeidenberg, because the circuit court held that copyright law does not preempt contract law. Since ProCD had made the investments in its business and its specific SelectPhone product, it could require customers to agree to its terms on how to use the product, including a prohibition on copying the information therein regardless of copyright protections

You can't copyright phone numbers, just like you can't copyright generated code, but you can still create a license which protects your uncopyrightable content and it can be enforced via contract law.

hperrin reply

Sure, but if it’s open source, I can just take that code without agreeing to your contract. Since it’s public domain, I can do whatever I want with it. You can only enforce a contract if I agree to it.

FauxLiving reply

It doesn't have to be open source.

If someone 100% generates code to make software then the software isn't protected by copyright.

That software could be distributed and licensed under an EULA and the fact that it isn't protected by copyright means absolutely nothing as far as the EULA is concerned.

The copyright status and the ability to license a piece of software under contract law do not depend on one another.

anarchiddy reply

The Linux Kernel is under a copyleft license - it isn't being copyrighted.

But the policy being discussed isn't allowing the use of copyrighted code - they're simply requiring any code submitted by AI be tagged as such so that the human using the agent is ultimately responsible for any infringing code, instead of allowing that code to go undisclosed (and even 'certified' by the dev submitting it even if they didn't write or review it themselves)

Submissions are still subject to copyright law - the law just doesn't function the way you or OP are suggesting.

-2

hperrin reply

Copyleft doesn’t mean it’s not copyrighted. Copyleft is not a legal term. “Copyleft” licenses are enforced through copyright ownership.

Did you read the quotes from the copyright office I linked to? I am going to go ahead and trust the copyright office over you on issues of copyrightability.

anarchiddy reply

Even if this were true, it would only mean that the GNU license is unenforceable, not that the Linux kernel itself is infringing copyright

hperrin reply

Unless the code the AI generated is a copy of copyrighted code, of course. Then it would be copyright infringement.

I can cause the AI to spit out code that I own the copyright to, because it was trained on my code too. If someone used that code without including attribution to me (the requirement of the license I release my code under), that would be copyright infringement. Do you understand what I mean?

anarchiddy reply

That would be true even if they didn't use AI to reproduce it.

The problem being addressed by the Linux foundation isn't the use of copyrighted work in developer contribution, it's the assumption that the code was authored by them at all just because it's submitted in their name and tagged as verified.

Does that make sense?

AeonFelis reply

they’re simply requiring any code submitted by AI be tagged as such so that the human using the agent is ultimately responsible for any infringing code, instead of allowing that code to go undisclosed

This makes zero sense, because the article says that this new tagging will replace the legally binding "Signed-off-by" tag. Wouldn't that old tag already put that responsibility on the person submitting the code?

Also - what will holding the submitter responsible even achieve? If an infringement is detected, the Linux maintainers won't be able to just pass all the blame to the submitter of that code while keeping it in the codebase - they'll have to remove the infringing code regardless of who's responsible for putting it in.

anarchiddy reply

Kinda, but they're specifically saying that the AI agent cannot itself tag the contribution with the sign-off - like, someone using Claude Code to submit PRs on their behalf. The developer must add the tag themselves, indicating that they at least reviewed and submitted it themselves, and it wasn't just an agent going off-prompt or some other shit and submitting it without the developer's knowledge. This is saying 'the dog ate my homework' is not a valid excuse.

The developer can use AI, but they must review the code themselves, and the agent can't "sign-off" on the code for them.
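A minimal sketch of what checking that rule mechanically could look like, assuming trailers are plain "Key: value" lines in the last paragraph of the commit message. The function names and the list of tool names are made up for illustration; this is not the kernel's actual tooling, and the exact format of the new tag is not something this thread spells out.

```python
import re

# Assumption: a trailer is a "Key: value" line in the last paragraph of the message.
TRAILER_RE = re.compile(r"^(?P<key>[A-Za-z-]+):\s*(?P<value>.+)$")


def parse_trailers(message):
    """Return (key, value) pairs found in the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            trailers.append((match.group("key").lower(), match.group("value").strip()))
    return trailers


def check_message(message, tool_names=("claude", "copilot", "chatgpt")):
    """List problems with the sign-off; tool_names is a hypothetical example list."""
    signoffs = [value for key, value in parse_trailers(message) if key == "signed-off-by"]
    problems = []
    if not signoffs:
        problems.append("missing Signed-off-by")
    for who in signoffs:
        if any(name in who.lower() for name in tool_names):
            problems.append("Signed-off-by names a tool: " + who)
    return problems


human = "fix: handle empty input\n\nSigned-off-by: Jane Dev <jane@example.org>\nAssisted-by: Claude"
agent = "fix: handle empty input\n\nSigned-off-by: Claude Code <noreply@example.org>"
print(check_message(human))
print(check_message(agent))
```

A name match like this is only a heuristic. It can flag a commit where the agent signed for itself, but it can't tell whether the human who signed actually reviewed anything.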

Also - what will holding the submitter responsible even achieve?

What does holding any individual responsible on a development team do? The Linux project is still responsible for anything they put out in the kernel just like any other project, but individual developers can be removed from the contributing team if they break the rules and put it at risk.

The new rule simply makes the expectations clear.

hperrin

https://sciactive.com/human-contribution-policy/#Reasoning

There are so many reasons not to include any AI generated code.

stylusmobilus

aussie.zone

any resulting bugs or security flaws firmly onto the shoulders of the human submitting it.

Watch Americans and their companies pull some mad gymnastics on proportioning blame for this

Electricd reply

Well yea, it’s the human submitting the code, and using a tool known to be imperfect

Your comment is pretty dumb

-6

stylusmobilus reply

aussie.zone

At this point it’s 23 on -5 with opinions on that dumb comment sunshine

Electricd reply

Because obviously the majority is always right.

pastermil

Definitely not a big fan of it, but realistically speaking, it's here to stay. It is wise for them to govern and regulate it rather than outright ban it. Especially with a project as big as this one, people will try. Saying that the responsibility falls on the human is definitely the right move.

twinnie

No point getting upset about this, it’s inevitable. So many FOSS programmers work thanklessly for hours and now there’s some tool to take loads of that work away, of course they’re going to use it. I know loads of people complain about it but used responsibly it can take care of so much of the mundane work. I used to spend 10% of my time writing code then 90% debugging it. If I do that 10% then give it to Claude to go over I find it just works.

uuj8za reply

but used responsibly

That's like the most incredibly hard part of all of this. Everything is aligned so that you don't use it responsibly. And it's really hard to guard against this.

Just a few days ago, I was pairing with a coworker and he was using Claude to do a bunch of stuff. He didn't check any of it. I thought he was gonna check stuff before pushing stuff... And nope! I said, "Wait, shouldn't we review the changes to make sure they're correct?" And he said, "Nah, it's probably fine. I trust it. Plus, even if it's wrong, we'll just blame the AI and we can just fix it later."

...

Yes, checking the work would have negated all of the "time saved" and he was being a lazy fuck.

People who don't like coding or engineering use this and they are not interested in using this responsibly.

uuj8za reply

for why using it "responsibly" is super hard, even if you're an expert. We're hardwired to take mental shortcuts, so we might not even realize we're using heuristics or falling for cognitive biases when fact checking the AI.

Tiresia reply

slrpnk.net

That's valid for workers in a capitalist system or for capitalists trying to scam people. But why would someone sign their real name to unchecked AI slop for an open source project? It would risk ruining their reputation for little personal gain.

geekwithsoul reply

"I used to spend 10% of my time writing code then 90% debugging it"

Skill issue

(Edited to add context)

-18

NaibofTabr reply

This is a bad take, which dismisses the amount of labor involved in maintaining widely used software projects.

geekwithsoul reply

I was referring (mostly jokingly) to his spending 90% of his time debugging. But you do you.

-5

femtek reply

Time issue

ell1e reply

Whatever it is, it doesn't mean LLMs are a sane or "inevitable" answer.

Mihies reply

How is it a time issue if you have percentages?

femtek reply

I mean OSS projects are not getting the support they need, and they have to keep up with security, bugs, and features, so using LLMs to speed up development will help.

XLE

This seems like an ill-thought-out decision, especially in a landscape where Linux should be differentiating itself from Windows, not following it.

The titular "slop" just means "bad AI generated code is banned" but the definition of "bad" is as vague as Google's "don't be evil." Good luck enforcing it, especially in an open-source project where people's incentives aren't tied to a paycheck.

Title is also inaccurate regarding CoPilot (the Microsoft brand AI tool), as a comment there mentions

says yes to Copilot

Where in the article does it say that?? The only mention of CoPilot is where it talks about LLM-generated code having unverifiable provenance.

Naich reply

piefed.world

Google's "don't be evil" was like a warrant canary. It didn't need to be precise, it just needed to be there.

Avid Amoeba reply

They're already enforcing it. PRs are reviewed and bad ones are rejected all the time.

truthfultemporarily reply

I also want to say that Linus is still the one merging things into the kernel and he is ahm.. opinionated?

Avid Amoeba reply

It's also probably possible to teach an agent this opinion to help review.

XLE reply

So you're advocating in favor of more AI in more steps of the process?

Avid Amoeba reply

Not necessarily Big Tech's AI, but "a program" that can automate this part of the PR process. I'm not interested in a program that gives pointless or bad suggestions. I'm interested in a program that can spot pattern X which I always say "this is bad because Y" and print that for me. If it were easy to write a classic program to do this, I would have written it. If that's easy with LLMs, I'd train my local Qwen or whatever to do it. Not a faceless corpo that runs this on gas turbines, poisoning people around them and lying to me about how much it costs me.

anarchiddy reply

If you think "bad" is too vague, then that isn't a new problem.

Linux has always had to reject 'bad' code submissions - what's new here is that the kernel team isn't willing to prejudice all AI code as "bad", even if that would be easier.

404found

I don't understand the full picture here, but the person who is submitting AI slop will be held accountable. Never a company.

So if a company is pushing staff to use AI to complete projects faster and their code ends up being AI slop when submitted, only the person working for the company will be held responsible.

I'm not sure what the repercussions are here but hopefully it's not a large fine. Those fines could add up quick if the person is submitting code all the time and doesn't know they are messing up.

Wispy2891 reply

Which fines? This is just an internal rule in an organization.

At most they can be rightfully banned from contributing.

If someone is contributing code that they don't really understand, then they shouldn't contribute.

404found reply

Ah okay got it now. Thanks. I didn't understand it all the way. My comment is irrelevant

hperrin

This is a bad move. The GPL license cannot be enforced on AI generated code.

terabyterex reply

That's not true. The new article being shoved down lemmy's throat is not correct. They cite court cases and come to bad conclusions.

hperrin reply

https://sciactive.com/human-contribution-policy/#More-Information

Ok, well here are quotes from the US Copyright Office that establish that what I said is true:

terabyterex reply

The Copyright Office never said the GPL could not be enforced. That's a conclusion someone drew. Hell, even in what you linked, the requirement for this is that AI had to be a "substantial" part. The Linux team said they would take submissions that were assisted but not all-out generated. But to argue the point, let's pretend that an entire pull request was AI generated. That is only a small part of the Linux kernel, and the kernel is what is licensed. A small amount of uncopyrighted code can't invalidate the whole project, which is what the license is on.

But regardless, the Copyright Office never said anything about enforcement of the GPL. It very clearly said code with no meaningful human involvement, which isn't the case here. So nothing establishes that what you said is true. It's all leaping to conclusions that can't be leaped to.

hperrin reply

The copyright office said material generated by AI is not copyrighted, even if that material is subsequently revised by the AI through additional prompts. That includes code. The GPL can only be used on copyrighted code. It is a copyleft license because it uses copyright law as a mechanism to enforce its terms. If you believe you can enforce a license on public domain material, that’s simply a gross misunderstanding of copyright law.

Yes, it will hopefully be a very small part of the kernel, but what happens thirty years from now if the kernel is all AI generated code? It may be a slippery slope, but it’s a valid slippery slope. The more the kernel is AI generated, the less of it the license can cover.

Goodlucksil reply

AI generated code cannot be copyrighted, can it? Then it can be relicensed as GPL.

-1

eleijeep reply

In order to "license" a work, you need to own the copyright.

FauxLiving reply

The status of generated code is 'uncopyrightable', which can be licensed.

Copyright law determines the copyright status and contract law enforces the terms of contracts. They are two separate issues.

If someone licenses you to use their AI generated code and you violate the license agreement, it doesn't matter that they don't have a claim under copyright law. They have a claim under contract law due to you violating the terms of the license (which is a contract).

-1

eleijeep reply

The GPL is not a contract.

FauxLiving reply

That is the FSF's position, but the case law has examples of cases where it was allowed to be treated as a contract.

SFC v. Vizio, the Software Freedom Conservancy sued Vizio as a third-party beneficiary of the GPL as a contract, and the court allowed the case to proceed on that theory.

eleijeep reply

Because in that case the copyright holder is the arbitrator of the terms under which their copyrighted material can be used and reproduced. If they did not own the copyright then any “license” would not be worth the paper it was written on and no judge would allow it to be treated as an implicit contract.

FauxLiving reply

You're right, I misread the context (I was trying to carry on multiple simultaneous conversations).

My apologies.

hperrin reply

- https://ironcladapp.com/journal/contracts/software-license-agreement

Distributing under the GPL is a software license agreement which is absolutely a contract:

A software license agreement is a legal contract that grants you permission to use software without transferring ownership. The software creator retains intellectual property rights while giving you specific usage rights under defined terms and conditions.

hperrin reply

Sure, you can license it however you want, but I can too, because it’s public domain. And neither of us can enforce those license terms on the other, because again, it’s public domain.

mesa

I hate ai in my kernel....

treadful

I'm curious how this is going to play out legally for copyright. If you accept AI code, you can't copyright it, so aren't you essentially forfeiting the copyleft license?

Blaster M reply

They aren't allowing fully AI generated code. The Copyright Office says AI used in the process does not forfeit the copyright, but AI generating the content entirely (or almost entirely) does. By having the user be responsible for the code, it burdens the user to make sure this stuff isn't abused to do that.

treadful reply

Where's that line drawn? Just the fact that it's an open legal question makes accepting these contributions risky.

Venia Silente

How is this all supposed to be, when AI code can not be copyrighted and thus those submissions to the Linux kernel can not be eg.: GPLv{number}?

robinadams

lemmy.wtf

Well, time to switch to NetBSD

menas

lemmy.wtf

Ecological, social, economic issues and the answer is on the legal side. FOSS as usual I guess

GreenBeanMachine

MoogleMaestro reply

It's definitely financially motivated. Linus said himself that AI has been very lucrative for Linux as it has expanded investment from companies that normally wouldn't give a fuck (he name dropped NVidia specifically) on that one LTT video.

Horsey reply

Saying no to code just because it was AI generated is like saying you can’t trust excel to be your bookkeeper. It’s a tool, and the person using the tool being at fault is exactly what happened here.

GreenBeanMachine reply

Feyd reply

You can actually set it up to give the same outputs given the same inputs (temperature = 0). The variability is on purpose

-1

EzTerry reply

You can, and that will produce the same output for the same input as long as there is no variation in floating point rounding. (That holds if the exact same code is running, but an optimization can easily change which way a value rounds, and if two tokens' scores are very close the output will diverge.)

The point that people (or LLMs arguing against LLMs) miss is that the world is not deterministic, and humans are not deterministic (at least not in a practical way at the human scale). And if a system is, you should indeed not use an LLM. Its power is in how it provides answers from messy data. If you need repeatability, write scripts, code, etc.

(Note I do think that if the output is for human use, it's important that a human validates it's useful. LLMs can help brainstorm and can, with some tests, manage a surprising amount of code, but if you don't validate and test the code it will be slop, and it may work for one test but not for a generic user.)
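A small sketch of the rounding point, assuming a toy "model" that scores two tokens and picks the higher one. The numbers and function names are made up for illustration; real inference stacks sum thousands of terms, but the effect is the same: floating point addition is not associative, so adding the same terms in a different order can change the last bit, and that can flip a near-tie.

```python
def total(values):
    """Sum left to right, the way a simple accumulation loop would."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def pick(score_a, score_b):
    """Greedy choice between two tokens: A only if it is strictly ahead."""
    return "A" if score_a > score_b else "B"


terms = [0.1, 0.2, 0.3]   # contributions to token A's score (made-up numbers)
score_b = 0.6             # token B's score, deliberately almost equal

forward = total(terms)             # ((0.1 + 0.2) + 0.3)
backward = total(reversed(terms))  # ((0.3 + 0.2) + 0.1)

print(forward, backward)           # the two sums differ in the last bit
print(pick(forward, score_b), pick(backward, score_b))
```

Each ordering on its own gives the same answer every time it is run. The divergence only appears when the order of operations changes, for example after an optimization or on different hardware.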

Feyd reply

You can, and that will produce the same output for the same input as long as there is no variation in floating point rounding. (That holds if the exact same code is running, but an optimization can easily change which way a value rounds, and if two tokens' scores are very close the output will diverge.)

There are more aspects to the randomness such as race conditions and intentionally nondeterministic tiebreaking when tokens have the same probability, apparently.

I actually think LLMs are ill suited for the vast majority of things people are currently using them for, and there are obviously the ethical problems with data centers bringing new fossil fuel power sources online, but the technology is interesting in and of itself

-3

GreenBeanMachine reply

Feyd reply

Floating point math is deterministic.
Systems don't have to be programmed with race conditions. That is not a fundamental aspect of an LLM, but a design decision.
Systems don't have to be programmed to tie break with random methods. That is not a fundamental aspect of an LLM, but a design decision.
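As a rough sketch of that design decision, assuming greedy (temperature 0) decoding over a plain list of scores: breaking ties by always taking the lowest index makes the pick a pure function of its input. The function name is hypothetical and this is not how any particular inference engine is implemented.

```python
def greedy_pick(scores):
    """Return the index of the highest score; ties go to the lowest index."""
    best = max(scores)
    return scores.index(best)  # list.index returns the first match


scores = [0.2, 0.7, 0.7, 0.1]  # made-up scores with an exact tie
picks = {greedy_pick(scores) for _ in range(1000)}
print(picks)                    # a single index, however many times it runs
```

Swapping the tie-break for a random choice among the tied indices would make the same input produce different outputs across runs, which is the behaviour being described as a choice rather than a necessity.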

This is not hard stuff to understand, if you understand computing.

-3

GreenBeanMachine reply

Feyd reply

You also have to run the model with the input to determine what the output will be, no way to determine it BEFORE running. With a deterministic system, if you know the code you can predict the output with 100% accuracy without ever running it.

This is not the definition of determinism. You are adding qualifications.

I did look it up and I see now there are other factors that aren't under your control if you're using a remote system, so I'll amend my statement to say that you can have deterministic inference systems, but the big ones most people use cannot be configured to be by the user.

-1

GreenBeanMachine reply

Feyd reply

Deterministic systems are always predictable, even if you never ran the system. Can you determine the output of an LLM with zero temperature without ever having ran it?

You don't have to understand a deterministic system for it to be deterministic. You are making that up.

And even disregarding the above, no, they are still NOT deterministic systems

I conceded that setting temperature to 0 for an arbitrary system (including all the remote ones most people are using) does not mean it is deterministic after reading about other factors that influence inference in these systems. That does not mean there are not deterministic implementations of LLM inference, and repeating yourself with NO additional information and using CAPS does NOT make you more CORRECT lol.

-3

michaelmrose reply

Unlike brilliant people like you who have created nothing one millionth the importance of Linux

-12

Siegfried reply

Was that necessary?

michaelmrose reply

Yes. Dude who created one of the most useful projects in software history in large part because of pragmatic decision making makes a pragmatic decision and Joe Rando says "Must be in the pockets of big AI!" because he can't grasp any singular aspect of a complex issue. Can't even hold in his head a tiny number of things just vomits crap over the internet. That person needs to spend a lot more time reading and thinking and less typing.

-8

Exatron reply

You should try taking your own advice, kiddo.

raspberriesareyummy

The rule should be "if you get caught using LLMs or calling them 'AI', you're a dipshit and will never ever be let near the Kernel again."

REEEEEE!!! Kernel now AI SLOP like LUTRIS!!!! 11

-6

Constellation

if linux continues being this reluctant about tool improvement, all of linux will be replaced with a startup from shenzhen by 2028.

-18

leftzero

TL;DR: Linux is now slop, switch to your BSD of choice.

-20

yabbadabaddon

-22

neclimdul reply

Fuck the corporate ransacking, chatbot subscription hell hole, and general breaking of the internet done under the framing of "AI".

Guess that doesn't really roll off the tongue like Fuck AI but sure so yeah let's just move to a mountain instead of pushing for a better world.

yabbadabaddon reply

Funny how nothing you wrote has anything to do with AI but with capitalism but yeah sure let's blame AI instead of the USA, its government and its oligarchs ruining the world for everybody.

-17

desertdruid reply

oh that's why this is the "it's just a tool" gun debate

yabbadabaddon reply

Wtf is this moronic take

-6

quack reply

Obviously capitalism makes pretty much everything worse but let’s not pretend AI wouldn’t have issues without capitalism too.

yabbadabaddon reply

Stop fucking talking about "AI". It does not fucking exist.

-7

quack reply

You’re very angry for a person who literally used the term themselves a couple of comments ago. What term would you rather use then? It’s colloquial, everyone knows what I’m talking about. Are you the kind of person who gets angry when someone doesn’t call it “GNU/Linux” too?

yabbadabaddon reply

You're having basic reading comprehension issues.

-8

quack reply

Damn man, go for a walk or something.

dwemthy reply

For the master's tool shall never dismantle the master's house

Miaou reply

People confuse GPTs with AI, but your comment takes the wrong approach: it's not that AI hate is not deserved, it's that the hate should be directed towards the chatbots and the associated bubble.

imjustmsk reply

Yea, but when an average person talks about AI they just mean a Chatbot or GenAI right?

Sharkticon reply

99.99% of the time.

jackalope reply

"AI" is simply a field of study. There is no true bar for "AI" that GPTs fail, because there is no true bar for AI. A symbolic AI system is as much AI as the most advanced LLM or world model or whatever.

yabbadabaddon reply

AI hate is not deserved. Hate the game not the player.

-13

Exatron reply

No, it's definitely deserved, sparky. The game and the player are both horrible.

yabbadabaddon reply

This is like your opinion, and I think it's a dumb one.

-5

Exatron reply

You've shown from your comments here that thinking isn't exactly your forte, spud.

crypt0cler1c reply

yabbadabaddon reply

Big monks huhu

-2

desertdruid reply

no need for ableist slurs

Mr_Dr_Oink reply

I didn't think that was the point. Fuck AI is just a slogan representing people's disdain for corporate types who think ChatGPT is literally the second coming of Jesus and is going to save us all. It's people who are taking LLMs and pretending they can reason and think like humans. People that think they can sack all their staff and replace them with AI. It's more complex than that. You know that, I am certain you do. AI can do some things very well, and at other things it absolutely falls flat on its face.

Unless I am misunderstanding, this was never about the blanket boycotting of anything AI; it was more about not pretending it is more than it is and not shoving it down the throats of non-consenting consumers.

yabbadabaddon reply

Then the issue is the fucking American oligarchs and their fucking piece of shit government, not "AI".

-1

Mr_Dr_Oink reply

How you got that from what i said will remain a mystery.

lightnsfw reply

reddthat.com

yabbadabaddon reply

-4

lightnsfw reply

reddthat.com

imjustmsk reply

Fuck AI- anyway.

The whole AI hype is just making tech giant whackjobs more rich, as well as FUCKING us over in so many ways.

The world ain't black and white you cannot just hate AI, its just a general term, but fuck all those mofos tryna make more bucks off of this- as if they weren't rich already.

I wonder why they just give away free "intelligence", as in free AI chatbots that everyone can access which is so obviously - extremely non profitable, They keep yapping they need to make "information" more accessible and keeps throwing money into a hole.

FUCKING make education more accessible :|

People I know, most of them rely on texting to their little chatgpt in their phone to get through day to day tasks, algorithms chose what they watch, now Language learning models decides what they do throughout their life- We are supposed to learn shit ourselves, if we cognitively Offload every shit from our brain- We are just making ourselves more stupid.

TL; DR That was just a useless and brainless rant on AI lol

Washedupcynic reply

People I know, most of them rely on texting to their little chatgpt in their phone to get through day to day tasks, algorithms chose what they watch, now Language learning models decides what they do throughout their life- We are supposed to learn shit ourselves, if we cognitively Offload every shit from our brain- We are just making ourselves more stupid.

That's what the oligarchs want. They want us ignorant so we will be good little wage slaves and consumers.

yabbadabaddon reply

Then fight against what matters : the fucking oligarchs and their fucking piece of shit friends at the head of the USA.

Doorbook reply

The new gun debate

yabbadabaddon reply

Did you ever see a gun diagnose a cancer?

-7

Mr_Dr_Oink reply

As someone who works in healthcare, in IT, and who has been directly involved in the commissioning of an AI designed to spot skin cancers from pictures taken with special lenses attached to iPhones: no healthcare provider is using these tools in place of doctors. These AI models are incredibly accurate but the human is still needed to spot false positives. They don't leave diagnostic decisions up to AI. I can tell you that for a fact.

yabbadabaddon reply

Same thing with everything related to every single algorithm implementation in every single sector.

crypt0cler1c reply

This is the dumbest comment I've seen in a while

desertdruid reply

ashughes reply