Technology @lemmy.world lautan @lemmy.ca 6 mo. ago

NetBSD bans all commits of AI-generated code

mastodon.sdf.org NetBSD Foundation 🚩 (@netbsd@mastodon.sdf.org)

New development policy: code generated by a large language model or similar technology (e.g. ChatGPT, GitHub Copilot) is presumed to be tainted (i.e. of unclear copyright, not fitting NetBSD's licensing goals) and cannot be committed to NetBSD.

https://www.NetBSD.org/developers/commit-guidelines.html

39 comments

Ok but how is anyone meant to know if you generated your docstrings using copilot?
- How do they know that you wrote it yourself and didn't just steal it?
  
  This is a rule to protect themselves. If there is ever a case around this, they can push the blame to the person that committed the code for breaking that rule.
  
  This is the only reason rules exist, not to stop people doing a thing but to be able to enforce or deflect responsibility when they do.
- They'll use AI to detect it.... obviously. ☺️
- I'm saddened to use this phrase but it is literally virtue signalling. They have no way of knowing lmao
  
  It’s actually simple to detect: if the code sucks or is written by a bad programmer, and the docstrings are perfect, it’s AI. I’ve seen this more than once and it never fails.
  
  It's also probably to make things slightly simpler from a legal perspective.
- Are they long, super verbose and often incorrect?
- Because they'll be shit?
  
  Docstrings based on the method signature and literal contents of a method or class are completely pointless, and that's all Copilot can do. It can't intuit anything that docstrings are actually there for.
  
  Definitely not my experience. With a well-structured code base it can be pretty uncanny. I think its context is limited to files that are currently open in the editor, so that may be your issue if you're coding with just one file open?
- Magic, I guess?
Lots of stupid people asking "how would they know?"

That's not the fucking point. The point is that if they catch you they can block future commits and review your past commits for poor quality code. They're setting a quality standard, and establishing consequences for violating it.

If your AI generated code isn't setting off red flags, you're probably fine, but if something stupid slips through and the maintainers believe it to be the result of Generative AI, they will remove your code from the codebase and you from the project.

It's like laws against weapons. If you have a concealed gun on your person and enter a public school, chances are that nobody will know and you'll get away with it over and over again. But if anyone ever notices, you're going to jail, you're getting permanently trespassed from school grounds, and you're probably not going to be allowed to own guns for a while.

And, it's a message to everyone else quietly breaking the rules that they have something to lose if they don't stop.
- Lots of stupid people asking "how would they know?"
  
  That's not the fucking point.
  
  Okay, easy there, Chief. We were just trying to figure out how it worked. Sorry.
  
  It was a fair question, but this is just going to turn out like universities failing or expelling people for alleged AI content in papers.
  
  They can't prove it. They try to use AI tools to prove it, but those same tools will say a thesis paper from a decade ago is also AI generated. Pretty sure I saw a story of a professor accusing someone based on a tool, only to have his own past paper fail the same tool.
  
  Short of an admission of guilt, it's a witch hunt.
This is a good move for international open source projects. With multiple lawsuits currently ongoing in multiple countries around the globe, the intellectual property status of code made using AI isn't really secure enough to open yourself up to the liability.

I've done the same internally at our company. You're free to use whatever tool you want, but if the tool you use spits out copyrighted code, and the law eventually decides that model users instead of model trainers are liable for model output, then that's on you, buddy.
- Yup. We don't allow AI tools on our codebase, but I allow it for interviews. I honestly haven't been impressed by it at all, it just encourages not understanding the code.
- Does this mean you have indicated to your employees and/or contractors that you intend to hold them legally liable in the case someone launches litigation against you?
So proud of you NetBSD, this is why I sponsor you, slam dunk for the future. I'm working on a NetBSD hardening script and rice as we speak. It's a great OS with some fantastically valuable niche applications, and I think there's a new broad approach in what I'm cooking up, a University Edition. I did hardening for all the other BSDs, I saved the best for last!

[EDIT 5/16/2024 15:04 GMT -7] NetBSD got Odin lang support yesterday. That totally seals the NetBSD deal for me if I can come up with something cool for my workstation with Odin.

If you would like to vote on whether, or by what year, AI will be in the Linux Kernel on Infosec.space:

https://infosec.space/@wravoc/112441828127082611
- Thanks for your efforts Elias!
I was hoping they ban it because it’s shit, but banning it for copyright reasons is fine too.
I can understand why a project might want to do this until the law is fully implemented and tested in court, but I can tell most of the people in this thread haven’t actually figured out how to use LLMs productively. They’re not about to replace software engineers, but as a software engineer, tools like GitHub Copilot and ChatGPT are excellent at speeding up a workflow. ChatGPT for example is an excellent search engine that can give you a quick understanding of a topic. It’ll generate small amounts of code more quickly than I could write it by hand. Of course I’m still going to review that code to ensure it is of the same quality that handwritten code would be, but overall this is still a much faster process.

The luddites who hate on LLMs would have complained about the first compilers too, because they could write marginally faster assembly by hand.
- Same for intellisense, IDEs, Debuggers, linters, static analyzers, dynamic languages, garbage collection, NoSQL databases...
- Why use a computer to do the work when I could do myself? /s
I never felt so close to trying NetBSD as after reading this 😃
Hell yeah! Get that shit… OUTTA HERE!!!

Ok but seriously, that is a very good reason to ban it. Who knows what would happen if the AI just fully ripped off someone else’s code that’s supposed to be GPL licensed or something. If humans can plagiarize, then AIs can plagiarize.

But also, how are they still using CVS? CVS is so slow and so bad. Even Subversion would be an upgrade.
We need to see more of this
I get banning for quality, but banning over potential copyright issues is pretty stupid.
- It's not really stupid at all. See the matrix code example from this article: https://spectrum.ieee.org/ai-code-generation-ownership
  
  You can't really know when the genAI is synthesizing from thousands of inputs or just outright reciting copyrighted code. Not kosher if it's the latter.
