OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series: A new research paper lays out ways in which AI developers should try to avoid revealing that LLMs have been trained on copyrighted material.
It's a bit pedantic, but I'm not really sure I support this kind of extremist view of copyright, or the scale of what's being interpreted as 'possessed' under it. Once an idea is communicated, it becomes part of the collective consciousness. Different people interpret and build upon that idea in various ways, making it a dynamic entity that evolves beyond the original creator's intention. It's like the issues with sampling beats or records in the early days of hip-hop. The very principle of an idea runs against this vision; more than that, once you put something out into the commons, it's irretrievable. It's not really yours any more once it's been communicated. I think if you want to keep an idea truly yours, then you should keep it to yourself. Otherwise you are participating in a shared vision of the idea. You don't control how the idea is interpreted, so it's not really yours any more.
Whether that's ChatGPT or Public Enemy is neither here nor there to me. The idea that a work like Peter Pan is still possessed is a very real but very silly malady of this weirdly accepted yet extreme view of the ability to possess an idea.
AI isn't interpreting anything. This isn't the sci-fi style of AI that people think of; that's general AI. This is narrow AI, which is really just an advanced algorithm. It can't create new things with intent and design; it can only regurgitate a mix of pre-existing material based on narrow guidelines programmed into it to try to keep it coherent, with no actual thought or interpretation involved in the result. The issue isn't that it's derivative; the issue is that it can only ever be inherently derivative, without any intentional interpretation or creativity, and nothing else.
Even collage art has to qualify as fair use to avoid copyright infringement if it's being done for profit, and fair use requires it to provide commentary, criticism, or parody of the original work used (which requires intent). Even if it's transformative enough to make the original unrecognizable, if the majority of the work is not your own art, then you need to get permission to use it; otherwise you aren't automatically safe from getting in trouble over copyright. Even using images in Photoshop involves Creative Commons and commercial-use licenses.

Fanart and fanfic are also considered a grey area, and the only reason more of a stink isn't kicked up over them regarding copyright is that they're generally beneficial to the original creators, and credit is naturally provided by the nature of fan works, so long as someone doesn't try to claim the characters or IP as their own. So most creators turn a blind eye to the copyright aspect of the genre, but if any ever did want to kick up a stink, they could, and some have in the past, like Anne Rice. As a result, most fanfiction sites do not allow writers to profit off of fanfics or advertise fanfic commissions. And those are cases where actual humans produce the works based on something that inspired them or that they are interpreting. So even human-made derivative works have rules and laws applied to them.

AI isn't a creative force with thoughts and ideas and intent; it's just a pattern recognition and replication tool, and it doesn't benefit creators when it's used to replace them entirely, as Hollywood is attempting to do (among other corporate entities). Viewing AI at least as critically as actual human beings is the very least we can do, as well as establishing protections for human creators so that they can't be taken advantage of because of AI.
I'm not inherently against AI as a concept or as a tool for creators to use, but I am against AI works with no human input being used to replace creators entirely, and I am against using works to train it without the permission of the original creators. Even in artist/writer/etc. communities it's considered common courtesy to credit the people and works you based a piece on or took inspiration from, even if what you made would be safe under copyright law regardless. Sure, humans get some leeway here because we are imperfect meat creatures with imperfect memories and may not be aware of all our influences, but a coded algorithm doesn't have that excuse. If the current AIs in circulation can't function without being fed stolen works, without credit or permission, then they're simply not ready for commercial use yet as far as I'm concerned. If that's never going to be possible, which I simply don't believe, then they should never be used commercially, period. And AI should be used by creators to assist in their work, not to replace them entirely. If it takes longer to develop, fine. If it takes more effort and manpower, fine. That's the price I'm willing to pay for it to be ethical. If it can't be done ethically, then IMO it shouldn't be done at all.
Your broader point would be stronger if it weren't framed around what seems like a misunderstanding of modern AI. To be clear, you don't need to believe that AI is "just" a "coded algorithm" to believe it's wrong for humans to exploit other humans with it. But to say that modern AI is "just an advanced algorithm" is technically correct in exactly the same way that a blender is "just a deterministic shuffling algorithm." We understand that the blender chops up food by spinning a blade, and we understand that it turns solid food into liquid. The precise way in which it rearranges the matter of the food is both incomprehensible and irrelevant. In the same way, we understand the basic algorithms of model training and evaluation, and we understand the basic domain task that a model performs. The "rules" governing this behavior at a fine level are incomprehensible and irrelevant, and certainly not dictated by humans. They are an emergent property of a simple algorithm applied to billions-to-trillions of numerical parameters, in which all the interesting behavior is encoded in some incomprehensible way.
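The "simple algorithm, emergent parameters" point can be made concrete with a toy sketch (a hypothetical illustration, not code from anyone in this thread): the entire training procedure below is a few lines of arithmetic, yet everything the model "knows" afterwards lives in the learned numbers `w` and `b`, not in any rule a human wrote.

```python
# Toy gradient-descent sketch: the training algorithm is trivially simple,
# while the behavior it produces is encoded only in the learned parameters.
data = [(x, 2 * x + 1) for x in range(10)]  # hidden pattern: y = 2x + 1

w, b = 0.0, 0.0   # parameters start out meaningless
lr = 0.01         # learning rate

for _ in range(2000):          # the whole "advanced algorithm"
    for x, y in data:
        err = (w * x + b) - y  # how wrong the current parameters are
        w -= lr * err * x      # nudge each parameter against the error
        b -= lr * err

print(w, b)  # the parameters now encode the pattern, close to 2 and 1
```

Scale this from two parameters to billions, and the same asymmetry holds: the update rule stays human-written and comprehensible, while the learned parameter values that actually produce the behavior are not dictated, or even interpretable, by the humans who wrote the rule.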
Bro, I don't think you have any idea what you're talking about. These AIs aren't blenders; they are designed to recognize and replicate specific aspects of art and writing and whatever else, in a way that is coherent and recognizable. Unless there's a blender that can sculpt Michelangelo's David out of apple peels, AI isn't like a blender in any way.
But even if they were comparable, a blender is meant to produce chaos. It is meant to, you know, blend the food we put into it. So yes, the outcome is dictated by humans. We want the individual pieces to be indistinguishable, and the humans building blenders make deliberate design decisions to try to produce one that blends things sufficiently and makes the right amount of chaos with as many ingredients as possible.
And here's the thing: if we wanted to determine what foods were put into a blender, even assuming we had blindfolds on while tossing random shit in, we could test the resulting mixture to determine what the ingredients were before they got mashed together. We also run blenders for personal use the majority of the time, not for profit, and we use our own fruits and vegetables rather than stuff stolen from a neighbor's yard, which would be, you know, trespassing and theft. And even people who use blenders to make something they sell or offer publicly almost always list the ingredients, like restaurants do.
So even if AI were like a blender, that wouldn't be an excuse, nor would it contradict anything I've said.
I disagree with your interpretation of how an AI works, but I think the way that AI works is pretty much irrelevant to the discussion in the first place.
I think your argument stands completely the same regardless. Even if AI worked much like a human mind and was very intelligent and creative, I would still say that usage of an idea by AI without the consent of the original artist is fundamentally exploitative.
You can easily train an AI (with next to no human labor) to launder an artist's works, by using the artist's own works as reference. There's no human input or hard work involved, which is a factor in what dictates whether a work is transformative. I'd argue that if you can put a work into a machine, type in a prompt, and get a new work out, then you still haven't really transformed it. No matter how creative or novel the work is, the reality is that no human really put any effort into it, and it was built off the backs of unpaid and uncredited artists.
You could probably make an argument for being able to sell works made by an AI trained only on the public domain, but it still should not be copyrightable IMO, because it's not a human creation.
TL;DR - No matter how creative an AI is, its works should not be considered transformative in a copyright sense, as no human did the transformation.
I thought this way too, but after playing with ChatGPT and Midjourney near daily, I have seen many moments of creativity way beyond the source material it was trained on. A good example I saw was in a YouTube video (sorry, I cannot recall which to link) where the prompt was animals made of sushi, and wow, was it ever good and creative in how it made them, and it was photorealistic. This is just not something you can find anywhere on the Internet. I just did a search and found some hand-drawn Japanese-style sushi with eyes and such, but nothing like what I saw in that video.
I have also experienced it suggesting ways to handle coding in my VR theme park app that are very unconventional and not something anyone has posted about, as near as I can tell. It seems to be able to put 2 and 2 together and get 8. Likely, since it sees so much of everything at once, it can connect the dots in ways we would struggle to. It is more than regurgitated data, and it surprises me near daily.
Just because it seems creative to you, due to your lack of experience with human creativity, doesn't mean it is uniquely creative. It's not; it can't be, by its very nature. It can only regurgitate an amalgamation of the stuff fed into it. What you think you see is the equivalent of pareidolia.
if it’s being done for profit, and fair use requires it to provide commentary, criticism, or parody of the original work used. Even if it’s transformative enough to make the original unrecognizable
I'm a bit confused about what point you're trying to make. There is not a single paragraph or example in the link you provided that doesn't support what I've said, and none of the examples in that link qualified as fair use without meeting any of the criteria. In fact, one was the opposite: something that met all the criteria but still didn't qualify as fair use.
The key aspect of how they define transformative is here:
Has the material you have taken from the original work been transformed by adding new expression or meaning?
These (narrow) AIs cannot add new expression or meaning, because they do not have intent. They are just replicating and rearranging learned patterns mindlessly.
Was value added to the original by creating new information, new aesthetics, new insights, and understandings?
These AIs can't provide new information because they can't create something new, they can only reconfigure previously provided info. They can't provide new aesthetics for the same reason, they can only recreate pre-existing aesthetics from the works fed to them, and they definitely can't provide new insights or understandings because again, there is no intent or interpretation going on, just regurgitation.
The fact that it's so strict that even stuff that meets all the criteria might still not qualify as fair use only supports what I said about how even derivative works made by humans are subject to a lot of laws and regulations, and if human works are under that much scrutiny then there's no reason why AI works shouldn't also be under at least as much scrutiny or more. The fact that so much of fair use defense is dependent on having intent, and providing new meaning, insights, and information, is just another reason why AI can't hide behind fair use or be given a pass automatically because "humans make derivative works too". Even derivative human works are subject to scrutiny, criticism, and regulation, and so should AI works.
I’m a bit confused about what point you’re trying to make. There is not a single paragraph or example in the link you provided that doesn’t support what I’ve said, and none of the examples provided in that link are something that qualified as fair use despite not meeting any criteria. In fact one was the opposite, as something that met all the criteria but still didn’t qualify as fair use.
You said "...fair use requires it to provide commentary, criticism, or parody of the original work used." This isn't true; if you look at the summaries of the fair use cases I provided, you can see there are plenty of cases where no such purpose was stated.
These (narrow) AIs cannot add new expression or meaning, because they do not have intent. They are just replicating and rearranging learned patterns mindlessly.
You're anthropomorphizing a machine here, the intent is that of the person using the tool, not the tool itself. These are tools made by humans for humans to use. It's up to the artist to make all the content choices when it comes to the input and output and everything in between.
These AIs can’t provide new information because they can’t create something new, they can only reconfigure previously provided info. They can’t provide new aesthetics for the same reason, they can only recreate pre-existing aesthetics from the works fed to them, and they definitely can’t provide new insights or understandings because again, there is no intent or interpretation going on, just regurgitation.
I'm going to need a source on this too. This statement isn't backed up with anything.
The fact that it’s so strict that even stuff that meets all the criteria might still not qualify as fair use only supports what I said about how even derivative works made by humans are subject to a lot of laws and regulations, and if human works are under that much scrutiny then there’s no reason why AI works shouldn’t also be under at least as much scrutiny or more. The fact that so much of fair use defense is dependent on having intent, and providing new meaning, insights, and information, is just another reason why AI can’t hide behind fair use or be given a pass automatically because “humans make derivative works too”. Even derivative human works are subject to scrutiny, criticism, and regulation, and so should AI works.
Neural networks are based on the same principles as the human brain; they are literally learning in the same way humans do. Copyrighting the training of neural nets is essentially the same thing as copyrighting interpretation and learning by humans.
Well, I'd consider agreeing if LLMs were treated as a generic knowledge database. However, I had the impression that the whole response from OpenAI & co. to this copyright issue is "they build original content", both for LLMs and stable diffusion models. Now that they've started this line of defence, I think they are stuck with proving that their "original content" is not derived from copyrighted content 🤷
Copyright definitely needs to be stripped back severely. Artists need time to profit from their own work, but after a certain period everything needs to enter the public domain for the sake of creativity.
If you sample someone else's music and turn around and try to sell it, without first asking permission from the original artist, that's copyright infringement.
So, if the same rules apply, as your post suggests, OpenAI is also infringing on copyright.
If you sample someone else’s music and turn around and try to sell it, without first asking permission from the original artist, that’s copyright infringement.
I think you completely and thoroughly do not understand what I'm saying or why I'm saying it. Nowhere did I suggest that I don't understand modern copyright. I'm saying I'm questioning my belief in this extreme interpretation of copyright, which is represented by exactly what you just parroted: an interpretation that is not only functionally and materially unworkable but also antithetical to a reasonable understanding of how ideas and communication work.
I agree with you in essence (I've put a lot of time into a free software game).
However, people are entitled to the fruits of their labor, and until we learn to leave capitalism behind, artists have to protect their work to survive. To eat. To feed their kids. And pay their rent.
Unless OpenAI is planning to pay out royalties to everyone they stole from, what they're doing is illegal and immoral under our current, capitalist paradigm.
Yeah, this is definitely leaning a little too "People shouldn't pump their own gas because gas attendants need to eat, feed their kids, pay rent" for me.
Because in practical terms, writers and artists' livelihoods are being threatened by AIs who were trained on their work without their consent or compensation. Ultimately the only valid justification for copyright is to enable the career of professional creators who contribute to our culture. We knew how ideas and communication worked when copyright was first created. That is why it's a limited time protection, a compromise.
All the philosophical arguments about the nature of ideas and learning, and how much a machine may be like a person, don't change the fact that dedicating years of effort to developing your skills only to be undercut by an AI trained on your own works is an incredibly shitty position to be in.
writers and artists’ livelihoods are being threatened by AIs who were trained on their work without their consent or compensation
Guess what? The actual copyright owners of the world, those who own tens of thousands or millions of copyrighted works, will be the precise people paying for and developing that kind of automation, and under the current legal interpretation of copyright, it's their property to do so with. This outrage masturbation the internet is engaged in ignores the current reality of copyright: it's not small makers and artists benefiting from it but billion-dollar multinational corporations.
This is a philosophical argument, and an important one: should we legally constrain the flow and use of ideas because of an individual's right to extract profit from an idea?
Dread Zeppelin could have been sued. They were just lucky to be liked by Robert Plant.
As for the Star Wars fan movie, the copyright claim about the music was dropped because it was frivolous. The video creator made a deal with Lucasfilm to use Star Wars copyrighted material, he didn’t just go yolo.
You are tying the rights of artists making derivative works to the rights of systems being used to take advantage of those artists without consent or compensation. Not only are those two different situations, but supporting the latter doesn't mean supporting the former.
Like I said somewhere in this discussion, AIs are not people. People have rights that tools do not. If you want to argue in favor of parody and fan artists, do that. If you want to speak out against how the current state of copyright lets corporations rather than the actual artists get the rights and profit over the works they create, do that. Leaping to the defense of AI is not it.
A sample is a fundamental part of a song's output, not just its input. If LLMs are changing the input works to a high enough degree, is that not protected as a transformative work?
It's more like a collage of everyone's words. It doesn't make anything creative because it doesn't have a body or a life or real social inputs, you could say. Basically, it's just rearranging other people's words.
A song that's nothing but samples, but so many samples that it hides that fact. This is my view, anyway.
And only a handful of people are getting rich off the outputs.
If we were in some kind of post-capitalism economy, or if we had UBI, it wouldn't bother me, really. It's not artists' egos I'm sticking up for, but their livelihoods.
To add to that, Harry Potter is the worst example to use here. There is no extra billion that JK Rowling needs to allow her to spend time writing more books.
Copyright was meant to encourage authors to invest in their work in the same way that patents do. If you were going to argue about the issue of lifting content from books, you should be using books that need the protection of copyright, not ones that don't.
I just don't know that I agree that this line of reasoning is useful. Who cares what it was meant for? What is it now, currently and functionally, doing?