I fucked with the title a bit. What I linked to was actually a Mastodon post linking to the actual thing. But in my defense, I found it because Cory Doctorow boosted it, so, in a way, I am providing the original source here.
Google scanned millions of books and made them available online. Courts ruled that was fair use because the purpose and interface of Google Books didn't lend themselves to actually reading the books, just to searching them for information. If that is fair use, then I don't see how training an LLM (which doesn't retain an exact copy of the training data, at least in the vast majority of cases) isn't fair use. You aren't going to get an argument from me.
I think most people who will disagree are reflexively anti-AI, and that's fine. But I just haven't heard a good argument that AI training isn't fair use.
Here's a side-channel attack on your position: every use, even an infringing use, is fair use until adjudicated, because what fair use means is that a court has agreed that your infringing use is allowed. So of course AI training (broadly) is always fair use. But particular instances of AI training may be found not to be fair use, so we can't be sure that you are always going to be right (for the specific AI models that may come into question legally).
I am no lawyer, but I suspect that whether something is considered fair use or infringing will probably depend on how the trained AI model is used.
For example, if you train it on a book of poetry, asking it questions about the poetry will probably be considered fair use. If you ask the AI to write poetry in the style of the book's poems and you publish the AI's poetry, I suspect it might be considered laundering copyright and infringing. Especially if it is substantially similar to specific poems in the book.
If you ask the AI to write poetry in the style of the book’s poems and you publish the AI’s poetry, I suspect it might be considered laundering copyright and infringing.
Is the image of a cabin in a snowy landscape copyrighted by Thomas Kinkade? Fuck no. That's an idea. Ideas can't be copyrighted. A style isn't a discrete work. It is an idea. It can't be copyrighted. If I produce something in the style of Keats or Stephen King or Rowling, they can't sue me for copyright infringement unless I make a substantially infringing use of their work. The style isn't sufficient, because the style can't be copyrighted.