'Impossible' to create AI tools like ChatGPT without copyrighted material, OpenAI says
Pressure grows on artificial intelligence firms over the content used to train their products
It feels to me like every other post on Lemmy is talking about how copyright is bad and should be changed, or how piracy is caused by fragmentation and difficulty accessing content (streaming sites).
Then whenever this topic comes up, everyone completely flips. But in my mind, all this would do is fragment the AI market much like streaming services did (suddenly you have 10 different models with different licenses), and make it harder for non-mega-corps without infinite money to fund their own LLMs (of good quality).
Like seriously, can't we just stay consistent and keep saying copyright is bad, even in this case? It's not really an AI problem that jobs are affected, just a capitalism problem. Throw in some good social safety nets and tax these big AI companies, and we wouldn't even have to worry about artists' well-being.
I think looking at copyright in a vacuum is unhelpful, because it's only one part of the problem. IMO, the reason people are okay with piracy of name-brand media but are not okay with OpenAI using human-created artwork comes from the same logic as not liking companies and capitalism in general. People don't like the fact that AI is extracting value from individual artists to make the rich even richer while giving nothing in return to those artists, in the same way we object to massive and extremely profitable media companies paying their artists peanuts. It's also extremely hypocritical that the government, and by extension "copyright," seems to care much more that OpenAI is using name-brand media than it cares about OpenAI scraping the internet for independent artists' work.
Something else to consider is that AI is also undermining copyleft licenses. We saw this with GitHub Copilot, a 100% proprietary product that was nevertheless trained on all of GitHub's user-generated code, including GPL and other copyleft-licensed code. The art equivalent would be CC-BY-SA licenses, where derivatives also have to be Creative Commons.
Maybe I'm optimistic, but I think your comparison to big media companies paying their artists peanuts highlights to me that the best outcome is to let AI go wild and just... provide some form of government support (I don't care what form; that's another discussion). Because in the end, the more stuff we can freely train AI on, the faster we automate away labour.
I think another good comparison is reparations. If you could come to me with some plan that perfectly pays out the correct amount of money to every person on earth impacted by slavery and other racist policies, making up for what they missed out on, I'd probably be fine with it. But that is such a complex (impossible, I'd say) task that it can't be done, and so I end up being against reparations and instead just say "give everyone money; it might overcompensate some, but better that than undercompensating others." Why bother figuring out such a complex, costly, and bureaucratic way to repay artists when we could just give everyone robust social services, paid for by taxing AI products an amount equal to however much money they have removed from the workforce through automation?
Journalist: Read a press release. Rewrite it in my own words. See some tweets. Put them together in a page padded with my commentary. Learn from, reference, and quote copyrighted material everywhere.

AI: I do that too.

Journalists: How dare AI learn! Especially from copyrighted material!