Google's DeepMind unit is unveiling today a new method it says can invisibly and permanently label images that have been generated by artificial intelligence.
Someone with sufficient technical ability can run their own model and compile out the watermarks and all the limiters to avoid the REAL heinous shit.
The vast majority of people don't have that ability. And, as models get larger and more complex, even the people who DO have the technical ability lack the resources to (re-)train in a timely manner. Same with integrating these watermarks even deeper into the code. Anyone can comment out addWatermark(img). But what if aspects of the watermark are added in 9 billion places?
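To make that concrete, here's a toy numpy sketch (purely my own illustration - nothing to do with how SynthID or any real generator actually works) of the difference between a watermark that lives in one post-processing call and one that gets smeared across every step of generation:

```python
# Toy illustration only: a stand-in "generator" plus a stand-in watermark signal.
import numpy as np

rng = np.random.default_rng(0)
PATTERN = rng.normal(scale=0.01, size=(64, 64))    # stand-in watermark signal

def generate_easy():
    img = rng.random((64, 64))       # stand-in for the actual image generator
    img = img + PATTERN              # the one line anyone can comment out
    return img

def generate_hard(steps=50):
    img = rng.random((64, 64))
    for t in range(steps):
        img = 0.9 * img + 0.1 * rng.random((64, 64))   # stand-in "denoising" update
        img += PATTERN / steps       # a sliver of the watermark folded into every step
    return img
```

In the second version there is no single call to delete; you'd have to pull apart the sampler (or retrain the model) to get the mark out, which is the whole point.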
You heard of Stable Diffusion? They've got one-line installs nowadays; then all you have to do is enter a prompt and go.
It's entirely open source, so anyone could improve the model (or not), and it'd be more than legal to release a non-watermarked version (if a watermarked version even ever appeared).
I saw down the chain it was compared to Denuvo, which I'd argue is a bad analogy - cause who's gonna run a rootkit on their PC just to create an image, especially when there are a million options not to (unlike games, which are generally unique)
Stable Diffusion is exactly what I was thinking of when I talked about removing "all the limiters". Holy shit that was a dark ass weekend...
But we are talking orders of magnitude of complexity and power. If all anyone needed to run a Bard/ChatGPT/whatever level LLM was a somewhat modern computer then everything would be retraining near constantly (rather than once every couple of months... maybe) and the SAG/WGA strikes would already be over because "Hollywood" would be making dozens of movies a week with AI media generation.
Almost everything we are seeing right now is about preparing for The Future. People LOVE to fixate on "AI can't draw hands" or whatever nonsense and that is very much a limited-time thing (also, humans can't draw hands. There is a reason four fingers are so common. It is more the training data than anything else). And having the major companies embed watermarking, DRM, and asset tracking at a low level is one big "key" to that.
Like, I expect an SD level tool to be part of Cortana for Windows 12 or whatever. "Hey Cortana, make me a meme picture of Pikachu snorting coke off a Squirtle while referencing a Ludacris song. You know, the one where he had the big gloves" and it working. But that won't be the kinds of deep fakes or media generation that stuff like this is trying to preemptively stop.
I see what you mean, yes, but of course those large resources are required to train the model - not run it. So reasonably, as long as a bunch of users can pool resources to compete with big tech, there will always be an 'un-watermark-able' crowd out there, making all the watermarks essentially useless because they only got half the picture.
And the way training these models works is insanely parallel, so reasonably - if a (ideally FOSS) project pops up allowing users to donate CPU time to train the model as a whole - users could actually have more computational power than the big tech companies.
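For what it's worth, "insanely parallel" roughly means this: each volunteer computes gradients on their own shard of data and the results get averaged. Here's a toy numpy sketch with a made-up linear model (not any real volunteer-compute framework):

```python
# Toy data-parallel training: 10 pretend "volunteers" each own a shard of the data.
import numpy as np

rng = np.random.default_rng(1)
X, y = rng.random((1000, 8)), rng.random(1000)
w = np.zeros(8)                                  # the shared model weights

def local_gradient(w, X_shard, y_shard):
    # mean-squared-error gradient for a linear model on one volunteer's shard
    err = X_shard @ w - y_shard
    return 2 * X_shard.T @ err / len(y_shard)

shards = np.array_split(np.arange(1000), 10)     # pretend index sets for 10 volunteers
for step in range(100):
    grads = [local_gradient(w, X[idx], y[idx]) for idx in shards]
    w -= 0.1 * np.mean(grads, axis=0)            # "all-reduce": average, then apply
```

Each volunteer only ever touches their own slice of the data, and the averaging step is what stitches it all back into one model.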
The resources to train these models are such that even Google/Amazon/MS are doing it selectively. Google Cloud/AWS/Azure are some of the biggest compute platforms on the planet, these companies get that compute for as close to "free" as it is ever going to get, and even they ration those resources.
A bunch of kids on 4chan aren't even going to make a drop in the bucket. And, in the act of organizing, the ringleaders will likely get caught for violating copyright law. Because, unless the open source model really is the best (and it won't be), they are using proprietary code that they modified to remove said watermarks.
As for "We'll folding@home it!": Those efforts are largely dead because people realized power matters (and you are almost always better off just donating some cash so they can buy some AWS time). But, again, you aren't getting a massive movement out of people who just need to make untraceable deepfakes.
Also, this ignores all of the shady-ass crap being done to get that training data. Like... there is a reason MS bought GitHub.
I think you're mixing together a couple angles here to try and make a point.
'Unless the open source model is the best... they're using proprietary code' - you're talking about a hypothetical program hypothetically being stolen and referencing it as a definite?
As for the companies, of course they only use certain resources; they're companies, they need returns to exist. A couple million down the drain could be some CEO's next bonus, so they won't do anything they aren't sure they'll get something from (even if only short term)
As for the 4chan thing, was that a coincidence or are you referencing Unstable Diffusion? Cause they did almost exactly that (before, of course, it got mismanaged, cause the NSFW industry has always been a bit ghetto)
And like, sure, fold it at home or donate for AWS time, same end result; it really doesn't matter which the users are comfortable with
And, whew, finally: sure, MS bought GitHub, but do you think Stable Diffusion bought the internet? Courts have ruled that web scraping is legal...
I know this is a wall of text, but like I said, these arguments all feel like a bunch of tangentially related thoughts
I am covering "a couple angles" because this falls apart under even the most cursory examination.
But sure. If we are at a point where there is sufficient publicly available training data, a FOSS product performs comparably to the flagship products of 100 billion dollar companies, and training costs have been reduced to the point that a couple kids on 4chan can train up a new model over the course of a few days: Sure.
Until we reach that point? Actually, even after we reach that point, it would still be unlikely. Because if training is that cheap, then you can bet said companies have funded the development of new technologies that allow them to take advantage of their... advantages.
This is Google we're talking about. We're probably going to find out that you can remove the mark by resizing or reducing the color depth or something stupid like that. Remember how YouTube added Content ID and it would flag innocent users while giving actual pirates a pass? As said in a related article:
"There are few or no watermarks that have proven robust over time," said Ben Zhao, professor at the University of Chicago studying AI authentication. "An attacker seeking to promote deepfake imagery as real, or discredit a real photo as fake, will have a lot to gain, and will not stop at cropping, or lossy compression or changing colors."
Lowering quality and futzing with the colors is already how fakes are identified. So that undoes most of the benefits of a "deepfake". And people have already been trained to understand why every video of Bigfoot is super blurry and shaky even though we all have ridiculously good cameras (many with auto-stabilization) in our pockets at all times.
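For reference, the kind of "dumb" transforms the quote is talking about are literally a few lines of Pillow - this is just a sketch of the attack class (file names are placeholders), not a claim that it actually defeats SynthID:

```python
# Downscale/upscale, crush the color depth, then recompress as lossy JPEG.
from PIL import Image

img = Image.open("generated.png").convert("RGB")         # placeholder AI-generated image

small = img.resize((img.width // 2, img.height // 2))    # throw away fine detail
laundered = small.resize((img.width, img.height))

laundered = laundered.quantize(colors=64).convert("RGB") # reduce color depth to 64 colors
laundered.save("laundered.jpg", quality=60)              # aggressive JPEG compression
```

Which is the trade-off: the more you launder an image like that, the more it ends up looking like every other blurry "Bigfoot" shot.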
As for technical competence: Meh. Like it or not, YouTube is pretty awesome tech, and a lot of the issues with false positives have more to do with people gaming the system than failings of the algorithm. But we also have MS, Amazon, Facebook, etc. involved in this. And it is in their best interest to make the most realistic AI-generated images and videos possible (if only for media/content creation) while still being able to identify a fake. And attributing said images/video/whatever to "DeepMind" or "ChatGPT" is pretty easy since it can be done at creation rather than relying on a creator to fill out the paperwork.
It is basically the exact same situation as DRM in video games. A truly dedicated person can find a workaround (actually... more on that shortly). But the vast majority of people aren't going to put in any more effort than it takes to search "Generic Barney Game No CD".
And... stuff like Denuvo has consistently demonstrated itself to be something that a very limited number of people can crack. Part of that is just a general lack of interest, but part of it is the same as it was with StarForce and even activation-era SecuROM back in the day: shit is hard and you need to put the time and effort into knowing how to recognize a call.
Admittedly, the difference there is that people are actively not paying for video game cracks, whereas there would be a market for "unlocked" LLMs. But there is also strong demand for people who know how to really code those and make them sing, so... it becomes a question of whether it is worth running a dark web site and getting paid in crypto versus just working for Google.
So yeah, maybe some of the open source LLMs will have teams of people who find every single call to anything that might be a watermark AND debug whether those impact the final product. But the percentage of people who will be able to run their own LLM will get increasingly small as things become more and more complex and computationally/data intensive. So maybe large state-backed organizations will be doing this. But, with sufficient watermarking/DRM/content tracing, the ability for someone to ask DALL-E 2 to make them a realistic picture of Biden having an orgy with the entire cast of Sex Education and it not being identified as a fake fairly easily is... pretty much at the same level as people not realizing that someone photoshopped a person's head onto some porn. Idiots will believe it. Everyone else will just see a quick Twitter post debunking it and move on with their lives.