Technology @lemmy.world assassin_aragorn @lemmy.world 1 yr. ago

A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data

finance.yahoo.com

As it turns out, it’s impossible to remove a user’s data from a trained A.I. model. Deleting the model entirely is also difficult—and there’s little regulation to enforce either option.

I'm rather curious to see how the EU's privacy laws are going to handle this.

(Original article is from Fortune, but Yahoo Finance doesn't have a paywall)

207 comments

For the AI heads here: is this another problem caused by the "black box" style of LLM creation where they don't really know how it actually works, so they don't really know how to take out the data?
- They know how it works. It's a statistical model. Given a sequence of words, there's a set of probabilities for what the next word will be. That's the problem: an LLM doesn't "know" anything. It's not a collection of facts. It's like a pachinko machine where each peg in the machine is a word. The prompt you give it determines where and how the ball gets dropped in, and all the pins it hits on the way down correspond to the output. How those pins get labeled is the learning process. Once that's done there really isn't any going back. You can't unscramble that egg to pick out one piece of the training data.
  
  While you are overall correct, there is still a sort of "black box" effect going on. While we understand the mechanics of how the network architecture works, the actual information encoded by training is, as you have said, not stored in a way that is easily accessible or editable by a human.
  
  I am not sure if this is what OP meant by it, but it kinda fits and I wanted to add a bit of clarification. Relatedly, the easiest way to uncook (or unscramble) an egg is to feed it to a chicken, which amounts to basically retraining a model.
  
  I don’t think you’re intending to be purposefully misleading, but I would recommend checking out this article (https://www.understandingai.org/p/large-language-models-explained-with) because the pachinko analogy is not really accurate. There are several layers of considerations that the model makes when analyzing context to derive meaning. How well these models do with analogies is, I think, a compelling case for the model having, if not “knowledge” of something, at least a good enough analogue to knowledge to be useful.
  
  Training a model on the way we use language is also training the model on how we think, or at least how we express our thoughts. There’s still a ton of gaps to work on before it’s an AGI, but LLMs are on to what’s looking more and more like the right path to getting there.
  
  While it glosses over a lot of details, it's not fundamentally wrong in any fashion. An LLM does not in any meaningful fashion "know" anything. Training an LLM is training it on what words are used in relation to each other in different contexts. It's like training someone to sing a song in a foreign language they don't know. They can repeat the sounds and may even recognize when certain words often occur in proximity to each other, but that's a far cry from actually understanding those words.
  
  An LLM is in no way, shape, or form anything even remotely like an AGI. I wouldn't even classify an LLM as AI. LLMs are machine learning.
  
  The entire point I was trying to make, though, is that an LLM does not store specific training data; what it stores is more like the hashed results of its training data. It's a one-way transform: there is absolutely no way to start at the finished model and run it backwards to derive its training input. You could probably show from its output that it's highly likely some specific piece of data was used to train it, but even that isn't absolutely certain. Nor can you point at any given piece of the model and say what specific part of the training data it corresponds to, or vice versa. Because of that, it's impossible to pluck out some specific piece of data from the model. The only way to remove data from the model is to throw the model away and train a new model from the original training data with the specific data removed from it.
  
  I really like that pachinko analogy. It gets the basic concept across without having to wade into technical descriptions.
  
  It’s a statistical model. Given a sequence of words, there’s a set of probabilities for what the next word will be.
  
  That is a gross oversimplification. LLMs operate on much more than just statistical probabilities. It's true that they predict the next word based on probabilities learned from training datasets, but they also have layers of transformers that process the context provided by a prompt to eke out meaningful relationships between words and phrases.
  
  For example: Imagine you give an LLM the prompt, "Dumbledore went to the store to get ice cream and passed his friend Sam along the way. At the store, he got chocolate ice cream." Now, if you ask the model, "who got chocolate ice cream from the store?" it doesn't just blindly rely on statistical likelihood. There's no way you could argue that "Dumbledore" is a statistically likely word to follow the text "who got chocolate ice cream from the store?" Instead, it uses its understanding of the specific context to determine that "Dumbledore" is the one who got chocolate ice cream from the store.
  
  So, it's not just statistical probabilities; the models have an ability to comprehend context and generate meaningful responses based on that context.
  
  This is mostly true, except they do store information. It's just not in a consistent, machine-readable form.
  
  You can analyze it with specialized tools, and an expert can gain some ability to understand what is stored in a specific link and manually modify it (in a very blunt way).
  
  Scrambling an egg is a good analogy to a point: you can't extract the training data. From an informational perspective, it's essentially extremely lossy compression.
  
  You can't get the egg back, but you can modify the model to change the information inside of it. It's extremely complex, but it's a very active field of study - with simpler models we've been able to separate data out from ability - the idea is to use something closer to a database that can be modified without doing brain surgery every time. It's
  
  You can't guarantee destruction of information without complete understanding of the model, but we might be able to scramble personal details... Granted, it's not something we can do now.
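
To make the "set of probabilities for what the next word will be" description from the top of this comment thread concrete, here is a minimal sketch of a count-based bigram model. It is a deliberately tiny stand-in, not how an LLM is built: real LLMs run transformer layers over the whole context rather than counting adjacent word pairs, and the corpus and function names here are made up for illustration. It does show the property several commenters describe: after training, only blended statistics remain, with no record of which sentence contributed which count.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probs(counts, word):
    """Convert the follow-counts for `word` into next-word probabilities."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # word never seen with a follower
    return {w: c / total for w, c in followers.items()}


# Hypothetical toy corpus, assumed only for this example.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)
```

The model keeps only totals such as "cat followed the twice"; nothing in `model` says which occurrence came from which part of the corpus, which is the toy version of the un-scrambling problem discussed above.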
- More that they know enough about how it works that they know it's impossible to do. The data isn't stored like files on a hard drive, in some discrete bundle of bytes somewhere, where the problem is simply finding and erasing them. It's stored as a distributed haze of weightings spread out over all of the nodes in the network, blended with all the other distributed hazes of everything else that the AI knows. A court may as well order a human to forget a specific fact; memories are stored in a similar manner.
  
  The best the law can probably do right now is forbid AIs from speaking about certain facts. And even then, as we've seen with the likes of ChatGPT, there will be ways to talk around such bans.
  
  they know it's impossible to do
  
  There is some research into ML data deletion, and it's been shown to be possible, but maybe not on larger scales, and maybe not in a way that is actually feasible compared to retraining.
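
As a hedged illustration of why deletion can be possible for simple models, the sketch below fits a least-squares line using only running sums. Because those sums are additive, subtracting one point's contribution gives the same result as retraining without it. This is an assumed toy example, not a method taken from the research mentioned above, and it does not carry over to LLMs, whose weights are not simple sums of per-example contributions. The class name `RunningLineFit` and the data are invented for the example.

```python
class RunningLineFit:
    """Least-squares line y = a*x + b, stored only as running sums."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        """Fold one training point into the sums."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def remove(self, x, y):
        """Exactly undo a previous add(); requires knowing the point."""
        self.n -= 1
        self.sx -= x
        self.sy -= y
        self.sxx -= x * x
        self.sxy -= x * y

    def coefficients(self):
        """Return (slope, intercept) of the best-fit line."""
        denom = self.n * self.sxx - self.sx ** 2
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        b = (self.sy - a * self.sx) / self.n
        return a, b


points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 20)]  # last point is "private"

model = RunningLineFit()
for x, y in points:
    model.add(x, y)
model.remove(4, 20)  # delete the private point in place

retrained = RunningLineFit()
for x, y in points[:-1]:
    retrained.add(x, y)  # retrain from scratch without it

print(model.coefficients(), retrained.coefficients())
```

Even in this easy case, deletion needs the exact original point in hand; the model alone does not reveal it.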
- Sort of. We know 'how it works' to the extent that it was engineered with a particular method and purpose. The problem is that it's incredibly difficult to gain any insight into what's 'inside' the network once the data has been propagated through it.
  
  Visualizing a neural network can look a little bit like a constellation of stars. Each star is a node and is connected to other nodes. When given an input, each node makes a small calculation and passes the result to the other nodes it is connected to. The calculation is modified by the connection (by what is called a weight), and during training the results of the calculations change the weights of the connections. That's what's in the black box.
  
  The constellations in an LLM are very large (the first L in LLM). Each 'layer' may have hundreds of nodes, each of which is connected to every node of the next layer. If there are 100 nodes in each of two adjacent layers, that makes 100 × 100 = 10,000 connections. There are many layers in an LLM.
  
  Notice that I didn't mention anything about the nodes or the connections storing any data. That's because they don't, at least in the sense that we're used to thinking about it. There doesn't exist a string of text that says 'Bill Burr's SSN is ###-##-####'. It's just the nodes that do the calculations, and the weights of their connections.
  
  So by now you can probably see why it's so tricky to determine what's 'inside' a neural network, because really it's a set of operations instead of a set of data. The most reliable way to see what it does (so far) is to put something in and see what comes out.
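
The node-and-weight picture in this comment can be sketched in a few lines. This is a minimal, assumed example of one fully connected layer with random weights and a `tanh` activation; it is not the layer type or size used by any particular LLM, and the helper names are made up for illustration. It reproduces the 100 × 100 = 10,000 connection count and shows that the layer holds nothing but numbers.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """One weight per connection: n_out rows, each with n_in weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def forward(weights, inputs):
    """Each output node sums its weighted inputs and applies tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)))
        for row in weights
    ]


rng = random.Random(0)                 # fixed seed so runs are repeatable
layer = make_layer(100, 100, rng)      # 100 nodes feeding 100 nodes
n_connections = sum(len(row) for row in layer)
outputs = forward(layer, [1.0] * 100)  # put something in, see what comes out

print(n_connections)  # 10000
```

Inspecting `layer` shows only a grid of floats. There is no string to search for, which is why the practical way to probe it is the one described above: feed in an input and look at the output.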
- The model does not keep track of where it learned things from. Even if it did, it couldn't separate out what it learned and discard it. AI learning resembles improving your motor skills more than filling in an Excel sheet. You can discard any row from an Excel sheet. Can you forget, or even separate/distinguish/filter, the motor skills you learned during 4th grade art classes?
  
  It's wild to me that the model doesn't record its training materials, even for diagnostic purposes. It would be a useful way to understand how it's processing the material.
- Think of it like this: you need a bunch of data points to determine the average of them all, but if you're only given the average of a group of numbers, you can't then go back and determine the original data points. It just doesn't work like that.
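
The averaging analogy can be checked directly. In this small sketch (the numbers are arbitrary), three different groups share the same average, so the average alone cannot identify the data behind it. The helper `remove_from_mean` is a hypothetical name; it shows that taking one value back out of an average only works if you still have that value and the original count.

```python
from statistics import mean

# Three different datasets that collapse to the same summary.
group_a = [2, 4, 6, 8]
group_b = [5, 5, 5, 5]
group_c = [1, 1, 9, 9]
means = {mean(g) for g in (group_a, group_b, group_c)}


def remove_from_mean(current_mean, count, value):
    """Mean of the remaining values after removing one known value."""
    return (current_mean * count - value) / (count - 1)


# Removing 8 from group_a is possible only because we kept 8 and the count.
updated = remove_from_mean(mean(group_a), len(group_a), 8)
print(means, updated)
```

Given only the number 5, there is no way to tell which group produced it, and no way to "remove" a value from it without outside knowledge of that value.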
