Training AI models on AI-generated synthetic content causes the quality of the models' outputs to progressively degrade, a new paper shows.
"Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease," they added. "We term this condition Model Autophagy Disorder (MAD)."
Interestingly, this may become a more challenging problem as generative AI models see wider use online and their output increasingly ends up in the data scraped for future training.
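The failure mode is easy to see in a toy numerical sketch (my own illustration under simplified assumptions, not the paper's actual experiments): fit a Gaussian to some data, sample from the fit, refit only on those samples, and repeat with no fresh real data in the loop.

```python
# Toy "autophagous loop": each generation's model is a Gaussian fit
# only to samples drawn from the previous generation's model.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_generations = 100, 1000

# Generation 0: fit the model (mean and spread) to real data.
real_data = rng.normal(loc=0.0, scale=1.0, size=n_samples)
mu, sigma = real_data.mean(), real_data.std()

for gen in range(1, n_generations + 1):
    # Fully synthetic step: train only on the previous model's own samples.
    synthetic = rng.normal(mu, sigma, size=n_samples)
    mu, sigma = synthetic.mean(), synthetic.std()
    if gen % 200 == 0:
        print(f"generation {gen:4d}: fitted sigma = {sigma:.3f}")

# Run long enough, sigma shrinks toward 0: the model keeps roughly the right
# mean (precision) but progressively loses spread, i.e. diversity collapses.
```

Even this one-parameter caricature shows the pattern the paper describes: without new real data, each generation inherits and compounds the sampling errors of the last.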
But... isn't unsupervised backfeeding essentially the same as overtraining on the same dataset? We already know overtraining produces broken models.
Besides, the next AI models will be fed with humans' interactions with AI, not just their own content. ChatGPT already works like this: it learns from every interaction, every chat.
And the generative image models will be fed AI-assisted images in which humans have already fixed flaws like anatomy (the famous hands) or other glitches.
So, as interesting as this is, as long as humans interact with AI, the hybrid output used for training will contain enough new "input" to keep the models on track. There are already refined image generators trained on their own human-assisted output that are better than their predecessors.
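For a rough sense of that argument, here is the same toy loop but with a fraction of newly collected real data mixed into every generation (the 20% ratio is an arbitrary assumption for illustration, not a figure from the paper):

```python
# Same Gaussian loop, but each generation's training set mixes the model's
# own samples with a fixed share of fresh real-world data.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_generations, fresh_fraction = 100, 1000, 0.2
n_fresh = int(n_samples * fresh_fraction)

mu, sigma = 0.0, 1.0  # generation-0 model matches the real distribution
for gen in range(1, n_generations + 1):
    synthetic = rng.normal(mu, sigma, size=n_samples - n_fresh)  # model's own output
    fresh = rng.normal(0.0, 1.0, size=n_fresh)                   # newly collected real data
    mixed = np.concatenate([synthetic, fresh])
    mu, sigma = mixed.mean(), mixed.std()                        # refit on the mixture
    if gen % 200 == 0:
        print(f"generation {gen:4d}: fitted sigma = {sigma:.3f}")

# With real data anchoring every generation, sigma hovers near 1 instead of
# collapsing: a miniature version of the "enough fresh real data" condition.
```

Whether human-in-the-loop curation of AI output counts as "fresh real data" in the paper's sense is a separate question, but mechanically this is the kind of anchoring the commenter is describing.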
People in this thread seem really eager to jump to any "aha, AIs aren't intelligent after all" conclusions they can grab hold of. This experiment isn't analogous to anything that we put real people or animals through and seems like a relatively straightforward thing to correct for in future AI training.