Hacker News @lemmy.smeargle.fans [BOT]

AI models collapse when trained on recursively generated data

www.nature.com AI models collapse when trained on recursively generated data - Nature

 Analysis shows that indiscriminately training generative artificial intelligence on real and generated content, usually done by scraping data from the Internet, can lead to a collapse in the ability of the models to generate diverse high-quality output.

3 comments
  • We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. We refer to this effect as ‘model collapse’ and show that it can occur in LLMs as well as in variational autoencoders (VAEs) and Gaussian mixture models (GMMs).

    So maybe this is the end of LLM companies simply scraping the web, because they will be unable to distinguish between LLM-generated and human-written content. If someone had made a snapshot of the complete web in 2020 and waited until 2030 to sell it, they might just become the richest person on earth.
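The tail-loss mechanism is easy to reproduce in miniature. Below is a minimal sketch (not the paper's code; the sample size, generation count, and single-Gaussian "model" are simplifying assumptions): each generation fits a Gaussian to the current data by maximum likelihood, then discards the data and retrains on samples drawn from the fit. Estimation error compounds across generations, the fitted standard deviation drifts toward zero, and the tail mass of the original distribution disappears.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data from a standard normal distribution.
data = rng.normal(0.0, 1.0, size=100)

for gen in range(1, 201):
    # "Train" on the current data: fit a Gaussian by maximum likelihood.
    mu, sigma = data.mean(), data.std()
    # Discard the data and retrain on model-generated samples only,
    # i.e. indiscriminate training on recursively generated content.
    data = rng.normal(mu, sigma, size=100)
    if gen % 50 == 0:
        # Tail mass beyond 2 sigma of the ORIGINAL distribution.
        tail = np.mean(np.abs(data) > 2.0)
        print(f"gen {gen:3d}: sigma={sigma:.3f}  P(|x| > 2)={tail:.3f}")
```

Exact trajectories depend on the seed, but because the log of the fitted sigma behaves like a random walk with downward drift (the maximum-likelihood estimate of the standard deviation is biased low), the fitted distribution eventually concentrates on a point. This is the single-Gaussian analogue of the "model collapse" the paper demonstrates for LLMs, VAEs, and GMMs.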