Futurology @futurology.today voidx @futurology.today 7 mo. ago

AI Companies Running Out of Training Data After Burning Through Entire Internet

futurism.com/the-byte/ai-training-data-shortage

AI companies are swiftly running into a massive problem: there isn't enough data on the internet to train the next generation of models.

53 comments

There’s already more than enough training data out there. The important thing that remains is to filter it so it doesn’t also include humanity’s stupidest data.

That and make the algorithms smarter so they are resistant to hallucination and misinformation - that’s not a data problem, it’s an architecture problem.
- Stupid data can be useful for training as a negative example. Image generators use negative prompts to good effect.
- Butbutbut my ignorant racism is the truth!! That's why I hear it from everyone, including [insert nearby relatives here]!!
  
  Well, is the goal truth? Or a simulacrum of a human?
  
  Considering not even all humans are hireable, I'd say only a fool aims for a simulacrum.
- You also have to filter out the AI generated garbage that is rapidly becoming a majority of content on the internet.
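- Well, it's established wisdom that the dataset size needs to scale with the number of model parameters. Roughly linearly, IIRC: the Chinchilla results put compute-optimal training at around 20 tokens per parameter, and it's the training compute that then grows about quadratically. If you don't have that much data the training basically won't work; it will overfit or just not progress.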
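
For a rough sense of the numbers behind the data-versus-parameters point, here is a back-of-the-envelope sketch. It assumes two commonly cited rules of thumb that nobody in this thread states: about 20 training tokens per parameter (the Chinchilla heuristic) and training compute of roughly 6 × parameters × tokens FLOPs. The function names are made up for illustration, and real models often deviate from both numbers.

```python
# Rule-of-thumb constants; both are assumptions, not figures from this thread.
TOKENS_PER_PARAM = 20       # Chinchilla-style heuristic for compute-optimal training
FLOPS_PER_PARAM_TOKEN = 6   # common approximation: compute ~ 6 * N * D


def compute_optimal_tokens(n_params: float) -> float:
    """Estimate the training tokens suggested for a model with n_params parameters."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Estimate total training FLOPs for n_params parameters and n_tokens tokens."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


if __name__ == "__main__":
    # Under these assumptions the token budget grows linearly with model size,
    # while the compute bill grows quadratically.
    for n_params in (1e9, 10e9, 100e9):
        tokens = compute_optimal_tokens(n_params)
        flops = training_flops(n_params, tokens)
        print(f"{n_params:.0e} params -> {tokens:.1e} tokens, {flops:.1e} FLOPs")
```

Under these assumptions, making the model 10x bigger asks for 10x the tokens and about 100x the compute, which is why the supply of usable text becomes a bottleneck as models grow.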