But malicious actors don't want their generated data to be recognizable as LLM output. They want it to impersonate real people in order to push advertising or misinformation.
Which means that even if LLM-generated content started being flagged as such, the only unflagged LLM content left out there to train future models would be the most malicious and vile.
I don't see any solution to this on the horizon. Pandora's box is already open.
To flip it, this means that only AI which responsibly manages its initial data set will be successful. You can't simply scrape and pray; you need some level of vetting on the input.
More labor intensive? Sure, but AI companies aren't entitled to the quick and easy solutions they started with...
Dead internet theory seems like a completely inevitable future that we're all racing toward. I don't see any way to avoid it. It's a tragedy of the commons in a place where there is no organizing body that can step in and prevent private actors from destroying everything. Worse, we're more concerned with those private actors being strong and competitive, which is only accelerating us towards the doomed endgame.