The level of effort it would take to prevent this would be infeasible to ask of even a non-volunteer admin, let alone a volunteer, let alone literally all of them.
Good, hopefully it’ll make AI that is slightly less toxic than the rest of the internet.
It always baffles me that people don’t want their content represented in an AI - every word you write that gets indexed is a vote for how future AI will behave.
Wait, do you actually want those companies to make even more money from your data, and want these environmentally disastrous "bullshit generators" to keep on going? I'm not saying stopping them is realistically possible, but if I had to choose, I'd greatly prefer a world without AI.
It's structural: you can be open or locked down, and it's hard to decentralize if you're not open.
You can make it easier or harder to work with that data, but ultimately it's obfuscation. You could make it hard to parse and obscure details, but if you want decentralized federation you can't hide too much.
You don't need to scrape. If you want to get all the content on Lemmy, just set up an instance and subscribe to all the top communities, and the instances will just send you all the content.
So there isn't really a way to monetise or block it. I guess you could only federate to a whitelist, but the biggest instances will federate by default with any new instances until they are given a reason to defederate.
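For what it's worth, here's a rough sketch of what "only federate to a whitelist" boils down to. It's not Lemmy's actual code: the names `ALLOWED_INSTANCES` and `should_federate_with` and the example domains are made up for illustration, and it assumes an instance is identified by the hostname of the remote actor's URL.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of trusted instance hostnames (examples only).
ALLOWED_INSTANCES = {"lemmy.ml", "lemmy.world"}


def should_federate_with(actor_url: str, allowlist: set[str] = ALLOWED_INSTANCES) -> bool:
    """Return True only if the remote actor's host is on the allowlist."""
    host = urlparse(actor_url).hostname  # None when the string has no host part
    return host is not None and host in allowlist


if __name__ == "__main__":
    print(should_federate_with("https://lemmy.world/u/alice"))
    print(should_federate_with("https://scraper.example/u/bot"))
```

The catch is the same as above: the open default is the opposite of this (accept everyone, block specific hosts later), so a brand-new instance gets the content until someone notices and defederates.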