Have companies that claim to anonymize the data gathered on individuals ever been independently audited to verify that?
Also, outside of perhaps the EU, are there any legal enforcement mechanisms to hold them accountable for lying about it, if an audit showed that they were?
The more data points a dataset holds about a given individual, the harder it becomes to make it actually anonymous. It doesn't much matter whether names and the like are listed as data points if they can be inferred from the rest. An investigation by Svea Eckert and Andreas Dewes, for instance, managed to identify a named German member of parliament (Valerie Wilms) and other public functionaries within a data set on web browsing habits they received from data brokers.
Most countries do have data privacy legislation and relevant regulatory/enforcement agencies, but the data brokerage business is big and intensely international, so the picture on audits is unavoidably complicated.
This sounds like an interesting theorem that needs proving. Data points aren't inherently going to give someone away. You could probably have a list of every meal I've ever had. It might narrow me down to a region. Maybe even a city. But I guarantee a list of: burrito, spaghetti, oatmeal, banana, etc. wouldn't tell you who the person is. If you added more data, like a column of all the websites I've ever visited, you'd still not know who I am. You might be able to look through the data and find "individuals" (e.g. only one person ate a burrito and read Lemmy.world at the same time), but you wouldn't know it's me.
You could add a lot of data points that don't recombine into an identity. But I'm guessing most datasets have something that can turn into lat/long. As soon as you have lat/long, you narrow down people significantly. I think the problem is our datasets are tracking where and not what. Though "what" does narrow it down a lot, it's much easier to make anonymous.
Most of the time I hear about a congressional person being outed by data analysis, it comes from correlating places they visit.
You're not really missing anything, other than how easy it is for a collection of datapoints to become unique... If you had burrito Monday, spaghetti Tuesday, oatmeal Wednesday, banana Thursday, then I doubt there'd be more than a few hundred people that match that pattern.
Some people don't like that because they think it's the same as being identified by name; others don't really care.
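To put rough numbers on the "how quickly does a meal pattern become unique" point, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the ten-item MEALS list, the population sizes, and above all the idea that everyone picks a meal uniformly at random each day. Real habits are far more skewed, so treat it as a toy model of the argument, not a measurement of any real dataset.

```python
from collections import Counter
import random

# Hypothetical menu; real data would have far more (and unevenly popular) values.
MEALS = ["burrito", "spaghetti", "oatmeal", "banana", "salad",
         "pizza", "curry", "soup", "rice", "sandwich"]


def simulate_population(n_people, n_days, seed=0):
    """Give each synthetic person a random meal per day (uniform choice, an assumption)."""
    rng = random.Random(seed)
    return [tuple(rng.choice(MEALS) for _ in range(n_days))
            for _ in range(n_people)]


def matches(records, pattern):
    """Count people whose first len(pattern) days equal the given pattern."""
    pattern = tuple(pattern)
    return sum(1 for r in records if r[:len(pattern)] == pattern)


def fraction_unique(records):
    """Share of people whose full record is shared with nobody else."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)


if __name__ == "__main__":
    # With 10 meals and 4 days there are only 10**4 possible patterns,
    # so a large population still shares each pattern among many people.
    people = simulate_population(n_people=200_000, n_days=4)
    pattern = ["burrito", "spaghetti", "oatmeal", "banana"]
    print("people matching the 4-day pattern:", matches(people, pattern))

    # Each extra day multiplies the number of possible patterns by 10.
    for days in (1, 4, 7, 14):
        pop = simulate_population(n_people=200_000, n_days=days)
        print(days, "days -> fraction unique:", round(fraction_unique(pop), 4))
```

Under these toy assumptions a four-day pattern is still shared by a crowd, but the number of possible patterns grows tenfold per extra day, so a couple of weeks of records is enough to make nearly everyone one of a kind. Being unique in the dataset is not the same as being named; it just means any single outside fact that lines up with the record (a location, a public post) is enough to attach a name to it.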