They can 'prove' they don't explicitly train the models on race or gender, but that doesn't really prove anything. A model will inevitably take into account data that correlates with race or gender (names, zip codes, education and financial history, etc.), and those correlations will produce decisions biased in much the same way as regular human racism and sexism. Weeding that out completely may not even be possible.
I figure you’d audit it by examining the results, and if bias isn’t detectable in the results then I’d argue that’s at the very least still better than the human-based systems we’ve been relying on up until now.
If you have a bunch of otherwise identical résumés, with the only difference being the racial connotation of the name, and the AI gives significantly different results, there's an identifiable problem.
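For what it's worth, here is a minimal sketch of that kind of paired test. Everything in it is made up for illustration: `score_resume` stands in for whatever model is being audited, `toy_biased_scorer` is a deliberately biased fake, and the names are placeholders rather than a real name list.

```python
from statistics import mean

def paired_name_audit(score_resume, resumes, names_a, names_b):
    """Score each résumé template under names from two groups.

    Returns the mean score gap (group A minus group B) and the per-résumé gaps.
    A gap near zero means swapping the name did not move the score.
    """
    gaps = []
    for template in resumes:
        score_a = mean(score_resume(template.replace("{NAME}", n)) for n in names_a)
        score_b = mean(score_resume(template.replace("{NAME}", n)) for n in names_b)
        gaps.append(score_a - score_b)
    return mean(gaps), gaps

def toy_biased_scorer(resume_text):
    """Hypothetical scorer: rewards a keyword, penalizes group B names."""
    score = 0.5 + 0.1 * resume_text.count("Python")
    if "Name B1" in resume_text or "Name B2" in resume_text:
        score -= 0.2
    return score

# Placeholder data; the résumés are identical except for the {NAME} slot.
resumes = [
    "{NAME}. 5 years of Python. BSc in statistics.",
    "{NAME}. 2 years of Python and Python tooling. No degree.",
]
names_a = ["Name A1", "Name A2"]
names_b = ["Name B1", "Name B2"]

mean_gap, per_resume_gaps = paired_name_audit(
    toy_biased_scorer, resumes, names_a, names_b
)
print(f"mean score gap (A - B): {mean_gap:.3f}")
```

A real audit would need many more résumés and names per group, plus some measure of whether the gap is larger than noise, but the structure stays the same: hold everything fixed except the name.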
The check is whether the demographics of the output are roughly equivalent to the demographics of the input. If ten men and fifty women apply, and eight men and two women are hired, that is worth investigating.
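To put numbers on that example: a rough sketch that computes each group's selection rate and the ratio of the lowest rate to the highest. The function names are my own, and the 0.8 cutoff is the "four-fifths" rule of thumb from US employment guidelines, used here only as an assumed flagging threshold, not as proof of bias either way.

```python
def selection_rates(applied, hired):
    """Fraction of applicants hired, per group."""
    return {group: hired[group] / applied[group] for group in applied}

def impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was hired from any group, so there is no gap to report
    return min(rates.values()) / highest

# The numbers from the comment above.
applied = {"men": 10, "women": 50}
hired = {"men": 8, "women": 2}

rates = selection_rates(applied, hired)
ratio = impact_ratio(rates)

FOUR_FIFTHS = 0.8  # assumed rule-of-thumb threshold
print(rates, round(ratio, 3))
print("worth investigating" if ratio < FOUR_FIFTHS else "no flag from this check")
```

Here men are hired at 80% and women at 4%, so the ratio is about 0.05, far below the threshold.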
That would be a pretty extreme bias to have, so yeah that would make sense. If it’s not so drastic it might be harder to spot by just looking at the results.
I’m not a policy expert, author of the bill, or in charge of the department that will lead these investigations. Even if I were an expert on the subject, what I’d do and what this department will do aren’t likely to be the same.
I just support civilian oversight and audits of these algorithms and LLMs as they take up a more prominent position in hiring and firing.
"I figure you’d audit it by examining the results, and if bias isn’t detectable in the results"
I was just asking for more info on how you’d examine the results for bias since it would need to be pretty extreme like the example you gave to be identifiable as something worth investigating.
All good if you don’t have a more specific answer, I was just curious what your personal thoughts were here.
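One possible answer to the "less extreme" case, sketched under assumptions: a two-proportion z-test asks whether a gap in hire rates is bigger than chance would explain. The hire counts below are invented, the function name is mine, and the normal approximation only holds for reasonably large groups (small samples would call for something like Fisher's exact test instead).

```python
import math

def two_proportion_z_test(hired_a, applied_a, hired_b, applied_b):
    """Two-sided z-test for a difference between two hire rates.

    Returns (z, p_value). Uses the pooled-proportion standard error.
    """
    rate_a = hired_a / applied_a
    rate_b = hired_b / applied_b
    pooled = (hired_a + hired_b) / (applied_a + applied_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / applied_a + 1 / applied_b))
    if std_err == 0:
        return 0.0, 1.0  # everyone or no one was hired; nothing to compare
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return z, p_value

# Hypothetical numbers: the same 15% vs 10% gap at two different volumes.
z_small, p_small = two_proportion_z_test(30, 200, 20, 200)
z_large, p_large = two_proportion_z_test(300, 2000, 200, 2000)

print(f"200 applicants per group:  z={z_small:.2f}, p={p_small:.4f}")
print(f"2000 applicants per group: z={z_large:.2f}, p={p_large:.2e}")
```

The point of running it at two volumes is that a modest gap can be indistinguishable from noise in a small batch of applicants and unmistakable once enough decisions accumulate, which is one argument for auditing outcomes on an ongoing basis rather than once.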
Hey, I am a machine learning engineer who works with people data. Generally you measure bias in the training data, the validation sets, and the outcomes (on an ongoing basis; AIF360 is a common library and approach). There are lots of ways to measure bias and/or fairness. Just checking whether a feature was used isn’t considered “enough” by any standard or practitioner. There are also ways to detect and mitigate some of the proxy relationships you’re pointing to. That being said, I am 100% skeptical that any hiring algorithm can avoid being extremely biased. A lot of big companies have tried and quit because, despite taking all the right steps, the models were still biased (https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G). Also, many of the metrics used to report fairness (disparate impact, for example) have some deep flaws.
All that being said, the current state is that there are no reporting requirements, so vendors don’t do the minimum 90% of the time. If they did, it would cost a lot more and get in the way of the “AI will solve all your problems with no effort” narrative they want to put forward. So I am happy to see any regulation coming into place, even if it won’t be perfect.
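On the proxy relationships mentioned above, one simple way to look for them is to measure how strongly a supposedly neutral feature is associated with a protected attribute. This is a plain-Python sketch using Cramér's V, not AIF360's API; the zip codes and group labels are invented, and a high value only says the feature could act as a proxy, not that the model uses it that way.

```python
from collections import Counter
import math

def cramers_v(feature_values, group_labels):
    """Cramér's V between two categorical columns (0 = no association, 1 = perfect)."""
    n = len(feature_values)
    feature_counts = Counter(feature_values)
    group_counts = Counter(group_labels)
    k = min(len(feature_counts), len(group_counts)) - 1
    if n == 0 or k <= 0:
        return 0.0  # empty data or a constant column carries no association
    joint = Counter(zip(feature_values, group_labels))
    chi2 = 0.0
    for f, f_count in feature_counts.items():
        for g, g_count in group_counts.items():
            expected = f_count * g_count / n
            observed = joint.get((f, g), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# Hypothetical applicant data: zip code lines up closely with group membership.
zip_codes = ["11111"] * 40 + ["22222"] * 40
groups = ["g1"] * 36 + ["g2"] * 4 + ["g1"] * 6 + ["g2"] * 34

print(f"Cramér's V (zip code vs group): {cramers_v(zip_codes, groups):.2f}")
```

Running this per feature gives a rough ranking of which inputs deserve a closer look; combinations of features can still act as a proxy even when each one looks harmless on its own.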