This is the fourth in a series of posts about the effects of bias on ML algorithms. In the previous post we discussed what happens when you train an algorithm with data that isn’t representative of the universe the algorithm will operate in. This week’s focus is on the effects of human prejudice on machine learning.
Bias in this sense is probably what most people think of when they hear the word, and it is the source of many scary AI headlines. Yet the way human prejudice influences algorithms is complex.
Prejudice can be benign, as in this example:
A data science team relies on an off-shore crowd to label training data for a computer vision (CV) project involving clothing recognition. The off-shore crowd labels women’s high-heeled shoes as “boots” because that is what they’re called in that geography. From the data science team’s perspective the algorithm’s performance is sub-optimal, because it misidentifies high-heeled shoes as boots.
Fixing this form of prejudice is as simple as training the humans in the loop, so that they can overcome their own regional prejudice.
But prejudice can also be injurious, as in this example:
An algorithm is exposed to millions of annotated images of people at work. Once training is complete, the algorithm identifies every new image of a nurse as “female” and every new image of a computer scientist as “male”, regardless of the actual gender of the subject.
Bias in society (“nursing is a woman’s job”), coupled with the overwhelming tendency of nurses to be female (40 years ago only 3% of nurses were male, and even today fewer than 10% are), resulted in a training dataset that taught the algorithm all the wrong lessons. It was led to believe that there is a causal connection between “nurse” and “female” because of the overwhelming number of female nurses it was exposed to. It concluded that all computer scientists are male for the same reason: that’s what the training data suggested.
We know that neither of those conclusions is true. And we absolutely do not want an algorithm to perpetuate stereotypes that have contributed to the prejudice in the workplace and the training data.
Fixing this form of prejudice is more involved. Ironically, one way to mitigate this kind of bias is to intentionally introduce sample bias. The algorithm needs to be exposed to a mix of photos of people in the workplace that is deliberately balanced to counteract the skew in the original training dataset and in society at large.
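For readers who like to see the idea in code, here is a minimal sketch of one way to build such a mix: oversampling the under-represented groups until every group is the same size. It is only an illustration under some assumptions. The `rebalance` function, the `occupation` and `gender` fields, and the toy counts are all hypothetical, and a real image dataset would more likely be rebalanced by collecting or weighting examples than by duplicating them.

```python
import random
from collections import Counter, defaultdict


def rebalance(samples, key, seed=0):
    """Oversample so every group returned by key(sample) has the same size.

    Hypothetical helper for illustration; not taken from any library.
    """
    rng = random.Random(seed)

    # Bucket the samples by group, e.g. ("nurse", "male").
    groups = defaultdict(list)
    for sample in samples:
        groups[key(sample)].append(sample)
    if not groups:
        return []

    # Every group is grown to the size of the largest one.
    target = max(len(members) for members in groups.values())

    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Draw extra copies, with replacement, to make up the shortfall.
        balanced.extend(rng.choices(members, k=target - len(members)))

    rng.shuffle(balanced)
    return balanced


# Toy annotations standing in for labeled workplace photos (made-up counts).
samples = (
    [{"occupation": "nurse", "gender": "female"}] * 9
    + [{"occupation": "nurse", "gender": "male"}] * 1
    + [{"occupation": "computer scientist", "gender": "male"}] * 8
    + [{"occupation": "computer scientist", "gender": "female"}] * 2
)

balanced = rebalance(samples, key=lambda s: (s["occupation"], s["gender"]))
counts = Counter((s["occupation"], s["gender"]) for s in balanced)
print(counts)
```

After rebalancing, each occupation and gender pairing appears equally often, so the algorithm no longer sees “nurse” and “female” together nine times out of ten. The trade-off is that duplicated examples add no new information, which is one reason gathering more varied photos is usually the better fix when it is possible.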
Data science teams have a responsibility to keep prejudice out of our machines. At the same time, stereotypes and prejudices are controversial topics. We’ll be writing about this more in future posts here.
There’s one more form of data bias we’ll cover in the next post in this series. If you simply can’t wait that long, you can grab a copy of our AI bias blueprint.
Does your data science team want to talk about bias, or which shoes to wear with that outfit? Email us.