We’re doing a series of posts on the ways that bias can influence machine learning.
In the context of ML algorithms, the word bias has a meaning that is non-obvious to lay readers: data scientists use it to describe a particular mathematical property of an algorithm that influences its prediction performance.
Bias is generally coupled with variance, another algorithm property. Bias and variance interact, and data scientists typically seek a balance between the two.
Algorithm bias is associated with rigidity. High bias, as shown on the left, can cause an algorithm to adhere so strongly to rules that it misses complexities in the data. Conversely, high variance, on the right, can cause an algorithm to pay too much attention to data points that might actually be noise.
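To make the two failure modes concrete, here is a minimal sketch using a toy dataset (the curve, noise level, and polynomial degrees are all illustrative choices, not values from any particular system). A straight line is too rigid to follow the underlying curve (high bias), while a high-degree polynomial chases the noise in the training points (high variance); a moderate model sits between them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a smooth underlying curve plus random noise.
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Held-out points from the true (noise-free) curve, to measure generalization.
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

# High bias: a straight line is too rigid to capture the curve.
underfit = np.poly1d(np.polyfit(x, y, deg=1))
# High variance: a degree-10 polynomial bends toward the noise.
overfit = np.poly1d(np.polyfit(x, y, deg=10))
# A moderate model balances the two.
balanced = np.poly1d(np.polyfit(x, y, deg=3))

def mse(model, xs, ys):
    """Mean squared error of a fitted polynomial on (xs, ys)."""
    return np.mean((model(xs) - ys) ** 2)

for name, m in [("high bias", underfit),
                ("balanced", balanced),
                ("high variance", overfit)]:
    print(f"{name:13s} train MSE = {mse(m, x, y):.3f}  "
          f"test MSE = {mse(m, x_test, y_test):.3f}")
```

Training error alone is misleading here: it always shrinks as the model gets more flexible. Comparing against the held-out curve is what exposes the high-variance model's poor generalization.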
Finding the appropriate balance between these two properties for a given model in a given environment is a critical data science skill. While it isn't easy (this is why data scientists train so long and are paid so well), optimizing the bias-variance tradeoff is a well-understood problem.
Because of the maturity around this topic, it’s less common for scary AI headlines to come from algorithm bias. But data? That’s a different story.
In the next post we’ll talk about a form of bias that affects ML training data.