Does Data Let Us Avoid Bias in Our Decisions?
It can go both ways
I am not the first person to write about bias and algorithmic decision-making. I think there have been enough articles that most people know that yes, algorithms can lead to biased decisions. It is also true, though, that algorithms have no stereotypes of their own that they are trying to reinforce by making biased decisions. Let’s start by answering “what do we mean by biased decisions?”
The short version is that bias is measured in the outcomes. That does not mean you actually have to use an algorithm on people and see how many were harmed to realize it was biased. You can test as you build, and there are also organizations developing tools as a service to test algorithms for bias. Algorithms checking algorithms may seem funny if you have not seen the idea before, but it is the mathematical foundation of a lot of the more impressive machine learning applications today, like image generation (where a generator model is trained against a second model whose whole job is to critique the generator’s output).
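To make “measured in the outcomes” concrete, here is a minimal sketch of one common check: compare the rate of good outcomes across groups. The records, group labels, and the 80% threshold (a rule of thumb borrowed from US disparate-impact guidance) are illustrative assumptions, not output from any particular tool.

```python
# Minimal sketch: measure bias as a gap in outcome rates between groups.
# The data and the 80% threshold here are illustrative assumptions.

def approval_rate(decisions, group):
    """Fraction of applicants in `group` who received the good outcome."""
    in_group = [d for d in decisions if d["group"] == group]
    return sum(d["approved"] for d in in_group) / len(in_group)

decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

rate_a = approval_rate(decisions, "A")  # 2/3
rate_b = approval_rate(decisions, "B")  # 1/3

# One common rule of thumb: flag the model if the disadvantaged group's
# rate falls below 80% of the advantaged group's rate.
if rate_b / rate_a < 0.8:
    print(f"Potential bias: {rate_b:.2f} vs {rate_a:.2f}")
```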
So, the first answer: algorithms can help us identify bias in other algorithms, which we could then choose to eliminate. Before that point, though, as we are developing the algorithm we intend to use, what should we do to avoid bias? There are four main factors to watch for if you want to create less biased algorithms.
People feed the algorithm biased data
A lot of the applications that have a risk of biased outcomes rely on data from past (human) decisions to tell the algorithm what it is looking for. Basically, there are people who could face a more desirable or less desirable outcome depending on the decision of the algorithm. Algorithms frequently take things to extremes, so if there is even a smattering of bias in that input data, it tends to be exaggerated in the algorithm.
Anything you can reasonably do to make the input data less biased will make it easier to avoid bias through the rest of the process. This might look like careful review of the input data, balancing the data set, or anything that takes out known biases.
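As one hedged example of “balancing the data set,” here is a sketch that downsamples historical records so every group is equally represented before training. The record format and group key are hypothetical.

```python
import random

# Sketch: rebalance historical decisions by resampling so every group
# contributes equally before training. Field names are hypothetical.

def balance_by_group(records, group_key="group", seed=0):
    """Downsample every group to the size of the smallest group."""
    rng = random.Random(seed)
    by_group = {}
    for r in records:
        by_group.setdefault(r[group_key], []).append(r)
    n = min(len(rs) for rs in by_group.values())
    balanced = []
    for rs in by_group.values():
        balanced.extend(rng.sample(rs, n))
    return balanced
```

Note that this only balances representation; if the past human decisions themselves were biased, the labels still need the careful review mentioned above.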
People design it with bias (on purpose)
We can see that algorithms 100% can lead to bias in the outcomes with a simple thought experiment.
Biased algorithm: Give marginalized groups the bad outcome and non-marginalized groups the good outcome.
There is nothing about this rule that makes it “not an algorithm,” and therefore nothing that puts it outside the question in the title of this article. What this algorithm doesn’t do is care about the usefulness of its outputs, except insofar as they enhance bias. Fortunately, bias tends to be a side effect rather than the purpose of decision-making processes. But in the wrong hands, it is not hard to specifically add bias to an algorithm.
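Written out as (hypothetical) code, the thought experiment really is just an algorithm:

```python
# The thought experiment above, as code. It is deterministic and
# perfectly well-defined -- a real algorithm, just a deliberately
# biased one. The field name is hypothetical.
def decide(applicant):
    return "bad outcome" if applicant["marginalized"] else "good outcome"
```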
People design it with bias (by accident)
Depending on how careful the data scientist is, accidentally designing an algorithm with bias can look an awful lot like doing it on purpose. Lending is an example of a regulated industry where there are rules about which data can and cannot be included in your decision-making. As one example, race is not allowed to be part of the algorithm. However, zip code is allowed to be part of the algorithm, and it is a decent proxy for race.
As a not-a-lawyer, I would argue that using zip code in credit decisions certainly seems like designing your algorithm with bias on purpose. To counter it mathematically, you have to lean into something like affirmative action and incorporate race as a positive factor (which, in this specific illustration, would violate the regulations). Constructing the math to make outcomes unbiased while still incorporating biased factors is very challenging, especially when you consider all the ways someone can be marginalized.
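One way to catch this kind of proxy before it does damage is to check how well an “allowed” feature predicts a protected attribute you are not supposed to use. A rough sketch, with hypothetical field names and threshold:

```python
# Sketch: flag an "allowed" feature as a proxy if knowing it lets you
# guess a protected attribute much better than the base rate. Field
# names and the 0.2 threshold are hypothetical.
from collections import Counter

def proxy_score(records, feature, protected):
    """Accuracy of guessing `protected` from the majority value per `feature` group."""
    groups = {}
    for r in records:
        groups.setdefault(r[feature], []).append(r[protected])
    correct = sum(Counter(vals).most_common(1)[0][1] for vals in groups.values())
    return correct / len(records)

def base_rate(records, protected):
    """Accuracy of always guessing the overall most common value."""
    return Counter(r[protected] for r in records).most_common(1)[0][1] / len(records)

# If zip code lets you guess race far better than the base rate, it is
# doing the work the regulation tried to rule out:
# if proxy_score(data, "zip_code", "race") - base_rate(data, "race") > 0.2: ...
```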
In short, there is certainly some room to say that there are algorithms that are accidentally designed to be biased. In a perfect world, the companies deploying these algorithms would have an incentive to weigh the increased predictive accuracy against the harm to marginalized groups. When the only outcome you care about is profit, it makes sense to ignore everything else. But as a member of society, it would be nice to know how much that extra accuracy actually helps before we agree that knowingly biased algorithms are OK.
The ML thinks it’s really clever when it identifies very predictive bias in the data.
This is now one step further down the path of “the data told me to.” If you feed a bunch of data into a machine learning algorithm, the algorithm’s job is generally to pick the smallest set of features (pieces of data) that lets it do a good job predicting the outcomes. We tell it to use the smallest set of features it can because that usually helps it generalize better (a rule like “people named Zohar Strinka like to knit” can be 100% accurate on the training data but is not very useful for predicting anyone outside it).
The rule of thumb I learned at an INFORMS Annual Meeting a couple of years ago was to evaluate the data before you feed it to the algorithm. Specifically, they suggested a policy of asking, “Is it reasonable to ask the individual in question to change this data in order to get a different decision?” The usual protected classes clearly fail this question: “We’ll approve your credit application if you change your age, sound good?” However, it seems like a good litmus test for some of the more borderline data as well: “Move to a different zip code, everything else can stay the same, and we’ll approve you.”
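That litmus test can even be baked into the pipeline: tag each candidate feature with whether it is reasonable to ask someone to change it, and drop the ones that fail before any training happens. The tags below are hypothetical judgment calls a review process would make, not output from any library.

```python
# Sketch of the "could we reasonably ask them to change it?" litmus test
# applied as a feature filter before training. All tags are hypothetical.

FEATURE_MUTABILITY = {
    "age": False,            # protected class; fails the test outright
    "zip_code": False,       # "just move" fails the test
    "debt_to_income": True,  # can be changed by paying down debt
    "recent_defaults": True,
}

def usable_features(mutability):
    """Keep only features an applicant could reasonably change."""
    return [f for f, can_change in mutability.items() if can_change]

print(usable_features(FEATURE_MUTABILITY))
# ['debt_to_income', 'recent_defaults']
```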
Conclusions
Algorithms are not biased because they love to marginalize people. In general, the problem comes down to incentives. Companies are incentivized by profits, not by a need to remove bias from their models (except as required by law or public opinion). Algorithms, and frequently the people developing those algorithms, are incentivized by predictive accuracy alone. Avoiding bias, then, is about specifically removing the opportunity (by eliminating data that has bias hiding in it) or explicitly trying to counteract the biases that exist.
In either case, there is likely to be lower predictive accuracy, in part because someone from a marginalized group will probably face other biases that could negatively affect their ability to repay a loan or succeed in a job. The question for us as a society, then, is: how much lower profit is equity worth?