Handling data bias

A journey towards ethical AI

Photo by Louis Reed on Unsplash, with edits from author

If you share the goal of ensuring that the product you work on follows the principles of “AI for good”, you have almost certainly encountered a situation where your data is biased.

Biased models, biased data, and biased implementations are typical woes of a data scientist’s life. So the first step is to understand and acknowledge that bias exists and can take any shape and form.

Bias is a broad term: it can be present at the data collection stage, in the algorithm, or even at the stage of interpreting the ML output.

Created by the author using PowerPoint

Why does bias hurt?

Bias can lead to disparate access to opportunities on the grounds of human characteristics such as race, age, or gender, and should be discouraged.
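
One way to make “disparate access to opportunities” measurable (a common approach, not one the article itself prescribes) is to compare how often each group receives the favorable outcome, for example being shortlisted by a recruitment tool. The minimal Python sketch below assumes you already have each person’s group label and a 0/1 outcome; the function names and toy data are hypothetical.

```python
from collections import defaultdict


def selection_rates(groups, outcomes):
    """Return the share of favorable (1) outcomes for each group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, outcome in zip(groups, outcomes):
        totals[group] += 1
        favorable[group] += int(outcome)
    return {g: favorable[g] / totals[g] for g in totals}


def disparate_impact_ratio(groups, outcomes):
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    rates = selection_rates(groups, outcomes)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected, so no group is favored over another
    return min(rates.values()) / highest


if __name__ == "__main__":
    # Toy data: 4 men and 4 women; 1 = shortlisted by a hypothetical screening tool
    groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
    shortlisted = [1, 1, 1, 0, 1, 0, 0, 0]
    print(selection_rates(groups, shortlisted))         # {'m': 0.75, 'f': 0.25}
    print(disparate_impact_ratio(groups, shortlisted))  # 0.3333333333333333
```

A ratio of 1.0 means every group is selected at the same rate. A widely cited rule of thumb from US employment guidelines, the “four-fifths rule”, treats ratios below 0.8 as worth investigating; it is a screening heuristic, not proof of fairness or unfairness.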

According to Stanford University’s AI Index Report, AI/ML organizations consider the following risks prevalent in the industry and are working hard to mitigate them, since they are detrimental both to their business and to humanity in general.

AI Index Report

Bias can take many forms:

  • Structural Bias: Data can be biased simply because it reflects structural differences in society. The association of women with roles such as nurse, cook, or teacher stems from societal constructs. An e-commerce giant tried to build a recruitment tool that learned the patterns of its existing workforce, which, needless to say, was biased. The model picked up on attributes such as sports, social activities, and achievements, which resulted in a biased tool with a preference for men.
  • Data Collection: Bias can creep into data collection through factors such as the time of day, the age group of respondents, their country of origin, or their social class. Data fed to the algorithms should be updated continuously so that it reflects the world we live in and, in turn, the future state of the world we want to make predictions about.
  • Data Manipulation: It is tempting to drop instances that have no label or that contain missing values. However, it is important to check whether eliminating those observations skews the representation of groups defined by gender, race, nationality, and related attributes.
  • Algorithm bias: An algorithm learns whatever patterns the data suggests. It either mirrors prevalent biases or, worse, amplifies them. If past judgments were biased towards a particular group of people, the machine learns that same bias from the training data. Algorithmic bias stems from data that is either not representative or rooted in existing prejudices. If the input data is imbalanced, we need to ensure the algorithm still sees enough instances of the minority class to perform well on it. There are several ways to rebalance data; the primary ones are creating synthetic data and assigning class weights so that the algorithm applies a higher penalty to each wrong prediction on the minority class.
  • Implementation bias: ML models are built on the fundamental assumption that the training and test data come from similar distributions. Summer data may have a different feature distribution from winter data, so a model trained on summer data may not be an appropriate fit for predicting consumer behavior in winter. A model performs well only if new data resembles the historical data it was trained on. Interpretation, not just implementation, can also be biased: while analyzing the algorithm’s output, we may superimpose our own beliefs and look for support for our (biased) view.
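
The Data Manipulation point can be checked directly by comparing each group’s share of the data before and after dropping incomplete rows. The sketch below is a minimal stdlib-only illustration; it assumes rows are dictionaries, that missing values and missing labels are stored as `None`, and that `gender` is the sensitive attribute of interest. The helper names and toy rows are hypothetical.

```python
from collections import Counter


def group_shares(rows, attribute):
    """Fraction of rows that carry each value of a sensitive attribute."""
    counts = Counter(row[attribute] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def representation_shift(rows, attribute):
    """Map each group to (share before, share after) dropping rows with any None."""
    complete = [row for row in rows if all(v is not None for v in row.values())]
    before = group_shares(rows, attribute)
    after = group_shares(complete, attribute) if complete else {}
    # A group that disappears entirely after dropping gets a share of 0.0
    return {group: (before[group], after.get(group, 0.0)) for group in before}


if __name__ == "__main__":
    rows = [
        {"gender": "f", "income": None, "label": 1},
        {"gender": "f", "income": 40, "label": None},
        {"gender": "f", "income": 55, "label": 0},
        {"gender": "m", "income": 60, "label": 1},
        {"gender": "m", "income": 52, "label": 0},
        {"gender": "m", "income": 48, "label": 1},
    ]
    print(representation_shift(rows, "gender"))
    # {'f': (0.5, 0.25), 'm': (0.5, 0.75)}
```

In this toy data the groups start balanced, but dropping the two incomplete rows cuts women from half of the dataset to a quarter.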
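
For the class-weight option mentioned under Algorithm bias, a common heuristic weights each class by n_samples / (n_classes * class_count), which is the formula scikit-learn uses for `class_weight="balanced"` on estimators such as `LogisticRegression`. The stdlib-only sketch below computes those weights and applies them inside a binary cross-entropy loss so that a mistake on the minority class costs more. The function names and toy labels are illustrative assumptions, not the article’s code.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


def weighted_log_loss(y_true, p_pred, weights, eps=1e-12):
    """Binary cross-entropy with each example scaled by its class weight,
    averaged over the number of examples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip probabilities to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(y_true)


if __name__ == "__main__":
    labels = [0] * 8 + [1] * 2  # 8 majority examples, 2 minority examples
    weights = balanced_class_weights(labels)
    print(weights)  # {0: 0.625, 1: 2.5}
    # The same confident mistake costs about 4x more on the minority class
    miss_minority = weighted_log_loss([1], [0.25], weights)
    miss_majority = weighted_log_loss([0], [0.75], weights)
    print(miss_minority / miss_majority)
```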
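
The other rebalancing route named under Algorithm bias, synthetic data creation, is often done with SMOTE-style interpolation: each new minority example is placed at a random point on the line segment between an existing minority example and one of its nearest minority neighbors. The sketch below is a simplified, stdlib-only illustration of that idea (Python 3.8+ for `math.dist`), assuming numeric feature tuples; the function name and the `k` and `seed` defaults are hypothetical choices. For real projects, the imbalanced-learn package provides a maintained `SMOTE` implementation.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbors (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = list(minority[:i]) + list(minority[i + 1:])
        # k nearest neighbors of `base` by Euclidean distance
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # fraction of the way from base to neighbor, in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbor)))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    new_points = smote_like_oversample(minority, n_new=4, k=2, seed=42)
    print(len(new_points))  # 4
```

Because every synthetic point lies between two real minority points, the minority class gains more examples without simply duplicating existing rows.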
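
To check the Implementation bias assumption that training data and new data share a similar distribution, one option (a swapped-in technique, not one the article names) is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical CDFs of a feature in the two datasets. The stdlib-only sketch below computes it; the summer and winter spending figures are toy numbers, and any alert threshold you apply is your own judgment call.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs.
    0.0 means identical empirical distributions; 1.0 means every value in one
    sample lies below every value in the other."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so checking every observed value is enough to find the maximum gap.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


if __name__ == "__main__":
    # Hypothetical daily spend per customer (toy numbers)
    summer_spend = [12, 15, 14, 18, 20, 16, 13, 17]
    winter_spend = [25, 30, 22, 28, 19, 27, 24, 26]
    print(ks_statistic(summer_spend, summer_spend))  # 0.0
    print(ks_statistic(summer_spend, winter_spend))  # 0.875
```

If SciPy is available, `scipy.stats.ks_2samp` computes the same statistic together with a p-value.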

Bias is one of the factors we need to address in pursuit of an ethical AI framework, and it is certainly not trivial to mitigate.

Some important aspects of building an “AI for good” ecosystem are:

  • Data collectors, developers, and product managers are generally the people who work in the field and are closest to the data. Organizations should sensitize these employees and spread awareness about the possible causes of bias and how to mitigate them.
  • Having an expert (an AI ethicist) who is adept at identifying the sources of bias can help businesses align their vision with the ethical framework.
  • A governance team composed of people from different functions, such as privacy, ethics and compliance, product, and engineering, can provide a fresh perspective for identifying biases that might otherwise be overlooked.

There is no single rulebook that can be read and implemented at once; it is an ever-evolving framework.

It is commendable that efforts to maintain an unbiased, fair, and trustworthy AI framework are no longer seen as esoteric and are garnering the right attention across the world.

