"Unintentional bias in Training Data"


In this blog post, our test experts Eva Holmquist and Rik Marselis highlight something that is important to consider when you’re testing AI: the possibility of unintentional bias in your training data. If the training data is biased, you’ll end up with a biased system…

So, how can this happen and what should you look for?

There is a very interesting poem on YouTube by Joy Buolamwini that highlights how artificial intelligence can misinterpret images of iconic black women: Oprah, Serena Williams, Michelle Obama, Sojourner Truth, Ida B. Wells, and Shirley Chisholm. You can find it here: https://youtu.be/QxuyfWoVV98

This is just one example where the results of an AI system are correct for only part of the population. In this example, the same system is very accurate when it comes to identifying white males. The most likely cause is training data in which white males were over-represented and black women were under-represented. (For more information about bias in facial recognition algorithms, see Joy’s TED talk here: https://youtu.be/QxuyfWoVV98)

Another example is a recruitment-support system that deems it necessary for a candidate to be male in order to work in the IT department. Gender shouldn’t be a hiring criterion at all, but whenever a system learns from historical training data, it may discover a pattern that isn’t relevant (in this case, males working in IT). Correlation doesn’t imply causation.

If you’re involved in selecting the training data, you, as a tester, should examine it for unintentional bias. You could also change the training data to make it more representative, or mask data that shouldn’t influence the decision.
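
As a rough illustration of examining and masking training data, here is a minimal Python sketch. The record layout, the field names (`gender`, `years_experience`, `hired`) and both helper functions are assumptions made up for this example; they are not taken from any real recruitment system.

```python
from collections import Counter

def representation_report(records, attribute):
    """Return each group's share of the records for the given attribute."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def mask_attributes(records, attributes):
    """Return copies of the records without the listed attributes."""
    return [
        {key: value for key, value in record.items() if key not in attributes}
        for record in records
    ]

# Hypothetical training records, invented for illustration only.
training_data = [
    {"gender": "male", "years_experience": 5, "hired": True},
    {"gender": "male", "years_experience": 2, "hired": True},
    {"gender": "male", "years_experience": 7, "hired": False},
    {"gender": "female", "years_experience": 6, "hired": False},
]

# How large a share of the data does each group have?
shares = representation_report(training_data, "gender")

# Remove the attribute that should not influence the decision.
masked = mask_attributes(training_data, {"gender"})
```

A strongly skewed share is a signal to look closer, not proof of bias. Note also that masking a field does not remove other fields that happen to correlate with it, so testing the finished system is still needed.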

When testing these systems, we don’t always have access to the training data. In that case, we need to test whether the system itself is biased or whether the wrong thing determines the result. To do so, we need to guess what could go wrong.

If it’s an image recognition system, we should use images of different types of people, varying for example in sex, appearance, age, skin color and disabilities. We should also use images that aren’t of people at all, for example, animals and objects. We also need to test whether something in the background determines the identification.
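
One way to make such a test measurable is to compare accuracy per group. The sketch below assumes the tester has already run labelled test images through the system and collected the outcomes; the group names, labels and result tuples are hypothetical placeholders.

```python
from collections import defaultdict

def accuracy_by_group(results):
    """results: iterable of (group, expected_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, expected, predicted in results:
        total[group] += 1
        if expected == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

def accuracy_gap(per_group):
    """Difference between the best and the worst performing group."""
    return max(per_group.values()) - min(per_group.values())

# Hypothetical outcomes from the system under test, invented for illustration.
test_results = [
    ("group_a", "person", "person"),
    ("group_a", "person", "person"),
    ("group_a", "person", "person"),
    ("group_a", "person", "person"),
    ("group_b", "person", "person"),
    ("group_b", "person", "unknown"),
    ("group_b", "person", "unknown"),
    ("group_b", "person", "person"),
]

per_group = accuracy_by_group(test_results)
gap = accuracy_gap(per_group)
```

What size of gap is acceptable is a judgment for the team and its stakeholders; the sketch only makes the difference visible.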

If it’s a text-based system, we need to examine which data determines the result. In the recruitment system example, where the system determines whether we should hire a person, gender, religion, political views or sexual orientation shouldn’t influence the decision.
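
A simple way to check this without access to the training data is to change only the attribute that should be irrelevant and see whether the decision changes. The following is a minimal sketch under stated assumptions: `biased_model` and `fair_model` are hypothetical stand-ins for the system under test, and the candidate fields are invented for this example.

```python
def decisions_changed_by(model, candidate, attribute, alternative_values):
    """Return the alternative values that change the model's decision."""
    baseline = model(candidate)
    changed = []
    for value in alternative_values:
        # Copy the candidate and alter only the attribute under test.
        variant = {**candidate, attribute: value}
        if model(variant) != baseline:
            changed.append(value)
    return changed

# Hypothetical stand-in for a system that wrongly uses gender.
def biased_model(candidate):
    return candidate["years_experience"] >= 3 and candidate["gender"] == "male"

# Hypothetical stand-in for a system that ignores gender.
def fair_model(candidate):
    return candidate["years_experience"] >= 3

candidate = {"gender": "male", "years_experience": 5}
alternatives = ["female", "non-binary"]

biased_flips = decisions_changed_by(biased_model, candidate, "gender", alternatives)
fair_flips = decisions_changed_by(fair_model, candidate, "gender", alternatives)
```

An empty result for one candidate doesn’t prove the system is unbiased; the same check would need to be repeated across many candidates and for each attribute that shouldn’t matter.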

So, if you’re testing artificial intelligence, think about what unintentional bias could be in the system, and test to discover those problems so they can be fixed before people are treated badly because the AI system drew a wrong conclusion.

About Eva Holmquist (Sogeti Sweden) and Rik Marselis (Sogeti Netherlands)

Eva Holmquist is a senior test specialist at Sogeti. She has worked with activities ranging from test planning to test execution, as well as with Test Process Improvement and Test Education. She is the author of "Praktisk mjukvarutestning", a book on software testing in practice.

Rik Marselis is a test expert at Sogeti. He has worked with many organizations and people to improve their testing practices and skills. Rik has contributed to 19 books on quality and testing. His latest book is “Testing in the digital age; AI makes the difference” about testing OF and testing WITH intelligent machines.

CONTACT

Eva Holmquist
Senior Test Specialist
072-502 83 93

Rik Marselis
Quality and Testing Consultant | Netherlands
+31 886 606 600