From the course: Introduction to Auditing AI Systems

Data for auditing AI

- [Instructor] There are two main methods for auditing AI systems. The first is to take a sample of prior predictions, add demographics for each individual, and calculate whether adverse impact exists in the model's outcomes. The second is to manually run completely new data through the model and track the results across protected classes. For this second method, we need to acquire benchmark data to perform the audit. The data used to benchmark AI systems is crucial. It should include information about protected classes and, sometimes, their proxies. Protected classes are groups given specific legal protections under existing law. Ideally, benchmark data should be representative of the populations a model will be used on. Common proxies for race include zip code and skin tone, while height and weight are often proxies for gender. These proxies can make bias hard to identify, because a model can still discriminate through them even when the protected class itself never appears in the dataset. Data for AI audits can come from various places, including a company's existing data, data purchased from data brokers, or data collected specifically for the audit. Now, let's imagine we're auditing a hospital's emergency room triage system. The existing data used in the model includes patient records with vital signs as well as details like weight, race, and age. So how would we test how this model makes predictions about new patients? Without access to the existing data, we'd struggle to find public data with the demographic information we're looking for, so we'd embark on a data collection journey. Auditors can often rely on demographic data from data brokers, since brokers are more likely to have this PII, or personally identifiable information. Sometimes auditors can leverage data collected from an AI system's production environment, or use synthetic data. Before engaging in data collection, auditors should pre-engineer the makeup of the data set. There are a few key aspects of a viable benchmark data set. First, it should be relevant to the problem and capture the characteristics of the problem's domain. It should also be large enough to capture variation within and across protected classes. Next, a good benchmark data set has to be non-redundant: it should not contain similar or heavily overlapping records. For classification problems, the data set should contain both positive and negative cases so we can test a model's ability to distinguish between them. Benchmark data sets should also be reasonably balanced: whether we have two classes or 20, we'd want to design the benchmark data so the classes are roughly equal in size. Take, for example, a data set for determining someone's creditworthiness. It should be made up of nearly equal numbers of creditworthy and non-creditworthy records. Finally, for audits to maintain credibility, it's critical to communicate clearly about the assumptions made when designing benchmark data sets.
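To make the first method concrete, here is a minimal sketch in Python of computing adverse impact ratios from a sample of prior predictions once demographics have been joined in. The pandas DataFrame, the `race` and `favorable_outcome` column names, and the 0.8 (four-fifths rule) threshold are assumptions added for illustration, not part of the course material.

```python
import pandas as pd

def adverse_impact_ratios(df, group_col, outcome_col):
    """Selection rate per group and ratio against the most-favored group.
    Under the four-fifths rule, ratios below 0.8 suggest adverse impact."""
    # Selection rate = share of favorable (1) outcomes within each group.
    rates = df.groupby(group_col)[outcome_col].mean()
    report = pd.DataFrame({
        "selection_rate": rates,
        "impact_ratio": rates / rates.max(),
    })
    report["below_four_fifths"] = report["impact_ratio"] < 0.8
    return report

# Hypothetical sample of prior predictions with demographics joined in.
sample = pd.DataFrame({
    "race": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "favorable_outcome": [1, 1, 1, 0, 1, 0, 0, 0],
})
print(adverse_impact_ratios(sample, "race", "favorable_outcome"))
```

The same ratio table works for the second method too: instead of sampling prior predictions, you would run the pre-engineered benchmark data through the model and pass the resulting predictions into the function.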
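And for the benchmark design criteria described above, here is a small sketch of sanity checks an auditor might run on a candidate benchmark data set: redundancy, presence and balance of positive and negative cases, and coverage of each protected-class group. The `creditworthy` and `race` column names are placeholders assumed for the example.

```python
import pandas as pd

def check_benchmark(df, label_col, group_col):
    """Quick sanity checks on a candidate benchmark data set."""
    # Non-redundancy: exact duplicates inflate apparent coverage.
    print("duplicate rows:", int(df.duplicated().sum()))

    # Positive and negative cases, and how balanced they are.
    print("label shares:")
    print(df[label_col].value_counts(normalize=True))

    # Enough records per protected-class group to capture variation.
    print("records per group:")
    print(df[group_col].value_counts())

# Hypothetical creditworthiness benchmark.
benchmark = pd.DataFrame({
    "creditworthy": [1, 1, 1, 0, 0, 0, 1, 0],
    "race": ["A", "B", "A", "B", "A", "B", "A", "B"],
})
check_benchmark(benchmark, "creditworthy", "race")
```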
