Actions to move toward responsible AI
1. Know the data: from source to table
The first line of defense against algorithmic bias is a clear understanding of why and how data is collected, organized, processed and prepared for model consumption. AI-induced bias can be difficult to identify because it can result from unseen factors embedded within the data that render the modeling process unreliable or potentially harmful.
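As a starting point, simple profiling can surface representation skews before any model is trained. The sketch below, written against hypothetical column names ("group", "label"), summarizes how each subgroup appears in a table and how often it carries the positive label:

```python
# A minimal profiling sketch, assuming a pandas table with hypothetical
# "group" and "label" columns; real data would substitute its own schema.
import pandas as pd

def profile_representation(df: pd.DataFrame, group_col: str, label_col: str) -> pd.DataFrame:
    """Summarize subgroup representation and positive-label rates so that
    skews in the source data surface before modeling begins."""
    summary = df.groupby(group_col).agg(
        rows=(label_col, "size"),
        positive_rate=(label_col, "mean"),
    )
    summary["share_of_data"] = summary["rows"] / len(df)
    return summary

# Toy table standing in for real sourced data.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "label": [1, 0, 1, 0, 0],
})
print(profile_representation(df, "group", "label"))
```

A large gap in `share_of_data` or `positive_rate` across groups is not proof of bias, but it flags where the collection and preparation pipeline deserves scrutiny.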
2. Test data labeling and proxies
Apply testing rigor to measure pretraining bias and optimize features and labels in the training data; a sketch of both measurements follows below. For example, equality-of-opportunity measurements can assess whether the consumers who should qualify for an opportunity are equally likely to do so regardless of their group membership. Disparate impact measurements can gauge whether algorithmic decision-making disproportionately affects population subgroups, disadvantaging some relative to others.5
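The following sketch implements the two measurements named above in plain NumPy. The function names and the group labels are illustrative assumptions, and the common 0.8 "four-fifths" reading of disparate impact is a convention, not a legal standard:

```python
# Minimal sketches of the two bias checks described above; all names are
# illustrative and thresholds should come from the program's own policy.
import numpy as np

def disparate_impact(selected: np.ndarray, group: np.ndarray,
                     protected, reference) -> float:
    """Ratio of selection rates: P(selected | protected) / P(selected | reference).
    Values far below 1.0 suggest the protected group is selected less often."""
    rate_p = selected[group == protected].mean()
    rate_r = selected[group == reference].mean()
    return rate_p / rate_r

def equal_opportunity_gap(y_true: np.ndarray, y_pred: np.ndarray,
                          group: np.ndarray, protected, reference) -> float:
    """Difference in true-positive rates: among people who actually qualify
    (y_true == 1), how often does each group get selected?"""
    def tpr(g):
        mask = (group == g) & (y_true == 1)
        return y_pred[mask].mean()
    return tpr(reference) - tpr(protected)

y_true = np.array([1, 1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1])
group  = np.array(["p", "p", "p", "r", "r", "r"])
print(disparate_impact(y_pred, group, "p", "r"))          # selection-rate ratio
print(equal_opportunity_gap(y_true, y_pred, group, "p", "r"))  # TPR gap
```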
3. Analyze results and identify key risk areas
Systematically investigate results from testing to identify key risk areas for bias in the modeling process. Tag material data points for human reviewers, who can assess machine-based outputs and reclassify results where needed (see the sketch below). Train machine-learning models on those qualitative evaluations and apply them across the entire population to assist in bias detection, while documenting historical incidents of bias and monitoring for unfair practices.
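One simple, hedged way to route material data points to reviewers is to flag predictions that sit near the decision boundary or that fall in subgroups already identified as high-risk. The score band and group names below are assumptions for illustration:

```python
# A minimal triage sketch, assuming model scores in [0, 1] and a set of
# subgroups flagged as high-risk by earlier analysis; both are assumptions.
import numpy as np

def flag_for_review(scores: np.ndarray, group: np.ndarray,
                    high_risk_groups: set, band: float = 0.1,
                    threshold: float = 0.5) -> np.ndarray:
    """Return a boolean mask of records a human reviewer should re-check."""
    near_boundary = np.abs(scores - threshold) < band   # borderline decisions
    risky_group = np.isin(group, list(high_risk_groups))  # elevated-risk subgroups
    return near_boundary | risky_group

scores = np.array([0.95, 0.52, 0.10, 0.47])
group = np.array(["a", "b", "a", "c"])
mask = flag_for_review(scores, group, high_risk_groups={"c"})
print(np.nonzero(mask)[0])  # records 1 and 3 are queued for human review
```

Reviewer relabels collected this way can then serve as training data for the bias-detection models the step describes.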
4. Independently verify and validate fairness in modeling
Engage a third-party organization that is not involved in developing the data modeling frameworks. Assess whether each product has been designed to meet its requirements and specifications (e.g., technical, compliance, regulatory, legal), and confirm that any unintended algorithmic bias or discrimination has been identified and eliminated.
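Independent validation is largely a process discipline, but the agreed-upon requirements can be encoded as an automated gate the verifying party re-runs against frozen model outputs. The metric names and acceptance bands below are illustrative assumptions, not a regulatory standard:

```python
# A minimal validation-gate sketch: the independent verifier compares
# measured fairness metrics against a frozen specification and blocks the
# release on any breach. Metrics and limits here are hypothetical.
FAIRNESS_REQUIREMENTS = {
    "disparate_impact_ratio": (0.8, 1.25),   # acceptable band (assumption)
    "equal_opportunity_gap": (-0.05, 0.05),  # acceptable band (assumption)
}

def validate_release(measured: dict) -> list:
    """Compare measured metrics to the specification; return any violations."""
    violations = []
    for metric, (lo, hi) in FAIRNESS_REQUIREMENTS.items():
        value = measured[metric]
        if not (lo <= value <= hi):
            violations.append(f"{metric}={value:.3f} outside [{lo}, {hi}]")
    return violations

report = validate_release({"disparate_impact_ratio": 0.74,
                           "equal_opportunity_gap": 0.02})
if report:
    print("Release blocked:", report)
```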
5. Harness the power of synthetic data
Safeguard sensitive information in accordance with data privacy laws. Improve modeling strength and mitigate data bias through careful manufacture of synthetic (artificial) data that replicates real-world events or objects while removing risky variables that can induce digital discrimination against protected classes.
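A deliberately simple version of this idea is sketched below: fit per-column distributions on the real table, drop the protected attributes, and sample a new table. Production programs typically use stronger generators (e.g., copula- or GAN-based), and the column names here are hypothetical:

```python
# A minimal synthetic-data sketch under a strong simplifying assumption:
# columns are sampled independently, which discards cross-column
# correlations. Column names ("income", "zip", "ethnicity") are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)

def synthesize(df: pd.DataFrame, protected_cols: list, n: int) -> pd.DataFrame:
    """Sample a synthetic table column by column, excluding protected columns."""
    keep = df.drop(columns=protected_cols)
    synthetic = {}
    for col in keep.columns:
        if keep[col].dtype.kind in "if":  # numeric: sample a fitted normal
            synthetic[col] = rng.normal(keep[col].mean(), keep[col].std(), n)
        else:  # categorical: resample observed values at their frequencies
            freqs = keep[col].value_counts(normalize=True)
            synthetic[col] = rng.choice(freqs.index.to_numpy(), size=n,
                                        p=freqs.to_numpy())
    return pd.DataFrame(synthetic)

real = pd.DataFrame({"income": [40.0, 55.0, 61.0],
                     "zip": ["A", "A", "B"],
                     "ethnicity": ["x", "y", "x"]})
print(synthesize(real, protected_cols=["ethnicity"], n=5))
```

Note that dropping a protected column is not sufficient on its own: proxies such as postal code can still encode the removed attribute, which is why the proxy testing in action 2 remains necessary on synthetic data as well.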