Mitigating Bias in Predictions from Machine Learning Models
Overview
Abstract
The increasing popularity of machine learning models in real-world automated and decision support systems has underscored the need to assess and then mitigate biases that may manifest, often spuriously, in their predictions at the population, sub-population, or individual level. These biases can be assessed in terms of calibration, performance stratification, fairness metrics, and prediction interval coverage, among others, and are mainly due to poor model specification (e.g., overparameterization without regularization or loss/likelihood mismatch) or data collection issues (e.g., population misrepresentation or unmeasured confounders). In this seminar, we will explore three recently proposed approaches to mitigate bias in applications involving (I) risk prediction, (II) text generation, and (III) image retrieval.
Brief Biography
Ricardo Henao, a quantitative scientist, is an Associate Professor in the Biological and Environmental Science and Engineering (BESE) Division and a member of the Smart Health Initiative (SHI) at KAUST (King Abdullah University of Science and Technology). He is also currently an Associate Professor in the Department of Biostatistics and Bioinformatics and the Department of Electrical and Computer Engineering (ECE), and a member of the Information Initiative at Duke (iiD), Duke AI Health, and the Duke Clinical Research Institute (DCRI), all at Duke University. The theme of his research is the development of novel statistical methods and machine learning algorithms, primarily based on probabilistic modeling. His expertise covers several fields, including applied statistics, signal processing, pattern recognition, and machine learning. His methods research focuses on hierarchical or multilayer probabilistic models that describe complex data (for example, data characterized by high dimensionality, multiple modalities, more variables than observations, noisy measurements, missing values, or time-series structure) in terms of low-dimensional representations, for the purposes of hypothesis generation and improved predictive modeling. Most of his applied work is dedicated to the analysis of biological data such as gene expression, medical imaging, clinical narratives, and electronic health records, with applications to predictive modeling for diverse clinical outcomes.