Summary: AI is rapidly transforming healthcare, particularly in medical imaging, but concerns over transparency, fairness, and demographic biases remain, as highlighted by a recent study showing that improved model performance does not automatically ensure equitable outcomes.
Key Takeaways
- AI models in healthcare, particularly medical imaging, can predict demographic factors like sex, age, and race, which may lead to biased predictions and potentially reinforce healthcare disparities.
- High model performance does not guarantee fairness; models can perform well overall but still be less accurate for certain subgroups, as shown by the 30% fairness gap found between elderly and young patients in radiology models.
- Improving fairness requires careful adjustments, such as using more representative datasets and removing demographic information. However, optimizing solely for fairness can degrade other metrics like precision, and a model that is fair in one deployment setting may not be fair in another.
———————————————————————————————————————————————————————————————
Artificial intelligence (AI) is transforming healthcare, with applications ranging from robotic surgery to electronic health record (EHR) analysis and medical image interpretation. As of August 7, the U.S. FDA had authorized 950 AI/machine learning-enabled medical devices, more than 100 of them this year, reflecting a sharp rise in AI use.
Most of these authorized algorithms focus on medical imaging, where AI can analyze complex data to identify patterns relevant to diagnosis and prognosis. However, how these models arrive at their predictions is often opaque, raising concerns about transparency and fairness.
AI in Medical Imaging
To examine whether these models work fairly across different patient groups, a team from MIT and Emory University trained over 3,000 models, exploring various configurations, algorithms, and clinical tasks. Their analysis, recently published in Nature Medicine, confirmed known biases in AI while also providing new insights.
One key finding is that AI can predict sex from ophthalmology images and age and gender from dermatology images. This points to a broader issue: if a model can detect demographic factors, it may use that information in its decision-making. This "heuristic shortcut" means the model might rely on attributes like race or insurance status instead of actual pathological features. As a result, models could base predictions partly on demographic characteristics rather than on disease, reducing accuracy for some groups and potentially reinforcing healthcare disparities.
High model performance does not guarantee fairness. As study author Judy Gichoya, MD, explains, “Model performance does not automatically translate to model fairness. While high model performance is required for algorithm authorization, fairness is not always explicitly evaluated.” For example, a model might predict mortality from chest X-rays with high accuracy on average but still be less accurate for younger patients than older ones.
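To make the distinction concrete, here is a minimal Python sketch of a subgroup audit. It assumes binary labels and predictions tagged with a hypothetical age-group label, and it measures the "fairness gap" simply as the difference in accuracy between the best- and worst-served subgroups; the study's own fairness metrics may be defined differently, and the toy data is invented for illustration.

```python
from collections import defaultdict


def accuracy(pairs):
    """Fraction of (label, prediction) pairs where the prediction is correct."""
    return sum(y == p for y, p in pairs) / len(pairs)


def subgroup_audit(labels, preds, groups):
    """Return overall accuracy, per-group accuracy, and the max-min accuracy gap.

    Treating the gap as (best group accuracy - worst group accuracy) is a
    simplifying assumption for illustration, not the study's definition.
    """
    by_group = defaultdict(list)
    for y, p, g in zip(labels, preds, groups):
        by_group[g].append((y, p))
    overall = accuracy(list(zip(labels, preds)))
    per_group = {g: accuracy(pairs) for g, pairs in by_group.items()}
    gap = max(per_group.values()) - min(per_group.values())
    return overall, per_group, gap


# Hypothetical toy data: the model is right on every older patient
# but on only half of the younger ones.
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
preds = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]
groups = ["older"] * 6 + ["younger"] * 4

overall, per_group, gap = subgroup_audit(labels, preds, groups)
print(overall)    # 0.8
print(per_group)  # {'older': 1.0, 'younger': 0.5}
print(gap)        # 0.5
```

An 80% headline accuracy here hides a 50-point gap between groups. A fuller audit would typically also compare error types, such as false-negative rates, since accuracy alone does not show which patients are being missed.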
30% Fairness Gap in AI Radiology Models
The researchers found that a model's ability to predict demographic factors is often correlated with how unfair it is: the more strongly a model encodes demographic features, the larger its fairness gaps tend to be. For instance, when analyzing radiology data, the models correctly predicted a patient's age 75% of the time, yet they showed a 30% fairness gap between elderly (ages 80-100) and young (ages 18-40) patients.
To improve fairness, the researchers adjusted datasets to be more representative and removed demographic information, which enhanced fairness without significantly sacrificing performance. However, optimizing solely for fairness could affect other metrics like precision or calibration. As Gichoya explains, “There’s always a tension between model performance and model fairness. Blindly optimizing for fairness could ultimately hinder the model’s utility, rendering it less reliable when it makes its prediction.”
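One common way to make training data "more representative" without collecting new data is to reweight examples so that each subgroup contributes the same total weight to the training loss. The sketch below computes such inverse-frequency weights; it is an illustrative assumption, not necessarily the rebalancing procedure the researchers used, and the group labels are hypothetical.

```python
from collections import Counter


def balanced_weights(groups):
    """Per-example weights that give every subgroup the same total weight.

    Each example gets n_total / (n_groups * n_in_its_group), so the weights
    average to 1 over the dataset and each group's weights sum to
    n_total / n_groups.
    """
    counts = Counter(groups)
    n_total, n_groups = len(groups), len(counts)
    return [n_total / (n_groups * counts[g]) for g in groups]


# Hypothetical training set where older patients outnumber younger ones 3:1.
groups = ["older"] * 6 + ["younger"] * 2
weights = balanced_weights(groups)
print(weights[0], weights[-1])  # 0.6666666666666666 2.0
```

Such weights would then be fed into a weighted loss. Upweighting a small group also amplifies its noise, which is one way a fairness intervention can cost precision or calibration.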
Another key insight is that fairness can be location-specific. Models optimized for fairness in one setting may not remain fair when deployed elsewhere. Although the models maintained high performance across different data sources, their fairness did not necessarily carry over, indicating that shifts in data distribution can erode fairness even when overall accuracy holds up.
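Because fairness may not transfer between settings, a deployment check might recompute subgroup gaps separately at each site rather than relying on the figure from the development site. The following is a minimal sketch, assuming records of (site, group, label, prediction); the site names, the toy counts, and the 0.1 tolerance are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def gaps_by_site(records):
    """Map each site to (overall accuracy, max-min subgroup accuracy gap).

    records: iterable of (site, group, label, prediction) tuples.
    """
    # tally[site][group] = [number correct, number total]
    tally = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for site, group, y, p in records:
        cell = tally[site][group]
        cell[0] += int(y == p)
        cell[1] += 1
    results = {}
    for site, by_group in tally.items():
        accs = [c / t for c, t in by_group.values()]
        correct = sum(c for c, _ in by_group.values())
        total = sum(t for _, t in by_group.values())
        results[site] = (correct / total, max(accs) - min(accs))
    return results


def make_records(site, group, n_correct, n_wrong):
    """Build hypothetical rows with label 1; wrong rows predict 0."""
    return [(site, group, 1, 1)] * n_correct + [(site, group, 1, 0)] * n_wrong


# Same overall accuracy at both sites, but very different subgroup gaps.
records = (
    make_records("site_A", "older", 3, 1) + make_records("site_A", "younger", 3, 1)
    + make_records("site_B", "older", 4, 0) + make_records("site_B", "younger", 2, 2)
)

TOLERANCE = 0.1  # arbitrary threshold chosen for illustration
for site, (overall, gap) in gaps_by_site(records).items():
    print(site, overall, gap, "FLAG" if gap > TOLERANCE else "ok")
# site_A 0.75 0.0 ok
# site_B 0.75 0.5 FLAG
```

Both sites report 75% accuracy, so a performance-only check would pass them equally; only the per-site subgroup breakdown reveals that site B serves younger patients worse.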
This study underscores the complex balance between model fairness and performance, highlighting the ongoing challenge of making AI equitable and effective in healthcare.