How Bias Can Creep into Medical Databanks That Drive Precision Health and Clinical AI

In the race to harness medical data for artificial intelligence tools and personalized healthcare, a new study shows how easily unintentional design bias can affect those efforts. It also points to specific ways to increase the chances that patients who are traditionally underrepresented in research can be included in the massive banks of genetic samples and data from digital medical records that underlie these efforts.

Not only could that be important to the accuracy of the tools based on those data, but it would also make it more likely that they’d benefit diverse patient communities. The study, in the December issue of Health Affairs, comes from a team at the University of Michigan (U-M) and Michigan State University that studied U-M’s efforts to build a large bank of data and samples for researchers to use.

Key Findings

The study focuses on the Michigan Genomics Initiative (MGI), which originally designed its recruitment effort around approaching patients to donate a small amount of blood for the research biobank when they were waiting for surgery at Michigan Medicine, U-M’s academic medical center. Trained MGI recruiters aimed to approach all adult surgical patients in the preoperative setting during typical surgical hours.

There were several reasons why MGI used this approach—including the fact that patients in such settings have time to engage in recruitment and enrollment procedures, and that they often already have an intravenous line placed in preparation for their treatment, so it’s convenient to draw a blood sample for research use if they consent.

But the new study found that the pool of surgical patients from which MGI staff recruited was more likely to be older, white and socioeconomically advantaged men when compared to the general Michigan Medicine patient population. In addition, when approached, patients who consented to enroll in MGI were younger than the average patient waiting for surgery, and less likely to be Black or African American, Asian, or Hispanic.

The result: The blood samples collected for the biobank came from a sub-population that was less demographically diverse than Michigan Medicine’s overall patient population.
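As a rough illustration of the kind of comparison described above, the sketch below computes how over- or underrepresented each group is in a biobank relative to the overall patient population. It is not the study's method: the function name, the group labels, and all counts are hypothetical and chosen only to show the arithmetic.

```python
def representation_ratios(cohort_counts, population_counts):
    """Return each group's share of the cohort divided by its share of the population.

    A ratio below 1.0 means the group is underrepresented in the cohort;
    a ratio above 1.0 means it is overrepresented.
    Assumes every population count is greater than zero.
    """
    cohort_total = sum(cohort_counts.values())
    population_total = sum(population_counts.values())
    ratios = {}
    for group, population_n in population_counts.items():
        population_share = population_n / population_total
        # Groups missing from the cohort count as zero enrolled participants.
        cohort_share = cohort_counts.get(group, 0) / cohort_total
        ratios[group] = cohort_share / population_share
    return ratios


# Hypothetical counts, for illustration only (not data from the study).
population = {"group_a": 700, "group_b": 200, "group_c": 100}
biobank = {"group_a": 160, "group_b": 30, "group_c": 10}

ratios = representation_ratios(biobank, population)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

Running the same comparison at each stage of recruitment (patients eligible to be approached, patients approached, patients who consented) would show where in the process a gap opens up.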

Changing the Approach

While recruiting surgical patients remains a key component of MGI’s recruitment strategy, Precision Health has since expanded its recruiting efforts to include a mail-in saliva-collection kit—giving a broader patient population the opportunity to engage in the research if they choose. Precision Health’s MY PART effort aims to recruit a nationally representative study population into the university’s biobank.

The authors hope that by sharing their deep dive into differences in recruitment and consent rates, they can help other institutions, organizations, and companies design more equitable databanks of their own. Otherwise, the tools and products that emerge from research using those databanks will reflect the same demographic biases, making them less accessible or generalizable for underrepresented communities, the researchers say.

“We know that large research datasets often do not reflect the diversity of the patient population across the United States, but our study gives a detailed analysis about how these disparities become embedded in scientific advances from the ground up,” says Kayte Spector-Bagdady, JD, MBE, co-first author of the new paper and a research ethicist at Michigan Medicine.

“This way we were able to highlight practical improvements that we could implement immediately,” she adds.

Downstream Effects

Spector-Bagdady, a U-M Medical School assistant professor who is the Associate Director of U-M’s Center for Bioethics and Social Sciences in Medicine, led the study along with senior author Jenna Wiens, Ph.D., one of the co-directors of Precision Health and an associate professor of computer science and engineering at the U-M College of Engineering. Both are members of the U-M Institute for Healthcare Policy and Innovation.

“A lot of the research that goes on in precision health, machine learning, and AI for health care across the country leverages data from the electronic health records of major health systems, and data from the subset of patients who have consented to give biospecimens,” Wiens says. “For an AI researcher who builds machine learning and clinical decision support tools, generalizability is so important. Otherwise, we risk building tools that perpetuate disparities in care and outcomes.”

Levels of Consent Unlock More Precision

The authors note that many academic medical centers, including Michigan Medicine, inform patients when they consent to receive care that their medical records might be used by researchers. At U-M, such use is permitted with authorization from the Institutional Review Boards at the Medical School.

Taking part in MGI involves consenting to allow those records to be used in conjunction with a sample of the participant’s DNA. For instance, researchers might analyze part of a participant’s genetic sequence and look at how genetic traits relate to the conditions that person has or to how well they respond to certain treatments.

This is a powerful tool for understanding what drives certain diseases, or what treatments work best for people with different characteristics who have the same type of cancer, for instance. It could also form the basis for AI tools that can predict which patients will suffer certain complications, or help doctors pick from among various treatments for them.

Using only the Michigan Medicine electronic medical record data would capture a patient population with more demographic diversity, but it would not offer patients the same research-level informed consent as the biobank consent process.

Records-based research also means less precision for some studies, because it doesn’t include the ability to study genetic variation and biomarkers, such as proteins in the blood that could be associated with disease. To get that precision without sacrificing diversity, biobank teams must go to extra lengths to recruit people from groups that are less likely to give consent.

“Building long-term trust between healthcare systems and those underrepresented in biobanks, and the research enterprise in general, is a task that must be prioritized. Any attempts at equity building must be hyper-localized, attentive to historical neglect, and situated in justice considerations beyond the research question,” added co-author Melissa Creary, Ph.D., an assistant professor at the U-M School of Public Health.

Making it clear to participants how their data will be used if they give consent, including any commercial uses, and being careful about sharing data with industry are crucial for earning trust and are already top priorities at U-M. “There’s an important tension between respecting patients’ informed consent and also supporting generalizable research,” Spector-Bagdady says. “The ideal resolution is a structure that doesn’t put those two in tension to begin with.”