According to an open-access article in the American Journal of Roentgenology (AJR), convolutional neural networks (CNNs) trained to identify abnormalities on upper extremity radiographs are susceptible to a ubiquitous confounding image feature that could limit their clinical utility: radiograph labels.

“We recommend that such potential image confounders be collected when possible during dataset curation, and that covering these labels be considered during CNN training,” writes corresponding author Paul H. Yi, MD, a musculoskeletal radiologist and imaging informaticist at the University of Maryland’s Medical Intelligent Imaging Center in Baltimore.

Yi and colleagues’ retrospective study evaluated 40,561 upper extremity musculoskeletal radiographs from Stanford’s MURA dataset, which were used to train three DenseNet-121 CNN classifiers to distinguish normal from abnormal radiographs. Each classifier was trained on a different input: original images with both anatomy and labels; images with the laterality and/or technologist labels covered by a black box; and images with the anatomy removed so that only the labels remained.
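The two modified inputs are simple image edits: one blacks out the label regions, the other blacks out everything except them. The sketch below illustrates both operations on a grayscale image stored as nested lists. It assumes the label locations are already known as rectangular boxes; the `Box` format and the helper names are hypothetical, and the article does not describe how the study located or removed labels.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixel values, row-major
Box = Tuple[int, int, int, int]  # hypothetical (top, left, height, width) label box


def _box_pixels(box: Box, rows: int, cols: int):
    """Yield (row, col) pairs inside `box`, clipped to the image bounds."""
    top, left, height, width = box
    for r in range(max(top, 0), min(top + height, rows)):
        for c in range(max(left, 0), min(left + width, cols)):
            yield r, c


def cover_regions(image: Image, boxes: List[Box]) -> Image:
    """Labels-covered input: set every pixel inside the boxes to black (0)."""
    rows, cols = len(image), len(image[0]) if image else 0
    covered = [row[:] for row in image]  # copy so the original image is untouched
    for box in boxes:
        for r, c in _box_pixels(box, rows, cols):
            covered[r][c] = 0
    return covered


def keep_only_regions(image: Image, boxes: List[Box]) -> Image:
    """Labels-only input: set every pixel outside the boxes to black (0)."""
    rows, cols = len(image), len(image[0]) if image else 0
    kept = [[0] * cols for _ in range(rows)]
    for box in boxes:
        for r, c in _box_pixels(box, rows, cols):
            kept[r][c] = image[r][c]
    return kept


if __name__ == "__main__":
    img = [[5] * 4 for _ in range(3)]
    label_box = [(0, 0, 2, 2)]  # pretend a laterality marker sits in the top-left corner
    print(cover_regions(img, label_box))      # [[0, 0, 5, 5], [0, 0, 5, 5], [5, 5, 5, 5]]
    print(keep_only_regions(img, label_box))  # [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 0, 0]]
```

Clipping the boxes to the image bounds means a marker near an edge can be covered without special-casing.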

On the original radiographs, the CNN achieved an AUC of 0.844 and frequently emphasized the laterality and/or technologist labels in its decision-making. Covering these labels increased the AUC to 0.857 (p=.02) and redirected the CNN’s attention from the labels to the bones. Trained on labels alone, the CNN achieved an AUC of 0.638, indicating that radiograph labels are associated with abnormal examinations.
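To read these numbers, it helps to recall what AUC (area under the ROC curve) measures: the probability that a randomly chosen abnormal study receives a higher score than a randomly chosen normal one, so 0.5 is chance and 1.0 is perfect ranking. A labels-only AUC of 0.638 is therefore above chance. The following is a minimal sketch of that pairwise definition, with ties counted as half; the article does not say which software the study used, and the scores below are toy values, not study data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (abnormal, normal) pairs ranked correctly.

    labels: 1 for abnormal, 0 for normal; scores: model outputs (higher = more abnormal).
    A tied pair counts as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores for illustration only.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(roc_auc([0, 1], [0.5, 0.5]))                   # 0.5, i.e. chance level
```

The nested loop is quadratic in dataset size; it is written this way because it mirrors the definition directly, not for efficiency.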

“While we can infer that labels are associated with normal versus abnormal disease categories,” the authors of the AJR article add, “we cannot determine the specific aspect of the labels that resulted in their being confounding factors.”

Featured image: Grad-CAM heatmaps for deep learning models trained on (A) the original radiograph, showing emphasis on the laterality and/or technologist initial labels; (B) the radiograph with the label covered by a black box, showing emphasis on anatomic features such as bones. (Colors toward the red end of the spectrum indicate greater emphasis; colors toward the blue end indicate less importance.)
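Heatmaps like these come from Grad-CAM, which weights each feature map of a late convolutional layer by the spatially averaged gradient of the class score with respect to that map, sums the weighted maps, and keeps only positive values. The sketch below shows just that combination step on small nested lists. It assumes the activations and gradients have already been extracted from a trained network (how to do so depends on the framework and is not shown), and it normalizes to a 0 to 1 range and skips upsampling to image size; the function name is hypothetical.

```python
from typing import List

Map = List[List[float]]  # one H x W feature map


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Combine K feature maps A_k and their gradients dY/dA_k into one heatmap.

    alpha_k = mean of gradients[k] over all spatial positions
    cam     = ReLU(sum_k alpha_k * A_k), then scaled so the maximum is 1.0
    """
    if not activations or len(activations) != len(gradients):
        raise ValueError("need matching, non-empty lists of activations and gradients")
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for a_k, g_k in zip(activations, gradients):
        alpha = sum(sum(row) for row in g_k) / (h * w)  # global-average-pooled gradient
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * a_k[i][j]
    cam = [[max(v, 0.0) for v in row] for row in cam]  # ReLU: keep positive evidence only
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]  # scale to [0, 1] for display
    return cam


if __name__ == "__main__":
    acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
    print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.5]]
```

In a rendered heatmap, values near 1.0 would map to the red end of the color scale and values near 0 to the blue end.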