In a recent article published in the Journal of the National Cancer Institute, a group of academic radiologists from Magee-Womens Hospital, at the University of Pittsburgh Medical Center, studied the influence of computer-aided detection (CAD) on screening mammography recall and cancer detection rates.1 The article is thought-provoking on many levels, and this commentary examines a few of its more general propositions and implications.
Is Academic Experience Relevant?
There is no question that academic medicine has made, and continues to make, great contributions to the practice of medicine, contributions that go well beyond the training of the next generation of physicians. In some instances, however, the academic experience has limited correlation to the day-to-day clinical practice of medicine. It is important, therefore, to examine carefully the environment in which a study is performed, and to take the circumstances found in that environment into consideration, before extrapolating its results to other practice environments.
Academic practice, even academic clinical practice, rarely duplicates the conditions in which private practice medicine takes place today. Physicians in academic practice are shielded to a substantial extent from pressures imposed by low reimbursement and those that flow from the need to run a hospital or medical practice as a viable business. Studies of new technologies or techniques may appear quite relevant on the surface, but may bear little relevance in a clinical practice setting.
The Magee-Womens study illustrates just such a challenge. In this study, 24 academic radiologists reviewed more than 115,000 screening mammograms over a 3-year period. Approximately one half of these mammograms were interpreted without CAD, with the balance interpreted with the assistance of CAD. A subset of seven radiologists interpreted 82,000 (71%) of these mammograms, with the split between those interpreted without CAD and those interpreted with CAD being 54% and 46%, respectively. Each member of this subset interpreted more than 8,000 mammograms during the period of the study, a number well above the MQSA minimums and one that would qualify them as breast radiologists. Their patient recall rates ranged from 7.7% to 14.9%, a fairly wide range. The cancer detection rate for this group of radiologists averaged 3.61 and 3.45 per 1,000 screening mammograms reviewed without and with the assistance of CAD, respectively.
Differences found in cancer detection rates between those interpretations that were aided by CAD and those that were not were negligible in the Magee-Womens study. Other studies have found increases in cancer detection rates from 12% to 19.5% using CAD. Why is there such a difference? Perhaps differences in the construction of the studies explain the difference in results, and thereby modify the implications for clinical practice.
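To put the reported rates on the same footing as the percentage increases cited from other studies, the change between the two detection rates can be expressed as a relative difference. The short Python sketch below does this using only the two rates reported above for the seven high-volume radiologists. The function and variable names are illustrative, and the result is a rough comparison only, since the studies differ in design.

```python
def relative_change_percent(baseline: float, new: float) -> float:
    """Return the percent change from baseline to new."""
    return (new - baseline) / baseline * 100.0


# Cancer detection rates per 1,000 screening mammograms, as reported
# in the text for the seven high-volume radiologists.
rate_without_cad = 3.61
rate_with_cad = 3.45

change = relative_change_percent(rate_without_cad, rate_with_cad)
print(f"Relative change in detection rate with CAD: {change:+.1f}%")
# prints "Relative change in detection rate with CAD: -4.4%"
```

On these figures the rate with CAD is about 4% lower in relative terms, against reported increases of 12% to 19.5% elsewhere, which is the gap the following discussion tries to explain.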
A review of the aforementioned studies reveals some fundamental differences in approach that may be important. The Magee-Womens study involved two different series of consecutive screening mammograms, while the Bandodkar2 and Freer3 prospective studies were structured in a manner that had the interpreting physician first review the screening mammogram in a conventional manner, recording his or her findings, then activate the CAD markers, reinterpret in view of any markers presented, and report the findings with CAD. The latter process is exactly the way that CAD is used in clinical practice, with the exception that the unaided result is not typically recorded.
CAD’s value is to shift detection earlier in time, an effect that is measurable by studying changes in reporting for the same series of cases, with the same radiologists, a key characteristic of the Bandodkar and Freer studies. The Magee-Womens study measured the changes in “cancer detection rate” from the period before CAD was adopted to the period after it was adopted. Given the long time period encompassed by the two legs of the study, what the Magee-Womens study called the cancer detection rate was actually closer to a cancer incidence rate in its screening population, which will not change appreciably over the time intervals involved in this study.
The authors of the Magee-Womens study speculate that the protocol used in the Bandodkar and Freer studies “may have introduced a lower level of vigilance among radiologists during the initial interpretation without [CAD], because they knew that [CAD] would be available to them for the final recommendation and that the initial recommendation did not constitute a formal clinical recommendation.” That comment raises the question of which study more accurately reflects the exigencies of clinical practice.
CAD is both a clinical and a work-flow tool. It should be evaluated under standard operating conditions. When it has been evaluated in this manner, it has been shown to add value to the interpretive process, in a manner consistent with the early retrospective studies that initially established the potential of CAD.4-7 The Bandodkar study had its origins in the academic clinical practice at Stanford, which would account for its lower (12%), but still very positive, increase in cancer detection when compared with the Freer study (19.5%), which was generated from a purely community-based breast practice in Plano, Tex.
Notwithstanding the logic of valuing studies that compare or evaluate technologies under actual operating conditions, there is still a question as to why the Magee-Womens study found virtually no difference between interpretations that were unaided by CAD and those that were aided. The answer may partially be found in the fact that radiologists in academic practices are seldom under the same level of pressure to interpret large numbers of mammograms as their counterparts in busy clinical practices. This can afford a luxury of time that can lead to a more thorough interpretive review, and that should be reflected in both higher sensitivity and higher specificity in the interpretation of screening examinations. Subspecialization can also improve the interpretive skills of the breast radiologist over those of the general radiologists in private practice.
There is another consideration as well: the nature of CAD itself, and the way that radiologists adapt to using it. CAD, particularly in the early software versions, had relatively low specificity, reflected in the number of false positives, or marks that identify regions of interest (ROIs) that are not malignancies. Physicians typically reject most ROIs that are marked by CAD, and one of the most frequent complaints about the early CAD software was that it produced too many false positives. Later versions have improved the specificity of CAD, but CAD is a detection tool, not a diagnostic tool, and there will always be marks that are rejected. CAD’s purpose is to increase the sensitivity of the interpreting physician’s review. It cannot assure, however, that the interpreting physician will not reject marks over regions of interest that are actually early cancers. Only a retrospective review would indicate whether valid marks had been rejected.
Interestingly, the construction of the Bandodkar and Freer studies specifically called for the participating physicians to record their pre-CAD findings before going on to CAD. Rather than making these physicians less vigilant, as speculated in the Magee-Womens study, this requirement probably encouraged more attention to CAD than might otherwise have been the case, allowing the full value of CAD to be realized. In this respect, the methodology of these studies just might provide a model protocol for physicians and practices seeking to optimize their use of CAD.
It is important for breast practices to consider the source and the content of studies that appear in the medical literature. Studies that evaluate the effectiveness of clinical technologies need themselves to be evaluated in terms of the environment in which they are developed and pursued. And while the Magee-Womens authors concluded that CAD did not positively affect the performance of the interpreting physicians, CAD has been subjected to review in several sound, prospective, clinical studies modeled on the way that the technology was designed to be used. Those studies have established the benefit of CAD in assisting radiologists in detecting breast cancer earlier, when it is more treatable. There is, in fact, preliminary evidence that radiologists who use CAD regularly in their practices become more effective in their interpretive skills. A study of this observation would be valuable in assessing the long-term value of CAD for mammography.
Gerald R. Kolb, JD, is president of Breast Health Management Inc in Bend, Ore. In addition to consulting on work-flow issues in breast centers, he consults for several vendors of digital imaging equipment, including computer-aided detection.
1. Gur D, Sumkin JH, Rockette HE, et al. Changes in breast cancer detection and mammography recall rates after the introduction of a computer-aided detection system. J Natl Cancer Inst. 2004;96(3):185-190.
2. Bandodkar P, Birdwell RL, Ikeda DM. Computer aided detection (CAD) with screening mammography in an academic institution. Preliminary findings. Presented at the 88th Scientific Assembly and Annual Meetings of the Radiological Society of North America, December 16, 2002, Chicago, IL. Radiology. 2002;225(P):600.
3. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology. 2001;220:781-786.
4. Warren Burhenne LJ, Wood SA, D’Orsi, et al. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology. 2000;215:554-562.
5. Birdwell RL, Ikeda DM, O’Shaughnessy KF, et al. Mammographic characteristics of 115 missed cancers later detected with screening mammography and the potential utility of computer-aided detection. Radiology. 2001;219:192.
6. Vyborny CJ, Doi T, O’Shaughnessy KF, et al. Breast cancer: importance of spiculation in computer-aided detection. Radiology. 2000;215:703.
7. Brem RF, Schoonjans JM. Radiologist detection of microcalcifications with and without computer-aided detection. A comparative study. Clin Radiol. 2001;56:150-154.