by Renee Diiulio
Speech recognition technology is entering a next-generation cycle in which artificial intelligence is poised to leverage information in new ways.
For some physicians, watching their spoken words become written text on the screen before them in near-real time still seems like technology of the future. Yet for most radiologists, this technological “wonder” is already old school; some have been using it for years.
“[Radiologists] started using PACS 10 or 12 years ago, and they’re used to using technology. If they can get a report back to their customers quicker and have the level of quality be very high or even better than before, that’s a competitive advantage—and why we’ve seen that adoption in radiology,” said Ben Brown, general manager of imaging informatics and medical equipment research for KLAS, Orem, Utah.
Now, next-generation technologies are poised to bring speech recognition to new levels, new disciplines, and new competitions. “It’s a mature product, and as it continues to improve, more people are going to be using it,” said Gerald Roth, MD, chief executive officer and president of Tower Imaging Medical Group in Los Angeles.
Demand is driven by the usual factors: the need for greater efficiency and improved customer service, the requirements for high-quality, compliant reporting and documentation, and the ability to accommodate increased volume at decreased cost. “Physicians who are on the fence regarding adoption [of speech recognition technology] should realize that, in most instances and especially in a busy or time-critical practice, it can really make a huge difference,” Roth said.
These benefits manifest in multiple ways that not only offset the corresponding barriers to adoption but can also produce a tangible return on investment, most often seen in shorter report turnaround times and lower transcription-related personnel costs. The intangible benefits can be just as valuable.
The altered workflow creates efficiencies in reading exams and delivering reports, even though self-editing may require more of the radiologist’s time in the chair up front. Integration with other information systems can eliminate steps in the workflow by importing data automatically into reports, saving time and improving accuracy. On the data output end, developers have begun to leverage natural language processing (NLP) to create mineable data that can be used in many other ways to expand efficiency or improve care elsewhere in the health care continuum.
Front-End Time, Back-End Savings
Don Fallati of M*Modal
The biggest issue facing the adoption of speech recognition technology today is the perception of the workflow change that comes with its use. Many radiologists who are still holding out worry that the software, particularly its self-editing features, will turn them into transcriptionists. However, two types of speech recognition processes exist, and one of them shifts the transcriptionist’s role to that of an editor, meaning self-editing is not required.
“This method doesn’t impact the physician workflow at all,” Brown said. Radiologists still dictate as they normally would, and the dictation is run through a speech engine, which translates the voice file into text. A transcriptionist, although more accurately called an editor, reviews the typed document for inaccuracies in the electronic translation and returns it to the physician for signature.
This process does result in time savings but does not offer the same savings seen when the physician self-edits. Roth and his colleagues now self-edit nearly 100% of reads despite the extra time required at the front end. This investment can range from no additional time on a normal case to 10% to 20% longer on a much more complex case.
However, it is more than made up for by the elimination of the separate transcription/editing step. “With a transcriptionist as backup, you lose some of the improvement in turnaround time. We eliminated that preliminary step to get filed reports out very, very fast,” Roth said.
The increased speed is the result not only of the fact that the report is completed in one read cycle, but also of the radiologist’s ability to work more efficiently (and, therefore, with less frustration in some instances). One of the most noticeable changes with a switch to speech recognition is the reduction of calls (and related interruptions) from physicians or nurses wanting results, particularly those that are normal or expected. The rapid turnaround time means many reports go out before the calls come in.
“Many of these [results] can be easily looked at by the doctor and don’t require an interaction,” Roth said. Subsequently, those calls that do get made to the radiologist are advanced conversations, discussing the findings that both physicians have already seen.
Similarly, returning to a read after taking these calls is also easier with a written document to refer to. “When you’re dictating the conventional way, you have to rewind the tape or voice file,” Roth said, noting it is much easier to catch up on-screen.
These benefits may be realized as early as the demonstration stage, when leery physicians are won over by the back-end editing option. “Their concerns about workflow changes are generally overcome with demonstrations and shared experiences from colleagues,” said Don Fallati, senior vice president of marketing and product development for M*Modal in Pittsburgh. The flexibility is appreciated, and many physicians, as they become more comfortable with the software, eventually elect to self-edit, Fallati added.
Pulling Information Automatically, Pushing Data Rapidly
Another big bonus that can be seen with the right speech recognition demonstration is the immediate population of some fields in the radiology report with data from the information system, meaning the physician does not have to repeat what the database already knows.
This was a key feature for Roth and his team and factored heavily in the selection of PowerScribe, the speech recognition system of Burlington, Mass-based Nuance Communications. Using natural language understanding (NLU) technology, the system offers a feature called tokens that pulls information from the RIS to populate the radiology report, such as the examination date and time or the clinical history. The system, using another feature titled Power Normals, will also match the exam type to corresponding billing codes (eg, CPT and hospital-specific) and offer normal template options for the dictation.
“So if a two-view chest x-ray comes in, the system recognizes it and lights up a button specific to that exam type and another to choose a normal dictation. But this all happens behind the scenes and with the power to override,” Roth said.
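The mechanics of that kind of auto-population can be pictured as simple template substitution driven by the RIS record. The sketch below is a hypothetical illustration of the concept, not PowerScribe’s actual implementation; the field names, template tokens, and billing codes are all assumptions made for the example.

```python
# Hypothetical sketch of RIS-driven report population.
# Field names, tokens, and the exam-to-template mapping are illustrative only.

RIS_RECORD = {
    "exam_type": "XR CHEST 2 VIEWS",
    "exam_datetime": "2010-03-15 09:42",
    "clinical_history": "Cough and fever for three days.",
}

# Map exam types to a billing code and a "normal" template offered to the radiologist.
EXAM_CATALOG = {
    "XR CHEST 2 VIEWS": {
        "billing_code": "71020",  # illustrative code only
        "normal_template": "The lungs are clear. Heart size is normal. No acute findings.",
    },
}

REPORT_HEADER = (
    "EXAM: {exam_type}\n"
    "DATE/TIME: {exam_datetime}\n"
    "CLINICAL HISTORY: {clinical_history}\n"
)

def start_report(ris_record):
    """Pre-populate the report header and suggest a matching normal template."""
    header = REPORT_HEADER.format(**ris_record)
    entry = EXAM_CATALOG.get(ris_record["exam_type"], {})
    return header, entry.get("billing_code"), entry.get("normal_template")

header, code, normal = start_report(RIS_RECORD)
print(header)
print("Suggested billing code:", code)
print("Normal template offered:", normal)
```

The radiologist would still dictate the findings; the point is simply that anything the database already knows is filled in before dictation begins, with the option to override.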
Speech recognition software has traditionally been a stand-alone system, but as information technology advances, there is greater opportunity for stronger integration among all information databases. “Over the years, the demand for speech recognition software to be very tightly integrated with PACS has become a requirement. And as we get into a next-generation cycle of PACS itself, we’ll see a corresponding cycling in speech recognition,” Fallati said.
Speech Recognition vs Speech Understanding
The next-generation technologies can be expected to recognize speech even better, although there are still some challenges with electronic transcription, and every system has its unique stumbling blocks. The initial experience may be a little slow going.
“The training and adoption curve can be a real challenge depending on which vendor or which product they partner with,” Brown said. Much of the difficulty lies in training the software to recognize the physician’s voice.
But it is certainly “good enough,” said Roth, adding, “We’ve really embraced the technology. Obviously, the better it gets, the more we like it.”
In general, physicians would still like to see greater accuracy in the recognition of small words, as well as dialects and accents, according to Brown. Individuals may have specific complaints, such as a particular word that is always misunderstood, but with accuracy rates often above 95%, these glitches are not deal breakers.
Rather, it is some of the extended features on which tomorrow’s speech recognition systems may compete and which require a new definition of the technology. NLP or NLU, used interchangeably, represents “a semantic, linguistics-based technology that is statistically based like speech recognition but goes beyond word spotting,” Fallati said. NLU applications can read and understand the meaning and context of electronic text, thereby enabling the identification of medical concepts, facts, and data from narrative.
“You therefore have structured information without having had to enter it into structure-constrained fields as with a template or giving the system verbal cues,” Fallati said. The immediate benefit is a return to a more natural, conversational style of dictation; the long-term benefit is the ability to produce structured reports whose data can be mined for research or used to drive clinical activity.
Fallati offered an example: if the PQRI (Physician Quality Reporting Initiative) requires three specific pieces of information for a particular examination read and one of the items is missing, an intelligent system will identify the omission and call attention to it with an alert. “The radiologist is now able to agree or not, but in either case, they can complete the dictation and produce a much higher quality document,” Fallati said.
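In code, that kind of completeness check reduces to comparing the concepts an NLU engine has extracted from the narrative against a list of required elements. The sketch below is a hypothetical illustration; the measure names, exam type, and extracted-concept structure are assumptions for the example, not any vendor’s actual data model.

```python
# Hypothetical completeness check on NLU output.
# Required elements and extracted concepts are illustrative only.

REQUIRED_ELEMENTS = {
    "carotid ultrasound": [
        "right ICA stenosis",
        "left ICA stenosis",
        "plaque characterization",
    ],
}

def missing_elements(exam_type, extracted_concepts):
    """Return the required reporting elements the dictation is still missing."""
    required = REQUIRED_ELEMENTS.get(exam_type, [])
    return [item for item in required if item not in extracted_concepts]

# Concepts the NLU engine identified in the narrative dictation (assumed output format).
concepts = {"right ICA stenosis": "50-69%", "plaque characterization": "calcified"}

gaps = missing_elements("carotid ultrasound", concepts)
if gaps:
    print("Alert: report is missing required element(s):", ", ".join(gaps))
```

The alert fires before the report is signed, so the radiologist can accept the prompt or dismiss it, as Fallati describes.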
Today’s Advances, Tomorrow’s Tools
These technological advances leverage the benefits already associated with speech recognition to provide even further competitive advantage for users. Although the artificial intelligence capabilities are still in their early stages, physicians already want more on both the input and output ends.
Roth would love to see more captured electronic information automatically entered into dictation reports, such as comparison study data and modality-specific facts (eg, type and dose of IV contrast, amount of CT radiation exposure, ultrasound organ measurements, etc). “So the major thing I’m looking for is the system to pull in more data that’s already available in electronic form,” Roth said.
Roth also wouldn’t mind self-correction features that use artificial intelligence to pick up on internal inconsistencies, for instance, switching from the left to the right foot mid-dictation after an interruption.
And, of course, the ability to mine that data would be another plus. “How do we take this raw data that a speech engine is able to translate into text and then turn that text into data that can be searched against, that’s normative, that can be standardized to leverage as business intelligence or with best practices and evidence-based medicine protocols?” Brown asked.
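One way to picture that step: once each report has been reduced to discrete, standardized fields, the accumulated records can be queried like any other database. The sketch below is a hypothetical illustration of that idea; the records and field names are invented for the example, and a real system would draw on an NLU-populated repository rather than a hard-coded list.

```python
# Hypothetical sketch: querying structured report data as business intelligence.
# Records and field names are invented for illustration.

reports = [
    {"exam": "CT HEAD", "finding": "normal", "turnaround_minutes": 22},
    {"exam": "CT HEAD", "finding": "acute hemorrhage", "turnaround_minutes": 9},
    {"exam": "XR CHEST 2 VIEWS", "finding": "normal", "turnaround_minutes": 31},
]

# Example query: average turnaround time for normal CT head exams.
normals = [r["turnaround_minutes"] for r in reports
           if r["exam"] == "CT HEAD" and r["finding"] == "normal"]
if normals:
    print("Average turnaround for normal CT head:",
          sum(normals) / len(normals), "minutes")
```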
Expansion of these types of artificial intelligence abilities is very likely going to be the future of speech technology, particularly with greater adoption of the EMR and the requirements for meaningful use. By all accounts, we can expect these systems to keep getting smarter.
Renee Diiulio is a contributing writer for Axis Imaging News.