The Word's Out

As speech recognition systems evolve, radiologists find the technologies are increasingly useful tools to save time and costs.

In the early days of speech recognition technology, users of the software had to speak slowly and clearly. And even them (then) there were many misidentified worlds (words).

Not so today. Speech and voice recognition systems have come a long way, and while no technology translates everything perfectly, expanding capabilities have increased both usefulness and effectiveness. Transcription is cleaner, and turnaround time faster. The introduction of structured data and integration with other databases, such as the EMR (electronic medical record), can increase value further and help provide a quick return on investment.

"Buying speech recognition is costly, but luckily, that return on investment (ROI) is very high. Integration with the EMR can shorten the return on investment from 6 months or a year to 2 or 3 months," said Stephen Willis, chief information officer of Greensboro Radiology, Greensboro, NC.

The return on investment results from savings in both time and cost. "The way most radiologists see [ROI for voice recognition technology] is twofold: one, they have a dramatic reduction in transcription costs; and two, the turnaround time for reports drops tremendously. It's almost a greater benefit in some ways than the reduction in transcription costs," said David S. Mendelson, MD, FACR, chief of clinical informatics, professor of radiology, and director of radiology information systems, pulmonary radiology, at The Mount Sinai Medical Center (MSMC) in New York.

The reduction in transcription costs is, however, significant on its own. "Transcriptionists are more expensive than a speech recognition system for a facility of any size," Willis said, citing additional personnel costs, such as benefits and vacations, which contribute to the extra expense.

Yet even with the costs of maintaining a voice recognition system ("you do have to pay people to support it," Willis acknowledged), Greensboro Radiology has seen transcription costs drop by "at least half" at the eight hospitals it serves after implementing the system. For some organizations, these savings can reach hundreds of thousands of dollars.
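The payback arithmetic behind these figures is straightforward: divide the up-front cost by the net monthly savings. The sketch below is illustrative only; every dollar amount in it is hypothetical, since the article does not report Greensboro Radiology's actual costs. It applies the "at least half" reduction Willis describes and subtracts an assumed monthly support cost.

```python
def payback_months(system_cost: float, monthly_transcription_cost: float,
                   savings_fraction: float, monthly_support_cost: float) -> float:
    """Months until cumulative net savings cover the up-front purchase price."""
    net_monthly_savings = (monthly_transcription_cost * savings_fraction
                           - monthly_support_cost)
    if net_monthly_savings <= 0:
        raise ValueError("the system never pays for itself under these assumptions")
    return system_cost / net_monthly_savings

# Hypothetical inputs: $300,000 system, $100,000/month transcription spend,
# costs cut in half, $10,000/month for support staff.
months = payback_months(300_000, 100_000, 0.5, 10_000)
print(f"{months:.1f} months")  # 7.5 months
```

Raising the savings fraction or lowering support costs shortens the payback period, which is the lever Willis attributes to EMR integration.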

At the same time that Greensboro Radiology helps client hospitals achieve these savings, it uses voice recognition technology in-house as a profit center. The medical diagnostic imaging and interventional radiology practice charges clients on a per-report fee structure. "From our original purchase price, we make 300% to 400% profit annually," Willis said, adding it's a win-win situation for all.

Improved Accuracy

That win extends to patients and physicians as well, in the form of faster turnaround time. Willis notes that in addition to the financial benefits realized with the implementation of a voice recognition system, the turnaround time for reports has been reduced by 90%.

"Once you enter a preliminary report using a speech recognition system, if you so choose, you can make it instantaneously available through a variety of electronic interchange mechanisms. That was just a huge benefit we saw right away. Clinicians came down and were very appreciative that they had results waiting for them in advance of being ready to use them," Mendelson said.
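One widely used interchange mechanism in hospitals is the HL7 version 2 ORU^R01 (observation result) message. The article does not say which mechanisms Mount Sinai uses, so treat this as an assumption. The sketch builds a minimal message whose result-status fields (OBR-25 and OBX-11) are set to "P" for preliminary. The application names, identifiers, and exam code are hypothetical, and a production interface would populate many more fields.

```python
from datetime import datetime

def hl7_escape(text: str) -> str:
    """Escape HL7 v2 delimiter characters in free text (backslash first)."""
    for char, esc in (("\\", "\\E\\"), ("|", "\\F\\"), ("^", "\\S\\"),
                      ("&", "\\T\\"), ("~", "\\R\\")):
        text = text.replace(char, esc)
    return text

def preliminary_oru(mrn: str, accession: str, exam_code: str,
                    report_text: str, now: datetime) -> str:
    """Build a minimal HL7 v2.3 ORU^R01 message flagged as a preliminary result."""
    ts = now.strftime("%Y%m%d%H%M%S")
    # MSH: encoding characters, sending/receiving app and facility (hypothetical),
    # timestamp, message type, control ID, processing ID (P = production), version
    msh = f"MSH|^~\\&|SPEECH_RPT|RADIOLOGY|EMR|HOSPITAL|{ts}||ORU^R01|MSG{ts}|P|2.3"
    pid = f"PID|1||{mrn}"
    obr = [""] * 25
    obr[0] = "1"           # OBR-1 set ID
    obr[2] = accession     # OBR-3 filler order number
    obr[3] = exam_code     # OBR-4 universal service ID
    obr[24] = "P"          # OBR-25 result status: P = preliminary
    obx = [""] * 11
    obx[0] = "1"                      # OBX-1 set ID
    obx[1] = "TX"                     # OBX-2 value type: text
    obx[2] = "IMPRESSION"             # OBX-3 observation ID (hypothetical)
    obx[4] = hl7_escape(report_text)  # OBX-5 observation value
    obx[10] = "P"                     # OBX-11 result status: P = preliminary
    segments = [msh, pid, "OBR|" + "|".join(obr), "OBX|" + "|".join(obx)]
    return "\r".join(segments) + "\r"  # HL7 v2 segments end with carriage returns

msg = preliminary_oru("000123", "ACC42", "CT CHEST", "No acute findings.",
                      datetime(2024, 3, 1, 8, 20))
print(msg.replace("\r", "\n"))
```

Once the radiologist signs the final report, the same structure would be resent with the status fields changed to "F".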

To maximize efficiency, facilities want to select the system that does the best job. "There is only one thing that matters when looking for a voice recognition system and that is how good a job it does converting your speech to text on paper," Willis emphasized.

The difference between a system that is 99% accurate and 99.7% accurate is immense, according to Willis, who notes that nearly all reputable speech recognition systems boast recognition rates greater than 90%. "You don't think it's annoying to have a single mistake made over and over again if it's only one mistake per hour, but to a radiologist, who thinks about efficiency all day, it becomes paramount for the recognition system to be highly accurate," Willis said.
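Counting expected corrections shows why a fraction of a percentage point matters. This worked example treats accuracy as a simple per-word rate and assumes a hypothetical 150-word report and 10 reports per hour. Neither figure comes from the article.

```python
def errors_per_hour(word_accuracy: float, words_per_report: int,
                    reports_per_hour: int) -> float:
    """Expected misrecognized words per hour, treating accuracy as a per-word rate."""
    return (1 - word_accuracy) * words_per_report * reports_per_hour

for acc in (0.90, 0.99, 0.997):
    print(f"{acc:.1%} accurate -> {errors_per_hour(acc, 150, 10):.1f} corrections/hour")
# 90.0% accurate -> 150.0 corrections/hour
# 99.0% accurate -> 15.0 corrections/hour
# 99.7% accurate -> 4.5 corrections/hour
```

Under these assumptions, the 99.7% engine needs fewer than a third as many corrections as the 99% engine.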

He recommends buyers demo numerous systems to find the speech engine that best translates their voice(s) to text. "We did find a large variance in how annoying some systems could be in making the same mistakes over and over again," Willis said.

These errors tended to be independent of accent. Greensboro Radiology's 50 radiologists hail from several different nations; even those from within the United States come from different regions and do not share the same dialects.

If a speech engine does consistently misidentify a word, specific algorithms addressing the problem can be overlaid on the original program to facilitate correction. "For instance, a lot of these products initially wrote down the word 'number' when you really wanted the pound sign. Now they've got little [fixes] so the system can understand the context and insert the pound sign instead of the word 'number,'" Mendelson said.
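A context rule of the kind Mendelson describes can be sketched as a post-processing pass over the recognized text. The rule here is a hypothetical illustration, not the logic of any named product: rewrite "number" as "#" only when a numeral immediately follows.

```python
import re

# "number" followed by whitespace and digits -> "#<digits>" (case-insensitive)
_NUMBER_BEFORE_DIGITS = re.compile(r"\bnumber\s+(\d+)\b", re.IGNORECASE)

def apply_pound_sign_rule(text: str) -> str:
    """Replace 'number N' with '#N'; leave other uses of 'number' alone."""
    return _NUMBER_BEFORE_DIGITS.sub(r"#\1", text)

print(apply_pound_sign_rule("Compared with image number 12 of series number 3."))
# Compared with image #12 of series #3.
print(apply_pound_sign_rule("The number of nodules is unchanged."))
# The number of nodules is unchanged.
```

The second call shows why context matters: a blanket substitution would corrupt ordinary uses of the word.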

Lexicons, the dictionaries integral to the speech engine, are also more specific, often tailored to a specialty. "When a radiologist is dictating, if they say ulna, that makes great sense. It's a body part, and it's something a radiologist would say. 'Umbrella' is not something a radiologist would ever say and is, therefore, not in the dictionary, so the system would never confuse those words," Willis said.
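Restricting output to a specialty lexicon can be illustrated with plain string similarity. This sketch uses the standard library's `difflib.get_close_matches` as a stand-in for acoustic matching, which is a deliberate simplification because real speech engines score audio, not spellings. The lexicon entries are hypothetical.

```python
import difflib
from typing import List, Optional

RADIOLOGY_LEXICON = ["ulna", "ulnar", "radius", "humerus", "pleural", "effusion", "nodule"]

def best_lexicon_match(heard: str, lexicon: List[str]) -> Optional[str]:
    """Return the closest in-lexicon word, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(heard.lower(), lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(best_lexicon_match("ulnna", RADIOLOGY_LEXICON))     # ulna
print(best_lexicon_match("umbrella", RADIOLOGY_LEXICON))  # None
```

Because "umbrella" is not in the lexicon, the system can never emit it, whatever it thinks it heard.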

This accuracy also simplifies training: radiologists at Greensboro Radiology finish training in 2 to 15 minutes.

Greater accuracy also means less self-correction and faster report creation. "The entire workflow is expedited. Having results available earlier advances the patient's care to the next step earlier," Mendelson said.

In Plain Language

"The whole concept is about saving time," Willis said. Today's speech and voice recognition makers are taking the concept further, using structured data to enter report information automatically without even requiring the physician to speak.

"You want your voice recognition system to be able to look inside your medical records as a whole (the HIS, the RIS) and pull appropriate patient data into your reports, populating them in the right places. So, for instance, in radiology, all we have to do is say, 'Here are the findings,' 'Here is the impression,' and we're done. You don't want the radiologist having to repeat information that everybody already knows," Willis said.

When the Greensboro radiologists open their voice recognition and PACS programs, they find the report they are creating already begun, prepopulated with the patient demographics, the exam name, the exam technique, and contrast levels.
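A minimal sketch of report prepopulation, assuming the RIS hands over each exam order as a simple record. The field names and sample values are hypothetical. Real feeds typically arrive as HL7 orders and require the integration work Willis describes. Only the findings and impression are left for the radiologist to dictate.

```python
from dataclasses import dataclass

@dataclass
class ExamOrder:  # hypothetical fields pulled from the HIS/RIS
    patient_name: str
    mrn: str
    date_of_birth: str
    exam_name: str
    technique: str
    contrast: str

def prepopulate_report(order: ExamOrder) -> str:
    """Fill the report header from order data; dictated sections stay blank."""
    return "\n".join([
        f"PATIENT: {order.patient_name}   MRN: {order.mrn}   DOB: {order.date_of_birth}",
        f"EXAM: {order.exam_name}",
        f"TECHNIQUE: {order.technique}",
        f"CONTRAST: {order.contrast}",
        "",
        "FINDINGS: ",    # dictated by the radiologist
        "IMPRESSION: ",  # dictated by the radiologist
    ])

order = ExamOrder("DOE, JANE", "000123", "1970-01-01", "CT CHEST",
                  "Axial 5 mm images", "100 mL IV")
print(prepopulate_report(order))
```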

The use of structured data allows the information to be more easily transferred, but integrating the multiple systems successfully is hard work. "It's not nearly as simple as recognizing speech and throwing it in text. A lot of practices come to visit us and see what we've done, and we find them a little surprised that you can't just plug the two together. We do a lot of behind-the-scenes work to make sure that happens," Willis said.

WHAT VERSUS WHO?

Although the terms are often used interchangeably, speech recognition and voice recognition refer to two different technologies:

• Speech recognition systems recognize words or phrases independent of who is speaking. This technology is best suited to applications with a large number of users. Automated telephone services and voice-automated computer menus typically use this type of system.

• Voice recognition systems are speaker dependent and must be trained to recognize the pronunciation and speech patterns of each user. This is generally accomplished by having the individual read a specific series of words and phrases so that the system learns the user's voice. This typically allows for larger vocabularies than are possible with speech recognition systems. Dictation software often employs voice recognition technology.

The installation, implementation, and integration processes took the radiology service provider 6 months and involved creating an integration interface that incorporated the multiple exam code dictionaries and ordering methods of its eight hospital clients. "It's important to have people who know what they are doing if you want to do a major rollout of a speech recognition system," Willis said.
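The exam-code part of such an interface can be pictured as a lookup that maps each hospital's local codes onto one shared exam dictionary. This is a minimal sketch. The hospital names and codes are invented for illustration, and a real interface would also reconcile the different ordering methods.

```python
# Hypothetical local exam codes used by two client hospitals,
# each mapped onto one shared exam dictionary
LOCAL_TO_SHARED = {
    "hospital_a": {"CTCH+": "CT CHEST WITH CONTRAST", "XR2V": "CHEST X-RAY, 2 VIEWS"},
    "hospital_b": {"C-0412": "CT CHEST WITH CONTRAST", "CXR-PA-LAT": "CHEST X-RAY, 2 VIEWS"},
}

class UnknownExamCode(KeyError):
    """Raised when a hospital sends a code the interface has no mapping for."""

def normalize_exam(hospital: str, local_code: str) -> str:
    try:
        return LOCAL_TO_SHARED[hospital][local_code]
    except KeyError:
        # Surface unmapped codes for a person to resolve rather than guessing
        raise UnknownExamCode(f"{hospital}: no mapping for exam code {local_code!r}") from None

print(normalize_exam("hospital_b", "C-0412"))  # CT CHEST WITH CONTRAST
```

Failing loudly on an unmapped code is a design choice: a silently wrong exam name on a report is worse than a stalled interface message.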

Data Mining Tools

The investment can be worth it, particularly if the information is used beyond reading reports. When data is stored digitally in one place, it can be more easily collected and compared. Greensboro Radiology attaches an updated quality analysis to the back of each report, with quantitative figures on turnaround time (by physician and workstation), volume, and quality.

"We're able to do a full radiologist-effectiveness survey across modality, across the workstation they sit at, across the facility they work in, and across the types of exams they read," Willis said. The quality analyses help to ensure report accuracy as well as physician ownership of the results.
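Breaking turnaround time down by physician or workstation amounts to grouping report timestamps and aggregating them. A minimal sketch using only the standard library. The records, field layout, and choice of the median as the summary statistic are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median
from typing import Dict

reports = [  # hypothetical: (radiologist, workstation, exam completed, report signed)
    ("Dr. A", "WS-1", "2024-03-01 08:00", "2024-03-01 08:20"),
    ("Dr. A", "WS-2", "2024-03-01 09:00", "2024-03-01 09:50"),
    ("Dr. B", "WS-1", "2024-03-01 08:30", "2024-03-01 08:40"),
]

def turnaround_minutes(completed: str, signed: str) -> float:
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(signed, fmt) - datetime.strptime(completed, fmt)
    return delta.total_seconds() / 60

def median_turnaround(by: int) -> Dict[str, float]:
    """Median turnaround in minutes, grouped by column 0 (radiologist) or 1 (workstation)."""
    groups = defaultdict(list)
    for row in reports:
        groups[row[by]].append(turnaround_minutes(row[2], row[3]))
    return {key: median(vals) for key, vals in groups.items()}

print(median_turnaround(by=0))  # {'Dr. A': 35.0, 'Dr. B': 10.0}
print(median_turnaround(by=1))  # {'WS-1': 15.0, 'WS-2': 50.0}
```

The same grouping extends to modality, facility, or exam type by adding those columns to each record.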

As health care providers become increasingly accountable for demonstrating quality and cost-efficiency, the ability to mine data is expected to help drive adoption in larger markets. "Structured data is the holy grail of most clinical informatics research and work because it means you can then extract the information easily and analyze it. We have a rich data-collection environment, and we have a lot of information available to us that we are barely touching but could really help engineer cost-efficient care and quality care," said Mendelson.

Within a hospital such as Mount Sinai, data mining can identify areas for improvement in patient care. "You can have decision support based on the data presented in the EMR, recognizing for which patients you might expedite certain services and tests and which patients you will need to follow up," Mendelson said.
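A follow-up query of this kind only works if the recommendation is captured as a discrete field rather than buried in free text. The report structure below is hypothetical, and the due date approximates a month as 30 days for simplicity.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

@dataclass
class StructuredReport:
    mrn: str
    exam_date: date
    follow_up_months: Optional[int]  # hypothetical field; None = no follow-up recommended
    follow_up_done: bool

def overdue_follow_ups(reports: List[StructuredReport], today: date) -> List[str]:
    """MRNs whose recommended follow-up is past due and not yet performed."""
    overdue = []
    for r in reports:
        if r.follow_up_months is None or r.follow_up_done:
            continue
        # Approximate each month as 30 days when computing the due date
        due = r.exam_date + timedelta(days=30 * r.follow_up_months)
        if due <= today:
            overdue.append(r.mrn)
    return overdue

sample = [
    StructuredReport("A", date(2024, 1, 1), 3, False),  # due 2024-03-31, not done
    StructuredReport("B", date(2024, 3, 1), 6, False),  # not yet due
    StructuredReport("C", date(2024, 1, 1), None, False),
    StructuredReport("D", date(2024, 1, 1), 3, True),
]
print(overdue_follow_ups(sample, date(2024, 6, 1)))  # ['A']
```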

Say When

Speech and voice recognition companies are currently releasing and developing systems that translate voice and text into structured data that can be shared with the EMR. This is a first step toward widespread use of speech recognition.

Currently, these technologies have thrived within specialty fields, such as radiology. "The biggest success for these products has been in the world of radiology. We have a very easy-to-grasp use case. We have an apparently easy vocabulary. And we're a contained environment with a limited number of users," said Mendelson.

Expanding the use of speech or voice recognition beyond smaller user groups may present challenges related to larger vocabularies, more users, a greater number of workstations, varying skill levels, and interfacing. "Let's say we have 75 potential users in the radiology department. We probably have 1,500 potential users of our EMR," Mendelson said.

Yet, the integration of speech recognition for EMR users is likely inevitable, in part because without it, the value of the technology will wane. However, it may take another decade to see this application become mainstream.

Voice recognition technology has only recently become prevalent in radiology. Many potential users, such as the radiologists of Mount Sinai, waited for evidence of success stories. In 2001, the medical center signed an agreement for a PACS that included speech recognition, but it did not take advantage of the speech recognition component until 3 years later.

"Given that transcription is such a key part of the radiology business, many people sat on the sidelines to wait for these solutions to become more refined rather than put in place a system that might jeopardize their ability to deliver service," Mendelson said.

Many radiologists no longer feel this way. Willis feels the field can be described as having fully adopted speech recognition technology. "In the disciplines, like radiology for instance, where turnaround time is a factor or macroeconomics come to bear, it is universally accepted: 'we're doing it, we've done it, we've been doing it for a while,' or 'we are going to have to do it and we're doing an ROI study, an RFP, or something along those lines to bring it in-house,'" Willis said.

Meanwhile, other disciplines, such as family practice and emergency care, are just beginning to explore the technology. "As far as the rest of health care, I think we're seeing the early adopters starting to grab hold of the technology," said Willis. These new users, however, will not have to start slow.

Renee Diiulio is a contributing writer for Axis Imaging News.

FINDING A VOICE

There are a number of speech and voice recognition systems available to radiologists. Some of the more popular programs on the market include:

Dragon Medical (www.nuance.com) by Nuance Communications Inc. Program vocabularies have been created for nearly 80 specialties and subspecialties, including radiology. Physicians are able to dictate in real time into the EMR and make reports instantly available. Associated systems, such as RadCube, RadPort, and RadWhere, help tailor a system's capabilities to the facility's specific needs, including data mining, multiple system integration, and smaller environments.

SpeechMagic (www.speechmagic.com), a speech recognition solution developed by Philips Speech Recognition Systems and acquired by Nuance, is the speech engine behind GE Centricity, the RIS/PACS system offered by GE Healthcare, for a Citrix Application Delivery Infrastructure. The program captures dictated information and generates formatted and structured data. Mount Sinai has implemented this system successfully for transcription, but has not yet taken full advantage of data mining capabilities available through other Nuance systems, such as RadCube.

SpeechQ for Radiology (www.medquist.com) is a voice recognition system offered by MedQuist. It has been designed to integrate seamlessly with most leading RIS and PACS. It is able to improve recognition with continued individual use, format documents, and prepopulate information. The workflow can either eliminate transcription altogether or add a review step. Greensboro Radiology has used its MedQuist systems for dictation, report generation, and data mining successfully.

M*Modal's Speech Understanding (www.mmodal.com) technology translates physician dictation in real time into a searchable, structured document. The system learns by adapting to edits and can handle a variety of dictation styles. The service-oriented architecture is highly configurable and uses platform-independent, thin-client, user-interface components to integrate the use of voice recognition into workflow.