Speech Recognition/Health Care

This learning resource is about speech recognition in health care

Learning Tasks

(X-Ray Report) In this learning task we discuss the job of a medical doctor that analyses several X-ray images of patients and dictates a summary about the diagnosis. A medical documentation or report about an x-ray image has a special vocabulary. How would you limit the vocabulary to have high accuracy in the dictation result without the need of too much manual correction?

Medical Documentation and Reports

In the health care sector, speech recognition can be implemented in front-end or back-end of the medical documentation process. Front-end speech recognition is where the provider dictates into a speech-recognition engine, the recognized words are displayed as they are spoken, and the dictator is responsible for editing and signing off on the document. Back-end or deferred speech recognition is where the provider dictates into a digital dictation system, the voice is routed through a speech-recognition machine and the recognized draft document is routed along with the original voice file to the editor, where the draft is edited and report finalized. Deferred speech recognition is widely used in the industry currently.

One of the major issues relating to the use of speech recognition in healthcare is that the American Recovery and Reinvestment Act of 2009 (ARRA) provides for substantial financial benefits to physicians who utilize an EMR according to "Meaningful Use" standards. These standards require that a substantial amount of data be maintained by the EMR (now more commonly referred to as an Electronic Health Record or EHR). The use of speech recognition is more naturally suited to the generation of narrative text, as part of a radiology/pathology interpretation, progress note or discharge summary: the ergonomic gains of using speech recognition to enter structured discrete data (e.g., numeric values or codes from a list or a controlled vocabulary) are relatively minimal for people who are sighted and who can operate a keyboard and mouse.

A more significant issue is that most EHRs have not been expressly tailored to take advantage of voice-recognition capabilities. A large part of the clinician's interaction with the EHR involves navigation through the user interface using menus, and tab/button clicks, and is heavily dependent on keyboard and mouse: voice-based navigation provides only modest ergonomic benefits. By contrast, many highly customized systems for radiology or pathology dictation implement voice "macros", where the use of certain phrases – e.g., "normal report", will automatically fill in a large number of default values and/or generate boilerplate, which will vary with the type of the exam – e.g., a chest X-ray vs. a gastrointestinal contrast series for a radiology system.

As an alternative to this navigation by hand, cascaded use of speech recognition and information extraction has been studied^[1] as a way to fill out a handover form for clinical proofing and sign-off. The results are encouraging, and the paper also opens data, together with the related performance benchmarks and some processing software, to the research and development community for studying clinical documentation and language-processing.

Therapeutic use

Prolonged use of speech recognition software in conjunction with word processors has shown benefits to short-term-memory restrengthening in brain AVM patients who have been treated with resection. Further research needs to be conducted to determine cognitive benefits for individuals whose AVMs have been treated using radiologic techniques.[citation needed]

↑ Suominen, Hanna; Zhou, Liyuan; Hanlen, Leif; Ferraro, Gabriela (2015). "Benchmarking Clinical Speech Recognition and Information Extraction: New Data, Methods, and Evaluations". JMIR Medical Informatics 3 (2): e19. doi:10.2196/medinform.4321. PMID 25917752. PMC 4427705. //www.ncbi.nlm.nih.gov/pmc/articles/PMC4427705/.

[1] Suominen, Hanna; Zhou, Liyuan; Hanlen, Leif; Ferraro, Gabriela (2015). "Benchmarking Clinical Speech Recognition and Information Extraction: New Data, Methods, and Evaluations". JMIR Medical Informatics 3 (2): e19. doi:10.2196/medinform.4321. PMID 25917752. PMC 4427705. //www.ncbi.nlm.nih.gov/pmc/articles/PMC4427705/.

[1]