An overview of software programs that transcribe recorded dictations.
What is speech recognition software:
Speech recognition software, also referred to as voice recognition software, is an application that converts speech to computer text.
Types of speech recognition applications:
1) Command systems: recognize only a few hundred words.
2) Discrete speech: used for dictation, but requires a pause between words.
3) Continuous speech: allows for continuous (conversational) speech.
2005 Market Overview:
Today, the most popular continuous speech engine is Dragon NaturallySpeaking, owned by Nuance (previously ScanSoft). IBM's ViaVoice desktop dictation products were once a market contender; however, in 2003 IBM granted distribution rights to Nuance, its largest competitor. Nuance now distributes, sells, and manages the ViaVoice dictation products along with its Dragon NaturallySpeaking line.
Microsoft released a speech recognition feature as part of its Microsoft Office suite in 2003. The program is somewhat simplistic compared with the dedicated dictation products. However, the technology is worth watching, since Microsoft could at any time dedicate resources to its development and corner the market.
The Basics:
The desktop dictation products mentioned above are "speaker dependent". This means each user must complete an "enrollment" before using the software. Enrollment, more commonly referred to as training, takes only a few minutes but is an important part of the process. The user is asked to read a selected article into the system, and based on that reading the system creates a user profile for the speaker. Once the profile is complete, the user can start dictating or transcribing recorded memos/dictations.
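The enrollment workflow described above can be sketched in a few lines of Python. This is a toy model only, not the API of Dragon NaturallySpeaking, ViaVoice, or any other product: the names `UserProfile`, `SpeakerDependentEngine`, `enroll`, and `transcribe` are hypothetical, and the "transcript" is a placeholder string rather than real recognition output. It shows only the ordering constraint: no profile, no transcription.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical per-speaker profile built during enrollment."""
    speaker: str
    input_device: str
    training_samples: list = field(default_factory=list)


class SpeakerDependentEngine:
    """Toy stand-in for a speaker-dependent dictation engine (not a real API)."""

    def __init__(self):
        # One profile per enrolled speaker, keyed by speaker name.
        self.profiles = {}

    def enroll(self, speaker, input_device, training_text):
        """Store the training passage and create a profile for this speaker."""
        if not training_text.strip():
            raise ValueError("enrollment requires a non-empty training passage")
        profile = UserProfile(speaker, input_device, [training_text])
        self.profiles[speaker] = profile
        return profile

    def transcribe(self, speaker, audio_label):
        """Refuse to transcribe until the speaker has completed enrollment."""
        if speaker not in self.profiles:
            raise LookupError(f"no profile for {speaker!r}; run enrollment first")
        device = self.profiles[speaker].input_device
        # Placeholder: a real engine would decode the audio here.
        return f"[transcript of {audio_label} for {speaker} via {device}]"


if __name__ == "__main__":
    engine = SpeakerDependentEngine()
    engine.enroll("alice", "headset microphone", "Text of the selected article.")
    print(engine.transcribe("alice", "memo-01"))
```

Keeping the profile keyed by speaker reflects the point that the software is trained for one voice at a time; a second user would need a separate enrollment.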
When creating a user profile, the user can choose among several input options, for example a headset microphone, an array microphone, or a digital voice recorder. While the MS Speech engine does not support mobile devices, Dragon NaturallySpeaking and ViaVoice offer integration with mobile devices and digital voice recorders.
Note: The software can only recognize one speaker at a time. It does not support the transcription of multiple speakers, for example, recorded conversations.