Sunday, June 12, 2005

Microsoft speech recognition

I’ve been experimenting with speech recognition technology that’s built into Microsoft Office. The technology has definitely improved since the last time I tried this, which was about four years ago. Having said that, it will probably be another four years before speech recognition is actually a productivity tool rather than a novelty. Still, all things considered, the technology has made great strides. Not bad for a technology that more than one of my professors declared flat out impossible when I was an undergraduate.

After dictating the two paragraphs above, I had to make half a dozen corrections with the keyboard. Even so, it might still have been faster than typing it in by hand. The software is built into Microsoft Word, so all I needed was a microphone. I bought a refurbished Logitech model that plugs right into one of my keyboard's USB ports. The microphone is great; the software is a work in progress.


This is day two of my experiment, and Microsoft cautions me to expect about 85 percent accuracy at this stage. With practice and some more training, I’m told that accuracy can improve to 95 percent. Microsoft offers you a choice of training materials, including excerpts from H. G. Wells’ science fiction classic The War of the Worlds, as well as the foreword from the less-than-classic The Road Ahead by Bill Gates. A little case of corporate brown-nosing, I guess.

I’ve only tried to type Chinese once or twice before giving up in frustration. The general process involves typing words in using a phonetic alphabet. Once you have spelled a complete sound, the computer displays a menu with all the characters that sound just like it. This menu usually has five or six or a dozen characters. So you can imagine how tedious typing in Chinese is. Assuming speech recognition continues to improve, I can see a scenario in which it becomes the primary way Chinese speakers interact with computers.
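For the programmers in the audience, here is a rough sketch of that menu step in Python. It is only an illustration of the idea, not how any real input method is built: the two-syllable table is hand-picked by me, and the names CANDIDATES, candidate_menu, and pick are made up for this example. A real input method would ship a full dictionary and rank the characters by how often they are used.

```python
# Tiny hand-picked table for illustration only; a real input method
# uses a full dictionary with frequency data, not two syllables.
CANDIDATES = {
    "ma": ["妈", "麻", "马", "骂", "吗"],
    "shi": ["是", "十", "时", "事", "市", "石"],
}


def candidate_menu(syllable):
    """Return the numbered menu lines shown after a syllable is typed."""
    chars = CANDIDATES.get(syllable.lower(), [])
    return [f"{i}. {ch}" for i, ch in enumerate(chars, start=1)]


def pick(syllable, choice):
    """Return the character selected by its 1-based menu number."""
    chars = CANDIDATES.get(syllable.lower(), [])
    if not 1 <= choice <= len(chars):
        raise ValueError("no such menu entry")
    return chars[choice - 1]


if __name__ == "__main__":
    # Type "ma", read the menu, then choose entry 3.
    for line in candidate_menu("ma"):
        print(line)
    print(pick("ma", 3))
```

Every single character costs a spelling step plus a menu choice, which is where the tedium comes from.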

The promise of making computers dramatically easier to use for the world's most populous country is one of the reasons Microsoft has set up shop in Beijing to attack this opportunity.

My parents seemed very excited by the demonstration I showed them this afternoon, though my mom seemed more excited by what she thought it meant: that she could speak in Chinese and the computer would translate into English. I suppose that's in the realm of possibility, but it's probably another five years out, after they get the voice recognition part working first.

Bottom line: if you want a cheap way to see how far speech recognition has progressed, get yourself a microphone, fire up Microsoft Word, and click on Speech in the Tools menu. You don't have to be a cognitive science groupie like me to appreciate the technology. And the program's attempts to understand what you're saying are worth a few laughs, reminiscent of the Apple Newton and Go PenPoint days.

By the way, Microsoft has added the speech recognition capabilities in a way that other applications can use them. For example, I am currently dictating this text into Firefox so I can publish it here on my blog.