I was recently asked where I see speech recognition heading in the near future. In my mind, the two directions I’m seeing have to do with who is using it and why they are using it.
On one hand, I see speech recognition becoming a critical player in the multi-modal market. This is the market dominated by tech-savvy, young customers who are used to being always online, anywhere, using various devices and not only cell phones. As a result, they bring to the phone the same expectations they have from their online experience. This can be observed in the amount of activity taking place from a Web 2.0 and social media perspective, where users are given the choice of their preferred input method, whatever it might be (text, audio, video, etc.), while being offered innovative applications of the technology as well as well-crafted “Mashup” applications and services.
On the other, I see speech recognition evolving from a server architecture towards a networked model in which the cell phone simply becomes the equivalent of a ‘web browser’ that gives users access to a whole suite of services that live in the network cloud. This can be observed in the amount of consolidation taking place in our industry (Nuance and Bevocal, Microsoft and Tellme), in the integration of speech recognition into the platform or the operating system itself (as seen in the recent Nuance competition and Microsoft’s Vista), and in the new offerings from startups such as Mobeus, which will soon offer speech-to-text capabilities for cell phones.
As explained in this video, Mobeus follows the network-based model in the sense that the cell phone contains a small ‘client’ that performs the voice capture (end-pointing, compression, etc.). The utterance is then processed on a network of powerful servers, which return the results to the client. This is a similar play to what AT&T has been offering its wireless subscribers as the #121 service (aka “Voice Info”), which is basically a shortcut into Tellme’s services.
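For readers who like to see the moving parts, here is a rough Python sketch of that thin-client split. It is only an illustration of the general idea, not how Mobeus or Tellme actually implement it: the function names, the 8 kHz sample rate, the energy threshold, and the use of zlib as a stand-in for a real speech codec are all my own assumptions, and the "server" is a placeholder that does no real recognition.

```python
import math
import struct
import zlib

FRAME = 160  # samples per frame: 20 ms at an assumed 8 kHz sample rate


def endpoint(samples, threshold=500.0):
    """Trim leading and trailing frames whose RMS energy is below threshold.

    This is a deliberately naive end-pointer; the threshold is an assumption.
    """
    frames = [samples[i:i + FRAME] for i in range(0, len(samples), FRAME)]
    voiced = [
        i for i, frame in enumerate(frames)
        if math.sqrt(sum(s * s for s in frame) / len(frame)) >= threshold
    ]
    if not voiced:
        return []
    start = voiced[0] * FRAME
    end = (voiced[-1] + 1) * FRAME
    return samples[start:end]


def client_capture(samples):
    """Client side: end-point the audio, pack it as 16-bit PCM, compress it."""
    speech = endpoint(samples)
    raw = struct.pack(f"<{len(speech)}h", *speech)
    return zlib.compress(raw)  # zlib stands in for a real speech codec


def server_recognize(payload):
    """Server side: decompress the payload and return a result to the client.

    Hypothetical placeholder: a real server would run a recognizer here.
    """
    raw = zlib.decompress(payload)
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw)
    return {"num_samples": len(samples), "text": "<recognizer output goes here>"}
```

The point of the split is that the phone only does the cheap work (finding where the speech starts and stops, and shrinking the payload), while everything expensive happens on the network side.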
What do you think? Are you seeing other trends in our industry? Which of these models do you think will ultimately survive?