Mobile search has been identified as one of those applications where speech recognition could become the killer app. Speech recognition has been integrated with mobile devices in many ways: some products run the recognition embedded on the device, others perform it on “the network” (a remote server farm) which then returns the results to the device, and others rely on real human beings transcribing the contents of the request so it can be processed accordingly.

Then, of course, comes the search itself. Some services, such as Google and Yahoo, provide you with a list of links to Web pages. Others, like ChaCha, use humans to find the answers for you and then send you the response via text (and yes, you can become a “guide” for them). Still others attempt to integrate other features and capabilities of the device, such as GPS and maps, or trigger subsequent actions on other services, such as changing your status on Facebook or Twitter (as is the case with Vlingo).

Now, if someone could simply find a way to use voice to find out where I left my phone, or my remote, or the car keys…

Just when you thought that having a single phone number that rings all your phones, coupled with a central voicemail inbox accessible from the web and the ability to screen calls by listening in live as callers leave a voicemail, all for free, couldn’t get any better, Google does it again.

That’s right. Google is revamping their GrandCentral system (which we’ve talked about before) and changing its name to Google Voice.

Aside from the privacy concerns that have been popping up everywhere in the blogosphere, the system and its features have received rave reviews and praise for the enhancements added to the platform. The one I like the best? Transcription of voicemail into text, of course! That way you can read your voicemails at leisure, copy and paste them, search them for specific terms, etc. Details about all the features are available here.

The official Google blog has more details about it and Mr. Pogue has a great video showing it in action. Even though it’s currently only available to existing GrandCentral customers (what can I say, I got lucky ;) ), you can still request an invitation for when Google Voice becomes available to the public sometime next week.

Microsoft Recite

We’ve talked in the past about the use of speech recognition in the realm of note taking, where tools such as Jott allow you to obtain a text version of a voice message, making it easier to document and search for information.

Well, Microsoft just recently unveiled a new application of speech recognition, but this time with a twist. Microsoft Recite (available as a preview which can be downloaded) allows anyone using a Windows Mobile phone to record a voice message or “remembrance”, store it, and then retrieve it later using speech pattern recognition.

The obvious advantage of pattern recognition compared to other types of speech searches is that the message itself doesn’t have to be decoded, transcribed or converted.  It simply uses a “search” sample as a pattern to match one or more of the words against existing “remembrances”.
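To make the idea of matching a spoken “search” sample against stored recordings more concrete, here is a minimal sketch using dynamic time warping (DTW), a classic technique for comparing two utterances spoken at different speeds. To be clear, this is my own illustration: I don’t know which algorithm Recite actually uses, and the function names and the one-number-per-frame “features” below are made up (a real system would use acoustic feature vectors and would match parts of a recording, not only the whole thing).

```python
from math import inf


def dtw_distance(query, stored):
    """Dynamic time warping cost between two sequences of per-frame values."""
    n, m = len(query), len(stored)
    # cost[i][j] = cheapest alignment of the first i query frames
    # with the first j stored frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(query[i - 1] - stored[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # query frame repeats a stored frame
                cost[i][j - 1],      # stored frame repeats a query frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def best_match(query, remembrances):
    """Name of the stored recording that aligns most cheaply with the query."""
    return min(remembrances, key=lambda name: dtw_distance(query, remembrances[name]))


# Toy data (hypothetical): one number per audio frame.
remembrances = {
    "parking spot": [1.0, 3.0, 4.0, 3.0, 1.0],
    "door code": [5.0, 5.0, 2.0, 0.0, 0.0],
}
# Same "shape" as the parking-spot recording, just spoken more slowly.
query = [1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0]
print(best_match(query, remembrances))
```

Notice that nothing here is ever turned into text: the slower query still lines up with the right recording because the alignment is allowed to stretch.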

Even though initial tests have received positive feedback, I’m hoping they’ll expand the tool to include other devices and languages (it currently only works with US English).

The proverbial dilemma: user-centered design calls for simplicity, yet business requirements often conflict with that principle. The result? Watch, remember, and cry…

Enjoy!

Smashing Magazine just published an article about Useful Techniques for Good UI Design. Even though it concentrates on Web applications, I felt a few of the techniques could be reviewed with an eye towards VUI design:

  1. Highlight important changes
     And by this I don’t mean “Please listen carefully, for our menu options have changed”. Here we’re talking about system status visibility: knowing where you are and that your actions led to the expected results. Implicit confirmations are a good example of this.

  2. Enable shortcuts in your application
     The key concept here is to offer users more responsive user interfaces. In the case of the phone, this can be compared to having “touch-tone equivalents” which allow expert users to breeze through applications without having to wait to be prompted for input. But I would also argue that other things fall under this same principle: good command selection, “shadow” options that are not prompted for but can be supported if tuning data backs up the decision, and even “parallel” choices which allow someone who picked one branch to swiftly change “branches” without having to “go back” or return to a “main menu”.

  3. “Upgrade” options
     I feel this would be similar to having functionality available for “expert users”. Proper use of barge-in as well as multi-slot support allows someone to interact with a system in more “advanced” ways while keeping such a transition simple and intuitive.

  4. Advertise features of the application
     And no, sending out “user manuals” for applications does not qualify for this. Here we would be talking about pointing out “new” features or even “advanced” features that are available. As with most non-task-oriented information, this should be presented only after the user has successfully completed their task.

  5. Use “color-coded” lists
     Items that run together are an issue, no matter what type of interface we’re talking about. Therefore it is really important to pay attention to the different tools at our disposal to avoid this issue: voice intonation/emphasis (such as making menu options clearer), word choice (avoiding confusion amongst different choices), silences (natural pauses to allow mental processing of the information), etc.

  6. Offer personalization options
     I’m sorry to say these have been out there for quite a few years for most interfaces, yet voice user interfaces hardly ever offer them. It may be harder to do than with visual interfaces (where you can change colors, rearrange items, etc.), yet small things like offering only relevant choices based on someone’s profile or remembering the last tasks they’ve performed go a long way in making users feel like they “own” the system.

  7. Display help messages
     Help is definitely one of those topics that has been debated a lot, and like most controversial ones, whether or not it should be used and how it should be used can be summarized in two words: “it depends”. Personally I feel help should not be a stand-alone piece of a design (whether it is a tutorial or a separate collection of help prompts) but rather an integral part of design – commands offered should be helpful to a user, error messages should add relevant information to help users recover, transition messages should help users know where they are and where they are going, etc.

  8. Design feedback messages carefully
     Here we can think about transition messages (exit prompts) and error recovery prompts. There are various things that can be done to make them more effective: control voice intonation so as to convey the right meaning, and reword questions and statements so as to convey the same meaning without simply “reprompting” users (if the problem was that users didn’t understand you in the first place, simply repeating yourself is not going to help at all).

  9. Use tabbed navigation
     This is one of those areas where visual design has an advantage over non-visual design. Nevertheless, good information architecture can help obtain similar results, where applications reflect users’ mental models, making it easier for them to know what choices mean and how the system works. This puts users at ease since they feel more in control.

  10. Darken background
      Conversely, here non-visual design has the advantage since all interactions happen in sequence. No need to “gray out” parts of our design since by the time we move on to a new “task”, all previous information belongs to the past.

  11. Lightboxes and Slideshows
      The concept here is to allow users to navigate back and forth. In the visual world there are a lot of tricks to reduce the noise and allow a user to focus on a particular item. In the case of non-visual design, there are ways to do similar things when the system has been properly architected, which should include consideration of “go back” behavior that contextually matches a user’s mental model, as well as proper “sub task” definitions so that if users need to change or update a certain piece of information, they don’t have to start all over again or get confused about how to do it.

  12. Short sign-up forms
      This is another one where non-visual design has the opportunity to leverage information. The whole notion here is to minimize the effort needed to identify a user and to speed up the process. On one hand, this means taking advantage of what technology has to offer: ANI identification, DNIS separation (ideal for language selection), CTI (please, please, please don’t lose everything that the user has accomplished so far). On the other, there are design techniques such as the removal of “optional” elements if that allows a user to proceed (and especially if that avoids confusion and distraction from the user’s real goal).
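For the last technique in the list, here is a rough sketch of what “taking advantage of what technology has to offer” might look like at the start of a call: pick the language from the number that was dialed (DNIS), recognize the caller from the number they are calling from (ANI), and keep everything collected so far so it can travel with the call on a transfer (CTI). Every name, table, and phone number below is hypothetical; a real platform would expose this data through its own API and a customer database.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lookup tables standing in for real platform and customer data.
DNIS_LANGUAGE = {"8005550100": "en-US", "8005550101": "es-US"}
KNOWN_CALLERS = {"2125550123": {"name": "Pat", "last_task": "check balance"}}


@dataclass
class CallSession:
    language: str
    caller: Optional[dict]
    collected: dict = field(default_factory=dict)  # answers gathered so far


def start_session(ani: str, dnis: str) -> CallSession:
    """Skip the questions we can already answer from the call itself."""
    language = DNIS_LANGUAGE.get(dnis, "en-US")  # no "for English, press 1"
    caller = KNOWN_CALLERS.get(ani)              # None means: ask who is calling
    return CallSession(language=language, caller=caller)


def transfer_payload(session: CallSession) -> dict:
    """Data to hand to the agent desktop so the caller never repeats themselves."""
    return {
        "language": session.language,
        "caller": session.caller,
        "collected": dict(session.collected),
    }


session = start_session(ani="2125550123", dnis="8005550101")
session.collected["account"] = "1234"
print(transfer_payload(session))
```

The point is less the code than the order of operations: use what the call already tells you before asking the caller anything, and never throw away what they have already told you.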

With more people spending more time watching video on their computers than on TV, it makes sense to have an XUI – A channel for User Interface Design. Enjoy!

It seems that after all the criticism Amazon received on the user interface of its original Kindle, they not only addressed some of the concerns but also took some of the suggestions, which are now part of the second version of the device.

Some of those suggestions included adding text-to-speech capabilities to the Kindle 2.0, making it indeed an e-book “reader”. I think this is a magnificent idea because it not only addresses how devices should evolve to support people with disabilities but also gives control back to the users on how to best interact with the information.

We’ve heard about the global phenomenon of an aging population, yet there is very little talk about what design strategies should be used for it. Other than the classical stereotypes (louder volume, slower pace yet not condescending, pitch within a certain range, more information, etc.), the impact this tech-savvy population will have on how interfaces are designed is still vague and uncertain.

Quick fact: currently, the number of people under the age of 5 exceeds the number above the age of 65. It is estimated that within the next 8 years, that trend will reverse. By 2050? People above 65 will number twice as many as those under 5.

Source: U.N. Department of Economic and Social Affairs, Population Division

So, if we were to consider all the things that will change around us to accommodate those users – from healthcare products and services to how stores are arranged and houses are built – it should be easy to realize the wide range of possibilities and opportunities to improve interactions – from how information is provided to what types and levels of service are expected from both automated self-service solutions and live human beings.

Maybe getting serious about the topic and continuing research will finally take over the assumptions and stereotypes that so often appear in most user designs…

My apologies for being away for so long. Between the Holidays and the January blues, I’ve been crazy busy (which considering the current situation is something I’m thankful for).

Some of those projects have brought in new pieces of information which I’m looking forward to sharing and discussing – multimodal design and usability, speech recognition in automobiles, dialog design for senior citizens, new trends in international design, etc.

I know we’re used to associating the notion of IVRs with arcane self-service over-the-phone systems and IVR jails, yet a company called Moshi found a way to leverage the notion of “Interactive Voice Response” in a totally distinctive way.

The Moshi IVR Alarm Clock is the first one to my knowledge that allows you to set the time and the alarm by using your voice. To start interacting with it, you simply say “Hello Moshi” and the clock responds with “Command Please” (I know, a little VUI help never hurt anyone). It currently supports a list of 12 commands including things such as “time”, “set alarm”, “temperature” and “help” (apparently “help” still has its uses).

A demo is currently available at the Moshi website which shows how the clock responds to various commands, and Engadget has some more details about it. Personally, I think it’s pretty cool, plus the price is not bad either ($50). But from a design perspective, I think it’s just a shame they didn’t invest a little bit more in better sounding prompts (with a professional voice talent), which combined with the use of more natural, concatenated prompting, would’ve yielded much better results (let’s face it, anyone still concatenating time in the form of “six” “o’clock” “a m” is being a lousy designer).
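Since I’m complaining about “six” “o’clock” “a m”, here is a small sketch of the kind of wording I have in mind instead: phrases that sound like something a person would actually say, which can then be recorded as larger, natural-sounding units. The wording rules and function names are just my own assumptions for illustration (the hour boundaries for “afternoon”, “evening”, and “night” are arbitrary), not anything Moshi does.

```python
# Number words up to nineteen; index 0 is unused.
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def minute_words(minute):
    """Spoken form of minutes 1-59, e.g. 5 -> 'oh five', 45 -> 'forty-five'."""
    if minute < 10:
        return "oh " + ONES[minute]
    if minute < 20:
        return ONES[minute]
    tens, ones = divmod(minute, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def day_part(hour):
    """Conversational replacement for 'a m' / 'p m' (boundaries are a choice)."""
    if hour < 12:
        return "in the morning"
    if hour < 17:
        return "in the afternoon"
    if hour < 21:
        return "in the evening"
    return "at night"


def time_prompt(hour, minute):
    """Natural wording for a 24-hour time."""
    if (hour, minute) == (0, 0):
        return "midnight"
    if (hour, minute) == (12, 0):
        return "noon"
    hour_word = ONES[hour % 12 or 12]
    if minute == 0:
        return f"{hour_word} {day_part(hour)}"
    return f"{hour_word} {minute_words(minute)} {day_part(hour)}"


print(time_prompt(6, 0))
print(time_prompt(18, 30))
```

The first call gives “six in the morning” and the second “six thirty in the evening”, which already sounds a lot less robotic once a voice talent records the day-part phrases with the right intonation.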