Category Archives: Multimodality

Design and Business – The skills you’ll need in the future

I recently ran across the fantastic short film “Design the New Business”, which you can watch online. It talks about the trials and tribulations designers and business people go through when they work together in new ways to solve the challenging problems facing businesses today (sound familiar?). In particular, I really appreciate that it covers businesses and designers outside the US (the Netherlands, Germany, Switzerland, Spain, Australia and the UK), which in my opinion gives a much broader perspective on the state of design nowadays.

A few key points that struck a chord with me:

1) Fail to Learn

The days of designing in a vacuum are gone. We need to experiment and see what works and what doesn’t as early as possible so we can switch and adapt. We as designers have to change and find new ways to do things. That’s the only way we’re going to be able to create value for people, and for our partners and customers. We also need to realize that design isn’t linear (which becomes pretty obvious in multimodal projects), meaning solutions nowadays have to consider different users, with various needs and lifestyles, as well as the external systems we all need to interact with.

2) Skills for the Future

What skills will we need in this new world? How do we stay relevant and demonstrate the value of design to businesses? The fact is that aesthetics are no longer enough. Any company that only worries about redesigns, improvements and optimizations is not going to survive for very long. Aesthetics are just one outcome of the design process, and pure observations without interpretation are useless. Because design is as much about numbers as it is about users, social insights and understanding should feed the design so that it concentrates on creating something new: something that satisfies the needs and desires of users and yields an engaging experience that fulfills their expectations, while at the same time having a positive impact on our customers’ bottom lines.

3) The Era of Service Design and Innovation

Prices and Products are easily imitated, so I guess the key form of differentiation for us [Virgin] moving forward will be experience innovation.

Applying proven solutions from the past to new problems doesn’t work anymore because the problems are now different (please read as: “VUI design solutions we’ve implemented in the past might not work anymore!!!”). The solution that allows you to think differently is Design. One interesting way of thinking mentioned in the book is that during design we should start by exploring multiple solutions and problems, and that the last thing we should define is the problem we’re trying to solve, to make sure we identify the right one to pursue. On this point, I feel that we very often rush to identify the problem with the [fill in the blank – requirement, prompt, grammar, text, code, etc.] instead of taking a step back and exploring the universe of solutions and problems.

And of course, design cannot happen in isolation anymore; designers now have to work with cross-functional teams that take lots of variables into account, until the design reaches a good level of maturity and is ready to go out into the world.

Finally, here are some links for further reading based on some of the things discussed throughout the movie:

Enjoy, and I would love to hear your thoughts on these ideas, your experiences, etc.

SpeechTEK – Multimodal Interaction Design Slides

I just realized that for some reason the digital handout for my presentation isn’t available on SpeechTEK’s site.
While I sort that out, I thought about proactively posting the deck for anyone wanting to download a copy.

The session is entitled “Lessons in Multimodal Interaction Design”, and particularly, the topic I’m going to cover is “The Coexistence of IVRs and Small Screens”. If you’re attending SpeechTEK, I would love to have you join us tomorrow, August 3rd, at session D203 from 1:45 pm – 2:30 pm.

See you there!

Speech and Mobile Usability

A very interesting report from Nielsen was recently published highlighting some of the challenges mobile users face when accessing web information.

Aside from the sad news about average success rates being around 59%, it was interesting to me to see how most of the mobile problems outlined in the report can actually be seen as opportunities to seriously consider the use of Speech Recognition.

I know most companies suggest Speech Recognition as the killer app for mobile devices, but I would argue that it should be seen instead as the ideal complementary mode of interaction when navigating the internet and retrieving information on mobile devices, not as the silver bullet that would solve all mobility hurdles.

For example, thinking about speech in the context of those problems raised in the report:

  • Small screens: Yes, small size is a natural result of being portable. Yet having a limited number of options at any given time and relying on short-term memory are the bread and butter of most Speech Recognition systems. Therefore, adding an audible element and allowing users to express themselves in more natural ways helps compensate for those visual limitations. Furthermore, multislot interactions and natural language understanding help alleviate the challenge of multiple windows and advanced behaviors present in purely visual interactions.
  • Awkward input (especially for typing): Once again, Speech Recognition shines here since speech is the de facto way of interacting amongst humans. Words can easily trump visual counterparts such as menus, buttons, and links, not only because of how natural the interaction is but also because speech avoids the inherent limitations of tiny keypads, trackballs and mini-keyboards.
  • Download delays: Even though Speech cannot make screens download faster, it can help in those instances where information can be delivered in an audible form. Prompts and logic can be embedded in the device without requiring network connectivity, or optimized and compressed for faster delivery, so users can continue to interact with the system and move toward their intended goal.

Hello, this is your medication. Have you forgotten about me?

Outbound calling (meaning automated phone calls that go out to specific individuals) is a very profitable business that thrives at times such as this one when companies need to reach more consumers yet want to reduce the costs of making those calls since most of the time they are nothing more than the equivalent of “phone spam”.

Therefore, I’ve never been a big fan of these types of services, except for those situations where I know we’re adding value to the conversation. Those situations where we’re providing a benefit to consumers, particularly in win-win scenarios where both parties benefit from the interaction.

One product/service I recently found out about that does exactly that is GlowCaps Connect. GlowCaps are electronic pill caps that use some very clever means to ensure patients take their medicine at the times and frequency that they should.

So picture this. If you know someone who needs to manage a chronic disease like diabetes or depression, daily medications are essential for their well-being. Every day, at the prescribed time, the GlowCap uses a myriad of modalities to remind users and attract their attention. For example, it may flash a visual reminder, which is followed by sound if the bottle is not opened within the first hour. If the patient still doesn’t open the bottle, the cap triggers a phone call to remind them. It can even send weekly updates to friends and family, as well as reports to the patient’s doctor with a monthly summary of the bottle’s activity.
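To make that escalation concrete, here is a minimal sketch in Python of how such a reminder ladder could be modeled. This is purely illustrative and not how GlowCaps is actually implemented: the names `Reminder` and `reminder_for` are made up for this post, and the two-hour threshold for the phone call is my own assumption (the only timing I know of is that sound kicks in when the bottle stays closed for the first hour).

```python
from enum import Enum


class Reminder(Enum):
    NONE = "none"
    LIGHT = "light"
    SOUND = "sound"
    PHONE_CALL = "phone call"


# Sound follows the light if the bottle is not opened within the first hour.
SOUND_AFTER_MIN = 60
# Assumption for illustration only: the real delay before the phone call is unknown.
CALL_AFTER_MIN = 120


def reminder_for(minutes_since_dose_time: int, bottle_opened: bool) -> Reminder:
    """Pick the reminder modality for a given moment after the prescribed time."""
    if bottle_opened or minutes_since_dose_time < 0:
        # Dose already taken, or it is not time yet: stay quiet.
        return Reminder.NONE
    if minutes_since_dose_time >= CALL_AFTER_MIN:
        return Reminder.PHONE_CALL
    if minutes_since_dose_time >= SOUND_AFTER_MIN:
        return Reminder.SOUND
    return Reminder.LIGHT
```

The point of the ladder is that each step is more intrusive than the last, so the least annoying modality always gets the first chance.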

So, to summarize, better prescription handling which can be rewarded with coupons and incentives, better healthcare management with the doctor, and an opportunity for pharmacies to handle automatic refills. Those are the types of calls I wouldn’t mind at dinner time.

Total Recall

I have to admit that in this day and age, even with all the advantages technology provides, when it comes down to keeping track of pending items, errands and to-do items, I tend to stay in the analog world (read: pen and paper).

So I was very excited to read about a new smart phone app which seems to be tackling this problem in a very clever way by integrating the best aspects of disparate technologies such as Post-It notes, email, calendars, and voice for free! (with a Pro option available too)

It is called reQall, and the way it works is that you call a free number to add “items” such as notes, appointments and memos, using either your voice or text. If you use your voice, the service uses speech recognition software to transcribe your message into text so that, based on your situation (time, location, etc.), the system can remind you of those items via email, SMS, IM or even a “phone shake”.

A very nice feature is that you can also share your account with other family members and friends, so they can enter reminders for you.  Mmm, I wonder why wives love this feature so much…

Voice Recognition and Mobile Search

Mobile search has been identified as one of those applications where Speech Recognition can become the killer app. There are many instances in which speech recognition has been integrated with mobile devices: some perform the recognition embedded on the device, others perform it on “the network” (a remote server farm) which then returns the results to the device, and others rely on real human beings transcribing the contents of the request so it can be processed accordingly.

Then, of course, comes the search itself. Some services, such as Google and Yahoo, provide you with a list of links to Web pages. Others, like ChaCha, use humans to find the answers for you and then send you the response via text (and yes, you can become a “guide” for them). Still others attempt to integrate other features and capabilities of the device, such as GPS and maps, or trigger subsequent actions on other services, such as changing your status on Facebook or Twitter (as is the case with Vlingo).

Now, if someone could simply find a way to use voice to find out where I left my phone, or my remote, or the car keys…

Microsoft Recite

We’ve talked in the past about the use of speech recognition in the realm of note taking, where tools such as Jott allow you to obtain a text version of a voice message, making it easier to document and search for information.

Well, Microsoft just recently unveiled a new application of speech recognition, but this time with a twist. Microsoft Recite (available as a preview which can be downloaded) allows anyone using a Windows Mobile phone to record a voice message or “remembrance”, store it, and then retrieve it later using speech pattern recognition.

The obvious advantage of pattern recognition compared to other types of speech searches is that the message itself doesn’t have to be decoded, transcribed or converted.  It simply uses a “search” sample as a pattern to match one or more of the words against existing “remembrances”.

Even though initial tests have received positive feedback, I’m hoping they’ll expand the tool to include other devices and languages (it currently only works with US English).

Kindle 2.0 – An ebook “reader” in every sense of the word

It seems that after all the criticism Amazon received on the user interface of its original Kindle, they’ve not only addressed some of the concerns but also taken some of the suggestions, which are now part of the second version of the device.

Some of those suggestions included adding text-to-speech capabilities to the Kindle 2.0, making it indeed an e-book “reader”. I think this is a magnificent idea, not only because it addresses how devices should evolve to support people with disabilities but also because it gives control back to the users on how to best interact with the information.

Alternate Reality Games and Android

As I mentioned here and here, one of the most appealing aspects of the G1 phone is the openness of the platform, which allows developers to get really creative when it comes to apps that leverage all the features contained in the phone.

One company worth mentioning is JOYity, which was recently covered by TechCrunch. They are leveraging the GPS capabilities of the phone, allowing users to engage in location-based games (such as YouCatch, Roads of San Francisco and City Race Munich).

The most engaging by far is YouCatch which is an enhanced version of Manhunt. The concept is pretty simple: you and a handful of friends sign up to play the game, and then each one is randomly assigned a target, making everyone both a hunter and a target.
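For the curious, here is one simple way to get “everyone is both a hunter and a target” out of a random draw. I have no idea how JOYity actually does it; this is just a sketch in Python, and `assign_targets` is a name I made up. The trick is to shuffle the players into a random order and have each one hunt the next person in the circle, which guarantees that nobody is assigned to themselves and nobody is left without a hunter.

```python
import random


def assign_targets(players, rng=None):
    """Return a dict mapping each hunter to a target, arranged as one random circle."""
    if len(players) < 2:
        raise ValueError("need at least two players")
    if len(set(players)) != len(players):
        raise ValueError("player names must be unique")
    rng = rng or random.Random()
    order = list(players)
    rng.shuffle(order)
    # Each player hunts the next one in the shuffled order; the last hunts the first.
    return {hunter: order[(i + 1) % len(order)] for i, hunter in enumerate(order)}


if __name__ == "__main__":
    for hunter, target in assign_targets(["Ana", "Ben", "Cy", "Di"]).items():
        print(f"{hunter} hunts {target}")
```

A single circle is only one possible design; it has the nice side effect that when a target is caught, the hunter can simply inherit that player’s target and the game keeps going.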

I hope they add voice features soon, which could allow you to play the game in a less obvious way (running around watching a phone screen kinda gives you away) and maybe even team up with others for the hunt.

Here’s a quick review of the game and the interface:

3 Google Phone Lessons in UI Compromises

As a follow-up to my previous post, it was interesting to read David Pogue’s review of Google’s First Phone, particularly with regard to some of the UI compromises designers had to make on this first iteration of the Android-based phone:

  1. The Menu Button – This feature provides context-relevant options based on the current task. David compares it to the right mouse button: it offers commands like Hold, Mute and Speaker when you’re on a call, and next-step commands such as Archive and Delete once you’ve read an email. This is a great strategy I always like to implement, particularly on Voice User Interfaces, where callers can only be presented with a limited set of choices and there’s a clear set of task-related options that callers would be looking for without having to ‘go back’ to a so-called Main Menu. In my mind, this should be renamed the “Common-Sense Button”.
  2. Two different programs for e-mail – Ouch, this one really hurts. Granted, Gmail has a different mental model and framework than other e-mail programs, but I think this one shows a lack of understanding of what users look for: simplicity and efficiency. We know complexity exists everywhere, but that complexity should be hidden, whenever possible, from the UI and the user interaction. And to add insult to injury, it seems that replying to an email in the non-Gmail program puts your cursor in the To box… I’m just glad they have an open architecture that allows anyone to improve these interfaces :)
  3. (Useless?) Tilt sensor – This has to be the weirdest one of them all. If the phone contains a sensor similar to the one powering the iPhone, why did they not hook it to the screen? The fact that someone is turning the phone 90 degrees should be enough indication of intent, so why put users through the extra step of making a menu selection or pressing a key? This one feels like those menu prompts that first ask you to press 1 for “Arrivals or Departures Information” (which gives intent information, albeit not in an ideal way) and then follow up with an absurd menu asking you to “press 1 for Arrivals or 2 for Departures”.
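The “Common-Sense Button” idea from the first point is easy to sketch in code. The snippet below is a toy illustration in Python, not anything taken from Android: the context names (`in_call`, `email_read`) and the `menu_options` function are my own inventions, and only the option labels come from the review.

```python
from typing import Dict, List

# Options keyed by what the user is doing right now (context names are hypothetical).
CONTEXT_MENU: Dict[str, List[str]] = {
    "in_call": ["Hold", "Mute", "Speaker"],
    "email_read": ["Archive", "Delete"],
}


def menu_options(context: str) -> List[str]:
    """Return the task-related options for the current context, or none if unknown."""
    return list(CONTEXT_MENU.get(context, []))


if __name__ == "__main__":
    print(menu_options("in_call"))
    print(menu_options("email_read"))
```

The same lookup works for a Voice User Interface: the dialog state plays the role of the context, and the short list it returns is what the prompt offers the caller, with no detour through a Main Menu.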