Category Archives: Speech Industry

Speech and IVR industry articles and topics.

Design and Business – The skills you’ll need in the future

I recently ran across the fantastic short film “Design the New Business” which you can watch online. It talks about the trials and tribulations designers and business people go through when they work together in new ways to solve challenging problems facing businesses today (sound familiar?). In particular, I really appreciate the fact that it covers businesses and designers outside the US: The Netherlands, Germany, Switzerland, Spain, Australia, UK, which in my opinion gives a much broader perspective on the state of design nowadays.

A few key points that struck a chord with me:

1) Fail to Learn

The days of designing in a vacuum are gone. We need to experiment – see what works and what doesn’t work as early as possible so we can switch and adapt. We as designers have to change and find new ways to do things. That’s the only way we’re going to be able to create value for people, and for our partners and customers. Also, we need to realize that design isn’t linear (which becomes pretty obvious in multi-modal projects), meaning solutions nowadays have to consider different users, with various needs and lifestyles, as well as the external systems we all need to interact with.

2) Skills for the Future

What skills will we need in this new world? How do we stay relevant and demonstrate the value of design to businesses? The fact is that aesthetics are no longer enough. Any company that just worries about redesigns, improvements and optimizations is not going to survive for very long. Aesthetics are just one outcome of the design process. Pure observations without interpretation are useless. And because it is as much about numbers as it is about users, social insights and understanding should feed the design so that it concentrates on creating something new that satisfies the needs and desires of users and yields an engaging experience that fulfills their expectations, while at the same time having a positive impact on our customers’ bottom lines.

3) The Era of Service Design and Innovation

Prices and Products are easily imitated, so I guess the key form of differentiation for us [Virgin] moving forward will be experience innovation.

Applying proven solutions from the past to new problems doesn’t work anymore because problems are now different (please read as – “VUI design solutions we’ve implemented in the past might not work anymore!!!”). The solution that allows you to think differently is Design. One interesting way of thinking mentioned in the book is that during design, we should start by exploring multiple solutions and problems, and that the last thing we should define is the problem we’re trying to solve, to make sure we identify the right one to pursue. On this point, I feel that we very often rush to identify the problem with the [fill in the blank – requirement, prompt, grammar, text, code, etc.] instead of taking a step back and exploring the universe of solutions and problems.

And of course, design cannot happen in isolation anymore; designers now have to work with cross-functional teams, that take into account lots of variables, until the design reaches a good level of maturity and is ready to go out into the world.

Finally, here are some links for further reading based on some of the things discussed throughout the movie:

Enjoy, and I would love to hear your thoughts on these ideas, your experiences, etc.

SpeechTEK – Multimodal Interaction Design Slides

I just realized that for some reason the digital handout for my presentation isn’t available on SpeechTEK’s site.
While I sort that out, I thought I would proactively post the deck for anyone wanting to download a copy.

The session is entitled “Lessons in Multimodal Interaction Design”, and particularly, the topic I’m going to cover is “The Coexistence of IVRs and Small Screens”. If you’re attending SpeechTEK, I would love to have you join us tomorrow, August 3rd, at session D203 from 1:45 pm – 2:30 pm.

See you there!

Are designers really necessary?

The role of “experience” designers and “user interface” designers has been much harder to justify than that of other design disciplines such as graphic design or industrial design.

For that reason, I find it interesting that the topic of value added by designers has been coming up more and more often, particularly when customers are pretty adamant about designing systems/interactions themselves simply because they “know the business” or have been doing maintenance on an existing system “for a long time”.

Even amongst peers there has been debate recently about whether the industry has been either making systems “hard to build” in an attempt to retain control over those systems and to create dependency (aka. keep the money flowing) or not being as diligent when it comes to educating customers and allowing them to maintain their systems themselves.

In my opinion, there isn’t any sort of industry conspiracy going on, nor do I see designers making things harder than they need to be to justify their jobs or to serve a hidden agenda.

I think part of the problem lies in the fact that our profession isn’t as well defined or as structured as other design professions. In our midst we have linguists, psychologists, engineers, designers, sociologists, cognitive scientists, human factors practitioners, etc. who, even though they share similar goals, can tackle a problem with very distinct approaches, with their own processes and even “vocabulary”, which can explain some of the confusion customers might experience.

I think the other culprit is the current economic environment. Companies might be inclined to pick one technology over another simply based on cost, not on customer experience or interaction capabilities. Furthermore, companies are squeezing their budgets as much as they can while trying to keep more control over their projects.

I’m convinced that if they could design the solutions themselves, they probably would, but the truth is they simply can’t. But they don’t realize they can’t! So that’s where I think designers like us come into play: to help them learn about our design processes and methodologies so that they are confident enough to contribute, which in turn allows designers to obtain very rich feedback from them.

I really liked the way Mark Baskinger explained the differences he sees between industrial designers and interaction designers:

“[Customers] may think they are directing, but really what they are doing is learning, and as a designer we’re interpreting their direction as sort of boundaries, wishes and desires we can operate within to really challenge the opportunity and do some really good design.”

I think that if designers are conscious of this situation and continue to play the role of sounding boards that customers can bounce ideas off of, help plan strategies and guide them through the process, the ones that will benefit the most are the ones that really keep us all in business: our users.

See you at SpeechTEK 2009

Oh yeah, it’s that time of the year again. If you’re planning to attend this year’s SpeechTEK in New York, please stop by and say hi.

Also, you can now look at the final version of the program. In particular, I would like to invite you to the following sessions:

  1. Introduction to Voice User Interface Design (STKU-2)

    Sunday August 23rd, from 1:30 PM to 5:00 PM. This workshop is designed to quickly get those new to VUI design up to speed so they can make the most of the Principles of VUI Design track at the conference.

  2. Efficient Design (B102)

    Monday August 24th, from 11:15 AM to 12:00 PM. Here we’ll talk about “Truths and Myths About Reusable Designs”. How can you design for reuse? Can user requirements be captured the standard way?

  3. Bilingual Spanish/English Design (B301)

    Wednesday August 26th, from 10:45 AM to 11:30 AM. Here we’ll talk about “How to Present Names of Geographical Locations in Spanish Systems”. Yes, listening for and capturing names of places seems like a trivial task, but what factors should be considered when making translation/pronunciation decisions? What do those decisions say about you and your company?

Safe travels, and see you there.

Speech and Mobile Usability

A very interesting report from Nielsen was recently published highlighting some of the challenges mobile users face when accessing web information.

Aside from the sad news about average success rates being around 59%, it was interesting to me to see how most of the Mobile Problems outlined in the report can actually be seen as opportunities to seriously consider the use of Speech Recognition.

I know most companies suggest Speech Recognition as the killer app for mobile devices, but I would argue that it should be seen instead as the ideal complementary mode of interaction when navigating the internet and retrieving information on mobile devices, not as the silver bullet that would solve all mobility hurdles.

For example, thinking about speech in the context of those problems raised in the report:

  • Small screens: Yes, small size is a natural result of being portable. Yet, having a limited number of options at any given time and relying on short-term memory are the bread and butter of most Speech Recognition Systems. Therefore, adding an audible element and allowing users to express themselves in more natural ways helps compensate for those visual limitations. Furthermore, multislot interactions and natural language understanding help alleviate the challenge of multiple windows and advanced behaviors present in purely visual interactions.
  • Awkward input (especially for typing): Once again, Speech Recognition shines here since speech is the de facto way of interaction amongst humans. Words can easily trump visual counterparts such as menus, buttons, and links, not only because of how natural the interactions are but also because speech avoids the inherent limitations of tiny keypads, trackballs and mini-keyboards.
  • Download delays: Even though Speech cannot make screens download faster, it can help wherever information can be delivered in audible form. Users can keep interacting with the system and moving toward their intended goal, since prompts and logic can be embedded in a device without requiring network connectivity, or optimized and compressed for faster delivery.

Google Voice and three truths about testing

A very interesting debate was triggered by the recent tests (Part 1 and Part 2) on Google Voice performed by readers of the Gadgetwise blog.

The overall premise was for readers to call the article writer’s phone number and leave a creative voicemail to gauge the effectiveness of Google’s voicemail transcription system.

Even though at first glance this seems to be another “why speech recognition isn’t ready for prime time” type of article (yes, I know the author claimed they wanted to test the boundaries of the technologies, but as many readers pointed out, individual items such as the president’s name aren’t that far-fetched and should’ve worked), I think it also brings up some interesting issues often faced when testing speech recognition systems:

1) What are we testing? - This one very often depends on who you’re talking to, particularly on the business side of things. Some team members care about containment rates (how many individuals stay in a system without having to talk to an agent), others care about transfer rates (the increase or decrease in the volume of calls going into the call center), and others care about customer experience (how long it takes for someone to accomplish their goal). A few (sorry to say, upper management included) even call systems to see how well they work when given odd statements or commands or, even worse, how closely the system matches their particular expectations (without taking into consideration how the system was designed in the first place). So for me, this is one of the most important aspects of any system, and it should be captured as part of the requirements phase – knowing what project owners want to test allows you to steer your design in the right direction (and push back when necessary as early as possible). For example, in the case of these Google Voice tests, there were some very interesting comments from readers because some felt they were testing the accuracy of the transcriptions, others thought the test should only involve how well the overall intent was being captured, and some others (sadly) thought they were testing how well speech recognition works nowadays.

2) How are we testing it? - This one depends a lot on what the answer to #1 might be. For example, in the case of Google Voice, I felt the test would’ve been much more valid if readers had been asked to forward samples of their own voicemails to the writer’s voicemail (meaning real-world examples) instead of coming up with messages in what seemed to turn into a challenge to see who could break the system the most. Going back to the things business owners normally want to test, the methods we need to test them might vary significantly. For example, to test containment or transfer rates, one should look not only at the raw numbers but also at the reasons behind them – it’s very different if the numbers are driven by users exceeding a failure threshold than if they are due to users pressing 0, or if they are truly due to business requirements whose proper behavior is to retain/transfer the user.
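To make the point about looking behind the raw numbers a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the outcome labels (contained, failure_threshold, pressed_zero, business_rule) and the sample call list are made up, and a real platform would log these differently.

```python
from collections import Counter

# Hypothetical call outcomes; the labels are illustrative only and do not
# come from any real IVR platform or from the Google Voice tests.
calls = [
    "contained", "contained", "contained", "contained",
    "failure_threshold", "failure_threshold",
    "pressed_zero", "pressed_zero", "pressed_zero",
    "business_rule",
]

def transfer_breakdown(outcomes):
    """Return the containment rate and a count of transfer reasons."""
    if not outcomes:
        raise ValueError("at least one call outcome is required")
    transferred = [o for o in outcomes if o != "contained"]
    containment_rate = (len(outcomes) - len(transferred)) / len(outcomes)
    return containment_rate, Counter(transferred)

rate, reasons = transfer_breakdown(calls)
print(f"Containment rate: {rate:.0%}")
for reason, count in reasons.most_common():
    print(f"  {reason}: {count}")
```

In this made-up sample the containment rate alone says little. Splitting the transfers by reason shows how many came from recognition failures, how many from users opting out with 0, and how many were transfers the business actually wanted.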

3) What do these results mean? - Particularly when dealing with numbers and percentages, the interpretation of results is very often tricky. For example, would you modify a menu if 50% of your users end up making the wrong selection? (I’m sure your gut reaction is “yes”, “of course”)… but what if that number is based on 2 out of 4 users that someone listened to during a morning’s test? Similarly, we sometimes run into situations where decisions are based solely on someone’s like or dislike (often C-level individuals) of how the system is performing (subjective analysis), without any consideration for the reasons behind the choice, the data of a much larger sample, or the fact that the system might still be in a pre-pilot phase and will eventually get tuned. I felt this was probably one of the main things lacking from the article.
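As a rough illustration of why “2 out of 4” deserves caution, the sketch below computes an approximate 95% Wilson score interval for an observed proportion. The Wilson interval is my choice of method here, not something taken from the article, and the 200-out-of-400 comparison sample is purely hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate Wilson score interval for a proportion (z=1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Same observed 50% error rate, very different sample sizes.
small = wilson_interval(2, 4)      # 2 of 4 callers heard in one morning
large = wilson_interval(200, 400)  # hypothetical larger sample

print(f"2 of 4:     {small[0]:.0%} to {small[1]:.0%}")
print(f"200 of 400: {large[0]:.0%} to {large[1]:.0%}")
```

With only four callers, the plausible range for the true error rate runs from roughly 15% to 85%, so the same “50%” could describe a menu that is nearly fine or one that is badly broken. With the larger hypothetical sample, the range narrows to a few points on either side of 50%.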

The examples are definitely interesting (and funny sometimes), but I think it would’ve been worth doing some sort of analysis of the possible reasons behind some of those misrecognitions (line quality, odd pausing, users’ accents, etc.) as well as giving a more detailed explanation of what the transcription process really is. Some readers might think the results reflect the accuracy of an advanced speech recognition engine when in reality most transcription processes out there in the market involve a hybrid environment where the recognition engine might perform the first pass, and then human beings perform a second pass, reviewing what the machine recognized and/or interpreting those segments the machine might not have been able to recognize in the first place.

Have you tried it yet?

Hello, this is your medication. Have you forgotten about me?

Outbound calling (meaning automated phone calls that go out to specific individuals) is a very profitable business that thrives at times such as this one when companies need to reach more consumers yet want to reduce the costs of making those calls since most of the time they are nothing more than the equivalent of “phone spam”.

Therefore, I’ve never been a big fan of these types of services, except for those situations where I know we’re adding value to the conversation. Those situations where we’re providing a benefit to consumers, particularly in win-win scenarios where both parties benefit from the interaction.

One product/service I recently found out about that does exactly that is GlowCaps Connect. GlowCaps are electronic pill caps that use some very clever means to ensure patients take their medicine at the times and frequency that they should.

So picture this. If you know someone that needs to manage a chronic disease like diabetes or depression, daily medications are essential for their well-being. Every day, at the prescribed time, the GlowCap uses a myriad of modalities to remind users and attract their attention. For example, it may flash a visual reminder which is followed by sound if the bottle is not opened within the first hour. If the patient still doesn’t open the bottle, then the cap triggers a phone call to remind them and can even send weekly updates to friends and family as well as send reports to the patient’s doctor with a monthly summary of the bottle’s activity.

So, to summarize, better prescription handling which can be rewarded with coupons and incentives, better healthcare management with the doctor, and an opportunity for pharmacies to handle automatic refills. Those are the types of calls I wouldn’t mind at dinner time.

Designing for Senior Users

We’ve heard about the global phenomenon of a population that is aging, yet there is very little talk about what design strategies should be used for them. Other than the classical stereotypes - louder volume, slower pace yet not condescending, pitch within a certain range, more information, etc. – the impact this tech-savvy population will have on how interfaces are being designed is still vague and uncertain.

Quick fact: currently, the population under the age of 5 exceeds the population above the age of 65. It is estimated that within the next 8 years, that trend will reverse. By 2050? Those above 65 will number twice as many as those under the age of 5.

Source: U.N. Department of Economic and Social Affairs, Population Division

So, if we were to consider all the things that will change around us to accommodate those users – from healthcare products and services to how stores are arranged and houses are built – it should be easy to realize the wide range of possibilities and opportunities to improve interactions – from how information is provided to what types and levels of services are expected both from automated self-service solutions and from live human beings.

Maybe getting serious about the topic and continuing research will finally take over the assumptions and stereotypes that so often appear in most user designs…

See you at SpeechTEK

It’s that time of year again… If you’re planning to attend next week’s conference, please stop by and say hi. And if you still haven’t decided what presentations to attend, here are a few where I’m participating and which I think will be very interesting (even though I may be a little biased ;-))

  • Introduction to Voice User Interface Design (Workshop) – Sunday, August 17th from 1:30pm to 4:30pm. This 3-hour workshop attempts to cover all the basics of VUI Design, so all newbies and aspiring designers are welcome.
  • Lessons in Multimodal Usability – Monday, August 18th from 3:15pm to 4:00pm. This talk shares some of the Multimodal Design lessons learned from real-world usability. With more and more systems allowing various modes of interaction, the things we have to account for are definitely growing. A good follow-up is the Wednesday talk which covers the specifics about how to perform Multimodal Usability testing in the first place.
  • Developing and Testing Multimodal Applications – Wednesday, August 20th from 2:45pm to 3:45pm. This talk shares some of the experiences while performing Multimodal Usability evaluation of an application and gives some insights into new things that should be considered to avoid modality bias.

See you there!

Picturing mood and experience

An interesting article came out today in the New York Times regarding the efforts LG Electronics goes through when designing a new phone. Others like Nokia are certainly not far behind, thinking about what personal communication will look like in the future via ideas such as Morph (depicted in the image).

I certainly wish more companies would apply the same principles they use to design new products and consumer electronics in the context of new services and consumer support. For example, they talk about participants being able to call a toll-free number to share their emotions about a phone they may be testing, or being asked to draw pictures representing their mood when holding a phone. Can you imagine having something like this for self-service applications – being able to leave a message about how you felt about your phone experience, or being able to ask callers to represent their mood when using the system via pictures?

I can definitely understand the business motivation to come up with innovative devices that draw people to spend money on them. But I would love to see a similar “hit-driven” mentality when it comes to self-service, with designers being aware not only of the latest usability and human-factors strategies but also of popular culture trends and users’ subliminal needs. What will callers want or need 3 to 15 years from now?

I loved the phrase they used to explain how companies like Motorola are now “forced to give consumers what they want even before they know they want it.” When was the last time your UI design strived to do that?

And I also felt our industry got reflected in Nokia’s statement “Design used to be inconsequential: just make it pretty, make it sell”, which in our case could probably be rephrased as “Just make it comply with requirements, make it work.”

So, a couple of final questions that kept me thinking about how our industry should evolve: what impact will eco-friendly concerns have on self-service? Will users be more willing to use automation if they see a real benefit in not having to drive down to a location or having to print out and mail information, hence saving trees in the process? Is it possible to combine the functionality our systems offer with something else (akin to them combining music players with mobile phones)?