Archive for the 'Fun Stuff' Category

Microsoft Recite

We’ve talked in the past about the use of speech recognition in the realm of note taking, where tools such as Jott allow you to obtain a text version of a voice message, making it easier to document and search for information.

Well, Microsoft just recently unveiled a new application of speech recognition, but this time with a twist. Microsoft Recite (available as a preview which can be downloaded) allows anyone using a Windows Mobile phone to record a voice message or “remembrance”, store it, and then retrieve it later using speech pattern recognition.

The obvious advantage of pattern recognition compared to other types of speech searches is that the message itself doesn’t have to be decoded, transcribed or converted.  It simply uses a “search” sample as a pattern to match one or more of the words against existing “remembrances”.
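
To make the idea a bit more concrete, here is a minimal sketch of matching audio by pattern instead of by transcription. This is not Recite’s actual algorithm (which isn’t described here); it is only an illustration that assumes each recording has already been reduced to a sequence of numbers, and it compares whole sequences using dynamic time warping, a classic technique for aligning patterns spoken at different speeds. The names `dtw_distance` and `best_match`, as well as the sample data, are made up for this example.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def best_match(query, remembrances):
    """Return the name of the stored remembrance closest to the query."""
    return min(remembrances, key=lambda name: dtw_distance(query, remembrances[name]))


# Hypothetical one-dimensional "features"; a real system would use richer acoustic features.
stored = {
    "parking spot": [1, 3, 4, 9, 8, 2],
    "door code": [7, 7, 1, 1, 5, 6],
}
query = [1, 1, 3, 4, 4, 9, 8, 2]  # same shape as "parking spot", spoken more slowly
print(best_match(query, stored))
```

Nothing in this sketch ever turns the audio into text: the search sample is only compared against the stored patterns, which is the whole point.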

Even though initial tests have received positive feedback, I’m hoping they’ll expand the tool to include other devices and languages (it currently only works with US English).

The proverbial dilemma: user-centered design calls for simplicity, yet business requirements often conflict with that principle. The result? Watch, remember, and cry…

Enjoy!

My apologies for being away for so long. Between the Holidays and the January blues, I’ve been crazy busy (which, considering the current situation, is something I’m thankful for).

Some of those projects have brought in new pieces of information which I’m looking forward to sharing and discussing – multimodal design and usability, speech-recognition in automobiles, dialog design for senior citizens, new trends in international design, etc.

I know we’re used to relating the notion of IVRs with arcane self-service over-the-phone systems and IVR jails, yet a company called Moshi found a way to leverage the notion of “Interactive Voice Response” in a totally distinctive way.

The Moshi IVR Alarm Clock is the first one to my knowledge that allows you to set the time and the alarm by using your voice. To start interacting with it, you simply say “Hello Moshi” and the clock responds with “Command Please” (I know, a little VUI help never hurt anyone). It currently supports a list of 12 commands including things such as “time”, “set alarm”, “temperature” and “help” (apparently “help” still has its uses).

A demo is currently available at the Moshi website which shows how the clock responds to various commands, and Engadget has some more details about it. Personally, I think it’s pretty cool, plus the price is not bad either ($50). But from a design perspective, I think it’s just a shame they didn’t invest a little bit more in having better sounding prompts (with a professional voice talent), which combined with the use of more natural, concatenated prompting, would’ve yielded much better results (let’s face it, anyone still concatenating time in the form of “six” “o’clock” “a m” is being a lousy designer).

As I mentioned here and here, one of the most appealing aspects of the G1 phone is the openness of the platform, which allows developers to get really creative when it comes to apps that leverage all the features contained in the phone.

One company worth mentioning is JOYity, which was recently covered by TechCrunch. They are leveraging the GPS capabilities of the phone, allowing users to engage in location-based games such as YouCatch, Roads of San Francisco and City Race Munich.

The most engaging by far is YouCatch which is an enhanced version of Manhunt. The concept is pretty simple: you and a handful of friends sign up to play the game, and then each one is randomly assigned a target, making everyone both a hunter and a target.
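
Just to illustrate the concept (this is not JOYity’s actual code, and the function and player names are made up), one simple way to make everyone both a hunter and a target is to shuffle the players into a circle and have each one hunt the next. The sketch assumes player names are unique.

```python
import random


def assign_targets(players, rng=random):
    """Give every player exactly one target so everyone is hunter and hunted."""
    if len(players) < 2:
        raise ValueError("need at least two players")
    order = list(players)
    rng.shuffle(order)
    # Each player hunts the next one in the shuffled circle; the last hunts the first.
    return {hunter: order[(i + 1) % len(order)] for i, hunter in enumerate(order)}


players = ["Ana", "Ben", "Cho", "Dev"]
targets = assign_targets(players)
for hunter, target in targets.items():
    print(f"{hunter} hunts {target}")
```

A plain random pick per player could leave someone hunting themselves or leave someone with no hunter at all; the circle avoids both.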

I hope they add voice features soon, which could allow you to play the game in a less obvious way (running around watching a phone screen kinda gives you away) and maybe even team up with others for the hunt.

Here’s a quick review of the game and the interface:

Wow, it seems political uses for the phone aren’t just limited to campaign messages and voting reminders.

(DISCLAIMER: This note is not intended to promote a particular party or candidate, I’m simply using it as a case-study for interesting uses of the phone for UI purposes)

Just today I received an email from Sarah@PalinTalk.net with the following text:

“It’s been a rough couple of weeks, what with TrooperGate and Neiman-MarcusGate and Obama-Is-Maybe-A-TerroristGate and, darn it, I just really need to talk.
Give me a call at 888-372-7908 or go to www.palintalk.net and I’ll call you (or anybody else you think I oughtta talk to).
Now, don’t worry – I won’t share your phone number with anybody else (like that Dick Cheney),,, it’ll just be between you and me.
Sarah”

So just for kicks I decided to go to the website where it once again offers the same telephone number or allows you to enter your own so “she can call you”.
I decided not to provide my own number (for obvious reasons) and decided to call instead (afterwards I obviously realized my attempt to protect my privacy went out the window since they could just as easily collect my ANI).

At any rate, I have to say the experience was definitely interesting. They attempt to engage callers in a very natural conversation – the first thing you hear is “Hello? This is Sarah?” and then the system waits for you to say something. Afterwards, “Sarah” asks you what you consider to be the most important issue in this election. To make the interaction more relevant, it seems they prepared some generic topics and phrases so they can respond in a more intelligent way (which didn’t work too well for me since I said “I don’t know” and it got misrecognized as something having to do with “terrorism”, go figure).

The error handling is very interesting as well. Sometimes it simply ignores the error and moves the caller along under the assumption that you said something relevant that can be followed up with a generic comment such as “That’s what I was hoping you’d say” or with a question such as “Alright, and do you think we should…?”

When the caller doesn’t say anything, it’s hilarious. Sometimes, the system will respond with “Helloooo?” or “I know it’s hard to talk to someone important…”, in some cases they even simulate side-speech as if talking to someone else on the side saying “They’re not saying anything. I think it may be Rummy again…”

Mmm, I wonder… what would happen if I press 0 or demand to talk to an operator?

That’s right! Big news over the past couple of days due to the launch of the $179 T-Mobile G1 device, the first commercially available “Google Phone” in the market. As usual, David Pogue did a great review of the new phone, which, even though it clearly “borrows” many features from the iPhone, also takes advantage of the open source free mobile platform known as Android (which we’ve talked about before).

As is typical with these types of “breakthroughs”, the G1 attempts to offer those things the iPhone was lacking – full keyboard, Bluetooth, etc. – at the price of making the UI more complex, and the design definitely less slick than the original (more buttons, bulkier, etc.).

On the other hand, the huge advantage of a device and platform as open as this one is that you can choose whatever carrier you want, developers are free to create any sort of application for the device without much censorship, and users are free to personalize their phones in any way they want. And if there’s anything we’ve learned from other open initiatives (Linux, Apache, Firefox, etc.), users are the ones who’ll win the most. This is definitely just the tip of the iceberg…

As for what that means for UI designers, well, I think more and more users are going to expect (and demand) multimodality (even if they don’t refer to it in those terms). They’ll be able to choose the interaction mode that’s more convenient to them (speech, keypad, pen, gesture, etc.) and switch between them, they’ll expect preferences to remain active no matter the interaction mode they choose (for example, notification preferences set up on the website should carry over to all other contact points), and if they aren’t happy, they have all the tools they need to make you the next Consumerist or Saturday Night Live star.

So yeah, bring it on!

Wow, as someone who loves to see old technologies and principles being applied in creative (and unexpected) new ways, I have to say I was definitely impressed by Dialtones (A Telesymphony).

As designers, we always struggle with the balancing act of attempting to create new “works of art” that follow some basic principles, yet attempt to be one step forward from our “previous piece”, always forcing ourselves to be creative within the boundaries of both technological and business limitations (and requirements).

Keeping those technological and technical limitations in mind, the team behind Telesymphony realized that nowadays most of us carry with us a musical instrument – our cell phones. So what they did was to create a large-scale concert performance where they choreographed the ringing of the audience’s phones. The process they followed was asking participants to register their phones so that specific ringtones could be transmitted to them and certain seats could be assigned. During the concert, the participants’ phones are dialed up by live performers and the results are quite interesting.

And as with most art, it’s hard to describe it in words, so here’s a video excerpt. Enjoy!

I recently ran across a series of posts and a SpeechTEK magazine article about a new service currently in Private Beta called Fonolo.

The premise is definitely very interesting. What this Canadian start-up is attempting to do is to replicate the concept of bookmarking and deep linking (the process of linking to pages in the lower levels of a Web site from a home page – or other pages – to help search engines index them) so common these days on the web. What this means for a phone user is that they don’t need to get to the application’s “home page” (aka Main Menu) every time and then traverse the phone tree to reach a specific destination…

According to their founder’s pitch, the way it works is that you “bookmark” a spot deep inside a telephone system’s tree. To do this, you go to Fonolo’s website, find the Company you need, review a transcription of their menu structure, find the spot you need and click on it. By doing that, they call the company for you, navigate the menu up to that specific spot and call you back so you can continue your transaction from that point on.
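
As a rough illustration of what such a “bookmark” could look like under the hood (this is purely my own sketch, not Fonolo’s implementation, and the menu, labels and `find_path` function are invented), the phone tree can be modeled as a nested structure and the bookmark as the list of key presses needed to reach a given spot:

```python
# Hypothetical phone tree: each key press leads to a label and, optionally, a submenu.
MENU = {
    "1": ("Billing", {
        "1": ("Check balance", {}),
        "2": ("Make a payment", {}),
    }),
    "2": ("Technical support", {
        "1": ("Internet", {}),
        "2": ("Phone", {
            "3": ("Report an outage", {}),
        }),
    }),
}


def find_path(menu, destination, prefix=()):
    """Depth-first search for the key presses that reach `destination`."""
    for key, (label, submenu) in menu.items():
        path = prefix + (key,)
        if label == destination:
            return path
        found = find_path(submenu, destination, path)
        if found is not None:
            return found
    return None  # destination is not in this (sub)menu


bookmark = find_path(MENU, "Report an outage")
print(",".join(bookmark))  # key presses to replay on the caller's behalf
```

Once the path is stored, “calling on your behalf” boils down to dialing the company and replaying those key presses before bridging you in.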

How do they know what systems look like? Well, as noted in the VoIP weblog, they seem to be using a combination of voice recognition and human editors to generate “maps” of the interactive voice response system.

Among the obvious privacy concerns a service like this might raise, the hottest one in my opinion involves the service they refer to as “Intelligent Call History”. Since in reality all your calls start from their home page, they are attempting to become something like a Google of sorts for “phone sites”. What I mean by that is that they would keep track of all the interactions you’ve had with a certain company (regardless of the phone you used), along with the actual recordings of those conversations!

In the web world, we’re all familiar with how certain companies keep logs that track your web habits which could then be linked to your IP address. The biggest difference to me is that they mostly keep track of where you’ve been and where you’re going, but not of what’s happening when you are there… and in this case, since you’re using them as a bridge to connect to a Company, how can you be sure those recordings (which may contain account numbers, PINs, etc.) are kept safe and out of anybody’s reach?

Presumably the advantage of something like this is that in case of a dispute, you could play back a recording from the actual conversation and prove a certain transaction happened. But is this benefit really worth the risk? Particularly when it is known that once it becomes available to the public, it will be ad-supported.

On the other hand, they have also expressed that their ultimate goal is to craft partnerships with those companies Fonolo has mapped so that those companies can notify them “when they change or update their IVR”, to the point where they hope companies will indeed send transcripts of their IVRs so they don’t have to be mapped anymore.

Again, I definitely like the idea of empowering users and allowing them to accomplish their task in the most efficient way, but I think a system like this would be a much better fit for an actual device feature (similar to the GOOG411 dedicated button now present in some phones) – which you could turn on at the beginning of a call and stop once you reach the spot you want to “bookmark” so that in the future your phone would simply repeat the steps you followed and get you to that same spot. And of course, rather than finding more ways to patch user-unfriendly architectures, companies should be looking at fixing the root problem, which in the short term can be somewhat addressed via the deployment of more SayAnything/SpeakFreely-type menus so callers can say what it is they need right at the beginning of an application…

I know for some people Google’s announcement about Android wasn’t as exciting as the expectation of hearing them announce an actual “Gphone” (as it was often called when there was still a rumor Google was working on an actual device), yet it seems implementations and applications based on this open and free mobile platform are finally coming out (albeit only as prototypes) and demonstrating how such an approach can in fact result in easier ways for consumers to obtain access to a wide variety of applications.

The prototype included a Google browser, phone dialer, audio player, Google maps, camera, games, calendar, contacts manager, calculator and notes. Sweet!

Since the idea itself and its implementation seem definitely feasible, the only remaining questions have to do with all the other non-technical issues that will have a definitive impact on other players in the wireless and mobile arena. As Gigaom cleverly pointed out, some of those include:

  1. All users of carriers that aren’t part of the Open Handset Alliance
  2. Device-makers which now have to worry about yet another OS
  3. Application developers, which will now have to deal with a significant number of handset/carrier/OS combinations
  4. Support departments at participating carriers dealing with non-supported application issues
  5. Users having to adapt to yet another set of user interfaces and frameworks

And to this I would like to add a 6th one: “UI Designers having to deal with new interaction paradigms, higher customer expectations, while maintaining design simplicity.”

This is definitely a great opportunity for us UI designers to start thinking about new challenges we’ll be facing when these applications/frameworks become available to the masses, in particular when user habits and natural ways of interacting with them call for the use of speech recognition as either the primary way of interaction, or as a back-up/supportive mode for certain types of goals and contexts.