Sunday, July 18, 2010

Emotive speech and navigation systems

During a recent family weekend, my best friend, my mother and I found ourselves in a car, using my aunt's navigation system to guide us to our destination. We quickly developed a love-hate relationship with the device: our annoyance generally outweighed our gratitude for having been guided efficiently to where we were going.

My mom then circulated a CNN article entitled "Why GPS voices are so condescending." My aunt wrote back, 'Hey, isn't this what you work on?'

The answer to that question is: yes, well, not quite. I go in the other direction. Instead of automatically producing emotive speech, I start with a recording of emotive speech and automatically analyze how it was produced. We just got a paper accepted in a session entitled "Paralanguage" at Interspeech 2010:

Jochems, B., Larson, M., Ordelman, R., Poppe, R. and Truong, K. Towards Affective State Modeling in Narrative and Conversational Settings. Proceedings of Interspeech 2010 (to appear).

The topic of the CNN article also falls under paralanguage. Paralanguage is, basically, everything we do with speech that modifies the factual content or conventional meaning of what we are saying. In this case, it's adding emotive nuance.

The designers of navigation systems are stuck with the following impasse: if a navigation system is good, it will always be right. Socially, there is a taboo against always being right. An always-right system will always be perceived as condescending, be its voice ever so loving and sweet. That's simply the way social behavior works: we count on each other to act responsibly, but not to pretend that we're perfect. The implication is that we, as humans, will never truly adopt the metaphor of "it's just a person telling me where to drive" for a navigation system if that system's understood purpose is to deliver infallibility.

In my opinion, what the designers of navigation systems should do is use the voice of someone who enjoys special social status and, as such, "gets away" with always being right: for example, the theoretical physicist Stephen Hawking. His smarts are generally acknowledged to transcend those of the rest of us mere mortals. Interestingly, he also speaks using a computer voice, because he has amyotrophic lateral sclerosis (ALS), a form of motor neuron disease. It wouldn't take much memory on your little navigation device to produce a believable rendition of his speech.

The issue also has a huge safety aspect (which the CNN article also raises). As long as the navigation system uses emotive speech convincingly, it is smooth sailing. But what if something goes wrong? To the driver, it will be like a bolt out of the blue. Everything was going fine, and all of a sudden the device turned and lashed out with an emotively inappropriate direction, possibly at a critical point in the drive. The driver shouldn't be so comfortable with the device as to completely exclude the possibility that it will go way off the mark.

Basically, a car navigation system presents us with another instance of the Paradox of 'Simplicity'. It takes a lot of very complicated innards to make a device that drivers perceive simply as a human telling them how to get there. The paradox comes in when that device does something wrong: all of a sudden the driver is stuck both solving the immediate driving problem and compensating for the apparently inexplicable (those complicated innards!) failure of the system. Imagine, for example, a beautifully realistic, plaintive voice pleading "Turn back! Turn back!" when we are in fact stuck in the express lane in heavy traffic.

The theoretical-physicist persona would help to lessen the impact of such errors. Sorry, Stephen Hawking, but theoretical physicists can get away with being socially inappropriate once in a while without throwing us into a state of shock: we assume that they are simply busy on a higher plane and don't really mean to insult or confuse us.

However, instead of talking to Stephen Hawking about a deal to have him donate his authority to make navigation systems safer, navigation system companies (according to CNN) are looking into fitting the systems with drivers' own voices. It sounds cool, until you think about some of the implications.

First, there are probably people who don't react well to their own voices. Perhaps I could accept my own voice reminding me of the route to somewhere I've been before, but my own voice directing me somewhere I have never been (for example, Makuhari, Japan, where Interspeech 2010 will take place in September) is absolutely implausible. I know I can't trust myself on that one.

Second, drivers need to be encouraged not to switch off their own intelligence when driving with a navigation system. The system doesn't tell you, "Stop here, the light is red." Listening to your own voice is probably not the right way to ensure that you are actively applying the rules of the road and your own common sense to driving.

Third, it's not uncommon to rent a car, or to borrow someone else's car or navigation system, on a one-off basis. For example, my aunt lent us her device for one trip. Wouldn't we like our devices more if they were one-size-fits-all? Just as Walter Cronkite provided widespread satisfaction as the voice of the evening news, what's wrong with a generally acceptable, central voice for all navigation systems?

Fourth, it's not only the driver who needs to listen to the navigation system. With several people in the car, navigation often involves pooling knowledge of the route and negotiating a consensus. If the navigation system speaks with the driver's voice, the passengers are shut out of the process. For maximally safe driving, you don't want a "back-seat driver", but a co-pilot who is engaged in the process is very helpful.

Fifth, it is not clear that the navigation system companies are the ones who should decide how navigation system personae can be made more acceptable to drivers. Convincing individual drivers that they need a personalized voice for their system would open up an incredible new profit opportunity for navigation device companies. On top of the system and the route information, they would also be able to sell you your own persona.

Additionally, a universal "Stephen Hawking" solution, which I am arguing may actually be safer, would make it impossible for navigation system companies to distinguish themselves from each other on the basis of the differential appeal of their navigation personae, and so is simply not in the companies' best business interest.

My suggestion is simply to learn to love the condescending deadpan delivery of your current navigation system. Demanding anything different may prompt the designers of navigation systems to make the situation a whole lot worse.

Don't we do this already? How often have you been directed somewhere by a fellow human issuing emotively inappropriate directions? You reminded yourself to take some deep breaths, stayed focused on the road, and got there in the end. We shouldn't demand more from our automatic devices than we get from our fellow human beings.

P.S. Whoa, this claims to be a blog about search. What does this have to do with search? OK, you've caught me. Sometimes I just write things here because I know that I can find them again.