Monday, July 26, 2010

Advances in Multimedia Retrieval Tutorial at ACM Multimedia 2010

I worked a long day today, and now I'm home, have eaten dinner, and am thinking about how to relax. I'd like to watch an episode of Merlin on the Internet. Preferably a legal one that I've never seen before -- I wouldn't mind paying if it were a site I trusted. That seems like a pretty complicated search, and my prediction is that it will lead to frustration. So here I am writing in my blog instead.

Today the longplay description of our upcoming tutorial on Frontiers in Multimedia Search went online. We want to start out by addressing the question of how multimedia search can benefit people's daily lives, at work and otherwise. I'm feeling a rather strong need for the benefit of multimedia at the moment. If I can't have my Merlin, right now I wouldn't mind browsing back through recordings of the SIGIR presentations that I heard last week -- and maybe some of the ones that I missed.

Then we are planning to go on to take a look at new approaches to multimedia retrieval that we divide into three categories (I include a couple of my own notes on each):
  • Making the most of the user: In motion on the Internet we dribble information behind us. We tag, we query, we click, we brush over a page without a second glance. We have the capacity to glance at a set of snippets and glaze over what does not interest us to find what does. Making the most of the user is about letting the search engine turn the computational crank and do the lookups, leaving those fine-grained semantic judgments to the human brain.
  • Making the most of the collection: Sometimes the collection can speak for itself. Pseudo-relevance feedback may dilute our queries, but it is also a valuable tool for increasing recall. And then there is collaborative filtering: Making use of patterns that we as users leave behind -- but now at the collection or community level.
  • Making the most of individual items: What is important here is how you can do the best that you can with noisy sources of features (speech recognition, visual concept detection) to represent items. You don't necessarily need to provide a complete representation of an item -- information that helps distinguish items or keeps them from getting confused can sometimes be a big help.
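As a side note on the second point, here is a minimal sketch of what pseudo-relevance feedback might look like in Python. It is only an illustration under simplifying assumptions: the function name `expand_query`, its parameters, and the toy documents are all made up for this example, documents are plain whitespace-separated strings, and there is no stopword removal or term weighting as a real system would have. The idea is simply to pretend the top-ranked results are relevant and borrow their most frequent terms.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k=2, n_new_terms=2):
    """Hypothetical helper: expand a query using the top_k ranked documents.

    The top_k documents are assumed (not known) to be relevant, which is
    what makes the feedback "pseudo".
    """
    feedback = Counter()
    for doc in ranked_docs[:top_k]:
        feedback.update(doc.lower().split())

    original = {term.lower() for term in query_terms}
    candidates = [
        (term, count) for term, count in feedback.items() if term not in original
    ]
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return list(query_terms) + [term for term, _ in candidates[:n_new_terms]]


# Toy ranked list (invented for illustration only).
ranked = [
    "merlin episode stream arthur camelot",
    "merlin arthur camelot season",
    "navigation system voice",
]
expanded = expand_query(["merlin", "episode"], ranked)
print(expanded)
```

The added terms can pull in documents that never mention the original query words, which is where the extra recall comes from. If the top-ranked documents are off-topic, the same mechanism adds off-topic terms, which is the dilution mentioned above.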
All in all, the aim is to present our favorite new multimedia search work -- working to inject new techniques and perspectives from the IR community and speech community into multimedia. And, of course, develop our own understanding of multimedia search along the way. Maybe I will then also feel better equipped to find something that I could watch now that would make me as happy as Merlin. Or at least to understand exactly what is still missing...

Friday, July 23, 2010

SIGIR 2010 Crowdsourcing for Search Evaluation Workshop

We used Amazon Mechanical Turk (MTurk) to gather annotations for the video corpus to be used in the Affect Task at the MediaEval 2010 benchmark evaluation. The task involves automatically identifying videos that viewers report to be particularly boring. We wrote the corpus development up and submitted it to the Crowdsourcing for Search Evaluation Workshop at SIGIR 2010. Initially we wondered a bit if the paper was appropriate for the workshop, since we were working on affect and not directly on search. But we were glad that we took the risk and went for it. The paper was accepted and the workshop was great -- right on target with our interests.

Soleymani, M. and Larson, M. Crowdsourcing for Affective Annotation of Video: Development of a Viewer-reported Boredom Corpus.
In Proceedings of the SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation.

We also received the Runner Up for the Most Innovative Paper Award, which was sponsored by Microsoft Bing. Thank you! We are already considering how to get the most bang for our Bing bucks. Probably it will flow directly back into MTurk for our next crowdsourcing project.

Sunday, July 18, 2010

Emotive speech and navigation systems

During a recent family weekend, my best friend, my mother and I found ourselves in a car using my aunt's navigation system to guide us to our destination. We quickly developed a love-hate relationship with the device -- our feelings of annoyance generally outweighing our gratitude for having been guided efficiently to our destination.

My mom then circulated an article from CNN entitled Why GPS voices are so condescending. And my aunt mailed back, 'Hey, isn't this what you work on?'

The answer to that question is, yes, well, not quite. I go the other direction. Instead of automatically producing emotive speech, I start with a recording of emotive speech and automatically analyze how it was produced. We just got a paper accepted in a session entitled "Paralanguage" at Interspeech 2010:

Jochems, B., Larson, M., Ordelman, R., Poppe, R. and Truong, K. Towards Affective State Modeling in Narrative and Conversational Settings. Proceedings of Interspeech 2010 (to appear).

The CNN article also falls into the category of paralanguage. Paralanguage is basically the things that we do with speech that modify the factual content or conventional meaning of what we are saying. In this case, it's adding emotive nuance.

The designers of navigation systems are stuck with the following impasse: If a navigation system is good, it will always be right. Socially, there is a taboo against always being right. An always-right system will always be perceived as condescending, be its voice ever so loving and sweet. That's simply the way that social behavior works -- we count on each other to act responsibly, but not to pretend that we're perfect. The implication is that we, as humans, will never truly adopt the metaphor of "it's just a person telling me where to drive" for a navigation system if that system's understood purpose is to deliver infallibility.

In my opinion, what the designers of navigational systems should do is use the voice of someone who enjoys special social status and, as such, "gets away" with being always right. For example, theoretical physicist Stephen Hawking. His smarts are generally acknowledged to transcend the smarts of the rest of us mere mortals. Interestingly, he also speaks using a computer voice because he has a motor neuron disease (ALS). It wouldn't take a whole lot of memory space on your little navigation device to produce a believable rendition of his speech.

The issue also has a huge safety aspect (which is also raised in the CNN article). If the navigational system uses emotive speech in a very convincing manner, it is smooth sailing. However, what if something goes wrong? To the driver, it will be like a thunderbolt out of a blue sky. Everything was going fine, and all of a sudden the device turned and lashed out with an emotively inappropriate direction. Possibly, this would happen at a critical driving point. The driver shouldn't be so comfortable with the device as to completely exclude the possibility that it goes way off the mark.

Basically, a car navigation system presents us with another instance of the Paradox of 'Simplicity'. It takes a lot of very complicated innards to make a device that drivers perceive simply as a human telling them how to get there. The paradox comes in when that device does something wrong and all of a sudden the human is stuck both solving the immediate driving issue and also compensating for the apparently inexplicable (those complicated innards!) failure of the system. In this case, for example, a beautifully real rendition of a plaintive tone pleading "Turn back! Turn back!" when actually we find ourselves stuck in the express lane in heavy traffic.

The theoretical physicist persona would help to lessen the impact of such errors. Sorry, Stephen Hawking, but theoretical physicists can get away with being socially inappropriate once in a while without throwing us into a state of shock -- we assume that they are simply busy on a higher plane and don't mean to really insult or confuse us.

However, instead of talking to Stephen Hawking about a deal to have him donate his authority to make navigational systems safer, navigational system companies (according to CNN) are looking into fitting the systems with drivers' own voices. It sounds cool, until you think about some of the implications.

First, there are probably people who don't react well to their own voices. Perhaps I could accept my own voice reminding me of the route to somewhere I've been before, but my own voice directing me to somewhere I have never been, for example, Makuhari, Japan (where Interspeech 2010 will take place in September), is absolutely implausible. I know I can't trust myself on that one.

Second, drivers need to be encouraged not to turn off their human intelligence when driving with a navigational system. The system doesn't tell you, "Stop here, the light is red". Listening to your own voice is probably not the right way to ensure that you are actively applying the underlying rules and your own common sense to driving.

Third, it's not uncommon to rent a car, or to borrow someone else's car or navigational system, on a single-case basis. For example, my aunt lent us her device for one trip. Wouldn't we like our devices more if they were one-size-fits-all? Just as Walter Cronkite provided widespread satisfaction as the voice of the evening news, what's wrong with a generally acceptable central voice for all navigation systems?

Fourth, it's not only the driver who needs to listen to the navigation system. With several people in the car, navigation often involves pooling knowledge of the route and negotiating consensus. If the driver's voice is talking on the navigational system, the passengers are shut out of the process. For maximally safe driving, you don't want a "back seat driver", but a co-pilot who is engaged in the process is very helpful.

Fifth, it is not clear that the navigation system companies are the ones that should be making the decision about how navigation system personae can be made more acceptable to drivers. Convincing individual drivers that they need a personalized voice for their system would open up an incredible new opportunity for profit for navigation device companies. On top of the system and the route information, they would also be able to sell you your own persona.

Additionally, a universal "Stephen Hawking" solution, which I am arguing may actually be safer, would make it impossible for navigation system companies to distinguish themselves from each other on the basis of the differential appeal of their navigation personae, and so is simply not in the companies' best business interest.

My suggestion is simply to learn to love the condescending dead-pan delivery of your current navigation system -- demanding anything different may prompt the designers of navigation systems to make the situation a whole lot worse.

Don't we do this already? How often have you ever been directed somewhere by a fellow human issuing emotively inappropriate directions? You've reminded yourself to take some deep breaths, stay concentrated on the road and gotten there in the end. We shouldn't demand from our automatic devices more than what we get from our fellow human beings.

P.S. Whoa, this claims to be a blog on the topic of search, what does this have to do with search? OK. You've caught me. Sometimes I just write things here because I know that I can find them again.

Saturday, July 17, 2010

IF discouraged THEN write good reviews

Ever get a Bad Review? I don't mean one where the reviewer gives constructive criticism and recommends rejection. I mean one that is really bad in the sense that it is unhelpful, off-topic, lacking in rigor, poorly written, pedantic or pompous. It takes a lot of energy to sort through these sorts of reviews, find the wheat, discard the chaff, and make sure that the experience doesn't drag you down to the point of derailing a potentially productive scientific endeavor. Bad Reviews sometimes even recommend acceptance. Acceptance leads, perhaps not to disappointment at the moment, but rather to more general scientific disheartenment: Is this really the level of intellectual standards that characterizes the field to which I have chosen to devote my career?

A surprisingly satisfying way to push back against Bad Reviews is to engage in reflection upon one's own reviewing skills and strive to improve them. This course of action is not going to have an immediately measurable effect of improving the system as a whole, but it does restore a sense of balance. Especially if you interact with a lot of students, you have an amazing opportunity to teach them to review. There's something cheering about knowing that scientists that you have mentored are not going to be the ones generating the Bad Reviews of the next generation.

In order to be able to tell people quickly about my own reviewing values and my campaign to constantly improve my own reviews, I have packed the points I consider while reviewing into a scheme that I call IF THEN:
  • I is for Issue: Does the paper motivate the issue that it addresses and then close the loop in the end, convincing the reader that it has accomplished what it set out to do?
  • F is for Fit: Does the paper fit with the call for papers of the conference or scope of the journal to which it was submitted?
  • T is for Technical soundness: Do the authors apply solid, state-of-the-art experimental and/or analytical method?
  • H is for Historical context: Do the authors present the context of their work? (Including both the related work and the outlook onto future work.)
  • E is for Exposition: Is the paper clearly written and a pleasure to read? Is the information it contains complete and comprehensive?
  • N is for Novelty: Is the idea new in the field? Is it the sort of innovation that is destined to make an impact?
If you're doing IF THEN right it should be a bit uncomfortable. Particularly difficult is the self-reflection necessary to make an honest estimate of one's own expertise in some subjects. I didn't promise this wouldn't hurt; what I promised is that constant work on your own reviewing standards eases the pain (and controls the damage) caused by receiving a Bad Review.

But what if you are already a world class reviewer? What then? Like musicians, we must remember that excellence is not static. Moshe Vardi introduced a rule for reviewing in an editorial in the current issue of Communications of the ACM. The rule reads, "Write a review as if you are writing it to yourself." He calls it The Golden Rule of Reviewing. Most people are still working on putting into practice the Golden Rule they learned in their childhoods. The Bad Reviews will keep on coming, and about the only thing we have control over is how we review back.