Sunday, December 13, 2009

The nature of social queries

In my last post, I described the location of my camera as my most pressing information need. Soon thereafter, the need was satisfied via my social network, within which the lost camera acted as an implicit query. This process, however, had none of the revolutionary flavor otherwise associated with social search. I got a call from my mother saying, "Your uncle told me that you left your camera sitting on their kitchen counter."

The camera experience set me to thinking about social queries and social search. The missing-camera information need falls into the category of known-item search. (Although I did entertain the idea along the way that I should actually be shopping for a new and better camera.) Also, it is interesting to note that the query could be answered within my own social network. It's a rather obvious point. After TRECVid I went to my uncle's house and not some other unmotivated place in the area.

Before the call from my mother, I was also pursuing a sort of social search solution -- a completely conventional procedure. I called the rental car company and the hotel. I was trying to shake down the ad hoc social network of people who entered the car and the hotel room just after me, to figure out what happened to that camera. What I suspect is that a lot of the information needs we have as individuals correspond to known-item social queries and are answerable via a network containing relatively few, rather mundanely predictable nodes.

I'm apparently not the only one thinking about known-item search in networks. Recently, the DARPA Network Challenge concluded. It involved the information need, not of an individual, but of an organization. To win you had to be the first to report the locations of ten red weather balloons across the continental US. DARPA moored the balloons and made them visible from nearby roads. The challenge was won by a team at MIT.

Looks like it was fun. The point I'd like to make here, though, concerns the nature of the query. It was a known-item (known-items, to be exact) query. But because someone already knew (created, in fact) the answer, the search space was radically limited. The US$40,000 prize money meant that this could not be a query "typical" of the average node in the network. I'm not sure what we can say we learned as a result of the experience. On the other hand, we can be grateful that DARPA was smart enough not to ask something potentially destructive, like, "...." (Sorry, couldn't bring myself to even write an example here -- but a couple readily come to mind.)

Although the DARPA Network Challenge might have been fun, I am afraid it falls short of being good, healthy fun. It's a social search problem both initiated and solved by entities with, it's safe to say, fairly high centrality status within the graph of the social network used to solve the problem. As potential nodes in future networks solving similar problems, such experiences effectively serve the purpose of establishing precedent and teaching us about problem-solving procedure. What we've learned from DARPA is: sit around and wait until some entity poses a money-backed question, then contribute to the MIT-sponsored site and get your piece of the payoff. Good, healthy fun would have taught us that we have to think very, very carefully about who we tell what. It is important not to proceed a step further until we have a mechanism in place by which we can make sure that everyone understands the difference between sending in a geo-coordinate- and time-stamped photo of a balloon and one of the neighbors putting out the trash.

In particular, it's important to understand the implications of who we tell what about whom. In grade school we come to terms with the delicate balance between supporting fairness within the school world and not betraying our fellows by being a "nark". On the scale of today's social networks, the connection between our actions of telling and the consequences for our fellow human beings is nowhere near transparent enough to allow for direct learning by individuals. A balloon sighting seems like a harmless piece of information, but when are we morally obliged to contribute it because it would help, and when should we stay our urge to be part of the MIT lottery-like fun because of the potential harm?

Perhaps the issue is not yet relevant. People search, by which I mean search for location information about real living people, is still difficult. The sadness over the disappearance of Jim Gray is for some among us in a curious way inseparable from the disappointment at the failure of technology-enhanced large-scale search. You can launch a large-scale distributed search to find someone and still not succeed. Even shaking down the vast social networks in the US doesn't seem so easy. Earlier this year, Wired posed a challenge called Vanish. To win you had to find a Wired writer named Evan Ratliff within a month of his assuming a new undercover identity. He was discovered, but only after he undertook a challenge put to him by Wired that forced him to radically narrow the search space for his pursuers -- i.e., enter a known location within a known time frame.

Sunday, November 22, 2009


Is Flickr video a long photo?

Since I was going to be mentioning Flickr Video in my presentation about VideoCLEF at last week's TRECVid 2009 workshop, I decided I should try making one. This video was shot out the window in the morning. And yes, I dutifully made sure it was properly geo-tagged as Gaithersburg. And indeed, capturing the ripple makes the flag image come alive..."It's like a photo, but it moves!"

Maybe it's even a bit too alive. The ripples would be more dramatic had the camera been steady -- you can see I'm not holding it still. Where does it end? Interactive moving pictures, a la Harry Potter's newspaper, I suppose.

Immediately after downloading this video off of my camera, I lost the camera itself. My first reaction: had I downloaded all of the pictures on the camera? (Yes!) Only then did I start fretting about how much money I paid for the camera, a Canon PowerShot SD1100, and how much it would cost to replace it.

I felt it viscerally: it's not just a bunch of bits! Content matters to us in a very personal way. Since I didn't empty the camera's memory card, my photos are floating around out there, in whatever space the photos on lost cameras go. What if this space turns out to be the Internet? Would I be embarrassed?

In my opinion the most embarrassing picture on the camera is a lemon-face from Jay and Silent Bob. And yes, there is also a corresponding lion-face out there...somewhere. I have this most pressing information need: "Where is my camera?"

Friday, October 23, 2009

SSCS 2009

The Third Workshop on Searching Spontaneous Conversational Speech (SSCS 2009) took place on 23 October 2009 in Beijing, China, in conjunction with ACM MultiMedia 2009. We had a great set of demos and talks. As an organizer, this gives you a warm, pleased feeling -- all that work is worth it. Domains covered included broadcast, meetings, interviews, telephone conversations, podcasts and voice tagging for photos. The approaches presented involved a variety of techniques, including subword units, exploiting dialogue structure, fusing retrieval models, modeling topics and integrating visual features. Such events serve to highlight the importance of the spoken word in many multimedia access and retrieval applications. And also to remind us how far we are from exploiting it fully.

Thursday, October 22, 2009

Well, behind every joke there's some truth

The winner of the ACM MultiMedia Grand Challenge at ACM MultiMedia 2009 was "Joke-O-Mat: Browsing Sitcoms Punchline by Punchline." This application uses speaker diarization and laughter detection to annotate sitcoms and present them to the viewer in an interface that offers jokes ranked by laugh reaction, grouped by character and associated with context. Joke-O-Mat underlines the importance of the speech track for multimedia access.
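The ranking logic such an interface implies is simple to sketch. Here is a toy illustration (my own, not the actual Joke-O-Mat pipeline), assuming hypothetical per-segment annotations: a speaker label from diarization and a detected laughter duration:

```python
from collections import defaultdict

# Hypothetical per-segment annotations: speaker (from diarization),
# segment start time, and laughter duration in seconds (from detection).
segments = [
    {"speaker": "Kramer", "start": 312.0, "laugh_sec": 6.8},
    {"speaker": "Jerry",  "start": 128.5, "laugh_sec": 4.2},
    {"speaker": "Jerry",  "start": 540.2, "laugh_sec": 2.1},
]

# Rank punchlines by the strength of the audience reaction.
ranked = sorted(segments, key=lambda s: s["laugh_sec"], reverse=True)

# Group them by character, as in the browsing interface.
by_character = defaultdict(list)
for seg in ranked:
    by_character[seg["speaker"]].append(seg)

print([s["speaker"] for s in ranked])  # best laugh first
```

The laughter duration stands in for whatever reaction score the real system computes; the point is only that a single scalar per segment is enough to drive both the ranking and the per-character grouping.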

What to do when multimedia doesn't contain a laugh track? In VideoCLEF 2009 we ran a narrative peak detection task. The goal was to detect points in videos where viewers perceive heightened dramatic tension. Today, CLEF working notes, tomorrow our own Peak-O-Mat?

Thursday, September 10, 2009

Getting the words right

The final day of Interspeech 2009 here in Brighton. It's been a great conference, and each and every keynote has been well worth getting up for. This morning, Mari Ostendorf talked about "Transcribing Speech for Spoken Language Processing." Interspeech encompasses a staggeringly broad spectrum of perspectives on speech research and technology. For every point here, there is an immediate counterpoint, and it was without doubt under the influence of this chorus that the opening slide of the keynote this morning displayed a long-play version of the title, reminding the audience that they would be hearing about transcribing human-directed speech, as opposed to speech that humans produce to communicate with computers.

The message from the keynote that will ring longest in my ears was, "The goal of speech transcription is information access." This leaves open, of course, the question of what is the information and what is the access when it comes to content that contains the spoken word. I find myself compiling little lists of domains in which information encoded in spoken audio could be important: podcasts, video diaries, lifelogs, meetings, call center recordings, social video networks, Web TV, conversational broadcast, lectures, discussions, debates, interviews, cultural heritage archives, home videos, photo annotations, video conferences. These lists invariably end with etc. etc. etc. And what constitutes access (keyword search, retrieval, question answering, browsing, recommendation...) is another question to which we can't give a closed-set answer.

My personal experience doesn't really support the idea that we need to push the envelope. The last video I watched I found because a link was sent to me by my cousin. The content of the video was a short clip of her new cat purring. No real access problem there. No information either. The purr did not inform me in the conventional sense. In fact, there wasn't much human speech involved at all. Nonetheless, I found the content supremely worthy of my watching time. Although my own multimedia access needs are a string of examples of this pre-solved sort, I do agree that the challenge of access to speech-based information is a serious one and will require a great deal of effort to address.

The full phrase in Ostendorf's slide read, "The goal of speech transcription is information access, not just getting the words right." But maybe it is about "getting the words right". The words referred to are, presumably, white-space-delimited grapheme strings, lexical words, citation forms. But we can also see a word as the totality of knowledge that a human needs to possess in order to deploy it in human-to-human communication. There may be a limit to how far we can go beyond that sort of word and still remain within what is meaningful in the context of our information access needs.

We can go for prosody, for speech act, subjectivity, affect, but in the end we'll never capture the "you had to have been there" component of understanding. And already the moment of that particular purr video has passed and my next need for video content will be for a new one.

Friday, August 7, 2009

Say Anything

Standing drinking a Diet Coke, I gazed at an advertisement for the Guardian outside the Spar on the campus of Dublin City University.

The advertisement stated, "Owned by no one, free to say anything." I paused. My paper of choice is associated with the motto "All the news that's fit to print." Never worried about it before, but in comparison, it suddenly seemed a bit dated.

It's relatively uncontroversial to consider source when assessing the credibility of media. In our work on the PodCred Framework we cite Rubin and Liddy (2006) as a source for the notion that user generated media builds credibility by avoiding hidden bias. I smiled at the idea of The Guardian as a huge blog; and then again at myself for finding that funny.

The PodCred Framework includes an indicator meant to capture the source of the podcaster's income: stores, sponsors, advertisers. The idea that transparency of funding does indeed impact listener satisfaction with podcasts hasn't been test-driven yet, to my knowledge. But seeing the Guardian sign made my thoughts return to consideration of its potential.

Then I finished my break and went back inside to continue working on VideoCLEF assessment management tasks, which is why I am currently in Dublin, and my mind turned to other things.

I've found an image to accompany this post. If this is indeed a scary idea, I wonder if it indeed sells papers. But if it's really owned by no one, perhaps they need to make the link to who is actually doing the writing. (Note to self: I do, too.)

Friday, July 24, 2009

Preferential Attachment at SIGIR 2009

Waiting for my plane yesterday at Boston Logan airport, I found myself staring out the window and reflecting on scale-free networks rather than attending to my e-mail as I should have been. The reflections were, of course, inspired by the keynote of Albert-László Barabási entitled "From Networks to Human Behavior" and consisted mostly of wondering why the properties of scale-free structure are perceived as unintuitive or unnatural. A conspiracy is improbable, but can it be that we are subtly taught to expect behaviors characteristic of random connections? Barabási used the word "democratic" in describing random networks, and one of the audience questions afterwards dealt with how to overcome potential isolation between nodes in scale-free configurations. Yes, they are out there, but how do we fix them? The answer was, to paraphrase: know it's there and work with it. I'd been standing in line with an audience question of my own, but I resumed my seat, deciding that it was basically covering the same ground: "Are the scale-free networks in and around us inherently irreconcilable with our notions of democratic forms of organization?"

Later that evening, in a semi-conscious effort not to always hang with exactly the same crowd, I fell in with a group of bloggers for an interval that would, unknown to me as it was happening, later be referred to as Day 2 post banquet. Face-to-face discussion with bloggers seems to have helped to counter my conviction that I am almost, but not quite, entirely unlike a blogger myself. Except for my apparently innate repugnance for preferential attachment. Work with it.
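For the record, the preferential attachment mechanism behind scale-free networks is easy to simulate. Below is a toy sketch of my own (not anything shown in the keynote): each new node links to m existing nodes chosen with probability proportional to their current degree, and a handful of hubs ends up with most of the links while the typical node stays near the minimum.

```python
import random

def preferential_attachment(n_nodes, m=2, seed=42):
    """Grow a network in which each new node attaches to m existing
    nodes picked with probability proportional to their degree."""
    rng = random.Random(seed)
    # Small fully connected seed network of three nodes.
    degree = {0: 2, 1: 2, 2: 2}
    # Sampling uniformly from this list of edge endpoints is
    # equivalent to degree-proportional sampling of nodes.
    endpoints = [0, 1, 1, 2, 0, 2]
    for new in range(3, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        degree[new] = 0
        for t in targets:
            degree[t] += 1
            degree[new] += 1
            endpoints.extend([new, t])
    return degree

deg = preferential_attachment(2000)
hub = max(deg.values())
median = sorted(deg.values())[len(deg) // 2]
print(hub, median)  # a heavily connected hub vs. a modest typical node
```

Nothing "democratic" about the outcome: the rich get richer by construction, which is exactly the intuition the audience question was probing.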

Monday, June 1, 2009

The Netbook Effect

"Give a laptop. Change the world." are the welcoming words of the website of One Laptop Per Child. A Wired article entitled "The Netbook Effect: How Cheap Little Laptops Hit the Big Time" (Wired, March 2009) tells the story of how the low cost no-frills netbook evolved from the One Laptop Per Child initiative and predicts that netbooks will constitute 12% of the laptop market in 2010.
It sort of takes my breath away that I might actually live to see One Laptop Per Child happen. It gives me hope that we might eventually move not only as individuals, communities and countries, but as an entire species beyond subsistence level concerns.
At the same time, another part of my brain is formulating questions about what happens as an increasing proportion of personal machines are "thin" in the sense that they have low processing power and low storage capacity. The vision of democratizing the supply of services with high computational loads by distributing them over private citizens with unused processing capacity may have to be abandoned. Do we want to encourage a future in which we must rely on a distant center to store and swap our video content? Do we want to close the door on the possibility of internet content search that is supported by a million modest contributions of storage space and cpu cycles? Are we ready to give up our private capacities and allow resource rich hands to further accumulate a power monopoly removed beyond influence of the individual?

Wednesday, May 27, 2009

CHORUS Conference in Brussels

CHORUS, the European Coordinated Action on search technology for audio-visual content, held its Final Conference in Brussels. PetaMedia and VideoCLEF were presented.

PetaMedia
Alan Hanjalic

VideoCLEF: Video Analysis and Retrieval Benchmark
Martha Larson, Gareth Jones

Friday, May 8, 2009

WIAMIS 2009 in London

Attended the International Workshop on Image Analysis for Multimedia Interactive Services, a.k.a. WIAMIS 2009. The PetaMedia task force organized a special session to showcase topics related to combining multimedia content analysis with user-contributed information and social network structure. We presented some initial work on combining speech-based indexing features with low-level visual information for improved video retrieval.
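As a rough illustration of the kind of combination involved, here is a sketch of generic score-level fusion (not the actual system we presented; the weight alpha, the min-max normalization and all scores below are my own illustrative choices):

```python
def minmax(scores):
    """Normalize a dict of per-video retrieval scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {vid: (s - lo) / span for vid, s in scores.items()}

def fuse(speech_scores, visual_scores, alpha=0.7):
    """Weighted linear combination of speech-based and visual scores."""
    s, v = minmax(speech_scores), minmax(visual_scores)
    return {vid: alpha * s.get(vid, 0.0) + (1 - alpha) * v.get(vid, 0.0)
            for vid in set(s) | set(v)}

# Hypothetical per-video scores: one set from a transcript-based text
# ranker, one from a low-level visual similarity measure.
speech = {"vid1": 12.0, "vid2": 3.0, "vid3": 7.5}
visual = {"vid1": 0.2, "vid2": 0.9, "vid3": 0.4}

fused = fuse(speech, visual)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

The normalization step matters because the two modalities produce scores on incommensurable scales; the linear weight is the simplest way to let one modality dominate while the other breaks ties.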

Thursday, April 9, 2009

ECIR 2009 in Toulouse

I've never attended the European Conference on Information Retrieval before, but this year I flew to Toulouse for ECIR 2009. My continued reflection on the larger implications of the properties of the error produced by speech recognition systems finally yielded some fruit...but it is still only a small window on a larger story.

Larson M., Tsagkias E., He J., de Rijke M., Exploring the Global Semantic Impact of Speech Recognition Errors on Spoken Content Retrieval, 31st European Conference on Information Retrieval (ECIR 2009).

Tsagkias E., Larson M., de Rijke M., Exploiting Surface Features for the Prediction of Podcast Preference, 31st European Conference on Information Retrieval (ECIR 2009).

Wednesday, April 1, 2009

Kill your blog?

A "rant" I sent to Wired in reaction to "Kill your Blog" (Wired Nov 2008 p. 27). It didn't get published there--it seemed only fitting that it should land here in the blogosphere:

"How did we come to so completely repress our dissatisfaction with mainstream search engines? You point a finger at blogging and bloggers, but the issues you raise could also be laid at the doorstep of Google. Dare to imagine a search engine that lets our voices be heard within the intimate internet communities important to us, a search engine that distills for us the pith of our posts, keeping pace with publication. My rant passed your magic 140 character Twitter limit in my second sentence--shouldn't internet search technology be addressing the challenge of making my opinion heard anyway?"