Saturday, December 24, 2011

Peaceable Kingdom: Snowflake button for YouTube snow

A post that mixes the magic and delight of the holiday season with multimedia information retrieval? Let's try it and see what happens.

Over the past couple of weeks, holiday cards have been dropping through the mail slot in the front door -- but emails have also been entering my inbox: greetings, photos, and yes, also videos. This morning it was an email with a greeting and a link to a music video, "Peaceable Kingdom".

I watched the video for a while and pondered its relationship with Christmas: The music is melodious and soothing, and the lyrics take the listener to the manger to make the connection with the adoring state of mind of those who gathered there the first Christmas Eve. Unexpected minor cadences highlight that this is no usual Christmas carol and invite consideration of the multiplicity of the Christmas experience -- how the holiday itself integrates traditions preceding Christianity and how, as each new group and generation reinvents it for their own spirit and needs, it will continue to develop into some future Christmas. From the perspective of the here and now, that future Christmas could seem full of sweetness, hope and light, but also distorted and distinctly pagan.

Of course, the strongest signal I get from the video is that of Margaret Atwood's dystopic visions. I haven't read The Year of the Flood, but what has been written and said about the book has so fascinated and disturbed me that the existence of the book itself as a text seems somehow less important to me -- the setting is already so palpable that what it tells is, in a way, no longer left to be said.

In the end, maybe my personal Christmas feeling associated with the video is that it gives me a chance to spend some time feeling close to the person who sent it to me. The strength of this feeling of connection goes beyond my reflections on meta-text usurping text or on the length of time that has passed since I last sat down and read a worthwhile book not related to work -- indeed, it exists in a completely different life dimension.

Where is the multimedia information retrieval tie-in? Well, first, as a result of this video it has occurred to me for the umpteenth time that we need a verb other than "watch" to describe this kind of interaction with a video. It's a music video, so I am mainly listening to it and only then looking at the visual stimuli. There could potentially be rather large changes in the visuals -- different pictures, different editing -- and these changes could possibly leave my watching experience largely untouched. I would argue that, if I were only "watching", these elements would necessarily have a major defining impact on my experience. They don't. Here, I am rather "watch/listening", which I suppose could give us the new concept of "wistening".

There's a second tie-in as well: There is a little snowflake in the player bar, which I discovered after "wistening" for a while. I usually find snowflake icons ambiguous, especially on climate control units in strange hotel rooms -- do I turn the setting to "snowflake" if it's cold outside, or is the "snowflake" setting going to cause the system to start producing cool air? I've encountered both. So I've learned just to click on the snowflake and see what happens...

I clicked.

And lo and behold it started snowing. Right into the Peaceable Kingdom -- flakes floating down slowly -- different sorts of flakes at different speeds -- and accumulating at the bottom of the frame. I felt the smile spread on my face -- and grow wider as I realized that I was witnessing one little bit of a sort of world-wide holiday miracle, as people in front of screens around the planet discover that you can make it snow on YouTube. I thought about people watching this on their laptops and tablets, using the mouse to play a bit in the snow and then gathering their friends, colleagues, family around their screens in one big Christmas "You gotta check this out!"

Apparently, you can't do this to every video, and this is where it really starts getting interesting to me. How did YouTube decide which videos to add this feature to? There must have been some multimedia classification algorithm that maybe looked for keywords in the title and description, and for something like music in the audio channel or colors in the visual channel, and combined this with the upload date -- and then enabled "snow" for this video.
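To make that guess a bit more concrete, here is a minimal sketch of what such a classifier could look like. It is purely an illustration of the speculation above, not a description of what YouTube actually does. Everything in it is an assumption: the keyword list, the `Video` fields (`music_score` and `cool_color_ratio` stand in for the outputs of some audio and visual analysis that is not shown), the weights and the threshold.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword list, invented for this sketch.
SEASONAL_KEYWORDS = {"christmas", "holiday", "snow", "winter", "carol", "manger"}


@dataclass
class Video:
    title: str
    description: str
    music_score: float       # assumed output of an audio classifier, in [0, 1]
    cool_color_ratio: float  # assumed share of white/blue frames, in [0, 1]
    upload_date: date


def keyword_score(video: Video) -> float:
    """Text evidence: seasonal words in title and description, capped at 1."""
    words = set((video.title + " " + video.description).lower().split())
    hits = len(words & SEASONAL_KEYWORDS)
    return min(hits / 2.0, 1.0)


def season_score(video: Video) -> float:
    """Metadata evidence: 1 if uploaded in November or December, else 0."""
    return 1.0 if video.upload_date.month in (11, 12) else 0.0


def snow_score(video: Video, weights=(0.4, 0.2, 0.2, 0.2)) -> float:
    """Late fusion: weighted sum of the per-modality scores (weights are guesses)."""
    features = (
        keyword_score(video),
        video.music_score,
        video.cool_color_ratio,
        season_score(video),
    )
    return sum(w * f for w, f in zip(weights, features))


def enable_snow(video: Video, threshold: float = 0.5) -> bool:
    """Offer the snowflake button when the fused score reaches the threshold."""
    return snow_score(video) >= threshold


carol = Video("Peaceable Kingdom Christmas carol", "A song for the holiday season",
              music_score=0.9, cool_color_ratio=0.3, upload_date=date(2011, 12, 1))
pasta = Video("Cooking pasta", "Quick dinner",
              music_score=0.1, cool_color_ratio=0.1, upload_date=date(2011, 7, 4))
print(enable_snow(carol), enable_snow(pasta))
```

In a real system the weights would be learned from data rather than set by hand, which is exactly where the machine learning comes in.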

I want to make these kinds of algorithms! How do we put together everything that we know how to do in terms of multimodal video processing and machine learning, and figure out for which videos it needs to be able to snow?

And it's not just snow. There are other ways in which this could go -- and should go -- it has the potential to cause so much joy. I am sitting here "wistening" and thinking about friends and family and playing in the snow, but it's clear that we need to go beyond "wistening" and we need a verb for watching+listening+reflecting+playing. It's also clear that we need the technologies that support these activities. Imagine a search engine that can find videos that are appropriate for 'snow': that goes so far beyond user information needs as they are currently conceptualized for multimedia that it sort of takes your breath away.

How to enable the multimedia community to work at these new (from the perspective of this moment, utterly fantastic) frontiers?

The key to doing work in this direction is evaluating it. How do we know if we were right in presenting the snow option for a given video? YouTube is probably analyzing its interaction logs at this very moment. But I hate to think that I need to go to work for YouTube in order to ever be able to do the evaluation necessary to write a paper on this topic. Everyone loves the snow, so everyone should be able to work on making it better.
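As a rough sketch of what such an evaluation could look like: suppose we had a set of videos for which the snow option was offered, and a set of videos that people judged snow-worthy (for instance via interaction logs or manual labels, neither of which is available here). Standard precision and recall would then be a natural starting point. The video IDs below are made up for illustration.

```python
def precision_recall(predicted, relevant):
    """Precision and recall of the 'snow offered' set against the 'snow-worthy' set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs: snow was offered for a, b, c; judges wanted it for b, c, d, e.
offered = {"a", "b", "c"}
snow_worthy = {"b", "c", "d", "e"}
precision, recall = precision_recall(offered, snow_worthy)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The hard part, of course, is not the arithmetic but getting hold of the judgments in the first place.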

Note to self qua New Year's resolution: Keep up commitment to evaluation -- we need it to push ourselves forward into the unknown in a meaningful way. Maybe it's what actually makes the difference between what we call computer science and what we call art. But I'll leave that thought to another day.

In the meantime, the overall conclusion is that holidays and multimedia information retrieval do indeed mix well in a blog post. So happy holidays (and enjoy the video):