Friday, September 2, 2011

MediaEval 2011: Reflections on community-powered benchmarking

The 2011 season of the MediaEval benchmark culminated with the MediaEval 2011 workshop that was held 1-2 September in Pisa, Italy at Santa Croce in Fossabanda. The workshop was an official satellite event of Interspeech 2011.

For me, it was an amazing experience. So many people worked so hard to organize the tasks, to develop algorithms and also to write their working notes papers and prepare their workshop presentations. I ran around like crazy worrying about logistics details, but every time I stopped for a moment I was immediately caught up in the amazement of learning something new, or of realizing that someone had pushed a step further on an issue where I had been blocked in my own thinking. There's a real sense of traction -- the wheels are connected with the road and we are moving forward.

I make lists of points that are designed to fit on a PowerPoint slide and to succinctly convey what MediaEval actually is. My most recent version of this slide states that MediaEval:
  • ...is a multimedia benchmarking initiative.
  • ...evaluates new algorithms for multimedia access and retrieval.
  • ...emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context.
  • ...innovates new tasks and techniques focusing on the human and social aspects of multimedia content.
  • ...is open for participation from the research community.
I make these lists and they capture the external reality of what we do, but actually I have no real understanding of how MediaEval works -- of how exactly the traction arises.

At the workshop I attempted to explain it with a bunch of circles drawn on a flip chart (image above). The circles represent people and/or teams in the community. A year of MediaEval consists of a set of relatively autonomous tasks, each with its own organizers. Starting in 2011, we also required that each task have five core participants who commit to crossing the finishing line on the task. Effectively, the core participants started playing the role of "sub-organizers", supporting the organizers by doing things like beta testing evaluation scripts.

This setup served to distribute the work and the responsibility over an even wider base of the MediaEval community. Although I do not know exactly how MediaEval works, I have the impression that this distribution is a key factor. I am interested to see how this configuration develops further next year.

MediaEval has the ambitious aim of quantitatively evaluating algorithms that have been developed at different research sites. We would like to determine the most effective methods for approaching multimedia access and retrieval tasks. At the same time, we would like to retain other information about our experience. It is critical that we do not reduce a year of a MediaEval task to a pair (winner, score). Rather, we would like to know which new approaches show promise. We would like to know this independently of whether they are already far enough along to show improvement in a quantitative evaluation score. In this way, we hope that our benchmark will encourage and not repress innovation.

I turned from trying to understand MediaEval as a whole to trying to understand what I do. Among all the circles on this flip chart, I am one of the circles. I am a task organizer and a participant (time permitting), and I also play a global glue function: coordinating the logistics.

The MediaEval 2012 season kicks off with one of the largest logistics tasks: collecting people's proposals for new MediaEval tasks, making sure that they include all the necessary information and a good set of sub-questions, and getting them packed into the MediaEval survey. It is on the basis of this survey that we decide the tasks that will run in the next year. We use the experience, knowledge and preferences of the community in order to select the most interesting, most viable tasks to run in the next year and also to decide on some of the details of their design.

Five years ago, if someone had told me I would be editing surveys for the sake of advancing science, I would have said they were crazy. Oh, I guess I also ordered the "mediaeval multimedia benchmark" T-shirts. That's just what my little circle in the network does.

Let's keep moving forward and find out where our traction lets us go.