Wednesday, January 2, 2013

Brave New Tasks: Incubating Innovation in the MediaEval Multimedia Benchmark

An important innovation of MediaEval 2012 was the "Brave New Task" track. MediaEval is a multimedia benchmark that offers and promotes challenging new tasks in the area of multimedia access and retrieval. We focus on tasks that emphasize multiple modalities ("The 'multi' in multimedia") and that have social and human relevance.

Brave New Tasks were introduced because we noticed that there is rather a tension between benchmark evaluation and innovation. Benchmarking is essentially a conservative activity: we want to compare algorithms on the same task using the same data set and the same evaluation procedure. This sameness allows us to chart progress with respect to the state of the art, especially over the course of time. How do we innovate, when the key strength of benchmarking is that we repeatedly do the same thing?

We innovate by tackling new problems. However, in order to create a successful benchmarking task from a new problem, a number of questions must be answered. Is the problem suitable for evaluation in a benchmark setting? What sort of data is needed to evaluate solutions developed by benchmark participants? How much effort is needed to create ground truth? Do we need to refine our definition of the task and of the evaluation procedure? Is there an actual chance that algorithms can be developed to solve the task and what resources are needed? Is there a critical mass of interest in solving this problem? Are the solutions appropriate for application?

The easy way forward would be to insist that there are clear answers to all of these questions prior to running a task. In some cases, it will be possible to gather the answers. In others, however, it will not. Forcing tasks to have answers before attempting to create a benchmark poses a serious risk that researchers will avoid the truly challenging and innovative tasks because they receive the message that they need to "play it safe."

People in the MediaEval community rattle their swords and shields when they are told that they need to "play it safe." Brave New Tasks support innovation in MediaEval by incubating tasks in their first year, allowing the task organizers to answer these questions. We value the advantages that the conservative aspects of benchmarking bring to the community, but we also thrive by taking risks. The Brave New Task track is a lightly protected space that allows us to take the risks that allow our benchmark to continue to renew itself.

Because people have asked me about Brave New Tasks in MediaEval 2013 and "How did you do it?" I am providing here a more detailed description of how it works and how we anticipate that it will develop in 2013:

To start, let me write a few words about what makes a "mainstream" task in MediaEval (i.e., a task that is not a Brave New Task). At the end of the calendar year, MediaEval solicits proposals from teams who are interested in organizing a task in the next MediaEval season. For an example, see the MediaEval 2013 call for task proposals. Whether a proposal is accepted as a MediaEval task depends on the interest expressed on the MediaEval survey. The survey is published in the first days of January and circulated widely to the larger research community.

During the survey, task proposers gather information on who is interested in carrying out their tasks. By the time the survey concludes, the proposers must have promises from five core participants (who are not themselves organizers) who will cross the finish line of the task (including submitting results, writing the working notes paper, and attending the MediaEval workshop) come "hell or high water". This selection criterion is set up so that we have a minimum number of results to compare across sites for any given task: if there are only one or two, we don't get the "benchmark effect".

Tasks that the community finds interesting and promising, but that do not necessarily meet these stringent selection criteria, can be selected as Brave New Tasks. The difference between a Brave New Task and other MediaEval tasks is that these tasks are new, and ideally also scientifically risky (in the responsible sense of "risky").

Brave New Tasks are run "by invitation only". The "invitation only" clause does not make the task exclusive: anyone who asks the task organizers can be granted an invitation. Instead, the clause allows the tasks to handle unexpected situations by, if necessary, decoupling their schedules from the main task schedules to accommodate unforeseen delays in data set releases. Participants of past editions of MediaEval will recognize the usefulness of a mechanism that makes the benchmark robust to unexpected situations.

Further, Brave New Tasks do not require their participants to submit working notes papers or attend the workshop. The "only" requirement that the task must fulfill is to contribute an overview paper in the MediaEval working notes proceedings that sums up the task and presents an outlook for future years. One or more of the organizers attends the workshop to make the presentation and participate in the discussion about whether the task should target developing into a mainstream task in the next year.

A "Brave New Task" is encouraged to go far beyond the minimum requirement. In fact, 2012 saw one of the Brave New Tasks, "Search and Hyperlinking", achieve the scope of a mainstream task, with six working notes papers from task participants appearing in the MediaEval 2012 working notes proceedings. The task was effectively indistinguishable from mainstream tasks in its contribution to the benchmark.

In 2013, we plan to strengthen the Brave New Task track by providing the tasks with more central support. The tasks will be run under the same infrastructure as the mainstream tasks and decoupled from the schedule only if it is absolutely necessary. They will also be given the option of using the central registration system.

Brave New Tasks have been a successful innovation in MediaEval 2012 and one that we hope to strengthen in the future. I'd like to end by pointing out that it is not so much the "rules" of Brave New Tasks that have made them such a success, but rather the efforts of the Brave New Task organizers. Success is dependent on having a group of devoted researchers with a vision for a new task idea and the capacity and stamina to see it through the first year...including long hours spent reading related work, developing new evaluation metrics (if necessary), contacting and following up with participants, collecting data and creating ground truth. It is not so much the tasks themselves that are brave, but the organizers who are fearless and relentless in their pursuit of innovation.

Forward, charge!