N-grams: Where's Wikipedia?

Tuesday, September 28, 2010

Where's Wikipedia?

The ACM Multimedia Grand Challenge is a high-adrenaline event where researchers from the Multimedia community compete against each other to develop the best solutions to problems posed by industry. For example, Google formulated two challenges, Video Genre Classification and Personal Diaries, in this year's competition.

Today in Tokyo at Interspeech 2010, I stopped to chat with last year's Grand Challenge winner, who is competing once again this year. I was struck anew by the realization that in the pressure-cooker of the Grand Challenge, creativity, raw intelligence, technical competence, competitive drive and off-beat thinking give rise to lines of attack that might never have emerged in a traditional R&D setting. Such solutions stand to benefit us all.

But is it really only industry who should be formulating the challenges for such competitions? Where, for example, is Wikipedia? If there is any major player in the Internet information arena that deserves a crowd-sourced solution from the research community, it is Wikipedia, the knowledge resource homegrown by collaborative effort.

Wikipedia does truly inspire the research community. Very recently I've witnessed up close how fired up scientists get about Wikipedia. The Tribler team, who sit on the ninth floor of our building, have been sinking unbelievable time and effort into the development of the Swarmplayer V2.0. Their dedication is inspiring and their incredible belief in the power of a distributed solution for videos on Wikipedia is infectious.

Datasets from Wikipedia have been used by multiple benchmarking initiatives such as ImageCLEF and INEX as well as in MediaEval, the benchmark I co-ordinate. We certainly enjoyed coming up with our own Wikipedia-related task. However, it would be great to hear directly from the Wikimedia Foundation, in the form of a Grand Challenge, what problems they see on the horizon in the next 2-5 years for which the research community could be helpful in generating solutions. The Challenge takes the form of a simple textual description of the problem and researchers do the rest, presenting the solution in the form of a system or system demo and a paper describing it.

There's a lot out there of course that I don't know about. For example, I just read this post on the ECML PKDD 2010 Data Challenge: Measuring Web Data Quality. But I've never seen a clear Challenge originating from the Wikipedia community and published for the research community.

One aspect that researchers need to think seriously about, however, is the form in which solutions for Wikipedia or developed using Wikipedia data are published. ACM Multimedia Proceedings are not an open access publication. It's a contradiction to carry out research on a free knowledge resource and publish results under conventional copyright. Peer-reviewed open access journals such as the Journal of Digital Information should be preferred when publishing results obtained using Creative Commons licensed data.

Maybe that's actually one Challenge that the Wikimedia Foundation has to offer the research community: challenging us to break the habit of creating solutions in a rush of creative joy and technical muscle, and then publishing them where they cannot be accessed by everyone.
