Back in June, I gave a talk at the Communication Science Department here at Radboud University Nijmegen. Today, I presented a version of that talk to my colleagues in the Language and Speech Technology Research Meeting. The abstract is below together with the slides, which are on SlideShare. During the discussion it became clear that many problems in natural language processing and information retrieval face the same issue: people interpret content differently. It is important to find ways to move forward, even though it may not be possible to pack our challenges into neat classification or ranking problems with a single set of consensus ground truth labels. One way forward is to look to other disciplines for theories of how people understand and use media, and let these inform what we design our systems to do and the ways that we measure success.
Within computer science, "Multimedia" is a field of research that investigates how computers can support people in communication, information finding, and knowledge/opinion building. Multimedia content is defined broadly. It includes not only video, but also images accompanied by text and other information (for example, a geo-location). It can be professionally produced, or generated by users for online sharing. Computer scientists historically have a “love-hate” relationship with multimedia. They “love” it because of the richness of the data sources and the wealth of available data, which leads to interesting problems to tackle with machine learning. They “hate” it because multimedia is a diffuse and moving target: the interpretation of multimedia differs from person to person, and changes over time as it is used as a communication medium. This talk gives an overview of ongoing research on multimedia information retrieval algorithms, which help people find multimedia. We look at a series of topics that reveal how pattern recognition, text processing, and crowdsourcing tools are used in multimedia research, and discuss both their limitations and their potential.