Saturday, August 24, 2013

What is multimedia?

These days a mirrored ceiling is an unambiguous call to take a photo.
Yesterday evening, after the conclusion of a very successful First Workshop on Speech, Language, and Audio in Multimedia (SLAM 2013) in Marseille, participants drifted to various scenic spots for debriefing, including the Vieux Port. There, over a glass of pastis (predictable, given the locale), the conversation naturally moved to the question, "What is multimedia?"

One obvious answer to the question is the Wikipedia definition of multimedia: "Multimedia is media and content that uses a combination of different content forms". The classic example is, of course, video, which has a visual modality and also an audio modality. Other examples include social images (for example, images on Flickr have a visual modality, but also have tags and geo-tags) and podcasts (which have an audio modality and also a textual modality included in their RSS feeds).

One can argue about that answer, for example, by pointing out that some people define multimedia as any non-text media, such as an image like the one above. The image was taken at the new events pavilion in the Vieux Port in Marseille. The events pavilion is basically a set of columns with a plane laid on top of them. The bottom of the plane is shiny, so that when you stand under it you are looking up at a ceiling that is a large mirror.

In my view, an image in isolation cannot be taken to be multimedia, since it includes a single medium, namely pixels. It becomes multimedia in conjunction with this blog post, which adds an additional medium, namely text.

Another dimension of the debate on multimedia is whether it must necessarily involve human communication. The combination of this image and this blog post was created by me with the intent to communicate a message to a certain audience, i.e., the readership of my blog (which, as I have previously mentioned, is largely a few fellow researchers in conjunction with future instantiations of myself).

Researchers who share my view of multimedia insist on the point that multimedia must contain a message. It must come into being as an act of communication and also be consumed in a process that involves the interpretation of meaning. This definition excludes a set of geo-tagged surveillance videos from being multimedia, even though the set involves two different media, namely video content and geo-coordinates.

Note that when you require multimedia to contain a message created with explicit human intent, you step onto a bit of a slippery slope. If a human being had set up the surveillance cameras with the specific intent of creating a body of information that would give fellow humans information on current street conditions, then we are back to multimedia.

The slipperiness of the definition of multimedia reveals something very important about our field. In order to know whether or not something is multimedia, it is not sufficient to examine the data itself. It is also necessary to look at the production and consumption chain. Multimodal data in some contexts remains just data, but if an act of encoding and decoding meaning is involved, then the same multimodal data must be considered to be multimedia.

A report on SLAM 2013 has appeared in SIGMM Records. The SLAM 2014 website went online immediately after SLAM 2013 and the anticipation is already building.