Thursday, July 28, 2016

How to use the word "subjective" in multimedia content analysis

Multimedia content analysis is devoted to the automatic processing of video, image, audio, and text content with the purpose of describing it, or otherwise associating it with information that will make it findable, and also useful, to users. Previously, I have urged multimedia content analysis researchers to avoid the word “subjective” and instead formulate their insights in terms of inter-annotator agreement with respect to the data that they are using and the protocol that they give to the annotators who are providing the target labels. Since we don’t seem to be inclined to stop using the word “subjective” soon, it makes sense to formulate some guidelines on how to use it "safely".
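To make "formulate your insights in terms of inter-annotator agreement" concrete, here is a minimal sketch of one common agreement statistic, Fleiss' kappa, in plain Python. The function name, the example labels, and the choice of Fleiss' kappa are assumptions made for illustration; other statistics (for example, Krippendorff's alpha) may suit your data and protocol better.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per annotator).

    Assumes every item was labeled by the same number of annotators (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    category_totals = Counter()
    observed_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Fraction of annotator pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed_sum += agreeing_pairs / (n_raters * (n_raters - 1))

    observed = observed_sum / n_items
    total_labels = n_items * n_raters
    # Agreement expected by chance, given how often each label was used overall.
    expected = sum((c / total_labels) ** 2 for c in category_totals.values())
    if expected == 1.0:
        # Only one label was ever used; kappa is undefined, this sketch returns 1.0.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: four music clips, three annotators each.
example_ratings = [
    ["happy", "happy", "happy"],
    ["happy", "sad", "happy"],
    ["sad", "sad", "sad"],
    ["sad", "happy", "sad"],
]
print(round(fleiss_kappa(example_ratings), 3))
```

A statement such as "annotators following protocol P reached a kappa of K on dataset D" tells the reader what was measured, which the bare word "subjective" does not.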

Best practice for the use of the word “subjective”: When the word "subjective" is used, it should first be defined.

The word "subjective" has different definitions. It’s not particularly productive to fix any one way of using it as “the only right way”. Instead, when using the word "subjective" you should simply declare which definition you are using, and you will avoid a lot of unproductive confusion. You do not want to risk that you use “subjective” in one sense, and your reader/listener interprets it in another sense.


We can gain further understanding of why it is important to "define well before use" by examining the dictionary entry for “subjective” provided by Merriam-Webster. Here, you can see the many meanings that “subjective” can take on. I haven’t observed any issues caused by definitions 1 or 2. Multimedia content analysis research is generally not interested in these definitions. Where we get into trouble is with 3-5, so I will focus on these.

Let’s start with definition 4c: “arising out of or identified by means of one's perception of one's own states and processes”. This definition of subjective is related to the conceptualization of a situation as being exclusively determined by the point of view of the “subject”, i.e., the person who is undergoing the experience of perceiving something.

Such a conceptualization, in the case of certain situations, is standard, and when we communicate with each other, we don’t even think about the fact that we assume it.  Let’s take a closer look at how this conceptualization works. When we use language, we rely on an unspoken agreement that certain phenomena (for example, the emotion that music evokes in a person) are subjective. Specifically, the agreement means that the way in which we understand the world gives all listeners the power to determine what they feel when listening to music (i.e., induced emotion) for themselves.

Simply stated: if someone says, “This music makes me so happy”, it is nonsensical for me to assert, “No, it doesn’t”. I might say this to tease someone, but it is clear that I am not using language in a standard way. An emotion felt while listening to music can only be asserted by the subject, and I, who am not in the subject’s mind, do not have the power to originate a meaningful statement on the matter. It is not a trivial point: Without this shared understanding, the convention/assumption of subjectivity behind "This music makes me happy", the function of language would break down and we would have failed to communicate.

Here’s where things can go wrong for a researcher working in the area of multimedia content analysis. Imagine you are collecting multimedia content labels from a group of annotators who are judging the content, and at the end of the experiment you declare, “The results show that the phenomenon we are studying is subjective”. Readers who are using definition 4c of subjective will find this conclusion invalid. The reason is that under this definition, “subjective” is something that is established ahead of time by convention: it cannot be determined experimentally. (Full disclosure: for me this is the preferred definition of "subjective", because it is the most literal interpretation. The word "subjective" contains the word "subject". I also prefer it since it ensures the sanctity of the private world of the individual, and the right of the individual to an independent voice.)

Moving to 4b: “arising from conditions within the brain or sense organs and not directly caused by external stimuli”. This definition is not so interesting for multimedia researchers: we study multimedia content, which is an external stimulus.

Now, we go on to definition 4a (1): “peculiar to a particular individual: personal”. This definition of subjective is related to the idea that each individual has their own unique view. (Merriam-Webster's Definition 1 of "peculiar" is "characteristic of only one person, group, or thing".) Under this definition, if something is "subjective", it means that everyone disagrees with everyone else. This definition is also not so interesting for multimedia researchers: if everyone has their own completely different interpretation, then we are lost: we cannot hope to build algorithms that generalize over the different meanings that people find in multimedia. Until the field of multimedia starts working extensively on systems used only by a single person, this definition of subjective is probably not one that will be used often.

Note that the field of recommender systems strives to develop personalized algorithms, and user evaluation methodologies that assess whether personal predictions are successful. However, even recommender systems rely on the fact that people are similar to each other. In a world populated exclusively with utterly unique individuals, collaborative filtering algorithms will necessarily fail.

More helpful is definition 4a (2): “modified or affected by personal views, experience, or background”. This definition of "subjective" is often implicitly assumed in multimedia content analysis. People’s interpretations are affected by what they know, the opinions they hold, and the life experience that they have had. These factors can lead to there being a multitude of different interpretations that apply to certain multimedia content. However, in contrast to the situation above with definition 4a (1), we are not assuming that everyone has their own “peculiar” interpretations. It makes sense for us to try to create systems that generalize or predict meaning only if we are not dealing with exclusively unique interpretations.

We can see 4a (2) as closely related to 3b: “relating to or being experience or knowledge as conditioned by personal mental characteristics or states”

With both of these definitions, 4a (2) and 3b, we can reasonably have hope that we can find islands of consistency in the perceptions of users of multimedia (and in the labels of our annotators). Within these islands we can make stable inferences that will be useful to users.

Let’s check again if, under these definitions, you can make a statement in your paper, “The results show that the phenomenon we are studying is subjective”. This time you can. But in order to do so, you need to have an experiment that shows that the background of the users is what is causing your classifier not to give you stable predictions. Otherwise, it might be the case that your classifier just has not been well designed or trained.
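One way to start gathering this kind of evidence is to check whether annotators who share a background agree with each other more than annotators who do not. The following is a minimal sketch using a permutation test; the function names, the background groups ("musician", "non-musician"), and the labels are all hypothetical. Note that it only tests whether label disagreement is associated with background. It does not by itself show causation, and it says nothing about whether the classifier was well designed or trained.

```python
import random
from itertools import combinations


def pairwise_agreement(a, b):
    """Fraction of items on which two annotators chose the same label."""
    if len(a) != len(b):
        raise ValueError("annotators must label the same items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def within_minus_between(labels, groups):
    """Mean agreement of same-group annotator pairs minus that of mixed pairs."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        score = pairwise_agreement(labels[i], labels[j])
        (within if groups[i] == groups[j] else between).append(score)
    if not within or not between:
        raise ValueError("need both same-group and mixed-group pairs")
    return sum(within) / len(within) - sum(between) / len(between)


def permutation_p_value(labels, groups, n_permutations=2000, seed=0):
    """Share of random group assignments with a gap at least as large as observed."""
    rng = random.Random(seed)
    observed = within_minus_between(labels, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # keeps group sizes, breaks the link to the labels
        if within_minus_between(labels, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical data: eight annotators, six music clips each.
labels = [
    ["calm", "calm", "tense", "tense", "calm", "tense"],
    ["calm", "calm", "tense", "tense", "calm", "tense"],
    ["calm", "calm", "tense", "tense", "calm", "calm"],
    ["calm", "calm", "tense", "tense", "calm", "tense"],
    ["calm", "tense", "tense", "calm", "calm", "tense"],
    ["calm", "tense", "tense", "calm", "calm", "tense"],
    ["calm", "tense", "tense", "calm", "tense", "tense"],
    ["calm", "tense", "tense", "calm", "calm", "tense"],
]
groups = ["musician"] * 4 + ["non-musician"] * 4

gap, p_value = permutation_p_value(labels, groups)
print(f"agreement gap: {gap:.3f}, permutation p-value: {p_value:.3f}")
```

A large gap with a small p-value suggests that background is related to how people label the content. A gap near zero suggests that the disagreement has some other source, such as the protocol or plain noise.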

You also need to provide evidence that the protocol that your annotators are using to make judgements is not unduly steering people to diverse interpretations. Your protocol should put people reasonably on the same page and then ask them for judgements, at all times being careful not to ask "leading" questions, cf. [1, 2]. For some research work, you might not be using a protocol. Many tasks involve "found" labels such as tags. In this case, you need to state the assumptions that you are making concerning the original labeling context, including the reasons for which the labels were assigned.

With any definition of subjective, it is important to strictly avoid arguing along these lines: “This phenomenon is subjective, and therefore it is not important and we should not be studying it.” 

Scientifically, there is no a priori reason to prioritize the “objective” over the “subjective” if we use definitions 4a (2) and 3b. It is true that we tend to study phenomena with high inter-annotator agreement since these are easier to get a handle on. However, at the same time we remain aware that this tendency steers us dangerously close to the famous story of Nasreddin Hodja, who looks for his ring outside, since it is too dark inside where he lost it. In short, define “subjective”, but never use it as an excuse for failure or avoidance.

To drive that particular point home: The message is "Keep up your guard". Your problem should arise from the needs of users. Practically speaking, the problem you choose will be influenced by your ability to access the resources needed to study it, including carrying out a well-designed, conclusive experiment. It will not, however, be influenced by your personal decision that something is "subjective".

Next, we turn to definition 3a: “characteristic of or belonging to reality as perceived rather than as independent of mind.” Using this definition is dangerous. It forces you to take a position on the difference between effects that are real, and effects that are imagined. As scientists, we determine this difference experimentally. We do not presume it. Unless we are undertaking experiments directed at establishing this difference, it makes sense to steer clear of this definition.

Finally, definition 5: “lacking in reality or substance”. The same comment applies as in the case of definition 3a. We cannot a priori say whether patterns that can be found in multimedia content lack reality or substance. If we don’t find evidence for the reality of some phenomenon in our data, it simply means that there is no evidence for its reality in our data. Lack of observation does not disprove existence. We must guard ourselves against jumping to conclusions. Again, this is a definition to be avoided, unless you are actually directly investigating the nature of reality.

As researchers in the area of multimedia content analysis, we must carefully keep ourselves from creating our own realities: the reality we assume must be the reality (possibly multiple realities) of the users that we serve, all of them. The fact that we do not necessarily understand this reality fully, or have the type of information or data that would capture it in its complexity, richness, and continuous, rapid evolution, is a challenge that we face. This challenge is inherent to the types of algorithms and technologies that we design and develop.

[1] Larson, M., Melenhorst, M., Menéndez, M. and Peng Xu. Using Crowdsourcing to Capture Complexity in Human Interpretations of Multimedia Content. In: Ionescu, B. et al. Fusion in Computer Vision - Understanding Complex Visual Content, Springer, pp. 229-269, 2014.

[2] M. Riegler, V. R. Gaddam, M. Larson, R. Eg, P. Halvorsen and C. Griwodz, "Crowdsourcing as self-fulfilling prophecy: Influence of discarding workers in subjective assessment tasks," 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), Bucharest, 2016, pp. 1-6.