Saturday, August 13, 2011

Human computational semantics

What I termed "Human computational relevance" in my previous blog post is probably more appropriately termed "Human computational semantics". The model in the figure in that post can be extended in a straightforward manner to accommodate "Human computational semantics". The model involves comparing multimedia items (again within a specific functional context and a specific demographic) and assigning them a pair-wise similarity value according to the proportion of human subjects that agree that they are similar.

Fig. 1: The similarity between two multimedia items is measured in terms of the proportion of human subjects within a real-world functional context and drawn from a well-defined demographic that agree that they are similar. I claim that this is the only notion of semantic similarity that we need.
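To make the definition concrete, here is a minimal sketch of how such a similarity value could be computed from raw judgments. The input format (one tuple per subject and pair) and the function name `pairwise_similarity` are my own assumptions for illustration; the sketch also assumes that all judgments were collected within one functional context and one demographic.

```python
from collections import defaultdict

def pairwise_similarity(judgments):
    """Return, for each unordered pair of items, the proportion of subjects
    who judged the two items to be similar.

    judgments: iterable of (item_a, item_b, agrees) tuples, one per human
    subject and pair (hypothetical format; agrees is True or False).
    """
    counts = defaultdict(lambda: [0, 0])  # pair -> [agree count, total count]
    for item_a, item_b, agrees in judgments:
        pair = frozenset((item_a, item_b))  # order of the two items is ignored
        if agrees:
            counts[pair][0] += 1
        counts[pair][1] += 1
    return {pair: yes / total for pair, (yes, total) in counts.items()}

# Three subjects judge (v1, v2); one subject judges (v1, v3).
example = [
    ("v1", "v2", True),
    ("v2", "v1", True),
    ("v1", "v2", False),
    ("v1", "v3", False),
]
similarity = pairwise_similarity(example)
```

Note that nothing here refers to a global category system: each value is defined only by the judgments collected for that one pair.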

I hit the ceiling when I hear people describe multimedia items as "obviously related" or "clearly semantically similar". The notion of "obvious" is necessarily defined with respect to a perceiver. If you want to say "obvious", you must necessarily specify the assumption you make about "obvious to whom". Likewise, there is no ultimate notion of "similarity" that is floating around out there for everyone to access. If you want to say "similar", you must specify the assumption that you make about "similar in what context."

If you don't make these specifications, then you are sweeping an implicit assumption right under the rug, and it's sure to give you trouble later. It's dangerous to let ourselves lose sight of our unconscious assumptions about who our users are and what the functional context actually is in which we expect our algorithms to operate. Even if it is difficult to come up with a formal definition, at least we can remind ourselves how slippery these notions can be. It seems that we as humans naturally like to emphasize universality and our own commonality, and that in most situations it's difficult to really convince people that "obvious to everyone" and "always similar" are not sufficiently formalized characterizations to be useful in multimedia research. However, in the case of multimedia content analysis the risks are too great and I feel obliged to at least try.

A common objection to the proposed model runs as follows: "So then you have a semantic system that consists of pairwise comparisons between elements; what about the global system?" My answer is: The model gives you local, example-based semantics. The global properties emerge from local interactions in the system. We do not require the system to be globally consistent; instead, we gather pairwise comparisons until a useful level of consistency emerges.
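One way to watch a "useful level of consistency" emerge is to track how often local judgments happen to agree with each other. The sketch below is only one possible measure and is my own illustration, not part of the model: it counts chains (a is similar to b, and b is similar to c) and reports the fraction for which a is also similar to c. The 0.5 threshold and the choice to treat unjudged pairs as "not similar" are assumptions.

```python
from itertools import combinations

def transitivity_rate(similarity, threshold=0.5):
    """Fraction of chains (a~b and b~c) for which a~c also holds.

    similarity: dict mapping frozenset({x, y}) to a proportion in [0, 1].
    Pairs with no judgments are treated as not similar (an assumption).
    Returns None if there are no chains to evaluate.
    """
    def similar(x, y):
        return similarity.get(frozenset((x, y)), 0.0) >= threshold

    items = sorted(set().union(*similarity.keys())) if similarity else []
    chains = closed = 0
    for a, b, c in combinations(items, 3):
        # Each of the three items can play the role of the middle element.
        for x, mid, y in ((a, b, c), (b, a, c), (a, c, b)):
            if similar(x, mid) and similar(mid, y):
                chains += 1
                if similar(x, y):
                    closed += 1
    return closed / chains if chains else None

judged = {
    frozenset(("a", "b")): 0.9,
    frozenset(("b", "c")): 0.8,
    frozenset(("a", "c")): 0.2,
}
rate = transitivity_rate(judged)
```

A low rate is not an error in the model. It simply says that, for this context and demographic, similarity is not behaving like a global category system, and it is up to the application to decide how much consistency is enough.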

Our insistence on a global semantics, I maintain, is a throwback to the days when we had only conventional books to store knowledge. Paper books are necessarily linear, necessarily of a restricted length and have no random access function. So, we began abstracting and organizing and ordering to pack human understanding of the world into an encyclopedic or dictionary form. It's a fun and rewarding activity to construct compendiums of what we know. However, there is no a priori reason why a semantic system based on a global semantic model must necessarily be chosen for use by a search engine.

Language itself is quite naturally defined as a set of conventions that arise and are maintained via highly local acts of communication within a human population. Under this view, we can ask why, in Fig. 1, I didn't draw in connections between the human subjects in order to indicate that the basis of their judgements rests in a common understanding -- a language pact as it were. This understanding is negotiated over years of interaction in a world that exists beyond the immediate moment at which they are asked to answer the question. Our impression that we need an a priori global semantics arises from the fact that there is no practical way to integrate models of language evolution or personal language variation into our system. Again, it's sort of comforting to see that when people think about these issues their first response is to emphasize universality and our human commonality.

It's going to hurt us a little inside to work with systems that represent meaning in a distributed, pairwise fashion. It goes against our feeling, perhaps, that everyone should listen to and understand everything we say. We might not want to think too hard about how our web search engines have actually already been using a form of ad hoc distributed semantics for years.

In closing: The model is there. The wider implications of its existence are that we should direct our efforts to solving the engineering and design problems necessary to be able to efficiently and economically generate estimations of human computational relevance and also of the reliability of these estimates. If we accomplish this task, we are in a position to be able to create better algorithms for our systems. Because we are using crowdsourcing -- computation carried out by individual humans -- we also need to address the ethics question: Can we generate such models without tipping the equilibrium of the crowdsourcing universe so that it disadvantages (or fails to advantage) already fragile human populations?
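As a small illustration of what "the reliability of these estimates" might mean in practice, the sketch below attaches a Wilson score interval to the proportion of agreeing subjects. Choosing the Wilson interval, and the default of roughly 95% confidence (z = 1.96), are my own assumptions; the model itself does not prescribe how reliability should be measured.

```python
import math

def wilson_interval(agree, total, z=1.96):
    """Wilson score interval for the proportion agree / total.

    agree: number of subjects who judged the pair to be similar.
    total: number of subjects who judged the pair at all (must be > 0).
    z: normal quantile; 1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("need at least one judgment")
    p = agree / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)

few = wilson_interval(8, 10)     # 8 of 10 subjects agree
many = wilson_interval(80, 100)  # same proportion, ten times the judgments
```

The same proportion of agreement gives a much narrower interval with 100 judgments than with 10, which is exactly the trade-off between cost and reliability that the engineering problem has to manage.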

This post is dedicated to my colleague David Tax: One of the perks of my job is an office on the floor with the guys from the Pattern Recognition Lab -- and one of the downsides is a low-level, but nagging sense of regret that we don't meet at the coffee machine and talk more often. This post articulates the larger story that I'd like to tell you.