Friday, November 30, 2012

ImageNet and the Edge of the World: On visual concept labels for images


Google Image Search results for the query "two-year-old horse"
ImageNet (http://www.image-net.org/) is a collection of images depicting concepts from the lexical database WordNet (http://wordnet.princeton.edu/). ImageNet consists of groups of images that illustrate the same WordNet concept. In WordNet, a concept is a set of cognitive synonyms: words that are understood to express the same thing.
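To make that structure concrete, here is a tiny Python sketch of how synsets (sets of synonyms) and their image groups fit together. Everything here is invented for illustration: the synset IDs, lemmas, and filenames are not real ImageNet data.

```python
# Minimal sketch of the ImageNet/WordNet structure described above.
# Synset IDs, lemmas, and image filenames are made up for illustration.

# In WordNet, a concept (synset) is a set of synonyms sharing one meaning.
synsets = {
    "n0000001": {"horse", "Equus caballus"},
    "n0000002": {"cat", "true cat"},
}

# ImageNet attaches a group of images to each synset.
imagenet = {
    "n0000001": ["horse_0001.jpg", "horse_0002.jpg"],
    "n0000002": ["cat_0001.jpg"],
}

def images_for(word):
    """Return all images whose synset contains the given word."""
    return [img
            for sid, lemmas in synsets.items() if word in lemmas
            for img in imagenet[sid]]

print(images_for("horse"))  # ['horse_0001.jpg', 'horse_0002.jpg']
```

The key design point is that images are grouped by concept (synset), not by word: "horse" and "Equus caballus" retrieve the same group.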

ImageNet is very cool. If I were a kid, this would have been better than any of those other picture dictionaries...it is fun just to click through and explore what exists in the world.

I recently made a video about this paper:

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, "ImageNet: A large-scale hierarchical image database," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248-255.

The video's down below, in case you want to get a quick overview of the paper...but in the video I am mostly focusing on discussing the crowdsourcing methods applied to create ImageNet.

Here, I will discuss the "Edge of the ImageNet Image World". I identify the edge with the idea of a concept being "difficult to be illustrated", an idea I found mentioned in footnote 1 of the paper:

About 20% of the synsets have very few images, because either there are very few web images available, e.g. “vespertilian bat”, or the synset by definition is difficult to be illustrated by images, e.g. "two-year-old horse". (p. 249)

I would like to make the point that we may be radically underestimating the importance of "difficult to be illustrated" visual concepts in our multimedia information indexing systems.

Difficult to be illustrated? It is quite obvious that it is not inherently difficult to have a picture of a two-year-old horse.

Suppose I have a horse: two years ago I watched it being born, and today I take a picture of it. Finished.

What is difficult is to find a group of people (annotators or users) who will look only at the picture (knowing nothing about me and the horse) and agree that the horse is two years old.

It is difficult for two reasons:
  1. Context of use: The concept "two-year-old horse" is difficult to pin down exactly. Does a horse that is two years and one day old still count as a two-year-old horse? It depends on what you are using the picture for. If you are using it for a collection of "horses on their second birthdays", it won't count. However, if you are using it to illustrate horses that are less than full grown, that day doesn't matter.
  2. Background of user: You have to know something about horses to distinguish a horse that is a foal (under one year) from one that is a colt or a filly (which Wikipedia tells us are terms that may be used until the horse is 3 or 4).
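The "context of use" problem can be made concrete with a toy sketch: the same horse does or does not count as a "two-year-old horse" depending on which definition we pick. Both definitions below, including the age thresholds, are invented purely for illustration.

```python
from datetime import date

def is_two_year_old(birth: date, today: date, strict: bool = True) -> bool:
    """Toy illustration: does a horse count as a 'two-year-old horse'?

    strict=True  -> exactly within the third year of life
    strict=False -> loose 'not yet full grown' reading (invented threshold)
    """
    age_years = (today - birth).days / 365.25
    if strict:
        return 2 <= age_years < 3
    return 1 <= age_years < 4  # invented bound for 'less than full grown'

birth = date(2010, 6, 1)
print(is_two_year_old(birth, date(2012, 6, 2), strict=True))   # True
print(is_two_year_old(birth, date(2013, 6, 2), strict=True))   # False: turned three yesterday
print(is_two_year_old(birth, date(2013, 6, 2), strict=False))  # True under the loose reading
```

The point is not the thresholds themselves, but that the label flips without the picture changing at all: the "concept" lives in the definition, not in the image.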
The ImageNet paper claims that "ImageNet aims to provide the most comprehensive and diverse coverage of the image world".

As multimedia researchers, we seem to assume that these "difficult to illustrate" concepts represent some marginal part of multimedia meaning. I mean, it's less than 20% of the concepts in WordNet that have this problem, so isn't it a good first approximation to just ignore them and focus on the 80% that are easily illustrated by images?

Context of use: We can just concentrate on the formal definitions of concepts. It's about delivering precise results lists when we search for images, isn't it? Under that view, we can solve point 1 by deciding to use the most restrictive definition possible: the horse that turned three yesterday is no longer a two-year-old horse.

OK. So we all are totally annoyed at the guy who just sits immobile when we say, "Hand me that red screwdriver?" You climb down from the ladder just to hear him say, "I see a crimson screwdriver, but no red screwdriver." We are annoyed because we know that language is built to be used, and part of that use is the fact that we accommodate the meanings of words within their contexts of use.

But we learn to live with it. We realize the guy is literally right, so we grab the screwdriver ourselves and climb back up the ladder. We could learn to live with image search engines that behave like that as well, couldn't we?

Background of user: We can just concentrate on what the "man on the street" thinks about the image. It's about delivering results that are generally recognizable and not results that require some expert insight, isn't it? Under that view, we can solve point 2 by deciding to use what a member of the general public would say about the image: it's a horse, probably not a grown-up horse, but there's no telling if it's two years old.

Whoa. Hold your horses right there! Who gets to decide, then, who constitutes the "man on the street" or the "general public"?

Many people that I meet on the street in my daily life are not going to know the difference between quite obvious concepts like "bananas" vs. "plantains". It depends on which street I choose to look at.

With respect to many streets in Western Europe, "plantain" would be "difficult to be illustrated": people that can identify them are somehow considered experts. Not so in West Africa.

Irresponsible intuitions: In a split second, we as multimedia researchers can make a decision that seems "obvious", but that on closer consideration has potential to come back and haunt us.

We are reinforced in making these "obvious" decisions because they are the ones that allow us to continue with our research with a minimal investment of resources in creating labeled image sets.

If I use restrictive, formal decisions, I don't have to turn to actual users of image search engines to try to understand the "language of concepts" that they use when they search.

I also don't have to try to dig down to more subtle forms of cultural bias that exist in WordNet. Who of us has time to read a volume on cultural bias in dictionaries with contributions from 40 scholars?

In the end, although "difficult to be illustrated" concepts may constitute 20% of the concepts in WordNet, we have no idea what percentage of actual user image needs might be related to these concepts. It could be huge!

Edge of the world: Google somehow gets it right. The search results at the top are returned by Google Images in response to the query "two-year-old horse". The first image occurs on the Internet in conjunction with the text "2 Year old Buckskin Quarter horse Colt". Someone apparently took a picture of their two-year-old horse and that seems to be right.

In the next picture, it's the kid and not the horse that's two, but that's pretty obviously wrong, and even amusing.

At the very least, this discussion allows us to conclude that if ImageNet covers "The Image World", that is a very flat world indeed. It is easy to follow a "difficult to be illustrated" concept to the end of that world and stand there looking over the edge...

...ImageNet is a valuable research tool and serves the community well. However, we should all be aware of exactly where the edge of the ImageNet world is, not that we want to avoid it, but perhaps because that is exactly the place from which we want to leap off.


Saturday, August 25, 2012

Gender in Advertising Images: The Devil is in the Detail

My Saturday was unbalanced already at breakfast, while reading the Economist and drinking my orange juice. On page 67 of the August 25-31, 2012 issue, I discovered that TU Delft is recruiting a Professor of Safety Science (good news!). Unfortunately, whoever designed this ad has made some unconventional decisions (not such good news).

The most obvious "bug" is the choice to include in the advertisement an image of a person. Since this misstep is a useful illustration of the limitations of visual depictions in multimedia, I decided to dedicate a blogpost to discussing it.

At first consideration, it seems obvious that our university should advertise using images of people. One of the reasons that I love working at TU Delft is the emphasis on solving societal problems. Using pictures containing people and not just technology wherever possible seems to be a good strategy for getting across the importance of our work to address human and social challenges.

However, a major limitation of visual depictions such as images and videos is "the curse of instance depiction". Basically, it is impossible to create such visual imagery without committing yourself to depicting a full range of details. You can't get across an abstract concept, for example "car", without actually committing yourself to an instance of a single car existing in the real world, which you take to stand for all cars: you are going to need to show in your image a specific type, make and model.

Here, the concept that the ad is trying to convey is "professor". The "type, make and model" chosen to convey this concept are an adult of a certain gender and a certain age group, wearing glasses. It seems plausible that the person designing the ad was aware of the problem of instance depiction. The decision to use a model with a shaved head makes it possible to avoid depicting the hair color, which could serve to further specify the ethnic background or the age.

However, it is extremely difficult, if not impossible, to "hedge" on the gender question in images of the real world. A person depicted in a daytime work setting will generally be identifiable as a male person or a female person.

If we assume that the process by which we choose and interpret images used to represent categories follows prototype theory, then the choice of a male to represent a TU Delft professor is not just unbalancing for the reader of the advertisement, but is very serious indeed. Prototype theory tells us that in our cognitive representations, some members of conceptual categories are more salient than others. We think of them first when we think of a category, and we react to them more quickly when confirming category membership.

The use of a male person in this advertisement sends the message that males are the canonical professors at the TU Delft. Although men are clearly in the majority in the faculty, there is not any sort of conscious intention at the university to keep the situation that way. In fact, I have the impression that everyone is working to shift their idea of who can be a professor to encompass a diverse demographic more directly representative of the general population.

Visual depictions in multimedia, i.e., images depicting the real world, are limited in what they can express because they deprive us of the possibility of leaving certain details unspecified. What we have is a reversal of the saying "A picture is worth a thousand words." Instead, the spoken or written word is able to express more in this case, because human language can directly convey concepts without having to make use of specific instances to do so. In effect, the possibility for ambiguity or underspecification makes human language more expressive than multimedia.

And so, the saying "The devil is in the detail" takes on a new shade of meaning.

What to do about the advertisement? I advise having a closer look at some advertising guidelines. Advertising Standards Canada, a non-profit self-regulation body for advertising, has a helpful list of guidelines for balancing gender representations in advertising online and surely Europe has a similar set of guidelines.

A "quick and dirty" solution is to look at how other universities advertise. In the Economist, a general tendency to avoid imagery is readily apparent. For example, next to the TU Delft advertisement is a classic advertisement for Harvard faculty positions, whose only graphic content is the Harvard Business School logo.

I was cheered up again when my Google Goggles app confirmed for me that the logo used was from the business school (i.e., distinct from the main Harvard logo). It was my first use of Google Goggles for something other than just playing around while hanging out with my multimedia information retrieval colleagues.

For completeness, I note a less obvious bug. The advertisement contains the text "Maximum employment: 38 hours per week (1 FTE)". In order to interpret this text, you need to know that "FTE" stands for "full-time equivalent"; 1 FTE means this position is a full-time job. Contrary to what the text implies, no one expects the Safety Science professor to be working only 38 hours a week.

Wednesday, August 8, 2012

Worry-Free Social Sharing for Social Networks

 Flickr: Phil Wiffen
Quite a few people have heard me say that social networks should come with a warning: when you sign up, for example for Facebook, the company should be required to notify you of the danger of the long-term impact of social sharing on your personal privacy. (Sometimes I also add two other factors: how many hours you are projected to spend "Facebooking" over the coming years, and how much peer pressure and social isolation you will endure if you want to leave the platform.) In my lifetime, awareness has developed and legislation has changed such that cigarettes and cigarette advertisements are required to bear warnings about their health hazards; maybe I'll yet live to see awareness rise about the consequences of social sharing.

In contrast to smoking, social sharing done right actually helps rather than hurts. In fact, the rise of online social networking and social multimedia sharing has been a downright amazing technological development. Moved by this awe, last year in a project proposal, I effused that social networks are "...a virtual prosthetic that extends the strong fabric of social connectivity critical to the well-being and growth of human societies into the online realm."

That proposal developed the idea of "worry-free social sharing": a social sharing client that would gently alert us when our sharing actions, in ways we do not intend, threaten to compromise our privacy---and then suggest alternative actions, which allow us to share our personal experiences, but in a wiser way.

Yes, people are responsible for their own actions. But in some cases, we as individual users do not have the understanding of multimedia analysis technology, or of the power of algorithms to combine different sorts of data to reveal facts about us that we thought were hidden. We would all need such understanding in order to make informed decisions about which types of social sharing are harmless and which types should better be avoided.

Even for the most savvy of us there are always surprises: Did you know that if you upload a video to YouTube, and you carefully avoid geo-tagging it, but you happen to be in a city and capture an ambulance siren in the background, that siren can serve to indicate which city you are in? Check out the work on multimodal location estimation [1]. Maybe you don't care if the world knows where you are, but if you do happen to be worried about having left your house empty during your vacation, it would be good to know that you just about betrayed your location to the world without realizing it. I've written about this before, e.g., in this post that mentions cybercasing.

It is within the reach of technology to build a "worry-free social sharing" client. The problem is getting the research funding to do so. Industry doesn't really have an interest in having users start being concerned about the implications of their sharing behavior. (It's in their interest to just send the message "share more".) Sure, it's unpleasant and possibly off-putting to have to reflect on the fact that someone might break into your house based on information about your location gleaned from videos that you post to YouTube. But it seems to me that "worry-free sharing" is an idea that users could identify with: just like the cereal box in the morning that announces how much fiber and how many vitamins we are consuming promotes consumption rather than driving people away from a product.

Another project proposal won the competition over the "worry-free social sharing" idea. One of the professors involved in the review later informed me that "worry-free social sharing" sounded like something female. I wasn't really sure what to do with that remark beyond thinking that it probably wasn't one of the considerations for the decision and storing it away for future reference.

I hadn't thought about the femaleness of privacy protection until this weekend, during the new Batman movie. There, we watched Catwoman chasing something called "Clean Slate". She knows that what she needs in order to live her life the way she wants is to make a clean break with the past. But Batman eventually recognizes this too. And I am happy to see other voices online interested in the privacy themes of the Batman movie. So I am not going to assume that only one half of the world population would be interested in "worry-free" sharing solutions.

Thinking about Batman also brought me back to the parallel with the cigarette warning label case. The label pictured above warns of the dangers of second hand smoke, "You're not the only one smoking this cigarette." If warning people about the dangers for their near and dear ones motivates people to cut back or stop smoking, maybe the same effect is true of social sharing. The "worry-free social sharing" client can remind us: Hey, you don't mind posting this picture, but maybe it will have unintended consequences for your friend, who is also pictured.

If you don't believe me, believe Batman: "You wear the mask to protect those you love."

[1] Gerald Friedland, Oriol Vinyals, and Trevor Darrell. Multimodal location estimation. In Proceedings of the ACM International Conference on Multimedia (MM '10), ACM, New York, NY, USA, pp. 1245-1252.

Wednesday, July 11, 2012

Time Machine Session at ICME 2012 and beyond

Today was the day of the Time Machine Session at ICME 2012. The session consisted of talks given by experts in the field of multimedia about "Time Machine Topics", defined as: ideas that were published before their time and have yet to reach their full potential. 

At first, it might sound like just digging around in the past and brushing off some old ideas. Or it even might sound like some futuristic science recycling scheme, designed to make the most of a limited resource.  

But a Time Machine Topic is far from dusty, outdated or rare. Instead, a Time Machine Topic is a topic that is currently experiencing renewed relevance because of subsequent developments in technology and also in our expectations and needs as users. 

We think that there are a large number of Time Machine Topics and that some of them bear repeated mention to support the integration of new researchers into the research community and also cross-pollination between related research domains.

The Time Machine Session was born at ICME 2012 because Mercan Topkara and I were appointed under the title "Innovation and Demo Chairs". To be honest, I had never heard of a position called "Innovation and Demo Chair" before. The "Demo" part seemed pretty straightforward, but "Innovation"? What could we possibly offer? 

We decided that our innovation should create something for the multimedia community that was new and that served a pressing need. With the Time Machine Session we set out to achieve a number of goals:
  1. Stimulate observation and discussion among researchers.
  2. Emphasize the benefits of knowing the literature.
  3. Streamline innovation by reducing redundancy.
  4. Encourage reproducing and reproducible research.
  5. Maintain the breadth of the solution space to stimulate new algorithms and approaches.
For me personally, a major reason for proposing the Time Machine Session is to create a forum where we publicly and, perhaps a bit ritualistically, demonstrate that we as researchers value knowing the literature and knowing where we have been. 

Google Scholar reminds us that we "Stand on the shoulders of giants" and the Time Machine Session gives us as scientists an opportunity to remind ourselves of exactly whose shoulders those are (and there are lots of them). 

If Time Machine Sessions exist at conferences (and we hope that there will be more in the future at ICME and elsewhere) we think it will incentivize us as researchers to really study and understand the literature. It will ensure that the "Related Work" sections of our papers are a truly integral part of our research that contributes to the forward movement of our field.

I am making the slides I used for the opening of the Time Machine Session available in the hope that they might be useful for other people who want to hold other Time Machine Sessions elsewhere. In the slides, I discuss the session goals in a bit more detail and use plain language and some great mood-setting images. 

I wanted to explicitly point out that the images really made the introduction special, and here I owe many thanks to Auntie K on Flickr, who was so thoughtful as to make some of her work available under a Creative Commons license.

The four talks in the ICME 2012 Time Machine Session were the following:
  • Dynamic Time Warping's New Youth (Xavier Anguera, Telefonica, Spain)
  • Designing Calm Technology (John N.A. Brown, Alpen-Adria Universität Klagenfurt, Austria & Universitat Politècnica de Catalunya, Spain)
  • Affective multimedia analysis (Mohammad Soleymani, Imperial College London, UK)
  • High Order Entropy Coding (Wenjun Zeng, University of Missouri, USA)
More information on the talks can be found at the ICME 2012 website's expert talks page.

Also, John N.A. Brown created a short documentary video at ICME 2012 about the Time Machine Session. The video contains people's reactions to the session and a bit more information on how and why we organized it.



The talks in the Time Machine Session were recorded by videolectures.net and are available at the bottom of the page at http://videolectures.net/icme2012_melbourne/

The opening is here:
   
Time Machine Session: Introduction

Martha Larson

The original call for proposals for expert talks is repeated below, or read it at: http://www.icme2012.org/CallForPapers_ExpertTalk.php 

Time Machine Session 
Expert Talks  on Innovating the Future Leveraging the Past
IEEE International Conference on Multimedia & Expo (ICME) 2012
11 July, 2012, Melbourne, Australia

Multimedia research is moving ahead in leaps and bounds. In order to pursue the most innovative and productive paths forward, we need an in-depth understanding of where we have already been. The Time Machine Session at the ICME 2012 is dedicated to the principle of improving the future by leveraging valuable insights from the past. The session will consist of a series of expert talks that re-introduce ideas that were published "before their time" and, as a result, were never fully exploited. A "Time Machine Topic" is distinguished by the fact that subsequent technological and social developments have led to a renewal of its relevance, making it currently of critical interest and value to the multimedia research community. A Time Machine Talk covers not only the original idea, but also explains why it currently deserves renewed attention and how it can influence the future of multimedia research.  We invite the submission of proposals for oral presentations in the ICME 2012 Time Machine Session.

Time Machine Talks should reflect expert-level understanding of the technological and social developments that have taken place in the field of multimedia and have brought about renewed relevance of past concepts. These developments include, but are not limited to:
  • Expansion in the volume, diversity and sources of multimedia content
  • Increase in the size, speed and sophistication of distribution networks
  • Improvement of computing infrastructures in terms of processing, storage and distribution
  • Growth of the variety and capacity of user end devices
  • Development of user expectations for new multimedia applications
In sum, the goals of the Time Machine Session are to stimulate the creative thinking of today's multimedia researchers and to maintain the breadth of the solution space in which we develop new algorithms and approaches. Additionally, we believe that Time Machine Talks can help streamline and defragment the innovation process, by encouraging reproduction and reducing redundancy. Finally, we hope that the Time Machine Session will stimulate interesting and productive discussion in the community.

Selection
From the pool of submissions, a panel will make a selection of talks for presentation at the Time Machine Session. The decision will be made using the following criteria:
  • Renewed relevance of the idea for today's multimedia researchers and research domains as set out in the general ICME 2012 CFP
  • Scope of the potential impact of the re-introduction of the idea on innovation in the multimedia research community
  • Importance of re-introduction of the idea to prevent the community from wasting time by "reinventing the wheel"
  • Presentation of the idea, i.e., compelling argumentation and engaging presentation style
Submission Format
The submission consists of three parts:
  1. The reference (i.e., bibliographic citation) of the paper that originally introduced the idea (published at least five years ago and still publicly available),
  2. A three minute video summarizing the idea and explaining why at the present moment its time is finally ripe,
  3. A 300-400 word abstract to accompany the Time Machine Talk in the ICME 2012 program. Note that the person submitting the proposal does not necessarily need to be one of the authors of the original paper.

Friday, June 1, 2012

Criteria for judging a demo in a conference demo session

Mercan Topkara and I are the "Innovation and Demo Chairs" for ICME 2012 to be held 9-13 July 2012 in Melbourne, Australia. We were called upon to organize the decision making process by which the ICME organizers would arrive at the decision of which demo would take home the ICME 2012 best demo award.

The decision is a difficult one because demos in the area of multimedia tend to be radically different in nature. For this reason, I formulated a list of six dimensions to use when judging demos.

1. Clarity: Understandability of the demo paper and the presentation.
2. Realization: Well implemented, robust, good use of technology.
3. Innovation: Addresses a problem that has not yet been tackled (or has proven difficult to solve).
4. Impact: The number of people the technology potentially touches and the importance of its influence on their lives.
5. Representativity: Centrality to the topics covered by the conference (in this case ICME)
6. Magic: How closely the technology fills the description, "Any sufficiently advanced technology is indistinguishable from magic" (cf. http://en.wikipedia.org/wiki/Clarke%27s_three_laws)

Number 6 is basically a wild card that makes it possible to introduce in a controlled way that factor of je ne sais quoi, which seems to slip into the considerations made when judging demos in any case.

In practice, another factor that always seems to be important is how close the demo is to a working system that is, or is about to be, deployed in the real world. Also, when judging demos it seems that one is always trying to project forward: how important will this technology be five or ten years from now? Will the passage of time reveal that it is a disruptive technology? (Or, as Wikipedia prefers to call it, a disruptive innovation?)

For a list of the demos to be presented at ICME 2012, see the ICME 2012 Demo page.

I am dating the post 1 June, when I formulated the list of criteria. It's later now, but time seems to have simply gotten away from me, not surprising given the ICME 2012 Time Machine Session.

Thursday, May 17, 2012

Search by misconception: Should search engines support information needs that are ill conceived?


The real Mozartkugeln? (Flickr: davidroethler)
Should our search engines make information findable based on misconceptions? Well, it's complicated. This post gives some examples to highlight the relationship of misconceptions to search.

Let's start with an example. When I refer to "Mozartkugeln" I mean the ones pictured here. They are gold and show Mozart in his red jacket. Their shape is round. I differentiate these from the ones with flat bottoms, which are for me "fake" Mozartkugeln.

My idea of Mozartkugeln can be considered a misconception. The original Mozartkugeln are apparently produced by a company called "Fürst" and are silver with a blue Mozart. Additionally, the producer of the flat-bottom ones apparently has the right to call their product "Real Reber Mozartkugeln". Digging on Wikipedia and on other websites supplied me with this information.

But what should an image search engine return in response to the query "Mozartkugeln"? Is it obliged to make an effort to resolve the question of which is the "real" Mozartkugel? Or is it fine if it just returns images that users have uploaded and tagged with "Mozartkugel"?

Effectively, simply returning images tagged with "Mozartkugel" allows users to search by misconception. The search engine returns images that have been tagged by people like me, who have a certain view on the matter (based on conversations with an Austrian roommate now a couple decades old, and several subsequent trips to Austria, none including Salzburg), which is not necessarily universal. I am not immediately convinced that I can be satisfied with such a search engine as a source of information, although it seems reasonable to assume that if enough voices are combined, a consensus will emerge. I noticed that if you search for "champagne" on Google Images, the top hits (at least the ones that depict identifiable bottles) clearly hail from the Champagne region in France and don't include the large range of other bubbly wines from other corners of the world that are widely enjoyed under the name "champagne".

In short, allowing search by misconception seems relatively innocuous. But we should be careful about assuming that the Mozartkugeln example is the end of the story. What is unique about this example is that the search engine is relatively transparent in the way that it works. The images are returned by seeking exact matches in their user-assigned tagsets; without such a match, the image is not relevant. Users of the search engine have a chance of being at least vaguely aware of the reason for the match, and they can propagate their understanding of the reliability of the taggers into an understanding of the reliability of the results.
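That transparent matching behavior can be sketched in a few lines of Python; the filenames and tags below are entirely made up.

```python
# Sketch of the transparent tag-matching search described above:
# an image is relevant only if one of its user-assigned tags exactly
# matches the query. Photos and tags are invented for illustration.

photos = {
    "img_001.jpg": {"Mozartkugel", "gold", "Salzburg"},
    "img_002.jpg": {"chocolate", "praline"},
    "img_003.jpg": {"Mozartkugel", "Reber"},
}

def search(query):
    """Return images whose tag set contains the query string exactly."""
    return sorted(img for img, tags in photos.items() if query in tags)

print(search("Mozartkugel"))  # ['img_001.jpg', 'img_003.jpg']
```

Whatever the taggers believe a Mozartkugel to be, round or flat-bottomed, is exactly what the engine returns: the misconceptions pass straight through, but at least the mechanism is visible.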

However, when the search engine becomes more sophisticated, the situation quickly gets quite murky. Suppose, for example, that I had a visual concept detector trained to detect Mozartkugeln in images and assign the appropriate tags to them. The design of the detector would require collecting examples of Mozartkugeln, which means that whoever trains the detector holds the ultimate control over deciding what a Mozartkugel actually is.
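A toy sketch can make the point about curator control concrete: two "detectors" trained on different (entirely invented) example sets simply disagree about what a Mozartkugel is. The 2-D feature vectors stand in for whatever image features a real detector would use.

```python
# Toy sketch: a 'detector' is just whatever its training examples say it is.
# The 2-D feature vectors are fake stand-ins for real image features.

def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

def train_detector(positive_examples):
    """The curator's choice of examples *is* the concept definition."""
    c = centroid(positive_examples)
    # Accept anything within a (made-up) squared distance of the centroid.
    return lambda x: sum((a - b) ** 2 for a, b in zip(x, c)) < 0.1

# Curator A trains on round gold Mozartkugeln; curator B on flat-bottomed ones.
detector_a = train_detector([(0.9, 0.1), (0.8, 0.2)])
detector_b = train_detector([(0.1, 0.9), (0.2, 0.8)])

sample = (0.85, 0.15)
print(detector_a(sample), detector_b(sample))  # True False
```

Same image, two contradictory verdicts, and the user of the search engine never sees which curator's examples decided the matter.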

The example of Mozartkugeln is interesting. In some cases, one could argue that common sense knowledge will tell you what an object is, for example, a helicopter or a pram. Everyone can identify these objects, right? But in the case of the Mozartkugeln, there is no right answer. It depends on your perspective. A long discussion will arrive at the conclusion "It's complicated". (And you may already find yourself with the same issue for the pram, if not actually the helicopter.)

It seems like a good idea to do away with the central authority that collects the examples used to train the detectors. After all, no one really likes the guy who walks around the party reminding people, "Yes, but it's not real champagne".

But do we really want to admit search by misconception? I had quite an unsettling experience with Google's query suggestion. On 17 May 2012, I was looking for a news story on one of the Facebook founders having renounced his US citizenship. No sooner had I typed in "facebook founder" than Google presented me with the following list of suggested queries:

facebook founder mark
facebook founder saverin
facebook founder buys new republic
facebook founder college
facebook founder gay
facebook founder bios
facebook founder dead
facebook founder movie

Did I really need to know about the existence of the circulating rumors? Do I go on to passively "believe" a query, or do I dig deeper to find out if it is true? Do we really want our search engines to let us flow so easily down the same information paths worn by searchers before us who mis-received a rumor?

In the case of "facebook founder dead" I did dig deeper. That query led to a Fox News article on the death of Ilya Zhitomirskiy, one of the co-founders of Diaspora*, an alternative to Facebook. I was left wondering at how query suggestions have taken on an information dissemination (news broadcast, if you will) role of their own.

From the fun of searching for pictures of bonbons (...and wondering if round vs. flat-bottomed Mozartkugeln relate to a real misconception or rather to an alternate interpretation) we hit on a matter of true importance (Diaspora* upends the Facebook model because it is based on the idea that every member of the network should "own" his personal information). Suddenly, it gets extremely serious. In light of this seriousness, it looks as if we really do not want search engines to admit search by misconception at all.

This whole line of thought was started while I was at a symposium entitled "Cultural Heritage Gets Social" of the SEALINCMedia project (Socially-enriched access to linked cultural media). Alice Warley (Public Catalogue Foundation, UK) gave a talk entitled "Your Paintings Tagger: Crowd-sourcing, art history and the UK's national oil painting collection" about a website where visitors collaborate to tag paintings.

Apparently, general public users tend to tag older paintings "formal wear" when what the people pictured in the paintings are wearing is not formal wear at all, but rather daily clothing. The reason for this misconception is that today's formal wear evolved from what was worn on a daily basis in certain social circles in past eras. Wikipedia is rather silent on the history of formal wear. The "misconceptions" of the taggers are actually a source of information about something that is not widely known, but is a real historical connection.

So we're back to the Mozartkugeln, considering whether Mozart is dressed for a concert, or is in his everyday work clothing that he uses for composing. It seems like misconceptions help us to uncover new and interesting information.

However, if we incorporate misconceptions, maybe we should call them 'exploration engines'. A 'search' engine should find answers or else gently reveal to us that our initial information need was ill conceived.

Saturday, April 28, 2012

English usage convention for scientific publications: "Related work" vs. "Related works"

Yesterday, we got back reviews for a journal article in which a reviewer suggests that we change the title of our Section 2 from "Related Work" to "Related Works". In the review, the comment was included in a list headed "Typos (I think)", so the reviewer is not sure exactly what is going on. The reviewer's self-doubt is well founded:

In the context of a review of previous research on a topic in a scientific publication, "related work" is the correct expression and not "related works".

There are several places on the internet that provide clarity on this point. Typing the query "related work" vs. "related works" into your favorite search engine will give you a nice set of pages asking you to please use "related work" and not "related works" when writing a scientific publication. Examples are:

http://www.cs.columbia.edu/~hgs/etc/writing-bugs.html
http://www.iaria.org/editorialrules.html

I'm not trying to add another link to that list with this blog post. Rather, I want to add a couple of comments about why the distinction is important.

"Work" can be used as either a count noun or a mass noun. Mass nouns refer to undifferentiated substances, which cannot be conceptualized as individual entities; for this reason, it does not make sense to use such nouns in the plural. "Fuel" is a mass noun. Count nouns refer to entities that are discrete and differentiable and are, for this reason, countable---such nouns can be used in the plural. "Airplane" is a count noun.

So, I suppose, as an author of a scientific paper, you are free to choose whether you consider research work to be an undifferentiated substance or a set of discrete entities and to choose "related work" or "related works" accordingly. Why, then, is it so disturbing to read "related works"?

I thought about this point a bit and I realized that it is disturbing because the difference has some pretty profound implications for how we conceptualize scientific research. Is our scientific output making a contribution to a mass of knowledge, joining the efforts of scientists who went before us and available to those who come after? Is it meant to be tapped by others, reproduced and extended? For me the answers are "yes" and "yes". The results of scientific effort flow together to become an undifferentiated resource pool whose value lies in how it advances the collective knowledge of the species and how it can be put to productive use. Under this conceptualization of scientific work, it is only possible to use the mass noun, which does not ever occur in the plural.

The expression "Related works" evokes a conceptualization of scientific work as a set of discrete entities, individual units of output created by the efforts of individuals with the idea that they will remain untouched in their individuality. Under such a view, each paper published would be a work of art, embodying the creative force of an individual researcher or an individual team and intended to be admired, rather than built upon, reproduced, validated or improved. Indeed, when the art world makes the plural of "work" they say "works" as in "works of art". These are clearly individual entities, finished works, ready to hang on the wall of the gallery.

We do sometimes feel this way about our work as researchers: our biggest and best conferences have the exciting, invigorating atmosphere of an art show opening, where you get to meet the great creative spirits and enjoy wonder and admiration at what they have produced. However, after the conference, we go home and take these amazing new papers apart and work with them in a way that makes clear that we see them as substance to be extended and not as finished works meant for framing and hanging on the wall.

So, please, yes, it's "related work". You probably already know that deep in your own scientist soul. You really wouldn't ever write "related researches", would you? "Research" is of course also a mass noun, also denoting our scientific effort and underlining that it is part of a larger flow.

At a more general level, I do sincerely believe that we should all adopt World English: drop some of the finer points of word choice and grammar conventions and just write papers in clear, simple sentences that can be easily understood by anyone with a working grasp of the language. However, the "work" vs. "works" distinction remains, in my opinion, an important one: it gives us a chance to reflect on how we conceptualize our contributions to our scientific field. Maintaining the distinction allows us to continue to express a subtlety important to our underlying scientific culture and mission.

Monday, March 12, 2012

The MediaEval Workshop: What it's meant to be and why you want to be there.

The MediaEval workshop is the event held each year in the fall at the culmination of the yearly benchmarking cycle. By the time of the MediaEval workshop, many things have already happened in the benchmarking year: the task organizers have worked hard to define tasks and issue data sets. Participants have worked hard to develop algorithms that tackle the tasks, and they have run these algorithms on the data sets. The "runs" have been evaluated and each participating team has written its working notes paper. Now it's time for the workshop!

This blog post provides a view (from my perspective as one of the MediaEval co-ordinators) on the history of the workshop and on what the workshop is meant to be. In particular, it highlights the similarities and differences with other types of workshops.

The main goal of the MediaEval workshop is to bring everyone who carried out MediaEval tasks together in one physical location to present and discuss their results, exchange experiences and develop ideas for how to improve their algorithms. The first year that we met at a medieval convent, Santa Croce in Fossabanda, it was mostly due to delight in the wordplay between medieval and MediaEval. However, we soon came to appreciate how getting everyone together working, eating and basically living in the same space creates an extremely productive focus on our common tasks and goals. (Although unlike the nuns of the Middle Ages we don't go dashing off to prayer when we hear the convent bell.)

In 2010, we held the MediaEval workshop just before ACM Multimedia 2010, which was held at the Palazzo dei Congressi in Firenze, Italy. Santa Croce in Fossabanda is located in Pisa, about an hour's train ride from Firenze. We chose the dates and place to cut down on travel time and cost for people who wanted to attend both ACM Multimedia 2010 and the MediaEval 2010 workshop.

The next year, Interspeech 2011 came to the Palazzo dei Congressi in Firenze at about the same time of year. MediaEval submitted a proposal and was granted the status of an "Official Satellite Event of Interspeech 2011". Now, instead of just taking advantage of the convenience for travel, we began to emphasize the connection to the topic of the conference: being associated with Interspeech reinforced the use of speech within MediaEval and helped us to better realize the goals expressed in the MediaEval slogan The "multi" in multimedia.

This year, the 12th European Conference on Computer Vision (ECCV 2012) will be held in Firenze. This conference provides us with the opportunity to reinforce the use of visual content within MediaEval, another of the multimedia "multi's". The MediaEval 2012 Workshop will be held right before this conference starts, again in Santa Croce in Fossabanda in Pisa. A very close contending idea for the MediaEval 2012 workshop was to hold it near 13th International Society for Music Information Retrieval Conference (ISMIR 2012) in Porto. However, the results of the MediaEval 2012 survey showed that the majority preferred to co-ordinate the date and place with ECCV 2012 and stay for a third year in Pisa.

In sum, the idea of being close to a large conference related to MediaEval topics has grown from being a convenience to being an aspect that strengthens and enriches both the benchmark and the workshop.

What exactly happens at the MediaEval workshop? The workshop consists of a series of sessions on the individual tasks. The task organizers present the task as a whole and each team presents its individual results, after which the floor is opened for discussion. More discussion and exchange occurs during meals and breaks in ad hoc groups. We try to build a lot of space for discussion into the workshop schedule and especially try to create opportunities for students to talk with more experienced researchers, who help guide their efforts along the most effective path.

The proceedings are an important aspect of the workshop. The MediaEval workshop proceedings is a "Working Notes Proceedings" consisting of short (two-page) papers written by the participating teams to report their results. These papers describe the algorithms that are used and present the results. They also seek to understand the algorithms, and participants are requested to report:
  • which cases are easy/difficult and why
  • which approaches work best and why
  • which approaches do not work well
The working notes submissions are reviewed by the task organizers. The task organizers may either accept the submission as is, or may come back to the participant team with a request for revisions of the paper. The preferred mechanism is to have the papers revised rather than to reject them --- sometimes this revision cycle means that the working notes proceedings is not ready until just before the workshop. For this reason, the working notes proceedings is distributed at the workshop on a memory stick.

After the workshop, the working notes is made available online. In 2011, the "Working Notes Proceedings of the MediaEval 2011 Workshop" was published here: http://ceur-ws.org/Vol-807/ and in the future we would like to continue to use http://ceur-ws.org/. By publishing in this way, the copyright for the individual papers stays with the papers' authors.

MediaEval working notes papers are intended to be first versions of work that is later extended by the authors and submitted to mainstream venues, such as conferences and journals. The fact that the MediaEval workshop proceedings consists of short working notes and that the copyright stays with the authors keeps the proceedings consistent with the goal of reporting an initial research result, which will then be refined and extended using input from the discussion at the MediaEval workshop.

Another important goal of the workshop is to discuss the tasks themselves. Did they help to move the state of the art forward? Should we improve or replace them next year? Are there new questions that need to be answered that require new tasks? Anyone attending the workshop is welcome to stand up in the final session and "pitch" an idea for a new task. Tasks that receive good community support in this session have a good chance of receiving the response levels they need on the yearly MediaEval survey to run as tasks in the next year.

Finally, the workshop also aims to connect ourselves and our research to the larger community. We welcome participants from industry who have tasks that they might want to pitch to the community. We also welcome representatives from other benchmarks: MediaEval grows stronger by staying in close contact with groups running benchmarking activities in areas beyond the MediaEval core domain of human and social multimedia. MediaEval 2011 was presented at CLEF 2011, NTCIR 2011 and FIRE 2011 and in 2012 we hope to convince some of our sister benchmarks to give a reciprocal presentation on their own experiences.

As part of staying connected to the larger community, the MediaEval 2011 workshop included a poster and demo session where projects that help to organize MediaEval tasks could present their results and where industry people could make a presentation of new and interesting problems that they would like to make known to the MediaEval community as possible benchmarking tasks.

The workshop closes with a gathering of task organizers and others who have invested time and effort into the community or want to get more closely involved in the future. During this session, we reflect back on what has happened so far in the benchmarking year and also discuss the MediaEval-related activities that we organize beyond the core benchmarking activities. These include joint papers among all the participants of a task and also special sessions at conferences. Looking forward, we also plan to get involved in organizing more workshops (of the traditional variety) at conferences and also think about the possibilities of special issues. Finally, we reflect on our MediaEval goal, To offer the community innovative new tasks related to the human and social aspects of multimedia, and our slogan The "multi" in multimedia (it is not necessary to be able to say the slogan with a straight face). On the basis of these reflections, we consider where we would like to go with the benchmark in the coming year.

So why do you want to be at the MediaEval workshop? Well, if you are a task participant, it gives you an opportunity to exchange ideas with other people working on topics similar to yours and helps you to understand and improve the algorithms that you are using to approach your task. MediaEval needs a central group of dedicated researchers to organize the tasks that make the benchmark run. Attending the workshop is a good first step to getting more deeply involved in MediaEval, for example, by proposing a task for the coming year.

On the MediaEval 2012 survey, we had a question concerning what the community thinks about how MediaEval should grow. Personally, I want to keep the workshop as small and intimate as possible. In 2011, we had nearly 60 people, which appeared to me to be a good maximum size for the workshop. However, when I examined the survey results, I realized that I am in the minority here and that most of the people in the community would like growth. As a result, I am changing my opinion, and we will not attempt to artificially restrict growth, as long as it is sustainable.

The issues and ideas around growth are just one area in which the concept of the MediaEval workshop is evolving. One of MediaEval's strengths is that it develops from year to year, guided by the input of the community --- and in particular those people who invest the most hours of their time to make it work. I look forward to witnessing and being involved in this development in the 2012 season and, we hope, in seasons beyond.

For further impressions of the MediaEval workshop, check out the MediaEval 2011 workshop video:

Saturday, March 10, 2012

Querying the Collective: Why search engines should (responsibly!) support analytics

My thoughts today turned towards the social responsibility of search engines. As search improves, it seems that the question is not moving towards being resolved, but rather is becoming more important. Simply put, as the helpfulness that search engines have in our lives increases, so does their power over us. In this post, I'd like to attempt to unpack that thought a bit and build a case for the importance of social responsibility of search engines.

Let's consider the question of the identity of the entity with which knowledge resides. In other words, let's have a look at, "who knows what" in some basic search scenarios. User information needs, I claim, differ along the who-knows-what dimension:
  • Known-item search: "I know what I want and my information need will be fulfilled when I can get my hands on this item." (Knowledge effectively already with me.)
  • Ad hoc search: "I don't know exactly what I want. But I know that there are other people who know. My information need will be fulfilled when I can get my hands on information sources created by people who do know." (Knowledge with other people, soon to be with me.)

As long as we stick to these two variants, the world is relatively well behaved. Social responsibility arguably remains with the searcher and with the individuals making the information sources available.

However, with the next step in this typology, things definitely become different.

  • Analytics search: "I don't know exactly what I want. In fact, I know that there is no individual person who knows. My information need will be fulfilled if I can get my hands on information that is created using analytics over a larger number of information sources." (Knowledge with no one, not yet.)

We do analytics search all the time, even without realizing it. For me, it's often for small things while writing research papers, for example, finding the more common usage, "crowd-sourcing" vs. "crowdsourcing", or seeing if "internet" has finally overtaken "Internet". I actually just this moment carried out a search for "analytics", which is being red-underlined as a spelling mistake as I write. In response to my query, Google tells me "About 117,000,000 results (0.24 seconds)". I decide that there are a lot of other people using this word -- many in the same way that I am using it now -- so I ignore the spellchecker and move on.
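In miniature, this kind of analytics query is just an aggregation over a collection. A toy sketch (the corpus and counts are invented for illustration; a real engine aggregates over billions of documents):

```python
# Toy "analytics search": compare the frequency of two spelling
# variants across a document collection. No single document answers
# the question; the answer only exists in the aggregate.
from collections import Counter
import re

corpus = [
    "Crowdsourcing is used to label images.",
    "We used crowdsourcing for annotation.",
    "Crowd-sourcing platforms pay workers per task.",
]

counts = Counter()
for doc in corpus:
    counts["crowdsourcing"] += len(re.findall(r"\bcrowdsourcing\b", doc, re.I))
    counts["crowd-sourcing"] += len(re.findall(r"\bcrowd-sourcing\b", doc, re.I))

print(counts["crowdsourcing"], counts["crowd-sourcing"])  # → 2 1
```

The responsibility question arises exactly here: the numbers are produced by the aggregation logic, not by any of the underlying authors.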

The point is that this 117,000,000 is information that was derived on the spot by analyzing and aggregating a huge number of data sources. As a result, the responsibility for this information has shifted and now lies elsewhere. It is not so clear that it lies only with me, the person asking the question, or with the information sources that are being aggregated. Rather, the responsibility for creating this information lies with the algorithm that made the calculation. If we think that a non-human entity such as an algorithm cannot be responsible, then the conclusion must be that the responsibility lies with the people who created and control the algorithm, i.e., the minds and masters behind the search engine.

Of course, many times that responsibility is not a particularly heavy weight and the answers to analytics search queries can often be wildly off and still not be harmful. Give or take a million, I still will see that there are a lot of people using the word "analytics" and that answers my question.

However, I would argue that there are enough cases in which the results of analytics search queries have a large enough impact that they should force us to think carefully about the social responsibility that is borne directly by our search algorithms, their creators and the providers of our search services.

Recently, I spent some time in a house by a lake in the forest. The local news reported an incident in which a hunter reported a man in the forest, carrying a gun, but not wearing the blaze-orange of a hunter. The man had fired at him, and the hunter returned fire. Basically, I made my decision about when to go out of the house after hearing that news report by using a search engine to monitor real-time media (Twitter and local news). My queries were analytics queries because they relied on the entire collection of available information being scanned. My conclusion about the situation relied on a relatively subtle difference between no one mentioning it and it being mentioned by a handful of people (the forest being rather sparsely populated). I continued periodic query sessions, and the story died out relatively quickly. I walked out of the house with confidence that the incident was a fluke and not a rampage and that no one was going to take a pot shot at me.

One could argue that I was irresponsible for potentially putting my safety in the hands of a search engine. But one could also argue that I was irresponsible for being there in hunting season. There are also those who would claim that the place is a bit weird anyhow and should be completely avoided. The problem is that that place is where I'm from. I'm probably not about to stop going back, and I'm also probably not about to stop looking for information by carrying out analytics search with a search engine.

It seems inevitable: People use analytics search to form opinions and make assessments that influence their behavior and lead them to make important decisions.

We have little choice but to admit that we would like search engines to offer us, as users, the possibility of satisfying our information needs using analytics search. It gives us a lens to view the world around us. It takes us a step in the direction pointed to recently by Doug Oard, who was quoted on Twitter as wanting an information retrieval system that is an exoskeleton for the mind. Personally (and to the bemusement of my colleagues), I tend to talk about search, especially in the context of social networks, as providing us with a prosthesis. In the end, all the metaphors boil down to analytics search being just plain important to us and to what we want to do in our lives.

We are left with the conclusion: Search engines should support analytics search, but they should take careful regard of social responsibility.

Why am I thinking of analytics search today? Probably because I've come up against another problem where it is useful. This problem involves no guns, so it's not particularly life threatening -- at the most it threatens the productivity of our lab.

At the beginning of the year, there was a high-level decision to restrict access to our building on the weekends. The cited reasons were that ICT technology no longer requires 24-hour building access and that weekend closure would save energy and security costs. The net result has been that on the weekends our lab is completely empty, when there used to be at least one or two PhD students working there, when I'd go in.

I became curious about exactly how much electricity is being saved and realized that we can actually make a rough estimate using social media to calculate the number of weekends on which the building is actually powered down. For example, today (a Saturday) it wasn't. Today, I took a picture (above) in the cafeteria that reveals this fact. Other people in the group pictured were themselves also taking pictures. If a search engine allows me to find other pictures (e.g., on Flickr) of weekend events in the building, it will be possible to estimate the total number of days of electrical consumption actually saved by keeping the PhD students out of the lab.
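The estimate sketched above amounts to aggregating photo metadata by date and location. A minimal sketch, assuming we already have photo records retrieved from some photo-sharing service; the records, dates and the location label "campus building" are all invented for illustration:

```python
# Count distinct weekend days on which at least one photo was taken
# in the building, i.e. days the building was demonstrably powered up.
from datetime import date

photos = [  # (taken_on, location) -- hypothetical metadata
    (date(2012, 3, 10), "campus building"),  # a Saturday
    (date(2012, 3, 10), "campus building"),  # same day, second photo
    (date(2012, 2, 25), "campus building"),  # another Saturday
    (date(2012, 3, 7), "campus building"),   # a Wednesday -- ignored
]

weekend_days = {d for d, loc in photos
                if loc == "campus building" and d.weekday() >= 5}  # 5=Sat, 6=Sun

print(len(weekend_days))  # → 2 weekend days with the power clearly on
```

This gives only a lower bound (no photo does not mean no power), which is exactly why the answer depends so heavily on how the search engine aggregates.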

None of the individual picture takers knows this information, but if the information can be aggregated with the support of a search engine, then we can know -- calling the information into being, as it were, using a couple of queries linked to dates and locations. I'll leave it to another day to examine the question of whether I actually have a right to know how much energy is being saved by the building closure policy. Here, I draw a different conclusion: if I am going to rely on a search engine to formulate an impression of what happens in the building on the weekends, then I would like that search engine to have assumed the responsibility of giving the best answer it possibly can.

Wednesday, February 22, 2012

BBC Don't Make Me Evil: We need fragment level access to online spoken audio content

Today, I uploaded something to YouTube that, strictly speaking, I probably should not have. It was a section from the BBC podcast Outriders containing an interview with Heather Marsh about connecting and protecting people by creating social networks from the bottom up rather than from the top down, so that the power and control remains with the individual and not with an overarching central authority.

In particular, she discusses working with Tribler, the open-source peer-to-peer client developed at TU Delft. At 4 minutes and 28 seconds into the interview she states, "It's what we always wanted, it's what the Internet always was supposed to be and we're at the point now where there is no excuse not to have it."

C'mon BBC, there's no way that I am not going to take that sound byte and spread it through my social network. I just don't have that kind of will power. But are you allowing me to do this?

No. The podcast is one monolithic .mp3 file. The times at which the individual interviews (there are three on three different subjects) begin are not given. There is simply no easy fragment level access possible.

And here's where my self-control breaks down. I excerpt the interview from the mp3 and upload the thing to YouTube. Twinged by guilt, I generate myself a neat deep link. (As I mentioned in my previous post touching on deep links as used by YouTube, a deep link is a link that lets you jump to a particular point in an audio or video file.)
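Generating such a deep link is trivial, which is part of the frustration: YouTube accepts a start offset via the "t" URL parameter. A small sketch (the video ID below is a placeholder, not my actual upload):

```python
# Build a YouTube-style deep link that starts playback at a given offset,
# using the "t" (start time in seconds) URL parameter.

def deep_link(video_id, minutes, seconds):
    """Return a URL that starts playback at the given minutes:seconds offset."""
    total = minutes * 60 + seconds
    return "https://youtu.be/{}?t={}".format(video_id, total)

print(deep_link("VIDEO_ID", 4, 28))  # → https://youtu.be/VIDEO_ID?t=268
```

Nothing comparable exists for the monolithic mp3: there is no standard way to hand someone a link that drops them in at 4'28''.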

Voila, here is my link for the "there's no excuse not to have Tribler" sound byte:

(except my conscience got the better of me and I deleted it)

Most people I know are much more likely to click on a deep link than to struggle with the podcast download at:

http://downloads.bbc.co.uk/podcasts/fivelive/pods/pods_20120221-0400a.mp3

Of course with the download it's still Tweetable:

TU Delft P2P client Tribler as heard on BBC Radio 5 "It's what we always wanted" Listen in at 4'28''
http://downloads.bbc.co.uk/podcasts/fivelive/pods/pods_20120221-0400a.mp3

Twitter's link shortening would get that down to 140 characters. But then I have to trust that my followers are willing to devote about 10 times more attention to digesting my Tweet than to other Tweets containing links to streams rather than downloads.

Not only can I not easily link to my sound byte, I also have no way of finding this reference to Tribler unless I already know that it is there, occurring in the middle of an interview. The podcast page gives no indication of the names of the interviewees and only very limited information on the topics.

And to aid findability, of course, I turn on the YouTube automatic captioning and see what kind of text transcript the speech recognition will generate.

It's just a click on the little cc icon at the bottom and the transcripts are displayed. You can use the player bar to move quickly through the video, and the changing transcript gives you a rough idea of what the different parts of the interview are about. You won't be blown away by the quality of the transcripts; "Tribler", for example, is not correctly recognized. Nonetheless, you can use them to figure out where to stop and how to navigate to listen to particular questions.

In short, this workaround instantiates the principle of the intelligent multimedia player. Ask me about the Internet as it always was supposed to be and I would say that there should be no audio content, no interesting sound bytes, buried away without the possibility to find them and to share them easily. We need the fragment level access, the deep links, the player that tells us what is where when we're listening.

Can we get you to do this for us, BBC? You might notice that my YouTube solution here is quite ugly and, strictly speaking, probably not at all what you have in mind for me to be doing with this content.

But let me assume that the mp3 is online in order to be listened to and shared. Let's continue our search to find new ways to make this possible.

Monday, February 20, 2012

I am not my website: Warning! Something's not right here.

The server hosting the MediaEval website was compromised 16 February and infected with malware in the form of lines added into the .html code of several of the pages.

Google Safe Browsing Diagnostics caught the problem and MediaEval community members saw the warning in their browsers and started writing me immediately.

I fixed it quite quickly. However, at the moment, Chrome (above), Firefox, Safari, and probably other browsers as well are presenting this error screen. The malware is gone, but people are still, for all practical purposes, being prevented from visiting the MediaEval site.

The warning links to a safe browsing diagnostics page that states, "Of the 3 pages we tested on the site over the past 90 days, 2 page(s) resulted in malicious software being downloaded without user consent." But that's pretty deep to have to dig to understand that the danger has been taken care of. The implication is that it's going to take us 90 days of a clean record to get back in the good graces of Google Safe Browsing Diagnostics.

At the bottom, the diagnostics page tells me that as the website owner I can request a review of the site using Google Webmaster Tools. It's nice to find a helping hand extended in a tough situation.

However, what happened next echoed the text of the warning: something's really not right here. I have been quite concerned about Google's new privacy policy, and my interaction with the Webmaster Tools today further deepened that concern.

Google Webmaster Tools wanted me to verify that I was the owner of the website, and, of course, it does this using my Google login. Now, that site is linked at the hip to this blog.

I suppose that was obvious anyway, for anyone reading the content. And people ask me why I care about some association that is deep in a Google server somewhere. It's a slippery slope, yeah sure. Let me articulate why I am not comfortable with this latest slide downwards.

I put a lot of thought into the fact that MediaEval is a community-driven initiative for which I act as the "glue person". Glue person means doing the infrastructure (which amounts to keeping a bunch of plates all spinning at the same time like they do in the circus) and co-ordinating the process by which we make tough decisions (in cases in which such are necessary). My MediaEval activities need to be understood clearly by everyone who cares to scrutinize them as being separate from what is written in this blog---which is my own personal view and does not claim to be anything like a community wide consensus.

The separation of my personal and my public role in MediaEval was previously naturally represented by the fact that, on the Internet, the default existence for a website is as a separate entity. These entities were perhaps associated with a webmaster, but they were not linked to a single author/owner who has a personal history and a private life (as represented by my Google account). Now, I suppose I could set up a separate account to be MediaEval---but that means signing in and out whenever I want to go from one to the other, and that is simply not practical given the amount of work I need to do.

The way that co-operative initiatives like MediaEval grow is that they can be set free from a single person or personality and can take on a life of their own. It's idealistic, I realize, but we do strive for a sort of grassroots democratic process in the benchmark. In order to come anywhere close to this ideal, we need technology whose default mode of operation allows leading members of the community to draw themselves away from the spotlight to stand at the sidelines and give the community room to speak for itself.

Today, I made a quick decision under the pressure of protecting the channel of communication with the MediaEval community. I tied my personal self yet more closely to the site of the benchmarking initiative, when another part of my brain is telling me that for growth and sustainability of the community the trend must go in the other direction.

When I am tired and desperately need a solution, I am in no place to insist on the principle that I am not my website.

I tell myself that maybe the close technical connection will now remind me to be even more careful in making the conceptual distinction between the two hats I wear: researcher and community coordinator.

In the meantime, I sit back and take my mind off the issue by enjoying some YouTube---recently a category "Middle Ages" has appeared on my recommendation page. Hey, Google, it's not that kind of mediaeval that I care about! Watching those videos is a welcome form of distraction, especially because it underlines the point that putting all that data in one place isn't necessarily going to get anyone closer to where they want to be. Let's just hope that the consequences remain innocuous and merely amusing.

Sunday, February 19, 2012

The PhD: Reflections of a mentor

This weekend, I have been reflecting on the mystical status of the PhD. I don't recall there ever being a moment when one of the people who mentored my own PhD process made an explicitly formulated and crystal clear declaration of what they believed a PhD to be. Somehow their opinion remained obscured, the PhD appeared an arcane, untouchable concept, and it seemed almost as if its true nature needed to be kept secret. Possibly, avoiding such declarations is grounded in a certain age-old mentoring wisdom. Since "It's different for everyone", it might be better not to set up specific expectations for a given PhD candidate. Or possibly, avoiding such declarations is a natural reaction to the rapid rate at which the PhD is changing. In fact, by necessity the concept of "The PhD" must develop radically in order to remain relevant---a topic generating a high volume of discussion, e.g., a special issue in Nature last year devoted to the future of the PhD. In the end, defining "The PhD" seems to open more questions than it answers.

However, I have been the daily mentor for PhD students for a number of years now and I have reached a point where I feel the need to take a snapshot of my perspective on PhD mentoring and set it down in linear form. Actually, it's not really any sort of a secret, so why not put it on my blog? In this post, I aim to describe what for me is the essence of a PhD and what I perceive to be the role of the PhD mentor. My definition of a PhD is:

A PhD is an independent contribution to the knowledge of humanity.

By independent, I mean that the key conceptual content of the PhD is original and arises from the PhD candidate's own insight. Note that the definition does not specify the magnitude of this contribution, or even a particular way in which the contribution must be measured. These aspects vary from field to field.

Given this variation, how is it possible to know that one has a PhD? In my view, someone has a PhD when:

A PhD candidate has earned their PhD when their supervisor declares that they are satisfied and when the candidate has successfully defended their contribution before a committee.

Working in the Netherlands is interesting, because the "completion criteria" are extremely salient. The PhD supervisor is a professor at the university who has something called ius promovendi, or the right to graduate PhD students. The system is set up to keep the number of people with this right small and exclusive (the rest of us are merely mentors). The PhD defense is a public ceremony: the PhD candidate wears a tuxedo and the committee is dressed in academic regalia. Women usually don't wear the tuxedo---but choose an equally imposing alternative. The defense itself follows a special form, which includes a particular moment at which the doctorate is granted to the candidate. All in all, the tradition and the spectacle are awe inspiring. What is highlighted is the importance of this single moment, when the academic community represented by the committee tests the candidate, convenes in seclusion, and makes the determination that s/he has fulfilled the requirements. It's the only moment when we see the candidate in a tuxedo during the whole PhD process. It's worth reflecting on the fact that it is indeed the sine qua non of the PhD---it is the only moment that occurs by necessity.

Beyond these points, the PhD process is a lot of hard work and keen insight on the part of the candidate and a lot of interaction with colleagues and mentors. In the Netherlands, the PhD is "officially" not a PhD at all, but a doctoraat. Both PhD and doctoraat are variations on a "research doctorate". I try to be sensitive to cultural differences between my Anglophone conceptions of the PhD and what is common here---but mostly these have turned out to be superficial (e.g., the mandatory tuxedos).

If the PhD candidate is the one doing the thinking and the heavy scientific lifting, what, one wonders, is exactly the role of the mentor? This role is, of course, going to vary widely from person to person, but it's worth trying to capture those aspects that remain invariant. Here is a list of the points that, in my mind, a mentor is responsible for:
  1. Making sure that the candidate has the skills and tools necessary to be a successful researcher. The candidate should be completely comfortable with carrying out the steps of the scientific method. Usually, people are well versed in the scientific method when they start their PhDs. What often needs a bit of help is recognizing the worthwhile and the not-so-worthwhile examples of the scientific method. One learns this discrimination through practice and (again) a lot of thinking about it.
  2. Making sure the candidate has a viable topic. A PhD topic stays with someone for the rest of their life. The mentor needs to do their very best to guide the PhD candidate into growth areas and away from research dead-ends.
  3. Reminding the candidate to converge. It's only one little word in the definition above, but it is important to note that a PhD is an independent contribution and not many independent contributions. Good PhD students will generate ideas during their PhD, which will feed the rest of their scientific careers. The mentor should offer gentle reminders that the PhD thesis is finite in length and does not need to address every vista that arises in the course of investigation.
  4. Guiding the PhD candidate in acculturating into the research community. The process involves discussing the particular topics and methodologies used by a particular community (in our case, these communities are defined by specific conferences and journals) and also facilitating introductions of the candidate to other members of the community. The mentor should keep in mind that acculturation is bi-directional, i.e., that PhD candidates are also destined to change the communities that they join.
  5. Giving the PhD candidate space to fail. A certain number of failure experiences during the PhD process ensures that we are actually testing the boundaries of the scientific field and pushing into uncharted territory. The space should be large and generous enough that there is a non-vanishing possibility that the PhD candidate fails at the entire PhD. The number of "total failure" cases should be kept minimal and mentors should be vigilant to spot high-risk cases early on in the process. However, it is critical to maintain room for failure in order to encourage the risk taking necessary to ensure that PhD research produces progress in the state of the art.
  6. Supporting the PhD student in understanding where they would like to go with their career after the PhD and how to get there. This point is difficult due to the unpredictable nature of the economy and, well, of the future in general. However, discussion of "life beyond defense" is helpful to achieving convergence (mentioned under point 3) and also to avoiding a letdown after defense, which has been likened to postpartum depression.
Equally important perhaps, is what the mentor doesn't do:

The mentor does not formulate the statement of the candidate's "independent contribution" for the candidate. The candidate comes up with the statement of the independent contribution. The mentor helps to formulate the research questions addressed by the candidate in the natural course of 1.-5. above. However, there is a moment in time when the candidate articulates the key contribution and the mentor says, "Hey, yeah, that's it."

There may be an actual audible click heard at this moment---colleagues like to report having actually heard it. On the other hand, some PhD candidates come up with a new formulation of their independent contribution every week, and the process involves more of a sorting and winnowing rather than a flash of enlightenment. All the same, the independent contribution is originated and owned by the candidate.

At first, making an independent contribution might sound a bit scary to PhD candidates, since it's actually impossible to know of oneself in advance whether one will make a contribution before one actually knows what the contribution is. The two must necessarily occur in exactly the same moment: there are no guarantees.

However, when I dig a little deeper I find that most people that come to me for mentoring are there because they have a strong intuition about themselves that they have this contribution to make, that it's somewhere there inside of them and that they are looking for a way to realize it.

They also more or less consciously know that the formulation of the contribution is the fun part. In the end, a scientist or an engineer needs a PhD the way a Formula 1 racer needs a driver's license. It's a necessary part of what we do, but it's far, far away from being the actual essence of the scientific endeavor.

What I'd really like the PhD students I mentor to understand is this: The way it works is that you wake up one morning and realize that in your own mind you are already through it, and have accomplished it with poise and flair. You see that really all of those people who you thought were being so critical are just standing there helping you along and itching to see you show your stuff on international race tracks around the world. There's a huge party of course, but then you are on to other things and don't really think about having passed your driver's test.

But it's the Netherlands, and you don't need a car. So let me spend a few words on bicycles. The PhD process is also a bit like riding a bicycle: when you first learn to balance yourself, you sort of have the feeling that it will fall over at any minute. But very quickly you are riding ahead and it just works and you are fully focused on where you are going and don't think about falling. And then you arrive and you yourself know that you have got there.

The interesting point about using the bicycle comparison is this: The whole nature of what a bicycle is would be altered if someone invented one you could ride and be 100% guaranteed never to fall. I certainly don't think any of the people I know here in the Netherlands would ride a bike like that. Riding a bike is about getting where you need to be.

So the PhD candidate is riding a bicycle towards a goal and the contribution of the mentor is...

...well, the candidate shouldn't even really be noticing that the mentor is making a contribution. Rather, the candidate should be wondering why the mentor doesn't have time to read all the related work papers that s/he is leaving on their desk, and why the mentor sort of nods dumbly when s/he has to bring them up to speed by explaining exactly why these papers contribute to the formulation of the research question or provide support for the experimental methods or results for the particular building block of the thesis currently being worked on. Aaaargh! Points 1.-6. above, great, yeah, but what is the mentor actually doing?

Right. That right there is the PhD. In the end, there will be a certain healthy degree of ambiguity about whether the candidate's achievement occurred because of the mentor or despite the mentor.

OK. That last part had a definitely mystical ring to it. And the fact that I myself have arrived at such a statement makes it clearer to me why none of my own mentors ever made an explicitly formulated and crystal clear declaration of what a PhD is. They probably did, but I didn't recognize at the time that an essential part of the clarity lies in its very ambiguity.

I never pushed any of them either on this point---never asked for a definition. So then am I pushing myself now?

Actually, I think that in part it has to do with the tuxedos. In the Netherlands, that "Hey, yeah, that's it." moment on the part of the mentor is accompanied by a flash of this mental image of the candidate standing at the podium wearing a tuxedo and performing fabulously during the defense. Since my scientific training comes from outside of the Netherlands, my own natural inclination if you ask me to support someone towards wearing a tuxedo is teaching them how to waltz---which is with a large probability irrelevant. The irrelevance of my personal associations with tuxedos to "The PhD" triggered me to start thinking about what is relevant and to set down my perspective in linear form---quite probably, in the end, opening more questions than I have answered.