Sunday, January 23, 2011

How to title a paper

A colleague of mine once commented that the single most important factor determining whether or not a paper gets cited is its title. I recalled this comment this morning, a Sunday morning for which I had some interesting reading planned. Instead, I find myself brooding over possible titles for a conference submission -- exchanging e-mails full of "mights" and question marks with the first author.

Title is the paper's visiting card

The paper title is a piece of metadata -- it will allow the paper to be found: discovered and also re-found. As such, titling is essentially choosing indexing features for your work. The title should tell readers what the paper is about, and so support their decision whether or not to read it. But, like every good indexing feature, it needs to be not only representative, but also discriminative. The title must also help people see the difference between your paper and other papers on a closely related, but different, subject.

Predicting the future

I'm fond of asserting (given the appropriate context) that indexing essentially involves predicting the future. When assigning indexing features, we try to guess what people will need to find in the future and choose features that will support their searches. When you write a paper, you should consider not only the work at hand, but also the other work on the subject that you might write in the future. Choose a title that will be able to flag the difference.

This process of looking forward is very scary indeed. I find myself asking: If I turn out to be totally wrong about what I have shown in the paper, from which direction will that wrongness come? How will I correct (or better, improve on -- or best yet: completely blow out of the water) my past methods? Contemplating potential wrongness or imperfection opens the door to thinking of something further -- something that I really should have thought of already and included in this paper. The process threatens to bring the entire effort down onto our heads -- and just at the moment when our every fiber is straining toward the point at which we can click the submit button.

Pragmatic approach

We can take another tack. What is the most immediate purpose that our title will serve? It is entered into the conference management system, where reviewers use it to help pick the papers that they bid on and ultimately review. We want a title that will get our paper to the right reviewers. The right reviewers are the ones who know the topic area of our paper well, especially the background literature. These reviewers will recognize immediately that we are citing the key contributions that have come before us, and will also be able to point out anything that we missed.

The goal of getting to the right reviewers suggests that a descriptive, specific title is the way to go. But there is also something to be said for an interesting title -- one that conveys that this is not just another paper on xyz, but a particularly fine specimen of the genre. An interesting title signals to reviewers that their reading effort will be richly rewarded.

I also worry about whether the title is straightforward enough. Many reviewers have a highly functional, but not particularly broad, mastery of English. They are very competent in their areas, but are hard pressed to tell the difference between language use that violates established grammatical convention and language use that simply diverges a bit from the formulations generally used in computer science research papers.

Titles and the art of getting cited

If we create titles to be little advertisements for our papers, it's no wonder that my colleague sees a connection between a good title and a lot of citations. It can only help if you also craft your title to ring with authority. It should be specific, but not so specific that it can be cited in only one particular context. People checking citations should be able to look at the paper title and think, "Hmmm, yes, that's a plausible citation at this juncture." Of course, in an ideal world they would call up the paper and read it through in order to judge its appropriateness -- but in this day and age of information deluge, that is nearly too much to ask.

The title finding process

So, how do we get to a title that fits the bill? Here, my main message is: try to "predict the future." However, there are descriptions of the process floating around out there that are perhaps a little more user-friendly. One at WikiHow, which caught my fancy after a couple of searches, lists step 1 as "Get smart and think." It goes on with a reassuring "You can do it, you just don't know it." Right. Nothing there about brooding and writing a blog entry -- better get back to it.

Coda

If you catch me in a particular frame of mind, I will rail against the system that uses citation counts as the (exclusive) measure of academic productivity. But that rant is actually not all that much fun in the environment where I work, since the people around me generally accept that citation counts, although themselves quantitative, are only as good as the qualitative decisions that underlie them. One example is the decision of how to fairly account for cultural differences between fields that communicate mainly via conferences and fields that communicate mainly via journals. Another is how to account for the difference between quickly and slowly moving fields.

The real zinger remains this: we work in the field of information retrieval, driven forward by our curiosity about all things search-related. Isn't it the sign of a good IR researcher that, if you tell him his output will be measured by Google Scholar's citation count, the first thing he will do is set about developing a method to game the system?

I write papers to get my ideas read by the right readers -- people who will act upon what I have discovered. How many people this is, and whether they are actually writing papers (and remembering to cite the work of my colleagues and myself that they have found influential) rather than developing software, systems and policy, is of secondary concern. I tend to measure impact by the number of inquiries I get for more information about my work, e.g., along the lines of, "We are re-implementing your algorithm, do you mind giving us more information on aspect x?"

But in the end, whether you are going for the cold hard cites or the multiplicity of other less quantifiable ways in which you know that your work is making a difference, the title of your paper is important. OK. Now I am back at it.

Sunday, January 2, 2011

Geert Bewilders

Every once in a while I have a clear sense that what I do has a positive impact outside of the academic community. Whether the impact is large or small, I cherish it... and, of course, blog it.

In this case, the story concerns the Netherlands' Geert Wilders, a right-wing populist and leader of the "Freedom Party", which currently occupies 24 of the 150 seats in the Dutch House of Representatives. The hero of the story is a blogger, René Erker, who is speaking out against Wilders' words and actions. His blog is titled "Stop the Danger of Wilders" (in Dutch).

In mid-November, I got an e-mail from a former colleague of mine who knows the blogger and was helping to promote his blog. I share her concern. In my view, Wilders gives the outside world a skewed picture of the Netherlands and the Dutch political process, which could damage the cooperative processes necessary to keep productive dialogue in motion and Europe and the world at peace.

But if the blog was intended to push back against Wilders, it didn't quite get off to the right start. When I first read it, it had blasted off with two posts on the same day, with some entertaining rhetorical flourishes, but no information on who was writing or on the sources of their information. I wrote some words of critique to my former colleague, who passed them along to René. The deal we struck was that if he would write one post a week until the end of the year, I would feature his blog in a post of my own.

René has followed through. I don't flatter myself that it's a big deal to be featured in a post on my blog, especially since I usually discuss search and not politics. If the intended readerships of our blogs overlap, it is by chance. However, the interface between search and the real world necessarily touches on political issues. In this case, the link is that my past work in the area of user-generated media (blogs and podcasts -- see references below) has made me aware of some of the issues involved in creating media that then gets read.

I don't claim to "understand" Wilders or why he is able to claim so much space on the front pages of newspapers here in the Netherlands. Perhaps I could even be allowed to say that I'm "bewildered". He appears to spend a lot of time trying to get people expelled from the country on the basis of their belief system. The Netherlands is a "Rechtsstaat", which means that we have rule of law. Belief systems don't affect the rule of law until beliefs cause people to break laws. When that happens, the courts take over. What's the role of the fulminating politician in keeping the Rechtsstaat running smoothly, as it was conceived to run?

Is the concern that the core values of the country will shift? That people will vote to change the law? In the end, we are all linked to each other. Each time we turn to point to the shapeless, nameless "Them", it simply evaporates. According to the idea of Six Degrees of Separation, every person on the planet is connected to every other person through a chain of no more than six acquaintances. Imagine a project that would calculate Wilders' six degrees of separation to each and every human soul he harbors the wish to expel from Europe. What a graphic realization of the principle that there is no "Them". What there is instead is this: a series of long and challenging conversations between linked individuals, within linked groups, down the channels of the six degrees, so that they finally reach all of us in a real way. The law is as stable as our network is strong.

What is the real danger of Wilders? In a way, René's blog, by its very existence (by the necessity of its existence!), says it all. Wilders obliges us to spend our time pushing back, responding one by one to the over-simplifications and push-button formulations that he produces on the topics that are the obsessive focus of his attention. Our work pushing back is the only way to restore the balance, to repave the arena for constructive discussion.

Here's the danger: the time we devote to pushing back against Wilders could be used for moving forward -- having the difficult conversations with the people around us that are necessary to strengthen our society. Instead, we are stuck treading water.

Thank you to René Erker and the other bloggers who keep the current from sweeping us backwards. Keep giving us more links to each other and to your sources. And happy 2011!

He, J., Weerkamp, W., Larson, M. and de Rijke, M. An Effective Coherence Measure to Determine Topical Consistency in User Generated Content. International Journal on Document Analysis and Recognition, Vol. 12, No. 3, pp. 185-203, October 2009.

Tsagkias, M., Larson, M. and de Rijke, M. Predicting Podcast Preference: An Analysis Framework and its Application. Journal of the American Society for Information Science and Technology, Vol. 61, No. 2, pp. 374-391, February 2010.