Monday, November 11, 2019

Reflections on Discrimination by Data-based Systems

A student wrote to ask to interview me about discrimination in text mining and classification systems. He is working on his bachelor thesis, and plans to concentrate on gender discrimination. I wrote him back with an informal entry into the topic, and am posting it here, since it may be of more general interest.

Dear Student,

Discrimination in IR, classification, or text mining systems is caused by the mismatch between what is assumed to be represented by data and what is helpful, healthy and fair for people and society.

Why do we have this mismatch and why is it so hard to fix?

Data is never a perfect snapshot of a person or a person's life. There is no single "correct" interpretation inherent in data. Worse, data creates its own reality. Let's break it down.

Data keeps us stuck in the past. Data-based systems make the assumption that predictions made for use in the future can be meaningfully based on what has happened in the past. With physical science, we don't mind being stuck in the past. A ballistic trajectory or a chemical reaction can indeed be predicted by historical data. With data science, when we build systems based on data collected from people, shaking off the past is a problem. Past discrimination perpetuates itself, since it gets built into predictions for the future. Skew in how datapoints are collected also gets built into predictions. Those predictions in turn get encoded into the data, and the cycle continues.

In short, the expression "it's not rocket science" takes on a whole new interpretation. Data science really is not rocket science, and we should stop expecting it to resemble physical science in its predictive power.

Inequity is exacerbated by information echo chambers. In information environments, we have what are known as rich-get-richer effects: videos with many views gain more views. This means that small initial tendencies are reinforced. Again, the data creates its own reality. There is a difference between data collected in online environments and data collected via a formal poll.
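The rich-get-richer dynamic is easy to sketch in code. The following is a toy simulation with invented numbers (no real platform data): each new view is assigned to a video with probability proportional to its current view count, so a small initial edge can compound.

```python
import random

def simulate_views(initial_views, total_new_views, seed=0):
    """Preferential attachment: each new view goes to a video with
    probability proportional to that video's current view count."""
    rng = random.Random(seed)
    views = list(initial_views)
    for _ in range(total_new_views):
        winner = rng.choices(range(len(views)), weights=views)[0]
        views[winner] += 1
    return views

# Two videos start almost even: 11 views vs. 9.
final = simulate_views([11, 9], 10_000)
shares = [v / sum(final) for v in final]
print(final, [round(s, 2) for s in shares])
```

Running this with different seeds shows that the final split varies widely from run to run, which is exactly the point: the feedback loop, not the content, decides the outcome.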

Other important issues:

"Proxy" discrimination: for example, when families move they tend to follow the employment opportunities of the father and not the mother. The trend can be related to the father often earning more because he tends to be just a bit older (more work experience) and also tends to have spent less time on pregnancy and kid care. This means that the mother's CV will be full of non-progressive job changes (i.e., gaps or changes that didn't represent career advancement), and gets down ranked by a job candidate ranking function. The job ranking function generalizes across the board over non-progressive CVs, and does not differentiate between the reasons that the person was not getting promoted. In this case, this non-progressiveness is a proxy for gender, and down-ranking candidates with non-progressive CVs leads to reinforcing gender inequity. Proxy discrimination means that it is not possible to address discrimination by looking at explicit information; implicit information also matters.

Binary gender: When you design a database (or database schema) you need to declare the variable types in advance, and you also want to make the database interoperable with other databases. Gender is represented as a binary variable. The notion that gender is binary gets propagated through systems regardless of whether people actually map well to two gender classes. I notice a tendency among researchers to assume that gender is somehow a super-important variable contributing to their predictions just because it seems easy to collect and encode. We give importance to the data we have, and forget about other, perhaps more relevant data, that are not in our database.
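A minimal sketch of how a schema propagates binarity (Python's `Enum` standing in for a database type declaration; the class name is invented for illustration):

```python
from enum import Enum

class BinaryGender(Enum):
    """A type declared in advance admits only the values it anticipated."""
    F = "F"
    M = "M"

BinaryGender("F")  # a value the schema anticipated: accepted
try:
    BinaryGender("nonbinary")
except ValueError as err:
    # The schema, not the person, decides which values exist.
    print("rejected:", err)
```

Every system that interoperates with this schema inherits the two-value assumption, whether or not it matches the people being described.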

Everyone's impacted: We tend to focus on women when we talk about gender inequity. This is because the examples of gender inequity that threaten life and limb tend to involve women, such as gender gaps in medical research. Clearly action needs to be taken. However, it is important to remember that everyone is impacted by gender inequity. When a lopsided team designs a product, we should not be surprised when the product itself is also lopsided. As men get more involved in caretaking roles in society, they struggle against pressure to become a "Supermom", i.e., to fulfill all the stereotypical male roles and at the same time excel at the female roles. We should be careful, while we are fixing one problem, not to ignore, or even create, another.

I have put a copy of the book Weapons of Math Destruction in my mailbox for you. You might have read it already, but if not, it is essential reading for your thesis.

From the recommender system community in which I work, check out:

Michael D. Ekstrand, Mucun Tian, Mohammed R. Imran Kazi, Hoda Mehrpouyan, and Daniel Kluver. 2018. Exploring author gender in book rating and recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys '18). ACM, New York, NY, USA, 242-250.

and also our own recent work, which has made me question the importance of gender for recommendation.

Christopher Strucks, Manel Slokom, and Martha Larson. 2019. BlurM(or)e: Revisiting Gender Obfuscation in the User-Item Matrix. In Proceedings of the Workshop on Recommendation in Multistakeholder Environments (RMSE) at RecSys 2019.

Hope that these comments help with your thesis.

Best regards,

P. S. As I was about to hit the send button Sarah T. Roberts posted a thread on Twitter. I suggest that you read that, too.

Sunday, November 10, 2019

The unescapable (im)perfection of data

Serpiente alquimica

In data science, we often work with data collected from people. In the field of recommender system research, this data consists of ratings, likes, clicks, transactions, and potentially all sorts of other quantities that we can measure: dwell time on a webpage, or how long someone watches a video. Sometimes we get so caught up in creating our systems, that we forget the underlying truth:

Data is unescapably imperfect.

Let's start to unpack this with a simple example. Think about a step counter. It's tempting to argue that this data is perfect. The step counter counts steps and that seems quite straightforward. However, if you try to use this information to draw conclusions, you run into problems: How accurate is the device? Do the steps reflect a systematic failure to exercise, or did the person just forget to wear the device? Were they just feeling a little bit sick? Are all steps the same? What if the person was walking uphill? Why was the person wearing the step counter? How were they reacting to wearing it? Did they do more steps because they were wearing the counter? How were they reacting to the goal for which the data was to be used? Did they decide to artificially increase the step count (by paying someone else to do steps for them)?

In this simple example, we already see the gaps, and we see the circle: collecting data influences the data that is collected. The collection of data actually creates patterns that would not be there if the data were not being collected. In short, we need more information to interpret the data, and ultimately the data folds back upon itself to create patterns with no basis in reality. It is important to understand that this is not some exotic, rare state of data that can be safely ignored in day-to-day practice (like the fourth state of water). Let me continue until you are convinced that you cannot escape the imperfection of data.

Imagine that you have worked very hard and have controlled the gaps in your data, and done everything to prevent feedback loops. You use this new-and-improved data to create a data-based system, and this system makes marvelous predictions. But here's the problem: the minute that people start acting on those predictions, the original data becomes out of date. Your original data is no longer consistent with a world in which your data-based system also exists. You are stuck with a sort of Heisenberg's Uncertainty Principle: either you get a short stretch of data that is not useful because it is not enough to be statistically representative of reality, or a longer stretch of data, which is not useful because it encodes the impact of the fact that you are collecting data and making predictions on the basis of what you have collected.

So basically, data eats its own tail like the Ouroboros (image above). It becomes itself. As science fictiony as that might sound, this issue has practical implications that researchers and developers deal with (or ignore) constantly.  For example, in the area of recommender system research in which I am active, we constantly need to deal with the fact that people are interacting with items on a platform, but the items are being presented to them by a recommender system. There is no reality not influenced by the system.
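A toy closed-loop simulation (invented numbers, not a real recommender) makes the "no reality not influenced by the system" point concrete: the system can only observe clicks on items it chose to show, and it chooses what to show based on the clicks it has observed.

```python
import random

def run_loop(n_items, rounds, k=3, seed=2):
    """Closed feedback loop, illustrative only: logged clicks exist
    only for items the system itself decided to expose."""
    rng = random.Random(seed)
    true_appeal = [rng.random() for _ in range(n_items)]  # hidden from the system
    clicks = [1] * n_items  # optimistic starting estimate
    shown = [0] * n_items
    for _ in range(rounds):
        # Greedily show the k items with the most recorded clicks.
        slate = sorted(range(n_items), key=lambda i: clicks[i], reverse=True)[:k]
        for i in slate:
            shown[i] += 1
            if rng.random() < true_appeal[i]:
                clicks[i] += 1
    return true_appeal, clicks, shown

appeal, clicks, shown = run_loop(n_items=20, rounds=500)
never_shown = [i for i, s in enumerate(shown) if s == 0]
print(f"{len(never_shown)} of 20 items were never shown at all")
```

With this greedy policy, the items that happen to be exposed first accumulate all the clicks and most of the catalog is never observed at all: the logged data describes the system's own past choices as much as it describes user preferences.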

The other way to see it, is that data is unescapably perfect. Whatever the gaps, whatever the nature of the feedback loops, data faithfully captures them. But if we take this perspective, we no longer have any way to relate data to an underlying reality. Perfection without a point.

And so we are left with unescapable.

Saturday, April 14, 2018

Pixel Privacy: Protecting multimedia from large-scale automatic inference

This post introduces the Pixel Privacy project, and provides related links. This week's Facebook congressional hearings have made us more aware of how easily our data can be illicitly acquired and used in ways beyond our control or our knowledge. The discussions around Facebook have focused on textual and behavioral information. However, if we think forward, we should realize that now is the time to also start worrying about the information contained in images and videos. The Pixel Privacy project aims to stay ahead of the curve by highlighting the issues and possible solutions that will make multimedia safer online, before multimedia privacy issues start to arise.

The Pixel Privacy project is motivated by the fact that today's computer vision algorithms have super-human ability to "see" the contents of images and videos using large-scale pixel processing techniques. Many of us are aware that our smartphones are able to organize the images that we take by subject material. However, what most of us do not realize is that the same algorithms can infer sensitive information from our images and videos (such as location) that we ourselves do not see or do not notice. Even more concerning than automatic inference of sensitive information is large-scale inference. Large-scale processing of images and video could make it possible to identify users in particular victim categories (cf. cybercasing [1]).

The aim of the Pixel Privacy project is to jump-start research into technology that alerts users to the information that they might be sharing unwittingly. Such technology would also put tools in the hands of users to modify photos in a way that protects them without ruining them. A unique aspect of Pixel Privacy is that it aims to make privacy natural and even fun for users (building on work in [2]).

The Pixel Privacy project started with a 2 minute video:

The video was accompanied by a two-page proposal. In the next round, I gave a 30-second pitch followed by rapid-fire Q&A. The result was winning one of the 2017 NWO TTW Open Mind Awards (Dutch).

Related links:
  • The project was written up as "Change Perspective" feature on the website of Radboud University, my home institution: Big multimedia data: Balancing detection with protection (unfortunately, the article was deleted after a year or so).
  • The project also has been written up by Bard van de Weijer for Volkskrant in a piece with the title "Digital Privacy needs to become second nature". (In Dutch: "Digitale privacy moet onze tweede natuur worden")


[1] Gerald Friedland and Robin Sommer. 2010. Cybercasing the Joint: On the Privacy Implications of Geo-tagging. In Proceedings of the 5th USENIX Conference on Hot Topics in Security (HotSec’10). 1–8.

[2] Jaeyoung Choi, Martha Larson, Xinchao Li, Kevin Li, Gerald Friedland, and Alan Hanjalic. 2017. The Geo-Privacy Bonus of Popular Photo Enhancements. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval (ICMR '17). ACM, New York, NY, USA, 84-92.

[3] Ádám Erdélyi, Thomas Winkler and Bernhard Rinner. 2013. Serious Fun: Cartooning for Privacy Protection, In Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, October 18-19, 2013.

Monday, January 1, 2018

2018: The year we embrace the information check habit

The new year dawns in the Netherlands. The breakfast conversation was about the Newscheckers site in Leiden and about the ongoing "News or Nonsense" exhibition at the Netherlands Institute for Sound and Vision.

Signs are pointing to 2018 being the year that we embrace the information check habit: without thinking about it, we double-check the trustworthiness of the factuality and the framing of any piece of information that we consume in our daily lives. If the information will influence us, if we will act upon it, we will finally have learned to automatically stop, look, and listen: the same sort of skills that we internalized when we learned to cross the street as youngsters.

For me, 2018 is the year that I make peace with how costly information quality is. On factuality: I spend hours reviewing papers and checking sources. On framing: I devote a lot of time to looking for resources in which key concepts and processes are explained in ways that my students can easily understand. And too often I am prevented from working on factuality and framing by worrying about the consequences of missing something or making the wrong choices.

It is costly in terms of time and effort just to choose words. I need words to convey to the students in my information science course that the world is dependent on their skills and their professional standards: anyone whose work involves responsibility for communication must devote time and effort to information quality and must take constant care to inform, rather than manipulate.

What is the name for our era? I don't say "post-truth". An era can call itself "post-truth", but that's asking us to accept that it is fundamentally different from whatever came before---the "pre-post-truth" era. The moment we stop to reflect on how the evidence proves that we have shifted from truth to post-truth, we are engaging in truth seeking. Post-truth goes poof.

I don't say "fake news" era. I grew up with the National Enquirer readily available at the supermarket check out counter, with its bright and interesting pictures of UFOs and celebrity divorces. That content wasn't there to contribute to building my mental model of reality, any more than Pacman. "Fake news" has always been there.

My search for the right words continues. I am using the book Weaponized Lies by Daniel Levitin for the first time this year in order to teach critical thinking skills. Levitin uses words like "counterknowledge" and "misinformation". These are important terms, but they imply the existence of an intelligent adversary intentionally misleading us. It is important to defend against these forces. However, the idea that the problem is people putting effort into "weaponization" overlooks the less dramatic, and less easily identified, problem of reasoning from shaky, half-remembered information sources or using flawed logic to build arguments.

Now at the end of the first day of 2018, I am staring at Weaponized Lies next to my keyboard, wishing there were shortcuts---that I didn't have to start from the bottom finding the words to talk about the importance of information quality, even before I start talking about information quality itself, and researching how to build safer more equitable information environments.

There are no shortcuts. The only thing that we can hope for is that we can routinize the information check. Make it a habit.

I even stopped for a moment to dream about a rising demand for information quality creating new jobs. We need professionals who are able to help us monitor information without sliding into suppressing free speech and imposing censorship. This is the direction in which our knowledge society should grow.

I thought I remembered reading an article online that discussed 2018 as the "Information Year". Now, for the life of me, I cannot find it. It takes so long to track and keep track of sources. My first step in making peace with the cost of information quality: I end this blog post by admitting I have no proof for my thesis that 2018 is the year we embrace the information check habit. The title is instead an expression of hope that we can move in that direction.

Wednesday, May 24, 2017

Multimedia Meets Machine (Learning): Understanding images vs. Image Understanding

Today, I gave a talk at Radboud University's Good AIfternoon symposium, for Artificial Intelligence students. I covered several papers that I have written with different subsets of my collaborators [1,2,3]. The goal was to show students the difference between the way humans understand images and the type of understanding that can be achieved by computers applying visual content analysis, particularly concept detection.

Human Understanding of Images
Consider the images below from [1]. The concept detection paradigm claims success if a computer algorithm can identify these images as depicting a woman wearing a turquoise blue sundress with water in the background. For bonus points, in one image the woman is wearing sunglasses.
A person looking at these images would not say that such concept-based description of the images is wrong. In fact, if a person is presented with these pictures out of context, and asked what they depict, "A woman wearing a blue sundress at the beach" would be an unsurprising response. 

However, this response falls short of really characterizing the photos from the perspective of a human viewer. This shortcoming becomes clear by considering contexts of use. For example, if we needed to choose one of the two as a photo for selling a turquoise blue dress in a web shop, the right-hand photo is clearly the photo we want. The left-hand photo is clearly unsuited for the job. Concept-based descriptions of these images fail to fully capture user perspectives on images. Upon reflection, a person looking at these images would conclude that the concept-based description is not wrong per se, but that it seriously misses the point of the image.

An often-heard argument is that you need to start somewhere, and that concept-based description is a good place to start. However, we need to keep in mind that this starting point represents a built-in limitation on the ability of systems that use automatic image understanding (such as image retrieval systems) to serve users.

Think of it this way. Indexing images with a preset set of concepts is a bit like those parking garages that paint each floor a different color. If you remember the color, that color is effective at allowing you to find your car. However, the relationship of the color and your car is one of convenience. The parking-garage-floor color is an essential property of your car when you are looking for it in the garage, but outside of the garage, you wouldn't consider it an important property of your car at all.

In short, automatic image understanding underestimates the uniqueness of these images, although this uniqueness is of the essence for a human viewer.

Machine Image Understanding
Consider the images below from  [4]. A human viewer would see these as two different images.
If the geo-location of the right-hand image is known, geo-location estimation algorithms [3] can correctly predict the geo-location of the left-hand image. In this case, a machine learning algorithm "understands" something about an image that is not particularly evident to a casual human viewer. Humans are largely unaware that the geo-location of their images is "obvious" to a computer algorithm that has access to other images known to have been taken at the same place.

In short, human understanding of images overestimates the uniqueness of these images, and visual content analysis algorithms understand more than people realize that they do.

Moving forward
Given the current state of the art in visual content analysis, "Multimedia Meets Machine" is perhaps a bit outdated, and we should be thinking in terms of titles like "Multimedia Has Already Met Machine".

The key question moving forward is whether machine understanding of images supports the people who take and use those images, or if it is providing a little convenience, at the larger cost of personal privacy.

[1] Michael Riegler, Martha Larson, Mathias Lux, and Christoph Kofler. 2014. How 'How' Reflects What's What: Content-based Exploitation of How Users Frame Social Images. In Proceedings of the 22nd ACM international conference on Multimedia (MM '14). 

[2] Martha Larson, Christoph Kofler, and Alan Hanjalic. 2011. Reading between the tags to predict real-world size-class for visually depicted objects in images. In Proceedings of the 19th ACM international conference on Multimedia (MM '11).

[3] Xinchao Li, Alan Hanjalic, Martha Larson.  Geo-distinctive Visual Element Matching  for Location Estimation of Images, Under review.

[4] Jaeyoung Choi, Claudia Hauff, Olivier Van Laere and Bart Thomee. 2015. The Placing Task at MediaEval 2015. In Working Notes Proceedings of the MediaEval 2015 Workshop.

Saturday, April 22, 2017

March for Science: Einsteins at the Lake

A view of the Great Lakes from space

The May break at Radboud University (which happens to fall in April this year) sees me arriving in the US just in time to participate in the March for Science Milwaukee, on the shores of Lake Michigan. The weather was gorgeous and the march route was beautiful, taking me past sites familiar from school field trips of my childhood. This blogpost contains photos and some reflections on what the march means.

Why march for science?

Marching restores the natural balance between listening and reading (I'm at overdose levels these days) and expressing oneself. The thought expressed is not complicated: it is simply a statement of support for evidence-based policy making. The act of marching also serves to preserve our culture of freedom of expression, of open and informed criticism, and of citizens demanding that their values and interests be represented by their government.

In Dutch, a scientist is a "Wetenschapper", literally, a "Creator of Knowledge". Marching is a concrete and publicly visible sign of the importance of the knowledge created by the scientific method. This knowledge is the bedrock of our well-being as a society. Think: energy, food, health, housing, sanitation, security, transport, and the technology underlying today's digital information creation and exchange. The knowledge that we create by the scientific method is knowledge that we cannot live without.

Restoration is sorely needed in a world delivering a constant information deluge. There's news, but that news includes news about news. It is important to keep up, to read, track developments, form a position, and, on the basis of this position, vote. However, without working actively to keep the balance, too much reading becomes bookkeeping of who is on which side, and tallying points, wins or losses, for both sides.

Relief comes from falling back on common ground, seeking out the non-partisan issues, and focusing on these. We are mechanics, potters, brewers, nurses, birdwatchers, cooks. We drive cars, fly in airplanes, surf the Web, do our laundry, and, upon occasion, fool around with the physics and chemistry around us, e.g., by putting Mentos in Coke. These daily activities all represent science in action.

True to our Wisconsin roots, more than one person at the March for Science carried the sign, "No science, no beer". I thought about Student's t-test: it might surprise you, but beer is actually much closer to science than you would expect.

The common ground is surprisingly sturdy. People, all of us, are constantly applying evidence-based approaches. We don't heat up tomato soup by putting a tin can directly in the microwave, we don't put airtight lids on our fishbowls, we water our plants and maybe even give them plant food, and we try to eat healthily ourselves.

Seen from this perspective of common ground, which we understand to be common sense, we are not experiencing a crisis of denial. Rather, it is perhaps a crisis of connection: putting what we collectively know into action for the benefit of us all. On Monday, 21 August, all of North America will have a special opportunity to watch an eclipse of the sun. No one expects it not to unfold exactly as NASA has announced. Surely, this certainty is something that can be productively built upon.

Relief comes from also falling back on shared values. One that is deeply ingrained in me from my Wisconsin youth is avoidance of waste. Waste of human life is at the top of that list of waste we must seek to avoid. I have taught myself to read Nicholas Kristof's columns on women's health without falling into despair. His latest is on the impact of the funding cuts of the current Republican Administration to women's health programs internationally. I have not seen what Kristof has seen in his travels, but I have seen enough beyond the borders of the US to realize that these cuts translate directly into suffering and death. The science to save lives is there. We are an affluent society: our pride should be that we devote resources to doing just that.

Avoidable waste is also to be observed closer to home. There is broad consensus on the importance of the Great Lakes Restoration Initiative, as discussed by the Chicago Tribune. The Great Lakes Restoration Initiative has the purpose of protecting and restoring the Great Lakes, which face threat from pollution and invasive species. These lakes contain 21% of the fresh water on the surface of the earth, measured by volume. Growing up, I wished they were not quite so deep, since it was cold as cold could be trying to swim in them. Today, the presence of that incomprehensibly large mass of water still remains with me. I feel it in the way that my stomach drops to read about planned funding cuts to an essential program preserving it. Many, many people across party lines have had a similar visceral reaction.

Who does the march's message reach?

If the march is about expressing a message, who receives that message? One goal is that it is received by policy makers: the sheer bio-mass of science-minded citizens on the street is a flashing red light signaling that the course needs to be corrected. More tangibly for me, the march is about reaching young people: people in school who are on the point of deciding for an education in STEM and for a career in science.

At the March for Science, I was enchanted by the many mini-Einsteins. My presence there is a signal to them: "You are clear sighted in your understanding, dear mini-Einsteins. You are right in your resolve. Stay steadfast in your studies and stay true to your vision. There are three thousand of us who turned out here today to show you that you are not alone."

Sunday, February 26, 2017

Shared-tasks for multimedia research: Bans, benchmarks, and being effective in 2017

Last week, I officially resigned from contributing as an organizer to the TRECVid Video Retrieval Evaluation, which is sponsored by NIST, a US government agency in Gaithersburg, Maryland. In 2016, I was part of the Video Hyperlinking task, and contributed by defining the year's relevance criteria, creating queries, and helping to design the crowdsourcing-based assessment. It was a very difficult decision, so I would like to record here in this blogpost why I made it.

Ultimately, we make such decisions ourselves, and everyone navigates these difficult processes alone. However, it takes a lot of time and energy to search for the relevant information, and to weigh the considerations. For this reason, I think that for some it may be helpful to know more details about my own process.

Benchmarking Multimedia Technologies
Since 2008, I have been involved in benchmarking new multimedia technology. Benchmarking is the process of systematically comparing technologies by assessing their performance on standardized tasks. The process makes it possible to quantify the degree to which one algorithm outperforms another. Quantification is necessary in order to understand if a new algorithm has succeeded in improving over the state of the art, defined by the performance of existing algorithms.
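As a concrete sketch of such quantification (hypothetical document IDs and relevance judgments, not drawn from any actual benchmark run), a common retrieval metric is precision at k, and improvement over a baseline is often reported as a relative difference:

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

# Hypothetical runs from two systems on a single query:
relevant = {"d3", "d5", "d7", "d9"}
baseline = ["d1", "d3", "d2", "d5", "d4", "d6", "d8", "d7", "d10", "d11"]
new_sys  = ["d3", "d5", "d7", "d1", "d9", "d2", "d4", "d6", "d8", "d10"]

p_base = precision_at_k(baseline, relevant)
p_new = precision_at_k(new_sys, relevant)
improvement = (p_new - p_base) / p_base
print(f"P@10: {p_base:.2f} -> {p_new:.2f} ({improvement:+.0%})")
```

In a real benchmark, such scores would of course be averaged over many queries and accompanied by significance testing before claiming an improvement over the state of the art.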

The strength of benchmarking lies in the degree to which a benchmark succeeds in achieving open participation. If a new algorithm is compared to some, but not all, existing algorithms, the results of the benchmark less clearly reflect a true improvement over the state of the art.

My emphasis in benchmarking is on tasks that focus on the human and social aspects of multimedia access and retrieval. In other words, I am interested in people producing and consuming video, images, and audio content in their daily lives, and in how we can create algorithms that give them back usefulness and value from these activities. It is difficult to pack these aspects into quantitative metrics, so I am also committed to research that develops new evaluation methodologies and new metrics as well.

Due to this emphasis, it is not surprising that most of my contribution has been channeled through the MediaEval Benchmark for Multimedia Evaluation. (I coordinate the MediaEval "umbrella", which synchronizes the otherwise autonomous tasks.) However, the strength of the benchmarking paradigm is weakened if a single benchmark, with a limited spectrum of topics, becomes all-dominant. Instead, we need to act to prevent a single effort from "taking over the market". We need to work towards ensuring that a broad range of different types of problems are investigated by the research community. Fostering breadth means offering not only multiple tasks, but multiple benchmarks. This year, I am again involved in MediaEval, but also, as last year, in contributing to the organization of the NewsREEL task at CLEF (where my role is to contribute to design, documentation, and reporting).

Open Participation in Benchmarks
Both MediaEval and CLEF are open participation benchmarks in three aspects:
  • First, anyone can propose a task (there is an open call for tasks). CLEF chooses its tasks by multi-institutional committee, cf. 2017 CLEF Call for Task Proposals. MediaEval also chooses its tasks by multi-institutional committee. However, the committee checks only for viability. The ultimate choice lies in the hands of all community members, including organizers and participants, cf. MediaEval 2017 Call for Task Proposals. The goal of an open call for tasks is to promote innovation---constantly evolving tasks prevent the community from "locking in" on certain topics, and becoming satisfied with incremental progress.
  • Second, anyone can sign up to participate. Participants submit working notes papers, which go through a review process (emphasizing completeness, clarity, and technical soundness). MediaEval and CLEF both publish open access working notes proceedings.
  • Third, for both MediaEval and CLEF, workshop registration is open to anyone, and requires only the payment of a fee to cover costs. For MediaEval, the fee covers the costs of the workshop, and also of hosting the website and organizer teleconferences. People/organizations contribute time to cover the rest of workshop organization.
Like MediaEval and CLEF, TRECVid also pursues the mission of offering an open research venue. Historically, both TRECVid and CLEF grew from TREC (also, of course, organized by NIST), so the commitment to the common cause is unsurprising in this sense. However, TRECVid does not offer open participation in all three of the above aspects. Specifically, there is no publicly circulated call for task proposals, and the workshop is closed. (The stated policy is that the workshop is only open to task participants, and "to selected government personnel from sponsoring agencies and data donors", cf. TRECVid 2017 Call for Participation.) Technically, TRECVid is not able to welcome all participants. The US does not maintain diplomatic relations with Iran, and US Government employees cannot answer email from Iran. It is important to understand that this is a historical challenge, and is not new with the current US Republican Administration.

Defining Priorities and Making Decisions
Considerations related to open participation made me hesitant to get deeply involved in TRECVid. However, over the years, I have been very open to exchange. TRECVid originally reached out to me to give an invited talk back in 2009, when MediaEval was still VideoCLEF. (There are some musings on my blog from that trip.) The idea was to learn from each other. We hope this year to reciprocate with a TRECVid speaker at CLEF/MediaEval.

In 2016, I contributed to the organization of Video Hyperlinking, since the move of Video Hyperlinking from MediaEval to TRECVid represented a spread of the emphasis on the human aspects of multimedia retrieval, and it was important to me to support that explicitly.

All in all, it has taken a lot of time to decide where to invest my resources in 2017 in order to most effectively support multimedia benchmarking efforts that provide venues that are open and therefore effective as benchmarks.

With the new Republican Administration in the US, two considerations grew to dominate my decision-making process. The first is how to contribute to the movement whose goal is to demonstrate the relevance and importance of science to the public and to policy makers. TRECVid, by virtue of being a benchmark, is certainly at the forefront of this movement (just by doing the same thing it has done for years). We need to support our US-based colleagues in their efforts to be a force for science, and hope that they would support us as well if we were to land in a similar situation.

The second is how to react to the travel ban, which would prevent scientists from certain countries from entering the US. The first-order effects of the travel ban have been constrained by court rulings. However, the future plans of the administration are uncertain, and there is a range of second-order effects that a court cannot undo, e.g., people self-selecting out of participation because they are worried about their visas being held up by additional processing steps (and granted, for example, only after the workshop has occurred). These secondary effects effectively prevent people from attending a US-based event even though, technically, they may be able to get a visa.

We are not alone in our thinking; rather, we are guided by a large number of organizations that have issued public statements on the importance of openness for science (Statement of the International Council for Science, Statement of the American Association for the Advancement of Science), including professional organizations that we belong to (Statement of the ACM, Statement of ACM SIGMM, Statement of the IEEE) and European universities (Statement of the European University Association, Combined statement of all the universities in the Netherlands, Statement of Radboud University).

There is much power in making an open statement of values---more than one might think. However, we should avoid assuming that statements are enough and that the situation will go back to where it was before the current Republican Administration. In other words, the days in which we could dedicate relatively little time to protecting and upholding the values of openness in science are gone. Instead, we need to think explicitly about where our effort can be best dedicated in 2017.

TREC/TRECVid celebrated their 25th anniversary in 2016. The event has been a constant through many changes of US administration, and it is heartening that the 2017 event will, in all probability, look (from the inside at least) pretty much like all the other events over the past 25 years.

However, 2017 is the first year in which people will be in the streets, in the US and around the world, marching for science. The large-scale sense of urgency tells us that 2017 is not just business as usual. For this reason, it is important in 2017 to reexamine the idea that the US should be such a strong attractor within the map of scientific research in the world.

On top of the merit and can-do attitude that attract people from around the world to US institutions, we as scientists (because we study systems and networks) know that another force is at play. Specifically, we know that US institutions enjoy preferential attachment, meaning that past success is a determiner of future success. This effect translates into the reality that new or small events (e.g., research topics or benchmarking workshops) need a lot of extra time and attention to establish or maintain themselves in the field. 2017 is the year in which we need to think carefully about to what extent we want to contribute to this non-linear feedback loop that strengthens the pull towards US-based events, and to what extent we want to build counterweights.
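The preferential-attachment dynamic described here can be made concrete with a minimal simulation (a hypothetical sketch for illustration only; the event counts and parameters are made up and do not describe any actual benchmark). Each new participant chooses among competing events, either uniformly at random or weighted by current attendance. Under the weighted rule, even a small head start compounds into dominance:

```python
import random

def simulate(n_events=20, n_rounds=10000, preferential=True, seed=42):
    """Simulate attendance accumulating across competing events.

    Each round, one new participant picks an event. Under preferential
    attachment, the choice is weighted by current attendance ("past
    success is a determiner of future success"); otherwise it is uniform.
    Returns attendance counts sorted from largest to smallest.
    """
    rng = random.Random(seed)
    counts = [1] * n_events  # every event starts with one attendee
    for _ in range(n_rounds):
        if preferential:
            # probability of being chosen is proportional to current size
            event = rng.choices(range(n_events), weights=counts)[0]
        else:
            event = rng.randrange(n_events)
        counts[event] += 1
    return sorted(counts, reverse=True)

print("preferential:", simulate(preferential=True)[:3])
print("uniform:     ", simulate(preferential=False)[:3])
```

Running this, the preferential run concentrates a far larger share of participants in the top event than the uniform run does, even though both distribute the same total number of participants. This is the "non-linear feedback loop" in miniature: the counterweights discussed below amount to deliberately adding weight outside the largest node.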

I consciously use the word "counterweights" since I am referring to a balancing act. We stand in complete solidarity with our US-based colleagues. Providing counterweights in no way detracts from that fact. For multimedia research, counterweights include region-based initiatives, and benchmarks that allow anyone to propose a task. A network of diverse benchmarks makes benchmarking as a whole stronger, and makes us internationally more robust.

My personal decision is that time spent promoting and preserving diversity is, in 2017, a more effective way to achieve the larger goals of benchmarking, than time spent reinforcing the connection between benchmarking and Gaithersburg, Maryland. I was born in Maryland, outside of DC, but Maryland is not where I am needed now. TRECVid will be fine without extra help from Europe, but what can (and does) suffer is the availability to the research community of non-US-based benchmarks.

Recommendations to TRECVid
The intention is for my resignation to be a positive decision for, and not a negative decision against. Reasoning that my reflections on these topics are probably helpful to NIST, I distilled my thinking into a set of three recommendations. Interestingly, these recommendations are relatively independent of the situation in the US caused by the current Republican Administration:
  • First, TRECVid is an open research venue. I recommend stating this explicitly on the website. An example is the ACM Open Participation statement. 
  • Second, TRECVid is supported by NIST. I recommend a clearer statement on the website of the source and the distribution of the funding. People familiar with the benchmark know that NIST is the powerhouse behind its success, but this is not clear to newcomers. Critically, it is currently unclear in which cases defense funding supports TRECVid. This is important to people who personally, or whose institutions, have a commitment to pursue research for civilian purposes only. For example, many German institutions have a Zivilklausel by which they commit themselves to pursuing exclusively research for civilian purposes. Even if participation is nominally open, unclarity about defense funding can scare people away, and the benchmark is then effectively not as open as it would otherwise aspire to be. (For completeness: at least one colleague assumed I received NIST funding for my work on Video Hyperlinking. I did not. The unclarity in the funding causes confusion.) 
  • Third, attention should be devoted to the archival status of the proceedings. As a good next step, they should be indexed by mainstream search engines. Moving forward, attention should be paid to maintaining a historical record of TRECVid, should NIST, at some point in the future, no longer be able to support open participation/open access in the way it does now.
If you have read all the way to the end of this blog post, let me finish by thanking you: both for your dedication to open participation in scientific research, which is so essential to benchmarking, and for taking the time to read about my personal struggle. It has been a long path.

Don't miss the March for Science on 22 April. Inspire and be inspired.

Or find another march around the world here: