Wednesday, May 24, 2017

Multimedia Meets Machine (Learning): Understanding images vs. Image Understanding

Today, I gave a talk at Radboud University's Good AIfternoon symposium, for Artificial Intelligence students. I covered several papers that I have written with different subsets of my collaborators [1, 2, 3]. The goal was to show students the difference between the way humans understand images and the type of understanding that can be achieved by computers applying visual content analysis, particularly concept detection.

Human Understanding of Images
Consider the images below from [1]. The concept detection paradigm claims success if a computer algorithm can identify these images as depicting a woman wearing a turquoise blue sundress with water in the background. For bonus points, in one image the woman is wearing sunglasses.
A person looking at these images would not say that such concept-based description of the images is wrong. In fact, if a person is presented with these pictures out of context, and asked what they depict, "A woman wearing a blue sundress at the beach" would be an unsurprising response. 

However, this response falls short of really characterizing the photos from the perspective of a human viewer. This shortcoming becomes clear by considering contexts of use. For example, if we needed to choose one of the two as a photo for selling a turquoise blue dress in a web shop, the right-hand photo is clearly the photo we want. The left-hand photo is clearly unsuited for the job. Concept-based descriptions of these images fail to fully capture user perspectives on images. Upon reflection, a person looking at these images would conclude that the concept-based description is not wrong per se, but that it seriously misses the point of the image.

An often-heard argument is that you need to start somewhere and that concept-based description is a good place to start. However, we need to keep in mind that this starting point represents a built-in limitation on the ability of systems that use automatic image understanding (such as image retrieval systems) to serve users.

Think of it this way. Indexing images with a preset set of concepts is a bit like those parking garages that paint each floor a different color. If you remember the color, that color is effective at allowing you to find your car. However, the relationship of the color and your car is one of convenience. The parking-garage-floor color is an essential property of your car when you are looking for it in the garage, but outside of the garage, you wouldn't consider it an important property of your car at all.

In short, automatic image understanding underestimates the uniqueness of these images, although this uniqueness is of the essence for a human viewer.

Machine Image Understanding
Consider the images below from [4]. A human viewer would see these as two different images.
If the geo-location of the right-hand image is known, geo-location estimation algorithms [3] can correctly predict the geo-location of the left-hand image. In this case, a machine learning algorithm "understands" something about an image that is not particularly evident to a casual human viewer. Humans are largely unaware that the geo-location of their images is "obvious" to a computer algorithm that has access to other images known to have been taken at the same place.
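To make the idea concrete, here is a minimal sketch of location estimation by visual matching. It is not the method of [3]: it is a plain nearest-neighbor stand-in, and the feature vectors, coordinates, and function names are all made up for illustration. The point is only that once a visually similar image with a known location exists, the location of the query image follows almost for free.

```python
import math

# Hypothetical reference set: (feature vector, (latitude, longitude)).
# The vectors stand in for visual descriptors; all values are invented.
REFERENCE_IMAGES = [
    ([0.9, 0.1, 0.0], (52.3676, 4.9041)),
    ([0.1, 0.8, 0.1], (48.8566, 2.3522)),
    ([0.0, 0.2, 0.9], (43.0389, -87.9065)),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def estimate_location(query_features, references=REFERENCE_IMAGES):
    """Return the coordinates of the most visually similar reference image."""
    best = max(references, key=lambda ref: cosine_similarity(query_features, ref[0]))
    return best[1]


# A query image whose (invented) features resemble the first reference image.
print(estimate_location([0.85, 0.15, 0.05]))
```

A real system would match local visual elements across a very large collection rather than compare three short vectors, but the privacy implication is the same: the more reference images exist for a place, the easier the match.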

In short, human understanding of images overestimates the uniqueness of these images, and visual content analysis algorithms understand more than people realize that they do.

Moving forward
Given the current state of the art in visual content analysis, "Multimedia Meets Machine" is perhaps a bit outdated, and we should be thinking in terms of titles like, "Multimedia Has Already Met Machine".

The key question moving forward is whether machine understanding of images supports the people who take and use those images, or if it is providing a little convenience, at the larger cost of personal privacy.


[1] Michael Riegler, Martha Larson, Mathias Lux, and Christoph Kofler. 2014. How 'How' Reflects What's What: Content-based Exploitation of How Users Frame Social Images. In Proceedings of the 22nd ACM international conference on Multimedia (MM '14). 

[2] Martha Larson, Christoph Kofler, and Alan Hanjalic. 2011. Reading between the tags to predict real-world size-class for visually depicted objects in images. In Proceedings of the 19th ACM international conference on Multimedia (MM '11).

[3] Xinchao Li, Alan Hanjalic, and Martha Larson. Geo-distinctive Visual Element Matching for Location Estimation of Images. Under review. http://arxiv.org/pdf/1601.07884v1.pdf

[4] Jaeyoung Choi, Claudia Hauff, Olivier Van Laere and Bart Thomee. 2015. The Placing Task at MediaEval 2015. In Working Notes Proceedings of the MediaEval 2015 Workshop.


Saturday, April 22, 2017

March for Science: Einsteins at the Lake

A view of the Great Lakes from space

May break at Radboud University (which happens to fall in April this year) sees me arriving in the US, just in time to participate in the March for Science Milwaukee, on the shores of Lake Michigan. The weather was gorgeous and the march route was beautiful, taking me past sites familiar from school field trips of my childhood. This blogpost contains photos and some reflections on what the march means.

Why march for science?

Marching restores the natural balance between listening and reading (I'm at overdose levels these days) and expressing oneself. The thought expressed is not complicated: it is simply a statement of support for evidence-based policy making. The act of marching also serves to preserve our culture of freedom of expression, of open and informed criticism, and of citizens demanding that their values and interests be represented by their government.


In Dutch, a scientist is a "Wetenschapper", literally, a "Creator of Knowledge". Marching is a concrete and publicly visible sign of the importance of the knowledge created by the scientific method. This knowledge is the bedrock of our well-being as a society. Think: energy, food, health, housing, sanitation, security, transport, and the technology underlying today's digital information creation and exchange. The knowledge that we create by the scientific method is knowledge that we cannot live without.


Restoration is sorely needed in a world delivering a constant information deluge. There's news, but that news includes news about news. It is important to keep up, to read, track developments, form a position, and, on the basis of this position, vote. However, without working actively to keep the balance, too much reading becomes bookkeeping of who is on which side, and tallying points, wins or losses, for both.

Relief comes from falling back on common ground, seeking out the non-partisan issues, and focusing on these. We are mechanics, potters, brewers, nurses, birdwatchers, cooks. We drive cars, fly in airplanes, surf the Web, do our laundry, and, upon occasion, fool around with the physics and chemistry around us, e.g., by putting Mentos in Coke. These daily activities all represent science in action.


True to our Wisconsin roots, more than one person at the March for Science carried the sign, "No science, no beer". I thought about the Student's t-test: it might surprise you that beer is actually not that far away from much more science than you might expect.


The common ground is surprisingly sturdy. People, all of us, are constantly applying evidence-based approaches. We don't heat up tomato soup by putting a tin can directly in the microwave, we don't put airtight lids on our fishbowls, we water our plants and maybe even give them plant food, and we try to eat healthily ourselves.

Seen from this perspective of common ground, which we understand to be common sense, we are not experiencing a crisis of denial. Rather, it is perhaps a crisis of connection: putting what we collectively know into action for the benefit of us all. On Monday, 21 August, all of North America will have a special opportunity to watch an eclipse of the sun. No one expects it not to unroll exactly as NASA has announced. Surely, this certainty is something that can be productively built upon.

Relief comes from also falling back on shared values. One that is deeply ingrained in me from my Wisconsin youth is avoidance of waste. Waste of human life is at the top of that list of waste we must seek to avoid. I have taught myself to read Nicholas Kristof's columns on women's health without falling into despair. His latest is on the impact of the funding cuts of the current Republican Administration to women's health programs internationally. I have not seen what Kristof has seen in his travels, but I have seen enough beyond the borders of the US to realize that these cuts translate directly into suffering and death. The science to save lives is there. We are an affluent society: our pride should be that we devote resources to doing just that.

Avoidable waste is also to be observed closer to home. There is broad consensus on the importance of the Great Lakes Restoration Initiative, as discussed by the Chicago Tribune. The Great Lakes Restoration Initiative has the purpose of protecting and restoring the Great Lakes, which face threat from pollution and invasive species. These lakes contain 21% of the fresh water on the surface of the earth, measured by volume. Growing up, I wished they were not quite so deep, since it was cold as cold could be trying to swim in them. Today, the presence of that incomprehensibly large mass of water still remains with me. I feel it in the way that my stomach drops to read about planned funding cuts to an essential program preserving it. Many, many people across party lines have had a similar visceral reaction.

Who does the march's message reach?

If the march is about expressing a message, who receives that message? One goal is that it is received by policy makers: the sheer bio-mass of science-minded citizens on the street is a flashing red light signaling that the course needs to be corrected. More tangibly for me, the march is about reaching young people: people in school who are on the point of deciding for an education in STEM and for a career in science.

At the March for Science, I was enchanted by the many mini-Einsteins. My presence there is a signal to them: "You are clear sighted in your understanding, dear mini-Einsteins. You are right in your resolve. Stay steadfast in your studies and stay true to your vision. There are three thousand of us who turned out here today to show you that you are not alone."



Sunday, February 26, 2017

Shared-tasks for multimedia research: Bans, benchmarks, and being effective in 2017

Last week, I officially resigned from contributing as an organizer to TRECVid Video Retrieval Evaluation, which is sponsored by NIST, a US government agency in Gaithersburg, Maryland. In 2016, I was part of the Video Hyperlinking task, and contributed by defining the year's relevance criteria, creating queries, and helping to design the crowdsourcing-based assessment. It has been a very difficult decision, so I would like to record here in this blogpost why I have made it. 

Ultimately, we make such decisions ourselves, and everyone navigates these difficult processes alone. However, it takes a lot of time and energy to search for the relevant information, and to weigh the considerations. For this reason, I think that for some it may be helpful to know more details about my own process.

Benchmarking Multimedia Technologies
Since 2008, I have been involved in benchmarking new multimedia technology. Benchmarking is the process of systematically comparing technologies by assessing their performance on standardized tasks. The process makes it possible to quantify the degree to which one algorithm outperforms another. Quantification is necessary in order to understand if a new algorithm has succeeded in improving over the state of the art, defined by the performance of existing algorithms.

The strength of benchmarking lies in the degree to which a benchmark succeeds in achieving open participation. If a new algorithm is compared to some, but not all, existing algorithms, the results of the benchmark reflect less clearly a true improvement over the state of the art.
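As a rough illustration of what "quantifying the degree to which one algorithm outperforms another" can look like in a retrieval benchmark, the sketch below scores two systems on the same queries with mean average precision. The queries, relevance judgments, and system outputs are hypothetical, and mean average precision is only one of many metrics a task might choose.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs, ground_truth):
    """Mean of the per-query average precision over all queries in the task."""
    scores = [average_precision(runs[q], ground_truth[q]) for q in ground_truth]
    return sum(scores) / len(scores)


# Hypothetical standardized task: two queries with known relevant items.
GROUND_TRUTH = {"q1": {"a", "b"}, "q2": {"c"}}

# Hypothetical submissions from two systems on exactly the same queries.
BASELINE = {"q1": ["x", "a", "b"], "q2": ["y", "c"]}
NEW_SYSTEM = {"q1": ["a", "b", "x"], "q2": ["c", "y"]}

baseline_map = mean_average_precision(BASELINE, GROUND_TRUTH)
new_map = mean_average_precision(NEW_SYSTEM, GROUND_TRUTH)
print(f"baseline MAP = {baseline_map:.3f}, new system MAP = {new_map:.3f}")
```

The comparison is only meaningful because both systems are scored on the same queries against the same judgments. If a relevant existing system never submits a run, the number for the new system says less about the state of the art.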

My emphasis in benchmarking is on tasks that focus on the human and social aspects of multimedia access and retrieval. In other words, I am interested in people producing and consuming video, images, and audio content in their daily lives, and how technology can create algorithms to give them back usefulness and value from these activities. It is difficult to pack these aspects into quantitative metrics, so I am also committed to research that develops new evaluation methodologies and new metrics, as well.

Due to this emphasis, it is not surprising that most of my contribution has been channeled through the MediaEval Benchmark for Multimedia Evaluation. (I coordinate the MediaEval "umbrella", which synchronizes the otherwise autonomous tasks.) However, the strength of the benchmarking paradigm is weakened if a single benchmark, with a limited spectrum of topics, becomes all-dominant. Instead, we need to act to prevent a single effort from "taking over the market". We need to work towards ensuring that a broad range of different types of problems are investigated by the research community. Fostering breadth means offering not only multiple tasks, but multiple benchmarks. This year, I am again involved in MediaEval, but also, as last year, in contributing to the organization of the NewsREEL task at CLEF (where my role is to contribute to design, documentation, and reporting).

Open Participation in Benchmarks
Both MediaEval and CLEF are open participation benchmarks in three aspects:
  • First, anyone can propose a task (there is an open call for tasks). CLEF chooses its tasks by multi-institutional committee, cf. 2017 CLEF Call for Task Proposals. MediaEval also chooses its tasks by multi-institutional committee. However, the committee checks only for viability. The ultimate choice lies in the hands of all community members, including organizers and participants, cf. MediaEval 2017 Call for Task Proposals. The goal of an open call for tasks is to promote innovation---constantly evolving tasks prevent the community from "locking in" on certain topics, and becoming satisfied with incremental progress.
  • Second, anyone can sign up to participate. Participants submit working notes papers, which go through a review process (emphasizing completeness, clarity, and technical soundness). MediaEval and CLEF both publish open access working notes proceedings.
  • Third, for both MediaEval and CLEF, workshop registration is open to anyone, and requires only the payment of a fee to cover costs. For MediaEval, the fee covers the costs of the workshop, and also of hosting the website and organizer teleconferences. People/organizations contribute time to cover the rest of workshop organization.
Like MediaEval and CLEF, TRECVid also pursues the mission of offering an open research venue. Historically, both TRECVid and CLEF grew from TREC (also, of course, organized by NIST) so the commitment to the common cause is unsurprising in this sense. However, TRECVid does not offer open participation in all three of the above aspects. Specifically, there is no publicly circulated call for task proposals, and the workshop is closed. (The stated policy is that the workshop is only open to task participants, and "to selected government personnel from sponsoring agencies and data donors", cf. TRECVid 2017 Call for Participation.) Technically, TRECVid is not able to welcome all participants. The US does not maintain diplomatic relations with Iran. US Government employees cannot answer email from Iran. It is important to understand that this is a historical challenge, and is not new with the current US Republican Administration.

Defining Priorities and Making Decisions
Considerations related to open participation made me hesitant to get deeply involved in TRECVid. However, over the years, I have been very open to exchange. TRECVid originally reached out to me to give an invited talk back in 2009, when MediaEval was still VideoCLEF. (There are some musings on my blog from that trip.) The idea was to learn from each other. We hope this year to reciprocate with a TRECVid speaker at CLEF/MediaEval.

In 2016, I contributed to the Video Hyperlinking organization, since the move of Video Hyperlinking from MediaEval to TRECVid represented a spread of the emphasis on the human aspects of multimedia retrieval, and it was important to me to support that explicitly.

All in all, it has taken a lot of time to decide where to invest my resources in 2017 in order to most effectively support multimedia benchmarking efforts that provide venues that are open and therefore effective as benchmarks.

With the new Republican Administration in the US, two considerations grew to dominate my decision-making process. The first is how to contribute to the movement whose goal is to demonstrate the relevance and importance of science to the public and to policy makers (https://www.forceforscience.org). TRECVid, by virtue of being a benchmark, is certainly at the forefront of this movement (just by doing the same thing it has done for years). We need to support our US-based colleagues in their efforts to be a force for science, and hope that they support us as well, if we land in a similar situation.

The second is how to react to the travel ban, which would prevent scientists of certain countries from entering the US. The first-order effects of the travel ban have been constrained by court rulings. However, the future plans of the administration are uncertain, and there is a range of second-order effects that a court cannot undo, e.g., people self-selecting out of participation since they are worried about their visas being held up by additional processing steps (and granted, for example, only after the workshop has occurred). These secondary effects effectively prevent people from attending a US-based event even though technically they may be able to get a visa.

We are not alone in our thinking; rather, we are guided by a large number of organizations that have issued public statements on the importance of openness for science (Statement of the International Council for Science, Statement of American Association for the Advancement of Science) including professional organizations that we belong to (Statement of the ACM, Statement of ACM SIGMM, Statement of IEEE) and European universities (Statement of the European University Association, Combined statement of all the universities in the Netherlands, Statement of Radboud University).

There is much power in making an open statement of values---more than one might think. However, we should avoid assuming that statements are enough and that the situation will go back to where it was before the current Republican Administration. In other words, the days are gone in which we had to dedicate relatively little time to protecting and upholding the values of openness in science. Instead, we need to think explicitly about where our effort can be best dedicated in 2017.

TREC/TRECVid celebrated their 25th anniversary in 2016. The event has been a constant through many changes of US administration, and it is heartening that the 2017 event will look, from the inside at least, with all probability pretty much like all other events over the past 25 years.

However, 2017 is the first year where people will be in the streets, in the US and around the world, marching for science:  https://www.marchforscience.com. The large-scale sense of urgency tells us that 2017 is not just business as usual. For this reason, it is important in 2017 to reexamine the idea that the US should be such a strong attractor within the map of scientific research in the world.

On top of the merit and can-do attitude that attracts people from around the world to US institutions, we as scientists (because we study systems and networks) know that another force is at play. Specifically, we know that US institutions enjoy preferential attachment, meaning that past success is a determiner of future success. This effect translates into the reality that new or small events (e.g., research topics or benchmarking workshops) need a lot of extra time and attention to establish or maintain themselves in the field. 2017 is the year that we need to think carefully about to what extent we want to contribute to this non-linear feedback loop that strengthens the pull towards US-based events, and to what extent we want to build counterweights.
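For readers who have not met preferential attachment before, the following toy simulation may help. It is a sketch under simple assumptions of my own choosing (every event starts with one participant, newcomers arrive one at a time, and the function name and parameters are hypothetical): under preferential attachment a newcomer picks an event with probability proportional to its current size, and under the alternative every event is equally likely.

```python
import random


def simulate_attendance(num_events, num_newcomers, preferential, seed=0):
    """Assign newcomers to events one at a time and return the final sizes.

    Every event starts with one participant. With preferential=True, an
    event is chosen with probability proportional to its current size;
    otherwise every event is equally likely.
    """
    rng = random.Random(seed)
    sizes = [1] * num_events
    for _ in range(num_newcomers):
        weights = sizes if preferential else [1] * num_events
        chosen = rng.choices(range(num_events), weights=weights, k=1)[0]
        sizes[chosen] += 1
    return sizes


# Compare the two regimes for five events and 1000 newcomers.
rich_get_richer = simulate_attendance(5, 1000, preferential=True)
equal_chances = simulate_attendance(5, 1000, preferential=False)
print("preferential:", sorted(rich_get_richer, reverse=True))
print("uniform:     ", sorted(equal_chances, reverse=True))
```

Running this with different seeds typically shows the preferential regime producing a much more uneven split than the uniform one, with early leads locked in by the feedback loop. A counterweight, in these terms, is anything that deliberately shifts some of the weight back towards the smaller events.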

I consciously use the word "counterweights" since I am referring to a balancing act. We stand in complete solidarity with our US-based colleagues. Providing counterweights in no way detracts from that fact. For multimedia research, counterweights include region-based initiatives, and benchmarks that allow anyone to propose a task. A network of diverse benchmarks makes benchmarking as a whole stronger, and makes us internationally more robust.

My personal decision is that time spent promoting and preserving diversity is, in 2017, a more effective way to achieve the larger goals of benchmarking, than time spent reinforcing the connection between benchmarking and Gaithersburg, Maryland. I was born in Maryland, outside of DC, but Maryland is not where I am needed now. TRECVid will be fine without extra help from Europe, but what can (and does) suffer is the availability to the research community of non-US-based benchmarks.

Recommendations to TRECVid
The intention is for my resignation to be a positive decision for and not a negative decision against. Reasoning that my reflections on the topics are probably helpful to NIST, I distilled my thinking into a set of three recommendations. Interestingly, these recommendations are relatively independent of the situation in the US caused by the current Republican Administration:
  • First, TRECVid is an open research venue. I recommend stating this explicitly on the website. An example is the ACM Open Participation statement. 
  • Second, TRECVid is supported by NIST. I recommend a clearer statement of the source and the distribution of the funding on the website. People familiar with the benchmark know that NIST is the powerhouse behind its success, but it is not clear to newcomers. Critically, currently, the cases in which defense funding supports TRECVid are not clear. This is important to people who personally, or whose institutions, have a commitment to pursue research for civil purposes only. For example, many German institutions have a Zivilklausel by which they commit themselves to pursuing exclusively research for civilian purposes. Even if participation is nominally open, unclarity on defense funding can scare people away, and the benchmark is effectively not as open as it would otherwise aspire to be. (For completeness: at least one colleague assumed I received NIST funding for my work on Video Hyperlinking. I did not. The unclarity in the funding causes confusion.)
  • Third, attention should be devoted to the archival status of the proceedings. As a good next step, they should be indexed by mainstream search engines. Moving forward, attention should be paid to maintaining a historical record of TRECVid should at some point in the future NIST not be able to continue to support open participation/open access in the way it does now.
If you have read all the way to the end of this blog post, let me finish by thanking you: both for your dedication to open participation in scientific research, which is so essential to benchmarking, and for taking the time to read about my personal struggle. It has been a long path.

Don't miss the March for Science on 22 April. Inspire and be inspired.

Or find another march around the world here: https://www.marchforscience.com/satellite-marches

Wednesday, February 8, 2017

Bytes and pixels meet the challenges of human media interpretation

Back in June, I gave a talk at the Communication Science Department here at Radboud University Nijmegen. Today, I presented a version of that talk to my colleagues in the Language and Speech Technology Research Meeting. The abstract is below together with the slides, which are on SlideShare. During the discussion it became clear that many problems in natural language processing and information retrieval face the issue of human interpretations. It is important to find ways to move forward, although it may not be possible to pack our challenges into neat classification or ranking problems with a single set of consensus ground truth labels. A way forward is to look to other disciplines for theory of how people understand and use media, and let these inform what we design our systems to do and the ways that we measure success.

Within computer science, "Multimedia" is a field of research that investigates how computers can support people in communication, information finding, and knowledge/opinion building. Multimedia content is defined broadly. It includes not only video, but also images accompanied by text and other information (for example, a geo-location). It can be professionally produced, or generated by users for online sharing. Computer scientists historically have a “love-hate” relationship with multimedia. They “love” it because of the richness of the data sources and the wealth of available data, which leads to interesting problems to tackle with machine learning. They “hate” it because multimedia is a diffuse and moving target: the interpretation of multimedia differs from person to person, and changes over time in the course of its use as a communication medium. This talk gives a view onto ongoing research in the area of multimedia information retrieval algorithms, which help people find multimedia. We look at a series of topics that reveal how pattern recognition, text processing, and crowdsourcing tools are used in multimedia research, and discuss both their limitations and their potential.


Sunday, January 29, 2017

Women's March on Washington: A shout heard around the world

In my previous post, I wrote up some observations on the Women's March on Washington (WMW) and how the technology that allows us to produce and share multimedia adds dimensions to what we actually do when marching and by marching. I stated that what remains with me most clearly from a week later is people's voices: people speaking and listening to each other in ways they hadn't before.

In that post, I looked at people's voices from the point of view of the information that they convey. However, because of my interest in speech and speech technology, I also see people's voices purely and simply as an audio signal. This post contains some observations following on from the fact that each person's voice is actually a sound wave.

Newly arrived at the WMW, we stood in the midst of a sea of people, wondering if we were actually going to see the stage. As we oriented ourselves, an enormous sound moved towards us over the crowd. It started from a way off, and moved closer and closer like a wave from far out in the ocean. It was an unfamiliar sound.

As the wave passed over us, it became clear that shouting was creating the wave. When it reached us, we also shouted, and it moved on.

My backstory: My main reason for marching at WMW was for health care, or at least that's where I started. I found myself quickly leaning towards the position of the person holding the sign saying, "Too many issues to fit on one sign". I don't join mono-gender initiatives: gender issues affect everyone and we only get to equality if we get to equality together. We must move on equal footing towards equal footing. The WMW was not mono-gender. About 10-20% of the crowd were men, but my estimate might be wrong, since putting the people around me into gender categories was one of the last things on my mind that day.

After the shout wave had passed, it struck me: What I had just experienced was an acoustic event that had never before occurred on the surface of the planet. The shout wave was the sound of a woman-dominated group of hundreds of thousands of voices acting in coordination. No wonder that it had struck me as an unfamiliar sound.

I can think of times in the history of the world in which a group of men would have created a shout wave, or even a 50/50 gender group. I can imagine what that would sound like, or perhaps I have even heard it before. However, this woman-dominated sound was fully new. We have collaboratively invented, as a species, the ability to generate a never-before-heard acoustic event. There was not one wave that day, but many.

Later, I saw this acoustic event that I call a "women's shout wave" referred to in the newspaper as a "rolling roar".

If you know something about sound, you know that it has a physical reality. Sound is a mechanical wave caused by compression of the air: it can knock you off your feet. The size of those waves is directly and necessarily related to the size of what initially starts pushing the air to create the waves. The smaller the physical source, the faster the vibrations and the higher the pitch. Our voices are created by vibrations in our larynx (voice box). On average, women have a smaller voice box than men. A woman-dominated crowd will produce a higher pitched sound than a gender-balanced crowd or a man-dominated crowd. Throw in a few kids' voices (also small voice boxes) as with the WMW, and the resulting rolling roar is a powerful, yet sparkling, acoustic event that deserves to be compared to the sound of a band of angels, however you might choose to imagine that.
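The relationship between the size of the source and the pitch can be sketched with the textbook formula for an ideal vibrating string, whose fundamental frequency is sqrt(stress / density) / (2 × length). Treating the vocal folds as an ideal string is a strong simplification, and the lengths, tissue stress, and density below are illustrative assumptions rather than measurements. The sketch only shows the direction of the effect: with everything else equal, a shorter vibrating source gives a higher frequency.

```python
import math


def string_fundamental_hz(length_m, stress_pa, density_kg_m3):
    """Fundamental frequency of an ideal string.

    f = sqrt(stress / density) / (2 * length), where stress is the tension
    per unit cross-sectional area and density is the mass per unit volume.
    """
    return math.sqrt(stress_pa / density_kg_m3) / (2.0 * length_m)


# Illustrative assumptions only: the same stress and density for both
# sources, with only the vibrating length differing.
ASSUMED_STRESS_PA = 15_000.0
ASSUMED_DENSITY_KG_M3 = 1_040.0
LONGER_SOURCE_M = 0.016
SHORTER_SOURCE_M = 0.010

longer_hz = string_fundamental_hz(LONGER_SOURCE_M, ASSUMED_STRESS_PA, ASSUMED_DENSITY_KG_M3)
shorter_hz = string_fundamental_hz(SHORTER_SOURCE_M, ASSUMED_STRESS_PA, ASSUMED_DENSITY_KG_M3)
print(f"longer source:  {longer_hz:.0f} Hz")
print(f"shorter source: {shorter_hz:.0f} Hz")
```

In this model the frequency is inversely proportional to the length, so a source that is 1.6 times shorter vibrates 1.6 times faster. Real voices also differ in tension and mass, and speakers change pitch constantly, so the numbers should not be read as predictions for any particular voice.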

In moments of philosophy, we often discuss the question of a tree falling in a forest: if no one hears it then did it really make a sound? The wave of women's voices at the WMW produced an acoustic event of a fully different nature. If you think about it, you cannot even ask this question about the women's shout wave. It is produced if and only if a crowd dominated by women comes together in one place and acts in coordination: The existence of this sound and the fact that it is also heard are one and the same.

The wave of women's voices at the WMW truly produced a shout heard around the world. Reflecting a bit you realize that this new sound is not the only new signal that was produced at the march: shout waves happened at the march, but were not the essence of the march. The essence was a new social signal. What this signal is will reveal itself in what we do next: it may not be a mechanical wave, but I have no doubt in its power to move things in our social/political/physical world.

The discovery of the women's shout wave may or may not excite you as much as it excites me. But you don't have to share my enthusiasm of having participated in the acoustic history of humanity in order to agree: as long as we are able to come together and make this coordinated sound, we are headed in the right direction as a species.

Tamika Mallory said at the WMW, "When you go home remember how you felt" (https://youtu.be/dDc9Ochrifw?t=3h38m36s). Remembering the unique sound of the rolling roar has the ability to place us back again in the moment: in recalling what we have heard, we can recall what we felt.

But the rolling roar also reminds us of something very important about the movement: the decision to create the women's shout wave is the decision of every individual in the crowd at the moment the wave breaks over her. Every single time when it is time for you to contribute to the wave, you must check your alignment with the overall goals, and then you must shout your lungs out. You have a responsibility as an organizer to target a straight, true line, but also as a participant, an "organizee" as it were. Each person individually is responsible to periodically check that we still are on track towards the right goals, and after every check to reengage with full force.

That takes a lot of energy, and a lot of strength. In the moments when I wonder where that strength will come from, I think of this sign: "Do not be afraid".

Saturday, January 28, 2017

Women's March on Washington: Across time and dimensions in space



It's been one week since the Women's March on Washington (WMW). I was curious to see what would still grip me most strongly now after one week. Filtering through so many topics, so many impressions, so much emotion, what remains with me most clearly is: people's voices.

People are speaking up, speaking out, and, for the first time that I can remember, speaking with unanimous confidence that speaking can and does indeed bring about change. The most obvious case is the power of every citizen to call their government representatives in order to communicate their opinion. But speaking also has the power to open people's minds to fundamentally different perspectives: speaking with each other can lead to the understanding that the world is a single, immense interconnected system. There is and can be no difference between standing up for ourselves and standing up for other people. This understanding builds confidence, lends authority to our individual voices, and allows us to comprehend our own potential at our cores: what we can make happen by all pulling in a common direction is more than we have previously dared to hope or imagine.

This blog post contains observations about the WMW seen through the lens of information, which is my field of research. Specifically, I focus on information as it is captured and shared in multimedia: audio, video, and images.

In Ch. 2 of their Multimedia Computing book, Friedland and Jain discuss communication-related inventions over the course of civilization. Their discussion makes the point that the era we live in is an era of media recordings, digital media, and the Internet. These inventions provide us with three invariances, which offer historically unique opportunities for communication: invariance of time, invariance of space, and invariance of addressee.

"Invariance" in this case refers to the stability of communication: if a message is invariant, it is not subject to loss or decay. Specifically, these three invariances mean that a message communicated in the past is also available in the future, that a message presented at one place can also be presented at another, and that a message communicated to one person can be communicated to anyone.

How do these invariances make marching in today's times different than ever before? When our grandparents' generation made signs and took to the street, their message would only reach the people who were present on that street and in that moment. Anything beyond that was dependent on newspaper and radio coverage.

In 2017, the WMW took place in full realization that the march was not limited to that moment, to that space, or to the specific people who were physically in Washington DC. Most of the intended addressees of the march were somewhere completely different. In the case of all three invariances, I observed behavior that reflected people's consciousness of these information invariances and of the need to use them.

Invariance of time: Signs as photo opportunities
Cameras, both conventional and on mobile phones, were everywhere at the WMW. I watched amazed as people took pictures of each other holding their signs. Slowly, at the march, it dawned on me that my idea of not bringing a camera to a demonstration was outdated. It became apparent to me that people made their signs with the intention of having other people take pictures of them. Some people who had particularly interesting or novel signs were standing at the side of the street. Apparently their purpose in standing there was so that people would come up and chat and then pose with the sign for pictures.

A consciousness pervaded the march: a sign is not merely a physical object, but rather a message broadcast outward without a predetermined limit. Your face in a picture next to a sign anchors it to you as a person in a way that comments on the Web fail to be anchored. The march came together around unity principles. Motivated by these principles, a "selfie" taken with a march sign becomes an "unselfie": an act of selflessness in support of rights and of people without the opportunity to stand up for their own rights.

Invariance of space: Everywhere at once
There was a feeling of connection at the WMW to people who would have liked to have been there, but who weren't able to make the trip to Washington DC. I saw at least one marcher who had listed the names of the people who supported her on her sign. While marching, I felt strongly connected to the people in our circle of family and friends who had stayed at home and watched the media coverage and/or attended other marches.

During the WMW, the mobile phone network clearly suffered overload. Through much of the duration of the march, it was not possible to access the Web via the mobile phone network, to call, or even to send or receive a text. For this reason, we realized that the people who were not at the march actually understood what was going on better than we marchers could, even though we were the ones at the march. The feeling that being in the middle of the march was not necessarily the best way of getting an overall understanding of what was going on enhanced the impression that the march was not happening in a particular physical space, but in fact everywhere at once.

It was only later that evening that I came to fully understand that there had been marches of magnitude in so many places around the US and the world. Today, it's a week later and I spent some time browsing march pictures on Flickr. March pictures are all simply march pictures, and whether they were taken at the WMW or at a sister march has since faded into the background.

Invariance of addressee: No one missed out on anything
My experience of the march was people, people, and more people. I know Washington DC well, but at times, I felt completely out of sight of anything familiar. During the time that the speakers were speaking we saw no stage: we just had faith that somewhere in the core of the masses the speakers were speaking on schedule. At one point, someone close to me in the crowd said, "Madonna's here!" People seemed excited to hear that, but everyone had realized by then that it was counterproductive to try to get to the stage.

There was a consciousness that no one was missing out on anything. Not being able to see any of the speakers was not a disappointment: we could all just shrug, "Oh, well, I'll catch the speeches on YouTube later". (I spent some time doing that today.) Thinking about what it was like in the middle of the march: I've never experienced a moment with such a clear sense of shared awareness that that moment would be lived and relived afterwards. For those with similar scifi habits, I'll say, it's the closest I've ever come to experiencing the feeling of travel across time and relative dimensions in space (TARDIS).

Past, present, and future
The weeks leading up to the WMW were already filled with an appreciation of what past marchers had to teach us, and my thoughts frequently turned to the 1963 March on Washington for Jobs and Freedom. Awareness of the contributions of those in whose footsteps we follow is perhaps the most dramatic impact of information communicated across time (54 years ago), space (around the world), and audience (I was not even born then).

I never thought too much about 1963 as a child, but I also never thought too much about fire drills. When the alarm goes off, you calmly and peacefully leave the school. When things become unbearable, you calmly and peacefully go to march in Washington. These are the procedures and the practices that keep us—all of us—safe and keep our efforts to build a just society moving forward. Images, audio recordings, and videos hold the practice before our mind's eye: yes, it does happen, it has happened, and since it needs to happen, it will happen again.

Thursday, January 26, 2017

Recommender system failure as a business model: Repellent ad? Pay for premium!



While writing my last post, I spent a lot of time worrying about whether we really understand the forces at play with so much of our information world driven by business models based on clicks. My underlying assumption was that these forces all perpetuate the dependence of the production and flow of information in today's world on advertising. Today, I was reminded of the importance of thinking out of the box, and never assuming anything: there might be exceptions.

Here's what's happening: YouTube incentivized me to subscribe to YouTube Red by showing me an ad that raised the hair on the back of my neck, and then giving me a pop up window asking "Want to remove ads?" (screenshot above).

Specifically what happened: the pre-roll ad for my video was from Urban Carry Holsters:


G2 Overview from David Foster on Vimeo.

and while my video played I had Urban Carry Holsters videos suggested at the upper right hand of my page:

After I had watched this for a while in horrified fascination, YouTube opened a pop-up:


The "try it" button might as well have been labeled: "Get me out of here!"

Pretty brilliant, really. What I am assuming is happening (i.e., "maybe could be happening") is that the recommender system algorithm is optimized to increase not only the number of ad clicks, but also the number of YouTube Red subscriptions.

Of course, I am a proponent of recommender systems that are not designed to fulfill a single target [1]. The target could be ill-designed, and the world is also just not that simple.

However, I am of two minds about what YouTube just did to me as a user. First, when we talk about gun violence in the US, we talk about deaths and casualties. The discussion of the psychological wear and tear is often in the shadows. If my heartbeat rises with an ad like this one, then I can't even imagine what parents must go through, who send their kids out the door in the morning to school with the constant worry of stray bullets and guns in irresponsible hands. Ads like these just contribute to the second-order harm that our lack of a real gun solution inflicts on society. YouTube's recommender should know enough about me to protect me from the psychological wear and tear (which results in wasted time).

Second, maybe YouTube should not be protecting me, but exposing me to more. (Yes, I am of two minds, and the second is completely opposite.) If recommender systems recommend advertisements that are personalized to be repellent for users, it could be a force that drives subscriptions at a large scale. If enough other people react like me, we will soon be on the road to being able to fund the production and distribution of information based on quality and trust, funded by subscriptions, rather than on clicks.

There is a chance that this ad is not a complete recommender system misfire. The Urban Carry Holster ad was not actually an utter mismatch for my tastes. They show that the holster was designed on the basis of a "user study", and I have certainly purchased a number of high quality real leather handbags in my day. It's the "detail" of putting the gun inside of it that freaked me out.

So maybe it is a recommender system failure, or maybe it is the most important thing that recommenders have done for our online information ecosystem in years. Whichever of these points of view ends up winning, it is something worth thinking about.

My only concern is the manipulation aspect: in order not to destroy trust with YouTube, I would appreciate knowing that the ads are optimized to increase YouTube Red subscriptions, and I am indeed being nudged.

[1] A Said, D Tikk, K Stumpf, Y Shi, M Larson, P Cremonesi. 2012. Recommender Systems Evaluation: A 3D Benchmark. ACM RecSys 2012 Workshop on Recommendation Utility Evaluation: Beyond RMSE, Dublin, Ireland

Sunday, January 8, 2017

Down the Rabbit Hole: Greetings from a state of extreme information overload

This morning, I innocently checked the news. Then I disappeared down the rabbit hole. One click following another, driven by the idea that around the next bend I would arrive at some kind of a lasting understanding that would outlive today.

When I realized I was in full information reading free fall, I started writing this blog post, just to record what was happening.

To reconstruct the beginning of the experience, I asked myself what was the lead story on The Guardian when I opened it this morning.

Do I really remember what happened two hours ago? First, I thought no. Then, I remembered it was something about the shooting in Fort Lauderdale. But what? The shooter was unhappy in some way. Let me go back to check, but whoops in the meantime, there is a fully different lead story...I can't go back to where I was...maybe Fort Lauderdale was not so important after all.

Actually, no, I don't want to be reading about Fort Lauderdale. I land in Florida airports rather frequently and I don't need to be creating anxiety. Shouldn't be reading that one.

Spent some time trying to get back to see the same "first page" that I saw this morning...clock ticking. It appeared not to be possible.

What kind of insight will I arrive at that will outlive today?

This morning became this afternoon as I dove into a certain column with the headline: "Moral panic over fake news hides the real enemy – the digital giants"

https://www.theguardian.com/commentisfree/2017/jan/08/blaming-fake-news-not-the-answer-democracy-crisis

Hmmm. What exactly is "moral panic"?

I read this:

https://www.psychologytoday.com/blog/wicked-deeds/201507/moral-panic-who-benefits-public-fear

Interesting. We learn:

"Moral panic has been defined as a situation in which public fears and state interventions greatly exceed the objective threat posed to society by a particular individual or group who is/are claimed to be responsible for creating the threat in the first place."

However, that gets us nowhere on what "Moral panic over fake news hides the real enemy – the digital giants" is going to actually tell us. If there is a fear, it is related to the fact that we have no way of estimating an objective threat, and by this definition can't be moral panic.

OK. Title doesn't make sense. Let's click anyway. Maybe this article will allow me to move forward on one of my more dominant streams of thought these days: The discourse on news and news reading behavior seems to assume that people have an unlimited amount of time and attention resources to consume news in a given day. How do we achieve a healthy and balanced news diet, if we don't have countless hours to spend?

This stream of thought has led me to ask whether the time that we spend worrying about "fake news" should be spent thinking about something else. And the related questions: "What is that something?" and "Is the problem with fake news actually not that it is fake but that it is simply consuming time that we should be spending doing other things?"

So I click. Falling, falling. The piece is interesting, but not what I expected.

Yet I am reading ideas that I don't recall encountering before in such a form. I keep reading. Second to last paragraph is:

"The only solution to the problem of fake news that neither misdiagnoses the problem nor overpowers the elites is to completely rethink the fundamentals of digital capitalism. We need to make online advertising – and its destructive click-and-share drive – less central to how we live, work and communicate. At the same time, we need to delegate more decision-making power to citizens – rather than the easily corruptible experts and venal corporations."

But how much does the author really know about the forces at play within the larger context that gives rise to online advertising? If there is going to be a "rethinking" there need to be "rethinkers" who are positioned to make changes. This piece seems to be implying that those "rethinkers" exist: but can they exert the required influence?

OK. I could fall forever. I am just going to dig a bit more deeply into this one article, and then I am going to stop and do something else.

Let's start with remembering: what exactly does "venal" mean again? Looked that up. "Open to bribery". Right. OK.

To understand who the author might consider to be the "rethinkers", let's have a look at where the author is coming from, specifically, what he might know about neuroscience and psychology, i.e., information addiction and confirmation bias, information literacy, and the science of complex systems. I started out by looking at the profile page of the author, here:

https://www.theguardian.com/profile/evgeny-morozov

which links to his blog here:

http://neteffect.foreignpolicy.com

Which doesn't give me a blog, but rather a portal:


what is going on?

I decide to read all of the comments to see if anyone else had this problem.

Whew. Lots of opinions there. Pretty interesting discussion. One comment states "We have norm of unexamined adoption". That's an interesting observation: How did those norms get formed in the first place? If we can figure that out, then we can maybe take some action there.

At least two people commenting are pointing to the need for helping people develop critical thinking skills and the ability to verify information. That's another of my streams of thought lately: how to promote the practice of evaluating information sources, for example, with the CAARP test.

No one seems to be bothered by the broken link in the profile of the author of the piece. Usually a broken link would point to a poorly maintained, and potentially less authoritative, source. But this is The Guardian! Maybe I am seeing things?

Then I spend some time on The Guardian website trying to figure out where to report a broken link. Lots of opportunities for suggesting corrections to content, and for securely passing The Guardian information. Good to see. However, none for just saying, "Hey, the link is bad".

OK. Time to take action. I posted the comment:

"Does anyone else find that the link to Morozov's Net Effect blog at the top of his Guardian profile page (https://www.theguardian.com/profile/evgeny-morozov) doesn't seem to really lead to the blog? It seems like the The Guardian made a mistake, and that the link should be directing us here: https://foreignpolicy.com/category/net-effect Uncertainty about this link is hindering me in digging into the wider context of this piece."

Time passes.

Worrying that that comment will be interpreted as being negative about the piece. I'm not negative, just trying to get to the bottom of what the lasting message is for me.

Time passes.

I'm spending time on trying to understand why research on dopamine and information seeking seems to have fallen silent in the mainstream press after 2012, and on wondering why there is not a good website to explain complex systems. We need to rely on Wikipedia for so many of the related concepts like "preferential attachment" and "emergence".

Why in the world does my Morozov piece feature one picture of Putin at the top and one picture of Trump in the middle? It is not about either of them. I don't think Morozov chose those.

I am still falling...also with a feeling of having been sucked in.

Time passes.

This is about one piece that I read in the newspaper! I'm trying to form an opinion about one single opinion piece. What if I had tried to read the other ones as well? What if I were doing any serious fact checking?

Greetings from a state of extreme information overload.

Time passes.

Is the conclusion that the limits of our time will ultimately always win? That we will drown in a state of information overload because it requires an afternoon to evaluate a single opinion piece?

I am not so sure. In this case, I am planning to take action on my conclusions regarding the article and the things that the people are saying in the comments. For an information retrieval researcher, a crisis in information quality is a crisis at the core of our research field. As an instructor of a freshman information science course, I need to be able to describe best practices in information and consumption behavior.

There is a lot riding on this one article for me.

In that respect, it is not wasted time.

I arrive at the bottom with a loud bump.

So the conclusion is, yes, our time is limited. We can't spend an entire afternoon examining everything that we read. The most important information is the information that we take action on. We need to seek out that information, and evaluate the heck out of it.

If we are not planning to take action, read, but leave the information in suspended animation. For example, the article on Fort Lauderdale. Or: there is now an article about the Mob on the front page of the New York Times. I choose not to subject these to scrutiny, but neither will I take any action (including sharing those articles) on the basis of what I read.

Looking at the length of this post, another obvious conclusion is that people should set aside more time for finding and consuming information. The information available online initially looks "free", but really we need to also count the price of our time. Information without verification is useless.

Setting aside time requires asking the question, "What did we lose because we didn't choose to do something else instead?" In short, how can we more tightly link reading the news to tradeoffs and to tangible value?

Now what about this little bottle?

Alice drink me