Thursday, December 18, 2014

Anonymous Virtual Movie Ticket for "The Interview": Sony should fight fire with a torrent of fire

On the current crisis: Sony's next move should be to set up a system by which people can buy "Virtual Movie Theater Tickets" anonymously for "The Interview". 

The payments for these tickets would be made into a "Theater Ticket" fund. When there is enough money in that fund so that Sony can offset their loss on "The Interview", then they can just release the movie to the world via anonymous torrent. 

This way, people can watch "The Interview", and get back to their lives, not worrying that free speech has been compromised by terrorists. The critics can pass judgement on the film's tastefulness, and the world would stay the way it was. (We liked it when video content with the power to threaten lives in democratic countries remained strictly confined to the plots of horror films---remember The Ring?) 

The idea of a "Virtual Movie Ticket" is an extension of the call to Sony to "fight fire with fire", a call which Sony is reported to be downplaying.

Why pay for an Anonymous Virtual Movie Theater Ticket? The obstacle to setting the movie free on the world via filesharing is the precedent it would set for the movie industry. Movies and TV content require resources, sometimes substantial resources, to produce---this fact remains the bottom line. Terrorists shouldn't restrict free speech, but, not forgetting the more mundane, they shouldn't be able to influence Sony stock prices with a flick of a finger either.

Why does a Virtual Movie Ticket need to be anonymous? The reason is: The last thing that should fall into the hands of the people who hacked Sony is a list of which households bought "Virtual Movie Theater Tickets".

The same line of reasoning applies to considerations of whether the movie should be released so that people can watch it via VOD streaming. 

If the Sony servers can be hacked, then the VOD service can be hacked, as mentioned here. Again, who is watching what is not information that should be in the hands of whoever issued the terrorist threats.

Delft University of Technology has just issued a new release of Tribler, a BitTorrent client that is not dependent on central servers. The Tribler team has been working long and hard on realizing the technology necessary for anonymity.


A common reaction to this research is that the technology is only needed by "pirates" or in countries with oppressive governments: by those either evading the law, or laying low under lawlessness. 

However, with the "The Interview" crisis, the media consumption behavior of free citizens has been singled out by a terrorist threat. This case drives home how seemingly innocuous information can suddenly become dangerous to individuals. 

"The Interview" case dramatically highlights that it is not enough to lead a law-abiding life, but that there is a clear and present need for tools that allow all citizens to take responsibility for protecting the privacy of their own information behavior.

Whatever happens next, people who had never considered filesharing before will be forced to think seriously about whether they should be keeping their movie watching behavior anonymous for the purpose of protecting themselves.

By giving people an easy, anonymous chance to pay for a "Virtual Movie Theater Ticket" before they anonymously access "The Interview" via torrent, the current crisis could have an unexpected positive outcome. Rather than being a fiasco, it would set a precedent for a major shift in the technical and economic model of movie production, beneficial to both the studios and the consumer.

The "Virtual Movie Theater Ticket" is conceptually a small step from the movie ticket sold at the box office, but in the current situation, it would be a large and liberating game changer.

Friday, December 12, 2014

"Smart Photography" is not So Smart

This post enumerates the reasons for which the "Smart Photography" technology described in this article: 
Stokman, Harro. The Future of Smart Photography. Computing Now. IEEE Computer Society. pp. 66-70, July-September 2014
is an enormously bad idea. The technology attempts to categorize images at the moment at which they are taken, and to prevent certain types of images from being taken in the first place. In this post, I point out the reasons for which no analogous technology exists for text production. Then I go on to argue that "Smart Photography" constrains the ability of people to record important moments, and, critically, could hinder the ability of a witness to collect evidence during a crime. I close with an example of an image, with which I exercise my freedom of expression.

I am writing a blog post. Let's think for a moment about what is not happening. Specifically, let's reflect on the fact that www.blogger.com is not immediately attempting to put this post into a particular topic category as I write.

One reason why it is not attempting this is that, ultimately, automatic prediction of the topic of my writing is not particularly useful to me. Automatic topic detection could misinterpret my topic, or it could completely miss the fact that I was writing on a new topic that had never been written about before. My post would get misfiled, and potentially lost.

Such a text classification technology would also need to assume that topic was indeed the appropriate type of category into which I wished to sort this blogpost. There would be no room to invent new types of categories, related to e.g., style, place, sentiment or mood.

From this example, we observe: Blogging is a couple decades old, arguably older. After all of this time, www.blogger.com keeps the responsibility of categorizing blogposts firmly with the writer. I will need to click the "Labels" icon and add the tags myself.

Here is another example: I have gone through a series of Mac computers, and they have gotten more sophisticated over the years. However, there is no functional "autosuggest" feature that can predict in which folder I will want to store a document or presentation while I am creating it. Because of the nature of human creativity, it just doesn't make sense to do this if the computer is going to be useful for tasks that have not yet been imagined.

Finally: YouTube auto-suggests categories (presumably using my metadata), with the effect that all my videos are "Science & Technology". That helps me to find, well, exactly nothing in my uploads. Everything is labeled the same. There, again, it's clear that creation must also involve classification effort, if the end effect is to be organization. We don't create towards a pre-defined set of concepts or topics. Instead, when we create content, we also create concepts.

These three examples illustrate that the idea of classifying text at the moment of creation has not "caught on". And it's not because we do not yet know how to train text classifiers. Text classification technology has long been considered far ahead of computer vision. We need to acknowledge that other forces are at play.

Yet, in the face of a lack of general applications that classify text at the moment of creation, computer vision researchers are now attempting to build "Smart Photography" applications that would classify images at the moment they are taken by cameras. This contradiction should make us really sit up and think hard about the implications of "Smart Photography".

"Smart Photography" is first fascinating, and then horrifying.

It's fascinating because it's fun. If my camera decided at the moment I clicked the shutter that I was taking a picture of food, I probably would take more pictures of food. I would do it because it would be cool to see if the camera "understands" food. (Yes, I'm a multimedia geek). Also: because food as a separate category is actually built into my camera, I would feel less embarrassed about taking out my device and snapping a picture before eating.

However, this kind of behavior effectively amounts to the camera teaching me what I should be taking photos of. It will subtly channel human photographic impulses down the broad and easy road, and allow less traveled paths of expression to slowly grow over.

That's sad, but not yet horrifying. Horrifying is the following:

The "Smart Photography" camera is going to prevent its user from taking certain photos entirely.

Reading the "Smart Photography" article, the computer vision experts obviously have their hearts in the right places when they envision a camera whose shutter freezes when part of a hand appears in the frame.

However: Imagine your baby's first steps. You miss the shot because you just couldn't get your finger out of the frame in time. You would much rather have a "baby plus finger" photo and be able to treasure the moment, than have your camera freeze up on you because it "saw" a finger in front of the lens and locked the shutter.

It goes on: You miss the shot of your kid scoring that amazing soccer goal, of that rare bird that you saw on your walk in the woods. You miss the shot of the damage that was done to your car in an accident because you just couldn't hold the camera perfectly correctly. A bad shot would have been better than none.

And it gets worse. The camera aspires to block adult content, making it impossible to take a picture of any scene that it classifies as pornographic. It sounds like a miracle for law enforcement the first time you hear it. But the price is too high: Basically, if the camera blocks adult content, it means that if I witness a rape, I have no way of taking a picture of it. The possibility of identifying the perpetrator is blocked by the camera itself.

Effectively, the innovation of "Smart Photography" is making possible a camera that does not work.

Returning to the comparison to the text case: What if www.blogger.com were to prevent me from writing this column as soon as it sensed that its topic included rape? Our technology does not prevent the generation of text, and we need to remain consistent with the values that lead us to the conclusion that it should not. It is a bad idea to introduce technologies that prevent the generation of images.

One important reason why our technology does not censor text on creation is that it is not the people who design technology who get to make the decision about what I can and cannot express. Rather, it is the legal system, which is in turn based on the values of the community at large. This system, imperfect and slow moving as it might be, represents individual citizens equally, and can be influenced by them, in the way that a technology cannot be influenced equally by everyone.

The "Smart Photography" article argues that photocopiers prevent people from copying money, i.e., paper bills, and that this technology represents a next step. The reason why money works at all is that there is a system and a society working to make sure that its purpose is unambiguously interpreted. Money is a conventionalized sign at the basis of society; it exists at all exactly because it is not open for interpretation. Plainly stated, the argument that technology blocking photographic capture of adult content is a natural extension of blocking photocopies of money relies on an unsound analogy, and must be discounted for this reason.

What's the alternative to "Smart Photography"?

The solution is not making photos smarter, but rather it's changing people. It's relentlessly pursuing our efforts to support each other in our communities, and to help each other make better decisions. It's about the unending quest, that begins again with each new generation: to make people smarter.

We need to assure adequate funding to the people who dedicate their careers to fighting crime. Finding perpetrators of sexual abuse/sex crimes is simply a hard task that requires a huge investment: sick minds are sick, and they will not let a new camera technology stand in the way of their evil business. With this "Smart Photography" camera, sex offenders will be incentivized to start taking pictures that are not so easy to automatically identify, and they may be able to wipe out their own footprints. There are no easy technical shortcuts that will eliminate the need for old-fashioned crime fighting, yes, also of the gumshoe variety.

We need to educate people. Bear selfies are stupid. Getting people to stop taking bear selfies is not a matter of creating a camera that recognizes a bear selfie situation, and blocks the shutter when someone tries to take a bear selfie. The bear selfie is a symptom of an underlying lack of reflection. It is the underlying problem, and not its superficial manifestation that needs to be addressed. The answer is about taking the time to really talk to our children, and to each other, about what is appropriate and what is not appropriate in a given situation.

Below is the most repugnant photo that I have ever posted online, but today for the first time in my life, I did not take for granted that my camera includes a functionality that allows me to take it.

Thursday, October 23, 2014

MediaEval 2014 Placing Task Technical Retreat

At the end of the MediaEval Workshop, the 2014 Placing Task had a technical retreat where the details of the year's crop of algorithms were assessed, and plans for the future were discussed.

I missed the first part of the meeting, because I was still doing some organizational stuff, and also saying goodbye to people (I find that so difficult to do, and I certainly didn't manage to say goodbye to everyone). However, I did take some notes on the parts that I attended and I am putting them here for posterity. 

Of course expect the usual attentional bias in what I chose to write down---possibly also in the categories that I put the notes into as well.

Moving beyond geo-location estimation
Can we formulate a data analysis task that moves beyond geo-prediction?

Can we drive the benchmark to get the task participants to uncover the weaknesses in current placing systems? What mistakes are you making, and why are you making them?

Geo-relevance
  • For which images is geo-location relevant?
  • Which is the location for which it is relevant?
  • What is the tolerance for error? (depends on humans, applications)
Placability
In the past two years placability has been offered as part of the task, but has been disappointingly unpopular. This seems to be a matter of people not having time. We shouldn’t take the evidence as meaning that people don’t want to do it.

Alternate form of placability:
"Select a set of x images (e.g., 100 images) from the test set that you are sure that you have placed correctly and visualize them on a map"

How to support the participants
Can we release a baseline system?
Estimates for the error?

How to move beyond co-ordinate estimation
  • Can we make the Placing Task more clearly application oriented?
  • Are there use scenarios beyond Flickr?
  • Is anyone interested in the task of Geo-Cloaking? 
  • Can the task pit two teams against each other, one cloaking and one placing?
Evaluation metric

  • We think that geodesic distance is convenient, but has limits, since it doesn’t reflect the usefulness of predictions for humans within use scenarios.
  • Maybe move to administrative districts
  • Other metrics motivated by human image interpretation?
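As a concrete point of reference, geodesic distance in this setting is commonly approximated with the haversine (great-circle) formula. Below is a minimal sketch in Python; the spherical-Earth radius of 6371 km and the example coordinates (roughly Amsterdam and Barcelona) are my own illustrative choices, not parameters of the Placing Task:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points,
    given in decimal degrees. Assumes a spherical Earth (radius 6371 km),
    so it approximates, rather than exactly computes, geodesic distance."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Illustrative pair of cities; the result is a single error figure in km.
print(haversine_km(52.37, 4.90, 41.39, 2.17))
```

The limitation raised in the bullets is visible here: the formula collapses each prediction to one number, with no notion of whether the error crosses a city, country, or administrative boundary that actually matters to a user.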

Ground truth
We can measure placing performance only within the error of the ground truth (cf. [2]). What can we do to work around this limitation?
  • Correspondence between geo-tags and EXIF metadata is indicative of whether the tag is correct. See also cool new work on timestamps [4].
  • Are there other easily measurable characteristics of images online that can be used to identify images/videos with reliable geo-tags at a large scale?
  • Collect more human-labeled data. Do we really need to have a data set of 500,000 items?
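As an illustration of the geo-tag/EXIF correspondence idea, a reliability filter might look like the sketch below. This is only a hypothetical example: the 1 km tolerance, the function name, and the coordinates are assumptions for illustration, not part of any actual Placing Task ground-truth pipeline:

```python
import math

def coords_agree(geotag, exif_gps, tol_km=1.0):
    """Heuristic reliability check: does a user-supplied geotag agree with
    the GPS coordinates recorded in the photo's EXIF metadata?

    `geotag` and `exif_gps` are (lat, lon) pairs in decimal degrees. The
    1 km tolerance is an illustrative assumption, not a task parameter."""
    (lat1, lon1), (lat2, lon2) = geotag, exif_gps
    # Great-circle distance on a spherical Earth (haversine formula).
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    dist_km = 2 * 6371.0 * math.asin(math.sqrt(a))
    return dist_km <= tol_km

# A photo whose geotag and EXIF GPS nearly coincide is a candidate for
# the reliable subset of the ground truth.
print(coords_agree((52.3700, 4.9000), (52.3702, 4.9003)))  # True
print(coords_agree((52.37, 4.90), (41.39, 2.17)))          # False
```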
How People Judge Place
Users (i.e., humans judging images) have different ways of knowing where a picture was taken. 

It depends on the relationship between the human judging, the image, and the act of image creation.

The most basic contrast is between the case in which the human judge is the photographer, and the case in which the human judge is not the photographer and also shares no life experiences with the photographer.

Previously I discussed these different relationships in a post entitled “Visual Relatedness is in the Eye of the Beholder” and also in [3].

Why is this important? Some mistakes that are made by automatic geo-location prediction algorithms are disturbing to users, some are not. Whether or not a mistake is disturbing to a particular human judge is related to the way in which the human judge knows where the picture was taken. In other words, I may “forgive” an automatic geo-location estimation algorithm for interchanging the location of two rock faces of the same mountain, unless one of them happens to be the rock face that I myself managed to scale. How people judge place, is closely related to the types of evaluation metrics we need to choose to make the Placing Task as useful as possible.

The Man vs. Machine paper [1] sets up a protocol that gathers human judgements in a way that controls the way in which people “know” or are allowed to come to know the location of images. More work should be explicitly aware of these factors.

Embrace the messiness
The overall conclusion: anything that we can do to move the task away from "number chasing" towards insight is helpful. This means finding concrete ways to embrace the fact that the task is inherently messy.

Thank you!
Thank you to the organizers of Placing 2014 for their efforts this year. We look forward to a great task again next year.

References
[1] Jaeyoung Choi, Howard Lei, Venkatesan Ekambaram, Pascal Kelm, Luke Gottlieb, Thomas Sikora, Kannan Ramchandran, and Gerald Friedland. 2013. Human vs machine: establishing a human baseline for multimodal location estimation. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 867-876. 
[2] Claudia Hauff. 2013. A study on the accuracy of Flickr's geotag data. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '13). ACM, New York, NY, USA, 1037-1040.
[3] M. Larson, P. Kelm, A. Rae, C. Hauff, B. Thomee, M. Trevisiol, J. Choi, O. van Laere, S. Schockaert, G. J. F. Jones, P. Serdyukov, V. Murdock, and G. Friedland. The benchmark as a research catalyst: Charting the progress of geo-prediction for social multimedia. In J. Choi and G. Friedland, editors, Multimodal Location Estimation of Videos and Images. Springer, 2015.
[4] Bart Thomee, Jose G. Moreno, and David A. Shamma. 2014. Who's Time Is It Anyway? Investigating the Accuracy of Camera Timestamps. In Proceedings of the 22nd ACM International Conference on Multimedia (MM '14), to appear. http://www.liacs.nl/~bthomee/assets/14time_p.pdf

Sunday, August 17, 2014

Preparing a camera ready version: Checklist for the final phase of a paper

Authors should watch their words.  
The conference submission has been accepted. Your paper will appear at the conference! The co-authors have congratulated each other, and you are looking forward to the event. Your work, however, is not yet done. You want a highly polished product so that people enjoy reading your paper, understand it, remember it, and, yes, use it, hopefully, citing it in their own work. Achieving the goal of making your already-great-work perfect requires investing time and effort into preparing your camera ready.

The art of the camera ready paper is more complex than it may seem. It requires attention to both content and form. You are at the same time adapting the paper to take the comments of the reviewers into account, and also beating the paper into perfect compliance with formatting requirements and editing conventions. During this process you must step back from your work, and look at it as other people would see it. While you are switching your brain between these different perspectives, it is helpful to have a checklist to keep you from forgetting anything that you would kick yourself for afterwards. In this post, I list ten points that I like to check through just before uploading the final version of the camera ready.

  1. Did you give your best effort to accommodate the reviewers' comments? Even if the paper will not be re-checked by a reviewer or an editor, it is important to get as much as possible out of what the reviewers have written. Their perspective gives you insight in what your readers need in order to be able to understand your paper, and to be convinced that it makes a compelling case.
  2. Did you check for issues that the reviewers missed? Reviewers are not perfect. At the end of the day, if your paper is not correct, consistent, and understandable, the reader will blame you, the author, and not the reviewers. Read the paper with "readers' eyes", and look for gaps, where a piece of information is necessary to understand the paper, but was omitted in the originally submitted version. Here, it particularly helps to check the captions of the tables and the figures. Readers often skim these in order to get an overall understanding of the approach and the results. The captions should be consistent with the text, and as complete as possible.
  3. Did you check the requirements of the conference proceedings? You will be required to use a specific template, and usually also to add a specific footer at the end of the first column. The conference creates a consistent, highly professional proceedings by imposing these restrictions: do not exercise your own creativity here.
  4. Did you check your metadata? The conference will often require you to copy-paste your title, authors, abstract and references into the camera ready submission system. These are used for preparing the program, and also for presenting your paper online. If you change the abstract, for example, you must also update the abstract version in the system. It's easy to forget.
  5. Did you de-blind thoroughly? You probably anonymized your paper to submit it. But remember, you removed not only the authors' names, but you probably also blinded your own citations in the reference list, and also any URLs leading to supporting resources. You need to add these back to the paper. Do not forget to now also add the acknowledgement of your funder or funders, and anyone else who contributed to the paper.
  6. Did you check for consistency in your terminology? Writing a paper is like writing code. When you declare a variable, you must use it consistently throughout the paper. For example, if you decide to use a term as a proper noun and capitalize it, e.g., "Support Vector Machine", you need to keep it capitalized throughout the paper. You can use your favorite web search engine to check which form is more widely used, if you are uncertain. Your readers are, obviously, not going to return a compile error if you are not consistent. But they will get a funny feeling that something is not right with your paper. A common problem is with hyphenated forms. For example, you should settle on "crowdsourcing" or "crowd-sourcing" at the onset of the paper and not switch back and forth between the two forms. People often get confused about consistency when it comes to hyphens. If you just look at word strings, you will see two forms in papers. For example, in our papers on deep links, you will see both the string "deep link" and the string "deep-link". Look a little closer---consistency is not being violated. The form "deep-link" is hyphenated because it is being used as a noun+noun modifier, as in "deep-link comments". If there is no following noun, then the paper uses "deep link". 
  7. Did you check your spelling and grammar? It seems obvious, but in the rush it is possible to forget. If you are writing .tex, it is helpful to go out of your way to put your text through a grammar checker to look for mismatches in subject/verb agreement, misplaced adverbs, and run-on sentences. A kludge, if you do not have a grammar checker, is to copy-paste into another editor (e.g., Word) and ignore the errors created by the .tex mark-up. You should understand, for each sentence that the grammar checker flags, why it is flagged. As the human writer, you have the authority to decide to leave as-is a sentence that the grammar checker does not like. For example, grammar checkers often have an allergy to passive sentences, i.e., they hate sentences whose grammatical subject is something other than the logical subject of the action. Sometimes, however, such sentences are necessary to keep two consecutive sentences clearly focused on a specific topic. You should be aware that there are many issues that the spell checker is probably not able to detect. For example: complement vs. compliment, or discreet vs. discrete. Another one is work vs. works. If you are a computer scientist (as opposed to a painter or other artist), when referring to the products of your efforts you need to use "work" as an uncountable noun, just like you would automatically write "research" rather than "researches". You should never write "works", unless you are truly referring to works of art for some reason. 
  8. Did you check your layout? For many conference proceedings, preparing the camera ready means that you are actually responsible for the typesetting of your paper. You should have some general knowledge of typesetting principles. An important one is to eliminate typesetting "widows and orphans", lines of text that are isolated at the bottom or the top of a page. It looks particularly bad when a section title is the last line on the page, and the body of the section starts on the next page. Your poor reader will be flipping back and forth to figure out what is going on. Also, as in the above illustration "watch your words" so that they do not wander into the columns. This is a problem with .tex. To fix your layout, you are going to need to rewrite your text just a bit (add or take out a few words). Most conferences also accept if you gently adjust the spacing to avoid typesetting problems. Another guideline that is important both from the content perspective and from the layout perspective is how you end your sections. A section (or a sub-section) should never end with a formula, or with a bulleted or enumerated list. Rather it should conclude with at least one line of conventional text that ties the information in the formula or the list to the topic of the paragraph in which it is contained.
  9. Did you check your punctuation? I like to quip that many authors who are not physically capable of typing a line of code without ending it with a required semi-colon, use papers to let their punctuation fantasies fly. The reality of the situation is that punctuation is not entirely discretionary, and authors should choose an existing set of conventions and apply them consistently. For example, in the dominant style, "e.g." and "i.e." are always followed by a comma. If you choose that style, it should apply to every instance of "e.g." and "i.e." in your paper. You should understand the difference between a hyphen, an en-dash, and an em-dash, and use them accordingly. Finally, if you are using automatic hyphenation, you need to watch to make sure that words that are not in the vocabulary of your hyphenator are treated properly. Words get hyphenated at syllable boundaries. Note that if your software is set to hyphenate another language, German for example, it will not handle English correctly. You should avoid leaving a syllable at the end of one line that will lead the reader to expect a completely different word. Such cases are generally referred to as "bad breaks". One that I encounter a lot is "Medi-aEval". Correctly, this must be broken over two lines as "Media-Eval". 
  10. Did you check your references? Recently, I had a very interesting discussion with a student who had just finished writing an excellent master thesis---except that the reference section was a mess. The same conference was cited in different references in different ways: with acronym, without acronym, with page numbers, without page numbers, with city, without city. Proper nouns in the titles of the papers had all been case-flattened, i.e., "flickr" rather than "Flickr". I asked him why he had not fixed his references. He said, "But I got the .tex from Google Scholar, it has to be right!" Looks like I discovered an area in which I have failed as a mentor. Or, expressed differently, an area in which I can further try to push back against our human tendency to close our eyes and push the autopilot button when someone offers us one. The perfectly curated reference section is deserving of a blog post of its own. 
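Incidentally, the terminology check in point 6 lends itself to a quick script. Here is a minimal sketch in Python; the function name, the variant list, and the example sentence are my own illustrations, not an existing tool:

```python
import re
from collections import Counter

def variant_counts(text, variants):
    """Count occurrences of each spelling variant of a term in a paper.

    `variants` is a list like ["crowdsourcing", "crowd-sourcing"];
    matching is case-insensitive and on word boundaries. If more than
    one variant has a nonzero count, the terminology is inconsistent."""
    counts = Counter()
    for v in variants:
        pattern = re.compile(r"\b" + re.escape(v) + r"\b", re.IGNORECASE)
        counts[v] = len(pattern.findall(text))
    return counts

paper = "We use crowdsourcing. Crowd-sourcing is popular; crowdsourcing scales."
print(variant_counts(paper, ["crowdsourcing", "crowd-sourcing"]))
```

A nonzero count for more than one variant flags exactly the back-and-forth switching that point 6 warns about; the "deep link" vs. "deep-link" case shows why the final judgement still belongs to the human author.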

Tuesday, April 29, 2014

I Don't Specifically Wanna be AnarCHI, but it sure does strike a chord...


If I am following ACM CHI 2014 on Twitter, and read a tweet with a picture of a document entitled "Never mind the Bullocks: I Wanna Be AnarCHI: A Manifesto for Punk CHI", I feel compelled to find it, read it and remark.

Except you can't really "read" this paper in the standard sense because it consists of a title, abstract, guitar chords, and lyrics.

It's not a paper, but rather a recipe that gets you to a performance. (You, yes, that's you, who consider yourself to be the "reader".) You have to interact with the thing to make it whole. To interact, you need to follow "instructions" for playing music, formulated using the standard conventions for how guitar music is written.

Which invites the following line of thought: If this paper can be an interactive set of instructions that can be carried out by people familiar with certain standard conventions, then, how far do we get if we view other papers as similar animals?

Unpacking that question: On the one hand, you can say about any paper, that you don't really get a complete entity until you interact with it: read it, interpret it, cite it, extend it. On the other hand, you can say about any paper, if you don't master the system of conventions (I don't know how to play the guitar), you are stuck and do not move beyond square one.

So it's a paper like any other paper, after all? Um, well. I continue with some additional remarks.

The lyrics in the paper refer to the "subtle kind of pressures that go unseen". If this were a film, the mention of the "unseen" would trigger us to expect that other forms of interpretation might lead to more insight. We could call it a foreshadowing or a hint that the message of this paper cannot be directly perceived, but must be witnessed through participation.

If the paper includes the participation, what does this participation tell us? Specifically, what does it tell us about ourselves, the "readers" of "papers"? My remarks regard the first Commentary on the paper.

This Commentary includes the statement, "It is not clear what the authors are trying to accomplish." My remark: This suggests the existence of an assumption among readers (i.e., readers like ourselves) that a paper should represent an attempt at a well-formulated achievement.

The Commentary also includes the statement, "...we are forced to question whether this paper should really be seen as anarchist". My remark: This suggests the existence of an assumption that a paper should not open up possibilities to discover contradictions between its literally expressed message, and its larger implications.

Then, the Commentary includes the statement, "The authors do draw important attention to the government and corporate funding of HCI research...However, we wonder how much of the authors' own research is funded by such." My remark: This suggests the existence of an assumption that the authors of a paper must not advocate actions differing from, or going beyond, those in which they are currently engaged. 

Finally, it includes the question "Would such mainstream acceptance paradoxically undermine the movement's very purpose?" The "movement" is here the purported AnarCHI "movement". My remark: This suggests the existence of an assumption that ideas should not come into being with the anticipation that they will ultimately destroy themselves, but rather, should come into being in order to establish permanence.

Yes, looking at this list, it does look like a set of conventions. And because the authors of this "paper" do not respect these conventions, they do not get beyond square one with this Commentary.

However, let's ask this: Don't we value exactly the opposite of these assumptions? Don't many of us believe that research needs flexibility for exploration (not all goals should be well formulated), for papers in which readers might discover contradictions (see things that the original authors didn't), for papers that inspire future scientists to believe they can be better than us, and for papers that present ideas aimed at moving the field forward as a whole, rather than establishing their contribution as part of an immutable canon?

In the form of these assumptions, the subtle pressures that go unseen have stepped out of the darkness and into the spotlight. They are not inherently bad pressures. Conventions allow us to communicate, just as they allow a guitarist to interpret written music and reproduce the intentions of the composer.

However, they are there. And naturally, they are brightly illuminated when someone invokes punk along with a cool CHI play on words like AnarCHI.

Heck yeah, they are there. And like the gravity that keeps us glued to the surface of the planet, we need, as human beings, heroic acts of will, insight, and technological development in order to physically be able to fly.

The Commentary refers to this paper as "exceedingly clever". But maybe it's not clever at all. It could just be considered a standard method to transcend the unspoken assumptions of a mature community, and to realign them with the underlying values privately cherished by that same community, i.e., the values that we would like to believe that we hold and can act on.

The Commentary implies that the paper subverts the mainstream. But it doesn't subvert the mainstream. It reveals the tension that exists inside every one of us. We struggle to keep our intuitions, investigations, and ambitions free of the destructive load of expecting science to progress in neat, self-consistent packages: self-contained packages with long lifespans in the larger community. The paper helps us because it encourages us with the reminder that we are not alone.

The Commentary admits the possibility "...we just don't get it". With respect to the duck video, I would be with you.

I've never attended CHI. Maybe I also need to learn how to play the guitar.

Friday, April 18, 2014

Multimedia Of the People, By the People needs multimedia technology For the People

Flickr: krazydad
Yesterday, our Multimedia Computing Group had a strategy day at the Delft Arts Center. These are the days when we take time (time that, strictly speaking, we cannot really spare from deadlines and projects) to step back and look at the larger picture of our group and of multimedia research. The location of the Arts Center is green, and the Dutch April obliged us with sun: ideal conditions for taking time to contemplate and discuss the bigger questions.

My larger picture is this: I am interested in the digital reflexes of human thought, creativity, and communication. These reflexes lead to the generation of multimedia and interaction data. The natural group with interest in this data is the people that it represents, i.e., the people whose efforts and activities caused it to come into being. I create algorithms and technology that support people in getting the most out of “their” data.

Of course, there is a lot of other multimedia data out there, including satellite images, medical images, and surveillance video; these are also forms of multimedia. However, typically, the people who generate these data (or the people who create the systems that generate these data) are not themselves represented in the data.

In my view, multimedia “Of the People, By the People” (i.e., data arising from the creativity and activity of a large number of general-population users) should be distinguished from special purpose multimedia. This means that we need a concept of multimedia systems "For the People” and that these should be the focus of special development effort. 

There are conceptual and algorithmic reasons why multimedia systems "For the People" should be developed as a separate class. One important point is that multimedia of the people and by the people is also generated for multiple purposes, giving rise to complexity not encountered in other systems. For example, a satellite image wouldn't be expected to fulfill two radically different goals, such as "education" and "entertainment", whereas such multi-facetedness is quite common for a video on YouTube. We tackled such complexity in a recent publication, "Using Crowdsourcing to Capture Complexity in Human Interpretations of Multimedia Content", but I will not discuss it further here.

Here, instead, I would like to focus on the ethical aspects of why multimedia "Of the People, By the People" should be the subject of dedicated research devoted to creating multimedia systems for users. My motivation was a conversation during yesterday's lunch that came scarily close to arriving at the question, "Do we really have time for ethics in our research anyway?" Given that it required a significant sacrifice to make time for the strategy day, I was really not ready to conclude that, whoops, no, after all, there is no time for ethics. I tried to make the point that ethics does not necessarily require time; rather, it involves making explicit choices that carry our research forward in one direction as opposed to another. The difference between the directions may be subtle, but the implications are huge.

Differentiating multimedia "Of the People, By the People" from other types of multimedia is a way of looking at what we do that opens the door to considering the ethical aspects of our research. In particular, if we target multimedia systems "For the People", our research contributes to the dialogue on an important issue of the digital age: the right of individuals and communities to benefit from the data produced as a result of their own creativity and activity. I am not asserting that creating multimedia technologies "For the People" is in itself an act that one can consider "ethical". Rather, by choosing to do so, and by making this choice explicit, we support people working in other disciplines in defining, and finding answers to, the tough ethical questions.

Social multimedia sharing on the internet has led to the creation of incredible resources. YouTube and Flickr contain an enormous amount of material that teaches, records aspects of human life, or represents the product of creative talent. In order that multimedia resources are able to reach their full potential of serving the good of individuals and society as a whole, it is important that technical tools are developed that will allow these resources to be searched and browsed. 

A somewhat trivial way of understanding (or "feeling") how the people who created these resources also have a right to them is a simple exercise: Imagine Google "turned off" YouTube. Would you feel that something had been taken away from you? Would schools, communities, and social relationships between people be damaged? The answer is clearly "yes". I am not going so far as to argue that YouTube videos should be considered public goods; there are complicated intellectual property issues at play there. The point here is simple: "turning off" YouTube would have larger implications than those of "just" a company making a business decision for business reasons.

Let’s look at a less trivial example: Imagine that Google does not turn off YouTube, but rather starts to spend less effort on maintaining it (i.e., there are streaming problems, or we can’t find what we are looking for). That’s a slippery slope leading down from what YouTube is today to the point of “turning it off”. Google's position on that slope again has consequences for the ability of people to access content that they themselves have created. In other words, that position goes beyond mere business decisions.

Finally, let's turn to where we are today: Google makes certain decisions about the types of multimedia search technologies to develop, and not to develop, for YouTube. These decisions have consequences. One decision in which these consequences are perhaps most clearly evident is that YouTube does not provide time-code-level search functionality for videos. This means that, for example, a student studying data fitting can’t directly search for short segments of lecture videos explaining least squares, but instead must perform a significant number of video-level searches.
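To make the idea concrete, here is a minimal, purely illustrative sketch of what time-code-level search means: instead of returning whole videos, the system indexes timed transcript segments and returns (video, start, end) spans that match the query. All names and data here are hypothetical, not a real YouTube API, and a production system would need far more than keyword matching over transcripts.

```python
# Hypothetical sketch of time-code-level search over timed transcript
# segments. Data structures and example content are invented for
# illustration; this is not how YouTube search actually works.

from dataclasses import dataclass


@dataclass
class Segment:
    video_id: str
    start: float  # segment start time, in seconds
    end: float    # segment end time, in seconds
    text: str     # transcript text for this time span


def search_segments(segments, query):
    """Return (video_id, start, end) for segments containing all query terms."""
    terms = query.lower().split()
    hits = []
    for seg in segments:
        text = seg.text.lower()
        if all(term in text for term in terms):
            hits.append((seg.video_id, seg.start, seg.end))
    return hits


# Toy "lecture video" transcript index.
segments = [
    Segment("lec01", 0.0, 90.0, "Introduction to data fitting"),
    Segment("lec01", 90.0, 240.0, "Least squares minimizes the sum of squared residuals"),
    Segment("lec02", 0.0, 120.0, "Gradient descent basics"),
]

# A segment-level query lands directly on the relevant time span.
print(search_segments(segments, "least squares"))
# → [('lec01', 90.0, 240.0)]
```

The contrast with video-level search is that the student gets the 90–240 second span of "lec01" directly, rather than a list of whole videos to skim.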

Of course it would be a strange, if not bad, business decision to make the search function of YouTube capable of precisely addressing specific questions. Users that come, get their question answered with one query, and leave again, are not going to hang around and click the ads that make such a critical contribution to revenue. 

The point is that decisions whether or not to develop certain technologies (and I can assure you that time-code level search is non-trivial) have implications for the ability of the community who created and contributed the content also to be able to use it. 

In order to drive this point home, let me give one more example, based on Wikipedia. There is widespread agreement that Wikipedia is a peer-produced resource that is important for society worldwide. In the case of Wikipedia, the importance of access technology in guaranteeing people’s ability to make use of the resource is clear. Imagine that all of Wikipedia were dumped into an enormous text file without structure and without a search function. Would it be useful to the larger community? Scarcely. The “right” to use Wikipedia is guaranteed not only by the Creative Commons license under which its content is created, but also by the structure of the Wikipedia site and the search functionality that it offers.

As systems get more complicated, and involve processing of pixels and sound samples, we cannot guarantee that appropriate search and access technologies will be developed without significant effort. The “ethical” decision to make as a multimedia researcher in academia is to focus efforts on those research topics that lead to multimedia systems “For the People” that industry is not necessarily incentivized to address.

Note that I am not asserting that companies like Google are not working to make access to user-contributed collections of multimedia better. Quite the contrary: Flickr, for example, just announced the integration of object recognition (http://techcrunch.com/2014/04/17/yahoo-acquisitions-power-flickrs-new-object-recognition-search-editing-and-video-capture/).

However, in order to remain profitable, companies must pay careful attention to their bottom line. For this reason, it is important that actors outside of industry (such as universities) remain actively engaged in developing technologies that help people get the most out of multimedia data that they themselves have created. Such actors experience no conflict of interest, and are thus free to stand squarely on the side of the people who created the data when conceiving and developing new multimedia algorithms.

How much time do we need to consider ethics in our research? Not that much. We greatly benefit from a few odd moments of discussion and reflection. Yesterday, I welcomed the chance to view the big picture, and to step back and ask if I, as a multimedia researcher, am truly serving the interests of the people whose effort and action gave rise to the data that I study in the first place.