Thursday, October 23, 2014

MediaEval 2014 Placing Task Technical Retreat

At the end of the MediaEval Workshop, the 2014 Placing Task held a technical retreat, where the details of this year's crop of algorithms were assessed and plans for the future were discussed.

I missed the first part of the meeting, because I was still doing some organizational stuff, and also saying goodbye to people (I find that so difficult to do, and I certainly didn't manage to say goodbye to everyone). However, I did take some notes on the parts that I attended and I am putting them here for posterity. 

Of course, expect the usual attentional bias in what I chose to write down, and possibly also in the categories into which I sorted the notes.

Moving beyond geo-location estimation
Can we formulate a data analysis task that moves beyond geo-prediction?

Can we drive the benchmark so that the task participants uncover the weaknesses in current placing systems? What mistakes are you making, and why are you making them?

Geo-relevance
  • For which images is geo-location relevant?
  • Which is the location for which it is relevant?
  • What is the tolerance for error? (depends on humans, applications)
Placability
In the past two years, placability has been offered as part of the task, but it has been disappointingly unpopular. This seems to be a matter of people not having time; we shouldn’t take the evidence as meaning that people don’t want to do it.

Alternate form of placability:
“Select a set of x images (e.g., 100 images) from the test set that you are sure you have placed correctly and visualize them on a map.”
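One concrete reading of this suggestion, as a minimal sketch: assuming each run produces (photo_id, lat, lon, confidence) records (this format and the function name are my own, not part of the task definition), select the k predictions the system is most confident about and export them as GeoJSON, which any web map viewer can display.

```python
# Minimal sketch: export a system's k most confident placements as GeoJSON.
# The (photo_id, lat, lon, confidence) record format is an assumption made
# for illustration; it is not the official Placing Task run format.
import json

def confident_subset_geojson(predictions, k=100):
    """predictions: iterable of (photo_id, lat, lon, confidence) tuples."""
    top = sorted(predictions, key=lambda p: p[3], reverse=True)[:k]
    features = [
        {
            "type": "Feature",
            # GeoJSON uses (longitude, latitude) order.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"photo_id": photo_id, "confidence": conf},
        }
        for photo_id, lat, lon, conf in top
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})

# Usage: write the result to a .geojson file and drop it onto any web map
# (e.g., geojson.io or a Leaflet layer) to eyeball the claimed placements.
```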

How to support the participants
Can we release a baseline system?
Estimates for the error?

How to move beyond co-ordinate estimation
  • Can we make the Placing Task more clearly application oriented?
  • Are there use scenarios beyond Flickr?
  • Is anyone interested in the task of Geo-Cloaking? 
  • Can the task pit two teams against each other, one cloaking and one placing?
Evaluation metric

  • We think that geodesic distance is convenient, but it has limits, since it doesn’t reflect the usefulness of predictions for humans within use scenarios (a sketch of the current distance-based evaluation follows this list).
  • Maybe move to administrative districts
  • Other metrics motivated by human image interpretation?
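For concreteness, here is a sketch of how the current distance-based evaluation looks (the radii and helper names below are illustrative choices of mine, not the official scoring script): the error is the great-circle distance between the estimated and ground-truth coordinates, summarized as the fraction of test items placed within a set of radii. It is exactly this kind of summary that says nothing about whether a particular mistake matters to a human in a given use scenario.

```python
# Sketch of distance-based evaluation: great-circle (haversine) error between
# estimated and ground-truth coordinates, summarized as accuracy at several
# radii. The radii below are illustrative, not the official task thresholds.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def accuracy_at_radii(errors_km, radii_km=(1, 10, 100, 1000)):
    """Fraction of items whose placement error falls within each radius."""
    n = len(errors_km)
    return {r: sum(e <= r for e in errors_km) / n for r in radii_km}
```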

Ground truth
We can measure placing performance only within the error of the ground truth (cf. [2]). What can we do to work around this limitation?
  • Correspondence between geo-tags and EXIF metadata is indicative of whether the tag is correct (a sketch of such a consistency check follows this list). See also cool new work on timestamps [4].
  • Are there other easily measurable characteristics of images online that can be used to identify images/videos with reliable geo-tags at a large scale?
  • Collect more human-labeled data. Do we really need a 500,000-item data set?
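As a sketch of the consistency check mentioned in the first bullet (the record format and the 1 km tolerance are assumptions of mine; it reuses haversine_km from the evaluation sketch above): when a photo carries GPS coordinates in its EXIF header, a large discrepancy with the user-supplied geotag is a signal that the geotag may be unreliable.

```python
# Hedged sketch of an EXIF-vs-geotag consistency check. Reuses haversine_km
# from the evaluation sketch above; the 1 km tolerance is an arbitrary choice.
def geotag_agrees_with_exif(geotag, exif_gps, tolerance_km=1.0):
    """geotag and exif_gps are (lat, lon) tuples; exif_gps may be None."""
    if exif_gps is None:
        return None  # no EXIF GPS to compare against
    distance_km = haversine_km(geotag[0], geotag[1], exif_gps[0], exif_gps[1])
    return distance_km <= tolerance_km

# Photos/videos whose geotag disagrees with the embedded GPS (or whose
# timestamps look implausible, cf. [4]) could be filtered out when building
# large-scale ground truth.
```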
How People Judge Place
Users (i.e., humans judging images) have different ways of knowing where a picture was taken. 

It depends on the relationship between the human judge, the image, and the act of image creation.

The most basic contrast is between the case in which the human judge is the photographer, and the case in which the human judge is not the photographer and also shares no life experiences with the photographer.

Previously I discussed these different relationships in a post entitled “Visual Relatedness is in the Eye of the Beholder” and also in [3].

Why is this important? Some mistakes made by automatic geo-location prediction algorithms are disturbing to users, and some are not. Whether or not a mistake is disturbing to a particular human judge is related to the way in which that judge knows where the picture was taken. In other words, I may “forgive” an automatic geo-location estimation algorithm for interchanging the location of two rock faces of the same mountain, unless one of them happens to be the rock face that I myself managed to scale. How people judge place is closely related to the types of evaluation metrics we need to choose to make the Placing Task as useful as possible.

The Man vs. Machine paper [1] sets up a protocol that gathers human judgements in a way that controls the way in which people “know” or are allowed to come to know the location of images. More work should be explicitly aware of these factors.

Embrace the messiness
The overall conclusion: anything that we can do to move the task away from “number chasing” towards insight is helpful. This means finding concrete ways to embrace the fact that the task is inherently messy.

Thank you!
Thank you to the organizers of Placing 2014 for their efforts this year. We look forward to a great task again next year.

References
[1] Jaeyoung Choi, Howard Lei, Venkatesan Ekambaram, Pascal Kelm, Luke Gottlieb, Thomas Sikora, Kannan Ramchandran, and Gerald Friedland. 2013. Human vs machine: establishing a human baseline for multimodal location estimation. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 867-876. 
[2] Claudia Hauff. 2013. A study on the accuracy of Flickr's geotag data. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '13). ACM, New York, NY, USA, 1037-1040.
[3] M. Larson, P. Kelm, A. Rae, C. Hauff, B. Thomee, M. Trevisiol, J. Choi, O. van Laere, S. Schockaert, G. J. F. Jones, P. Serdyukov, V. Murdock, and G. Friedland. The benchmark as a research catalyst: Charting the progress of geo-prediction for social multimedia. In J. Choi and G. Friedland, editors, Multimodal Location Estimation of Videos and Images. Springer, 2015.
[4] B. Thomee, J. G. Moreno, and D. A. Shamma. Who’s Time Is It Anyway? Investigating the Accuracy of Camera Timestamps. ACM MM 2014, to appear. http://www.liacs.nl/~bthomee/assets/14time_p.pdf