Showing posts with label information need. Show all posts

Saturday, March 10, 2012

Querying the Collective: Why search engines should (responsibly!) support analytics

My thoughts today turned towards the social responsibility of search engines. As search improves, it seems that the question is not moving towards being resolved, but rather is becoming more important. Simply put, as the helpfulness that search engines have in our lives increases, so does their power over us. In this post, I'd like to attempt to unpack that thought a bit and build a case for the importance of social responsibility of search engines.

Let's consider the question of the identity of the entity with which knowledge resides. In other words, let's have a look at, "who knows what" in some basic search scenarios. User information needs, I claim, differ along the who-knows-what dimension:
  • Known-item search: "I know what I want and my information need will be fulfilled when I can get my hands on this item." (Knowledge effectively already with me.)
  • Ad hoc search: "I don't know exactly what I want. But I know that there are other people who know. My information need will be fulfilled when I can get my hands on information sources created by people who do know." (Knowledge with other people, soon to be with me.)

As long as we stick to these two variants, the world is relatively well behaved. Social responsibility arguably remains with the searcher and with the individuals making the information sources available.

However, with the next step in this typology, things definitely become different.

  • Analytics search: "I don't know exactly what I want. In fact, I know that there is no individual person who knows. My information need will be fulfilled if I can get my hands on information that is created using analytics over a larger number of information sources." (Knowledge with no one, not yet.)

We do analytics search all the time, even without realizing it. For me, it's often for small things while writing research papers, for example, for finding the more common usage "crowd-sourcing" vs. "crowdsourcing" or to see if "internet" has finally overtaken "Internet". I actually just this moment carried out a search for "analytics", which is being red-underlined as a spelling mistake as I write. In response to my query, Google tells me "About 117,000,000 results (0.24 seconds)". I decide that there are a lot of other people using this word -- many in the same way that I am using it now, so I ignore the spellchecker and move on.

The point is that this 117,000,000 is information that was derived on the spot by analyzing and aggregating a huge number of data sources. As a result, the responsibility for this information has shifted and now lies elsewhere. It is not so clear that it lies only with me, the person asking the question, or with the information sources that are being aggregated. Rather, the responsibility for creating this information lies with the algorithm that made the calculation. If we think that a non-human entity such as an algorithm cannot be responsible, then the conclusion must be that the responsibility lies with the people who created and control the algorithm that made the calculation, i.e., the minds and masters behind the search engine.

Of course, many times that responsibility is not a particularly heavy weight and the answers to analytics search queries can often be wildly off and still not be harmful. Give or take a million, I still will see that there are a lot of people using the word "analytics" and that answers my question.

However, I would argue that there are enough cases in which the results of analytics search queries have a large enough impact that they should force us to think carefully about the social responsibility that is borne directly by our search algorithms, their creators and the providers of our search services.

Recently, I spent some time in a house by a lake in the forest. The local news reported an incident in which a hunter reported a man in the forest, carrying a gun, but not wearing the blaze-orange of a hunter. The man had fired at him, and the hunter returned fire. Basically, I made my decision about when to go out of the house after hearing that news report by using a search engine to monitor real-time media (Twitter and local news). My queries were analytics queries because they relied on the entire collection of available information being scanned. My conclusion about the situation relied on a relatively subtle difference between no one mentioning it and it being mentioned by a handful of people (the forest being rather sparsely populated). I continued periodic query sessions, and the story died out relatively quickly. I walked out of the house with confidence that the incident was a fluke and not a rampage and that no one was going to take a pot shot at me.

One could argue that I was irresponsible for potentially putting my safety in the hands of a search engine. But one could also argue that I was irresponsible for being there in the hunting season. There are also those that would claim that the place is a bit weird anyhow, and should be completely avoided. The problem is, that place is where I'm from. I'm probably not about to stop going back and also I'm probably not about to stop looking for information by carrying out analytics search with a search engine.

It seems inevitable: People use analytics search to form opinions and make assessments that influence their behavior and lead them to make important decisions.

We have little choice but to admit that we would like search engines to offer us as users the possibility to satisfy our information needs using analytics search. It gives us a lens to view the world around us. It takes us a step in the direction pointed to recently by Doug Oard, who was quoted on Twitter as wanting an information retrieval system that is an exoskeleton for the mind. Personally, (and to the bemusement of my colleagues) I tend to talk about search, especially in the context of social networks, as providing us with a prosthesis. In the end, all the metaphors boil down to analytics search being just plain important to us and to what we want to do in our lives.

We are left with the conclusion: Search engines should support analytics search, but they should take careful regard of social responsibility.

Why am I thinking of analytics search today? Probably because I've come up against another problem where it is useful. This problem involves no guns, so it's not particularly life threatening -- at the most it threatens the productivity of our lab.

At the beginning of the year, there was a high-level decision to restrict access to our building on the weekends. The cited reasons were that ICT no longer requires 24-hour building access and that weekend closure would save energy and security costs. The net result has been that on the weekends our lab is completely empty, whereas there used to be at least one or two PhD students working there whenever I went in.

I became curious about exactly how much electricity is being saved and realized that we can actually make a rough estimate using social media to calculate the number of weekends in which the building is actually powered down. For example, today (a Saturday) it wasn't. Today, I took a picture (above) in the cafeteria which reveals the fact that it wasn't. There were also other people in the group in the picture who were themselves taking pictures. If a search engine will allow me to find other pictures (e.g., on Flickr) of weekend events in the building, it will be possible to make an estimate of the total number of days of electrical consumption actually saved by keeping the PhD students out of the lab.
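To make the arithmetic concrete, here is a minimal sketch of that estimate. It assumes the search has already returned a list of dates on which photos were taken in the building; the function names and the sample dates are hypothetical, and no real photo-search API is called. Since photos only prove that the building was open, the result is an upper bound on the number of weekend days actually saved.

```python
from datetime import date, timedelta


def active_weekend_days(photo_dates):
    """Distinct Saturdays/Sundays with at least one photo (sorted)."""
    # weekday(): Monday is 0, so Saturday is 5 and Sunday is 6.
    return sorted({d for d in photo_dates if d.weekday() >= 5})


def count_weekend_days(start, end):
    """Number of Saturdays and Sundays between start and end, inclusive."""
    total = 0
    day = start
    while day <= end:
        if day.weekday() >= 5:
            total += 1
        day += timedelta(days=1)
    return total


def estimate_saved_weekend_days(photo_dates, start, end):
    """Weekend days in the period minus those with photographic evidence
    that the building was open (an upper bound on days saved)."""
    active = [d for d in active_weekend_days(photo_dates) if start <= d <= end]
    return count_weekend_days(start, end) - len(active)


if __name__ == "__main__":
    # Hypothetical photo dates, standing in for search results.
    photos = [
        date(2012, 3, 10),  # Saturday, two photos of the same event
        date(2012, 3, 10),
        date(2012, 3, 7),   # a weekday photo, ignored
        date(2012, 2, 19),  # Sunday
    ]
    saved = estimate_saved_weekend_days(photos, date(2012, 1, 1), date(2012, 3, 11))
    print(f"Weekend days with no evidence of activity: {saved}")
```

Counting distinct days rather than photos matters here: many pictures of one event still mean only one day on which the building was powered up.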

None of the individual picture takers know this information, but if the information can be aggregated with the support of a search engine, then we can know -- calling the information into being, as it were, using a couple of queries linked to dates and locations. I'll leave it to another day to examine the question of whether I actually have a right to know how much energy is being saved by the building closure policy. Here, I draw a different conclusion: if I am going to rely on a search engine to formulate an impression of what happens in the building on the weekends, then I would like that search engine to have assumed the responsibility of giving the best answer it possibly can.

Friday, January 8, 2010

Get out and pull

On Thu, Jan 7, 2010 at 4:45 AM, I received an e-mail from the Nederlandse Spoorwegen, the Dutch train service. After greeting me by name, the mail went on to read, Door het winterweer zijn er meer treinen defect. Dat heeft gevolgen voor de dienstregeling. (Eng. 'Because of the winter weather more trains are out of service. This has consequences for the train schedules.') Of course, I appreciate the personalized warning and it would have been critical if I had needed to travel by train yesterday.

For me, this incident was an example of how information technology does help us as a society to be more efficient, cost-effective and perhaps even kinder to the environment. But now that the Nederlandse Spoorwegen has the possibility to broadcast a wide, personal warning and forestall hordes of cold and angry passengers on the platforms, doesn't that make it economic for them to operate on an even thinner margin--i.e., have even fewer resources on hand that they can swing into action in the case of weather emergencies? It's a slippery slope downwards to being in an even worse position to handle emergency situations. Are things more efficient, or have we just found another balance to inefficiency?

A little personalized knowledge is a dangerous thing. The 4:45 AM mail is, of course, going to be forwarded to bosses across the country -- I'm not coming in, or I'm going to be late today. Half the trains might still be running -- but is there any guarantee that only half the train-riding workforce now flips into snow-day mode? The Nederlandse Spoorwegen is saving itself from weather problems with its information spreading, but it could actually be amplifying the weather problems for other sectors. What to do? I am certainly not going to advocate the fully personalized approach: The Nederlandse Spoorwegen knows every individual's contribution to the overall economy and then only warns the less essential members away from trying to take the train on days that fewer trains are running. Perhaps I could live with the following 4:45 AM mail: 'Good Morning! We think that you want to go to Amsterdam this morning. If you leave the house in one hour there will be a train to Amsterdam picking you up on Platform 1 when you arrive at the station.' I suppose it would be most effective if the message was sent directly to my alarm clock.

But maybe we're not really moving towards putting our schedules entirely in the hands of the Nederlandse Spoorwegen. The final sentence of the 4:45 AM mail is perhaps the most important one: Kijk voor meer informatie op www.ns.nl. (Eng. 'Visit www.ns.nl for more information.') The mail is not warning me off -- rather, it is a gentle information push that is inviting me to go out and search for my own information and to make my own decision. There's something you might need to try to find out today, it says. It pushes me to go pull. It tells me that I might just have an information need.

The responsibility of search is to be able to respond in a flexible manner to searchers that have been moved to search by a prompted information need. But maybe my search system should be on the lookout for me and be responsible for sending the 4:45 AM mail as well. The Nederlandse Spoorwegen could then attend to its business of getting all of the trains running again on schedule. They wouldn't have to even mention it, and we'd all be willing to get out and pull.

Sunday, November 22, 2009

Lemonface

Since I was going to be mentioning Flickr Video in my presentation about VideoCLEF at last week's TRECVid 2009 workshop, I decided I should try and make one. This video was shot out the window in the morning. And yes, I dutifully made sure it was properly geo-tagged as Gaithersburg. Sure indeed, capturing the ripple makes the flag image come alive..."It's like a photo, but it moves!"

Is Flickr video a long photo?

Maybe it's even a bit too alive. The ripples would be more dramatic had the camera been stabilized...you can see I'm not holding it still. Where does it end? Interactive moving pictures, a la Harry Potter's newspaper, I suppose.

Immediately after downloading this video off of my camera, I lost the camera. My first reaction: Had I downloaded all of the pictures on the camera? (Yes!) Only then did I start fretting about how much money I paid for the camera, a Canon PowerShot SD1100, and how much it would cost to replace it.

I felt it viscerally: it's not just a bunch of bits! Content matters to us in a very personal way. Since I didn't empty the camera memory card, are my photos floating around out there, in whatever space the photos on lost cameras go to? What if this space turns out to be the Internet? Would I be embarrassed?

In my opinion the most embarrassing picture on the camera is a lemon-face from Jay and Silent Bob. And yes, there is also a corresponding lion-face out there...somewhere. I have this most pressing information need: "Where is my camera?"