N-grams

Thursday, March 18, 2010

Time to play

What's the difference between a single image and a video? Well, video is a timed medium, otherwise called a time-continuous or time-synchronized medium. It's impossible to take in at a glance, and in order to appreciate it, or derive any benefit from it at all, you have to devote time to watching it.

Watching would be more efficient if we didn't need to watch from end to end, but rather had some sort of a road map for the video: an intelligent player that would give us signposts and provide us with an indication of where the video could be most interesting.

I was invited to give a position statement at a session on search at ICTDelta 2010, a larger IT event in the Netherlands. I took the position that we need to have intelligent multimedia players that give us clues as to where the video is the most interesting. The position statement was in Dutch, but I made an English version in the form of a YouTube video.

Sunday, February 14, 2010

Searching Stale Buzzed

The timeline of my relationship with Buzz was rather a short one:

Tuesday, Feb 9, 2010 Google introduces Google Buzz.
Wednesday, Feb 10, 2010 I was in the room the next day with two PetaMedia researchers formulating their opinion, "You're in my network, but I am apparently not in yours--that makes no sense."
Sunday, Feb 14, 2010 I find the time to look into it, reading coverage in the NYT and checking out what was happening at Wired.com, and also looking at subsequent posts on the Google Gmail blog Thursday and yesterday. I turn it off and check to make sure that I don't have a public profile.

Why did I opt out of Buzz? Well, (perhaps rather unsurprisingly) it's an issue that concerns search. Consulting the Buzz site provides me with the information that Buzz is, Meer dan alleen statusberichten: Updates, foto's, video's en meer delen. (Eng. "Not only status messages: Share updates, photos, videos and more"). Sharing stuff with my friends and family is nice, but what if I want to go back and find this stuff later? How will it be indexed? Will I be able to find it, by date, location, keyword, tag? Can they find it? Is there a mechanism for "forgetting"? (If I don't want the option of going back to re-live every Valentine's Day of my past, for example.) What will the cross-lingual search support be like? (The site has made a lucky guess that I speak Dutch, but my sharing will likely be in a mixture of languages...Will it cross languages to reach my followers?)

There's no readily available information on any of these issues. It's all about Buzz, with no vision of what's going to happen with the large amounts of previously Buzzed that will inevitably accumulate. I don't want to be tempted into Buzzing into what, without effective search and browsing technology, is basically a huge void. I'd better keep on simply printing out pictures and sending them to my family and friends with the conventional mail. Sure, I can Buzz it today, but a couple years from now my stuff will be simply gone. If it is indeed being archived somewhere, it will be gone in the sense that it's useless without an intelligently conceived mechanism to search and browse it.

To accompany this post, I've chosen a photo from Flickr that is dated 1909. If the woman in this photo had Buzzed this picture one hundred years ago, would it still be in the possession of her great-grandchild? Given the text on Flickr, the great-grandchild is obviously fascinated by it and thrilled to have it. From the beginning, Buzz should have a vision that recognizes the value of stale Buzzed.

If ever a Buzz there wuzz

Google's precipitous plunge into the social networking arena is basically beneficial for us Gmail users stuck in our old-fashioned email world. Powerful new forms of communication, work, socialization and entertainment are emerging as a result of platforms that are developed for the explicit purpose of connecting an individual to a network of other individuals. To learn to make the most of these new forms, we need to be able to pass seamlessly from our old models, based on metaphors like the letter (one-to-one) and the newspaper (one-to-everyone), to new models. These include the new one-to-several and everyone-to-everyone models supported by Google Buzz.

Seamlessness is one thing, pre-weaving a friends network from a list of e-mail contacts is another. I find it hard to believe that Google launched Buzz in its initial form without anticipating the deluge of angry protests. In my opinion, a relatively small-scale user test would have allowed them to anticipate the overwhelmingly negative reception. Google's reaction, the adaptation of Buzz towards making concealing rather than revealing the default, has been remarkably fast -- are they really making it up as they go along?

It would not surprise me if Google realized in advance that they were stepping over an important line with the launch of Buzz. Whether it was meant in this way or not, it is worthwhile reflecting on the message the overstep relays. Google could have found no more effective way to remind the world that, although they invest in protection of our privacy and freedom of expression (cf. Gmail's quote to NYT), they cannot make a guarantee to Gmail users that Gmail is an entirely secure channel of communication. Effectively, in the face of threat from hackers traced to China, Google has reacted by engaging in its own sort of hacking. Because Buzz steps so far outside of what people generally expect e-mail to do, Google has basically carried out a hacking attack on itself. Essentially, by momentarily making us feel exposed, by showing us the potential damage that could be wrought by a large-scale Gmail hacking attack, Google has given us a clear and memorable reminder: Gmail cannot, and never will be able to, guarantee 100% protection of our privacy. We must always keep in mind the potential dangers of large-scale compromise of the system.

Thursday, February 4, 2010

Diep Blue

I am sitting in the train from Brussels to Rotterdam. I have work to do, but I am also too tired to do it well. I need to entertain myself instead. What to do? Hmm. Hey, write a blog entry. But I have the strangest problem. I don't have internet connectivity. I can't write. But I could write a post and upload it later.

I still have the same feeling: I just can't write.

Can't write? Well, first, I have my blog drafts stored on blogger.com. I currently have one half finished and I'd like to finish it off. But I can't get to my draft and if I could, I couldn't finish it, since I'd like to put in some links and check some data sources and facts...that requires connectivity. Slowly it strikes me: I carry an inner conviction that I can't write without access to a search engine. Is it true? I can't create without search? I'm moved to prove that this is not true.

Pause of several minutes.

We pass over the Hollands Diep by Dordrecht. Is that really what it's called? I've just always thought that this water here was what people were talking about when they say Hollands Diep.

Well, no way to look that up now. Instead I gaze out on it, the first few seconds wishing that I was making a video of the view from the bridge. It's the fading light. The sky and the water are nearly the same color and the struts of the bridge (hmm, shall I call them struts or girders--can't check that either...) pass by at regular intervals, with a comforting rhythm. I realize the bridge will soon be over, so I give my thought flow a little nudge towards simply enjoying the view and not wishing I were capturing it.

The conductor comes by and checks my ticket...ok that was another couple minutes.

Now we're stopped in Dordrecht...here comes the coffee guy...in Dordrecht still...Hey, at least I can add a nice Dordrecht image to this entry.

The first several don't come out well. Then the station is gone, but there are trees. Rows of trees. That's iconic for the Netherlands, at least in my mind. Click, click, I capture a couple images. Trees and telephone lines. Whereby the telephone lines are strangely crisp against the blurry trees. Color looks interesting grey/blue. That's nice...I can fiddle with it more later.

I've listened to quite a few talks lately on classifying multimedia according to affect. These talks mostly discuss the emotional impact that an image or a video has on the viewer, as opposed to the affect being experienced by humans depicted in the image or video.

I look at my trees and telephone line shot again and wonder what the emotional impact on the viewer would be. I guess the color would be sadness: the Dutch word verdriet comes to my mind. Of course, I can't go and look up what the affect people say about this color. I think it's a cool color. I also remember one talk in which there was a slide that wanted to convince me that people consider this sort of color masculine.

I remember that slide exactly because it caused me a bit of mental distress. If I had a group of subjects doing a user test and I asked them to arrange pictures on a masculine to feminine scale -- what exactly would they be doing? Accessing their emotional experience? Or, and I consider this more plausible, "cheating" by tapping into past baby congratulations card selection experience? Look around the train, where do we see blue? The button that closes the doors. Blue. The button that opens the doors. Yellow. The labels get removed or the malicious switch them, but the regular train riders know. It's convention and not emotion.

I think that when I look at this picture in the future, it will remind me of tiredness. Of how exhausted I was when I took it. Of my failure to capture the moment in the way I thought it should be captured. And how befuddled my brain was both by the tiredness and also by the attempt to write -- to develop a coherent point -- without access to search.

But here we are now. Rotterdam.

Friday, January 15, 2010

No, Comment

Six months ago the Boston Globe ran an op-ed by someone named Douglas Bailey entitled 'Got a comment? Keep it to yourself.' The piece takes the position that comment posting functionality should be removed from online newspapers in order to restore 'journalism's dignity', lost, according to Bailey, when newspapers started making their content available online for free, thus devaluing it. Not surprisingly, this article has collected 193 comments, 191 of them within the first five days it was online. Nearly every sentence Bailey wrote, it seems, is graced with a reader reaction. One commenter advises him to 'Keep his op-ed to himself.' A late addition from November declares 'Reading the comments section is the BEST part of the article,' which sums up the view of many.

I made a stab at reading all 193 comments -- starting with the ones that received the highest reader ratings. I didn't make it through everything. Somebody (or some spam filter) must have, however, since there are notes where certain comments have been censored: 'We removed archie-skip's comment.' The filterer was not, however, Bailey himself, who begins his final paragraph with, 'By the way, don’t bother posting any comments directed to me when this article appears on the Web. I won’t see them.'

What do I have in common with Douglas Bailey? Well, I won't see your comments. In fact, you can't make comments here; I have the comment functionality turned off. At times -- especially around deadlines -- I don't visit my blog for weeks at a time. For this reason, I can't spam filter and I also can't react fast enough to start a meaningful dialogue.

A lot of commenters overlooked Bailey's final sentence, which read, 'If you really have something interesting to say, I’ll find you.' By turning off the comments on this blog, I am, in a sense, saying the same thing.

If we believe that the Internet should be a place where opinions are expressed and exchanged, where we go to meet in circles of friends, fellow hobbyists, professional colleagues, compatriots, fellow humans, where we learn from each other, hash out the issues, forge consensus, if we want that sort of dialogue on the Internet, then Bailey's casual, 'I'll find you' represents a real challenge. Effectively, he is pushing the whole burden of supporting the dialogue onto, yes, well, right, search.

One might argue that pieces of information get linked up in ways other than search. But type-into-search-box is the basic search gesture and our browsing, retrieval, exploring, generally amusing ourselves on the Internet behavior relies on this gesture and on the variations we bring to it. Linking things in other ways is non-trivial. Bailey's article ran ten days after it was published in the IHT (http://global.nytimes.com/) under the title 'Do not comment on this article'. This is where I originally read it. The New York Times published two reader reactions on their site...but it is quite tricky to get from these reactions back to the original piece. You need search. I executed a couple of rounds of type-into-search-box and found it at the Boston Globe...without even thinking about it.

I guess I am not too concerned about reliance on search to perpetuate the dialogue in this blog. The "you" reading this blog is mostly my future self, and she can comment without needing comment functionality.

But more generally, we should reflect more often on the responsibility of writers and commenters to not only express their opinions, but express them in a way that they can be found and can be associated with the larger dialogue to which they contribute. Organizing discussions around specific articles, however, might not be the answer. Surely, the issue is important enough to sustain serious debate for more than the five days that Douglas Bailey drew serious volumes of reactions...it deserves a life of its own independent from the specific article. After all, someone might still have something to say about it six months later.

Friday, January 8, 2010

Get out and pull

Thu, Jan 7, 2010 at 4:45 AM I received an e-mail from the Nederlandse Spoorwegen, the Dutch train service. After greeting me by name, the mail went on to read, Door het winterweer zijn er meer treinen defect. Dat heeft gevolgen voor de dienstregeling. (Eng. 'Because of the winter weather more trains are out of service. This has consequences for the train schedules.') Of course, I appreciate the personalized warning and it would have been critical if I had needed to travel by train yesterday.

For me, this incident was an example of how information technology does help us as a society to be more efficient, cost-effective and perhaps even kinder to the environment. But now that the Nederlandse Spoorwegen has the possibility to broadcast a wide, personal warning and forestall hordes of cold and angry passengers on the platforms, doesn't that make it economical for them to operate on an even thinner margin--i.e., have even fewer resources on hand that they can swing into action in the case of weather emergencies? It's a slippery slope downwards to being in an even worse position to handle emergency situations. Are things more efficient, or have we just found another balance to inefficiency?

A little personalized knowledge is a dangerous thing. The 4:45 AM mail is, of course, going to be forwarded to bosses across the country -- I'm not coming in, or I'm going to be late today. Half the trains might still be running -- but is there any guarantee that only half the train-riding workforce now flips into snow-day mode? The Nederlandse Spoorwegen is saving itself from weather problems with its information spreading, but it could actually be amplifying the weather problems for other sectors. What to do? I am certainly not going to advocate the fully personalized approach: The Nederlandse Spoorwegen knows every individual's contribution to the overall economy and then only warns the less essential members away from trying to take the train on days that fewer trains are running. Perhaps I could live with the following 4:45 AM mail: 'Good Morning! We think that you want to go to Amsterdam this morning, if you leave the house in one hour there will be a train to Amsterdam picking you up on Platform 1 when you arrive at the station.' I suppose it would be most effective if the message was sent directly to my alarm clock.

But maybe we're not really moving towards putting our schedules entirely in the hands of the Nederlandse Spoorwegen. The final sentence of the 4:45 AM mail is perhaps the most important one: Kijk voor meer informatie op www.ns.nl. (Eng. 'Visit www.ns.nl for more information.') The mail is not warning me off -- it is a gentle information push that is inviting me to go out and search for my own information and to make my own decision. There's something you might need to try to find out today, it says. It pushes me to go pull. It tells me that I might just have an information need.

The responsibility of search is to be able to respond in a flexible manner to searchers that have been moved to search by a prompted information need. But maybe my search system should be on the lookout for me and be responsible for sending the 4:45 AM mail as well. The Nederlandse Spoorwegen could then attend to its business of getting all of the trains running again on schedule. They wouldn't have to even mention it, and we'd all be willing to get out and pull.

Sunday, December 13, 2009

The nature of social queries

In my last post, I described the location of my camera as my most pressing information need. Soon thereafter, the need was satisfied via my social network, within which the lost camera acted as an implicit query. This process, however, had none of the revolutionary flavor otherwise associated with social search. I got a call from my mother saying, "Your uncle told me that you left your camera sitting on their kitchen counter."

The camera experience set me to thinking about social queries and social search. The missing camera information need falls into the category of known-item search. (Although, I did entertain the idea along the way that I should actually be shopping for a new and better camera.) Also, it is interesting to note that the query could be answered within my own social network. It's a rather obvious point. After TRECVid I went to my uncle's house and not some other unmotivated place in the area.

Before the call from my mother, I was also pursuing a sort of a social search solution -- a completely conventional procedure. I called the rental car and the hotel. I was trying to shake down the ad hoc social network including the people who entered the car and the hotel room just after me to figure out what happened to that camera. What I suspect is that a lot of the information needs we have as individuals correspond to known-item social queries and are answerable via a network containing relatively few rather mundanely predictable nodes.

I'm apparently not the only one thinking about known-item search in networks. Recently, the DARPA Network Challenge concluded. It involved the information need, not of an individual, but of an organization. To win you had to be the first to report the locations of ten red weather balloons across the continental US. DARPA moored the balloons and made them visible from nearby roads. The challenge was won by a team at MIT.

Looks like it was fun. The point I'd like to make here, though, concerns the nature of the query. It was a known-item (items, to be exact) query. But because someone already knew (created, in fact) the answer, the search space was radically limited. The US$ 40,000 prize money meant that this could not be a query "typical" of the average node in the network. I'm not sure what we can say that we learned as a result of the experience. On the other hand, we can be grateful that DARPA was smart enough not to ask something potentially destructive, like, "...." (Sorry, couldn't bring myself to even write an example here -- but a couple readily come to mind.)

Although the DARPA Network Challenge might have been fun, I am afraid it falls short of being good, healthy fun. It's a social search problem both initiated and solved by entities with, it's safe to say, fairly high centrality status within the graph of the social network used to solve the problem. For us as potential nodes in future networks solving similar problems, such experiences effectively serve the purpose of establishing precedent and teaching us about problem solving procedure. What we've learned from DARPA is: Sit around and wait until some entity poses a money-backed question and then contribute to the MIT-sponsored site and get your piece of the payoff. Good, healthy fun would have taught us that we have to think very, very carefully about who we tell what. It is important not to proceed a step further until we have a mechanism in place by which we can make sure that everyone understands the difference between sending in a geo-coordinate- and time-stamped photo of a balloon and one of the neighbors putting out the trash.

In particular, it's important to understand the implications of who we tell what about whom. In grade school we come to terms with the delicate balance between supporting fairness within the school world and not betraying our fellows by being a "nark". On the scale of today's social networks, the connection between our actions of telling and the consequences for our fellow human beings is nowhere near adequately transparent to allow for direct learning by individuals. It seems like a harmless piece of information, but when are we morally obliged to contribute it because it would help and when should we stay our urge to be part of the MIT lottery-like fun because of the potential harm?

Perhaps the issue is not yet relevant. People search, by which I mean search for location information about real living people, is still difficult. The sadness over the disappearance of Jim Gray is for some among us in a curious way inseparable from the disappointment at the failure of technology-enhanced large-scale search. You can launch a large-scale distributed search to find someone and still not succeed. Even shaking down the vast social networks in the US doesn't seem so easy. Earlier this year, Wired posed a challenge called Vanish. To win you had to find a Wired writer named Evan Ratliff within a month of him assuming a new undercover identity. He was discovered, but only after undertaking a challenge put to him by Wired that forced him to radically narrow the search space for his pursuers -- i.e., enter a known location within a known time frame.