Stokman, Harro. The Future of Smart Photography. Computing Now. IEEE Computer Society. pp. 66-70, July-September 2014
"Smart Photography", as described in the article cited above, is an enormously bad idea. The technology attempts to categorize images at the moment they are taken, and to prevent certain types of images from being taken in the first place. In this post, I point out the reasons why no analogous technology exists for text production. I then argue that "Smart Photography" constrains people's ability to record important moments and, critically, could hinder a witness's ability to collect evidence during a crime. I close with an example of an image with which I exercise my freedom of expression.
I am writing a blog post. Let's think for a moment about what is not happening. Specifically, let's reflect on the fact that www.blogger.com is not immediately attempting to put this post into a particular topic category as I write.
One reason why it is not attempting this is that, ultimately, automatic prediction of the topic of my writing is not particularly useful to me. Automatic topic detection could misinterpret my topic, or it could completely miss the fact that I was writing on a new topic that had never been written about before. My post would get misfiled, and potentially lost.
Such a text classification technology would also need to assume that topic was indeed the appropriate type of category into which I wished to sort this blogpost. There would be no room to invent new types of categories, related to e.g., style, place, sentiment or mood.
From this example, we observe: Blogging is a couple decades old, arguably older. After all of this time, www.blogger.com keeps the responsibility of categorizing blogposts firmly with the writer. I will need to click the "Labels" icon and add the tags myself.
Here is another example: The Mac computers I have owned over the years have grown steadily more sophisticated. However, there is no functional "autosuggest" feature that can predict in which folder I will want to store a document or presentation while I am creating it. Because of the nature of human creativity, it just doesn't make sense to do this if the computer is going to be useful for tasks that have not yet been imagined.
Finally: YouTube auto suggests categories (presumably using my metadata), with the effect that all my videos are "Science & Technology". That helps me to find, well, exactly nothing in my uploads. Everything is labeled the same. There, again, it's clear that creation must also involve classification effort, if the end effect is to be organization. We don't create towards a pre-defined set of concepts or topics. Instead, when we create content, we also create concepts.
These three examples illustrate that the idea of classifying text at the moment of creation has not "caught on". And it's not because we do not yet know how to train text classifiers. Text classification technology has long been considered far ahead of computer vision. We need to acknowledge that other forces are at play.
Yet, in the face of a lack of general applications that classify text at the moment of creation, computer vision researchers are now attempting to build "Smart Photography" applications that would classify images at the moment they are taken by cameras. This contradiction should make us really sit up and think hard about the implications of "Smart Photography".
"Smart Photography" is first fascinating, and then horrifying.
It's fascinating because it's fun. If my camera decided at the moment I clicked the shutter that I was taking a picture of food, I probably would take more pictures of food. I would do it because it would be cool to see if the camera "understands" food. (Yes, I'm a multimedia geek). Also: because food as a separate category is actually built into my camera, I would feel less embarrassed about taking out my device and snapping a picture before eating.
However, this kind of behavior effectively amounts to the camera teaching me what I should be taking photos of. It will subtly channel human photographic impulses down the broad and easy road, and allow less traveled paths of expression to slowly grow over.
That's sad, but not yet horrifying. Horrifying is the following:
The "Smart Photography" camera is going to prevent its user from taking certain photos entirely.
The "Smart Photography" article makes it clear that the computer vision experts have their hearts in the right place when they envision a camera whose shutter freezes when part of a hand appears in the frame.
However: Imagine your baby's first steps. You miss the shot because you just couldn't get your finger out of the frame in time. You would much rather have a "baby plus finger" photo and be able to treasure the moment, than have your camera freeze up on you because it "saw" a finger in front of the lens and locked the shutter.
It goes on: You miss the shot of your kid scoring that amazing soccer goal, or of that rare bird that you saw on your walk in the woods. You miss the shot of the damage that was done to your car in an accident because you just couldn't hold the camera perfectly correctly. A bad shot would have been better than none.
And it gets worse. The camera aspires to block adult content, making it impossible to take a picture of any scene that it classifies as pornographic. It sounds like a miracle for law enforcement the first time you hear it. But the price is too high: Basically, if the camera blocks adult content, it means that if I witness a rape, I have no way of taking a picture of it. The possibility of identifying the perpetrator is blocked by the camera itself.
Effectively, the innovation of "Smart Photography" is making possible a camera that does not work.
Let's return to the comparison with the text case. What if www.blogger.com prevented me from writing this column as soon as it sensed that its topic included rape? Our technology does not prevent the generation of text, and we need to remain consistent with the values that lead us to the conclusion that it should not. It is a bad idea to introduce technologies that prevent the generation of images.
One important reason why our technology does not censor text on creation is that it is not the people who design technology who get to decide what I can and cannot express. Rather, it is the legal system, which is in turn based on the values of the community at large. This system, imperfect and slow-moving as it might be, represents individual citizens equally, and can be influenced by them, in a way that a technology cannot be influenced equally by everyone.
The "Smart Photography" article argues that photocopiers prevent people from copying money, i.e., paper bills, and that "Smart Photography" represents a next step. The reason why money works at all is that there is a system and a society working to make sure that its purpose is unambiguously interpreted. Money is a conventionalized sign at the basis of society; it exists at all exactly because it is not open to interpretation. Plainly stated, the argument that technology blocking photographic capture of adult content is a natural extension of blocking photocopies of money relies on an unsound analogy, and must be discounted for this reason.
What's the alternative to "Smart Photography"?
The solution is not making photos smarter, but rather it's changing people. It's relentlessly pursuing our efforts to support each other in our communities, and to help each other make better decisions. It's about the unending quest, that begins again with each new generation: to make people smarter.
We need to ensure adequate funding for the people who dedicate their careers to fighting crime. Finding perpetrators of sexual abuse and sex crimes is simply a hard task that requires a huge investment: sick minds are sick, and they will not let a new camera technology stand in the way of their evil business. With this "Smart Photography" camera, sex offenders will be incentivized to start taking pictures that are not so easy to identify automatically, and they may be able to wipe out their own footprints. There are no easy technical shortcuts that will eliminate the need for old-fashioned crime fighting, yes, also of the gumshoe variety.
We need to educate people. Bear selfies are stupid. Getting people to stop taking bear selfies is not a matter of creating a camera that recognizes a bear selfie situation and blocks the shutter when someone tries to take one. The bear selfie is a symptom of an underlying lack of reflection. It is the underlying problem, and not its superficial manifestation, that needs to be addressed. The answer is about taking the time to really talk to our children, and to each other, about what is appropriate and what is not appropriate in a given situation.
Below is the most repugnant photo that I have ever posted online, but today for the first time in my life, I did not take for granted that my camera includes a functionality that allows me to take it.