Wednesday, November 30, 2011

Using Narrative to Support Image Search


A strong system metaphor helps to align the needs and expectations that a user brings to a multimedia search engine with the functionality and types of results that the search engine provides. My conviction on this point is so firm that I found myself dressed up as Alice from Alice's Adventures in Wonderland, competing as a finalist in the Yahoo! Image Challenge at the ACM Multimedia 2011 Grand Challenge.

In essence, the story in the book runs as follows: after a long fall, Alice enters another world through a door. There she encounters the fantastic and the unexpected, but her views are basically determined by two perspectives: one that she has when she grows to be very big and one that she has when she shrinks to be very small. The book plays with language and with logic, and for this reason it has a strong intellectual appeal for adults as well as holding the fascination of children.

We built a system based on this narrative, which offers users (in response to an arbitrary Flickr query) sets of unexpected yet fascinating images, created either from a "big" perspective or from a "small" perspective. The "Alice" metaphor tells the user to: (1) expect the "big" and "small" perspectives; (2) expect a system that can be understood at two levels, both engaging childlike curiosity and meriting serious intellectual attention due to the way in which it uses language and statistics; and (3) expect a system that will need a little bit of patience, since the images appear a bit slowly (we're processing a flood of live Flickr results in the background), like the fading in of the Cheshire Cat.

The Grand Challenge requires participants to present their idea in a presentation of exactly three minutes that addresses the following points:
  • What is your solution? Which challenge does it address? How does it address the challenge?
  • Does your solution work? Is there evidence that it works?
  • Can you demo the system?
  • Is the solution generalizable to other problems? What are the limits of your approach?
  • Can other people reproduce your results? How?
  • Did the audience and the jury understand and ENJOY your presentation?
We used the three minutes to cover these points in a dialogue between Alice and Christoph Kofler (CK), first author of the Grand Challenge paper:

Kofler, C., Larson, M., Hanjalic, A. Alice's Worlds of Wonder: Exploiting Tags to Understand Images in terms of Size and Scale. ACM Multimedia 2011, Grand Challenge paper.

During the dialogue we demonstrated the system running live. We knew it was a risk to run a live demo, but luck was with us and the wireless network held up.

Alice's Worlds of Wonder: Three Minute Dialogue

(showing a rather standard opening slide)
CK: Alice, look at them out there, their image search experience is dry and boring.

Alice: We should show them our answer to the Yahoo! Image Challenge on Novel Image Understanding.

(showing system interface)
CK: The Wonderlands system runs on top of Flickr and sorts search results for the user at search time.

(dialogue during live demo)
Alice: Let’s show them how it works. Do we trust the wireless network?
CK: Yes. We need a Flickr query.
Alice: Let’s do “car”.
CK: The Wonderlands system presents the user with the choice to enter “Alice’s Small World” or “Alice’s Big World”.
Alice: Let’s choose Small World.

Alice (to audience): If you know me in "Alice in Wonderland", you know that in the story I shrink to become very small. This is the metaphor underlying the Small World of the Wonderlands system. It shrinks you, too, as a Flickr user, by putting you eye-to-eye with small objects pictured in small environments with limited range. You get the impression you have the perspective of a small being viewing the world from down low.

Still Alice: (to CK) Let’s choose Big World now. In the book, I also grow to be very big. The Big World makes you grow again. Objects are large and the perspective is broad.

You can imagine cases in which you were looking for person-sized cars. Here, the Big World would help you focus your search on the images that you really want.

CK: Should we explain how it works?

Alice: Yes.

CK: (Displays "Implicit Physics of Language" slide) We exploit a combination of user tags and the implicit physics of language.

Alice: Exactly.

Alice: Basically, your search engine knows something about the physics of the real world because it indexes large amounts of human language.

Certain queries give you the real-world size of objects: “the flower in her hand” returns a large number of results, so you can infer that a flower is small.

CK: Oh yes! And “the factory in her hand” returns no results so you know a factory is large.

Alice: Basically, the search engine is telling us that a girl holding a flower in her hand is a common situation, but that her holding a factory is not. We get this effect because physics dictates that something commonly held in a human hand must be small.

CK: (Displays the entry window with the two doors) The sorting algorithm is straightforward. Alice’s Small World contains images whose tags tend to designate smaller objects and Alice’s Big World contains images whose tags tend to designate larger objects.

Alice: Exactly.

CK: So Alice, the system takes a fanciful and engaging perspective. But in order to carry out quantitative evaluation we can look at it in terms of scale. We achieve a weighted precision nearly three times random chance.
(Flash up under the two doors: "Evaluation on 1,633 Flickr images from the MIRFLICKR data set. 0.773 weighted precision.")

Alice: So the scale numbers point to the conclusion that we are creating a genuine two-worlds experience for users.

CK: Right. But, Alice, do we need to stop at two worlds: big and small? Are there other worlds out there?

Alice: Well, Christoph, effectively the only limit is the speed at which we can query Flickr and Yahoo!. You know that the implicit physics of language works because of general physical principles. So, in theory, there are as many different worlds as there are interesting physical properties.

CK: But being Alice, you like the small and the big worlds, right?

Alice: Yes, I do. Shall we try another query?

CK: (Display final slide) Or we can just tell them where to download the system. You know, the code's online.

Alice: Yes, let them try it out! No more dry and boring image search for this group...(TIME UP!!)
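
For readers who would like to see the idea from the dialogue in more concrete terms, here is a minimal sketch in Python of how hit counts for a phrase like "the X in her hand" could be turned into a size signal and used to sort tagged images into the two worlds. Everything specific in it is an assumption made for illustration: the hit counts are invented, the names smallness and sort_into_worlds are hypothetical, and the log scaling, the averaging over tags, and the threshold are simple stand-ins. The scoring actually used in the system is described in the paper and may well differ.

```python
from math import log1p

# Invented hit counts for the query pattern "the <tag> in her hand".
# A real system would obtain these by querying a web search engine.
HAND_HITS = {
    "flower": 120000,
    "key": 45000,
    "car": 300,
    "factory": 0,
    "mountain": 0,
}


def smallness(tag, hits=HAND_HITS):
    """Return a score that grows with how often the tag is 'held in a hand'.

    Returns None when there is no evidence for the tag at all.
    """
    count = hits.get(tag)
    return None if count is None else log1p(count)


def sort_into_worlds(images, threshold=5.0):
    """Split (image_id, tags) pairs into a small world and a big world.

    An image goes to the small world when the mean smallness of its
    known tags reaches the (assumed) threshold; images without any
    known tag are left out of both worlds.
    """
    small_world, big_world = [], []
    for image_id, tags in images:
        scores = [s for s in (smallness(t) for t in tags) if s is not None]
        if not scores:
            continue
        mean_score = sum(scores) / len(scores)
        if mean_score >= threshold:
            small_world.append(image_id)
        else:
            big_world.append(image_id)
    return small_world, big_world


# Hypothetical tagged search results.
IMAGES = [
    ("img1", ["flower", "key"]),
    ("img2", ["factory", "mountain"]),
    ("img3", ["car", "factory", "sunset"]),
    ("img4", ["sunset"]),
]

small_world, big_world = sort_into_worlds(IMAGES)
```

With these made-up counts, the image tagged with "flower" and "key" lands in the small world, the two images dominated by "factory" land in the big world, and the image whose only tag is unknown is skipped. Taking the logarithm of the counts keeps a single very frequent phrase from swamping the other tags of an image.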