NeuralNet Storyteller: Letting AI Write Stories Based on Photographs

I have been into photography for quite some time now, so it was inevitable that I would want to try something out with my photographs. Here is another fun little project I did to explore the combination of art and machine learning. I had developed a habit of publishing a photograph only if I could relate it to a story that strongly resonated with it. That introspection led me to investigate whether I could let AI write stories by looking at photographs, and that is when I stumbled upon the NeuralNet Storyteller developed by Ryan Kiros et al. from the University of Toronto.

What does it do?

Given a photograph, it writes a story in the artistic style of a romance novel, without human intervention. Think of it as writing a passage for a romance novel from a visually descriptive image. Before heading into the technical summary, let’s look at the results of NeuralNet Storyteller on some of my photographs:

OUTPUT: My descent to the moon , blinking in the evening sky . It was as if I had spent the last thousand years trying to find a way out of it , but I did n’t know what else to do . Instead , I felt a sinking feeling in the pit of my stomach . The reality of the situation was so much more than that Elizabeth had caught up with him , and she simply held his arm out for her . The sun ‘s rays rose like stars on the horizon , enveloping us . I wanted to stay alive .

 

OUTPUT: Images wing of the plane was spectacular , and I wondered what it was like to fly out of the sky . For the first time in millennia , I could n't find the truth in the news . In fact , it had been more than ever since my plane landed on Earth . It was as if he breathed in and out of the air , looking up at the sky above us . The plane had begun to take its toll , but that 's the only way possible . I wanted to rescue her and hold my breath .


OUTPUT: I bird barely touched the water as it was in front of me . By the time I reached the source of the fog , I could hear the rise and fall of his chest , leaving her gasping for air . I wanted to talk to her , but I had no intention of letting her go . In fact , it was the most beautiful thing I 'd ever seen . The sun rose above the horizon as a bird swam in and out , leaving me gasping for air . It seems like the right person to ask who he is , that I had fallen asleep in the water .


And How?

The NeuralNet Storyteller takes an image, recognizes the objects in it to produce a caption, and then transforms that caption into a short romantic story using a technique called style shifting.
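At a high level, the pipeline is just two stages composed. The sketch below is a toy stand-in (all function names and return strings are mine, not the project's); in the real system each stage is a trained neural network:

```python
def recognize_and_caption(image_path):
    """Stand-in for the visual-semantic embedding's caption retrieval."""
    # A real system would embed the image and find the nearest caption.
    return "a bird floating on a lake"

def shift_style(caption):
    """Stand-in for the skip-thought style shift (caption -> romance style)."""
    # A real system would shift the caption's skip-thought vector and decode it.
    return "The bird barely touched the water, and I wanted to stay."

def tell_story(image_path):
    # The whole pipeline is these two stages composed.
    return shift_style(recognize_and_caption(image_path))

print(tell_story("photo.jpg"))
```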

The only component trained in a supervised manner is the captioning part, which uses the Microsoft COCO dataset. An RNN encoder is trained to convert passages from romance novels into skip-thought vector representations, and a decoder is then conditioned on these vectors to generate the passages back. The artistic style of the romance novel comes from the training data: romance novels from the BookCorpus dataset. To embed new images and retrieve captions, a visual-semantic embedding is trained between COCO images and their captions, mapping both into a common vector space.
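As a rough illustration of what retrieval in that common vector space looks like, here is a minimal nearest-neighbor sketch. The 4-d vectors are invented purely for the example; in the real model they would come from the trained image and sentence encoders:

```python
# Toy caption retrieval in a joint visual-semantic embedding space.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Hypothetical embeddings of one photograph and three candidate captions,
# all assumed to live in the same learned vector space.
image_vec = normalize(np.array([0.9, 0.1, 0.0, 0.4]))
captions = {
    "a bird gliding over water": normalize(np.array([0.8, 0.2, 0.1, 0.5])),
    "a plane wing above clouds": normalize(np.array([0.1, 0.9, 0.3, 0.0])),
    "a moon in the evening sky": normalize(np.array([0.0, 0.3, 0.9, 0.1])),
}

# Retrieve the caption whose embedding is closest (cosine) to the image's.
best = max(captions, key=lambda c: float(captions[c] @ image_vec))
print(best)  # the bird caption scores highest for this toy image vector
```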
The skip-thought vectors themselves are obtained from an unsupervised approach to training a generic, distributed sentence encoder: sentences that share semantic and syntactic properties are mapped to similar vector representations. These vectors make it possible to construct, in a simple way, a style-shifting function that bridges the gap between retrieved image captions and passages in novels.

  1. An RNN is trained on romance novels to encode passages into skip-thought vectors and decode them back
  2. Simultaneously, a visual-semantic embedding is trained to retrieve captions for given photographs
  3. Input: a photograph is given as input
  4. Obtain a caption for the photograph: the visual-semantic embedding predicts the most suitable caption for the image
  5. Style shifting: the caption is translated into a romantic-story passage by keeping the ‘thought’ of the caption and replacing its descriptive caption style with the romance-novel style
  6. Output: TA DA! A romantic-story-style passage for the photograph
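The style-shifting step (5) is described in the project's code as a simple vector operation, F(x) = x - c + b, where x is the skip-thought vector of the retrieved caption, c is the mean "caption style" vector, and b is the mean "book style" (romance-passage) vector. Here is a toy sketch with made-up 3-d vectors standing in for real skip-thought vectors:

```python
# Toy sketch of the style shift F(x) = x - c + b.
import numpy as np

def style_shift(x, c, b):
    """Keep the 'thought' of x, swap caption style for book style."""
    return x - c + b

caption_vec = np.array([1.0, 2.0, 3.0])    # skip-thought of the caption
caption_style = np.array([0.5, 0.5, 0.5])  # mean caption-style vector
book_style = np.array([0.2, 0.8, 0.1])     # mean romance-style vector

shifted = style_shift(caption_vec, caption_style, book_style)
print(shifted)  # [0.7 2.3 2.6]; this vector is then decoded into a passage
```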

I hope you are as impressed with the results as I am. If you have ideas for fun AI projects or are looking to collaborate, give me a shout-out here or simply comment below. Also, check out my photography work on Facebook and Instagram.

Reference:

Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. “Skip-Thought Vectors.” arXiv preprint arXiv:1506.06726 (2015).
Code: https://github.com/ryankiros/neural-storyteller