I have been into photography for quite some time now, so it was inevitable that my little brain would want to try something out with the photographs. Here is another fun little project I did to explore the combination of art and machine learning. I had developed a habit of publishing my photographs only if I could relate them to a story that strongly resonated with the photograph. This introspection led me to investigate whether I could let AI write stories by looking at the photographs. That is when I bumped into the ULTIMATE NeuralNet Storyteller developed by Ryan Kiros et al. from the University of Toronto.
What does it do?
Given a photograph, it writes a story in the artistic style of a romance novel without human intervention. Think of it as writing a passage for a romance novel based on a visually descriptive image. Before heading into the technical summary, let’s look at the results of the NeuralNet Storyteller on some of the photographs:
The NeuralNet Storyteller takes an image, recognizes the objects in it, produces a caption based on those objects, and then transforms the caption into a short romantic story using what is called Style Shifting.
The only part trained in a supervised manner is caption generation, which uses the Microsoft COCO dataset. An RNN is trained on romance novels to build an encoder that converts passages from the novels into skip-thought vector representations, and a decoder that is conditioned on these skip-thought vectors to regenerate the passages that produced them. To obtain the artistic style of a romance novel, this dataset (romance novels from BookCorpus) has been used. In order to embed new images and retrieve captions, a visual-semantic embedding is trained between COCO images and their captions, mapping both into a common vector space.
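To make the "common vector space" idea concrete, here is a minimal numpy sketch of the kind of pairwise ranking loss typically used to train such a visual-semantic embedding. The function name, margin value, and toy batch are my own illustration, not code from the project:

```python
import numpy as np

def l2_normalize(v, axis=-1):
    # Project embeddings onto the unit sphere so cosine similarity
    # reduces to a plain dot product.
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def ranking_loss(image_emb, caption_emb, margin=0.2):
    """Pairwise ranking loss over a batch of (image, caption) pairs.

    Matching pairs sit on the diagonal of the similarity matrix; every
    off-diagonal entry is a mismatched pair that should score at least
    `margin` lower than the true pair.
    """
    im = l2_normalize(image_emb)
    cap = l2_normalize(caption_emb)
    sim = im @ cap.T                  # cosine similarities, shape (B, B)
    pos = np.diag(sim)                # similarity of each true pair
    # Hinge in both directions: rank captions for an image,
    # and rank images for a caption.
    cost_im = np.maximum(0.0, margin + sim - pos[:, None])
    cost_cap = np.maximum(0.0, margin + sim - pos[None, :])
    np.fill_diagonal(cost_im, 0.0)    # true pairs incur no cost
    np.fill_diagonal(cost_cap, 0.0)
    return cost_im.sum() + cost_cap.sum()
```

When every image embedding lines up with its own caption and away from the rest, the loss is zero; a caption that sits closer to the wrong image is penalized in both ranking directions.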
The skip-thought vectors are obtained from an unsupervised approach that trains a generic, distributed sentence encoder. Sentences that share semantic and syntactic properties are mapped to similar vector representations. These vectors make it possible to construct a style-shifting function in a simple way, bridging the gap between retrieved image captions and passages in novels.
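In skip-thought space, the style-shifting function can be as simple as vector arithmetic: subtract the mean skip-thought vector of the caption corpus ("caption style") and add the mean vector of the romance-novel passages ("book style"), keeping the content of the caption intact. A minimal sketch, with hypothetical argument names:

```python
import numpy as np

def style_shift(caption_vec, caption_style_mean, book_style_mean):
    """Shift a skip-thought vector from caption style to novel style.

    The 'thought' of the caption is preserved; only the style component
    is swapped by subtracting the mean caption vector and adding the
    mean book-passage vector (each mean taken over skip-thought
    encodings of the respective corpus).
    """
    return caption_vec - caption_style_mean + book_style_mean
```

The shifted vector is then fed to the decoder trained on romance novels, which generates a passage in that style.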
- An RNN is trained on romance novels to encode passages into skip-thought vectors and decode them back
- Simultaneously, a visual semantic embedding is trained to obtain captions for given photographs
- Input: A photograph is given as input
- Obtain Caption for Photograph: The Visual Semantic embedding predicts the most suitable caption for the image.
- Style Shifting: The caption is then translated into a romantic story type of passage by keeping the ‘thought’ of the caption and replacing the caption (descriptive) style with romantic story style.
- Output: TA DA! A romantic-story-style caption for the photograph.
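As a tiny illustration of the caption-retrieval step above, finding the caption for a new image is just a nearest-neighbour search by cosine similarity in the joint embedding space. The helper name and toy vectors below are mine, not code from the project:

```python
import numpy as np

def retrieve_caption(image_vec, caption_vecs, captions):
    """Nearest-neighbour caption lookup in the joint visual-semantic space.

    image_vec:    embedding of the new photograph, shape (d,)
    caption_vecs: embeddings of candidate captions, shape (n, d)
    captions:     the n caption strings, aligned with caption_vecs
    """
    sims = (caption_vecs @ image_vec) / (
        np.linalg.norm(caption_vecs, axis=1) * np.linalg.norm(image_vec))
    return captions[int(np.argmax(sims))]

# Toy usage: two candidate captions in a 2-D embedding space.
captions = ["a dog on a beach", "a city at night"]
caption_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
best = retrieve_caption(np.array([0.9, 0.1]), caption_vecs, captions)
```

The retrieved caption is then style-shifted and decoded into the final romantic passage.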
I hope you are as impressed with the results as I am. If you have ideas for fun AI projects or are looking to collaborate, give me a shout here or simply comment below. Also, check out my photography work on Facebook and Instagram.
Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. “Skip-Thought Vectors.” arXiv preprint arXiv:1506.06726 (2015).