15.06.2019 - Anılcan Atik

DeepDream Project

DeepDream is an experiment that visualizes the patterns learned by a neural network. Similar to when a child watches clouds and tries to interpret random shapes, DeepDream over-interprets and enhances the patterns it sees in an image.

This individual project for Advanced Machine Learning is based on the following resource.

The resource contains a minimal implementation of DeepDream, as described in the blog post by Alexander Mordvintsev, and most of the code in this project is taken from it.

DeepDream works by forwarding an image through the network, then calculating the gradient of the activations of a particular layer with respect to the image. The image is then modified to increase these activations, enhancing the patterns seen by the network and resulting in a dream-like image. This process was dubbed "Inceptionism" (a reference to InceptionNet and the movie Inception).
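
The idea can be sketched on a toy problem before bringing in a real network. In the NumPy example below, the matrix `W`, the flattened `img`, and the helper names are hypothetical stand-ins of my own and not part of InceptionV3; the "layer" is a single ReLU unit bank whose mean activation is pushed up by gradient ascent.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))   # hypothetical weights of a toy "layer"
img = 0.01 * W[0]              # hypothetical flattened "image"; unit 0 starts active

def activation_mean(x):
    # Mean ReLU activation of the toy layer for input x.
    return np.maximum(W @ x, 0.0).mean()

def activation_grad(x):
    # Gradient of the mean ReLU activation with respect to the input x.
    active = (W @ x > 0).astype(x.dtype)
    return W.T @ active / W.shape[0]

before = activation_mean(img)
for _ in range(50):
    # Gradient ascent: move the input in the direction that raises the activation.
    img = img + 0.05 * activation_grad(img)
after = activation_mean(img)
print(before, after)
```

The same loop, with the toy layer replaced by a slice of a pre-trained network and the gradient supplied by automatic differentiation, is what the rest of this notebook builds.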

I will try to apply this technique to my own image.


In [43]:
import tensorflow as tf
import numpy as np

import matplotlib as mpl

import IPython.display as display
import PIL.Image

from tensorflow.keras.preprocessing import image

Choosing an image to dream-ify

For this tutorial, I'll be using an image of a minimal red planet with rings.

In [2]:
url = 'https://i.imgur.com/xMSN3oj.jpg'
In [51]:
# Download an image and read it into a NumPy array.
def download(url, max_dim=None):
  name = url.split('/')[-1]
  image_path = tf.keras.utils.get_file(name, origin=url)
  img = PIL.Image.open(image_path)
  if max_dim:
    img.thumbnail((max_dim, max_dim))
  return np.array(img)

# Normalize an image
def deprocess(img):
  img = 255*(img + 1.0)/2.0
  return tf.cast(img, tf.uint8)

# Display an image
def show(img):
  display.display(PIL.Image.fromarray(np.array(img)))

# Downsizing the image makes it easier to work with.
original_img = download(url, max_dim=500)
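
The `deprocess` helper above undoes the scaling that InceptionV3 expects: the model takes pixel values in [-1, 1], while images are stored as `uint8` in [0, 255]. The following is a minimal NumPy sketch of that round trip; the names `preprocess_np` and `deprocess_np` are my own and only mirror the arithmetic, they are not the TensorFlow functions.

```python
import numpy as np

def preprocess_np(img_uint8):
    # Scale [0, 255] pixel values to the [-1, 1] range.
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def deprocess_np(img):
    # Inverse mapping, same formula as `deprocess`: [-1, 1] back to [0, 255].
    return (255 * (img + 1.0) / 2.0).astype(np.uint8)

pixels = np.array([0, 255], dtype=np.uint8)
scaled = preprocess_np(pixels)
restored = deprocess_np(scaled)
print(scaled, restored)
```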

The futuristic and minimalistic elements in this image caught my attention, and I thought it would be interesting to see how an ML algorithm interprets it.

In this part we download and prepare a pre-trained image classification model.

We will be using InceptionV3.
We will also refer to the model feature visualization appendix, which gives visual clues about the layers of a GoogLeNet model trained on ImageNet images.

In [16]:
base_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')

The idea in DeepDream is to choose a layer (or layers) and maximize the "loss" so that the image increasingly "excites" those layers.

Maximizing the "loss" instead of minimizing it, as is conventional, produces an image that deviates more and more from the original.
Each layer in the network responds to a specific kind of feature, so the choice of layer determines how the original photo is altered.
In InceptionV3, lower layers tend to produce strokes or simple patterns, while deeper layers produce more sophisticated features, or even whole objects.

The InceptionV3 architecture is quite large (for a graph of the model architecture see TensorFlow's research repo). For DeepDream, the layers of interest are those where the convolutions are concatenated. There are 11 of these layers in InceptionV3, named 'mixed0' through 'mixed10'. Using different layers will result in different dream-like images. Deeper layers respond to higher-level features (such as eyes and faces), while earlier layers respond to simpler features (such as edges, shapes, and textures).
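
As a small convenience, the 11 layer names described above can be generated instead of typed by hand. The variable names below are my own; the slice simply picks the two layers used later in this notebook.

```python
# Names of the concatenation layers, 'mixed0' through 'mixed10'.
mixed_names = ['mixed{}'.format(i) for i in range(11)]

# The two deeper layers chosen in this project.
chosen = mixed_names[8:10]
print(chosen)
```

Swapping the slice for an earlier one, such as `mixed_names[:2]`, would be one way to experiment with the simpler patterns that shallow layers tend to produce.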

In [17]:
# Maximize the activations of these layers
names = ['mixed8','mixed9']
layers = [base_model.get_layer(name).output for name in names]
# Create the feature extraction model
dream_model = tf.keras.Model(inputs=base_model.input, outputs=layers)
[<tf.Tensor 'mixed8_1/Identity:0' shape=(None, None, None, 1280) dtype=float32>, <tf.Tensor 'mixed9_2/Identity:0' shape=(None, None, None, 2048) dtype=float32>]

Calculating loss

The loss is the sum of the activations in the chosen layers.
The loss is normalized at each layer so the contribution from larger layers does not outweigh smaller layers.
Normally, loss is a quantity you wish to minimize via gradient descent, for example in classification models, where a lower loss means fewer errors. In DeepDream, you will maximize this loss via gradient ascent.
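
To see why the per-layer normalization matters, here is a minimal NumPy sketch. The arrays `small` and `large` are made-up activations of different sizes, and `toy_loss` is a hypothetical helper that takes the mean of each layer before summing.

```python
import numpy as np

def toy_loss(activations):
    # Mean per layer first, then sum across layers.
    return sum(float(np.mean(act)) for act in activations)

# Two made-up activation tensors with the same values but very different sizes.
small = np.full((1, 4, 4, 8), 2.0)
large = np.full((1, 16, 16, 64), 2.0)

normalized = toy_loss([small, large])
raw_small, raw_large = float(np.sum(small)), float(np.sum(large))
print(normalized, raw_small, raw_large)
```

With the mean, both layers contribute equally to the loss. With a raw sum, the larger layer would contribute 128 times more here, purely because it has more elements.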

In [18]:
def calc_loss(img, model):
  # Pass forward the image through the model to retrieve the activations.
  # Converts the image into a batch of size 1.
  img_batch = tf.expand_dims(img, axis=0)
  layer_activations = model(img_batch)
  if len(layer_activations) == 1:
    layer_activations = [layer_activations]

  losses = []
  for act in layer_activations:
    # Use the mean so that larger layers do not dominate the sum.
    loss = tf.math.reduce_mean(act)
    losses.append(loss)

  return tf.reduce_sum(losses)

Gradient ascent

Once we have calculated the loss for the chosen layers, we need to calculate the gradients with respect to the image and add them to the original image.

Introducing gradients to the image enhances the patterns seen by the network. At each step, you will have created an image that increasingly excites the activations of certain layers in the network.

The method that does this, below, is wrapped in a tf.function for performance. It uses an input_signature to ensure that the function is not retraced for different image sizes or steps/step_size values. See the Concrete functions guide for details.
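
The update rule itself is small enough to sketch in NumPy. The function `ascent_step` below is a hypothetical stand-in for one iteration of the loop; it takes the gradient as an input instead of computing it from a network.

```python
import numpy as np

def ascent_step(img, gradients, step_size):
    # Rescale so the update size does not depend on the raw gradient magnitude.
    gradients = gradients / (np.std(gradients) + 1e-8)
    # Gradient ascent: add the gradient instead of subtracting it.
    img = img + gradients * step_size
    # Keep pixel values in the [-1, 1] range the model expects.
    return np.clip(img, -1.0, 1.0)

rng = np.random.default_rng(0)
img = rng.uniform(-1, 1, size=(4, 4, 3))
grad = rng.normal(size=(4, 4, 3))

a = ascent_step(img, grad, 0.01)
b = ascent_step(img, grad * 1000.0, 0.01)
print(np.abs(a - b).max())
```

Because the gradient is divided by its standard deviation, scaling it by 1000 leaves the result practically unchanged, so `step_size` alone controls how far the image moves per step.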

In [19]:
class DeepDream(tf.Module):
  def __init__(self, model):
    self.model = model

  @tf.function(
      input_signature=(
        tf.TensorSpec(shape=[None,None,3], dtype=tf.float32),
        tf.TensorSpec(shape=[], dtype=tf.int32),
        tf.TensorSpec(shape=[], dtype=tf.float32),)
  )
  def __call__(self, img, steps, step_size):
      loss = tf.constant(0.0)
      for n in tf.range(steps):
        with tf.GradientTape() as tape:
          # This needs gradients relative to `img`
          # `GradientTape` only watches `tf.Variable`s by default
          tape.watch(img)
          loss = calc_loss(img, self.model)

        # Calculate the gradient of the loss with respect to the pixels of the input image.
        gradients = tape.gradient(loss, img)

        # Normalize the gradients.
        gradients /= tf.math.reduce_std(gradients) + 1e-8 
        # In gradient ascent, the "loss" is maximized so that the input image increasingly "excites" the layers.
        # You can update the image by directly adding the gradients (because they're the same shape!)
        img = img + gradients*step_size
        img = tf.clip_by_value(img, -1, 1)

      return loss, img
In [8]:
deepdream = DeepDream(dream_model)

Main Loop

In [20]:
def run_deep_dream_simple(img, steps=100, step_size=0.01):
  # Convert from uint8 to the range expected by the model.
  img = tf.keras.applications.inception_v3.preprocess_input(img)
  img = tf.convert_to_tensor(img)
  step_size = tf.convert_to_tensor(step_size)
  steps_remaining = steps
  step = 0
  while steps_remaining:
    if steps_remaining>100:
      run_steps = tf.constant(100)
    else:
      run_steps = tf.constant(steps_remaining)
    steps_remaining -= run_steps
    step += run_steps

    loss, img = deepdream(img, run_steps, tf.constant(step_size))
    print ("Step {}, loss {}".format(step, loss))

  result = deprocess(img)

  return result
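
The loop above runs the dream in chunks of at most 100 steps, so progress can be printed between chunks. The chunking logic on its own can be sketched in plain Python; `chunk_steps` is a hypothetical helper and is not used by the notebook code.

```python
def chunk_steps(total_steps, max_chunk=100):
    # Yield run lengths that sum to total_steps, each at most max_chunk.
    remaining = total_steps
    while remaining > 0:
        run = min(remaining, max_chunk)
        yield run
        remaining -= run

chunks = list(chunk_steps(250))
print(chunks)
```

For 250 steps this yields runs of 100, 100, and 50, which matches what the `if`/`else` branch in the main loop does.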



I am going to use the outputs of layers mixed8 and mixed9 in my model.

In [63]:
dream_img = run_deep_dream_simple(img=original_img, 
                                  steps=100, step_size=0.01)

Scaling up with tiles

One thing to consider is that as the image increases in size, so will the time and memory necessary to perform the gradient calculation. The simple implementation above will not work on very large images.

To avoid this issue you can split the image into tiles and compute the gradient for each tile.
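
A minimal NumPy sketch of the tiling idea follows. `toy_gradient` is a made-up pointwise gradient standing in for the network, and `tiled_gradient` is a hypothetical helper; with a pointwise function the tiles agree exactly with the full computation, which a convolutional network would not guarantee at tile borders.

```python
import numpy as np

def toy_gradient(tile):
    # Hypothetical stand-in for the network gradient: d/dx of sum(x**2).
    return 2.0 * tile

def tiled_gradient(img, tile_size):
    # Compute the gradient one tile at a time and write it into a full-size array.
    grads = np.zeros_like(img)
    for y in range(0, img.shape[0], tile_size):
        for x in range(0, img.shape[1], tile_size):
            tile = img[y:y + tile_size, x:x + tile_size]
            grads[y:y + tile_size, x:x + tile_size] = toy_gradient(tile)
    return grads

img = np.arange(7 * 5 * 3, dtype=np.float64).reshape(7, 5, 3)
full = toy_gradient(img)
tiled = tiled_gradient(img, tile_size=3)
print(np.array_equal(full, tiled))
```

Only one tile needs to be processed at a time, so peak memory depends on `tile_size` and not on the size of the whole image.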

Applying random shifts to the image before each tiled computation prevents tile seams from appearing.

Start by implementing the random shift:

In [10]:
def random_roll(img, maxroll):
  # Randomly shift the image to avoid tiled boundaries.
  shift = tf.random.uniform(shape=[2], minval=-maxroll, maxval=maxroll, dtype=tf.int32)
  shift_down, shift_right = shift[0],shift[1] 
  img_rolled = tf.roll(tf.roll(img, shift_right, axis=1), shift_down, axis=0)
  return shift_down, shift_right, img_rolled
In [11]:
shift_down, shift_right, img_rolled = random_roll(np.array(original_img), 512)
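
After the gradients are computed on the rolled image, the shift has to be undone so the result lines up with the original pixels. The sketch below shows the roll and its inverse in NumPy; `random_roll_np` is a hypothetical NumPy counterpart of `random_roll`, written only for illustration.

```python
import numpy as np

def random_roll_np(img, maxroll, rng):
    # Draw a vertical and a horizontal shift in [-maxroll, maxroll).
    shifts = rng.integers(-maxroll, maxroll, size=2)
    shift_down, shift_right = int(shifts[0]), int(shifts[1])
    # Roll wraps pixels around the edges, so no content is lost.
    rolled = np.roll(img, shift=(shift_down, shift_right), axis=(0, 1))
    return shift_down, shift_right, rolled

rng = np.random.default_rng(0)
img = np.arange(6 * 4 * 3).reshape(6, 4, 3)
down, right, rolled = random_roll_np(img, maxroll=8, rng=rng)

# Rolling by the negated shifts restores the original layout.
restored = np.roll(rolled, shift=(-down, -right), axis=(0, 1))
print(np.array_equal(restored, img))
```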