Wednesday 19 July 2017

Deep learning 08--Neural style by Keras


    Today I want to write down how to implement the neural style algorithm from the paper A Neural Algorithm of Artistic Style with Keras, as I learned it from the fast.ai course. You can find the code on github.

    Before I begin to explain how to do it, I want to mention that generating artistic style with a deep neural network is different from image classification; we need to learn new concepts and add them to our toolbox. If you find it hard to understand the first time you see it, do not fear, I had the same feeling too. You can ask me questions or go to the fast.ai forum.

    The paper presents an algorithm to generate an artistic-style image by combining two images with a convolutional neural network. Here are examples that combine source images (bird, dog, building) with style images like starry, alice and tes_teach. From left to right: style image, source image, and the image combined by the convolutional neural network.

    Let us begin our journey through the implementation of the algorithm (I assume you know how to install Keras, tensorflow, numpy, cuda and the other tools; I recommend using ubuntu 16.04.x as your OS, which could save you tons of headaches when setting up your deep learning toolbox).

Step 1 : Import files and modules

 
from PIL import Image

import os
import numpy as np

import keras.backend as K
import vgg16_avg

from keras.models import Model
from keras.layers import *
from keras import metrics

from scipy.optimize import fmin_l_bfgs_b
from scipy.misc import imsave



Step 2 : Preprocess our input image

 
#the values of rn_mean come from the ImageNet dataset (the mean of each rgb channel)
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)

#subtract the mean so the image is roughly zero-mean, and convert the rgb
#channels to bgr since the vgg model needs bgr channels. ::-1 reverses the
#order of the last axis (the channel axis)
preproc = lambda x: (x - rn_mean)[:,:,:,::-1]
#we need to undo the preprocessing before saving the image to disk
deproc = lambda x: x[:,:,:,::-1] + rn_mean
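
    As a quick sanity check (my own toy example, not part of the original code), deproc should exactly undo preproc, so a round trip over the two lambdas defined above recovers the original pixel values:

dummy = np.random.randint(0, 256, (1, 4, 4, 3)).astype(np.float32)
print(np.allclose(deproc(preproc(dummy)), dummy))  #expected: True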

Step 3 : Read the source image and style image

    The source image is the image you want to apply the style to. The style image provides the style you want to apply to the source image.


dpath= os.getcwd() + "/"

#I make the content image, style image and generated image the same shape,
#but this is not mandatory since we do not use any fully connected layers
def read_img(im_name, shp):
    style_img = Image.open(im_name)
    if len(shp) > 0:
        style_img = style_img.resize((shp[2], shp[1]))
    style_arr = np.array(style_img)    
    #The image read by PIL has three dimensions, but the model
    #needs a four-dimensional tensor (the first dim is the batch size)
    style_arr = np.expand_dims(style_arr, 0)
    
    return preproc(style_arr)

content_img_name = "dog"
content_img_arr = read_img(dpath + "img/{}.png".format(content_img_name), [])
content_shp = content_img_arr.shape
style_img_arr = read_img(dpath + "img/starry.png", content_shp)

Step 4 : Load vgg16_avg

    Unlike doing image classification with the pure sequential API of Keras, to build a neural style network we need to use the Keras backend API.


content_base = K.variable(content_img_arr)
style_base = K.variable(style_img_arr)
gen_img = K.placeholder(content_shp)
batch = K.concatenate([content_base, style_base, gen_img], 0)

#Feed the batch into the vgg model; every time we call the model/layers to
#generate output, they will produce outputs for content_base, style_base and
#gen_img. Unlike content_base and style_base, gen_img is a placeholder,
#which means we will need to provide data for it later on
model = vgg16_avg.VGG16_Avg(input_tensor = batch, include_top=False)

#build a dict mapping layer names to layer outputs
outputs = {l.name:l.output for l in model.layers}
#I prefer the conv1 outputs of blocks 1~3 as my style_layers,
#you can try it out with a different range
style_layers = [outputs['block{}_conv1'.format(i)] for i in range(1,4)]
content_layer = outputs['block4_conv2']

    If you find K.variable and K.placeholder very confusing, please check the documentation of TensorFlow and the Keras backend API; a minimal toy example of the difference follows below.
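
    Here is a small sketch (my own example, not from the course notebook): a variable already holds data when it is created, while a placeholder is an empty slot that only receives data when the compiled function is called.

import numpy as np
import keras.backend as K

v = K.variable(np.array([1.0, 2.0, 3.0]))  #holds data as soon as it is created
p = K.placeholder(shape=(3,))              #an empty slot; data is fed in later

f = K.function([p], [v + p])               #compile a small graph into a callable
print(f([np.array([10.0, 20.0, 30.0])]))   #the placeholder gets its value here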

Step 5 : Create function to find loss and gradient


#the gram matrix collects the correlations (dot products) between all of the
#vectors in a set. Check wiki (https://en.wikipedia.org/wiki/Gramian_matrix)
#for more details
def gram_matrix(x):
    #move the channel axis first: (height, width, depth) -> (depth, height, width);
    #(2,1,0) would work too, but (2,0,1) may be more efficient due to the
    #underlying memory layout
    features = K.permute_dimensions(x, (2,0,1))
    #batch_flatten turns features into a 2D array (one row per channel)
    features = K.batch_flatten(features)
    return K.dot(features, K.transpose(features)) / x.get_shape().num_elements()

def style_loss(x, targ):
    return metrics.mse(gram_matrix(x), gram_matrix(targ))
    
content_loss = lambda base, gen: metrics.mse(gen, base)    

#l[1] is the output (activation) of style_base and l[2] is the output
#of gen_img, so this is the loss between the style image and gen_img.
#As the paper suggests, we sum the style loss over the chosen convolution layers
loss = sum([style_loss(l[1], l[2]) for l in style_layers])

#content_layer[0] is the output of content_base,
#content_layer[2] is the output of gen_img,
#so this is the loss between the content image and gen_img;
#dividing by 10 weights the content loss relative to the style loss
loss += content_loss(content_layer[0], content_layer[2]) / 10.

#the loss depends on several tensors, but we only take the gradient with
#respect to gen_img, because it is the only placeholder in the graph;
#the other inputs were already fixed by K.variable
grad = K.gradients(loss, gen_img)
#we cannot evaluate loss and grad directly, we need to
#compile them into a callable function (a symbolic definition)
#before we can feed them to the solver
fn = K.function([gen_img], [loss] + grad)

    You can adjust the weights of the style loss and the content loss yourself until you think the image looks good enough (a sketch of one way to expose those weights follows below). The function at the end only tells you that the concatenated list of loss and grads is the output that you want to - eventually - minimize. So, when you feed it to the bfgs solver, it will try to minimize the loss and will stop when the gradients are close to zero (a minimum, hopefully not just a poor local one).
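
    For example, one way (my own variation on the code above, with arbitrary weight values) to make that balance explicit is to pull the two weights out as named constants and rebuild the loss:

#assumed values: raise style_weight for a more "painted" look,
#raise content_weight to keep more of the source image structure
style_weight = 1.0
content_weight = 0.1

loss = style_weight * sum(style_loss(l[1], l[2]) for l in style_layers)
loss += content_weight * content_loss(content_layer[0], content_layer[2])
grad = K.gradients(loss, gen_img)
fn = K.function([gen_img], [loss] + grad)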

Step 6 : Create a helper class to separate loss and gradient


#fn returns loss and grad together, but fmin_l_bfgs_b needs them as separate
#callables; that is why we need a class to separate loss and gradient and store them
class Evaluator:
    def __init__(self, fn_, shp_):
        self.fn = fn_
        self.shp = shp_
        
    def loss(self, x):
        loss_, grads_ = self.fn([x.reshape(self.shp)])
        self.grads = grads_.flatten().astype(np.float64)
        
        return loss_.astype(np.float64)
    
    def grad(self, x):
        return np.copy(self.grads)
    
evaluator = Evaluator(fn, content_shp)
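
    To see why loss and gradient have to be split, here is a tiny standalone example (my own, unrelated to the style network) of the interface fmin_l_bfgs_b expects: a scalar loss callable plus a separate fprime callable, both working on flat float64 arrays.

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

quad_loss = lambda v: float(np.sum((v - 3.0) ** 2))         #scalar loss
quad_grad = lambda v: (2.0 * (v - 3.0)).astype(np.float64)  #flat gradient

x_opt, f_opt, info = fmin_l_bfgs_b(quad_loss, np.zeros(4), fprime=quad_grad)
print(x_opt)  #converges to values close to 3.0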

Step 7 : Generate a random noise image (the white noise image mentioned in the paper)


#This generates the actual values that will be fed into the placeholder gen_img
rand_img = lambda shape: np.random.uniform(-2.5, 2.5, shape)/100


Step 8 : Minimize the loss between rand_img, the source image and the style image


def solve_img(evalu, niter, x):
    for i in range(0, niter):
        x, min_val, info = fmin_l_bfgs_b(evalu.loss, x.flatten(), 
                                         fprime=evalu.grad, maxfun = 20)
        #after subtracting the mean, pixel values lie roughly within -127 and 127
        x = np.clip(x, -127, 127)
        print(i, ',Current loss value:', min_val)
        x = x.reshape(content_shp)
        simg = deproc(x.copy())
        img_name = '{}_{}_neural_style_img_{}.png'.format(
            dpath + "gen_img/", content_img_name, i)
        imsave(img_name, simg[0])
    return x

solve_img(evaluator, 10, rand_img(content_shp)/10.)

    You may ask, why use fmin_l_bfgs_b instead of stochastic gradient descent? The answer is that we could, but we have a better choice. Unlike image classification, we do not have a lot of batches to run; right now we only need to compute the loss and gradient for three inputs, the source image, the style image and the random image, so using fmin_l_bfgs_b is more than enough. For comparison, a plain gradient descent loop is sketched below.
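
    If you are curious, a rough sketch (my own variant, with an arbitrary learning rate, not part of the original code) of what a plain gradient descent loop over the same fn could look like:

def solve_img_gd(fn_, x, niter=100, lr=0.1):
    #x has shape content_shp, matching the gen_img placeholder
    for i in range(niter):
        loss_, grads_ = fn_([x])
        x = x - lr * grads_            #plain gradient step, no momentum
        x = np.clip(x, -127, 127)
        if i % 10 == 0:
            print(i, ',Current loss value:', np.sum(loss_))
    return x

#x = solve_img_gd(fn, rand_img(content_shp)/10.)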