Tuesday 10 October 2017

Deep learning 11-A modern way to estimate the homography matrix (by a lightweight CNN)

  Today I want to introduce a modern way to estimate the relative homography between a pair of images, a solution introduced by the paper titled Deep Image Homography Estimation.


Q : What is a homography matrix?

A : A homography matrix is a 3x3 transformation matrix that maps the points in one image to the corresponding points in another image.
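For example, applying a homography to a point in numpy (the matrix values here are made up for illustration):

```python
import numpy as np

# a made-up homography: a small rotation plus a translation
H = np.array([[0.98, -0.17, 30.0],
              [0.17,  0.98, 12.0],
              [0.0,    0.0,  1.0]])

def apply_homography(H, x, y):
    # work in homogeneous coordinates, then divide by w
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

print(apply_homography(H, 100.0, 50.0))  # roughly (119.5, 78.0)
```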

Q : What are the uses of a homography matrix?

A : Many applications depend on the homography matrix; a few of them are image stitching, camera calibration and augmented reality.

Q : How to calculate a homography matrix between two images?

A : Traditional solutions are based on two steps: corner estimation and robust homography estimation. In the corner detection step, you need at least 4 point correspondences between the two images; usually we would find these points by matching features like AKAZE, SIFT or SURF. Generally, the features found by those algorithms are over complete, so we prune out the outliers (ex : by RANSAC) after corner estimation. If you are interested in the whole process described in c++, take a look at this project.

Q : Traditional solutions require heavy computation; do we have another way to obtain the homography matrix between two images?

A : This is the question the paper wants to answer. Instead of designing the features by hand, this paper designs an algorithm to learn the homography between two images. The biggest selling point of the paper is that they turn homography estimation into a machine learning problem.


  This paper uses a VGG style CNN to estimate the homography matrix between two images; they call it HomographyNet. The model is trained in an end to end fashion, quite simple and neat.

  HomographyNet comes in two versions, classification and regression. The regression network produces eight real valued numbers and uses an L2 loss as the final layer. The classification network uses a softmax as the final layer and quantizes every real value into 21 bins. The regression version has better accuracy; the average accuracy of the classification version is much worse, but it can produce confidences.


4-Point Homography Parameterization

  Instead of using the 3x3 homography matrix as the label (ground truth), this paper uses a 4-point parameterization as the label.

Q : What is 4-point parameterization?

A : The 4-point parameterization stores the differences of 4 corresponding points between the two images; Fig03 and Fig04 explain it well.


Q : Why do they use 4-point parameterization but not 3x3 matrix?

A : Because the 3x3 homography is very difficult to train; the problem is that the 3x3 matrix mixes rotation and translation together. The paper explains why:

The submatrix [H11, H12; H21, H22] represents the rotational terms in the homography, while the vector [H13, H23] is the translational offset. Balancing the rotational and translational terms as part of an optimization problem is difficult.


  Data Generation

Q : Training deep convolutional neural networks from scratch requires a large amount of data; where could we obtain the data?

A : The paper invents a smart solution to generate a nearly unlimited number of labeled training examples. Fig05 summarizes the whole process.



  Results of my implementation outperform the paper: my average loss is 2.58, while the paper's is 9.2. The largest loss of my model is 19.53. My model beats the paper by more than 3.5 times (9.2/2.58 = 3.57). What makes the performance improve so much? A few reasons I could think of are

1. I changed the network architecture from VGG like to SqueezeNet 1.1 like.
2. I did not apply any data augmentation; maybe blurring or occlusion makes the model harder to train.
3. The paper uses data augmentation to generate 500000 examples for training, but I use 500032 images from imagenet as my training set. I guess this potentially increases the variety of the data; the end result is that the network becomes easier to train and more robust (though it may not work well for blur or occlusion).

  Following are some of the results; the region estimated by the model (red rectangle) is very close to the real region (blue rectangle).


Final thoughts

  The results look great, but this paper does not answer two important questions.

1. The paper only tests on synthesized images; does the model work on real world images?
2. How should I use the trained model to predict a homography matrix?

  I would like to know the answers; if anyone finds out, please leave me a message.

Codes and model

  As usual, I place my codes at github and the model at mega.

  If you liked this article, please help others find it by clicking the little g+ icon below. Thanks a lot!

Sunday 3 September 2017

Wrong way to use QThread

  There are two ways to use QThread: the first solution is to inherit QThread and override the run function, the other is to create a controller. Today I would like to talk about the second solution and show you how QThread gets misused (a general gotcha of QThread).

  The most common error I see is calling the function of the worker directly. Please do not do that, because that way your worker will not work on another thread but on the thread you are calling it from.

  Allow me to prove this to you with a small example. I do not separate implementation and declaration in this post because this makes the post easier to read.

case 1 : Call by function

1 : Let us create a very simple, naive worker. This worker must be a QObject, because we need to move the worker into a QThread.

class naive_worker : public QObject
{
    Q_OBJECT
public:
    explicit naive_worker(QObject *obj = nullptr);

    //prints QThread::currentThread()
    void print_working_thread();
};

2 : Create a dead simple gui with QtDesigner. The button "Call by normal function" will call the function "print_working_thread" directly, the button "Call by signal and slot" will call "print_working_thread" by signal and slot, and "Print current thread address" will print the address of the main thread (gui thread).

3 : Create a controller

class naive_controller : public QObject
{
    Q_OBJECT
public:
    explicit naive_controller(QObject *parent = nullptr) :
        QObject(parent),
        worker_(new naive_worker)
    {
        //move your worker to the thread, so Qt knows how to handle it
        //your worker should not have a parent before calling moveToThread
        worker_->moveToThread(&thread_);

        connect(&thread_, &QThread::finished, worker_, &QObject::deleteLater);

        //this connection is very important: in order to make the worker
        //work on the thread we moved it to, we have to call it through
        //the signal and slot mechanism
        connect(this, &naive_controller::print_working_thread_by_signal_and_slot,
                worker_, &naive_worker::print_working_thread);

        thread_.start();
    }

    void print_working_thread_by_normal_call()
    {
        worker_->print_working_thread();
    }

signals:
    void print_working_thread_by_signal_and_slot();

private:
    QThread thread_;
    naive_worker *worker_;
};

4 : Call it by two different functions and compare their address.

class MainWindow : public QMainWindow
{
    Q_OBJECT
public:
    explicit MainWindow(QWidget *parent = nullptr);

private slots:
    //called when "Print current thread address" is clicked
    void on_pushButtonPrintCurThread_clicked();
    //called when "Call by normal function" is clicked
    void on_pushButtonCallNormalFunc_clicked();
    //called when "Call by signal and slot" is clicked
    void on_pushButtonCallSignalAndSlot_clicked();

private:
    naive_controller *controller_;
    naive_worker *worker_;
    Ui::MainWindow *ui;
};

5. Run the app and click the buttons in the following order: "Print current thread address" -> "Call by normal function" -> "Call by signal and slot", and see what happens. Following are my results

QThread(0x1bd25796020) //call "print_working_thread"
QThread(0x1bd25796020) //call "Call by normal function"
QThread(0x1bd2578bf70) //call "Call by signal and slot"

    Apparently, to make our worker run in the QThread we moved it to, we have to call it through the signal and slot mechanism, else it will execute in the caller's thread. You may ask: this is too complicated, do we have an easier way to spawn a thread? Yes we do, you can try QtConcurrent::run and std::async; they are easier to use compared with QThread (it is a pity that c++17 failed to include future.then). I use QThread when I need more power, like thread communication and queue operations.

Source codes

    Located at github.

Monday 28 August 2017

Deep learning 10-Let us create a semantic segmentation model(LinkNet) by PyTorch

  Deep learning: in recent years this technique has taken over many difficult tasks of computer vision, and semantic segmentation is one of them. The first segmentation net I implemented is LinkNet, a fast and accurate segmentation network.


Q : What is LinkNet?

A :  LinkNet is a convolutional neural network designed for semantic segmentation. This network is 10 times faster than SegNet and more accurate.

Q : What is semantic segmentation? Any difference with segmentation?

A :  Of course they are different. Segmentation partitions an image into several "similar" parts, but you do not know what those parts represent. On the other hand, semantic segmentation partitions the image into different pre-determined labels. Those labels are presented as colors in the end results. For example, check out the following images (from camvid).

Q : Semantic segmentation sounds like object detection, are they the same thing?

A : No, they are not, although you may achieve the same goal with both of them.
From the technical aspect, they use different approaches. From the view of the end results, semantic segmentation tells you what those pixels are, but it does not tell you how many instances are in your image; object detection shows you how many instances are in your image by minimal bounding boxes, but it does not give you the delineation of objects. For example, check out the images below (from yolo).

Network architectures

  The LinkNet paper describes its network architecture with excellent graphs and simple descriptions; the figures below are copied shamelessly from the paper.

  LinkNet adopts an encoder-decoder architecture. According to the paper, LinkNet's performance comes from adding the output of the encoder to the decoder; this helps the decoder recover the information more easily. If you want to know the details, please study section 3 of the paper; it is nicely written and very easy to understand.

Q : The paper is easy to read, but it does not explain what full convolution is; could you tell me what that means?

A :  Full convolution indicates that the neural network is composed of convolution layers and activations only, without any fully connected or pooling layers.

Q : How do they perform down-sampling without pooling layers?

A : Make the stride of the convolution 2 x 2 and do zero padding. If you cannot figure out why this works, I suggest you create an excel file, write down some data and do some experiments.
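For example, in PyTorch a 3 x 3 convolution with stride 2 and zero padding 1 halves the spatial size, just like a pooling layer would:

```python
import torch
from torch import nn

# the stride-2 convolution replaces the pooling layer for down-sampling
down = nn.Conv2d(in_channels=3, out_channels=64,
                 kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 3, 128, 128)   # batch, channels, height, width
y = down(x)
print(y.shape)  # torch.Size([1, 64, 64, 64])
```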

Q : Which optimizer works best?

A : According to the paper, rmsprop is the winner; my experiments told me the same thing. In case you are interested, below are the graphs of training loss. From left to right: rmsprop, adam, sgd. The hyper parameters are

Initial learning rate : adam and rmsprop are 5e-4, sgd is 1e-3
Augmentation : random crop(480,320) and horizontal flip
Normalize : subtract mean(based on imagenet mean value) and divided by 255
Batch size : 16
Epoch : 800
Training examples : 368

  The results of adam and rmsprop are very close. The loss of sgd decreases steadily, but it converges very slowly even though it uses a higher learning rate; maybe an even higher learning rate would work better for SGD.

Data pre-processing

  Almost every computer vision task needs you to pre-process your data, and segmentation is not an exception. Following are my steps.

1 : Convert the colors that do not exist in the categories into void (0, 0, 0)
2 : Convert the colors into integers
3 : Zero mean (mean values come from imagenet)
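A minimal sketch of steps 1 and 2 with numpy; the color table below is truncated and partly made up, camvid defines its own full palette:

```python
import numpy as np

# hypothetical (color -> class id) table; replace with the real palette
color_to_label = {(0, 0, 0): 0,        # void
                  (128, 128, 128): 1,  # sky
                  (128, 64, 128): 2}   # road

def encode_label(mask):
    """Convert an HxWx3 color mask into an HxW integer label map.
    Colors that are not in the table are treated as void (0)."""
    labels = np.zeros(mask.shape[:2], dtype=np.int64)
    for color, idx in color_to_label.items():
        labels[np.all(mask == color, axis=-1)] = idx
    return labels
```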

Experiment on camvid

  Enough of Q&A, let us have some benchmarks and pictures 😊.

  Models 1, 2 and 3 are all trained with the same parameters and pre-processing but with different input sizes: (128,128), (256,256) and (512,512). When testing, the size of the images is (960,720).

  Following are some examples; from left to right: original image, ground truth and predicted image.

  The results look quite good and the IoU is much better than the paper's; possible reasons are

1 : I augment the data by random crop and horizontal flip; the paper may use other methods or not perform augmentation at all(?).

2 : My pre-processing is different from the paper's

3 : I did not omit void when training

4 : My measurement of IoU is wrong

5 : My model is more complicated than the paper's (wrong implementation)

6 : It is overfitting

7 : Randomly shuffling training and testing data creates data leakage, because many images of camvid are very similar to each other

Trained models and codes

1 : As usual, located at github.
2 : Model trained with 368 images, 12 labels (include void), random crop (128x128), 800 epochs
3 : Model trained with 368 images, 12 labels (include void), random crop (480x320), 800 epochs
4 : Model trained with 368 images, 12 labels (include void), random crop (512x512), 800 epochs


Q : Is it possible to create a portable model with PyTorch?

A : It is possible, but not easy; you could check out ONNX and caffe2 if you want to try it. Someone managed to convert a pytorch model to a caffe model and load it with opencv dnn. Right now opencv dnn does not support PyTorch, but thank god it can import models trained by torch with ease (it does not support nngraph though).

Q : What do IoU and iIoU in the paper refer to?

A : This page gives a good definition, although I still can't figure out how to calculate iIoU.


Monday 7 August 2017

Deep learning 09-Performance of perceptual losses for super resolution

    Have you ever scratched your head when upscaling low resolution images? I have, because we all know the quality of an image degrades after upscaling. Thanks to the rise of machine learning in recent years, we are able to upscale a single image with better results compared with traditional solutions (ex : bilinear, bicubic; you do not need to know what they are, except that they are applied widely in many products). We call this technique super resolution.

    This sounds great, but how could we do it? I did not know either until I studied the tutorials of part 2 of the marvelous Practical Deep Learning for Coders; this course is fantastic for getting your feet wet with deep learning.

    I will try my best to explain everything with minimal prerequisite knowledge of machine learning and computer vision; however, some knowledge of convolutional neural networks (cnn) is needed. The course of part 1 is excellent if you want to learn cnn in depth. If you are in a hurry, pyimagesearch and medium have short tutorials about cnn.

What is super resolution and how does it work

Q : What is super resolution?

A :  Super resolution is a class of techniques to enhance the resolution of images or videos.

Q : There is a lot of software that could help us upscale images; why do we need super resolution?

A : Traditional solutions for upscaling an image apply an interpolation algorithm on one image only (ex : bilinear or bicubic). In contrast, super resolution exploits info from another source: either from contiguous frames, from a model trained by machine learning, or from different scales of one image.

Q : How does super resolution work?

A : The super resolution I want to introduce today is based on Perceptual Losses for Real-Time Style Transfer and Super-Resolution (please consult wiki if you want to study other types of super resolution). The most interesting part of this solution is that it treats super resolution as an image transformation problem (a process where an input image is transformed into an output image). This means we may use the same technique to solve colorization, denoising, depth estimation, semantic segmentation and other tasks (it is not a problem if you do not know what they are).

Q : How do we transform a low resolution image into a high resolution image?

A : A picture worth a thousand words.

    This network is composed of two components, an image transformation network and a loss network. The image transformation network transforms the low resolution image into a high resolution image, while the loss network measures the difference between the predicted high resolution image and the true high resolution image.

Q : What is the loss network anyway? Why do we use it to measure the loss?

A : The loss network is an image classification network trained on imagenet (ex : vgg16, resnet, densenet). We use it to measure the loss because we want our network to better measure perceptual and semantic differences between images. The paper calls the loss measured by this loss network the perceptual loss.

Q : What makes the loss network able to generate better loss?

A : The loss network can generate a better loss because a convolutional neural network trained for image classification has already learned to encode the perceptual and semantic information we want.
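To make this concrete, here is a minimal sketch of a perceptual loss in Keras. The layer choice is my own assumption (the paper taps several layers), and I pass weights=None so the sketch runs without downloading anything; in practice you would use weights='imagenet':

```python
from keras.applications.vgg16 import VGG16
from keras.models import Model
import keras.backend as K

# weights=None keeps this sketch self contained;
# use weights='imagenet' for a real perceptual loss
vgg = VGG16(include_top=False, weights=None)
vgg.trainable = False
# block2_conv2 is one reasonable layer to tap; which layer (or layers)
# to use is a hyper parameter of the method
feat = Model(vgg.input, vgg.get_layer('block2_conv2').output)

def perceptual_loss(y_true, y_pred):
    # compare feature maps of the loss network instead of raw pixels
    return K.mean(K.square(feat(y_true) - feat(y_pred)))
```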

Q : The color of the image is different after upscaling; how could I fix it?

A : You could apply histogram matching as the paper mentions; this should be able to deal with most of the cases.

Q : Any drawbacks of this algorithm?

A : Of course, nothing is perfect.

1 : Not all images work; some may look very ugly after upscaling.
2 : The image may be ice cream to your eyes, but the algorithm is not reconstructing the photo exactly; it creates details based on its training from example images. It is impossible to reconstruct the image with perfect results, because we have no way to retrieve information that did not exist in the first place.
3 : The colors of parts of some images change after upscaling, and even histogram matching cannot fix it.

Q : What is histogram matching?

A : It is a way to make the color distribution of image A look like image B's.
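A minimal grayscale sketch in numpy (for color images you would run it once per channel); this is the classic CDF-mapping formulation, not necessarily what the paper used:

```python
import numpy as np

def match_histograms(source, template):
    """Remap source pixel values so their histogram (CDF) matches
    the template's. For color images, apply this per channel."""
    s_vals, s_idx, s_counts = np.unique(source.ravel(),
                                        return_inverse=True,
                                        return_counts=True)
    t_vals, t_counts = np.unique(template.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    t_cdf = np.cumsum(t_counts) / template.size
    # for each source value, find the template value at the same CDF height
    mapped = np.interp(s_cdf, t_cdf, t_vals)
    return mapped[s_idx].reshape(source.shape)
```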


    All of the experiments use the same network architecture and train on 80000 images from imagenet for 2 epochs. From left to right: original image, image upscaled 4x by bicubic, image upscaled 4x by super resolution.

    The results are not perfect, but this is not the end; super resolution is a hot research topic, every paper is a stepping stone for the next algorithm, and we will see more and better, advanced techniques pop up in the future.

Sharing trained model and codes

1 : Notebook to transform the imagenet data to training data
2 : Notebook to train and use the super resolution model
3 : Network model with transformation network and loss network, trained on 80000 images


Wednesday 19 July 2017

Deep learning 08--Neural style by Keras

    Today I want to write down how to implement the neural style of the paper A Neural Algorithm of Artistic Style with Keras, learned from the fast.ai course. You can find the codes at github.

    Before I begin to explain how to do it, I want to mention that generating artistic style by a deep neural network is different from image classification; we need to learn new concepts and add them to our tool boxes. If you find it hard to understand the first time you see it, do not fear, I had the same feeling too. You can ask me questions or go to the fast.ai forum.

    The paper presents an algorithm to generate artistic style images by combining two images together using a convolutional neural network. Here are examples combining source images (bird, dog, building) with style images like starry, alice and tes_teach. From left to right: style image, source image, image combined by the convolutional neural network.

    Let us begin our journey into the implementation of the algorithm (I assume you know how to install Keras, tensorflow, numpy, cuda and other tools; I recommend using ubuntu 16.04.x as your os, this could save you tons of headaches when setting up your deep learning toolbox).

Step 1 : Import file and modules

from PIL import Image

import os

import numpy as np

import keras.backend as K
import vgg16_avg

from keras.models import Model
from keras.layers import *
from keras import metrics

from scipy.optimize import fmin_l_bfgs_b
from scipy.misc import imsave

Step 2 : Preprocess our input image

#the value of rn_mean comes from the imagenet data set
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)

#create an image close to zero mean and convert the rgb channels to bgr
#since the vgg model needs bgr channels. ::-1 inverts the order of the last axis
preproc = lambda x: (x - rn_mean)[:,:,:,::-1]
#We need to undo the preprocessing before we save it to our hard disk
deproc = lambda x: x[:,:,:,::-1] + rn_mean

Step 3 : Read the source image and style image

    The source image is the image you want to apply the style to. The style image is the style you want to apply to the source image.

dpath= os.getcwd() + "/"

#I make the size of content image, style image, generated img
#have the same shape, but this is not mandatory
#since we do not use any full connection layer
def read_img(im_name, shp):
    style_img = Image.open(im_name)
    if len(shp) > 0:
        style_img = style_img.resize((shp[2], shp[1]))
    style_arr = np.array(style_img)    
    #The image read by PIL has three dimensions, but the model
    #needs a four dimensional tensor (the first dim is batch size)
    style_arr = np.expand_dims(style_arr, 0)
    return preproc(style_arr)

content_img_name = "dog"
content_img_arr = read_img(dpath + "img/{}.png".format(content_img_name), [])
content_shp = content_img_arr.shape
style_img_arr = read_img(dpath + "img/starry.png", content_shp)

Step 4 : Load vgg16_avg

    Unlike doing image classification with the pure sequential api of Keras, to build a neural style network we need to use the backend api of Keras.

content_base = K.variable(content_img_arr)
style_base = K.variable(style_img_arr)
gen_img = K.placeholder(content_shp)
batch = K.concatenate([content_base, style_base, gen_img], 0)

#Feed the batch into the vgg model, every time we call the model/layer to
#generate output, it will generate output of content_base, style_base,
#gen_img. Unlike content_base and style_base, gen_img is a placeholder,
#that means we will need to provide data to this placeholder later on
model = vgg16_avg.VGG16_Avg(input_tensor = batch, include_top=False)

#build a dict of model layers
outputs = {l.name:l.output for l in model.layers}
#I prefer these 1~3 layers hierarchy as my style_layers, 
#you can try it out with different range
style_layers = [outputs['block{}_conv1'.format(i)] for i in range(1,4)]
content_layer = outputs['block4_conv2']

    If you find K.variable and K.placeholder very confusing, please check the documents of the TensorFlow and Keras backend apis.

Step 5 : Create function to find loss and gradient

#the gram matrix collects the correlations of all of the vectors
#in a set. Check wiki(https://en.wikipedia.org/wiki/Gramian_matrix) 
#for more details
def gram_matrix(x):
    #change height,width,depth to depth, height, width, it could be 2,1,0 too
    #maybe 2,0,1 is more efficient due to underlying memory layout
    features = K.permute_dimensions(x, (2,0,1))
    #batch flatten make features become 2D array
    features = K.batch_flatten(features)
    return K.dot(features, K.transpose(features)) / x.get_shape().num_elements()    

def style_loss(x, targ):
    return metrics.mse(gram_matrix(x), gram_matrix(targ))
content_loss = lambda base, gen: metrics.mse(gen, base)    

#l[1] is the output(activation) of style_base, l[2] is the
#output of gen_img. As the paper suggests, we sum the style
#loss over all of the chosen convolution layers
loss = sum([style_loss(l[1], l[2]) for l in style_layers]) 

#content_layer[0] is the output of content_base,
#content_layer[2] is the output of gen_img
#loss of content image and gen_img
loss += content_loss(content_layer[0], content_layer[2]) / 10. 

#The loss need two variables but we only pass in one,
#because we only got one placeholder in the graph,
#the other variable already determine by K.variable
grad = K.gradients(loss, gen_img)
#We cannot call loss and grad directly, we need
#to create a function(convert it to symbolic definition)
#before we can feed it into the solver
fn = K.function([gen_img], [loss] + grad)

    You can adjust the weights of the style loss and content loss by yourself until you think the image looks good enough. The function at the end only tells you that the concatenated list of loss and grads is the output that you want to - eventually - minimize. So, when you feed it to the solver bfgs, it will try to minimize the loss and will stop when the gradients are also zero (a minimum, hopefully not just a local one).

Step 6 : Create a helper class to separate loss and gradient

#fn will return loss and grad, but fmin_l_bfgs_b needs them separated,
#that is why we need a class to separate loss and gradient and store them
class Evaluator:
    def __init__(self, fn_, shp_):
        self.fn = fn_
        self.shp = shp_
    def loss(self, x):
        loss_, grads_ = self.fn([x.reshape(self.shp)])
        self.grads = grads_.flatten().astype(np.float64)
        return loss_.astype(np.float64)
    def grad(self, x):
        return np.copy(self.grads)
evaluator = Evaluator(fn, content_shp)

Step 7 : Generate a random noise image (the white noise image mentioned by the paper)

#This is the real value of the placeholder--gen_img
rand_img = lambda shape: np.random.uniform(-2.5, 2.5, shape)/100

Step 8 : Minimize the loss of rand_img with the source image and style image

def solve_img(evalu, niter, x):
    for i in range(0, niter):
        x, min_val, info = fmin_l_bfgs_b(evalu.loss, x.flatten(), 
                                         fprime=evalu.grad, maxfun = 20)
        #after zero-mean preprocessing, pixel values lie within -127 and 127
        x = np.clip(x, -127, 127)
        print(i, ',Current loss value:', min_val)
        x = x.reshape(content_shp)
        simg = deproc(x.copy())
        img_name = '{}_{}_neural_style_img_{}.png'.format(
                    dpath + "gen_img/", content_img_name, i)
        imsave(img_name, simg[0])
    return x

solve_img(evaluator, 10, rand_img(content_shp)/10.)

    You may ask, why use fmin_l_bfgs_b and not stochastic gradient descent? The answer is we could, but we have a better choice. Unlike image classification, we do not have a lot of batches to run; right now we only need to figure out the loss and gradient between three inputs: the source image, the style image and the random image. Using fmin_l_bfgs_b is more than enough.


Tuesday 30 May 2017

Create a better images downloader(Google, Bing and Yahoo) by Qt5

  I mentioned how to create a simple Bing image downloader in Download Bing images by Qt5. In this post I will explain how I tackled the challenges I encountered when building a better image downloader app with Qt5. The skills I used are applied in QImageScraper version_1.0; if you want to know the details, please dive into the codes, they are too complicated to write down in this blog.

1 : Show all of the images searched by Bing

  To show all of the images found by Bing, we need to make sure the page is scrolled to the bottom. Unfortunately there is no way to check this with 100% accuracy if we are not scrolling the page manually, because the height of the scroll bar keeps changing while you scroll; this makes it almost impossible for the program to determine when it should stop scrolling the page.


  I gave several solutions a try but none of them were optimal, so I had no choice but to seek a compromise. Rather than scrolling the page fully automatically, I adopted the semi auto solution shown in Pic.1.


2 : Not all of the images are downloadable

  There are several reasons that may cause this issue.

  1. The search engine (Bing, Yahoo, Google etc) fails to find the direct link of the image.
  2. The server "thinks" you are not a real human (a robot?)
  3. Network errors
  4. No error happens, but the reply of the server takes too long

  Although I cannot find a perfect solution for problem 2, there are some tricks to alleviate it; let the flow chart (Pic.2) clear the miasma.



  Simply put, if an error happens, I will try to download the thumbnail; if even the thumbnail cannot be downloaded, I will try the next image. After all, this solution is not too bad; let us see the results of downloading 893 smoke images found by Google.


  All of the images could be downloaded: 817 of them are big images, 76 of them are small images. Not a perfect result but not bad either. Something I did not mention in Pic.2 and Pic.3:

  1. I always switch user agents
  2. I start the next download after a random period (0.5 second~1.5 second)
  The purpose of these "weird" operations is to emulate the behavior of humans; this could lower the risk of being treated as a "robot" by the servers. I cannot find free, trustable proxies yet, else I would randomly connect to different proxies from time to time too; please tell me where to find those proxies if you know, thanks.

3 : Type of the images are mismatch or did not specify in file extension

  Not all of the images have the correct type (jpg, png, gif etc). I am very lucky that Qt5 provides us with QImageReader; this class can determine the type of the image from the contents rather than the extension. With it we can change the suffix of the file to the real format and remove the files which are not images.

4 : QFile fail to rename/remove file

  QFile::rename and QFile::remove have some troubles on windows (they work well on mac); this bugged me for a while, and it cost me one day to find out QImageReader was blocking the file.

5 : Invalid file name 

  Not all of the file names are valid; it is extremely hard to find a perfect way to determine whether a file name is valid or not, so I only do some minimal processing for this issue--remove illegal characters and trim white spaces.

6 : Deploy app on major platforms

  One of the strong selling points of Qt is its cross-platform ability. To tell you the truth, I can build the app and run it on windows, mac and linux without changing a single line of code; it works out of the box. The problem is, deploying the app on linux is not fun at all, it is a very complicated task; I will try to deploy this image downloader after linuxdeployqt becomes mature.


  In this blog post I reviewed some problems I met when using Qt5 to develop an image downloader. This is by no means exhaustive and only scratches the surface; if you want to know the details, every nitty-gritty, better dive into the source codes.

Download Bing images by Qt5
Source codes of QImageScraper

Sunday 14 May 2017

Download Bing images by Qt5

  Have you ever needed more data for your image classifier? I have, but downloading images found by Google, Bing or Flickr one by one is very time consuming; why don't we write a small, simple image scraper to help us? Sounds like a good idea. As usual, before I start the task, I list out the requirements of this small app.

a : Cross platform, able to work under ubuntu and windows with one code base (no plan for mobiles since this is a tool designed for machine learning)
b : Support regular expressions, because I need to parse the html
c : Support a high level api for networking
d : Have a decent web engine; it is very hard (impossible?) to scrape the images from those search engines without it
e : Support unicode
f : Easy to create a ui, because I want instant feedback from the website; this could speed up development time
g : Easy to build; solving dependency problems of different 3rd party libraries is not fun at all

  After searching through my toolbox I found that Qt5 is almost ideal for my task. In this post I will use Bing as an example (Google and Yahoo images share the same tricks; the processes of scraping these big 3 image search engines are very similar). If you ever try to study the source codes of the search results of Bing, you will find they are very complicated and difficult to read (maybe MS spends lots of time preventing users from scraping images). Are you afraid? Rest assured, the steps of scraping images from Bing are a little bit complicated but not impossible, as long as you have nice tools to aid you :).

Step 1 : You need a decent, modern browser like Firefox or Chrome

  Why do we need a decent browser? Because they have a powerful feature--Inspect Element. This function can help you find out the contents (links, buttons etc) of the website.


Step 2 : Click Inspect Element on interesting content

  Move your mouse over the content you want to observe and click Inspect Element.


After that, the browser should show you the codes of the interesting content.


The codes pointed to by the browser may not be what you want; if this is the case, look around the codes pointed to by the browser as Pic3 shows.

Step 3 : Create a simple prototype by Qt5

  We already know how to inspect the source codes of the web page, so let us create a simple ui to help us. This ui does not need to be professional or beautiful, after all it is just a prototype. The functions we need to create a Bing image scraper are

a : Scroll pages
b : Click see more images
c : Parse and get the links of images
d : Download images

  With the help of Qt Designer, I was able to "draw" the ui (Pic4) within 5 minutes (ignore the parse_icon_link and next_page buttons in this tutorial).


Since Qt Designer does not support QWebEngineView yet, I add it manually in code

    ui->gridLayout->addWidget(web_view_, 4, 0, 1, 2);

  Pic5 shows what it looks like when running.


Step 4 : Implement scroll function by js

    //get the scroll position of the scroll bar and push it 10000 pixels deeper
    auto const ypos = web_page_->scrollPosition().ry() + 10000;
    //scroll to the deeper y position
    web_page_->runJavaScript(QString("window.scrollTo(0, %1)").arg(ypos));
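The scroll step boils down to building a one-line JavaScript snippet from the current scroll position. The same idea can be sketched without Qt; the function name below is made up for illustration, in the Qt code `QString("window.scrollTo(0, %1)").arg(ypos)` does the same job.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

//Qt-free sketch of the scroll step: given the current y position of the
//scroll bar, build the JavaScript snippet the page should run
std::string make_scroll_script(double current_y)
{
    double const target = current_y + 10000; //jump far below the current view
    char buf[64];
    std::snprintf(buf, sizeof(buf), "window.scrollTo(0, %.0f)", target);
    return std::string(buf);
}
```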

Step 5 : Implement parse image link function 

  Before we can get the full links of the images, we need to scrape the links of the image pages.

    web_page_->toHtml([this](QString const &contents)
    {
        QRegularExpression reg("(search\\?view=detailV2[^\"]*)");
        auto iter = reg.globalMatch(contents);
        while(iter.hasNext()){
            QRegularExpressionMatch match = iter.next();
            if(match.captured(1).right(20) != "ipm=vs#enterinsights"){
                QString url("https://www.bing.com/images/" + match.captured(1));
                //the html escapes '&' as '&amp;', undo it before using the url
                url.replace("&amp;", "&");
                //img_links_ is a member container (assumed) storing the links
                img_links_.push_back(url);
            }
        }
    });
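If you want to experiment with the parsing logic outside of Qt, the same idea can be sketched with std::regex instead of QRegularExpression. The html snippet in the test is made up, not real Bing output.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

//Qt-free sketch of step 5: collect the detail page links from the html
std::vector<std::string> parse_detail_links(std::string const &html)
{
    std::regex const reg("(search\\?view=detailV2[^\"]*)");
    std::string const skip = "ipm=vs#enterinsights";
    std::vector<std::string> links;
    for(auto it = std::sregex_iterator(html.begin(), html.end(), reg);
        it != std::sregex_iterator(); ++it){
        std::string const raw = it->str(1);
        //skip the "enter insights" links, they do not point to an image page
        if(raw.size() >= skip.size() &&
           raw.compare(raw.size() - skip.size(), skip.size(), skip) == 0){
            continue;
        }
        std::string link = "https://www.bing.com/images/" + raw;
        //the html escapes '&' as '&amp;', undo it before using the url
        for(size_t pos; (pos = link.find("&amp;")) != std::string::npos;){
            link.replace(pos, 5, "&");
        }
        links.push_back(link);
    }
    return links;
}
```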

Step 6 : Simulate "See more images"

  This part is a little bit tricky: I tried to find the words "See more images" but found nothing. The reason is that the source codes returned by View Page Source (Pic 6) do not update.

Pic 6

  The solution is easy: use Inspect Element instead of View Page Source (sometimes it is easier to find the contents you want with View Page Source, both of them are valuable for web scraping).


Step 7 : Download images

  After all, our ultimate goal is to download the images we want, so let us finish the last part of this prototype.

  First, we need to get the html text of the image page (the page with the link to the image source).


  Second, download the image
    void experiment_bing::web_page_load_finished(bool ok)
    {
        if(!ok){
            qDebug()<<"cannot load webpage";
            return;
        }

        web_page_->toHtml([this](QString const &contents)
        {
            QRegularExpression reg("src2=\"([^\"]*)");
            auto match = reg.match(contents);
            if(match.hasMatch()){
               QNetworkRequest request(QUrl(match.captured(1)));
               //without this header, some images cannot be downloaded
               QString const header = "msnbot-media/1.1 (+http://search."
                                      "msn.com/msnbot.htm)";
               request.setHeader(QNetworkRequest::UserAgentHeader, header);
               downloader_->append(request, ui->lineEditSaveAt->text());
            }else{
               qDebug()<<"cannot capture img link";
               //this image should not be downloaded again
            }
        });
    }
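The regex part of this step can also be tried outside of Qt with std::regex. The html in the test is made up, not real Bing output.

```cpp
#include <cassert>
#include <regex>
#include <string>

//Qt-free sketch of extracting the real image url from the image page:
//Bing keeps it in the src2 attribute
std::string parse_image_link(std::string const &html)
{
    std::regex const reg("src2=\"([^\"]*)");
    std::smatch match;
    if(std::regex_search(html, match, reg)){
        return match.str(1);
    }
    //an empty string means the link could not be captured
    return std::string();
}
```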

  Third, download the next image

void experiment_bing::download_finished(size_t unique_id, QByteArray)
{
    //the image associated with unique_id has been saved by the downloader,
    //now load the next image page and repeat step 7
}
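The idea behind "download the next image" is just a queue: keep the links collected in step 5, and whenever a download finishes, pop the next link and load it. A Qt-free sketch of that bookkeeping (the struct name is made up):

```cpp
#include <cassert>
#include <deque>
#include <string>

//sketch of the bookkeeping behind "download the next image": store the
//image page links in a queue and hand them out one at a time
struct link_queue
{
    std::deque<std::string> links_;

    //returns the next link to process, or an empty string when we are done
    std::string next()
    {
        if(links_.empty()){
            return std::string();
        }
        std::string const link = links_.front();
        links_.pop_front();
        return link;
    }
};
```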


  These are the key points of scraping images from Bing search with QtWebEngine. The downloader I use in this post comes from qt_enhance, and the whole prototype is hosted at mega. If you want to know more, visit the following link

Create a better images downloader(Google, Bing and Yahoo) by Qt5


  Because of Qt-bug 66099, this prototype does not work under windows 10. Unfortunately this bug is rated as P2, which means we may need to wait a while before it is fixed by the Qt community.


  Qt5.9 Beta4 fixed Qt-bug 60669; the prototype now works on my laptop (win 10 64bits) and desktop (Ubuntu 16.04.1).

  There exists a better solution to scrape the image links of Bing, I will mention it in the next post.