Saturday, 4 August 2018

Qt and computer vision 2 : Build a simple computer vision application with Qt5 and opencv3

    In this post, I will show you how to build a dead simple computer vision application with Qt Creator and opencv3 step by step.

Install opencv3.4.2(or a newer version) on windows


0. Go to SourceForge and download the prebuilt binary of opencv3.4.2, or build it by yourself.

1. Double click on opencv-3.4.2-vc14_vc15.exe and extract it to your favorite folder(Pic00).

Pic00

2. Open the folder you extracted to (assume you extracted it to /your_path/opencv_3_4_2). You will see a folder called "opencv".


Pic01
3. Open the QtCreator you installed.


Create a new project by Qt Creator



4. Create a new project

Pic02
5. You will see a lot of options. For simplicity, let us choose "Application->Non-Qt project->Plain c++ application". This tells QtCreator that we want to create a c++ program without using any Qt components.


Pic03


6. Enter the path of the folder and the name of the project.

Pic04
7. Click the Next button and use qmake as your build system for now(you can choose cmake too, but I always prefer qmake when I am working with Qt).

8. You will see a page asking you to select your kits. A kit is what QtCreator uses to group different settings like device, compiler, Qt version etc.

Pic05
9. Click on next. QtCreator may ask whether you want to add the project to version control or not; for simplicity, select None. Click on finish.

10. If you see a screen like this, that means you have succeeded.

Pic06


11. Write code to read an image with opencv



#include <iostream>

#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>

//purposes of namespaces are
//1. Decrease the chance of name collisions
//2. Help you organize your code into logical groups
//Without declaring "using namespace std", every time you use
//the classes or functions of the namespace, you have to prefix
//them with "std::".
using namespace cv;
using namespace std;

/**
 * main function is the global, designated entry point of a c++ program.
 * @param argc Number of command line parameters
 * @param argv Contents of the command line parameters
 * @return any integer within the range of int, the meaning of the return
 * value is defined by the user
 */
int main(int argc, char *argv[])
{
    if(argc != 2){
        cout<<"Run this example by invoking it like this: "<<endl;
        cout<<"./step_02.exe lena.jpg"<<endl;
        cout<<endl;
        return -1;
    }

    //If you execute with Ctrl+R, argv[0] == "step_02.exe", argv[1] == "lena.jpg"
    cout<<argv[0]<<","<<argv[1]<<endl;

    //Open the image
    auto const img = imread(argv[1]);
    if(!img.empty()){
        imshow("img", img); //Show the image on screen
        waitKey(); //Do not exit the program until the user presses a key
    }else{
        cout<<"cannot open image:"<<argv[1]<<endl;

        return -1;
    }

    return 0; //usually we return 0 if everything is normal
}


How to compile and link the opencv lib with the help of Qt Creator and qmake


  Before you can execute the app, you will need to compile it and link it to the libraries of opencv. Let me show you how to do it. If you miss steps 12 or 13, you will see a lot of error messages like those shown in Pic07 or Pic09.

12. Tell the compiler where the header files are. This can be done by adding the following line to step_02.pro.

INCLUDEPATH += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/include

  The compiler will tell you it can't locate the header files if you do not add this line(see Pic07).

Pic07

  If your INCLUDEPATH is correct, QtCreator should be able to find the headers and use auto completion to help you type fewer words(Pic08).

Pic08


13. Tell the linker which opencv libraries it should link to with the following line.

LIBS += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/x64/vc14/lib/opencv_world342.lib

Without this step, you will see "unresolved external symbols" errors(Pic09).

Pic09
14. Change from debug to release.

Pic10

  Click the icon surrounded by the red region and change it from debug to release. Why do we do that? Because


  • Release mode is much faster than debug mode in many cases
  • The library we link to is built as a release library; do not mix debug and release libraries in your project unless you are asking for trouble
  I will introduce more details about compiling, linking, release and debug in the future; for now, just press Ctrl+B to compile and link the app.

Execute the app

  After we compile and link the app, we already have the exe in the build folder(the folder shown in Pic11).

Pic11

  We are almost done now; just a few more steps and the app will be up and running.

15. Copy the dlls opencv_world342.dll and opencv_ffmpeg342_64.dll(they are placed in /your_path/opencv/opencv_3_4_2/opencv/build/bin) into a new folder(we call it global_dll).

16. Add the path of this folder to the system path. Without steps 15 and 16, the exe would not be able to find the dlls when we execute the app, and you may see the following error when you execute the app from the command line(Pic12). I recommend you use the tool Rapid Environment Editor(Pic13) to edit your path on windows.

Pic12


Pic13
17. Add a command line argument in QtCreator; without it, the app does not know where the image is when you press Ctrl+R to execute the program.

Pic14
18. If it succeeds, you should see the app open the image specified in the command line arguments list(Pic15).

Pic15

  These steps are easy, but they can be annoying at first. I hope this post alleviates your frustration. You can find the source code at github.








Sunday, 22 April 2018

Qt and computer vision 1 : Setup environment of Qt5 on windows step by step

    It has been a long time since I updated my blog. Today, rather than writing newer, more advanced deep learning topics like "Modern way to estimate homography matrix(by lightweight cnn)" or "Let us create a semantic segmentation model by PyTorch", I prefer to start a series of topics for newcomers who struggle to build a computer vision app with c++. I hope my posts can help more people find out that using c++ to develop applications can be as easy as other "much easier to use languages"(ex : python).

    Rather than introducing most of the features of Qt and opencv like other books do, these topics will introduce the subsets of Qt which can help us develop a decent computer vision application, step by step.

c++ is as easy to use as python, really?

    Many programmers may find this nonsense, but my own experience tells me it is not, because I never found languages like python, java or c# to be "much easier to use compared with c++". What makes our perspectives so different? I think the answers are

1. Know how to use c++ effectively.
2. There exist great libraries for the tasks(ex : Qt, boost, opencv, dlib, spdlog etc).

    As long as these two conditions are satisfied, I believe many programmers will come to the same conclusion as mine. In this series I will try my best to help you learn how to develop easy-to-maintain applications with c++, and show you how to solve those "small yet annoying issues" which may scare away many newcomers.


Install visual c++ 2015

    Some of you may ask "why 2015 and not 2017"? Because as I am writing this post, cuda does not have decent support for visual c++ 2017 or mingw yet, and cuda is very important to computer vision apps, especially now that deep learning has taken over many computer vision tasks.

1. Go to this page, click on the download button of visual studio 2015.


2. Download visual studio 2015 community(you may need to open an account before you can enter this page)

3. Double click on the exe "en_visual_studio_community_2015_with_update_3_x86_x64_web_installer_xxxx" and wait a few minutes.

4. Install the visual c++ toolset as shown below; make sure you select all of the components.




Install Qt5 on windows

1. Go to the download page of Qt
2. Select open source version of Qt



3. Click the download button and wait until qt-unified-windows is downloaded.



4. Double click on the installer, click next->skip->next

5. Select the path where you want to install Qt

6. Select the version of Qt you want to install. Every version of Qt(Qt5.x) has a lot of binary files to download, so only select the ones you need. We prefer to install Qt5.9.5 here. Why Qt5.9.5? Because Qt5.9 is a long term support version of Qt, and in theory long term support should be more stable.



7. Click next and install.

Test whether Qt5 is installed or not


1. Open QtCreator and run an example. Go to your install path(ex: C:/Qt/3rdLibs/Qt), navigate to your_install_path/Tools/QtCreator/bin and double click on qtcreator.exe.


2. Select Welcome->Example->Qt Quick Controls 2 - Gallery


3. Click on the example. It may pop up a message box asking you some questions; you can click on yes or no.

4. Every example you open will pop up a help page like this. Whether you keep it or not is your choice; sometimes they are helpful.



5. First, select the version of Qt you want to use(surrounded by the red bounding box). Second, keep the shadow build option on(surrounded by the green bounding box). Why keep it on? Because shadow build helps you separate your source code from the build binaries. Third, select whether you want to build your binary as the debug or release version(surrounded by the blue bounding box). Usually we cannot mix debug/release libraries together; I will open another topic to discuss the differences between debug/release, explain what MT/MD are, which one you should choose etc.





6. Click on the run button or press Ctrl + R, then you should see the example running on your computer.

 

 

Tuesday, 10 October 2017

Deep learning 11-Modern way to estimate homography matrix(by light weight cnn)

  Today I want to introduce a modern way to estimate the relative homography between a pair of images. It is a solution introduced by the paper titled Deep Image Homography Estimation.

Introduction

Q : What is a homography matrix?

A : A homography matrix is a 3x3 transformation matrix that maps the points in one image to the corresponding points in another image.
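To make the definition concrete, here is a minimal sketch(my own illustration with a made-up matrix, not something from the paper) that maps a few points through a known homography with cv::perspectiveTransform; the mapped point is H * (x, y, 1) normalized by its last component.

#include <iostream>
#include <vector>

#include <opencv2/core.hpp>

int main()
{
    //a hypothetical homography, the values are made up for illustration only
    cv::Matx33d const H(1.05,   0.02,   10.0,
                        0.01,   0.98,   -5.0,
                        0.0001, 0.0002,  1.0);

    std::vector<cv::Point2f> const src{{0, 0}, {100, 0}, {100, 100}, {0, 100}};
    std::vector<cv::Point2f> dst;
    //dst[i] is H * (x, y, 1) divided by its last component
    cv::perspectiveTransform(src, dst, H);

    for(size_t i = 0; i != src.size(); ++i){
        std::cout<<src[i]<<" maps to "<<dst[i]<<std::endl;
    }

    return 0;
}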

Q : What are the uses of a homography matrix?

A : Many applications depend on the homography matrix; a few of them are image stitching, camera calibration and augmented reality.

Q : How to calculate a homography matrix between two images?

A : The traditional solution is based on two steps, corner estimation and robust homography estimation. In the corner estimation step, you need at least 4 point correspondences between the two images; usually we find these points by matching features like AKAZE, SIFT or SURF. Generally, the features found by those algorithms are over-complete, so we prune out the outliers(ex : by RANSAC) after corner estimation. If you are interested in the whole process described in c++, take a look at this project.
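As a concrete illustration of that pipeline, below is a rough sketch(my own code, not the project linked above): detect AKAZE features, keep the good matches with the ratio test, then let RANSAC prune the outliers inside findHomography. It assumes both input images are already loaded.

#include <vector>

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>

cv::Mat estimate_homography(cv::Mat const &img1, cv::Mat const &img2)
{
    //1. corner(feature) estimation
    auto akaze = cv::AKAZE::create();
    std::vector<cv::KeyPoint> kpts1, kpts2;
    cv::Mat desc1, desc2;
    akaze->detectAndCompute(img1, cv::noArray(), kpts1, desc1);
    akaze->detectAndCompute(img2, cv::noArray(), kpts2, desc2);

    //2. match the features and keep the good matches by the ratio test
    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<std::vector<cv::DMatch>> knn_matches;
    matcher.knnMatch(desc1, desc2, knn_matches, 2);

    std::vector<cv::Point2f> pts1, pts2;
    for(auto const &m : knn_matches){
        if(m.size() == 2 && m[0].distance < 0.8f * m[1].distance){
            pts1.push_back(kpts1[m[0].queryIdx].pt);
            pts2.push_back(kpts2[m[0].trainIdx].pt);
        }
    }

    //3. robust homography estimation, RANSAC prunes the outliers.
    //findHomography needs at least 4 good correspondences
    return cv::findHomography(pts1, pts2, cv::RANSAC);
}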

Q : The traditional solution requires heavy computation; do we have another way to obtain the homography matrix between two images?

A : This is the question the paper wants to answer. Instead of designing the features by hand, this paper designs an algorithm to learn the homography between two images. The biggest selling point of the paper is that they turn homography estimation into a machine learning problem.


HomographyNet

  This paper uses a VGG style CNN to estimate the homography matrix between two images; they call it HomographyNet. This model is trained in an end to end fashion, quite simple and neat.


Fig00
  
  HomographyNet comes in two versions, classification and regression. The regression network produces eight real valued numbers and uses an L2 loss as the final layer. The classification network uses softmax as the final layer and quantizes every real value into 21 bins. The regression version has better accuracy; the average accuracy of the classification version is much worse, but it can produce confidences.


Fig01
 
Fig02

4-Point Homography Parameterization

  Instead of using the 3x3 homography matrix as the label(ground truth), this paper uses the 4-point parameterization as the label.

Q : What is 4-point parameterization?

A : The 4-point parameterization stores the offsets of 4 corresponding points between the two images; Fig03 and Fig04 explain it well.

Fig03

Fig04
Q : Why do they use the 4-point parameterization and not the 3x3 matrix?

A : Because the 3x3 homography is very difficult to train directly; the problem is that the 3x3 matrix mixes rotation and translation together. The paper explains why:

The submatrix [H11, H12; H21, H22] represents the rotational terms in the homography, while the vector [H13, H23] is the translational offset. Balancing the rotational and translational terms as part of an optimization problem is difficult.

Fig05
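Going the other way is cheap: once you have the four corner offsets, the four correspondences determine the homography uniquely, so the 3x3 matrix can be recovered with one opencv call. A minimal sketch(my own illustration, not code from the paper; the corner ordering is an assumption):

#include <vector>

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

//corners are the 4 corners of the patch in image A, offsets are the
//(dx, dy) predicted for each corner, both in the same order
cv::Mat four_points_to_homography(std::vector<cv::Point2f> const &corners,
                                  std::vector<cv::Point2f> const &offsets)
{
    std::vector<cv::Point2f> moved_corners;
    for(size_t i = 0; i != corners.size(); ++i){
        moved_corners.emplace_back(corners[i].x + offsets[i].x,
                                   corners[i].y + offsets[i].y);
    }

    //4 exact correspondences determine the 3x3 homography uniquely
    return cv::getPerspectiveTransform(corners, moved_corners);
}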

Data Generation


Q : Training deep convolutional neural networks from scratch requires a large amount of data; where could we obtain the data?

A : The paper invents a smart solution to generate a nearly unlimited number of labeled training examples. Fig06 summarizes the whole process.

Fig06
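My rough reading of that process as code(a hedged sketch of the idea only; the parameter values are illustrative and the input is assumed to be a grayscale image large enough for the crop): take a random patch, perturb its four corners, warp the image with the inverse of the homography defined by those corners, and crop the same location again. The corner perturbations are the 4-point label.

#include <random>
#include <vector>

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

//one training example is (patch_a, patch_b, 4-point label)
void generate_pair(cv::Mat const &gray, cv::Mat &patch_a, cv::Mat &patch_b,
                   std::vector<cv::Point2f> &label,
                   int patch_size = 128, int max_shift = 32)
{
    std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> rand_x(max_shift, gray.cols - patch_size - max_shift - 1);
    std::uniform_int_distribution<int> rand_y(max_shift, gray.rows - patch_size - max_shift - 1);
    std::uniform_int_distribution<int> shift(-max_shift, max_shift);

    //pick a random patch location
    int const x = rand_x(gen), y = rand_y(gen);
    std::vector<cv::Point2f> const corners{
        {float(x), float(y)}, {float(x + patch_size), float(y)},
        {float(x + patch_size), float(y + patch_size)}, {float(x), float(y + patch_size)}
    };

    //perturb every corner, the perturbations are the 4-point label
    std::vector<cv::Point2f> perturbed;
    label.clear();
    for(auto const &corner : corners){
        cv::Point2f const offset(float(shift(gen)), float(shift(gen)));
        label.push_back(offset);
        perturbed.push_back(corner + offset);
    }

    //warp the whole image with the inverse homography, then crop both
    //patches at the same location
    cv::Mat const h = cv::getPerspectiveTransform(corners, perturbed);
    cv::Mat warped;
    cv::warpPerspective(gray, warped, h.inv(), gray.size());
    cv::Rect const roi(x, y, patch_size, patch_size);
    patch_a = gray(roi).clone();
    patch_b = warped(roi).clone();
}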


Results

  The results of my implementation outperform the paper: my average loss is 2.58, while the paper's is 9.2. The largest loss of my model is 19.53. The performance of my model is better than the paper's by more than 3.5 times(9.2/2.58 = 3.57). What makes the performance improve so much? A few of the reasons I can think of are

1. I changed the network architecture from vgg like to squeezeNet1.1 like.
2. I do not apply any data augmentation; maybe blurring or occlusion makes the model harder to train.
3. The paper uses data augmentation to generate 500000 training examples, but I use 500032 images from imagenet as my training set. I guess this potentially increases the variety of the data, and the end result is that the network becomes easier to train and more robust(but it may not work well for blur or occlusion).

  Following are some of the results; the region estimated by the model(red rectangle) is very close to the real region(blue rectangle).


Fig07


Final thoughts

  The results look great, but this paper does not answer two important questions.

1. The paper only tests on synthesized images; does the method work on real world images?
2. How should I use the trained model to predict a homography matrix?

  I would like to know the answers; if anyone finds out, please leave me a message.

Codes and model

  As usual, I place my code at github and the model at mega.


Sunday, 3 September 2017

Wrong way to use QThread

  There are two ways to use QThread: the first solution is to inherit QThread and override the run function, the other solution is to create a controller. Today I would like to talk about the second solution and show you how QThread is commonly misused(a general gotcha of QThread).

  The most common error I see is calling the function of the worker directly. Please do not do that, because this way your worker will not work on another thread but on the thread you are calling it from.

  Allow me to prove this to you with a small example. I do not separate implementation and declaration in this post because this makes the post easier to read.

case 1 : Call by function


1 : Let us create a very simple, naive worker. This worker must be a QObject, because we need to move the worker into a QThread.

class naive_worker : public QObject
{
    Q_OBJECT
public:
    explicit naive_worker(QObject *obj = nullptr);

    void print_working_thread()
    {
        qDebug()<<QThread::currentThread();
    }
};

2 : Create a dead simple gui with QtDesigner. The button "Call by normal function" will call the function "print_working_thread" directly, the button "Call by signal and slot" will call "print_working_thread" by signal and slot, and "Print current thread address" will print the address of the main thread(gui thread).


3 : Create a controller


class naive_controller : public QObject
{
    Q_OBJECT
public:
    explicit naive_controller(QObject *parent = nullptr):
    QObject(parent),
    worker_(new naive_worker)
    {
        //move your worker to thread, so Qt know how to handle it
        //your worker should not have a parent before calling
        //moveToThread
        worker_->moveToThread(&thread_);

        connect(&thread_, &QThread::finished, worker_, &QObject::deleteLater);

        //this connection is very important, in order to make worker work on the thread
        //we move to, we have to call it by the mechanism of signal and slot
        connect(this, &naive_controller::print_working_thread_by_signal_and_slot,
                worker_, &naive_worker::print_working_thread);
        thread_.start();
    }

    ~naive_controller()
    {
        //ask the event loop of the thread to quit first,
        //then wait for the thread to finish
        thread_.quit();
        thread_.wait();
    }

    void print_working_thread_by_normal_call()
    {
        worker_->print_working_thread();
    }

signals:
    void print_working_thread_by_signal_and_slot();

private:
    QThread thread_;
    naive_worker *worker_;
};

4 : Call it through the two different paths and compare the thread addresses.


class MainWindow : public QMainWindow
{
    Q_OBJECT
public:
    explicit MainWindow(QWidget *parent = nullptr);
    ~MainWindow();

private slots:
    void on_pushButtonPrintCurThread_clicked()
    {
        //this function will be called when 
        //"Print current thread address" is clicked
        qDebug()<<QThread::currentThread();
    }
    void on_pushButtonCallNormalFunc_clicked()
    {
        //this function will be called when
        //"Call by normal function" is clicked
        controller_->print_working_thread_by_normal_call();
    }

    void on_pushButtonCallSignalAndSlot_clicked()
    {
       //this function will be called when
       //"Call by signal and slot" is clicked
       controller_->print_working_thread_by_signal_and_slot();
    }

private:
    naive_controller *controller_;
    naive_worker *worker_;
    Ui::MainWindow *ui;
};

5 : Run the app and click the buttons in the following order: "Print current thread address"->"Call by normal function"->"Call by signal and slot" and see what happens. Following are my results:

QThread(0x1bd25796020) //call "print_working_thread"
QThread(0x1bd25796020) //call "Call by normal function"
QThread(0x1bd2578bf70) //call "Call by signal and slot"

    Apparently, to make our worker run in the QThread we moved it to, we have to call it through the signal and slot mechanism, otherwise it will execute in the calling thread. You may ask: this is too complicated, do we have an easier way to spawn a thread? Yes we do, you can try QtConcurrent::run and std::async; they are easier to use compared with QThread(it is a pity that c++17 failed to include future.then). I use QThread when I need more power, like thread communication and queue operations.
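For completeness, here is a minimal sketch of those two alternatives(assuming QT += concurrent is added to the .pro file); both run the task on another thread without any QThread boilerplate.

#include <future>

#include <QDebug>
#include <QThread>
#include <QtConcurrent/QtConcurrent>

void print_working_thread()
{
    qDebug()<<QThread::currentThread();
}

void run_on_other_threads()
{
    //QtConcurrent::run executes the function on Qt's global thread pool
    QFuture<void> qt_future = QtConcurrent::run(print_working_thread);
    qt_future.waitForFinished();

    //std::launch::async forces std::async to run the task on another thread
    std::future<void> std_future = std::async(std::launch::async, print_working_thread);
    std_future.wait();
}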

Source codes

    Located at github.

Monday, 28 August 2017

Deep learning 10-Let us create a semantic segmentation model(LinkNet) by PyTorch

  Deep learning has, in recent years, taken over many difficult tasks of computer vision, and semantic segmentation is one of them. The first segmentation net I implemented is LinkNet; it is a fast and accurate segmentation network.

Introduction


Q : What is LinkNet?

A :  LinkNet is a convolutional neural network designed for semantic segmentation. This network is 10 times faster than SegNet and more accurate.

Q : What is semantic segmentation? Is there any difference from segmentation?

A :  Of course they are different. Segmentation partitions an image into several "similar" parts, but you do not know what those parts represent. On the other hand, semantic segmentation partitions the image into regions with different pre-determined labels. Those labels are presented as colors in the end results. For example, check out the following images(from camvid).



Q : Semantic segmentation sounds like object detection, are they the same thing?

A : No, they are not, although you may achieve the same goal with either of them. From a technical point of view, they use different approaches. From the view of the end results, semantic segmentation tells you what those pixels are, but it does not tell you how many instances are in your images; object detection shows you how many instances are in your images by minimal bounding boxes, but it does not give you the delineation of objects. For example, check out the images below(from yolo).




Network architectures


  The LinkNet paper describes its network architecture with excellent graphs and simple descriptions; following are the figures copied shamelessly from the paper.





  LinkNet adopts an encoder-decoder architecture. According to the paper, LinkNet's performance gain comes from adding the output of the encoder to the decoder, which helps the decoder recover the information more easily. If you want to know the details, please study section 3 of the paper; it is nicely written and very easy to understand.

Q : The paper is easy to read, but it does not explain what full convolution is; could you tell me what that means?

A :  Full convolution indicates that the neural network is composed of convolution layers and activations only, without any fully connected or pooling layers.

Q : How do they perform down-sampling without pooling layers?

A : Make the stride of the convolution 2 x 2 and do zero padding. If you cannot figure out why this works, I suggest you create an excel file, write down some data and do some experiments; a quick size calculation follows below.
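As a quick sanity check(standard convolution arithmetic, not something specific to the paper), the output size of a convolution is

output = floor((input + 2 * padding - kernel) / stride) + 1

so for a 224 x 224 input with a 3 x 3 kernel, stride 2 and padding 1, the output is floor((224 + 2 - 3) / 2) + 1 = 112, which halves the spatial size just like a 2 x 2 pooling layer with stride 2 would.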

Q : Which optimizer works best?

A : According to the paper, rmsprop is the winner, and my experiments tell me the same thing. In case you are interested, below are the graphs of training loss. From left to right: rmsprop, adam, sgd. The hyper parameters are

Initial learning rate : adam and rmsprop are 5e-4, sgd is 1e-3
Augmentation : random crop(480,320) and horizontal flip
Normalize : subtract mean(based on imagenet mean values) and divide by 255
Batch size : 16
Epoch : 800
Training examples : 368



  The results of adam and rmsprop are very close. The loss of sgd steadily decreases, but it converges very slowly even with a higher learning rate; maybe an even higher learning rate would work better for SGD.

Data pre-processing


  Almost every computer vision task needs you to pre-process your data, and segmentation is not an exception. Following are my steps(a small sketch of the color conversion follows the list).

1 : Convert colors that do not exist in the categories into void(0, 0, 0)
2 : Convert the colors into integer labels
3 : Zero mean(mean values come from imagenet)
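A rough sketch of steps 1 and 2 with opencv(my own illustration of the idea, the real pre-processing code in the repository may differ): every pixel color is looked up in a table that maps colors to label indices, and colors that are not in the table fall back to void(label 0).

#include <map>

#include <opencv2/core.hpp>

//encode a BGR color as a single integer so it can be used as a map key
int encode_color(cv::Vec3b const &color)
{
    return (int(color[0]) << 16) | (int(color[1]) << 8) | int(color[2]);
}

//label_img is the CV_8UC3 color label image, table maps encoded colors to
//label indices, pixels with unknown colors become void(0)
cv::Mat color_to_label(cv::Mat const &label_img, std::map<int, int> const &table)
{
    cv::Mat result(label_img.size(), CV_32S, cv::Scalar(0));
    for(int r = 0; r != label_img.rows; ++r){
        for(int c = 0; c != label_img.cols; ++c){
            auto const it = table.find(encode_color(label_img.at<cv::Vec3b>(r, c)));
            if(it != table.end()){
                result.at<int>(r, c) = it->second;
            }
        }
    }

    return result;
}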

Experiment on camvid


  Enough of Q&A, let us look at some benchmarks and pictures.

Performance
  Models 1, 2 and 3 are all trained with the same parameters and pre-processing, but with different input sizes during training: (128,128), (256,256) and (512,512). When testing, the size of the images is (960,720).

  Following are some examples; from left to right: original image, ground truth and predicted image.













  The results look quite good and the IoU is much better than the paper's. Possible reasons are

1 : I augment the data by random crop and horizontal flip; the paper may use other methods or not perform augmentation at all(?).

2 : My pre-processing is different from the paper's

3 : I did not omit void when training

4 : My measurement of IoU is wrong

5 : My model is more complicated than the paper's(wrong implementation)

6 : The model is overfitting

7 : Randomly shuffling training and testing data creates data leakage, because many images of camvid are very similar to each other


Trained models and codes


1 : As usual, located at github.
2 :  Model trained with 368 images, 12 labels(including void), random crop(128x128), 800 epochs
3 :  Model trained with 368 images, 12 labels(including void), random crop(480x320), 800 epochs
4 :  Model trained with 368 images, 12 labels(including void), random crop(512x512), 800 epochs

Miscellaneous


Q : Is it possible to create a portable model with PyTorch?

A : It is possible, but not easy. You could check out ONNX and caffe2 if you want to try it. Someone managed to convert a pytorch model to a caffe model and load it with opencv dnn. Right now opencv dnn does not support PyTorch, only Torch; thank goodness opencv dnn can import models trained by torch with ease(right now opencv dnn does not support nngraph).

Q : What do IoU and iIoU in the paper refer to?

A : This page gives a good definition, although I still can't figure out how to calculate iIoU.
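For reference, the per-class IoU used by that benchmark is IoU = TP / (TP + FP + FN), where TP, FP and FN are the numbers of true positive, false positive and false negative pixels of the class, accumulated over the whole test set.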

