Thursday, 4 October 2018

Person detection (YOLO v3) with the help of mxnet, able to run on GPU or CPU

    In this post I will show you how to do object detection with the help of the cpp-package of mxnet. Why do I introduce mxnet? Because the following advantages make it a decent library for standalone project development

1. It is open source and royalty free
2. It has decent support for GPU and CPU
3. It scales efficiently to multiple GPUs and machines
4. It supports a C++ API, which means you do not need to ask your users to install a Python environment or ship your source code in order to run your apps
5. mxnet supports many platforms, including windows, linux, mac, aws, android, ios
6. It has a lot of pre-trained models
7. MMDNN supports mxnet, which means we can convert models trained by different libraries to mxnet (although not all models can be converted).

Step 1 : Download the model and convert it to a format the cpp-package can load

1. Install anaconda (the version that comes with python3)
2. Install mxnet from the terminal of anaconda
3. Install gluon-cv from the terminal of anaconda
4. Download the model and convert it with the following script

import gluoncv as gcv
from gluoncv.utils import export_block

net = gcv.model_zoo.get_model('yolo3_darknet53_coco', pretrained=True)
export_block('yolo3_darknet53_coco', net)

Step 2 : Load the converted model

void load_check_point(std::string const &model_params,
                      std::string const &model_symbol,
                      Symbol *symbol,
                      std::map<std::string, NDArray> *arg_params,
                      std::map<std::string, NDArray> *aux_params,
                      Context const &ctx)
{
    Symbol new_symbol = Symbol::Load(model_symbol);
    std::map<std::string, NDArray> params = NDArray::LoadToMap(model_params);
    std::map<std::string, NDArray> args;
    std::map<std::string, NDArray> auxs;
    //every key in the params file is prefixed with "arg:" or "aux:"
    for (auto const &iter : params) {
        std::string const type = iter.first.substr(0, 4);
        std::string const name = iter.first.substr(4);
        if (type == "arg:")
            args[name] = iter.second.Copy(ctx);
        else if (type == "aux:")
            auxs[name] = iter.second.Copy(ctx);
    }
    //Copy is async, wait until all copies finish
    NDArray::WaitAll();

    *symbol = new_symbol;
    *arg_params = args;
    *aux_params = auxs;
}

    You can use the load_check_point function as follows

    Symbol net;
    std::map<std::string, NDArray> args, auxs;
    load_check_point(model_params, model_symbols, &net, &args, &auxs, context);

    //The shape of the input data must stay the same; if you need a different size,
    //you can rebind the Executor or create a pool of Executors.
    //To create the input layer of the Executor, I make a dummy NDArray.
    //The values of "data" can be changed later
    args["data"] = NDArray(Shape(1, static_cast<unsigned>(input_size.height),
                                 static_cast<unsigned>(input_size.width), 3), context);
    executor_.reset(net.SimpleBind(context, args, std::map<std::string, NDArray>(),
                                   std::map<std::string, OpReqType>(), auxs));

    model_params is the location of the weights (ex : yolo3_darknet53_coco.params); model_symbols is the location of the symbols saved as json (ex : yolo3_darknet53_coco.json).
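The naming convention load_check_point relies on is simple: every key in the params file is prefixed with "arg:" or "aux:". Since that prefix splitting is the part that tends to trip people up, here is a pure-python sketch of the same logic (the key names below are made up for illustration):

```python
def split_params(params):
    """Split checkpoint keys into arg and aux maps by their 4-character prefix."""
    args, auxs = {}, {}
    for key, value in params.items():
        prefix, name = key[:4], key[4:]
        if prefix == "arg:":
            args[name] = value
        elif prefix == "aux:":
            auxs[name] = value
    return args, auxs

# hypothetical key names, the real ones depend on the model
args, auxs = split_params({"arg:conv0_weight": 1, "aux:bn0_moving_mean": 2})
```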

Step 3: Convert image format

    Before we feed an image into the executor of mxnet, we need to convert it.

NDArray cvmat_to_ndarray(cv::Mat const &bgr_image, Context const &ctx)
{
    cv::Mat rgb_image;
    cv::cvtColor(bgr_image, rgb_image, cv::COLOR_BGR2RGB);
    rgb_image.convertTo(rgb_image, CV_32FC3);
    //This api copies the data of rgb_image into the NDArray. As far as I know,
    //opencv guarantees cv::Mat is continuous unless it is a sub-matrix of a cv::Mat
    return NDArray(rgb_image.ptr<float>(),
                   Shape(1, static_cast<unsigned>(rgb_image.rows),
                         static_cast<unsigned>(rgb_image.cols), 3),
                   ctx);
}
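The conversion boils down to two things: swap the channel order from BGR to RGB and keep the pixels in NHWC layout (batch, height, width, channel). A pure-python sketch of that logic on a tiny 1x2 toy image, no opencv required:

```python
def bgr_to_rgb_nhwc(image):
    """image: rows of pixels, each pixel a (b, g, r) tuple. Returns a flat NHWC list."""
    flat = []
    for row in image:
        for b, g, r in row:
            flat.extend([float(r), float(g), float(b)])  # BGR -> RGB channel swap
    return flat  # logical shape is (1, height, width, 3)

data = bgr_to_rgb_nhwc([[(255, 0, 0), (0, 0, 255)]])  # a blue pixel and a red pixel
```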

Step 4 : Perform object detection on video

void object_detector::forward(const cv::Mat &input)
{
    //By default, input_size_.height equals 256 and input_size_.width equals 320.
    //Yolo v3 has a limitation : the width and height of the image must be divisible by 32.
    if(input.rows != input_size_.height || input.cols != input_size_.width){
        cv::resize(input, resize_img_, input_size_);
    }else{
        resize_img_ = input;
    }

    auto data = cvmat_to_ndarray(resize_img_, *context_);
    //Copy the data of the image into the "data" NDArray of the Executor
    data.CopyTo(&executor_->arg_dict()["data"]);
    //Forward is an async api
    executor_->Forward(false);
}
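Because yolo v3 requires the width and height to be divisible by 32, a small helper like the following (my own sketch, not part of the project) can pick the nearest valid input size for an arbitrary image:

```python
def nearest_valid_size(width, height, multiple=32):
    """Round width and height to the nearest multiple of 32, never below one multiple."""
    def round_dim(d):
        return max(multiple, int(round(d / multiple)) * multiple)
    return round_dim(width), round_dim(height)

print(nearest_valid_size(330, 250))  # -> (320, 256)
```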

Step 5 : Draw bounding boxes on image

void plot_object_detector_bboxes::plot(cv::Mat &inout,
                                       std::vector<mxnet::cpp::NDArray> const &predict_results,
                                       bool normalize)
{
    using namespace mxnet::cpp;

    //1. predict_results come from the output of the Executor(executor_->outputs)
    //2. Must copy to a cpu Context because we need to process the data by cpu later
    auto labels = predict_results[0].Copy(Context(kCPU, 0));
    auto scores = predict_results[1].Copy(Context(kCPU, 0));
    auto bboxes = predict_results[2].Copy(Context(kCPU, 0));
    //1. Should wait because the Forward api of the Executor is async
    //2. scores and labels can be treated as one dimensional arrays
    //3. bboxes can be treated as a two dimensional array
    labels.WaitToRead();
    scores.WaitToRead();
    bboxes.WaitToRead();

    size_t const num = bboxes.GetShape()[1];
    for(size_t i = 0; i < num; ++i) {
        float const score = scores.At(0, 0, i);
        if (score < thresh_) break;

        size_t const cls_id = static_cast<size_t>(labels.At(0, 0, i));
        auto const color = colors_[cls_id];
        //pt1 : top left; pt2 : bottom right
        cv::Point pt1, pt2;
        //normalize_points performs the normalization
        std::tie(pt1, pt2) = normalize_points(bboxes.At(0, i, 0), bboxes.At(0, i, 1),
                                              bboxes.At(0, i, 2), bboxes.At(0, i, 3),
                                              normalize, cv::Size(inout.cols, inout.rows));
        cv::rectangle(inout, pt1, pt2, color, 2);

        std::string txt;
        if (labels_.size() > cls_id) {
            txt += labels_[cls_id];
        }
        std::stringstream ss;
        ss << std::fixed << std::setprecision(3) << score;
        txt += " " + ss.str();
        put_label(inout, txt, pt1, color);
    }
}
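The plot depends on a small helper to map box coordinates to pixel points. Here is a pure-python sketch of what normalize_points could look like (my own guess at its logic, assuming the box values are either pixel coordinates or values in [0, 1]; the real helper lives in the github project):

```python
def normalize_points(x1, y1, x2, y2, normalize, size):
    """size is (width, height); scale normalized coords back to pixels when asked."""
    if normalize:
        w, h = size
        x1, y1, x2, y2 = x1 * w, y1 * h, x2 * w, y2 * h
    # pt1 is the top-left corner, pt2 the bottom-right corner
    return (int(x1), int(y1)), (int(x2), int(y2))

pt1, pt2 = normalize_points(0.25, 0.5, 0.75, 1.0, True, (320, 256))
```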

     I only mentioned the key points in this post; if you want to study the details, please check the project on github.

Saturday, 29 September 2018

Install cpp package of mxnet on windows 10, with cuda and opencv

    Compiling and installing the cpp-package of mxnet on windows 10 is a little bit tricky at the time I am writing this post.

     The install page of mxnet tells us almost everything we need to know, but some things haven't been written into the page yet. Today I would like to write down the pitfalls I met and share with you how I solved them.


1. Remember to download the mingw dlls from the openBLAS download page and put those dlls in some place the system can find, else you won't be able to generate op.h for the cpp-package.

2. Install Anaconda (recommended) or the python package listed on the mxnet install page on your machine and register the path (the path with python.exe), else you won't be able to generate op.h for the cpp-package.

3. Compile the project without the cpp-package first, else you may not be able to generate op.h.

Cmake commands for reference; change them to suit your own needs

a : Run these commands first

cmake -G "Visual Studio 14 2015 Win64" ^

cmake --build . --config Release

b : Run these commands, with the cpp-package on

cmake -G "Visual Studio 14 2015 Win64" ^

cmake --build . --config Release --target INSTALL

4. After you compile and install the libs, you may find some headers are missing from the install
path; I missed nnvm and mxnet-cpp. What I did was copy those folders into the install folder.

    Hope this could help someone who is pulling their hair out when compiling the cpp-package of mxnet on windows 10.

Saturday, 4 August 2018

Qt and computer vision 2 : Build a simple computer vision application with Qt5 and opencv3

    In this post, I will show you how to build a dead simple computer vision application with Qt Creator and opencv3 step by step.

Install opencv3.4.1(or newer version) on windows

0. Go to sourceforge and download the prebuilt binary of opencv3.4.2, or you can build it by yourself

1. Double click on the opencv-3.4.2-vc14_vc15.exe and extract it to your favorite folder(pic_00)


2. Open the folder you extracted to (assume you extracted it to /your_path/opencv_3_4_2). You will see a folder called "opencv".

3. Open the QtCreator you installed.

Create a new project by Qt Creator

4. Create a new project

5. You will see a lot of options; for simplicity, let us choose "Application->Non-Qt project->Plain c++ application". This tells QtCreator we want to create a c++ program without using any Qt components.


6. Enter the path of the folder and name of the project.

7. Click the Next button and use qmake as your build system for now (you may prefer cmake, but I always prefer qmake when I am working with Qt).

8. You will see a page asking you to select your kits. A kit is a tool QtCreator uses to group different settings like device, compiler, Qt version etc.

9. Click on next. QtCreator may ask whether you want to add the project to version control; for simplicity, select None. Click on finish.

10. If you see a screen like this, that means you succeeded.


11. Write code to read an image with opencv

#include <iostream>

#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>

//purposes of a namespace are
//1. Decrease the chance of name collision
//2. Help you organize your codes into logical groups
//Without declaring "using namespace std", every time you use
//the classes or functions of the namespace, you have to call them with the
//prefix "std::".
using namespace cv;
using namespace std;

/**
 * main function is the global, designated start function of c++.
 * @param argc Number of the parameters of the command line
 * @param argv Content of the parameters of the command line.
 * @return any integer within the range of int, the meaning of the return value is
 * defined by the users
 */
int main(int argc, char *argv[])
{
    if(argc != 2){
        cout<<"Run this example by invoking it like this: "<<endl;
        cout<<"./step_02.exe lena.jpg"<<endl;
        return -1;
    }

    //If you execute by Ctrl+R, argv[0] == "step_02.exe", argv[1] == "lena.jpg"

    //Open the image
    auto const img = imread(argv[1]);
    if(!img.empty()){
        imshow("img", img); //Show the image on screen
        waitKey(); //Do not exit the program until users press a key
    }else{
        cout<<"cannot open image:"<<argv[1]<<endl;
        return -1;
    }

    return 0; //usually we return 0 if everything is normal
}

How to compile and link the opencv lib with the help of Qt Creator and qmake

  Before you can execute the app, you will need to compile and link to the libraries of opencv. Let me show you how to do it. If you miss the two steps below, you will see a lot of error messages like Pic07 or Pic09 show.

12. Tell the compiler where the header files are; this can be done by adding the following command to the .pro file

INCLUDEPATH += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/include

  The compiler will tell you it can't locate the header files if you do not add this line (see Pic07).


  If your INCLUDEPATH is correct, QtCreator should be able to find the headers and use auto complete to help you type fewer words (Pic08).


13. Tell the linker which libraries of opencv it should link to with the following command.

LIBS += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/x64/vc14/lib/opencv_world342.lib

Without this step, you will see "unresolved external symbols" errors (Pic09).

14. Change from debug to release.


  Click the icon surrounded by the red region and change it from debug to release. Why do we do that? Because

  • Release mode is much faster than debug mode in many cases
  • The library we link to is built as a release library; do not mix debug and release libraries in your project unless you are asking for trouble

  I will introduce more details of compile, link, release and debug in the future; for now, just press Ctrl+B to compile and link the app.

Execute the app

  After we compile and link the app, we already have the exe in the folder (the folder shown at Pic11).


  We are almost done now; just a few more steps and the app will be up and running.

15. Copy the dlls opencv_world342.dll and opencv_ffmpeg342_64.dll (they are placed in /your_path/opencv/opencv_3_4_2/opencv/build/bin) into a new folder (we call it global_dll).

16. Add the path of this folder to the system path. Without steps 15 and 16, the exe won't be able to find the dlls when we execute the app, and you may see the following error when you execute the app from the command line (Pic12). I recommend the tool Rapid Environment Editor (Pic13) to edit your path on windows.


17. Add the command line argument in QtCreator; without it, the app does not know where the image is when you press Ctrl+R to execute the program.

18. If it succeeds, you should see the app open the image specified in the command line arguments list (Pic15).


  These steps are easy, but they could be annoying at first. I hope this post could relieve your frustration. You can find the source codes at github.

Sunday, 22 April 2018

Qt and computer vision 1 : Setup environment of Qt5 on windows step by step

    It's been a long time since I updated my blog. Today, rather than writing about newer, advanced deep learning topics like "Modern way to estimate homography matrix (by lightweight cnn)" or "Let us create a semantic segmentation model by PyTorch", I prefer to start a series of topics for newcomers who are struggling to build a computer vision app with c++. I hope my posts could help more people find out that using c++ to develop applications can be as easy as using other "much easier to use" languages (ex : python).

    Rather than introducing most of the features of Qt and opencv like other books do, these topics will introduce the subset of Qt which could help us develop a decent computer vision application, step by step.

c++ is as easy to use as python, really?

    Many programmers may find this nonsense, but my own experience tells me it is not, because I never found languages like python, java or c# "much easier to use compared with c++". What makes our perspectives so different? I think the answers are

1. Knowing how to use c++ effectively.
2. The existence of great libraries for the tasks (ex : Qt, boost, opencv, dlib, spdlog etc).

    As long as these two conditions are satisfied, I believe many programmers will reach the same conclusion as mine. I will try my best to help you learn how to develop easy-to-maintain applications with c++ in this series, and show you how to solve those "small yet annoying issues" which may scare away many newcomers.

Install visual c++ 2015

    Some of you may ask "why 2015 but not 2017"? Because at the time I am writing this post, cuda does not have decent support for visual c++ 2017 or mingw yet. cuda is very important to computer vision apps, especially now that deep learning has taken over many computer vision tasks.

1. Go to this page, click on the download button of visual studio 2015.

2. Download visual studio 2015 community(you may need to open an account before you can enter this page)

3. Double click on the exe "en_visual_studio_community_2015_with_update_3_x86_x64_web_installer_xxxx" and wait a few minutes.

4. Install the visual c++ toolset as shown below; make sure you select all of the components.

Install Qt5 on windows

1. Go to the download page of Qt
2. Select open source version of Qt

3. Click the download button and wait until qt-unified-windows is downloaded.

4. Double click on the installer, click next->skip->next

5. Select the path you want to install Qt

6. Select the version of Qt you want to install. Every version of Qt (Qt5.x) has a lot of binary files to download; only select the ones you need. We prefer to install Qt5.9.5 here. Why Qt5.9.5? Because Qt5.9 is a long term support version of Qt, and in theory long term support should be more stable.

7. Click next and install.

Test Qt5 installed or not

1. Open QtCreator and run an example. Go to your install path (ex : C:/Qt/3rdLibs/Qt), navigate to your_install_path/Tools/QtCreator/bin and double click on qtcreator.exe.

2. Select Welcome->Example->Qt Quick Controls 2 - Gallery

3. Click on the example; it may pop up a message box asking you some questions, you can click yes or no.

4. Every example you open pops up a help page like this; keeping it or not is your choice, sometimes they are helpful.

5. First, select the version of Qt you want to use (surrounded by the red bounding box). Second, keep the shadow build option on (surrounded by the green bounding box). Why keep it on? Because shadow build helps you separate your source codes from the build binaries. Third, select whether you want to build your binary as the debug or release version (surrounded by the blue bounding box). Usually we cannot mix debug/release libraries together; I will open another topic to discuss the benefits of debug/release, and explain what MT/MD is and which one you should choose.

6. Click on the run button or press Ctrl + R, then you should see the example running on your computer.



Tuesday, 10 October 2017

Deep learning 11-Modern way to estimate homography matrix(by light weight cnn)

  Today I want to introduce a modern way to estimate relative homography between a pair of images. It is a solution introduced by the paper titled Deep Image Homography Estimation.


Q : What is a homography matrix?

A : A homography matrix is a 3x3 transformation matrix that maps the points in one image to the corresponding points in another image.
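To make the definition concrete, applying a homography H to a point (x, y) means multiplying [x, y, 1] by H and dividing by the resulting third coordinate. A minimal pure-python sketch (the matrix below is a simple translation, just for illustration):

```python
def apply_homography(H, x, y):
    """H is a 3x3 matrix as nested lists; returns the mapped point (x', y')."""
    xs = H[0][0] * x + H[0][1] * y + H[0][2]
    ys = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xs / w, ys / w  # perspective divide

# a pure translation by (5, -2) keeps w == 1
H = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
print(apply_homography(H, 10, 10))  # -> (15.0, 8.0)
```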

Q : What are the uses of a homography matrix?

A : Many applications depend on the homography matrix; a few of them are image stitching, camera calibration and augmented reality.

Q : How to calculate a homography matrix between two images?

A : Traditional solutions are based on two steps: corner estimation and robust homography estimation. In the corner detection step, you need at least 4 point correspondences between the two images; usually we find these points by matching features like AKAZE, SIFT, SURF. Generally, the features found by those algorithms are over-complete, so we prune out the outliers (ex : by RANSAC) after corner estimation. If you are interested in the whole process described in c++, take a look at this project.

Q : The traditional solution requires heavy computation; do we have another way to obtain the homography matrix between two images?

A : This is the question the paper wants to answer. Instead of designing the features by hand, this paper designs an algorithm to learn the homography between two images. The biggest selling point of the paper is that they turn the homography estimation problem into a machine learning problem.


  This paper uses a VGG-style CNN to measure the homography matrix between two images; they call it HomographyNet. The model is trained in an end to end fashion, quite simple and neat.

  HomographyNet comes in two versions, regression and classification. The regression network produces eight real-valued numbers and uses an L2 loss as the final layer. The classification network uses softmax as the final layer and quantizes every real value into 21 bins. The first version has better accuracy; the average accuracy of the second version is much worse, but it can produce confidences.


4-Point Homography Parameterization

  Instead of using the 3x3 homography matrix as the label (ground truth), this paper uses the 4-point parameterization as the label.

Q : What is 4-point parameterization?

A : The 4-point parameterization stores the differences of 4 corresponding points between the two images; Fig03 and Fig04 explain it well.
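In other words, instead of the 9 entries of H, the label is the 8 corner offsets. A pure-python sketch of computing that label (the corner values below are made up for illustration):

```python
def four_point_parameterization(corners_a, corners_b):
    """Each input is 4 (x, y) corners; returns the 8 offsets used as the label."""
    offsets = []
    for (xa, ya), (xb, yb) in zip(corners_a, corners_b):
        offsets.extend([xb - xa, yb - ya])  # per-corner displacement
    return offsets

label = four_point_parameterization(
    [(0, 0), (128, 0), (128, 128), (0, 128)],
    [(3, -2), (130, 1), (125, 131), (-4, 126)])
```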


Q : Why do they use 4-point parameterization but not 3x3 matrix?

A : Because the 3x3 homography is very difficult to train; the problem is that the 3x3 matrix mixes rotation and translation together. The paper explains why:

The submatrix [H11, H12; H21, H22] represents the rotational terms in the homography, while the vector [H13, H23] is the translational offset. Balancing the rotational and translational terms as part of an optimization problem is difficult.


  Data Generation

Q : Training deep convolution neural networks from scratch requires a large amount of data, where could we obtain the data?

A : The paper invents a smart solution to generate a nearly unlimited number of labeled training examples. Fig05 summarizes the whole process.
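As I understand Fig05, a patch is cropped at a random position, its 4 corners are randomly perturbed, and the perturbations themselves become the training label (the image is then warped by the implied homography and cropped again at the same position). A simplified pure-python sketch of just the label generation; the 32-pixel shift range is an assumption of mine, and the actual warping with opencv is omitted:

```python
import random

def generate_label(patch_corners, max_shift=32, rng=random):
    """Perturb each corner by at most max_shift pixels; the shifts are the label."""
    shifts = [(rng.randint(-max_shift, max_shift),
               rng.randint(-max_shift, max_shift)) for _ in patch_corners]
    perturbed = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(patch_corners, shifts)]
    return perturbed, shifts

corners = [(0, 0), (128, 0), (128, 128), (0, 128)]
perturbed, label = generate_label(corners)
```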



  The results of my implementation outperform the paper: the average loss of mine is 2.58, while the paper's is 9.2. The largest loss of my model is 19.53. The performance of my model is better than the paper's by more than 3.5 times (9.2/2.58 = 3.57). What makes the performance improve so much? A few reasons I could think of are

1. I changed the network architecture from vgg-like to squeezeNet1.1-like.
2. I do not apply any data augmentation; maybe blurring or occlusion make the model harder to train.
3. The paper uses data augmentation to generate 500000 examples for training, but I use 500032 images from imagenet as my training set. I guess this potentially increases the variety of the data; the end result is the network becomes easier to train and more robust (but it may not work well for blur or occlusion).

  Following are some of the results; the region estimated by the model (red rectangle) is very close to the real region (blue rectangle).


Final thoughts

  The results look great, but this paper does not answer two important questions.

1. The paper only tests on synthesized images; does the model work on real world images?
2. How should I use the trained model to predict a homography matrix?

  I would like to know the answers; if anyone finds out, please leave me a message.

Codes and model

  As usual, I place my codes at github, model at mega.

  If you liked this article, please help others find it by clicking the little g+ icon below. Thanks a lot!

Sunday, 3 September 2017

Wrong way to use QThread

  There are two ways to use QThread: the first solution is to inherit QThread and override the run function, the other solution is to create a controller. Today I would like to talk about the second solution and show you how QThread gets misused (a general gotcha of QThread).

  The most common error I see is calling the function of the worker directly. Please do not do that, because in this way your worker will not work on another thread but on the thread you are calling it from.

  Allow me to prove this to you with a small example. I do not separate implementation and declaration in this post because this makes the post easier to read.

case 1 : Call by function

1 : Let us create a very simple, naive worker. This worker must be a QObject, because we need to move the worker into the QThread.

class naive_worker : public QObject
{
    Q_OBJECT
public:
    explicit naive_worker(QObject *obj = nullptr);

    //prints the address of the thread this function runs on
    void print_working_thread() { qDebug()<<QThread::currentThread(); }
};

2 : Create a dead simple gui with QtDesigner. The button "Call by normal function" will call the function "print_working_thread" directly, the button "Call by signal and slot" will call "print_working_thread" by signal and slot, and "Print current thread address" will print the address of the main thread (gui thread).

3 : Create a controller

class naive_controller : public QObject
{
    Q_OBJECT
public:
    explicit naive_controller(QObject *parent = nullptr):
        QObject(parent),
        worker_(new naive_worker)
    {
        //move your worker to the thread, so Qt knows how to handle it
        //your worker should not have a parent before calling moveToThread
        worker_->moveToThread(&thread_);

        connect(&thread_, &QThread::finished, worker_, &QObject::deleteLater);

        //this connection is very important : in order to make the worker work on the thread
        //we moved it to, we have to call it by the mechanism of signal and slot
        connect(this, &naive_controller::print_working_thread_by_signal_and_slot,
                worker_, &naive_worker::print_working_thread);

        thread_.start();
    }

    ~naive_controller() { thread_.quit(); thread_.wait(); }

    void print_working_thread_by_normal_call()
    {
        //the wrong way : this call executes on the caller's thread
        worker_->print_working_thread();
    }

signals:
    void print_working_thread_by_signal_and_slot();

private:
    QThread thread_;
    naive_worker *worker_;
};

4 : Call it by two different functions and compare their address.

class MainWindow : public QMainWindow
{
    Q_OBJECT
public:
    explicit MainWindow(QWidget *parent = nullptr);

private slots:
    //this function will be called when
    //"Print current thread address" is clicked
    void on_pushButtonPrintCurThread_clicked();

    //this function will be called when
    //"Call by normal function" is clicked
    void on_pushButtonCallNormalFunc_clicked();

    //this function will be called when
    //"Call by signal and slot" is clicked
    void on_pushButtonCallSignalAndSlot_clicked();

private:
    naive_controller *controller_;
    naive_worker *worker_;
    Ui::MainWindow *ui;
};

5. Run the app and click the buttons in the following order: "Print current thread address"->"Call by normal function"->"Call by signal and slot" and see what happens. Following are my results

QThread(0x1bd25796020) //call "print_working_thread"
QThread(0x1bd25796020) //call "Call by normal function"
QThread(0x1bd2578bf70) //call "Call by signal and slot"

    Apparently, to make our worker run in the QThread we moved it to, we have to call it through the signal and slot mechanism, else it will execute in the same thread. You may ask: this is too complicated, do we have an easier way to spawn a thread? Yes we do; you can try QtConcurrent::run and std::async, they are easier to use compared with QThread (it is a regret that c++17 fails to include future.then). I use QThread when I need more power, like thread communication and queue operations.

Source codes

    Located at github.

Monday, 28 August 2017

Deep learning 10-Let us create a semantic segmentation model(LinkNet) by PyTorch

  Deep learning: in recent years this technique has taken over many difficult tasks of computer vision, and semantic segmentation is one of them. The first segmentation net I implemented is LinkNet, a fast and accurate segmentation network.


Q : What is LinkNet?

A : LinkNet is a convolutional neural network designed for semantic segmentation. This network is 10 times faster than SegNet and more accurate.

Q : What is semantic segmentation? Any difference with segmentation?

A : Of course they are different. Segmentation partitions an image into several "similar" parts, but you do not know what those parts represent. On the other hand, semantic segmentation partitions the image into different pre-determined labels. Those labels are presented as colors in the end results. For example, check out the following images (from camvid).

Q : Semantic segmentation sounds like object detection, are they the same thing?

A : No, they are not, although you may achieve the same goal with both of them.
From the aspect of technique, they use different approaches. From the view of end results, semantic segmentation tells you what those pixels are, but it does not tell you how many instances are in your image; object detection shows you how many instances are in your image by minimal bounding boxes, but it does not give you the delineation of objects. For example, check out the images below (from yolo).

Network architectures

  The LinkNet paper describes its network architecture with excellent graphs and simple descriptions; the following figures are copied shamelessly from the paper.

  LinkNet adopts an encoder-decoder architecture. According to the paper, LinkNet's performance comes from adding the output of the encoder to the decoder; this helps the decoder recover the information more easily. If you want to know the details, please study section 3 of the paper; it is nicely written and very easy to understand.

Q : The paper is easy to read, but they do not explain what is full convolution, could you tell me what that means?

A : Full convolution indicates that the neural network is composed of convolution layers and activations only, without any fully connected or pooling layers.

Q : How do they perform down-sampling without pooling layers?

A : Make the stride of the convolution 2 x 2 and do zero padding. If you cannot figure out why this works, I suggest you create an excel file, write down some data and do some experiments.
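The arithmetic is easy to verify with the standard convolution output-size formula: with stride s, padding p and kernel k, an input of size n maps to floor((n + 2p - k) / s) + 1, so stride 2 halves the spatial size. A quick sketch:

```python
def conv_output_size(n, kernel=3, stride=2, padding=1):
    """Standard convolution output size formula."""
    return (n + 2 * padding - kernel) // stride + 1

print(conv_output_size(128))  # -> 64 : a 3x3 conv with stride 2, pad 1 halves 128
```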

Q : Which optimizer work best?

A : According to the paper, rmsprop is the winner; my experiments told me the same thing. In case you are interested, below are the graphs of the training loss. From left to right: rmsprop, adam, sgd. The hyper parameters are

Initial learning rate : adam and rmsprop are 5e-4, sgd is 1e-3
Augmentation : random crop(480,320) and horizontal flip
Normalize : subtract mean (based on imagenet mean values) and divide by 255
Batch size : 16
Epoch : 800
Training examples : 368

  The results of adam and rmsprop are very close. The loss of sgd steadily decreases, but it converges very slowly even with a higher learning rate; maybe a higher learning rate would work better for SGD.

Data pre-processing

  Almost every computer vision task needs you to pre-process your data, and segmentation is not an exception. Following are my steps.

1 : Convert the colors that do not exist in the categories into void(0, 0, 0)
2 : Convert the colors into integer labels
3 : Zero mean (mean values come from imagenet)
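Steps 1 and 2 can be sketched as a simple color lookup; the color/label pairs below are made up for illustration, camvid has its own palette:

```python
def colors_to_labels(pixels, palette):
    """Map each (r, g, b) pixel to an integer label; unknown colors become void (0)."""
    return [palette.get(p, 0) for p in pixels]

palette = {(128, 0, 0): 1, (0, 128, 0): 2}  # hypothetical palette
labels = colors_to_labels([(128, 0, 0), (9, 9, 9), (0, 128, 0)], palette)
```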

Experiment on camvid

  Enough of Q&A, let us have some benchmarks and pictures 😊.

  Models 1, 2 and 3 are all trained with the same parameters and pre-processing but with different input sizes when training: (128,128), (256,256) and (512,512). When testing, the size of the images is (960,720).

  Following are some examples; from left to right: original image, ground truth and predicted image.

  The results look quite good and the IoU is much better than the paper's; possible reasons are

1 : I augment the data by random crop and horizontal flip; the paper may use other methods or not perform augmentation at all(?).

2 : My pre-processing is different from the paper's.

3 : I did not omit void when training.

4 : My measurement on IoU is wrong

5 : My model is more complicated than the paper(wrong implementation)

6 : It is overfit

7 : Randomly shuffling training and testing data creates data leakage, because many images of camvid
are very similar to each other

Trained models and codes

1 : As usual, the codes are located at github.
2 : Model trained with 368 images, 12 labels (including void), random crop (128x128), 800 epochs
3 : Model trained with 368 images, 12 labels (including void), random crop (480x320), 800 epochs
4 : Model trained with 368 images, 12 labels (including void), random crop (512x512), 800 epochs


Q : Is it possible to create portable model by PyTorch?

A : It is possible, but not easy. You could check out ONNX and caffe2 if you want to try it. Someone managed to convert a pytorch model to a caffe model and load it with opencv dnn. Right now opencv dnn does not support PyTorch, but thank god opencv dnn can import models trained by torch with ease (right now opencv dnn does not support nngraph).

Q : What do IoU and iIoU in the paper refer to?

A : This page gives a good definition, although I still can't figure out how to calculate iIoU.
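For what it's worth, the plain per-class IoU can be computed like this (a pure-python sketch over flat label arrays; iIoU additionally weights by instance sizes, which is the part I still can't reproduce):

```python
def class_iou(pred, truth, cls):
    """Intersection over union of one class over flat label arrays."""
    inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    return inter / union if union else 0.0

print(class_iou([1, 1, 0, 2], [1, 0, 0, 2], 1))  # -> 0.5
```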
