tag:blogger.com,1999:blog-47022303430975366102024-03-08T01:11:26.393-08:00Qt and openCVThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.comBlogger71125tag:blogger.com,1999:blog-4702230343097536610.post-21477149083753884812021-03-13T03:08:00.003-08:002021-03-13T10:05:55.627-08:00Enter non-ascii characters into the app created by Qt for webassembly<p> With the help of wasm, we are able to port apps written in Qt to the browser; this saves us a lot of time in many cases. But just as with android/ios, Qt for wasm has its limitations, and the Qt company does not have much interest in fixing those issues. I guess this is because the major markets of Qt are automobile, IoT etc., not mobile phones or browsers. </p><p> In this post I would like to write down how I get around issue <a href="https://bugreports.qt.io/browse/QTBUG-78826">78826</a>--cannot enter non-ascii characters into the TextField and QLineEdit from the browser (wasm). <br /></p><p><br /></p><h1 style="text-align: center;"><u><b>Limitations</b></u></h1><p><br /></p><h2 style="text-align: left;"><u><b>1. Cannot access system fonts</b></u></h2><div style="text-align: left;"><br /></div><div style="text-align: left;"> Unlike desktop/mobile, Qt for wasm cannot access system fonts, so download the font you want to display/enter first. Today I will use the font downloaded from <a href="https://github.com/wordshub/free-font">oppo</a> as an example. After you download the ttf, compress it with qCompress in order to reduce its size.</div><div style="text-align: left;"><br /></div>
<!--HTML generated using hilite.me--><div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-color: gray; border-image: none 100% / 1 / 0 stretch; border-style: solid; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;"><pre style="line-height: 125%; margin: 0px;">QFile <span style="color: #0066bb; font-weight: bold;">file</span>(<span style="background-color: #fff0f0;">"OPPOSans-H.ttf"</span>);
<span style="color: #008800; font-weight: bold;">if</span>(file.open(QIODevice<span style="color: #333333;">::</span>ReadOnly)){
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> contents <span style="color: #333333;">=</span> qCompress(file.readAll(), <span style="color: #0000dd; font-weight: bold;">9</span>);
QFile <span style="color: #0066bb; font-weight: bold;">fwrite</span>(<span style="background-color: #fff0f0;">"OPPOSans-H.zip"</span>);
<span style="color: #008800; font-weight: bold;">if</span>(fwrite.open(QIODevice<span style="color: #333333;">::</span>WriteOnly)){
fwrite.write(contents);
}<span style="color: #008800; font-weight: bold;">else</span>{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": cannot write file"</span>;
}
}<span style="color: #008800; font-weight: bold;">else</span>{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": cannot open file"</span>;
}
</pre></div><p>
<br /> After that, add the zip file into the Qt resource system, load it and set it as the font of the QPlainTextEdit.</p><p><br /></p>
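<p>For reference, the resource path used in the loading code below is ":/assets/OPPOSans-H.zip", so the entry in the .qrc file would look something like this (a sketch of mine; adjust the prefix and file name to match your own project):</p>
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><RCC>
    <qresource prefix="/assets">
        <file>OPPOSans-H.zip</file>
    </qresource>
</RCC>
</pre>
</div>
<p><br /></p>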
<!--HTML generated using hilite.me--><div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-color: gray; border-image: none 100% / 1 / 0 stretch; border-style: solid; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;"><pre style="line-height: 125%; margin: 0px;"><span style="color: #888888;">//MainWindow.cpp, set font for QPlainTextEdit</span>
QFont <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #0066bb; font-weight: bold;">monospace</span>(font_manager().get_font_family());
ui<span style="color: #333333;">-></span>plainTextEdit<span style="color: #333333;">-></span>setFont(monospace);
<span style="color: #888888;">//font_manager.cpp, a class use to register the font and access the font id</span>
font_manager<span style="color: #333333;">::</span>font_manager()
{
QFile ifile(<span style="background-color: #fff0f0;">":/assets/OPPOSans-H.zip"</span>);
<span style="color: #008800; font-weight: bold;">if</span>(ifile.open(QIODevice<span style="color: #333333;">::</span>ReadOnly)){
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> font_id <span style="color: #333333;">=</span> QFontDatabase<span style="color: #333333;">::</span>addApplicationFontFromData(qUncompress(ifile.readAll()));
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"font id:"</span><span style="color: #333333;"><<</span>font_id;
font_family_ <span style="color: #333333;">=</span> QFontDatabase<span style="color: #333333;">::</span>applicationFontFamilies(font_id).at(<span style="color: #0000dd; font-weight: bold;">0</span>);
}
}
QFont font_manager<span style="color: #333333;">::</span>get_font() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> QFont(get_font_family());
}
QString font_manager<span style="color: #333333;">::</span>get_font_family() <span style="color: #008800; font-weight: bold;">const</span> noexcept
{
<span style="color: #008800; font-weight: bold;">return</span> font_family_;
}
</pre></div><p>
<br /></p><h2 style="text-align: left;"><u><b>2. Cannot enter Chinese into QPlainTextEdit</b></u></h2><p> This issue is quite annoying, but we do have a "stupid" way to overcome it: use the prompt dialog of a js library to input the text. The library I picked is called <a href="http://bootboxjs.com/examples.html#bb-prompt">bootbox</a>. You can get around this limitation with the following steps.</p><h3 style="text-align: left;"><u><b>a. Use js to enter the text</b></u></h3><p><br /></p>
<!--HTML generated using hilite.me--><div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-color: gray; border-image: none 100% / 1 / 0 stretch; border-style: solid; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;"><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">var</span> _global_text_process_result <span style="color: #333333;">=</span> <span style="background-color: #fff0f0;">''</span>; </pre><pre style="line-height: 125%; margin: 0px;">//You need to allocate memory on the wasm heap before returning the string to the c++ side </pre><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">function</span> getStringFromJS(targetStr, msg)
{
console.log(msg <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">":"</span> <span style="color: #333333;">+</span> targetStr);
<span style="color: #008800; font-weight: bold;">var</span> jsString <span style="color: #333333;">=</span> targetStr;
<span style="color: #008800; font-weight: bold;">var</span> lengthBytes <span style="color: #333333;">=</span> lengthBytesUTF8(jsString)<span style="color: #333333;">+</span><span style="color: #0000dd; font-weight: bold;">1</span>;
<span style="color: #008800; font-weight: bold;">var</span> stringOnWasmHeap <span style="color: #333333;">=</span> _malloc(lengthBytes);
stringToUTF8(jsString, stringOnWasmHeap, lengthBytes);
console.log(msg <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">" heap:"</span> <span style="color: #333333;">+</span> stringOnWasmHeap);
<span style="color: #008800; font-weight: bold;">return</span> stringOnWasmHeap;
}</pre><pre style="line-height: 125%; margin: 0px;"> </pre><pre style="line-height: 125%; margin: 0px;">//call the prompt dialog of bootbox</pre><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">function</span> multiLinesPrompt(inputString, inputMode, inputTitle)
{
bootbox.prompt({
title<span style="color: #333333;">:</span> inputTitle,
inputType<span style="color: #333333;">:</span> <span style="background-color: #fff0f0;">"textarea"</span>,
value<span style="color: #333333;">:</span> inputString,
callback<span style="color: #333333;">:</span> <span style="color: #008800; font-weight: bold;">function</span> (result) { console.log(result); _global_text_process_result <span style="color: #333333;">=</span> result; }
});
} </pre><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;"> </span></pre><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">//This function is used to avoid updating the text repeatedly</span></pre><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">function</span> globalTextProcessResultIsValid()
{
<span style="color: #008800; font-weight: bold;">return</span> _global_text_process_result <span style="color: #333333;">!==</span> <span style="background-color: #fff0f0;">""</span>
}
<span style="color: #008800; font-weight: bold;">function</span> getGlobalTextProcessResult()
{
    <span style="color: #008800; font-weight: bold;">var</span> results <span style="color: #333333;">=</span> getStringFromJS(_global_text_process_result, <span style="background-color: #fff0f0;">"js getGlobalTextProcessResult"</span>)
_global_text_process_result <span style="color: #333333;">=</span> <span style="background-color: #fff0f0;">""</span>
<span style="color: #008800; font-weight: bold;">return</span> results
}
</pre></div><p>
<br /></p><h3 style="text-align: left;"><u><b>b. Call the js function from c++</b></u></h3><p><br /></p>
<!--HTML generated using hilite.me--><div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-color: gray; border-image: none 100% / 1 / 0 stretch; border-style: solid; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;"><pre style="line-height: 125%; margin: 0px;"><span style="color: #557799;">#include "custom_qplain_text_edit.hpp"</span>
<span style="color: #557799;">#include <QDebug></span>
<span style="color: #557799;">#include <QTimer></span>
<span style="color: #557799;">#ifdef Q_OS_WASM</span>
<span style="color: #557799;">#include <emscripten.h></span>
<span style="color: #008800; font-weight: bold;">namespace</span>{
EM_JS(<span style="color: #333399; font-weight: bold;">char</span><span style="color: #333333;">*</span>, global_text_process_result_is_valid, (), {
<span style="color: #008800; font-weight: bold;">return</span> globalTextProcessResultIsValid();
})
EM_JS(<span style="color: #333399; font-weight: bold;">char</span><span style="color: #333333;">*</span>, get_global_text_process_result, (), {
<span style="color: #008800; font-weight: bold;">return</span> getGlobalTextProcessResult();
})
EM_JS(<span style="color: #333399; font-weight: bold;">void</span>, multi_lines_prompt, (<span style="color: #333399; font-weight: bold;">char</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">*</span>input_strings, <span style="color: #333399; font-weight: bold;">char</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">*</span>input_mode, <span style="color: #333399; font-weight: bold;">char</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">*</span>title), {
multiLinesPrompt(UTF8ToString(input_strings), UTF8ToString(input_mode), UTF8ToString(title));
})
}
<span style="color: #557799;">#endif</span>
custom_qplain_text_edit<span style="color: #333333;">::</span>custom_qplain_text_edit(QWidget <span style="color: #333333;">*</span>parent) <span style="color: #333333;">:</span>
QPlainTextEdit(parent)
{
<span style="color: #557799;">#ifdef Q_OS_WASM</span> </pre><pre style="line-height: 125%; margin: 0px;"> timer_ <span style="color: #333333;">=</span> <span style="color: #008800; font-weight: bold;">new</span> QTimer(<span style="color: #008800; font-weight: bold;">this</span>);
timer_<span style="color: #333333;">-></span>setInterval(<span style="color: #0000dd; font-weight: bold;">100</span>);
connect(timer_, <span style="color: #333333;">&</span>QTimer<span style="color: #333333;">::</span>timeout, [<span style="color: #008800; font-weight: bold;">this</span>]()
{
<span style="color: #008800; font-weight: bold;">if</span>(global_text_process_result_is_valid()){
<span style="color: #333399; font-weight: bold;">char</span> <span style="color: #333333;">*</span>msg <span style="color: #333333;">=</span> get_global_text_process_result();
setPlainText(QString(msg));
free(msg);
timer_<span style="color: #333333;">-></span>stop();
}
});
<span style="color: #557799;">#endif</span>
}
<span style="color: #333399; font-weight: bold;">void</span> custom_qplain_text_edit<span style="color: #333333;">::</span>mousePressEvent(QMouseEvent <span style="color: #333333;">*</span>e)
{
<span style="color: #557799;">#ifdef Q_OS_WASM</span>
<span style="color: #008800; font-weight: bold;">if</span>(e<span style="color: #333333;">-></span>button() <span style="color: #333333;">==</span> Qt<span style="color: #333333;">::</span>RightButton){
QPlainTextEdit<span style="color: #333333;">::</span>mousePressEvent(e);
}<span style="color: #008800; font-weight: bold;">else</span>{
multi_lines_prompt(toPlainText().toUtf8().data(), <span style="background-color: #fff0f0;">"textarea"</span>, <span style="background-color: #fff0f0;">"Input plain text"</span>);
timer_<span style="color: #333333;">-></span>start();
}
<span style="color: #557799;">#else</span>
QPlainTextEdit<span style="color: #333333;">::</span>mousePressEvent(e);
<span style="color: #557799;">#endif</span>
}
</pre></div><h3 style="text-align: left;">
<u><b>c. Include the js scripts and css in the generated html file</b></u></h3><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGBQuh7EiC3dCTivL5Jtgad8E2H5K_Iae7ZGu2ahyphenhyphenybXi2Z9Mfe3t13nnuunn96s35x4CYFDJh0MTFskFdomTQF016T6m1OI3mrE-3VXnwDET5-p0Of8R0gDoOHFyxgHt6wBKVkjJ1XvA/s974/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="808" data-original-width="974" height="530" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGBQuh7EiC3dCTivL5Jtgad8E2H5K_Iae7ZGu2ahyphenhyphenybXi2Z9Mfe3t13nnuunn96s35x4CYFDJh0MTFskFdomTQF016T6m1OI3mrE-3VXnwDET5-p0Of8R0gDoOHFyxgHt6wBKVkjJ1XvA/w640-h530/Capture.JPG" width="640" /></a></div><br /><p><br /></p>
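<p>The screenshot above shows the generated html file of my project. In text form, the change boils down to loading the css and js that bootbox depends on (jquery and bootstrap), plus bootbox itself and the helper functions of step a, before the Qt loader script runs. The file names below are assumptions taken from my local setup; adjust them to wherever you place the libraries:</p>
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><head>
  <!--bootbox needs jquery and bootstrap, so load them first-->
  <link rel="stylesheet" href="bootstrap.min.css">
  <script src="jquery.min.js"></script>
  <script src="bootstrap.min.js"></script>
  <script src="bootbox.min.js"></script>
  <!--hypothetical file holding the js functions of step a(multiLinesPrompt etc.)-->
  <script src="text_input.js"></script>
</head>
</pre>
</div>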
<p>To be honest, this is really troublesome, and I hope that one day this shortcoming can be resolved.</p><h2 style="text-align: center;"><u><b>Source codes</b></u></h2><p><a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/qt_for_wasm_enter_chinese">Github</a></p><p><a href="https://drive.google.com/file/d/1O6qxW795ZIU7CLa-il6ZLThfTeqNyXne/view?usp=sharing">Full package, including the fonts and js libraries</a><br /></p>ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-76806755202054183282020-06-25T13:31:00.000-07:002020-06-25T20:38:48.467-07:00Dense extreme inception edge detection with opencv and deep learning This tutorial introduces how to perform edge detection with opencv and deep learning. You will learn how to apply the Dense Extreme Inception Network (DexiNed) and Holistically-Nested Edge Detection (HED) to images and videos. If you are interested in HED, I recommend you study a brilliant post by <a href="https://www.pyimagesearch.com/2019/03/04/holistically-nested-edge-detection-with-opencv-and-deep-learning/">pyimagesearch</a>; here I will explain the main points of how to perform edge detection with these networks using c++ and opencv.<br />
<br />
<br />
<h2 style="text-align: center;">
<u><b>Apply HED by opencv and c++</b></u></h2>
<br />
This <a href="https://berak.github.io/smallfry/dnn_edge.html">page</a> explain how to do it, you can find the link of the model and prototxt from there too. The author register a new layer, without it opencv cannot generate proper results.<br />
<br />
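<p>To give a concrete feel for what that registration involves, here is a minimal sketch of such a crop layer, written from the description on the linked page; the class name and the centered-crop details are my own assumptions, so treat the linked page as the authoritative version. The "Crop" string passed to the registration macro must match the layer type used in deploy.prototxt.</p>
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">#include <opencv2/dnn.hpp>
#include <opencv2/dnn/layer.details.hpp>

//center-crops the first input blob to the spatial size of the second one,
//which is what the Crop layers of the HED prototxt expect
class crop_layer : public cv::dnn::Layer
{
public:
    explicit crop_layer(cv::dnn::LayerParams const &params) : cv::dnn::Layer(params) {}

    static cv::Ptr<cv::dnn::Layer> create(cv::dnn::LayerParams &params)
    {
        return cv::Ptr<cv::dnn::Layer>(new crop_layer(params));
    }

    bool getMemoryShapes(std::vector<cv::dnn::MatShape> const &inputs, int,
                         std::vector<cv::dnn::MatShape> &outputs,
                         std::vector<cv::dnn::MatShape> &) const CV_OVERRIDE
    {
        //keep batch and channels of the first input, take height/width of the second
        cv::dnn::MatShape shape = inputs[0];
        shape[2] = inputs[1][2];
        shape[3] = inputs[1][3];
        outputs.assign(1, shape);
        return false;
    }

    void forward(cv::InputArrayOfArrays inputs_arr, cv::OutputArrayOfArrays outputs_arr,
                 cv::OutputArrayOfArrays) CV_OVERRIDE
    {
        std::vector<cv::Mat> inputs, outputs;
        inputs_arr.getMatVector(inputs);
        outputs_arr.getMatVector(outputs);
        //offsets that center the crop over the larger blob
        int const y_start = (inputs[0].size[2] - inputs[1].size[2]) / 2;
        int const x_start = (inputs[0].size[3] - inputs[1].size[3]) / 2;
        std::vector<cv::Range> const ranges{cv::Range::all(), cv::Range::all(),
                                            cv::Range(y_start, y_start + inputs[1].size[2]),
                                            cv::Range(x_start, x_start + inputs[1].size[3])};
        inputs[0](ranges).copyTo(outputs[0]);
    }
};

int main()
{
    //register before readNet so the "Crop" layers of the prototxt resolve to our class
    CV_DNN_REGISTER_LAYER_CLASS(Crop, crop_layer);
    auto net = cv::dnn::readNet("hed_pretrained_bsds.caffemodel", "deploy.prototxt");
    //... blobFromImage + forward, as shown in the code of the next section
}
</pre>
</div>
<br />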
<h2 style="text-align: center;">
<u><b>Apply DexiNed by opencv and c++</b></u></h2>
<br />
<br />
You can find the explanation of DexiNed on <a href="https://github.com/xavysp/DexiNed">this page</a>. In order to perform edge detection with DexiNed, we need to convert the model to onnx; I prefer pytorch for this purpose. Why do I not prefer tensorflow? Because, from my past experience, converting a tensorflow model to a format opencv can read is much more complicated; tensorflow is feature rich, but I always feel like they are trying very hard to make things unnecessarily complicated, and their notoriously bad api design explains this very well.<br />
<br />
<h3 style="text-align: center;">
<u>1.Convert pytorch model of DexiNed to onnx</u></h3>
<ol>
<li>Clone the project <a href="https://github.com/stereomatchingkiss/blogCodes2">blogCodes2</a></li>
<li>Navigate into edges_detection_with_deep_learning</li>
<li>Clone the project <a href="https://github.com/xavysp/DexiNed">DexiNed </a></li>
<li>Copy the file model.py in edges_detection_with_deep_learning/model.py into DexiNed/DexiNed-Pytorch</li>
<li>Run the script to_onnx.py</li>
</ol>
If you do not want to go through the trouble, just download the model from <a href="https://drive.google.com/file/d/1wjST2tD3nnwKPebwiKnLXJp-ymgioTay/view?usp=sharing">here</a>. <br />
<br />
<h3 style="text-align: center;">
<u><b>2.Load and forward image by DexiNed and HED</b></u></h3>
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">switch_to_cuda</span>(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>Net <span style="color: #333333;">&</span>net)
{
try {
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>vpair <span style="color: #333333;">:</span> cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>getAvailableBackends()){
std<span style="color: #333333;">::</span>cout<span style="color: #333333;"><<</span>vpair.first<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", "</span><span style="color: #333333;"><<</span>vpair.second<span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
<span style="color: #008800; font-weight: bold;">if</span>(vpair.first <span style="color: #333333;">==</span> cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_BACKEND_CUDA <span style="color: #333333;">&&</span> vpair.second <span style="color: #333333;">==</span> cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_TARGET_CUDA){
std<span style="color: #333333;">::</span>cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"can switch to cuda"</span><span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
net.setPreferableBackend(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_BACKEND_CUDA);
net.setPreferableTarget(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_TARGET_CUDA);
<span style="color: #008800; font-weight: bold;">break</span>;
}
}
}<span style="color: #008800; font-weight: bold;">catch</span>(std<span style="color: #333333;">::</span>exception <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ex){
net.setPreferableBackend(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_BACKEND_DEFAULT);
net.setPreferableTarget(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_TARGET_CPU);
<span style="color: #008800; font-weight: bold;">throw</span> std<span style="color: #333333;">::</span>runtime_error(ex.what());
}
}
std<span style="color: #333333;">::</span>tuple<span style="color: #333333;"><</span>cv<span style="color: #333333;">::</span>Mat, <span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span><span style="color: #333333;">></span> forward_utils(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>Net <span style="color: #333333;">&</span>net, cv<span style="color: #333333;">::</span>Mat <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>input, cv<span style="color: #333333;">::</span>Size <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>blob_size)
{
<span style="color: #008800; font-weight: bold;">using</span> <span style="color: #008800; font-weight: bold;">namespace</span> std<span style="color: #333333;">::</span>chrono;
<span style="color: #888888;">//measure duration</span>
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> start <span style="color: #333333;">=</span> high_resolution_clock<span style="color: #333333;">::</span>now();
cv<span style="color: #333333;">::</span>Mat blob <span style="color: #333333;">=</span> cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>blobFromImage(input, <span style="color: #6600ee; font-weight: bold;">1.0</span>, blob_size,
cv<span style="color: #333333;">::</span>Scalar(<span style="color: #6600ee; font-weight: bold;">104.00698793</span>, <span style="color: #6600ee; font-weight: bold;">116.66876762</span>, <span style="color: #6600ee; font-weight: bold;">122.67891434</span>), <span style="color: #007020;">false</span>, <span style="color: #007020;">false</span>);
net.setInput(blob);
cv<span style="color: #333333;">::</span>Mat out <span style="color: #333333;">=</span> net.forward();
cv<span style="color: #333333;">::</span>resize(out.reshape(<span style="color: #0000dd; font-weight: bold;">1</span>, blob_size.height), out, input.size());
<span style="color: #888888;">//the data type of out is CV_32F(single channel, floating point) so we need to upscale the value and convert</span>
<span style="color: #888888;">//it to CV_8U(single channel, uchar)</span>
out <span style="color: #333333;">*=</span> <span style="color: #0000dd; font-weight: bold;">255</span>;
out.convertTo(out, CV_8U);
<span style="color: #888888;">//convert gray to bgr because we need to create montage(1 row, 3 column of images in our case)</span>
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> finish <span style="color: #333333;">=</span> high_resolution_clock<span style="color: #333333;">::</span>now();
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> elapsed <span style="color: #333333;">=</span> duration_cast<span style="color: #333333;"><</span>milliseconds<span style="color: #333333;">></span>(finish <span style="color: #333333;">-</span> start).count();
cv<span style="color: #333333;">::</span>cvtColor(out, out, cv<span style="color: #333333;">::</span>COLOR_GRAY2BGR);
<span style="color: #008800; font-weight: bold;">return</span> {out, elapsed};
}
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">hed_edges_detector</span>
{
<span style="color: #997700; font-weight: bold;">public:</span>
hed_edges_detector(std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>weights, std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config) <span style="color: #333333;">:</span>
net_(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>readNet(config, weights))
{
switch_to_cuda(net_);
}
<span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span> elapsed() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> elapsed_;
}
cv<span style="color: #333333;">::</span>Mat forward(cv<span style="color: #333333;">::</span>Mat <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>input)
{
<span style="color: #008800; font-weight: bold;">auto</span> result <span style="color: #333333;">=</span> forward_utils(net_, input, {<span style="color: #0000dd; font-weight: bold;">500</span>, <span style="color: #0000dd; font-weight: bold;">500</span>});
elapsed_ <span style="color: #333333;">+=</span> std<span style="color: #333333;">::</span>get<span style="color: #333333;"><</span><span style="color: #0000dd; font-weight: bold;">1</span><span style="color: #333333;">></span>(result);
<span style="color: #008800; font-weight: bold;">return</span> std<span style="color: #333333;">::</span>get<span style="color: #333333;"><</span><span style="color: #0000dd; font-weight: bold;">0</span><span style="color: #333333;">></span>(result);
}
<span style="color: #997700; font-weight: bold;">private:</span>
<span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span> elapsed_ <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>Net net_;
};
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">dexi_edges_detector</span>
{
<span style="color: #997700; font-weight: bold;">public:</span>
<span style="color: #008800; font-weight: bold;">explicit</span> dexi_edges_detector(std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>model) <span style="color: #333333;">:</span>
net_(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>readNet(model))
{
switch_to_cuda(net_);
}
<span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span> elapsed() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> elapsed_;
}
cv<span style="color: #333333;">::</span>Mat forward(cv<span style="color: #333333;">::</span>Mat <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>input)
{
<span style="color: #008800; font-weight: bold;">auto</span> result <span style="color: #333333;">=</span> forward_utils(net_, input, {<span style="color: #0000dd; font-weight: bold;">400</span>, <span style="color: #0000dd; font-weight: bold;">400</span>});
elapsed_ <span style="color: #333333;">+=</span> std<span style="color: #333333;">::</span>get<span style="color: #333333;"><</span><span style="color: #0000dd; font-weight: bold;">1</span><span style="color: #333333;">></span>(result);
<span style="color: #008800; font-weight: bold;">return</span> std<span style="color: #333333;">::</span>get<span style="color: #333333;"><</span><span style="color: #0000dd; font-weight: bold;">0</span><span style="color: #333333;">></span>(result);
}
<span style="color: #997700; font-weight: bold;">private:</span>
<span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span> elapsed_ <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>Net net_;
};
</pre>
</div>
<br />
<br />
<h3 style="text-align: center;">
<u>3. Detect edges of image</u></h3>
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">test_image</span>(std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>mpath)
{
cv<span style="color: #333333;">::</span>Mat img <span style="color: #333333;">=</span> cv<span style="color: #333333;">::</span>imread(<span style="background-color: #fff0f0;">"2007_000129.jpg"</span>);
hed_edges_detector hed(mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"hed_pretrained_bsds.caffemodel"</span>, mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"deploy.prototxt"</span>);
<span style="color: #008800; font-weight: bold;">auto</span> hed_out <span style="color: #333333;">=</span> hed.forward(img);
dexi_edges_detector dexi(mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"24_model.onnx"</span>);
<span style="color: #008800; font-weight: bold;">auto</span> dexi_out <span style="color: #333333;">=</span> dexi.forward(img);
cv<span style="color: #333333;">::</span>Size <span style="color: #008800; font-weight: bold;">const</span> frame_size(img.cols, img.rows);
<span style="color: #333399; font-weight: bold;">int</span> constexpr grid_x <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">3</span>;
<span style="color: #333399; font-weight: bold;">int</span> constexpr grid_y <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">1</span>;
ocv<span style="color: #333333;">::</span>montage mt(frame_size, grid_x, grid_y);
mt.add_image(img);
mt.add_image(hed_out);
mt.add_image(dexi_out);
cv<span style="color: #333333;">::</span>imshow(<span style="background-color: #fff0f0;">"results"</span>, mt.get_montage());
cv<span style="color: #333333;">::</span>imwrite(<span style="background-color: #fff0f0;">"results2.jpg"</span>, mt.get_montage());
cv<span style="color: #333333;">::</span>waitKey();
}
</pre>
</div>
<br />
<h3 style="text-align: center;">
<u><b>4. Detect edges of video</b></u></h3>
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">test_video</span>(std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>mpath)
{
cv<span style="color: #333333;">::</span>VideoCapture cap(<span style="background-color: #fff0f0;">"pedestrian.mp4"</span>);
<span style="color: #008800; font-weight: bold;">if</span>(cap.isOpened()){
hed_edges_detector hed(mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"hed_pretrained_bsds.caffemodel"</span>, mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"deploy.prototxt"</span>);
dexi_edges_detector dexi(mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"24_model.onnx"</span>);
<span style="color: #888888;">//unique_ptr is a resource manager class(smart pointer) of c++,</span>
<span style="color: #888888;">//we allocate memory by the reset(or make_unique) api,</span>
<span style="color: #888888;">//after leaving the scope(scope is surrounded by {}), the memory will be released. In c++, the</span>
<span style="color: #888888;">//best way of manage the resource is avoid explicit memory allocation, if you really need to do it,</span>
<span style="color: #888888;">//guard your memory by smart pointer. I use unique_ptr at here because I cannot</span>
<span style="color: #888888;">//initialize the objects before I know the frame size of the video.</span>
std<span style="color: #333333;">::</span>unique_ptr<span style="color: #333333;"><</span>ocv<span style="color: #333333;">::</span>montage<span style="color: #333333;">></span> mt;
std<span style="color: #333333;">::</span>unique_ptr<span style="color: #333333;"><</span>cv<span style="color: #333333;">::</span>VideoWriter<span style="color: #333333;">></span> vwriter;
cv<span style="color: #333333;">::</span>Mat frame;
<span style="color: #333399; font-weight: bold;">float</span> frame_count <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">while</span>(<span style="color: #0000dd; font-weight: bold;">1</span>){
cap<span style="color: #333333;">>></span>frame;
<span style="color: #008800; font-weight: bold;">if</span>(frame.empty()){
<span style="color: #008800; font-weight: bold;">break</span>;
}
<span style="color: #333333;">++</span>frame_count;
cv<span style="color: #333333;">::</span>resize(frame, frame, {}, <span style="color: #6600ee; font-weight: bold;">0.5</span>, <span style="color: #6600ee; font-weight: bold;">0.5</span>);
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> hed_out <span style="color: #333333;">=</span> hed.forward(frame);
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> dexi_out <span style="color: #333333;">=</span> dexi.forward(frame);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>mt){
<span style="color: #888888;">//initialize the class to create montage</span>
<span style="color: #888888;">//First arguments tell the class the size of each frame</span>
cv<span style="color: #333333;">::</span>Size <span style="color: #008800; font-weight: bold;">const</span> frame_size(frame.cols, frame.rows);
<span style="color: #333399; font-weight: bold;">int</span> constexpr grid_x <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">3</span>;
<span style="color: #333399; font-weight: bold;">int</span> constexpr grid_y <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">1</span>;
mt.reset(<span style="color: #008800; font-weight: bold;">new</span> ocv<span style="color: #333333;">::</span>montage(frame_size, grid_x, grid_y));
}
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>vwriter){
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> fourcc <span style="color: #333333;">=</span> cv<span style="color: #333333;">::</span>VideoWriter<span style="color: #333333;">::</span>fourcc(<span style="color: #0044dd;">'F'</span>, <span style="color: #0044dd;">'M'</span>, <span style="color: #0044dd;">'P'</span>, <span style="color: #0044dd;">'4'</span>);
<span style="color: #333399; font-weight: bold;">int</span> constexpr fps <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">30</span>;
<span style="color: #888888;">//because the montage is 3 columns and 1 row, so the cols need to multiply by 3</span>
vwriter.reset(<span style="color: #008800; font-weight: bold;">new</span> cv<span style="color: #333333;">::</span>VideoWriter(<span style="background-color: #fff0f0;">"out.avi"</span>, fourcc, fps, {frame.cols <span style="color: #333333;">*</span> <span style="color: #0000dd; font-weight: bold;">3</span>, frame.rows}));
}
mt<span style="color: #333333;">-></span>add_image(frame);
mt<span style="color: #333333;">-></span>add_image(hed_out);
mt<span style="color: #333333;">-></span>add_image(dexi_out);
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> montage <span style="color: #333333;">=</span> mt<span style="color: #333333;">-></span>get_montage();
cv<span style="color: #333333;">::</span>imshow(<span style="background-color: #fff0f0;">"out"</span>, mt<span style="color: #333333;">-></span>get_montage());
vwriter<span style="color: #333333;">-></span>write(montage);
cv<span style="color: #333333;">::</span>waitKey(<span style="color: #0000dd; font-weight: bold;">10</span>);
mt<span style="color: #333333;">-></span>clear();
}
std<span style="color: #333333;">::</span>cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"hed elapsed time = "</span><span style="color: #333333;"><<</span>hed.elapsed()<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", frame count = "</span><span style="color: #333333;"><<</span>frame_count
<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", fps = "</span><span style="color: #333333;"><<</span><span style="color: #6600ee; font-weight: bold;">1000.0f</span><span style="color: #333333;">/</span>(hed.elapsed()<span style="color: #333333;">/</span>frame_count)<span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
std<span style="color: #333333;">::</span>cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"dexi elapsed time = "</span><span style="color: #333333;"><<</span>dexi.elapsed()<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", frame count = "</span><span style="color: #333333;"><<</span>frame_count
<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", fps = "</span><span style="color: #333333;"><<</span><span style="color: #6600ee; font-weight: bold;">1000.0f</span><span style="color: #333333;">/</span>(dexi.elapsed()<span style="color: #333333;">/</span>frame_count)<span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
}<span style="color: #008800; font-weight: bold;">else</span>{
std<span style="color: #333333;">::</span>cerr<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"cannot open video pedestrian.mp4"</span><span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
}
}
</pre>
</div>
<br />
<h2 style="text-align: center;">
<u><b>Results of image detection</b></u></h2>
<span id="goog_726847565"></span><span id="goog_726847566"></span><br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAfIH6vrBthaJHo9NXPda0Y1ozBmDjVFfslNy3laWalxowufsQGCCKYyR4rcPnz_0g9iWh2NpWc7jNaD8ccEy8ti98_aQn7GVwC-261YsP0F0XtaYs-BpeF8YZT9iXRXTPoWmyKOIFmj8/s1600/results2.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="500" data-original-width="1002" height="159" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAfIH6vrBthaJHo9NXPda0Y1ozBmDjVFfslNy3laWalxowufsQGCCKYyR4rcPnz_0g9iWh2NpWc7jNaD8ccEy8ti98_aQn7GVwC-261YsP0F0XtaYs-BpeF8YZT9iXRXTPoWmyKOIFmj8/s320/results2.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">img_00</td></tr>
</tbody></table>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhPHjFNT-EeZV70QUPimEtDHsqXjaZorOmrrIHWXJrB-FoDzUStecRZ8Guw3AS2yuYBUvBe-2p7BDHsJ1s6sHrnPiaQoeiADI0cHnju9kXPxZqAlmhuAZJ-Qg9ia5qrQ5QGlT0hJ4dswHY/s1600/results.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="538" data-original-width="1119" height="153" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhPHjFNT-EeZV70QUPimEtDHsqXjaZorOmrrIHWXJrB-FoDzUStecRZ8Guw3AS2yuYBUvBe-2p7BDHsJ1s6sHrnPiaQoeiADI0cHnju9kXPxZqAlmhuAZJ-Qg9ia5qrQ5QGlT0hJ4dswHY/s320/results.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">img_01</td></tr>
</tbody></table>
<br />
<h2 style="text-align: center;">
<u><b>Results of video detection</b></u></h2>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/5AKMoczWcX4/0.jpg" src="https://www.youtube.com/embed/5AKMoczWcX4?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
<br />
<h2 style="text-align: center;">
<u>Runtime performance on gpu (gtx 1060)</u></h2>
The following results are based on the video I posted on <a href="https://www.youtube.com/watch?v=5AKMoczWcX4">youtube</a>. The video has 733 frames. From left to right: the original frame, the frame processed by HED, and the frame processed by DexiNed.<br />
<br />
HED elapsed time is 43870ms, fps is 16.7085.<br />
DexiNed elapsed time is 45149ms, fps is 16.2351.<br />
<br />
The crop layer of HED does not support cuda; it should become faster once the cuda implementation of that layer is done.<br />
<h2 style="text-align: center;">
<u><b>Source codes</b></u></h2>
<br />
Located at <a href="http://hed elapsed time = 42973, frame count = 733, fps = 58.6262 dexi elapsed time = 44285, frame count = 733, fps = 60.4161">github</a>.ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com2tag:blogger.com,1999:blog-4702230343097536610.post-47101571209926774532020-06-09T12:05:00.003-07:002020-06-09T12:05:50.385-07:00Asynchronous video capture written by opencv and Qt<h2 style="text-align: center;">
<u><b>Before we start </b></u></h2>
<br />
<ol>
<li>If you are not familiar with QThread, the Qt5 documentation already shows us how to use
QThread properly; please check it out via google (the keyword is "QThread doc", or
you can open the page via this <a href="https://doc.qt.io/qt-5/qthread.html">link</a>), and read this <a href="https://qtandopencv.blogspot.com/2017/09/wrong-way-to-use-qthread.html">post</a>, it may save you a lot of trouble.</li>
<li>If you are not familiar with threads, I suggest you read the book <a href="https://www.amazon.com/C-Concurrency-Action-Practical-Multithreading/dp/1933988770">c++ concurrency in action</a>; chapters 1~4 and basic atomics knowledge from this <a href="https://riptutorial.com/cplusplus/example/24687/atomic-types">site</a> should be more than enough for most tasks.</li>
</ol>
<br />
<h2 style="text-align: center;">
<u><b>Why do we need asynchronous video capture</b></u></h2>
<br />
<ol>
<li><span style="color: blue;">Performance</span>, VideoCapture could be slow when capturing the frame from rtsp protocol, especially when the frame size is big, the ideal solution is capture the frame and process the frame in different thread.</li>
<li>Do not <span style="color: blue;">freeze</span> the ui. cv::waitKey is a blocking operation, it is a bad idea to use it directly in gui programming.</li>
</ol>
<br />
<h2 style="text-align: center;">
<u><b>Dependencies </b></u></h2>
<br />
<a href="https://opencv.org/releases/">opencv4</a>(using 4.3.0 in this tutorial)<br />
<a href="https://www.qt.io/download">Qt5</a>(using 5.13.2 in this tutorial) <br />
<br />
If you do not want to register a new account in order to download Qt5 (they did a great job of pissing off open source communities), you can try <a href="https://github.com/miurahr/aqtinstall">this link</a>.<br />
<br />
<br />
<h2 style="text-align: center;">
<u><b>Define interfaces for worker</b></u></h2>
In order to make it easier to switch the frame capturer in the future, I created a worker_base class for this purpose.<br />
<br />
<h4 style="text-align: center;">
<u>frame_capture_config.hpp</u></h4>
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#ifndef FRAME_CAPTURE_CONFIG_HPP</span>
<span style="color: #557799;">#define FRAME_CAPTURE_CONFIG_HPP</span>
<span style="color: #557799;">#include <QString></span>
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
<span style="color: #008800; font-weight: bold;">struct</span> frame_capture_config
{
<span style="color: #888888;">//True will copy the frame captured, useful if the functors</span>
<span style="color: #888888;">//you add work in different thread</span>
<span style="color: #333399; font-weight: bold;">bool</span> deep_copy_ <span style="color: #333333;">=</span> <span style="color: #007020;">false</span>;
<span style="color: #333399; font-weight: bold;">int</span> fps_ <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">30</span>;
QString url_;
};
}
<span style="color: #557799;">#endif </span><span style="color: #888888;">// FRAME_CAPTURE_CONFIG_HPP</span>
</pre>
</div>
<br />
<br />
<h4 style="text-align: center;">
<u>frame_worker_base.hpp</u></h4>
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#ifndef FRAME_WORKER_BASE_HPP</span>
<span style="color: #557799;">#define FRAME_WORKER_BASE_HPP</span>
<span style="color: #557799;">#include <opencv2/core.hpp></span>
<span style="color: #557799;">#include <QObject></span>
<span style="color: #557799;">#include <functional></span>
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
<span style="color: #008800; font-weight: bold;">struct</span> frame_capture_config;
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">worker_base</span> <span style="color: #333333;">:</span> <span style="color: #008800; font-weight: bold;">public</span> QObject
{
Q_OBJECT
<span style="color: #997700; font-weight: bold;">public:</span>
<span style="color: #008800; font-weight: bold;">explicit</span> worker_base(QObject <span style="color: #333333;">*</span>parent <span style="color: #333333;">=</span> nullptr);
<span style="color: #888888;">/**</span>
<span style="color: #888888;"> * Add listener to process the frame</span>
<span style="color: #888888;"> * @param functor Process the frame</span>
<span style="color: #888888;"> * @param key Key of the functor, we need it to remove the functor</span>
<span style="color: #888888;"> * @return True if able to add the listener and vice versa</span>
<span style="color: #888888;"> */</span>
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">bool</span> add_image_listener(std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span>(cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">></span> functor, <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key) <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> frame_capture_config get_params() <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> QString get_url() <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">bool</span> is_stop() <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #888888;">/**</span>
<span style="color: #888888;"> * This function will stop the frame capturer, release all functor etc</span>
<span style="color: #888888;"> */</span>
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> release() <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #888888;">/**</span>
<span style="color: #888888;"> * Remove the listener</span>
<span style="color: #888888;"> * @param key The key same as add_image_listener when you add the functor</span>
<span style="color: #888888;"> * @return True if able to remove the listener and vice versa</span>
<span style="color: #888888;"> * @warning Remember to remove_image_listener before the resources of the register functor</span>
<span style="color: #888888;"> * released, else the app may crash.</span>
<span style="color: #888888;"> */</span>
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">bool</span> remove_image_listener(<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key) <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> set_max_fps(<span style="color: #333399; font-weight: bold;">int</span> input) <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> set_params(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config) <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #888888;">/**</span>
<span style="color: #888888;"> * Will start the frame captured with the url set by the start_url api</span>
<span style="color: #888888;"> */</span>
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> start() <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> start_url(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>url) <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> stop() <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #997700; font-weight: bold;">signals:</span>
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">cannot_open</span>(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>media_url);
};
}
<span style="color: #557799;">#endif </span><span style="color: #888888;">// FRAME_WORKER_BASE_HPP</span>
</pre>
</div>
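<p>Before looking at the opencv implementation, here is a hedged sketch of how a consumer could drive this interface; the url and the key variable are placeholders of mine, not part of the project:</p>
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">#include "frame_worker_base.hpp"
#include "frame_capture_config.hpp"
#include <QDebug>

void usage_sketch(frame_capture::worker_base &worker)
{
    frame_capture::frame_capture_config config;
    config.url_ = "rtsp://127.0.0.1:8554/live"; //hypothetical url
    config.fps_ = 30;
    config.deep_copy_ = false; //set to true if the functor hands the frame to another thread
    worker.set_params(config);

    static int key = 0; //any stable address works as the key of the listener
    worker.add_image_listener([](cv::Mat frame)
    {
        qDebug()<<"frame size ="<<frame.cols<<"x"<<frame.rows;
    }, &key);
    worker.start();

    //... later, remove the listener before the resources it captured are
    //released(see the warning in the header), then stop the worker
    worker.remove_image_listener(&key);
    worker.stop();
}
</pre>
</div>
<br />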
<br />
<br />
<h2 style="text-align: center;">
<u>Implement frame capture by opencv</u></h2>
<br />
Compared with other libraries like gstreamer, ffmpeg, libvlc, Qt etc., cv::VideoCapture has the simplest api for capturing frames, although the c++ api of this class does not work on android yet (you have to use jni, which makes porting the app to android more troublesome).<br />
<br />
<br />
<h4 style="text-align: center;">
<u>frame_capture_opencv_worker.hpp</u></h4>
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#ifndef FRAME_CAPTURED_OPENCV_WORKER_HPP</span>
<span style="color: #557799;">#define FRAME_CAPTURED_OPENCV_WORKER_HPP</span>
<span style="color: #557799;">#include "frame_worker_base.hpp"</span>
<span style="color: #557799;">#include <QObject></span>
<span style="color: #557799;">#include <opencv2/core.hpp></span>
<span style="color: #557799;">#include <opencv2/highgui.hpp></span>
<span style="color: #557799;">#include <opencv2/video.hpp></span>
<span style="color: #557799;">#include <atomic></span>
<span style="color: #557799;">#include <functional></span>
<span style="color: #557799;">#include <map></span>
<span style="color: #557799;">#include <mutex></span>
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">QTimer</span>;
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
<span style="color: #008800; font-weight: bold;">struct</span> frame_capture_config;
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">capture_opencv_worker</span> <span style="color: #333333;">:</span> <span style="color: #008800; font-weight: bold;">public</span> worker_base
{
Q_OBJECT
<span style="color: #997700; font-weight: bold;">public:</span>
<span style="color: #008800; font-weight: bold;">explicit</span> capture_opencv_worker(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config);
<span style="color: #333333;">~</span>capture_opencv_worker() override;
<span style="color: #333399; font-weight: bold;">bool</span> add_image_listener(std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span>(cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">></span> functor, <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key) override;
frame_capture_config get_params() <span style="color: #008800; font-weight: bold;">const</span> override;
QString get_url() <span style="color: #008800; font-weight: bold;">const</span> override;
<span style="color: #333399; font-weight: bold;">void</span> set_max_fps(<span style="color: #333399; font-weight: bold;">int</span> input) override;
<span style="color: #333399; font-weight: bold;">bool</span> is_stop() <span style="color: #008800; font-weight: bold;">const</span> override;
<span style="color: #333399; font-weight: bold;">void</span> release() override;
<span style="color: #333399; font-weight: bold;">bool</span> remove_image_listener(<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key) override;
<span style="color: #333399; font-weight: bold;">void</span> set_params(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config) override;
<span style="color: #333399; font-weight: bold;">void</span> start() override;
<span style="color: #333399; font-weight: bold;">void</span> start_url(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>url) override;
<span style="color: #333399; font-weight: bold;">void</span> stop() override;
<span style="color: #997700; font-weight: bold;">private:</span>
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">open_media</span>(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>media_url);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">captured_frame</span>();
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">set_max_fps_non_ts</span>(<span style="color: #333399; font-weight: bold;">int</span> input);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">time_out</span>();
cv<span style="color: #333333;">::</span>VideoCapture capture_;
<span style="color: #333399; font-weight: bold;">bool</span> deep_copy_ <span style="color: #333333;">=</span> <span style="color: #007020;">false</span>;
<span style="color: #333399; font-weight: bold;">int</span> frame_duration_ <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">30</span>;
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span><span style="color: #333333;">*</span>, std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span>(cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">>></span> functors_;
<span style="color: #333399; font-weight: bold;">int</span> max_fps_;
QString media_url_;
<span style="color: #008800; font-weight: bold;">mutable</span> std<span style="color: #333333;">::</span>mutex mutex_;
<span style="color: #333399; font-weight: bold;">bool</span> stop_;
QTimer <span style="color: #333333;">*</span>timer_ <span style="color: #333333;">=</span> nullptr;
<span style="color: #333399; font-weight: bold;">int</span> webcam_index_ <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
};
}
<span style="color: #557799;">#endif </span><span style="color: #888888;">// FRAME_CAPTURED_OPENCV_WORKER_HPP</span>
</pre>
</div>
<br />
<br />
<h4 style="text-align: center;">
<u><b>frame_capture_opencv_worker.cpp</b></u></h4>
<br />
Please read the comments carefully; they may help you avoid subtle bugs.<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#include "frame_capture_opencv_worker.hpp"</span>
<span style="color: #557799;">#include "frame_capture_config.hpp"</span>
<span style="color: #557799;">#include <QDebug></span>
<span style="color: #557799;">#include <QElapsedTimer></span>
<span style="color: #557799;">#include <QThread></span>
<span style="color: #557799;">#include <QTimer></span>
<span style="color: #557799;">#include <chrono></span>
<span style="color: #557799;">#include <thread></span>
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
capture_opencv_worker<span style="color: #333333;">::</span>capture_opencv_worker(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config) <span style="color: #333333;">:</span>
worker_base(),
deep_copy_(config.deep_copy_),
media_url_(config.url_),
stop_(<span style="color: #007020;">true</span>)
{
set_max_fps(config.fps_);
}
capture_opencv_worker<span style="color: #333333;">::~</span>capture_opencv_worker()
{
release();
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_opencv_worker<span style="color: #333333;">::</span>add_image_listener(std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span> (cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">></span> functor, <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key)
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">return</span> functors_.insert(std<span style="color: #333333;">::</span>make_pair(key, std<span style="color: #333333;">::</span>move(functor))).second;
}
frame_capture_config capture_opencv_worker<span style="color: #333333;">::</span>get_params() <span style="color: #008800; font-weight: bold;">const</span>
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
frame_capture_config config;
config.deep_copy_ <span style="color: #333333;">=</span> deep_copy_;
config.fps_ <span style="color: #333333;">=</span> max_fps_;
config.url_ <span style="color: #333333;">=</span> media_url_;
<span style="color: #008800; font-weight: bold;">return</span> config;
}
QString capture_opencv_worker<span style="color: #333333;">::</span>get_url() <span style="color: #008800; font-weight: bold;">const</span>
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">return</span> media_url_;
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>set_max_fps(<span style="color: #333399; font-weight: bold;">int</span> input)
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
set_max_fps_non_ts(input);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>open_media(<span style="color: #008800; font-weight: bold;">const</span> QString <span style="color: #333333;">&</span>media_url)
{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": "</span><span style="color: #333333;"><<</span>media_url;
<span style="color: #333399; font-weight: bold;">bool</span> can_convert_to_int <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">if</span>(timer_){
timer_<span style="color: #333333;">-></span>stop();
}
media_url.toInt(<span style="color: #333333;">&</span>can_convert_to_int);
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">true</span>;
try{
capture_.release();
<span style="color: #888888;">//If you pass in int, opencv will open webcam if it could</span>
<span style="color: #008800; font-weight: bold;">if</span>(can_convert_to_int){
capture_.open(media_url.toInt());
}<span style="color: #008800; font-weight: bold;">else</span>{
capture_.open(media_url.toStdString());
}
}<span style="color: #008800; font-weight: bold;">catch</span>(std<span style="color: #333333;">::</span>exception <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ex){
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span>ex.what();
}
<span style="color: #008800; font-weight: bold;">if</span>(capture_.isOpened()){
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">false</span>;
}<span style="color: #008800; font-weight: bold;">else</span>{
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">true</span>;
emit <span style="color: #0066bb; font-weight: bold;">cannot_open</span>(media_url);
}
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_opencv_worker<span style="color: #333333;">::</span>is_stop() <span style="color: #008800; font-weight: bold;">const</span>
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">return</span> stop_;
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>release()
{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": delete cam with url = "</span><span style="color: #333333;"><<</span>media_url_;
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": enter lock region"</span>;
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">true</span>;
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": clear functor"</span>;
functors_.clear();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": release capture"</span>;
capture_.release();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": delete timer later"</span>;
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_opencv_worker<span style="color: #333333;">::</span>remove_image_listener(<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key)
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">return</span> functors_.erase(key) <span style="color: #333333;">></span> <span style="color: #0000dd; font-weight: bold;">0</span>;
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>set_params(<span style="color: #008800; font-weight: bold;">const</span> frame_capture_config <span style="color: #333333;">&</span>config)
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
deep_copy_ <span style="color: #333333;">=</span> config.deep_copy_;
media_url_ <span style="color: #333333;">=</span> config.url_;
set_max_fps_non_ts(config.fps_);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>start()
{
start_url(media_url_);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>start_url(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>url)
{
stop();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": stop = "</span><span style="color: #333333;"><<</span>stop_<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", url = "</span><span style="color: #333333;"><<</span>url;
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
open_media(url);
<span style="color: #008800; font-weight: bold;">if</span>(capture_.isOpened()){
media_url_ <span style="color: #333333;">=</span> url;
captured_frame();
}
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>stop()
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">true</span>;
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": stop the worker = "</span><span style="color: #333333;"><<</span>stop_;
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>captured_frame()
{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": capture_.isOpened()"</span>;
<span style="color: #888888;">//You must initialize and delete timer in the same thread</span>
<span style="color: #888888;">//of the VideoCapture running, else you may trigger undefined</span>
<span style="color: #888888;">//behavior</span>
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>timer_){
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": init timer"</span>;
timer_ <span style="color: #333333;">=</span> <span style="color: #008800; font-weight: bold;">new</span> QTimer;
timer_<span style="color: #333333;">-></span>setSingleShot(<span style="color: #007020;">true</span>);
connect(timer_, <span style="color: #333333;">&</span>QTimer<span style="color: #333333;">::</span>timeout, <span style="color: #008800; font-weight: bold;">this</span>, <span style="color: #333333;">&</span>capture_opencv_worker<span style="color: #333333;">::</span>time_out);
}
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": start timer"</span>;
timer_<span style="color: #333333;">-></span>start();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": called start timer"</span>;
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>set_max_fps_non_ts(<span style="color: #333399; font-weight: bold;">int</span> input)
{
max_fps_ <span style="color: #333333;">=</span> std<span style="color: #333333;">::</span>max(input, <span style="color: #0000dd; font-weight: bold;">1</span>);
frame_duration_ <span style="color: #333333;">=</span> std<span style="color: #333333;">::</span>max(<span style="color: #0000dd; font-weight: bold;">1000</span> <span style="color: #333333;">/</span> max_fps_, <span style="color: #0000dd; font-weight: bold;">1</span>);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>time_out()
{
QElapsedTimer elapsed;
elapsed.start();
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>stop_ <span style="color: #333333;">&&</span> timer_){
capture_.grab();
cv<span style="color: #333333;">::</span>Mat frame;
capture_.retrieve(frame);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>frame.empty()){
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #333333;">&</span>iter <span style="color: #333333;">:</span> functors_){
iter.second(deep_copy_ <span style="color: #333333;">?</span> frame.clone() <span style="color: #333333;">:</span> frame);
}
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> interval <span style="color: #333333;">=</span> frame_duration_ <span style="color: #333333;">-</span> elapsed.elapsed();
timer_<span style="color: #333333;">-></span>start(std<span style="color: #333333;">::</span>max(<span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">int</span><span style="color: #333333;">></span>(interval), <span style="color: #0000dd; font-weight: bold;">10</span>));
}<span style="color: #008800; font-weight: bold;">else</span>{
open_media(media_url_);
timer_<span style="color: #333333;">-></span>start();
}
}<span style="color: #008800; font-weight: bold;">else</span>{
capture_.release();
<span style="color: #008800; font-weight: bold;">if</span>(timer_){
<span style="color: #888888;">//you must delete timer in the thread where you initiaize it</span>
<span style="color: #008800; font-weight: bold;">delete</span> timer_;
timer_ <span style="color: #333333;">=</span> nullptr;
}
}
}
}
</pre>
</div>
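<br />
Just to double check the fps math in set_max_fps_non_ts and time_out above, here is a small worked example; the variables below are local copies, only for illustration.<br />
<br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">//a worked example of the fps math, the names are local copies for illustration
int const max_fps = std::max(25, 1);                    //request 25 fps
int const frame_duration = std::max(1000 / max_fps, 1); //40 ms per frame
//if grab + retrieve + the listeners took 15 ms, time_out restarts the timer
//after std::max(40 - 15, 10) == 25 ms, so the effective rate stays near 25 fps;
//the clamp to 10 ms prevents a slow listener from starving the event loop
</pre>
</div>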
<br />
<br />
<h2 style="text-align: center;">
<u>Create controller associate with the worker</u></h2>
<br />
<h4 style="text-align: center;">
<u>frame_capture_controller.hpp</u></h4>
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#ifndef FRAME_CAPTURE_OPENCV_CONTROLLER_HPP</span>
<span style="color: #557799;">#define FRAME_CAPTURE_OPENCV_CONTROLLER_HPP</span>
<span style="color: #557799;">#include <QObject></span>
<span style="color: #557799;">#include <QThread></span>
<span style="color: #557799;">#include <QVariant></span>
<span style="color: #557799;">#include <opencv2/core.hpp></span>
<span style="color: #557799;">#include <functional></span>
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
<span style="color: #008800; font-weight: bold;">struct</span> frame_capture_config;
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">worker_base</span>;
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">capture_controller</span> <span style="color: #333333;">:</span> <span style="color: #008800; font-weight: bold;">public</span> QObject
{
Q_OBJECT
<span style="color: #997700; font-weight: bold;">public:</span>
<span style="color: #008800; font-weight: bold;">explicit</span> capture_controller(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config);
<span style="color: #333333;">~</span>capture_controller() override;
<span style="color: #333399; font-weight: bold;">bool</span> <span style="color: #0066bb; font-weight: bold;">add_image_listener</span>(std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span>(cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">></span> functor, <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key);
frame_capture_config get_params() <span style="color: #008800; font-weight: bold;">const</span>;
QString get_url() <span style="color: #008800; font-weight: bold;">const</span>;
<span style="color: #333399; font-weight: bold;">bool</span> is_stop() <span style="color: #008800; font-weight: bold;">const</span>;
<span style="color: #333399; font-weight: bold;">bool</span> <span style="color: #0066bb; font-weight: bold;">remove_image_listener</span>(<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">set_max_fps</span>(<span style="color: #333399; font-weight: bold;">int</span> input);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">set_params</span>(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">stop</span>();
<span style="color: #997700; font-weight: bold;">signals:</span>
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">cannot_open</span>(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>media_url);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">reach_the_end</span>();
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">start</span>();
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">start_url</span>(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>url);
<span style="color: #997700; font-weight: bold;">private:</span>
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">init_frame_capture</span>();
worker_base <span style="color: #333333;">*</span>frame_capture_;
QThread thread_;
};
}
<span style="color: #557799;">#endif </span><span style="color: #888888;">// FRAME_CAPTURE_OPENCV_CONTROLLER_HPP</span>
</pre>
</div>
<br />
<br />
<h4 style="text-align: center;">
<u>frame_capture_controller.cpp</u></h4>
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#include "frame_capture_controller.hpp"</span>
<span style="color: #557799;">#include "frame_capture_config.hpp"</span>
<span style="color: #557799;">#include "frame_capture_opencv_worker.hpp"</span>
<span style="color: #557799;">#include <QDebug></span>
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
<span style="color: #333399; font-weight: bold;">void</span> capture_controller<span style="color: #333333;">::</span>init_frame_capture()
{
frame_capture_<span style="color: #333333;">-></span>moveToThread(<span style="color: #333333;">&</span>thread_);
connect(<span style="color: #333333;">&</span>thread_, <span style="color: #333333;">&</span>QThread<span style="color: #333333;">::</span>finished, frame_capture_, <span style="color: #333333;">&</span>QObject<span style="color: #333333;">::</span>deleteLater);
connect(frame_capture_, <span style="color: #333333;">&</span>worker_base<span style="color: #333333;">::</span>cannot_open, <span style="color: #008800; font-weight: bold;">this</span>, <span style="color: #333333;">&</span>capture_controller<span style="color: #333333;">::</span>cannot_open);
connect(<span style="color: #008800; font-weight: bold;">this</span>, <span style="color: #333333;">&</span>capture_controller<span style="color: #333333;">::</span>start, frame_capture_, <span style="color: #333333;">&</span>worker_base<span style="color: #333333;">::</span>start);
connect(<span style="color: #008800; font-weight: bold;">this</span>, <span style="color: #333333;">&</span>capture_controller<span style="color: #333333;">::</span>start_url, frame_capture_, <span style="color: #333333;">&</span>worker_base<span style="color: #333333;">::</span>start_url);
thread_.start();
}
capture_controller<span style="color: #333333;">::</span>capture_controller(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config) <span style="color: #333333;">:</span>
QObject(),
frame_capture_(<span style="color: #008800; font-weight: bold;">new</span> capture_opencv_worker(config))
{
init_frame_capture();
}
capture_controller<span style="color: #333333;">::~</span>capture_controller()
{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": quit"</span>;
<span style="color: #888888;">//must called release or before quit and wait, else the</span>
<span style="color: #888888;">//frame capture will fall into infinite loop</span>
frame_capture_<span style="color: #333333;">-></span>release();
thread_.quit();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": wait"</span>;
thread_.wait();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": wait exit"</span>;
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_controller<span style="color: #333333;">::</span>add_image_listener(std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span> (cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">></span> functor, <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key)
{
<span style="color: #008800; font-weight: bold;">return</span> frame_capture_<span style="color: #333333;">-></span>add_image_listener(std<span style="color: #333333;">::</span>move(functor), key);
}
frame_capture_config capture_controller<span style="color: #333333;">::</span>get_params() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> frame_capture_<span style="color: #333333;">-></span>get_params();
}
QString capture_controller<span style="color: #333333;">::</span>get_url() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> frame_capture_<span style="color: #333333;">-></span>get_url();
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_controller<span style="color: #333333;">::</span>is_stop() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> frame_capture_<span style="color: #333333;">-></span>is_stop();
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_controller<span style="color: #333333;">::</span>remove_image_listener(<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key)
{
<span style="color: #008800; font-weight: bold;">return</span> frame_capture_<span style="color: #333333;">-></span>remove_image_listener(key);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_controller<span style="color: #333333;">::</span>set_max_fps(<span style="color: #333399; font-weight: bold;">int</span> input)
{
frame_capture_<span style="color: #333333;">-></span>set_max_fps(input);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_controller<span style="color: #333333;">::</span>set_params(<span style="color: #008800; font-weight: bold;">const</span> frame_capture_config <span style="color: #333333;">&</span>config)
{
frame_capture_<span style="color: #333333;">-></span>set_params(config);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_controller<span style="color: #333333;">::</span>stop()
{
frame_capture_<span style="color: #333333;">-></span>stop();
}
}
</pre>
</div>
<br />
<br />
<h2 style="text-align: center;">
<u>Do we need mutex?</u></h2>
With the current solution, yes, we need it. But we could avoid the mutex if we declared the other APIs as signals and connected them to the worker, just like the start and start_url signals of frame_capture_controller do. <br />
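Here is a minimal sketch of what that could look like for set_max_fps; note this is a hypothetical change to the classes above, not what the codes actually do.<br />
<br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">//in capture_controller, declare a signal instead of touching the worker directly
signals:
    void max_fps_changed(int input);

//in init_frame_capture(), the cross thread connection queues the call onto the
//worker thread, so max_fps_ is only ever touched in that thread and needs no mutex
connect(this, &amp;capture_controller::max_fps_changed,
        frame_capture_, &amp;worker_base::set_max_fps);

//the public api now only emits
void capture_controller::set_max_fps(int input)
{
    emit max_fps_changed(input);
}
</pre>
</div>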
<br />
<h2 style="text-align: center;">
<u>How to use it?</u></h2>
You can find the answer in this file--mainwindow.cpp. For simplicity, I did not put the functor into another thread.<br />
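In case the link goes stale, here is a minimal sketch of the typical flow; I assume frame_capture_config only carries the fields shown earlier (url_, fps_, deep_copy_).<br />
<br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">frame_capture::frame_capture_config config;
config.url_ = "0";        //an integer string opens the webcam at that index
config.fps_ = 30;
config.deep_copy_ = false;

frame_capture::capture_controller controller(config);
controller.add_image_listener([](cv::Mat frame)
{
    //keep this short, it runs in the capture thread
}, &amp;controller);
//start is a signal, the worker will pick it up in its own thread
emit controller.start();
</pre>
</div>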
<br />
<h2 style="text-align: center;">
<u>Example</u></h2>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjB-AB1VnJ2mNHsXfdONVl6UU0r2QUAsRTWX47xH80__VtIO-blwU5LdQPBQCdUhxtr0zpvxqvILRX_JkhyiF67pcyZvMnA_JLgtDTbL6dDpvr-M0n2zp-KCk0UOVe4y9-UzZZuYjkpvI/s1600/Capture.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="498" data-original-width="754" height="211" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjB-AB1VnJ2mNHsXfdONVl6UU0r2QUAsRTWX47xH80__VtIO-blwU5LdQPBQCdUhxtr0zpvxqvILRX_JkhyiF67pcyZvMnA_JLgtDTbL6dDpvr-M0n2zp-KCk0UOVe4y9-UzZZuYjkpvI/s320/Capture.PNG" width="320" /></a></div>
<br />
<br />
<br />
<br />
<br />
<h2 style="text-align: center;">
<u>Warning</u></h2>
Remember to remove the listener from the frame capture if the resources it is associated with will be deleted. <br />
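Here is a minimal sketch of one way to enforce this, by tying the lifetime of the listener to the object which owns the resources; frame_consumer is a hypothetical class, not part of the codes above.<br />
<br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">class frame_consumer
{
public:
    explicit frame_consumer(frame_capture::capture_controller &amp;controller) :
        controller_(controller)
    {
        controller_.add_image_listener([this](cv::Mat frame)
        {
            //consume the frame
        }, this);
    }
    ~frame_consumer()
    {
        //remove the listener before this object dies, else the capture
        //thread may call into freed memory
        controller_.remove_image_listener(this);
    }
private:
    frame_capture::capture_controller &amp;controller_;
};
</pre>
</div>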
<br />
<br />
<br />
<br />
<br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-61268155743263078812019-03-27T02:13:00.001-07:002021-02-26T08:35:56.460-08:00Asynchronous computer vision algorithm In the last post I introduced how to <a href="https://qtandopencv.blogspot.com/2019/03/asynchronous-videocapture-of-opencv.html">create an asynchronous class to capture frames by cv::VideoCapture</a>; today I will show you how to create an asynchronous algorithm which can be called many times without spawning a new thread. <br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u>Main flow of async_to_gray_algo</u></span> </h2>
async_to_gray_algo is a small class which converts an image from BGR
channels to a gray image in another thread. If you have used any thread
pool library before, you will find they all use similar logic under the hood,
just with a more generic, flexible API.<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">async_to_gray_algo<span style="color: #333333;">::</span>async_to_gray_algo(cv<span style="color: #333333;">::</span>Mat <span style="color: #333333;">&</span>result, std<span style="color: #333333;">::</span>mutex <span style="color: #333333;">&</span>result_mutex) <span style="color: #333333;">:</span>
result_(result),
result_mutex_(result_mutex),
stop_(<span style="color: #007020;">false</span>)
{
<span style="color: #008800; font-weight: bold;">auto</span> func <span style="color: #333333;">=</span> [<span style="color: #333333;">&</span>]()
{
<span style="color: #888888;">//1. In order to reuse the thread, we need to keep it alive</span>
<span style="color: #888888;">//that is why we should put it in an infinite for loop</span>
<span style="color: #008800; font-weight: bold;">for</span>(;;){
unique_lock<span style="color: #333333;"><</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #888888;">//2. use condition_variable to replace sleep(x milliseconds) is more efficient</span>
wait_.wait(lock, [<span style="color: #333333;">&</span>]() <span style="color: #888888;">//wait_ will acquire the lock if condition satisfied</span>
{
<span style="color: #008800; font-weight: bold;">return</span> stop_ <span style="color: #333333;">||</span> <span style="color: #333333;">!</span>input_.empty();
});
<span style="color: #888888;">//3. stop the thread in destructor</span>
<span style="color: #008800; font-weight: bold;">if</span>(stop_){
<span style="color: #008800; font-weight: bold;">return</span>;
}
<span style="color: #888888;">//4. convert and write the results into result_</span>
<span style="color: #888888;">//we need gmutex to synchronize the result_, else it may incur</span>
<span style="color: #888888;">//race condition in the main thread.</span>
{
lock_guard<span style="color: #333333;"><</span>mutex<span style="color: #333333;">></span> glock(result_mutex);
cv<span style="color: #333333;">::</span>cvtColor(input_, result_, COLOR_BGR2GRAY);
}
<span style="color: #888888;">//5: clear the input_, else the wait_ variable may wake up and continue the task</span>
<span style="color: #888888;">//due to spurious wake up</span>
input_.resize(<span style="color: #0000dd; font-weight: bold;">0</span>);
}
};
thread_ <span style="color: #333333;">=</span> std<span style="color: #333333;">::</span><span style="color: #008800; font-weight: bold;">thread</span>(func);
}
</pre>
</div>
<br />
After we initialize the thread, all we need to do is call the process api whenever we need to convert an image from BGR channels to a gray image.<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> async_to_gray_algo<span style="color: #333333;">::</span>process(Mat input)
{
{
lock_guard<span style="color: #333333;"><</span>mutex<span style="color: #333333;">></span> lock(mutex_);
input_ <span style="color: #333333;">=</span> input;
}</pre>
<pre style="line-height: 125%; margin: 0px;"> //wait condition will acquire the mutex after it receive notification </pre>
<pre style="line-height: 125%; margin: 0px;"> wait_.notify_one();
}
</pre>
</div>
<br />
If we do not need this class anymore, we can and should stop it in the destructor. Following the rule of RAII whenever you can is a best practice that keeps your code clean, robust and (much) easier to maintain (let the machine do the bookkeeping for humans).<br />
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">async_to_gray_algo<span style="color: #333333;">::~</span>async_to_gray_algo()
{
{
lock_guard<span style="color: #333333;"><</span>mutex<span style="color: #333333;">></span> lock(mutex_);
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">true</span>;
}
wait_.notify_one();
thread_.join();
}
</pre>
</div>
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><b><u>What is spurious wake up?</u></b></span></h2>
It means the condition_variable may wake up even though no notification (notify_one or notify_all) happened. This is one of the reasons why we should not wait without a predicate (another reason is the lost wakeup).<br />
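Here is a minimal sketch of the difference, reusing the members of async_to_gray_algo.<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">std::unique_lock&lt;std::mutex&gt; lock(mutex_);
//unsafe, it may return on a spurious wakeup, or block forever if the
//notification fired before we started waiting(a lost wakeup)
wait_.wait(lock);
//safe, the predicate is rechecked every time the thread wakes up
wait_.wait(lock, [&amp;](){ return stop_ || !input_.empty(); });
</pre>
</div>
<br />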
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Do we have a better way to reuse the thread?</b></u></span></h2>
Yes, we do. The easiest solution is to create a generic thread pool; you can check the code of a simple thread pool <a href="https://github.com/stereomatchingkiss/ocv_libs/tree/master/thread">here</a>. I will show you how to use it in the future. <br />
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Better way to pass the variable between different thread?</b></u></span></h2>
As you can see, the way I communicate between the main thread and the worker thread is awkward; it would be hell to maintain source code like that as your program grows bigger and bigger. Fortunately, we have a better way to pass variables between threads with the help of <a href="https://www.qt.io/">Qt5</a>: its signal and slot mechanism. Not to mention, Qt5 makes the code much easier to maintain.<br />
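Here is a minimal sketch of how the Qt5 version could look; to_gray_worker is a hypothetical class, and cv::Mat must be registered as a metatype before it can travel through a queued(cross thread) connection.<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">#include &lt;QObject&gt;
#include &lt;opencv2/imgproc.hpp&gt;

class to_gray_worker : public QObject
{
    Q_OBJECT
public slots:
    void process(cv::Mat input)
    {
        cv::Mat result;
        cv::cvtColor(input, result, cv::COLOR_BGR2GRAY);
        emit done(result);
    }
signals:
    void done(cv::Mat result);
};
Q_DECLARE_METATYPE(cv::Mat)

int main(int, char**)
{
    //register cv::Mat before the first queued emission crosses a thread
    qRegisterMetaType&lt;cv::Mat&gt;();
    //create a QThread, moveToThread the worker, connect the signals as usual...
    return 0;
}
</pre>
</div>
<br />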
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Summary </b></u></span></h2>
<span style="color: blue;"><span style="color: #666666;"><span style="color: black;"> The source codes of async_opencv_video_capture could find on <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/async_vision_algorithm">github</a>.</span></span></span><br /><br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-37266269867354007552019-03-23T05:51:00.001-07:002021-02-26T08:36:07.141-08:00Asynchronous videoCapture of opencv Today I would like to introduce how to create an asynchronous videoCapture by opencv and standard library of c++. Captured video from HD video, especially the HD video from internet could be a time consuming task, it is not a good idea to waste the cpu cycle to wait the frame arrive, in order to speed up our app, or keep the gui alive, we better put the video capture part into another thread.<br />
<br />
With the help of the thread facilities added since c++11, making the VideoCapture of opencv support cross platform asynchronous read operations becomes a simple task, <span style="color: #00677c;">let us look at a simple example.</span><br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #557799;">#include <ocv_libs/camera/async_opencv_video_capture.hpp></span>
<span style="color: #557799;">#include <opencv2/core.hpp></span>
<span style="color: #557799;">#include <opencv2/highgui.hpp></span>
<span style="color: #557799;">#include <iostream></span>
<span style="color: #557799;">#include <mutex></span>
<span style="color: #333399; font-weight: bold;">int</span> <span style="color: #0066bb; font-weight: bold;">main</span>(<span style="color: #333399; font-weight: bold;">int</span> argc, <span style="color: #333399; font-weight: bold;">char</span> <span style="color: #333333;">*</span>argv[])
{
<span style="color: #008800; font-weight: bold;">if</span>(argc <span style="color: #333333;">!=</span> <span style="color: #0000dd; font-weight: bold;">2</span>){
std<span style="color: #333333;">::</span>cerr<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"must enter url of media</span><span style="background-color: #fff0f0; color: #666666; font-weight: bold;">\n</span><span style="background-color: #fff0f0;">"</span>;
<span style="color: #008800; font-weight: bold;">return</span> <span style="color: #333333;">-</span><span style="color: #0000dd; font-weight: bold;">1</span>;
}
std<span style="color: #333333;">::</span>mutex emutex;
<span style="color: #888888;">//create the functor to handle the exception when cv::VideoCapture fail</span>
<span style="color: #888888;">//to capture the frame and wait 30 msec between each frame</span>
<span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span> constexpr wait_msec <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">30</span>;
ocv<span style="color: #333333;">::</span>camera<span style="color: #333333;">::</span>async_opencv_video_capture<span style="color: #333333;"><></span> cl([<span style="color: #333333;">&</span>](std<span style="color: #333333;">::</span>exception <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ex)
{
<span style="color: #888888;">//cerr of c++ is not a thread safe class, so we need to lock the mutex</span>
std<span style="color: #333333;">::lock_guard</span><span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(emutex);
std<span style="color: #333333;">::</span>cerr<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"camera exception:"</span><span style="color: #333333;"><<</span>ex.what()<span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
<span style="color: #008800; font-weight: bold;">return</span> <span style="color: #007020;">true</span>;
}, wait_msec);
cl.open_url(argv[<span style="color: #0000dd; font-weight: bold;">1</span>]);
<span style="color: #888888;">//add listener to process captured frame</span>
<span style="color: #888888;">//the listener could process the task in another thread too,</span>
<span style="color: #888888;">//to make things easier to explain, I prefer to process it in</span>
<span style="color: #888888;">//the same thread of videoCapture</span>
cv<span style="color: #333333;">::</span>Mat img;
cl.add_listener([<span style="color: #333333;">&</span>](cv<span style="color: #333333;">::</span>Mat input)
{
std<span style="color: #333333;">::</span><span style="color: #333333;">lock_guard</span><span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(emutex);
img <span style="color: #333333;">=</span> input;
}, <span style="color: #333333;">&</span>emutex);
<span style="color: #888888;">//execute the task(s)</span>
cl.run();
<span style="color: #888888;">//We must display the captured image at main thread but not</span>
<span style="color: #888888;">//in the listener, because every manipulation related to gui</span>
<span style="color: #888888;">//must perform in the main thread(it also called gui thread)</span>
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">int</span> finished <span style="color: #333333;">=</span> <span style="color: #007020;">false</span>; finished <span style="color: #333333;">!=</span> <span style="color: #0044dd;">'q'</span>;){
finished <span style="color: #333333;">=</span> std<span style="color: #333333;">::</span>tolower(cv<span style="color: #333333;">::</span>waitKey(<span style="color: #0000dd; font-weight: bold;">30</span>));
std<span style="color: #333333;">::</span><span style="color: #333333;">lock_guard</span><span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(emutex);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>img.empty()){
cv<span style="color: #333333;">::</span>imshow(<span style="background-color: #fff0f0;">"frame"</span>, img);
}
}
}
</pre>
</div>
<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Important details of async_opencv_video_capture</b></u></span></h2>
<div style="text-align: center;">
<span style="color: #0b5394;"><b><span style="color: blue;"><u>1. Create an infinite for loop to read the frame in another thread</u></span></b></span></div>
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">run</span>()
{
<span style="color: #008800; font-weight: bold;">if</span>(thread_){
<span style="color: #888888;">//before we start the thread,</span>
<span style="color: #888888;">//we need to stop it</span>
set_stop(<span style="color: #007020;">true</span>);
<span style="color: #888888;">//call join before task(s)</span>
<span style="color: #888888;">//of the thread done</span>
thread_<span style="color: #333333;">-></span>join();
set_stop(<span style="color: #007020;">false</span>);
}
<span style="color: #888888;">//create a new thread</span>
create_thread();
}
</pre>
</div>
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"> <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">create_thread</span>()
{
thread_ <span style="color: #333333;">=</span> std<span style="color: #333333;">::</span>make_unique<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span><span style="color: #008800; font-weight: bold;">thread</span><span style="color: #333333;">></span>([<span style="color: #008800; font-weight: bold;">this</span>]()
{
<span style="color: #888888;">//read the frames in infinite for loop</span>
<span style="color: #008800; font-weight: bold;">for</span>(cv<span style="color: #333333;">::</span>Mat frame;;){
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>Mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>stop_ <span style="color: #333333;">&&</span> <span style="color: #333333;">!</span>listeners_.empty()){
try{
cap_<span style="color: #333333;">>></span>frame;
}<span style="color: #008800; font-weight: bold;">catch</span>(std<span style="color: #333333;">::</span>exception <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ex){
<span style="color: #888888;">//reopen the camera if exception thrown ,this may happen frequently when you</span>
<span style="color: #888888;">//receive frames from network</span>
cap_.open(url_);
cam_exception_listener_(ex);
}
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>frame.empty()){
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #333333;">&</span>val <span style="color: #333333;">:</span> listeners_){
val.second(frame);
}
}<span style="color: #008800; font-weight: bold;">else</span>{
<span style="color: #008800; font-weight: bold;">if</span>(replay_){
cap_.open(url_);
}<span style="color: #008800; font-weight: bold;">else</span>{
<span style="color: #008800; font-weight: bold;">break</span>;
}
}
std<span style="color: #333333;">::</span>this_thread<span style="color: #333333;">::</span>sleep_for(wait_for_);
}<span style="color: #008800; font-weight: bold;">else</span>{
<span style="color: #008800; font-weight: bold;">break</span>;
}
}
});
}
</pre>
</div>
<br />
The listeners_ member stores the std::function&lt;void(cv::Mat)&gt; callbacks, keyed by the pointer passed to add_listener, to be called in the infinite loop whenever the frame read by the VideoCapture is not empty. The users must handle the exceptions thrown by those functors by themselves, else the app will crash.<br />
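For instance, here is a minimal sketch of a self-guarding listener, reusing the cl and emutex of the example above; listener_key only exists to act as a unique key.<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">int listener_key = 0;
cl.add_listener([&amp;](cv::Mat input)
{
    try{
        //do the real work with input here
    }catch(std::exception const &amp;ex){
        //the capture loop will not catch exceptions thrown by listeners
        std::lock_guard&lt;std::mutex&gt; lock(emutex);
        std::cerr&lt;&lt;"listener exception:"&lt;&lt;ex.what()&lt;&lt;std::endl;
    }
}, &amp;listener_key);
</pre>
</div>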
<br />
<div style="text-align: center;">
<span style="color: #0b5394;"><u><span style="color: blue;">2. Stop the thread in the destructor</span></u></span></div>
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">set_stop</span>(<span style="color: #333399; font-weight: bold;">bool</span> val)
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>Mutex<span style="color: #333333;">></span> lock(mutex_);
stop_ <span style="color: #333333;">=</span> val;
}
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">stop</span>()
{
set_stop(<span style="color: #007020;">true</span>);
}
<span style="color: #008800; font-weight: bold;">template</span><span style="color: #333333;"><</span><span style="color: #008800; font-weight: bold;">typename</span> Mutex<span style="color: #333333;">></span>
async_opencv_video_capture<span style="color: #333333;"><</span>Mutex<span style="color: #333333;">>::~</span>async_opencv_video_capture()
{
stop();
thread_<span style="color: #333333;">-></span>join();
}
</pre>
</div>
<br />
<br />
We must stop and join the thread in the destructor, else the thread may never end and cause the app freeze.<br />
<br />
<div style="text-align: center;">
<span style="color: #0b5394;"><u><b><span style="color: blue;">3. Select mutex type by template</span></b></u></span></div>
<br />
By default, async_opencv_video_capture uses std::mutex; it is more efficient but may cause a deadlock if you call the API of async_opencv_video_capture from within the listeners. If you want to avoid this deadlock issue, use std::recursive_mutex instead of std::mutex.<br />
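Here is a minimal sketch of the instantiation, following the constructor signature used in the example above.<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">//the default std::mutex is cheaper, but a listener which calls back into the
//api of the same object would deadlock, so we pick std::recursive_mutex here
ocv::camera::async_opencv_video_capture&lt;std::recursive_mutex&gt; cl(
[](std::exception const &amp;ex)
{
    std::cerr&lt;&lt;"camera exception:"&lt;&lt;ex.what()&lt;&lt;std::endl;
    return true;
}, 30);
//now a listener may safely call the api of cl, e.g. cl.stop(), inside the callback
</pre>
</div>
<br />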
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Summary </b></u></span></h2>
<span style="color: blue;"><span style="color: #666666;"><span style="color: black;"> The source codes of async_opencv_video_capture could find on <a href="https://github.com/stereomatchingkiss/ocv_libs/tree/master/camera">github</a>.</span></span></span><br /><br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-78850447376645078142019-03-09T17:07:00.002-08:002019-03-09T17:07:18.917-08:00Build mxnet 1.3.1 on windows If you were like me, tried to build mxnet 1.3.1 on windows, you may suffer a lot of pains since mxnet do not have decent support on windows, apparently the developers of mxnet do not perform enough tests(maybe none) on windows before they release the stable version. Despite of all of the troubles mxnet brought, it is still a nice tool of deep learning, that is why I am still prefer to work with it.<br />
<br />
I believe one of the best ways to make an open source project better is to contribute something back to it; that is why I would like to write down how to build mxnet 1.3.1 on windows step by step.<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u>1. Do not build mxnet on windows with intel mkl</u></span></h2>
Do not do this unless you are asking for trouble; please check the details on <a href="https://stackoverflow.com/questions/55017740/mxnet-build-with-intel-mkl-always-throw-error-intel-mkl-fatal-error-cannot-loa">stackoverflow</a> and <a href="https://github.com/apache/incubator-mxnet/issues/14343#issuecomment-470043731">issue 14343</a>.<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>2. Build openBLAS with native msvc ABI</b></u></span></h2>
<br />
The openBLAS binaries posted <a href="https://sourceforge.net/projects/openblas/files/v0.2.14/">here</a> do not work with vc2015 anymore (if you updated your vc2015); the ABI is not compatible with msvc. The easiest way to solve this issue is to build openBLAS by yourself. The steps are:<br />
<br />
a. Clone openBLAS of xianyi from <a href="https://github.com/xianyi/OpenBLAS">github</a><br />
b. Compile openBLAS following the instructions shown <a href="https://github.com/xianyi/OpenBLAS/wiki/How-to-use-OpenBLAS-in-Microsoft-Visual-Studio">here</a>. Do not install Anaconda and miniconda together; just pick one of them. If you do not know where your vcvars64.bat is on your pc, I suggest you use <a href="https://filehippo.com/download_everything/">Everything</a> to find the path.<br />
c. Copy the files (cblas.h, f77blas.h) from the generated folder into the build folder.<br />
<h2 style="text-align: center;">
<u><span style="color: blue;">3. Clone mxnet fork by me</span></u></h2>
<br />
<pre><code>git clone --recursive https://github.com/stereomatchingkiss/incubator-mxnet</code></pre>
<pre><code>cd mxnet </code></pre>
<pre><code>git checkout 1.3.1_win_compile_fix</code></pre>
<pre><code><span style="font-family: Times, &quot;Times New Roman&quot;, serif;"> This branch fixes some type mismatch errors.</span></code></pre>
<div style="text-align: center;">
<h2>
<u><span style="color: blue;"><b><code><span style="font-family: Times, "Times New Roman", serif;">4. Comment out codes in shuffel_op.cu</span></code></b></span></u></h2>
</div>
<pre><code><span style="font-family: Times, "Times New Roman", serif;"> This file is under the folder "mxnet\src\operator\random", there is a function ShuffleForwardGPU, </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">comment out the implementation else there will have a lot of compile times errors(no suitable </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">user-defined conversion from "mshadow::Tensor<mxnet::gpu, 1, mxnet::index_t>" to </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">"const mshadow::Tensor<mshadow::gpu, 1, unsigned int>" exists).</span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;"> </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;"> I guess this function would not be called when doing inference task, after all who would like to </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">make their inference results become unpredictable? If you were like me, only want to use </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">cpp_package to do the inference task, you should be safe to comment out the codes. </span></code></pre>
<div style="text-align: center;">
<h2>
<span style="color: blue;"><u><code><span style="font-family: Times, "Times New Roman", serif;">5.</span> <b><span style="font-family: Times, "Times New Roman", serif;">Open cmake</span></b></code></u></span></h2>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOkKu2QONNOH7vpXzkGWzW1JnL8GM2V6RXZmcy_8sSVGddpJPACgZtCHlB0jy_DCNtd3iFaEJHZgR3bctxE6b-Gh1riMrQVXYDHqhyphenhyphenHltFOPxXvqnkE2ws4elMQWwYf3DmkkjKYz-EWiQ/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="665" data-original-width="1041" height="255" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOkKu2QONNOH7vpXzkGWzW1JnL8GM2V6RXZmcy_8sSVGddpJPACgZtCHlB0jy_DCNtd3iFaEJHZgR3bctxE6b-Gh1riMrQVXYDHqhyphenhyphenHltFOPxXvqnkE2ws4elMQWwYf3DmkkjKYz-EWiQ/s400/Capture.JPG" width="400" /></a></div>
<pre><code><span style="font-family: Times, &quot;Times New Roman&quot;, serif;"> Open your cmake and select msvc with 64 bits.</span></code></pre>
<div style="text-align: center;">
<h2>
<code><span style="font-family: Times, "Times New Roman", serif;"><span style="color: blue;"><u><b>6. Configuration</b></u></span></span></code></h2>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi944Affvwzdm6Nsj7qcTQyGIUETScvfjw-EZddYQwNtAI9bPqF1SDLvyJZEgwz_2b6i6sXrqDQkw10OkbB92bfhezsgzSY769eu7wSyCQHNZmvGalvoClCjkQPF4d6L7QutYL5_8VW0LE/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="532" data-original-width="1600" height="132" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi944Affvwzdm6Nsj7qcTQyGIUETScvfjw-EZddYQwNtAI9bPqF1SDLvyJZEgwz_2b6i6sXrqDQkw10OkbB92bfhezsgzSY769eu7wSyCQHNZmvGalvoClCjkQPF4d6L7QutYL5_8VW0LE/s400/Capture.JPG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiABMmS3q2tIlLJ_UJMQMp9pxitpctYJc6p-s-xo7XfyG2O3XQhH5wVCc1074YvDurANJ9_jcDM0MllpdyljptkxTbdGR_pdEf3vMmrR5SGGtsPprCqccaacPdRxMEPCd2PA78tolWnigQ/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="543" data-original-width="1600" height="135" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiABMmS3q2tIlLJ_UJMQMp9pxitpctYJc6p-s-xo7XfyG2O3XQhH5wVCc1074YvDurANJ9_jcDM0MllpdyljptkxTbdGR_pdEf3vMmrR5SGGtsPprCqccaacPdRxMEPCd2PA78tolWnigQ/s400/Capture.JPG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<h3>
<code><span style="font-family: Times, "Times New Roman", serif;"> The most important note are</span></code></h3>
<h3>
<code><span style="font-family: Times, "Times New Roman", serif;">1. <span style="color: red;"><b>Do not use anything related to intel MKL</b></span>.</span></code><span style="color: red;"><code></code></span></h3>
<h3>
<span style="color: red;"><code><span style="font-family: Times, "Times New Roman", serif;"><span style="color: black;">2.</span> Do not build cpp_package at the first time</span></code></span></h3>
<pre><code><span style="font-family: Times, "Times New Roman", serif;"> Without mkl mxnet cannot exploit full power of the cpu, but with it your app cannot run at all, </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">depending on how you build it, your app may throw the error </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">"</span></code><code><span style="font-family: Times, "Times New Roman", serif;"><strong>Intel MKL FATAL ERROR: Cannot load mkl_intel_thread.dll.</strong>" or </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">"Check failed: MXNDArrayWaitToRead(blob_ptr_->handle_) == 0 (-1 vs. 0)"</span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">
</span></code></pre>
If you do not need CUDA, uncheck the options USE_CUDA and USE_CUDNN. After that, click Configure, uncheck BUILD_TESTING, click Configure again, then click Generate. The command-line equivalent is sketched below.<br />
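<br />
For those who prefer the command line, the same configuration looks roughly like this (a sketch; the option names are the ones mentioned above):<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">cmake -G "Visual Studio 14 2015 Win64" -DUSE_CUDA=OFF -DUSE_CUDNN=OFF -DBUILD_TESTING=OFF ..
</pre>
</div>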
<br />
<h2 style="text-align: center;">
<u><span style="color: blue;">7. Build mxnet without cpp_package</span></u></h2>
<br />
a. Open your ALL_BUILD.vcxproj<br />
b. Navigate to the project "mxnet"<br />
c. Right click your mouse, select "Properties"<br />
d. Select Linker->Input<br />
e. Link to flangmain.lib, flangrti.lib, flang.lib, ompstub.lib. For example, my paths are<br />
<br />
C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\lib\flangmain.lib <br /> C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\lib\flangrti.lib<br /> C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\lib\flang.lib<br /> C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\lib\ompstub.lib<br /><br />
If you do not know where they are, use <a href="https://filehippo.com/download_everything/">Everything </a>to find the paths.<br />
<br />
f. Navigate to the project "ALL_Build"<br />
g. Right click your mouse, click "Build". A command-line equivalent is sketched below.<br />
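<br />
If you prefer building from the command line, something like the following should be equivalent (a sketch; run it from the developer command prompt inside the build folder):<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">msbuild ALL_BUILD.vcxproj /p:Configuration=Release /p:Platform=x64
</pre>
</div>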
<h2 style="text-align: center;">
<span style="color: blue;"><u>8. Configure cmake to build with cpp_package</u></span></h2>
<br />
Now we can build mxnet with cpp_package; let us go back to CMake and change some settings. <br />
<br />
a. (Optional) Change your install path, otherwise you may not be able to install (e.g., change it to C:/Users/yyyy/programs/Qt/3rdLibs/mxnet/build_gpu_1_3_1_temp/install).<br />
<br />
b. Make sure python is on your PATH. If you are building the 32/64-bit version of mxnet, you need a 32/64-bit python respectively, otherwise you will not be able to generate op.h. I suggest you use Rapid Environment Editor to manage your PATH on windows. You can verify which python windows will pick up as shown below.<br />
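<br />
To check which python executable windows will pick up, and whether its bitness matches the build, you can run the following in a command prompt:<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">where python
rem prints 64 for a 64-bit python, 32 for a 32-bit one
python -c "import struct; print(struct.calcsize('P') * 8)"
</pre>
</div>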
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-SAMlW9QwhID05QuJuzHXwJlYRneAGXXu-0X73WBU0tDTKDUHTt-AFTbKWJ8excUCND4mBMIaNIMSrd0wCeMfgl1H1z8ZEPDuyiF9wJryVvkn8Gbn4beqo2h3E1FRabuwdrqv7E1MtCQ/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="516" height="230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-SAMlW9QwhID05QuJuzHXwJlYRneAGXXu-0X73WBU0tDTKDUHTt-AFTbKWJ8excUCND4mBMIaNIMSrd0wCeMfgl1H1z8ZEPDuyiF9wJryVvkn8Gbn4beqo2h3E1FRabuwdrqv7E1MtCQ/s320/Capture.JPG" width="320" /></a></div>
<br />
If Visual Studio complains that it cannot find the python exe, reopen Visual Studio.<br />
<br />
c. Check USE_CPP_PACKAGE, uncheck BUILD_TESTING, click Configure, then Generate<br />
<br />
d. Remove the example projects since they will hinder the build process; those projects are alexnet, charRNN, googleNet, inception_bn, lenet, lenet_with_mxdataiter, mlp, mlp_cpu, mlp_gpu, resnet.<br />
<br />
e. Go to your build/Release folder and copy libmxnet.dll into any folder which windows can find (i.e., a folder listed in the PATH); let us call that folder global_path.<br />
<br />
f. Open your developer command prompt (mine is the developer command prompt for vs2015); let us call it DCP.<br />
<br />
g. Navigate your DCP to global_path<br />
<br />
h. Enter "dumpbin /dependents libmxnet.dll", this command will show you the dependencies of this<br />
dll. In my case, it show <br />
<br />
flangrti.dll<br />
flang.dll<br />
ompstub.dll<br />
cudnn64_7.dll<br />
cublas64_92.dll<br />
cufft64_92.dll<br />
cusolver64_92.dll<br />
curand64_92.dll<br />
nvrtc64_92.dll<br />
nvcuda.dll<br />
KERNEL32.dll<br />
VCOMP140.DLL<br />
<br />
We only need to copy flangrti.dll, flang.dll and ompstub.dll into global_path in order to generate op.h, because the other dlls already exist in the PATH. Again, please use <a href="https://filehippo.com/download_everything/">Everything </a>to find the paths; a sketch of the copy step follows.<br />
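<br />
Assuming the flang dlls live under the conda package's Library\bin folder (an assumption based on the lib paths from step 7e; confirm the real locations with Everything first), the copy step in the DCP, already navigated to global_path, would look like this:<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">:: the dll locations are assumptions, verify them with Everything
copy C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\bin\flangrti.dll .
copy C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\bin\flang.dll .
copy C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\bin\ompstub.dll .
</pre>
</div>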
<br />
i. You need to link mxnet to flangmain.lib, flangrti.lib, flang.lib and ompstub.lib again (as in step 7), since Generate clears them.<br />
<br />
j. Navigate to the project "ALL_Build" <br />
<br />
k. Right click your mouse, click build.<br />
<br />
l. Navigate to the project "INSTALL" <br />
<br />
m. Right click your mouse, click build.<br />
<br />
n. Copy cpp-package\include\mxnet-cpp into build/install/include<br />
<br />
o. Copy mxnet\3rdparty\tvm\nnvm\include\nnvm into build/install/include<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>9. Add mx_float for scale, int for num_filter</b></u></span></h2>
<span style="color: #092e64;"> The op.h generated by this solution, there are two parameters lack type declaration, you need to add them by yourself.</span><br />
<span style="color: #092e64;"><br /></span>
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Conclusion</b></u></span></h2>
<span style="color: #092e64;"> Congratulation, now you have build the mxnet successfully, to tell you the truth, this is not a pleasant journey, there are too many bugs/issues when I try to build mxnet1.3.1on windows(</span><span style="color: #092e64;"><span style="color: #092e64;">1.4.0 got more bugs on windows when you try to build it</span>) , there are many bugs should be found before they release the major version if them have tried to build mxnet on windows. I believe windows and cpp_package are not their main concern yet, let us hope that </span><br />
<br />
<span style="color: #092e64;">a. Someday they can put more love into windows and cpp_package. Windows still dominate market of desktop/laptop and cpp_package is a much better choice than python if you want to do edge deployment.</span><br />
<br />
<span style="color: #092e64;">b. Adopt a commit system like opencv(whenever you commit your codes, opencv build it on every single platforms they support), this could prevent a log of bugs, the later you adopt, the more cost you need to pay for cross-platform.</span><br />
<span style="color: #092e64;"><br /></span>
<span style="color: #092e64;">c. Let us cross our finger, hope them can fix all of these bugs before next version release</span><br />
<span style="color: #092e64;"><br /></span>
<span style="color: #092e64;"><br /></span>ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com3tag:blogger.com,1999:blog-4702230343097536610.post-35888634169113684472019-02-24T01:45:00.002-08:002021-02-26T08:36:47.470-08:00Age and gender classification by opencv, dlib and mxnet In this post, I will show you how to build an age gender classification application with the infrastructures I created in the <a href="https://qtandopencv.blogspot.com/2019/02/face-recognition-with-mxnet-dlib-and.html"><b>last post</b></a>. Almost everything are same as before, except the part of parsing the NDArray in the forward function.<br />
<br />
Before we dive into the source codes, let us look at some examples. The images below are predicted by two networks and concatenated: the left side is predicted by the <a href="https://github.com/wordshub/free-font">light model</a>, the right side by a heavy model based on <a href="https://github.com/deepinsight/insightface/tree/master/gender-age">resnet50</a>. Both models come from <a href="https://github.com/deepinsight/insightface/tree/master/gender-age">insightface</a>.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy9ZL2CL756QeoXYq-5ISSahKNycX1C4iBgN7yQukRved5jJ-Empd8TSRKoPWXtBc2cDPLIdGB-yZnSzF1Lu-14V3Ud6-Q6P0fzWA-heJXmcFJrVszeBJkblQUM-awAA-yyBdOPlT4PIU/s1600/40dcef12da3505d4dba0cd70d1a7fd6b.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="532" data-original-width="1600" height="132" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy9ZL2CL756QeoXYq-5ISSahKNycX1C4iBgN7yQukRved5jJ-Empd8TSRKoPWXtBc2cDPLIdGB-yZnSzF1Lu-14V3Ud6-Q6P0fzWA-heJXmcFJrVszeBJkblQUM-awAA-yyBdOPlT4PIU/s400/40dcef12da3505d4dba0cd70d1a7fd6b.png" width="400" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwxhNEjVZgSqFX-lzm621cG2N95l_cOoa_UsuX0me04hW4FKWj073HCov7MjO7FwWvkrsfbHyNjkCIBH8Xv6MKdTpcLxaJ6xTtpF9d0EA5aaSFvSC-_wu_R2tC32fXIqyTpnO5AYxwqRs/s1600/2018-11-06t054310z-1334124005-rc1be15a8050-rtrmadp-3-people-sexiest-man.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="350" data-original-width="1240" height="112" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwxhNEjVZgSqFX-lzm621cG2N95l_cOoa_UsuX0me04hW4FKWj073HCov7MjO7FwWvkrsfbHyNjkCIBH8Xv6MKdTpcLxaJ6xTtpF9d0EA5aaSFvSC-_wu_R2tC32fXIqyTpnO5AYxwqRs/s400/2018-11-06t054310z-1334124005-rc1be15a8050-rtrmadp-3-people-sexiest-man.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYoB2azpiIetd-IoQDsNjJZsaSL9KivwJzuBEFP9msExyzaIBEJg1tkrs7RYAlao6zn_0oV1cDCntfBgP89-E9Nw778yc2LMzlS0UUMU91Xzb4k2sn5UM5tpWewX2nkH9uLi8s5KCWrvs/s1600/getty_486119835_367858.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="372" data-original-width="1600" height="92" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYoB2azpiIetd-IoQDsNjJZsaSL9KivwJzuBEFP9msExyzaIBEJg1tkrs7RYAlao6zn_0oV1cDCntfBgP89-E9Nw778yc2LMzlS0UUMU91Xzb4k2sn5UM5tpWewX2nkH9uLi8s5KCWrvs/s400/getty_486119835_367858.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8-ZoGATHKdbQtQ7AbYqx_gqyYN3PKuFgHpA3KZb6E_8fM7fbUYwFGj9r_mZ1_U6A9fzG4sJXKOutFh5zom6m8lMEWtJxh-Gl8kX8SOe6CA60Y_UkyIp_rpkKeU2lGsqdawB4UEvucjlU/s1600/maxresdefault.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="1600" height="112" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8-ZoGATHKdbQtQ7AbYqx_gqyYN3PKuFgHpA3KZb6E_8fM7fbUYwFGj9r_mZ1_U6A9fzG4sJXKOutFh5zom6m8lMEWtJxh-Gl8kX8SOe6CA60Y_UkyIp_rpkKeU2lGsqdawB4UEvucjlU/s400/maxresdefault.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWdlvu3SHk_xEztFD8aTQYV1pGk9pJg8DUSWKLNNIJNbnmzGkjQyCehqoyK7OlnyikpMsvG_s3Oa_SO9raZxtWNjbHJogwdeVFukdZbz5G6lqAgecoQQjd0kYLlCrDae6cbddQrPJ2Bas/s1600/t.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="375" data-original-width="1000" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWdlvu3SHk_xEztFD8aTQYV1pGk9pJg8DUSWKLNNIJNbnmzGkjQyCehqoyK7OlnyikpMsvG_s3Oa_SO9raZxtWNjbHJogwdeVFukdZbz5G6lqAgecoQQjd0kYLlCrDae6cbddQrPJ2Bas/s400/t.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3-Qu5AeCcmPkx4S-zkSGD7ieTtmxg5LPQ0U0KEE4BgAeSr0jSOzXNbq7GXzdp66prhpMFErt_D-R1PCazeKv__XKfrwsEAH61gNIGF8XmSsh_8AW-tsiqGzIEJgMbiFHYUv0KA2mgxAM/s1600/b2d1fd70ee.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="335" data-original-width="960" height="138" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3-Qu5AeCcmPkx4S-zkSGD7ieTtmxg5LPQ0U0KEE4BgAeSr0jSOzXNbq7GXzdp66prhpMFErt_D-R1PCazeKv__XKfrwsEAH61gNIGF8XmSsh_8AW-tsiqGzIEJgMbiFHYUv0KA2mgxAM/s400/b2d1fd70ee.png" width="400" /></a></div>
<br />
The results do not look bad for either model if we do not know the real ages :). Let us use the model to predict the ages of a famous person, like Trump (with resnet50).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/Q0CnnBqu1P8/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/Q0CnnBqu1P8?feature=player_embedded" width="320"></iframe></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/gQrm5m5-uXs/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/gQrm5m5-uXs?feature=player_embedded" width="320"></iframe></div>
<br />
Unfortunately, the results of age classification are not that good under different angles and expressions. This is because age classification from an image is very difficult; even a human <b><span style="color: red;">cannot </span></b>accurately predict the age of a person by looking at a single image.<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u>1. Difference of face recognition and age gender classification</u></span></h2>
The code for parsing face recognition features is<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">std<span style="color: #333333;">::</span>vector<span style="color: #333333;"><</span>insight_face_key<span style="color: #333333;">></span> result;
<span style="color: #333399; font-weight: bold;">size_t</span> constexpr feature_size <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">512</span>;
Shape <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #0066bb; font-weight: bold;">shape</span>(<span style="color: #0000dd; font-weight: bold;">1</span>, feature_size);
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">size_t</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>; i <span style="color: #333333;">!=</span> batch_size; <span style="color: #333333;">++</span>i){
NDArray feature(features.GetData() <span style="color: #333333;">+</span> i <span style="color: #333333;">*</span> feature_size, shape, Context(kCPU, <span style="color: #0000dd; font-weight: bold;">0</span>));
result.emplace_back(std<span style="color: #333333;">::</span>move(feature));
}
</pre>
</div>
<br />
The code for parsing age and gender classification results is<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">std<span style="color: #333333;">::</span>vector<span style="color: #333333;"><</span>insight_age_gender_info<span style="color: #333333;">></span> result;
<span style="color: #333399; font-weight: bold;">int</span> constexpr features_size <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">202</span>;
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">size_t</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>; i <span style="color: #333333;">!=</span> batch_size; <span style="color: #333333;">++</span>i){
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">*</span>ptr <span style="color: #333333;">=</span> features.GetData() <span style="color: #333333;">+</span> i <span style="color: #333333;">*</span> features_size;
insight_age_gender_info info;
info.gender_ <span style="color: #333333;">=</span> ptr[<span style="color: #0000dd; font-weight: bold;">0</span>] <span style="color: #333333;">></span> ptr[<span style="color: #0000dd; font-weight: bold;">1</span>] <span style="color: #333333;">?</span> gender_info<span style="color: #333333;">::</span>female_ <span style="color: #333333;">:</span> gender_info<span style="color: #333333;">::</span>male_;
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">int</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">2</span>; i <span style="color: #333333;"><</span> features_size; i <span style="color: #333333;">+=</span> <span style="color: #0000dd; font-weight: bold;">2</span>){
<span style="color: #008800; font-weight: bold;">if</span>(ptr[i <span style="color: #333333;">+</span> <span style="color: #0000dd; font-weight: bold;">1</span>] <span style="color: #333333;">></span> ptr[i]){
info.age_ <span style="color: #333333;">+=</span> <span style="color: #0000dd; font-weight: bold;">1</span>;
}
}
result.emplace_back(info);
}
</pre>
</div>
<br />
<br />
Except for this part, <span style="color: blue;"><b>everything is the same</b></span> as before.<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u>2. Make codes easier to reuse</u></span></h2>
It is a <span style="color: blue;"><b>pain </b></span>to maintain similar codes with <span style="color: blue;"><b>minor </b><b>difference</b></span>, in order to alleviate the prices of maintenance, I create a generic predictor as a template class with three policies, implement the face recognition and age/gender classification with this generic predictor.<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">template</span><span style="color: #333333;"><</span><span style="color: #008800; font-weight: bold;">typename</span> Return, <span style="color: #008800; font-weight: bold;">typename</span> ProcessFeature, <span style="color: #008800; font-weight: bold;">typename</span> ImageConvert <span style="color: #333333;">=</span> dlib_mat_to_separate_rgb<span style="color: #333333;">></span>
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">generic_predictor</span>
{
<span style="color: #888888;">/*please check the details on github*/</span>
}
</pre>
</div>
<br />
We can use it to create an age/gender predictor as follows<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">struct</span> predict_age_gender_functor
{
std<span style="color: #333333;">::</span>vector<span style="color: #333333;"><</span>insight_age_gender_info<span style="color: #333333;">></span>
<span style="color: #008800; font-weight: bold;">operator</span>()(<span style="color: #008800; font-weight: bold;">const</span> mxnet<span style="color: #333333;">::</span>cpp<span style="color: #333333;">::</span>NDArray <span style="color: #333333;">&</span>features, <span style="color: #333399; font-weight: bold;">size_t</span> batch_size) <span style="color: #008800; font-weight: bold;">const</span>
{
std<span style="color: #333333;">::</span>vector<span style="color: #333333;"><</span>insight_age_gender_info<span style="color: #333333;">></span> result;
<span style="color: #333399; font-weight: bold;">int</span> constexpr features_size <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">202</span>;
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">size_t</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>; i <span style="color: #333333;">!=</span> batch_size; <span style="color: #333333;">++</span>i){
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">*</span>ptr <span style="color: #333333;">=</span> features.GetData() <span style="color: #333333;">+</span> i <span style="color: #333333;">*</span> features_size;
insight_age_gender_info info;
info.gender_ <span style="color: #333333;">=</span> ptr[<span style="color: #0000dd; font-weight: bold;">0</span>] <span style="color: #333333;">></span> ptr[<span style="color: #0000dd; font-weight: bold;">1</span>] <span style="color: #333333;">?</span> gender_info<span style="color: #333333;">::</span>female_ <span style="color: #333333;">:</span> gender_info<span style="color: #333333;">::</span>male_;
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">int</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">2</span>; i <span style="color: #333333;"><</span> features_size; i <span style="color: #333333;">+=</span> <span style="color: #0000dd; font-weight: bold;">2</span>){
<span style="color: #008800; font-weight: bold;">if</span>(ptr[i <span style="color: #333333;">+</span> <span style="color: #0000dd; font-weight: bold;">1</span>] <span style="color: #333333;">></span> ptr[i]){
info.age_ <span style="color: #333333;">+=</span> <span style="color: #0000dd; font-weight: bold;">1</span>;
}
}
result.emplace_back(info);
}
<span style="color: #008800; font-weight: bold;">return</span> result;
}
};
<span style="color: #008800; font-weight: bold;">using</span> insight_age_gender_predict <span style="color: #333333;">=</span> mxnet_aux<span style="color: #333333;">::</span>generic_predictor<span style="color: #333333;"><</span>insight_age_gender_info, predict_age_gender_functor<span style="color: #333333;">></span>;
</pre>
</div>
<br />
Please check <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/libs/mxnet/generic_predictor.hpp">github </a>if you want to know the implementation details. A minimal usage sketch follows.<br />
<br />
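The snippet below shows how the predictor could be used; the constructor arguments (model files, context and input shape) are my assumptions, please check the repository for the real signature:<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">//a sketch only : the constructor arguments are assumptions, see github for the real api
insight_age_gender_predict predict("model.params", "model.json",
                                   Context(kCPU, 0), Shape(2, 3, 112, 112));
//aligned_faces is a std::vector<dlib::matrix<dlib::rgb_pixel>> of aligned faces
auto const infos = predict.forward(aligned_faces);
for(auto const &info : infos){
    std::cout<<"age = "<<info.age_<<", gender = "
             <<(info.gender_ == gender_info::male_ ? "male" : "female")<<std::endl;
}
</pre>
</div>
<br />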
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>3. Summary </b></u></span></h2>
Gender prediction works very well; unfortunately, age prediction is far from ideal.<span style="color: blue;"><span style="color: #666666;"><span style="color: black;"><span style="font-family: "times" , "times new roman" , serif;"> If we could obtain a huge data set which contains faces of the same person across different ages, expressions and angles, and in which the races are not super imbalanced</span></span>, <span style="color: black;">the accuracy of age prediction might improve very much, but a huge data set like this is very hard to collect. </span></span></span><br />
<br />
<span style="color: blue;"><span style="color: #666666;"><span style="color: black;"> The source codes could find on <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_age_gender">github</a>.</span></span></span><br /><br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-48301397047316075302019-02-18T04:21:00.001-08:002021-02-26T08:36:56.789-08:00Face recognition with mxnet, dlib and opencv In this post I will show you how to implement an <b><span style="color: blue;">industrial level, </span></b><b><span style="color: blue;">portable</span></b><span style="color: blue;"><b> face recognition application</b></span> with a <b><span style="color: blue;">small, reuseable example</span></b>, without relying on any commercial library(except of Qt5, unless the module I use in this example support LGPL license).<br />
<br />
Before deep learning became the mainstream technology in computer vision, 2D face recognition <b><span style="color: red;">only </span></b>worked well under strict environments, which made it an <span style="color: red;"><b>impractical</b></span> technology.<br />
<br />
Thanks to the contributions of open source communities like <b><a href="http://dlib.net/">dlib</a></b>, <a href="https://opencv.org/"><b>opencv</b> </a>and <b><a href="https://mxnet.apache.org/">mxnet</a></b>, today, high accuracy 2D face recognition is not a difficult problem anymore.<br />
<br />
Before we start, let us see an interesting example(video_00).<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/eHAJh74z-fA/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/eHAJh74z-fA?feature=player_embedded" width="320"></iframe></div>
<div style="text-align: center;">
video_00</div>
<br />
Although different angles and expressions affect the confidence value a lot, most of the time the algorithm is still able to find the most similar face out of the 25 faces. <br />
<br />
The flow of face recognition on <b><a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_face_recognition">github</a></b> is composed of 4 critical steps.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEju-Y175Fs7zfu4xI97dmxYlqCTLC6ZhhzpZR5Ca0umVbDToQfaFeAdslwICASDPw2cucZarTLnxyF8jZQbTLLuCiX8bwOoAh46AwfccD-jPWXVpP5lFo7qXBUnSPKlVsBiz2POonqxahY/s1600/main_flow.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="377" data-original-width="399" height="302" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEju-Y175Fs7zfu4xI97dmxYlqCTLC6ZhhzpZR5Ca0umVbDToQfaFeAdslwICASDPw2cucZarTLnxyF8jZQbTLLuCiX8bwOoAh46AwfccD-jPWXVpP5lFo7qXBUnSPKlVsBiz2POonqxahY/s320/main_flow.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="text-align: center;">
pic_00</div>
<br />
<div style="text-align: center;">
<h2>
<u><b><span style="color: blue;">Detect face by dlib</span></b></u> </h2>
</div>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>mmod_rect<span style="color: purple;">></span> face_detector<span style="color: purple;">::</span>forward_lazy<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> cv<span style="color: purple;">::</span>Mat <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: dimgrey;">//make sure input image got 3 channels</span>
CV_Assert<span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>channels<span style="color: #808030;">(</span><span style="color: #808030;">)</span> <span style="color: #808030;">=</span><span style="color: #808030;">=</span> <span style="color: #008c00;">3</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//Resize the input image to certain width, </span>
<span style="color: dimgrey;">//The bigger the face_detect_width_, more </span>
<span style="color: dimgrey;">//faces could be detected, but will consume</span>
<span style="color: dimgrey;">//more memory, and slower</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>cols <span style="color: #808030;">!</span><span style="color: #808030;">=</span> face_detect_width_<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//resize_cache_ is a simple trick to reduce the</span>
<span style="color: dimgrey;">//number of memory allocation</span>
<span style="color: maroon; font-weight: bold;">double</span> <span style="color: maroon; font-weight: bold;">const</span> ratio <span style="color: #808030;">=</span> face_detect_width_ <span style="color: #808030;">/</span>
<span style="color: maroon; font-weight: bold;">static_cast</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">double</span><span style="color: purple;">></span><span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>cols<span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span>resize<span style="color: #808030;">(</span>input<span style="color: #808030;">,</span> resize_cache_<span style="color: #808030;">,</span> <span style="background: rgb(221, 0, 0) none repeat scroll 0% 0%; color: white; font-style: italic; font-weight: bold;">{</span><span style="background: rgb(221, 0, 0) none repeat scroll 0% 0%; color: white; font-style: italic; font-weight: bold;">}</span><span style="color: #808030;">,</span> ratio<span style="color: #808030;">,</span> ratio<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span><span style="color: maroon; font-weight: bold;">else</span><span style="color: purple;">{</span>
resize_cache_ <span style="color: #808030;">=</span> input<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: dimgrey;">//1. convert cv::Mat to dlib::matrix</span>
<span style="color: dimgrey;">//2. Swap bgr channel to rgb</span>
img_<span style="color: #808030;">.</span>set_size<span style="color: #808030;">(</span>resize_cache_<span style="color: #808030;">.</span>rows<span style="color: #808030;">,</span> resize_cache_<span style="color: #808030;">.</span>cols<span style="color: #808030;">)</span><span style="color: purple;">;</span>
dlib<span style="color: purple;">::</span>assign_image<span style="color: #808030;">(</span>img_<span style="color: #808030;">,</span> dlib<span style="color: purple;">::</span>cv_image<span style="color: purple;"><</span>bgr_pixel<span style="color: purple;">></span><span style="color: #808030;">(</span>resize_cache_<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">return</span> net_<span style="color: #808030;">(</span>img_<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 07:41:45 UTC-->
<br />
The face detector of dlib performs very well; you can check the results in their <b><a href="http://blog.dlib.net/2016/10/easily-create-high-quality-object.html">post</a></b>.<br />
<br />
If you want to know the details, please study the <b><a href="http://dlib.net/dnn_mmod_ex.cpp.html">example provided by dlib</a></b>, if you want to know more options, please study the excellent post of <b><a href="https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/">Learn Opencv</a></b>.<br />
<br />
<h2 style="text-align: center;">
<u><b><span style="color: blue;">Perform face alignment by dlib</span></b></u> </h2>
We can treat face alignment as a data normalization skill developed for face recognition; usually you would align the faces before training your model, and align the faces again when predicting. This could help you obtain higher accuracy.<br />
<br />
With dlib, face alignment becomes very simple, just a few lines of code.<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: dimgrey;">//rect contain the roi of the face</span>
dlib<span style="color: purple;">::</span>matrix<span style="color: purple;"><</span>rgb_pixel<span style="color: purple;">></span> face_detector<span style="color: purple;">::</span>
get_aligned_face<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> mmod_rect <span style="color: #808030;">&</span>rect<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: dimgrey;">//Type of pose_model_ is dlib::shape_predictor</span>
<span style="color: dimgrey;">//It return the landmarks of the face</span>
<span style="color: maroon; font-weight: bold;">auto</span> shape <span style="color: #808030;">=</span> pose_model_<span style="color: #808030;">(</span>img_<span style="color: #808030;">,</span> rect<span style="color: #808030;">)</span><span style="color: purple;">;</span>
matrix<span style="color: purple;"><</span>rgb_pixel<span style="color: purple;">></span> face_chip<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> details <span style="color: #808030;">=</span>
get_face_chip_details<span style="color: #808030;">(</span>shape<span style="color: #808030;">,</span> face_aligned_size_<span style="color: #808030;">,</span> <span style="color: green;">0.25</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//extract face after aligned from the image</span>
extract_image_chip<span style="color: #808030;">(</span>img_<span style="color: #808030;">,</span> details<span style="color: #808030;">,</span> face_chip<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">return</span> face_chip<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 09:06:37 UTC-->
<br />
<h2 style="text-align: center;">
<u><b><span style="color: blue;">Extract features of face by mxnet</span></b></u></h2>
This section needs to load a model with mxnet; unlike dlib or opencv, the c++ api of mxnet is more complicated. If you do not know how to load an mxnet model yet, I recommend you study <b><a href="https://qtandopencv.blogspot.com/2018/10/person-detectionyolo-v3-with-helps-of.html">this post</a></b>.<br />
<br />
This section is the most complicated part, because it contains<b> <span style="color: blue;">three </span></b><span style="color: blue;"><b>main points</b></span>. <br />
<br />
1. Extract the features of faces.<br />
2. Perform batch processing.<br />
3. Convert the aligned face of dlib (stored as <span style="color: purple;">matrix</span><<span style="color: purple;">rgb_pixel></span>) to a <span style="color: blue;">memory continuous</span> float array with <br />
the format expected by the mxnet model.<br />
<br />
<br />
<div style="text-align: center;">
<h3>
<span style="color: blue;"><b><u>A.Load the model with variable batch size</u></b></span></h3>
</div>
<br />
In order to load a model which supports variable batch size, all we need to do is add one more argument to the argument list.<br />
<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">unique_ptr</span><span style="color: purple;"><</span>Executor<span style="color: purple;">></span> create_executor<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span> <span style="color: #808030;">&</span>model_params<span style="color: #808030;">,</span>
<span style="color: maroon; font-weight: bold;">const</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span> <span style="color: #808030;">&</span>model_symbols<span style="color: #808030;">,</span>
<span style="color: maroon; font-weight: bold;">const</span> Context <span style="color: #808030;">&</span>context<span style="color: #808030;">,</span>
<span style="color: maroon; font-weight: bold;">const</span> Shape <span style="color: #808030;">&</span>input_shape<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
Symbol net<span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">map</span><span style="color: purple;"><</span><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span><span style="color: #808030;">,</span> NDArray<span style="color: purple;">></span> args<span style="color: #808030;">,</span> auxs<span style="color: purple;">;</span>
load_check_point<span style="color: #808030;">(</span>model_params<span style="color: #808030;">,</span> model_symbols<span style="color: #808030;">,</span> <span style="color: #808030;">&</span>net<span style="color: #808030;">,</span>
<span style="color: #808030;">&</span>args<span style="color: #808030;">,</span> <span style="color: #808030;">&</span>auxs<span style="color: #808030;">,</span> context<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//if "data" throw exception, try another key, like "data0"</span>
args<span style="color: #808030;">[</span><span style="color: maroon;">"</span><span style="color: #0000e6;">data</span><span style="color: maroon;">"</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> NDArray<span style="color: #808030;">(</span>input_shape<span style="color: #808030;">,</span> context<span style="color: #808030;">,</span> <span style="color: maroon; font-weight: bold;">false</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//we only need to add the new key if batch size larger than 1</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>input_shape<span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span> <span style="color: #808030;">></span> <span style="color: #008c00;">1</span><span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//all we need is the new key "data1"</span>
args<span style="color: #808030;">[</span><span style="color: maroon;">"</span><span style="color: #0000e6;">data1</span><span style="color: maroon;">"</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> NDArray<span style="color: #808030;">(</span>Shape<span style="color: #808030;">(</span><span style="color: #008c00;">1</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span> context<span style="color: #808030;">,</span> <span style="color: maroon; font-weight: bold;">false</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">unique_ptr</span><span style="color: purple;"><</span>Executor<span style="color: purple;">></span> executor<span style="color: purple;">;</span>
executor<span style="color: #808030;">.</span>reset<span style="color: #808030;">(</span>net<span style="color: #808030;">.</span>SimpleBind<span style="color: #808030;">(</span>context<span style="color: #808030;">,</span>
args<span style="color: #808030;">,</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">map</span><span style="color: purple;"><</span><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span><span style="color: #808030;">,</span> NDArray<span style="color: purple;">></span><span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">map</span><span style="color: purple;"><</span><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span><span style="color: #808030;">,</span> OpReqType<span style="color: purple;">></span><span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span>
auxs<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">return</span> executor<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 12:11:43 UTC-->
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u><b>B.Convert aligned face to array</b></u></span></h3>
Unlike the <b><a href="https://qtandopencv.blogspot.com/2018/10/person-detectionyolo-v3-with-helps-of.html">example of yolo v3</a></b>, the input data of <b><span style="color: blue;"><a href="https://github.com/deepinsight/insightface">deepsight</a></span></b> needs more preprocessing before you can feed the aligned face into the model. Instead of arranging the pixels in rgb order, you need to split each channel of the face into a separate "page". Simply put, instead of arranging the pixels as<br />
<br />
R1G1B1R2G2B2......RNGNBN<br />
<br />
We should arrange the pixels as<br />
<br />
R1R2....RNG1G2......GNB1B2.....BN<br />
<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: dimgrey;">//using dlib_const_images_ptr = std::vector<matrix<rgb_pixel> const*>;</span>
<span style="color: maroon; font-weight: bold;">void</span> face_key_extractor<span style="color: purple;">::</span>
dlib_matrix_to_float_array<span style="color: #808030;">(</span>dlib_const_images_ptr <span style="color: maroon; font-weight: bold;">const</span> <span style="color: #808030;">&</span>rgb_image<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: #603000;">size_t</span> index <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> i <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> i <span style="color: #808030;">!</span><span style="color: #808030;">=</span> rgb_image<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>i<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> ch <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> ch <span style="color: #808030;">!</span><span style="color: #808030;">=</span> <span style="color: #008c00;">3</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>ch<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">long</span> row <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> row <span style="color: #808030;">!</span><span style="color: #808030;">=</span> rgb_image<span style="color: #808030;">[</span>i<span style="color: #808030;">]</span><span style="color: #808030;">-</span><span style="color: #808030;">></span>nr<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>row<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">long</span> col <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> col <span style="color: #808030;">!</span><span style="color: #808030;">=</span> rgb_image<span style="color: #808030;">[</span>i<span style="color: #808030;">]</span><span style="color: #808030;">-</span><span style="color: #808030;">></span>nc<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>col<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> <span style="color: #808030;">&</span>pix <span style="color: #808030;">=</span> <span style="color: #808030;">(</span><span style="color: #808030;">*</span>rgb_image<span style="color: #808030;">[</span>i<span style="color: #808030;">]</span><span style="color: #808030;">)</span><span style="color: #808030;">(</span>row<span style="color: #808030;">,</span> col<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">switch</span><span style="color: #808030;">(</span>ch<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">case </span><span style="color: #008c00;">0</span><span style="color: #e34adc;">:</span>
<span style="color: dimgrey;">//image_vector_ is a std::vector<float>, resized in </span>
<span style="color: dimgrey;">//constructor.</span>
<span style="color: dimgrey;">//image_vector_.resize(params_->shape_.Size())</span>
<span style="color: dimgrey;">//params_->shape_.Size() return total number </span>
<span style="color: dimgrey;">//of elements in the tenso</span>
image_vector_<span style="color: #808030;">[</span>index<span style="color: #808030;">+</span><span style="color: #808030;">+</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> pix<span style="color: #808030;">.</span>red<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">break</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">case </span><span style="color: #008c00;">1</span><span style="color: #e34adc;">:</span>
image_vector_<span style="color: #808030;">[</span>index<span style="color: #808030;">+</span><span style="color: #808030;">+</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> pix<span style="color: #808030;">.</span>green<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">break</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">case </span><span style="color: #008c00;">2</span><span style="color: #e34adc;">:</span>
image_vector_<span style="color: #808030;">[</span>index<span style="color: #808030;">+</span><span style="color: #808030;">+</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> pix<span style="color: #808030;">.</span>blue<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">break</span><span style="color: purple;">;</span>
<span style="color: #e34adc;"> </span><span style="color: maroon; font-weight: bold;">default</span><span style="color: #e34adc;">:</span>
<span style="color: maroon; font-weight: bold;">break</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 12:13:47 UTC-->
<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><b><u>C.Forward aligned faces with variable batch size</u></b></span></h3>
There are two things you must know before we dive into the source codes.<br />
<br />
1. To avoid memory reallocation, we <span style="color: blue;"><b>must</b> </span>allocate memory for the <span style="color: blue;"><b>largest</b></span> possible batch size and reuse that same memory when the batch size is smaller.<br />
2. The batch size of the float array input to the model <span style="color: blue;"><b>must be the same</b></span> as the largest possible batch size.<br />
<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: dimgrey;">//input contains all of the aligned faces detected from the image</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>face_key<span style="color: purple;">></span> face_key_extractor<span style="color: purple;">::</span>
forward<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>dlib<span style="color: purple;">::</span>matrix<span style="color: purple;"><</span>dlib<span style="color: purple;">::</span>rgb_pixel<span style="color: purple;">></span> <span style="color: purple;">></span> <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>empty<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">return</span> <span style="color: purple;">{</span><span style="color: purple;">}</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: dimgrey;">//Size of the input may not divisible by batch size</span>
<span style="color: dimgrey;">//That is why we need some preprocess job to make sure</span>
<span style="color: dimgrey;">//features of every faces are extracted</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> forward_count <span style="color: #808030;">=</span> <span style="color: maroon; font-weight: bold;">static_cast</span><span style="color: purple;"><</span><span style="color: #603000;">size_t</span><span style="color: purple;">></span><span style="color: #808030;">(</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">ceil</span><span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span> <span style="color: #808030;">/</span> <span style="color: maroon; font-weight: bold;">static_cast</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span><span style="color: #808030;">(</span>params_<span style="color: #808030;">-</span><span style="color: #808030;">></span>shape_<span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>face_key<span style="color: purple;">></span> result<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> i <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: #808030;">,</span> index <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> i <span style="color: #808030;">!</span><span style="color: #808030;">=</span> forward_count<span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>i<span style="color: #808030;">)</span><span style="color: purple;">{</span>
dlib_const_images_ptr faces<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> j <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span>
j <span style="color: #808030;">!</span><span style="color: #808030;">=</span> params_<span style="color: #808030;">-</span><span style="color: #808030;">></span>shape_<span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span> <span style="color: #808030;">&</span><span style="color: #808030;">&</span> index <span style="color: #808030;"><</span> input<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>j<span style="color: #808030;">)</span><span style="color: purple;">{</span>
faces<span style="color: #808030;">.</span>emplace_back<span style="color: #808030;">(</span><span style="color: #808030;">&</span>input<span style="color: #808030;">[</span>index<span style="color: #808030;">+</span><span style="color: #808030;">+</span><span style="color: #808030;">]</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
dlib_matrix_to_float_array<span style="color: #808030;">(</span>faces<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> features <span style="color: #808030;">=</span>
forward<span style="color: #808030;">(</span>image_vector_<span style="color: #808030;">,</span> <span style="color: maroon; font-weight: bold;">static_cast</span><span style="color: purple;"><</span><span style="color: #603000;">size_t</span><span style="color: purple;">></span><span style="color: #808030;">(</span>faces<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">move</span><span style="color: #808030;">(</span><span style="color: #666616;">std</span><span style="color: purple;">::</span>begin<span style="color: #808030;">(</span>features<span style="color: #808030;">)</span><span style="color: #808030;">,</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span>end<span style="color: #808030;">(</span>features<span style="color: #808030;">)</span><span style="color: #808030;">,</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span>back_inserter<span style="color: #808030;">(</span>result<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 12:16:29 UTC-->
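For example, if shape_[0] is 4 and the input holds 10 faces, forward_count is ceil(10 / 4) = 3, so the loop forwards batches of 4, 4 and 2 faces.<br />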
<br />
<div style="text-align: center;">
<h3 style="background: rgb(255, 255, 255) none repeat scroll 0% 0%;">
<span style="color: blue;"><u><b>D.Extract features of faces</b></u></span></h3>
<div style="text-align: left;">
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>face_key<span style="color: purple;">></span> face_key_extractor<span style="color: purple;">::</span>
forward<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> <span style="color: #808030;">&</span>input<span style="color: #808030;">,</span> <span style="color: #603000;">size_t</span> batch_size<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>arg_dict<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: maroon;">"</span><span style="color: #0000e6;">data</span><span style="color: maroon;">"</span><span style="color: #808030;">]</span><span style="color: #808030;">.</span>SyncCopyFromCPU<span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>data<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span>
input<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//data1 tell the executor, how many face(s) need to process</span>
executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>arg_dict<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: maroon;">"</span><span style="color: #0000e6;">data1</span><span style="color: maroon;">"</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> batch_size<span style="color: purple;">;</span>
executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>Forward<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">false</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>face_key<span style="color: purple;">></span> result<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span><span style="color: #808030;">!</span>executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>outputs<span style="color: #808030;">.</span>empty<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//shape of features is [batch_size, 512]</span>
<span style="color: maroon; font-weight: bold;">auto</span> features <span style="color: #808030;">=</span> executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>outputs<span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: #808030;">.</span>Copy<span style="color: #808030;">(</span>Context<span style="color: #808030;">(</span>kCPU<span style="color: #808030;">,</span> <span style="color: #008c00;">0</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
Shape <span style="color: maroon; font-weight: bold;">const</span> shape<span style="color: #808030;">(</span><span style="color: #008c00;">1</span><span style="color: #808030;">,</span> step_per_feature<span style="color: #808030;">)</span><span style="color: purple;">;</span>
features<span style="color: #808030;">.</span>WaitToRead<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//split features into and array</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> i <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> i <span style="color: #808030;">!</span><span style="color: #808030;">=</span> batch_size<span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>i<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//step_per_feature is 512, memory </span>
<span style="color: dimgrey;">//of NDArray is continuous make things easier</span>
NDArray feature<span style="color: #808030;">(</span>features<span style="color: #808030;">.</span>GetData<span style="color: #808030;">(</span><span style="color: #808030;">)</span> <span style="color: #808030;">+</span> i <span style="color: #808030;">*</span> step_per_feature<span style="color: #808030;">,</span>
shape<span style="color: #808030;">,</span> Context<span style="color: #808030;">(</span>kCPU<span style="color: #808030;">,</span> <span style="color: #008c00;">0</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
result<span style="color: #808030;">.</span>emplace_back<span style="color: #808030;">(</span><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">move</span><span style="color: #808030;">(</span>feature<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span> </pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"> </pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"> </pre>
<h2 style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; text-align: center;">
<span style="color: blue;"><u>Find most similar faces from database</u></span></h2>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;"> I use cosine similarity to compare similarity in this small example, it is quite easy with the help of </span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;">opencv. </span></pre>
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u>A.Similarity compare</u></span></h3>
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: maroon; font-weight: bold;">double</span> face_key<span style="color: purple;">::</span>similarity<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> face_key <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span> <span style="color: maroon; font-weight: bold;">const</span>
<span style="color: purple;">{</span>
CV_Assert<span style="color: #808030;">(</span>key_<span style="color: #808030;">.</span>GetData<span style="color: #808030;">(</span><span style="color: #808030;">)</span> <span style="color: #808030;">!</span><span style="color: #808030;">=</span> <span style="color: maroon; font-weight: bold;">nullptr</span> <span style="color: #808030;">&</span><span style="color: #808030;">&</span>
input<span style="color: #808030;">.</span>key_<span style="color: #808030;">.</span>GetData<span style="color: #808030;">(</span><span style="color: #808030;">)</span> <span style="color: #808030;">!</span><span style="color: #808030;">=</span> <span style="color: maroon; font-weight: bold;">nullptr</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> <span style="color: maroon; font-weight: bold;">const</span> key1<span style="color: #808030;">(</span><span style="color: #008c00;">1</span><span style="color: #808030;">,</span> <span style="color: #008c00;">512</span><span style="color: #808030;">,</span>
<span style="color: maroon; font-weight: bold;">const_cast</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: #808030;">*</span><span style="color: purple;">></span><span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>key_<span style="color: #808030;">.</span>GetData<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span> <span style="color: #008c00;">0</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> <span style="color: maroon; font-weight: bold;">const</span> key2<span style="color: #808030;">(</span><span style="color: #008c00;">1</span><span style="color: #808030;">,</span> <span style="color: #008c00;">512</span><span style="color: #808030;">,</span>
<span style="color: maroon; font-weight: bold;">const_cast</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: #808030;">*</span><span style="color: purple;">></span><span style="color: #808030;">(</span>key_<span style="color: #808030;">.</span>GetData<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span> <span style="color: #008c00;">0</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> denominator <span style="color: #808030;">=</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">sqrt</span><span style="color: #808030;">(</span>key1<span style="color: #808030;">.</span>dot<span style="color: #808030;">(</span>key1<span style="color: #808030;">)</span> <span style="color: #808030;">*</span> key2<span style="color: #808030;">.</span>dot<span style="color: #808030;">(</span>key2<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>denominator <span style="color: #808030;">!</span><span style="color: #808030;">=</span> <span style="color: green;">0.0</span><span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">return</span> key2<span style="color: #808030;">.</span>dot<span style="color: #808030;">(</span>key1<span style="color: #808030;">)</span> <span style="color: #808030;">/</span> denominator<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 12:19:23 UTC-->
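<br />
A minimal usage sketch (key_a and key_b stand for two extracted keys; the 0.5 threshold is an assumption for illustration, tune it on your own data):<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;">//key_a and key_b are face_key instances produced by face_key_extractor
double const score = key_a.similarity(key_b);
//cosine similarity lies in [-1, 1]; the closer to 1, the more alike
if(score > 0.5){
    //treat the two keys as the same person
}</pre>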
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u>B.Find most similar face</u></span></h3>
Finding the most similar face is really easy: all we need to do is compare the features stored in the array one by one and return the one with the highest confidence.<br />
<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: dimgrey;">//for simplicity, I put struct at here in this blog</span>
<span style="color: maroon; font-weight: bold;">struct</span> id_info
<span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">double</span> confident_ <span style="color: #808030;">=</span> <span style="color: #808030;">-</span><span style="color: green;">1.0</span><span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span> id_<span style="color: purple;">;</span>
<span style="color: purple;">}</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">struct</span> face_info
<span style="color: purple;">{</span>
face_key key_<span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span> id_<span style="color: purple;">;</span>
<span style="color: purple;">}</span><span style="color: purple;">;</span>
face_reg_db<span style="color: purple;">::</span>id_info face_reg_db<span style="color: purple;">::</span>
find_most_similar_face<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> face_key <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span> <span style="color: maroon; font-weight: bold;">const</span>
<span style="color: purple;">{</span>
id_info result<span style="color: purple;">;</span>
<span style="color: dimgrey;">//type of face_keys_ is std::vector<face_info></span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> i <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> i <span style="color: #808030;">!</span><span style="color: #808030;">=</span> face_keys_<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>i<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> confident <span style="color: #808030;">=</span>
face_keys_<span style="color: #808030;">[</span>i<span style="color: #808030;">]</span><span style="color: #808030;">.</span>key_<span style="color: #808030;">.</span>similarity<span style="color: #808030;">(</span>input<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>confident <span style="color: #808030;">></span> result<span style="color: #808030;">.</span>confident_<span style="color: #808030;">)</span><span style="color: purple;">{</span>
result<span style="color: #808030;">.</span>confident_ <span style="color: #808030;">=</span> confident<span style="color: purple;">;</span>
result<span style="color: #808030;">.</span>id_ <span style="color: #808030;">=</span> face_keys_<span style="color: #808030;">[</span>i<span style="color: #808030;">]</span><span style="color: #808030;">.</span>id_<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 11:46:04 UTC-->
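Note that find_most_similar_face is a plain linear scan over every stored key; that is perfectly fine for a small database like the one in this example, while a huge database would call for batching the comparisons or an approximate nearest neighbour index.<br />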
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u>Summary</u></span></h2>
In today's post, I show you the most critical parts of face recognition with opencv, dlib and mxnet. I believe this is a great starting point if you want to build a high quality face recognition app with c++.<br />
<br />
Real world applications are <b><span style="color: blue;">much more complicated</span></b> than this small example, since they always need to support <span style="color: blue;"><b>more features</b></span> and are required to be <span style="color: blue;"><b>efficient</b></span>; but no matter how complex they are, the main flow of 2D face recognition is almost the same as what this post shows you.<br /><br /></div>
</div>
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com5tag:blogger.com,1999:blog-4702230343097536610.post-92151919199474288652019-02-05T00:10:00.002-08:002021-02-26T08:37:05.311-08:00Use person re-id model to identify person do not exist in the data set by c++ Person re-id compares two images of a person captured under different conditions. Recently this field has achieved big improvements with the help of deep learning, but <b><span style="color: red;">is it good enough to identify a person who does not exist in the data set?</span></b> This is the question I want to figure out in this post.<br />
<br />
Let me show you an example before we start.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/kxTCZQfH6gc/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/kxTCZQfH6gc?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
The results are not perfect yet; let us hope that better techniques and larger data sets will be released in the future. The algorithm itself is very simple, and its main flow is drawn in pic00<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiSbgf_s03n2u8ltrf69Wde4rcRU28LDTj7P9UHPTDYUe9pZHkEPp1LyN3wTfSIO_YZLlajls-4ftn7KKDpjsm_TjkoJPI6tkMb1qGyq4h7JJeCIt-Eny44AnOcNuHWIE6NemipYCjIpQ/s1600/main_flow.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="497" data-original-width="815" height="243" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiSbgf_s03n2u8ltrf69Wde4rcRU28LDTj7P9UHPTDYUe9pZHkEPp1LyN3wTfSIO_YZLlajls-4ftn7KKDpjsm_TjkoJPI6tkMb1qGyq4h7JJeCIt-Eny44AnOcNuHWIE6NemipYCjIpQ/s400/main_flow.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic00</td></tr>
</tbody></table>
For those who want to read the source code directly, please go to <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_person_re_id">github</a>; in order to compile it, you will need opencv3.4.2 and mxnet. You can pick any build tool you like; I use qmake in this example. If you want to know how to reproduce the results, please read on.<br />
<br />
<h3 style="text-align: center;">
<u><b><span style="color: blue;">1. Download pretrained model of person re-id</span></b></u></h3>
<br />
Download the pretrained model from <a href="https://mega.nz/#!NxFVWI5R!qRozzEIPtnnC1ALB9NsGC_-6ssZaq2M-TChg2_FttaM">here</a>. The precision and mAP of this model measured on <a href="https://drive.google.com/file/d/0B8-rUzbwVRk0c054eEozWG9COHM/view">market1501</a> are<br />
<br />
top1:0.923100<br />
top5:0.972090<br />
top10:0.984264<br />
mAP:0.797564<br />
<br />
If you want to train it by yourself, please follow the guide of <a href="https://github.com/dmlc/gluon-cv/tree/master/scripts/re-id/baseline">gluoncv</a>; it is quite easy.<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u><b>2. Download pretrained model of yolo v3</b></u></span></h3>
<br />
Download the pretrained model from <a href="https://mega.nz/#!h4NThaTZ!W8Awa0Ord1A_Qgykpf_RXJrHhrOwoc13Qlb7ULPMYx8">here</a>. This is the model converted from the pretrained model of gluoncv.<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u>3. Detect person from video by yolo v3</u></span></h3>
<br />
Before we perform person re-id, we need to detect the persons in the video; yolo v3 works well for this task, and you can find more details in this <a href="https://qtandopencv.blogspot.com/2018/10/person-detectionyolo-v3-with-helps-of.html">blog</a>. It also shows you how to load the models trained by gluoncv (or mxnet); you will need that skill to load the person re-id model too.<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u><b>4. Extract features of person</b></u></span></h3>
<br />
After we find the bounding boxes of the persons, we need to extract their features; this can be done with mxnet without much trouble.<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;">cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> person_feautres_extractor<span style="color: purple;">::</span>get_features<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> cv<span style="color: purple;">::</span>Mat <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: dimgrey;">//convert cv::Mat to ndarray</span>
<span style="color: maroon; font-weight: bold;">auto</span> data <span style="color: #808030;">=</span> to_ndarray_<span style="color: #808030;">-</span><span style="color: #808030;">></span>convert<span style="color: #808030;">(</span>input<span style="color: #808030;">)</span><span style="color: purple;">;</span>
data<span style="color: #808030;">.</span>CopyTo<span style="color: #808030;">(</span><span style="color: #808030;">&</span>executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>arg_dict<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: maroon;">"</span><span style="color: #0000e6;">data</span><span style="color: maroon;">"</span><span style="color: #808030;">]</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>Forward<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">false</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> result<span style="color: #808030;">(</span><span style="color: #008c00;">1</span><span style="color: #808030;">,</span> <span style="color: #008c00;">2048</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span><span style="color: #808030;">!</span>executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>outputs<span style="color: #808030;">.</span>empty<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//copy data to cpu by synchronize api since </span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: dimgrey;"> //Forward api of mxnet is async</span>
executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>outputs<span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: #808030;">.</span>SyncCopyToCPU<span style="color: #808030;">(</span>result<span style="color: #808030;">.</span>ptr<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span><span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span> <span style="color: #008c00;">2048</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: purple;"> </span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"></pre>
<h3 style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; text-align: center;">
<u><span style="color: blue;"><span style="font-family: "times" , "times new roman" , serif;">5. Find out most similar persons from the features pool</span></span></u></h3>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;"> I use cosine similarity to compare two features in this experiment.</span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: maroon; font-weight: bold;">float</span> cosine_similarity<span style="color: purple;">::</span>
compare_feature<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> <span style="color: #808030;">&</span>lhs<span style="color: #808030;">,</span> <span style="color: maroon; font-weight: bold;">const</span> cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> <span style="color: #808030;">&</span>rhs<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
cv<span style="color: purple;">::</span>multiply<span style="color: #808030;">(</span>lhs<span style="color: #808030;">,</span> rhs<span style="color: #808030;">,</span> numerator_<span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span><span style="color: #603000;">pow</span><span style="color: #808030;">(</span>lhs<span style="color: #808030;">,</span> <span style="color: #008c00;">2</span><span style="color: #808030;">,</span> lhs_pow_<span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span><span style="color: #603000;">pow</span><span style="color: #808030;">(</span>rhs<span style="color: #808030;">,</span> <span style="color: #008c00;">2</span><span style="color: #808030;">,</span> rhs_pow_<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> numerator <span style="color: #808030;">=</span> cv<span style="color: purple;">::</span>sum<span style="color: #808030;">(</span>numerator_<span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> denom_lhs <span style="color: #808030;">=</span> cv<span style="color: purple;">::</span>sum<span style="color: #808030;">(</span>lhs_pow_<span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> denom_rhs <span style="color: #808030;">=</span> cv<span style="color: purple;">::</span>sum<span style="color: #808030;">(</span>rhs_pow_<span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> denominator <span style="color: #808030;">=</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">sqrt</span><span style="color: #808030;">(</span>denom_lhs <span style="color: #808030;">*</span> denom_rhs<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">return</span> <span style="color: maroon; font-weight: bold;">static_cast</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span><span style="color: #808030;">(</span>denominator <span style="color: #808030;">!</span><span style="color: #808030;">=</span> <span style="color: green;">0.0</span> <span style="color: purple;">?</span> numerator <span style="color: #808030;">/</span>
denominator <span style="color: purple;">:</span> <span style="color: green;">0.0</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
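Notice that compare_feature writes into preallocated member buffers (numerator_, lhs_pow_, rhs_pow_) instead of allocating temporaries, so comparing one feature against the whole features pool does not reallocate memory on every call.<br />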
<!--Created using ToHtml.com on 2019-02-05 11:37:15 UTC-->
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;"> </span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;"> Then find out the most similar features in the db, return the id in the db if similarity value greater</span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;">than threshold, else create a new id and return it.</span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;"> </span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>visitor_identify<span style="color: purple;">::</span>visitor_info<span style="color: purple;">></span> visitor_identify<span style="color: purple;">::</span>
detect_and_identify_visitors<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> cv<span style="color: purple;">::</span>Mat <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: dimgrey;">//detect persons in the input</span>
obj_det_<span style="color: #808030;">-</span><span style="color: #808030;">></span>forward<span style="color: #808030;">(</span>input<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> input_size <span style="color: #808030;">=</span> cv<span style="color: purple;">::</span>Size<span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>cols<span style="color: #808030;">,</span> input<span style="color: #808030;">.</span>rows<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> detect_results <span style="color: #808030;">=</span> obj_filter_<span style="color: #808030;">-</span><span style="color: #808030;">></span>filter<span style="color: #808030;">(</span>obj_det_<span style="color: #808030;">-</span><span style="color: #808030;">></span>get_outputs<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span> input_size <span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>visitor_info<span style="color: purple;">></span> result<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> <span style="color: #808030;">&</span>det <span style="color: purple;">:</span> detect_results<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//extract features from the person</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> feature <span style="color: #808030;">=</span>
feature_extract_<span style="color: #808030;">-</span><span style="color: #808030;">></span>get_features<span style="color: #808030;">(</span>input<span style="color: #808030;">(</span>det<span style="color: #808030;">.</span>roi_<span style="color: #808030;">)</span><span style="color: #808030;">.</span>clone<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//find most similar features in the database</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> id_info <span style="color: #808030;">=</span> db_<span style="color: #808030;">-</span><span style="color: #808030;">></span>find_most_similar_id<span style="color: #808030;">(</span>feature<span style="color: #808030;">)</span><span style="color: purple;">;</span>
visitor_info vinfo<span style="color: purple;">;</span>
vinfo<span style="color: #808030;">.</span>roi_ <span style="color: #808030;">=</span> det<span style="color: #808030;">.</span>roi_<span style="color: purple;">;</span>
<span style="color: dimgrey;">//if the confident(similarity) of the most similar features </span>
<span style="color: dimgrey;">//were greather than the threshold</span>
<span style="color: dimgrey;">//return the id found in the db, else add a new id and return it</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>id_info<span style="color: #808030;">.</span>confident_ <span style="color: #808030;">></span> re_id_threshold_<span style="color: #808030;">)</span><span style="color: purple;">{</span>
vinfo<span style="color: #808030;">.</span>id_ <span style="color: #808030;">=</span> id_info<span style="color: #808030;">.</span>id_<span style="color: purple;">;</span>
vinfo<span style="color: #808030;">.</span>confidence_ <span style="color: #808030;">=</span> id_info<span style="color: #808030;">.</span>confident_<span style="color: purple;">;</span>
<span style="color: purple;">}</span><span style="color: maroon; font-weight: bold;">else</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> new_id <span style="color: #808030;">=</span> db_<span style="color: #808030;">-</span><span style="color: #808030;">></span>add_new_id<span style="color: #808030;">(</span>feature<span style="color: #808030;">)</span><span style="color: purple;">;</span>
vinfo<span style="color: #808030;">.</span>id_ <span style="color: #808030;">=</span> new_id<span style="color: purple;">;</span>
vinfo<span style="color: #808030;">.</span>confidence_ <span style="color: #808030;">=</span> <span style="color: green;">1.0</span><span style="color: #006600;">f</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
result<span style="color: #808030;">.</span>emplace_back<span style="color: #808030;">(</span><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">move</span><span style="color: #808030;">(</span>vinfo<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span> </pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"> </pre><br /><pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"> </pre>
<!--Created using ToHtml.com on 2019-02-05 11:41:13 UTC-->ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com3tag:blogger.com,1999:blog-4702230343097536610.post-51882732530705908502019-01-13T00:47:00.002-08:002021-02-26T08:37:14.183-08:00Deep learning 12-Train a detector based on yolo v3(by gluoncv) by custom data GluonCV comes with lots of useful pretrained models for object detection, including ssd, yolo v3 and faster-rcnn. Their website comes with an example showing how to fine tune your own data set with ssd, but they do not show us how to do it with yolo v3. If you are like me, struggling to train your custom data with yolo v3, this post may ease your pain, since I have already modified the script to help you train your custom data.<br />
<br />
<h3 style="text-align: center;">
<u><span style="color: blue;">1.Select a tool to draw your bounding box</span></u></h3>
<br />
I use <a href="https://github.com/tzutalin/labelImg">labelImg</a> for this purpose; it is easy to install and use on windows and ubuntu.<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u>2.Convert the xml files generated by labelImg to lst format</u></span></h3>
<br />
I wrote some small classes to perform the conversion task; you can find them on <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_train_object_detector/generate_face_and_human_labels">github</a>. If you cannot compile them, please open an issue at github. You don't need opencv or mxnet for this task after all.<br />
<br />
<h3 style="text-align: center;">
<u><span style="color: blue;">3.Convert lst file to rec format</span></u></h3>
<br />
Follow the instructions <a href="https://gluon-cv.mxnet.io/build/examples_datasets/detection_custom.html#sphx-glr-build-examples-datasets-detection-custom-py">here</a>; learning how to use im2rec.py should be enough.<br />
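As a sketch, the lst-to-rec conversion usually looks like the command below; the file and folder names are placeholders, and the exact flags may differ between mxnet versions, so please check <code>python im2rec.py --help</code> on your install.<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;">python im2rec.py face_person.lst ./images --no-shuffle --pass-through --pack-label</pre>
<br />
As far as I know, --pass-through saves each image as is without re-encoding, and --pack-label packs the bounding box labels into the rec file.<br />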
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;"> 4. Adjust the train_yolo3.py, make it able to read file with rec format</span></u></b></h3>
<br />
I do this part for you already, you can download the script(train_yolo3_custom.py) from <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/mxnet_train_object_detector/trainer/train_yolo3_custom.py">github</a>. Before you use that, you will need to<br />
<br />
<ol>
<li>Copy voc_detection.py on <a href="https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/metrics/voc_detection.py" rel="nofollow noopener">github</a>
</li>
<li>Change the file name to voc_detection_2.py</li>
<li>Move it to the folder of gluoncv.utils.metrics(mine is C:\my_folder\Anaconda3\Lib\site-packages\gluoncv\utils\metrics)</li>
<li>Change the code from <code>from gluoncv.utils.metrics.voc_detection import VOC07MApMetric</code> to <code>from gluoncv.utils.metrics.voc_detection_2 import VOC07MApMetric</code>
</li>
</ol>
This is because the voc_detection.py shipped with Anaconda on windows has a bug; if your voc_detection.py is fine, you can omit this step and change <code>from gluoncv.utils.metrics.voc_detection_2 import VOC07MApMetric</code> back to <code>from gluoncv.utils.metrics.voc_detection import VOC07MApMetric</code><br />
<br />
<span style="font-family: "georgia" , "times new roman" , serif;">You can also install the nightly release instead if you want to save yourself some trouble.</span><br />
<br />
<h3 style="text-align: center;">
<code><code></code></code><u><span style="color: blue;"><b>5.Enter command to train your data</b></span></u></h3>
<br />
I added a few command line options to this script; they are<br />
<br />
--train_dataset : Location of the rec file for training<br />
--validate_dataset : Location of the rec file for validation<br />
--pretrained : If you enter this, it will use the pretrained weights of the coco dataset, else only the pretrained weights of imageNet.<br />
--classes_list : Location of the file with the names of the classes. Every line presents one class, and each class should match its own id. Example :<br />
<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheJXfh9Gau8lnpQdNmc-D1YX5E73HkBBaJ6wIT_mUkiXo5uI7BJa8PqL94A0HQqIvt0uxJ3F2nAXNIuRBE27AWdb3mAo1rErRH2hDck6bJSg4vqPEIoyXTNulEYXegAlfzj6F4X2DwDGg/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="150" data-original-width="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheJXfh9Gau8lnpQdNmc-D1YX5E73HkBBaJ6wIT_mUkiXo5uI7BJa8PqL94A0HQqIvt0uxJ3F2nAXNIuRBE27AWdb3mAo1rErRH2hDck6bJSg4vqPEIoyXTNulEYXegAlfzj6F4X2DwDGg/s1600/Capture.PNG" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span face=""trebuchet ms" , sans-serif">pic00</span></td></tr>
</tbody></table>
<br />
<br />
The ID of face is 0, so it is put at line 0; the ID of person is 1, so it is put at line 1<br />
<br />
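So for the face/person example, the classes_list file (the face_person_list.txt used in the command below) would simply contain:<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;">face
person</pre>
<br />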
Example : python train_yolo3_custom.py --epochs 20 --lr 0.0001 --train_dataset face_person.rec --validate_dataset face_person.rec --classes_list face_person_list.txt --batch-size 3 --val-interval 5 --mixup<br />
<div style="text-align: center;">
<h3>
<u><span style="color: blue;"><b>6.Tips</b></span></u></h3>
</div>
<br />
1. If you do not enter --no-random-shape, you had better make your learning rate lower (ex : 0.0001 instead of 0.001), else it is very easy to explode (the loss becomes nan).<br />
2. Not every dataset works better with random-shape; run a few epochs (ex : 5) with smaller data (ex : 300~400 images) to find out which parameters work well.<br />
3. Enabling random-shape will eat much more ram; without it I can set my batch-size to 8, with it I could only set my batch-size to 3.<br />
<br />
<h3 style="text-align: center;">
<u><span style="color: blue;"><b>7. Measure performance</b></span></u></h3>
<br />
In order to measure the performance, we need a test set; unfortunately there does not exist a test set designed for both human and face detection, therefore I picked two data sets to measure the performance of the trained model: <a href="http://vis-www.cs.umass.edu/fddb/">FDDB</a> for face detection (the labels of FDDB are more like heads rather than faces) and <a href="http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit">Pascal Voc</a> for human detection. You can find validate_yolo3.py at github.<br />
<br />
The model I used was trained for 40 epochs; its mAP on the training set is close to 0.9. Both of the experiments are based on IOU = 0.5.<br />
<h4 style="text-align: center;">
<u><b><span style="color: blue;">7.1 Performance of face detection</span></b></u></h4>
<br />
The mAP is close to 1.0 when IOU is 0.5; this looks too good to be true, so let us check the inference results with our own eyes to find out what is happening. The following images are inferred with input-shape 320 by the model trained for 40 epochs.<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuA90MTbOjeV0U8ZIxJBWkEwqp-Mkx6Vo_Sk_DtVe4iapcSfXWrcUY8pr8L10hf0_K4_Yp0qVD4hj2PIoiNYjtw6oyQ5cAu7KwMXBTEuBSGvniwR2ISmsvPXCIx9JfnKTMYgpEx795EYk/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="335" data-original-width="427" height="251" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuA90MTbOjeV0U8ZIxJBWkEwqp-Mkx6Vo_Sk_DtVe4iapcSfXWrcUY8pr8L10hf0_K4_Yp0qVD4hj2PIoiNYjtw6oyQ5cAu7KwMXBTEuBSGvniwR2ISmsvPXCIx9JfnKTMYgpEx795EYk/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic01(2002/08/11/big/img_534.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhE9ZZuUX_VmrS-7bGS_XU51NMaNOs8A_y4Jq95iExzX6cntFfcfLJJO-9C68Yc4PFfp0KgMxMxD_6Df5PJoGP19i99sr_bRBzudP3-MsAhKxlw62x7KolJSWwXJi_LO3C3kRQQrFhE-hM/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="375" data-original-width="250" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhE9ZZuUX_VmrS-7bGS_XU51NMaNOs8A_y4Jq95iExzX6cntFfcfLJJO-9C68Yc4PFfp0KgMxMxD_6Df5PJoGP19i99sr_bRBzudP3-MsAhKxlw62x7KolJSWwXJi_LO3C3kRQQrFhE-hM/s320/Capture.PNG" width="213" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic02(2002/08/11/big/img_558.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDp73TJ_NwCjTxLjOnz_iI56FUMuN8XTBuIPzPW1873o9g3CiARRpZ6pUP5B-aj0DONm2uSz-1decvnOUySAYnbGhLGeVn0MQ-RnT5RbXPWzkepgqqcP83Lznb5BaGe6vqwwmdHY5qszI/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="321" data-original-width="496" height="207" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDp73TJ_NwCjTxLjOnz_iI56FUMuN8XTBuIPzPW1873o9g3CiARRpZ6pUP5B-aj0DONm2uSz-1decvnOUySAYnbGhLGeVn0MQ-RnT5RbXPWzkepgqqcP83Lznb5BaGe6vqwwmdHY5qszI/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic03(2002/08/11/big/img_570.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVub-A03LzACVANQYMGbUwBIM7PTsGe9rtZ39_Wxs97f11_OrSI9wCqHlZNST8F37KP2DGORPwE5kh_gN88_Cpy81J2d9b1JmMi8I3gcY3wTcA-i0yxu9YrwRT5O0vOypllqkbriQDB8k/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="375" data-original-width="468" height="256" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVub-A03LzACVANQYMGbUwBIM7PTsGe9rtZ39_Wxs97f11_OrSI9wCqHlZNST8F37KP2DGORPwE5kh_gN88_Cpy81J2d9b1JmMi8I3gcY3wTcA-i0yxu9YrwRT5O0vOypllqkbriQDB8k/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic04(2002/08/11/big/img_58.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg84MEgRD1JTMZm0fVntBLg5CVfFYKBFpy60IBxF5hnT23eTAt4AJARPP3N7x13Pf7KAuB_cun2vGpc_UkJrnbe2PmsfXLj9IQSN4sO0qgWjed7qZsIIx4ZM1h403vL3XK6t-017r2dVMM/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="369" data-original-width="258" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg84MEgRD1JTMZm0fVntBLg5CVfFYKBFpy60IBxF5hnT23eTAt4AJARPP3N7x13Pf7KAuB_cun2vGpc_UkJrnbe2PmsfXLj9IQSN4sO0qgWjed7qZsIIx4ZM1h403vL3XK6t-017r2dVMM/s320/Capture.PNG" width="223" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic05(2002/08/11/big/img_726.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNxDl-4cNQN_kjCNf3DlLb4c3jAffNg8hzsIjpcQDq6R-8X51KjxZ1FYfwW0-7ispfawT7DtTtEgOvD08mtgAs-UWdkPulwPJcWawV-O321fE40T1RiNgjVWLSz4KJFgqAo0a81sFx3fM/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="372" data-original-width="336" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNxDl-4cNQN_kjCNf3DlLb4c3jAffNg8hzsIjpcQDq6R-8X51KjxZ1FYfwW0-7ispfawT7DtTtEgOvD08mtgAs-UWdkPulwPJcWawV-O321fE40T1RiNgjVWLSz4KJFgqAo0a81sFx3fM/s320/Capture.PNG" width="289" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic06(2002/08/11/big/img_752.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikVG0ERMsQ5XpmutbYok6sL0nlCglEuZvJ8tQOR8qEXeXtDe38Vo49cD5HH-f5WgQg0wie4pEfkb-GBAno-0AUUDenqA3d5yLuqLGga9wUXMl6kJXFMuJ-_TY0cfWWkzbuCEZCXbcDQ0s/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="353" data-original-width="498" height="226" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikVG0ERMsQ5XpmutbYok6sL0nlCglEuZvJ8tQOR8qEXeXtDe38Vo49cD5HH-f5WgQg0wie4pEfkb-GBAno-0AUUDenqA3d5yLuqLGga9wUXMl6kJXFMuJ-_TY0cfWWkzbuCEZCXbcDQ0s/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic07(2002/08/11/big/img_478.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWpDBw14LUFX9DgWhFynQeQZv93QKJrV5TQav32h5gxoPc2rRBMCeuEGDQ9tO3TUt9ldkVWq6wysu2SN4TGr7NsvaJ9bYbO5cbwB3Narq6vrCfgKCzlIc34RHUq9WTA2aZeTbkBwjfl2E/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="306" data-original-width="496" height="197" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWpDBw14LUFX9DgWhFynQeQZv93QKJrV5TQav32h5gxoPc2rRBMCeuEGDQ9tO3TUt9ldkVWq6wysu2SN4TGr7NsvaJ9bYbO5cbwB3Narq6vrCfgKCzlIc34RHUq9WTA2aZeTbkBwjfl2E/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic08(2002/08/11/big/img_492.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjB_wuk71_6rEYlb-KWMsojQOeXl95LojexvaJZgP-tldA4jbBVk0jU2DqUCPsWi8xmMgWSm6ncgW3sDXGdOj8Zf2bNYX9iRxL9BR3NaqZZ3b7a-osQ5V98cHFth4IzhmbxwCRugF4j6JI/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="292" data-original-width="503" height="185" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjB_wuk71_6rEYlb-KWMsojQOeXl95LojexvaJZgP-tldA4jbBVk0jU2DqUCPsWi8xmMgWSm6ncgW3sDXGdOj8Zf2bNYX9iRxL9BR3NaqZZ3b7a-osQ5V98cHFth4IzhmbxwCRugF4j6JI/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic09(2002/08/11/big/img_496.jpg)</td></tr>
</tbody></table>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
From these pictures (pic01~pic09), we know the model works quite well.<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
Usually the mAP on a test set would not be higher than on the training set; either this test set is far easier than the training set, or this detector overfits this test set. No matter what, this test set is not good enough to measure the performance of the face detector. <br />
<h4 style="text-align: center;">
<span style="color: blue;"><u><b>7.2 Performance of person detection</b></u></span></h4>
<br />
Unlike the face detector, the person detector only got 0.583 mAP on the images listed in person_val.txt (I only ran it on the images that contain a person); there is still big room to improve the accuracy. <br />
<br />
Adding more data may improve the performance, since these test results tell us this model has high variance. In order to find out what kind of data we should add, one of the solutions is to study, by eye, the mis-classified persons or the persons that cannot be detected, then write down the reasons.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgptG1CdxsnBixqo6qINEMW-N_i2jvRTpXl7hI3jcdgfZLwZSnGcVd7w2nWtQuIBNCU00I9RdtjvwFSrZ28UWqGo5Z7ZbMb5gyLoaOm8G_ZExD90pgWRseZfJVe1u90pkkeN1iEXw0Cq3M/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="334" data-original-width="504" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgptG1CdxsnBixqo6qINEMW-N_i2jvRTpXl7hI3jcdgfZLwZSnGcVd7w2nWtQuIBNCU00I9RdtjvwFSrZ28UWqGo5Z7ZbMb5gyLoaOm8G_ZExD90pgWRseZfJVe1u90pkkeN1iEXw0Cq3M/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic10(2008_000003.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhA0uhWxsvGZZipXM0h7JsP5rw6wZydqIFUJOkZw0v5_mWMrO7pPy7qc3zxvryYHrKRU56IrTSr_Z4F9tq9-gKsk7SVqtabKCHa_R_PehIwbbxjZB-jSTlM6mEUbyT39aJFJmbvI5WirlU/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="354" data-original-width="494" height="229" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhA0uhWxsvGZZipXM0h7JsP5rw6wZydqIFUJOkZw0v5_mWMrO7pPy7qc3zxvryYHrKRU56IrTSr_Z4F9tq9-gKsk7SVqtabKCHa_R_PehIwbbxjZB-jSTlM6mEUbyT39aJFJmbvI5WirlU/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic11(2008_000032.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhqLCEzjaO22A2Zf7C5JVX8Am4Sw-XgmsPT_kGkJ0mI09IM52i0czRgwPfGKsi9T4cpNI0my3aIADg27oXb4_oLzDMxPWFDfMW1nUdrD7fOHB7VGy4Q7eVL1O71CFiQd_8FxyAwRayjHE/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="368" data-original-width="498" height="236" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhqLCEzjaO22A2Zf7C5JVX8Am4Sw-XgmsPT_kGkJ0mI09IM52i0czRgwPfGKsi9T4cpNI0my3aIADg27oXb4_oLzDMxPWFDfMW1nUdrD7fOHB7VGy4Q7eVL1O71CFiQd_8FxyAwRayjHE/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic12(2008_000051.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDjK9HH6NKO03k0ebKlzjESH7OH-6KzwiCYZZHtTr-wnryjxKJg8ixDwSNX8lLWKGaquUeSHZCbl6DlwyZc6SAwChZIT83cRmScdsBRcXq_sUT7kmjRV91xMuuteh3mnzC92H7yxtF0hk/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="372" data-original-width="494" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDjK9HH6NKO03k0ebKlzjESH7OH-6KzwiCYZZHtTr-wnryjxKJg8ixDwSNX8lLWKGaquUeSHZCbl6DlwyZc6SAwChZIT83cRmScdsBRcXq_sUT7kmjRV91xMuuteh3mnzC92H7yxtF0hk/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic13(2008_000082.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnksi0AC8OokwPfwH1UQiBzc0k-UVtAkF8dKwv0MhwSeCbdk08H9dRzePNiub5GRoDiusK2wpk1bTkFS6Ohui7cwGxXpbCfgM0pkCbYwDG43qMXrDcBSHegb299nzESP6S9iAqUuAa3Ns/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="368" data-original-width="398" height="295" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnksi0AC8OokwPfwH1UQiBzc0k-UVtAkF8dKwv0MhwSeCbdk08H9dRzePNiub5GRoDiusK2wpk1bTkFS6Ohui7cwGxXpbCfgM0pkCbYwDG43qMXrDcBSHegb299nzESP6S9iAqUuAa3Ns/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic14(2008_000138.jpg)</td></tr>
</tbody></table>
After we gather the data, we can create a table that lists the names of the images and describes the errors (pic15 shows a small example).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlGPbZVfw9zJPTlRbEZWr7ICI6E73E8urqrqdjur3R1XM-56tZ_JakHM6fSvkI3_Xu44Wqa_WtDPWNH4ysHGJodlbkb63IYSGUOhjfQDEo8P2nkkDH034lOyf0-1KHPHqMghCwW_xPh4U/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="152" data-original-width="1047" height="56" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlGPbZVfw9zJPTlRbEZWr7ICI6E73E8urqrqdjur3R1XM-56tZ_JakHM6fSvkI3_Xu44Wqa_WtDPWNH4ysHGJodlbkb63IYSGUOhjfQDEo8P2nkkDH034lOyf0-1KHPHqMghCwW_xPh4U/s400/Capture.PNG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic15</td></tr>
</tbody></table>
With the help of error analysis like this, we can find out which parts we should focus on and what kind of data we should collect. From the experiments we can see the accuracy of yolo v3 is very high, although recall still has a lot of room to improve.<br />
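<br />
As a small illustration, such a table can also be kept as a plain csv file, which is easy to sort and filter. Below is a minimal sketch of one way to write it; the error_entry struct and the column names are my own invention, not part of this project:<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">//A minimal sketch of saving the error analysis table(like pic15) as csv.
//error_entry and the column names are hypothetical, adjust them to your needs.
#include <fstream>
#include <string>
#include <vector>

struct error_entry
{
    std::string image_name;
    std::string error_description;
};

void write_error_table(std::string const &file_name,
                       std::vector<error_entry> const &entries)
{
    std::ofstream out(file_name);
    out<<"image,error\n";
    for(auto const &entry : entries){
        out<<entry.image_name<<","<<entry.error_description<<"\n";
    }
}
</pre>
</div>
<br />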
<h3 style="text-align: center;">
<u><b><span style="color: blue;">8. Model and data</span></b></u></h3>
<br />
You can find the <b><a href="https://mega.nz/#F!5kM3zIQS!O8tO8iGePnx60ACFoiWWLQ">model and data at mega</a></b>. I only publish the annotations, not the images; you need to download the images by yourself (I worry it may cause legal issues if I publish the images). Besides adding bounding boxes for persons, I also adjusted the bounding boxes of the faces a lot; the original bounding boxes provided by kaggle look like they were designed for a "head detector" rather than a "face detector".<br />
<br />
<h3 style="text-align: center;">
<u><b><span style="color: blue;">9. Conclusion</span></b></u></h3>
<br />
This detector still has a lot of room to improve, especially the mAP for persons, but doing so would take a lot of time, so I decided to stop here. In my annotations of persons you will find that many bounding boxes heavily overlap other persons; bounding boxes with less overlap may help the model detect more persons.<br />
<br />
You can use the annotations and model at your free will; please do me a favor and reference this site if you do use them, thanks.<br />
<br />
The source code can be found at <b><a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_train_object_detector">github</a></b>.<br />
<br />
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com1tag:blogger.com,1999:blog-4702230343097536610.post-83101557631083097722018-10-04T05:04:00.001-07:002021-02-26T08:37:36.775-08:00Person detection(Yolo v3) with the help of mxnet, able to run on gpu/cpu In this post I will show you how to do object detection with the help of the cpp-package of mxnet. Why do I introduce mxnet? Because the following advantages make it a decent library for standalone project development:<br />
<br />
1. It is open source and royalty free<br />
2. Decent support for GPU and CPU<br />
3. Scales efficiently to multiple GPUs and machines<br />
4. Supports a cpp api, which means you do not need to ask your users to install a python environment or ship your source code in order to run your apps<br />
5. mxnet supports many platforms, including windows, linux, mac, aws, android, ios<br />
6. It has a lot of pre-trained models<br />
7. <a href="https://github.com/Microsoft/MMdnn">MMDNN </a>supports mxnet, which means we can convert models trained by other libraries to mxnet (although not all models can be converted).<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/Uz-G-Zyb458/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/Uz-G-Zyb458?feature=player_embedded" width="320"></iframe></div>
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Step 1 : Download the model and convert it to a format the cpp package can load</span></u></b></h3>
1. Install anaconda(the version that comes with python3)<br />
2. <a href="https://anaconda.org/anaconda/mxnet">Install mxnet from the terminal of anaconda</a><br />
3. <a href="https://github.com/dmlc/gluon-cv">Install gluon-cv from the terminal of anaconda</a><br />
4. Download the model and convert it with the following script<br />
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">import</span> <span style="color: #0e84b5; font-weight: bold;">gluoncv</span> <span style="color: #008800; font-weight: bold;">as</span> <span style="color: #0e84b5; font-weight: bold;">gcv</span>
<span style="color: #008800; font-weight: bold;">from</span> <span style="color: #0e84b5; font-weight: bold;">gluoncv.utils</span> <span style="color: #008800; font-weight: bold;">import</span> export_block
net <span style="color: #333333;">=</span> gcv<span style="color: #333333;">.</span>model_zoo<span style="color: #333333;">.</span>get_model(<span style="background-color: #fff0f0;">'yolo3_darknet53_coco'</span>, pretrained<span style="color: #333333;">=</span><span style="color: #008800; font-weight: bold;">True</span>)
export_block(<span style="background-color: #fff0f0;">'yolo3_darknet53_coco'</span>, net)
</pre>
</div>
<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u> Step 2 : Load the model after conversion</u></span></h3>
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">load_check_point</span>(std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>model_params,
std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>model_symbol,
Symbol <span style="color: #333333;">*</span>symbol,
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> <span style="color: #333333;">*</span>arg_params,
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> <span style="color: #333333;">*</span>aux_params,
Context <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ctx)
{
Symbol new_symbol <span style="color: #333333;">=</span> Symbol<span style="color: #333333;">::</span>Load(model_symbol);
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> params <span style="color: #333333;">=</span> NDArray<span style="color: #333333;">::</span>LoadToMap(model_params);
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> args;
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> auxs;
<span style="color: #008800; font-weight: bold;">for</span> (<span style="color: #008800; font-weight: bold;">auto</span> iter <span style="color: #333333;">:</span> params) {
std<span style="color: #333333;">::</span>string type <span style="color: #333333;">=</span> iter.first.substr(<span style="color: #0000dd; font-weight: bold;">0</span>, <span style="color: #0000dd; font-weight: bold;">4</span>);
std<span style="color: #333333;">::</span>string name <span style="color: #333333;">=</span> iter.first.substr(<span style="color: #0000dd; font-weight: bold;">4</span>);
<span style="color: #008800; font-weight: bold;">if</span> (type <span style="color: #333333;">==</span> <span style="background-color: #fff0f0;">"arg:"</span>)
args[name] <span style="color: #333333;">=</span> iter.second.Copy(ctx);
<span style="color: #008800; font-weight: bold;">else</span> <span style="color: #008800; font-weight: bold;">if</span> (type <span style="color: #333333;">==</span> <span style="background-color: #fff0f0;">"aux:"</span>)
auxs[name] <span style="color: #333333;">=</span> iter.second.Copy(ctx);
<span style="color: #008800; font-weight: bold;">else</span>
<span style="color: #008800; font-weight: bold;">continue</span>;
}
<span style="color: #333333;">*</span>symbol <span style="color: #333333;">=</span> new_symbol;
<span style="color: #333333;">*</span>arg_params <span style="color: #333333;">=</span> args;
<span style="color: #333333;">*</span>aux_params <span style="color: #333333;">=</span> auxs;
}
</pre>
</div>
<br />
<br />
You could use the load_check_point function as follows<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"> Symbol net;
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> args, auxs;
load_check_point(model_params, model_symbols, <span style="color: #333333;">&</span>net, <span style="color: #333333;">&</span>args, <span style="color: #333333;">&</span>auxs, context);
<span style="color: #888888;">//The shape of the input data must stay the same; if you need a different size,</span>
<span style="color: #888888;">//you could rebind the Executor or create a pool of Executors.</span>
<span style="color: #888888;">//In order to create the input layer of the Executor, I make a dummy NDArray.</span>
<span style="color: #888888;">//The value of "data" can be changed later</span>
args[<span style="background-color: #fff0f0;">"data"</span>] <span style="color: #333333;">=</span> NDArray(Shape(<span style="color: #0000dd; font-weight: bold;">1</span>, <span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">unsigned</span><span style="color: #333333;">></span>(input_size.height),
<span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">unsigned</span><span style="color: #333333;">></span>(input_size.width), <span style="color: #0000dd; font-weight: bold;">3</span>), context);
executor_.reset(net.SimpleBind(context, args, std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span>(),
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, OpReqType<span style="color: #333333;">></span>(), auxs));
</pre>
</div>
<br />
model_params is the location of the weights (ex : yolo3_darknet53_coco.params), and model_symbols is the location of the symbols saved as json (ex : yolo3_darknet53_coco.json).<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Step 3: Convert image format</span></u></b></h3>
Before we feed the image into the executor of mxnet, we need to convert it from a bgr cv::Mat into an rgb, floating point NDArray.<br />
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">NDArray <span style="color: #0066bb; font-weight: bold;">cvmat_to_ndarray</span>(cv<span style="color: #333333;">::</span>Mat <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>bgr_image, Context <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ctx)
{
cv<span style="color: #333333;">::</span>Mat rgb_image;
cv<span style="color: #333333;">::</span>cvtColor(bgr_image, rgb_image, cv<span style="color: #333333;">::</span>COLOR_BGR2RGB);
rgb_image.convertTo(rgb_image, CV_32FC3);
<span style="color: #888888;">//This api copies the data of rgb_image into the NDArray. As far as I know,</span>
<span style="color: #888888;">//opencv guarantees a cv::Mat is continuous unless it is a sub matrix of another cv::Mat</span>
<span style="color: #008800; font-weight: bold;">return</span> NDArray(rgb_image.ptr<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">float</span><span style="color: #333333;">></span>(),
Shape(<span style="color: #0000dd; font-weight: bold;">1</span>, <span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">unsigned</span><span style="color: #333333;">></span>(rgb_image.rows), <span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">unsigned</span><span style="color: #333333;">></span>(rgb_image.cols), <span style="color: #0000dd; font-weight: bold;">3</span>),
ctx);
}
</pre>
</div>
<br />
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Step 4 : Perform object detection on video</span></u></b></h3>
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> object_detector<span style="color: #333333;">::</span>forward(<span style="color: #008800; font-weight: bold;">const</span> cv<span style="color: #333333;">::</span>Mat <span style="color: #333333;">&</span>input)
{
<span style="color: #888888;">//By default, input_size_.height equals 256 and input_size_.width equals 320.</span>
<span style="color: #888888;">//Yolo v3 has a limitation : width and height of the input must be divisible by 32.</span>
<span style="color: #008800; font-weight: bold;">if</span>(input.rows <span style="color: #333333;">!=</span> input_size_.height <span style="color: #333333;">||</span> input.cols <span style="color: #333333;">!=</span> input_size_.width){
cv<span style="color: #333333;">::</span>resize(input, resize_img_, input_size_);
}<span style="color: #008800; font-weight: bold;">else</span>{
resize_img_ <span style="color: #333333;">=</span> input;
}
<span style="color: #008800; font-weight: bold;">auto</span> data <span style="color: #333333;">=</span> cvmat_to_ndarray(resize_img_, <span style="color: #333333;">*</span>context_);
<span style="color: #888888;">//Copy the data of the image to the "data"</span>
data.CopyTo(<span style="color: #333333;">&</span>executor_<span style="color: #333333;">-></span>arg_dict()[<span style="background-color: #fff0f0;">"data"</span>]);
<span style="color: #888888;">//Forward is an async api.</span>
executor_<span style="color: #333333;">-></span>Forward(<span style="color: #007020;">false</span>);
}
</pre>
</div>
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Step 5 : Draw bounding boxes on image</span></u></b></h3>
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> plot_object_detector_bboxes<span style="color: #333333;">::</span>plot(cv<span style="color: #333333;">::</span>Mat <span style="color: #333333;">&</span>inout,
std<span style="color: #333333;">::</span>vector<span style="color: #333333;"><</span>mxnet<span style="color: #333333;">::</span>cpp<span style="color: #333333;">::</span>NDArray<span style="color: #333333;">></span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>predict_results,
<span style="color: #333399; font-weight: bold;">bool</span> normalize)
{
<span style="color: #008800; font-weight: bold;">using</span> <span style="color: #008800; font-weight: bold;">namespace</span> mxnet<span style="color: #333333;">::</span>cpp;
<span style="color: #888888;">//1. predict_results get from the output of Executor(executor_->outputs)</span>
<span style="color: #888888;">//2. Must set the Context to cpu because we need to process the data on the cpu later</span>
<span style="color: #008800; font-weight: bold;">auto</span> labels <span style="color: #333333;">=</span> predict_results[<span style="color: #0000dd; font-weight: bold;">0</span>].Copy(Context(kCPU, <span style="color: #0000dd; font-weight: bold;">0</span>));
<span style="color: #008800; font-weight: bold;">auto</span> scores <span style="color: #333333;">=</span> predict_results[<span style="color: #0000dd; font-weight: bold;">1</span>].Copy(Context(kCPU, <span style="color: #0000dd; font-weight: bold;">0</span>));
<span style="color: #008800; font-weight: bold;">auto</span> bboxes <span style="color: #333333;">=</span> predict_results[<span style="color: #0000dd; font-weight: bold;">2</span>].Copy(Context(kCPU, <span style="color: #0000dd; font-weight: bold;">0</span>));
<span style="color: #888888;">//1. Should call wait because Forward api of Executor is async</span>
<span style="color: #888888;">//2. scores and labels could treat as one dimension array</span>
<span style="color: #888888;">//3. BBoxes can treat as 2 dimensions array</span>
bboxes.WaitToRead();
scores.WaitToRead();
labels.WaitToRead();
<span style="color: #333399; font-weight: bold;">size_t</span> <span style="color: #008800; font-weight: bold;">const</span> num <span style="color: #333333;">=</span> bboxes.GetShape()[<span style="color: #0000dd; font-weight: bold;">1</span>];
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">size_t</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>; i <span style="color: #333333;"><</span> num; <span style="color: #333333;">++</span>i) {
<span style="color: #333399; font-weight: bold;">float</span> <span style="color: #008800; font-weight: bold;">const</span> score <span style="color: #333333;">=</span> scores.At(<span style="color: #0000dd; font-weight: bold;">0</span>, <span style="color: #0000dd; font-weight: bold;">0</span>, i);
<span style="color: #008800; font-weight: bold;">if</span> (score <span style="color: #333333;"><</span> thresh_) <span style="color: #008800; font-weight: bold;">break</span>;
<span style="color: #333399; font-weight: bold;">size_t</span> <span style="color: #008800; font-weight: bold;">const</span> cls_id <span style="color: #333333;">=</span> <span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">size_t</span><span style="color: #333333;">></span>(labels.At(<span style="color: #0000dd; font-weight: bold;">0</span>, <span style="color: #0000dd; font-weight: bold;">0</span>, i));
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> color <span style="color: #333333;">=</span> colors_[cls_id];
<span style="color: #888888;">//pt1 : top left; pt2 : bottom right</span>
cv<span style="color: #333333;">::</span>Point pt1, pt2;
<span style="color: #888888;">//normalize_points converts the raw box coordinates into points on the image</span>
std<span style="color: #333333;">::</span>tie(pt1, pt2) <span style="color: #333333;">=</span> normalize_points(bboxes.At(<span style="color: #0000dd; font-weight: bold;">0</span>, i, <span style="color: #0000dd; font-weight: bold;">0</span>), bboxes.At(<span style="color: #0000dd; font-weight: bold;">0</span>, i, <span style="color: #0000dd; font-weight: bold;">1</span>),
bboxes.At(<span style="color: #0000dd; font-weight: bold;">0</span>, i, <span style="color: #0000dd; font-weight: bold;">2</span>), bboxes.At(<span style="color: #0000dd; font-weight: bold;">0</span>, i, <span style="color: #0000dd; font-weight: bold;">3</span>),
normalize, cv<span style="color: #333333;">::</span>Size(inout.cols, inout.rows));
cv<span style="color: #333333;">::</span>rectangle(inout, pt1, pt2, color, <span style="color: #0000dd; font-weight: bold;">2</span>);
std<span style="color: #333333;">::</span>string txt;
<span style="color: #008800; font-weight: bold;">if</span> (labels_.size() <span style="color: #333333;">></span> cls_id) {
txt <span style="color: #333333;">+=</span> labels_[cls_id];
}
std<span style="color: #333333;">::</span>stringstream ss;
ss <span style="color: #333333;"><<</span> std<span style="color: #333333;">::</span>fixed <span style="color: #333333;"><<</span> std<span style="color: #333333;">::</span>setprecision(<span style="color: #0000dd; font-weight: bold;">3</span>) <span style="color: #333333;"><<</span> score;
txt <span style="color: #333333;">+=</span> <span style="background-color: #fff0f0;">" "</span> <span style="color: #333333;">+</span> ss.str();
put_label(inout, txt, pt1, color);
}
}
</pre>
</div>
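<br />
To show how step 4 and step 5 fit together, here is a minimal sketch of the main loop over a video. The constructor signatures, the coco_labels vector and the get_outputs() accessor (returning executor_->outputs) are assumptions of mine for illustration; please check the real interfaces on github:<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">//A minimal sketch of the detection loop. The constructors, coco_labels and
//get_outputs() are assumptions--check the github project for the real interfaces.
object_detector detector(model_params, model_symbols, Context::gpu(0));
plot_object_detector_bboxes plotter(coco_labels, 0.3f);
cv::VideoCapture capture("input_video.mp4");
cv::Mat frame;
while(capture.read(frame)){
    //resize the frame, copy it into the executor and call the async Forward
    detector.forward(frame);
    //plot copies the outputs back to cpu and waits on them before drawing
    plotter.plot(frame, detector.get_outputs(), true);
    cv::imshow("detection", frame);
    if(cv::waitKey(30) == 'q'){ break; }
}
</pre>
</div>
<br />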
<br />
<br />
I only mention the key points in this post; if you want to study the details, please check the project on <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_cpp_object_detection">github</a>.<br />
<br /><br />
<br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-89579440260886508502018-09-29T20:26:00.000-07:002018-09-29T20:26:16.839-07:00Install cpp package of mxnet on windows 10, with cuda and opencv Compile and install cpp-package of mxnet on windows 10 is a little bit tricky when I writing this post.<br />
<br />
The <a href="http://mxnet.incubator.apache.org/install/windows_setup.html">install page of mxnet</a> tells us almost everything we need to know, but some details haven't been written into the page yet. Today I would like to write down the pitfalls I met and share with you how I solved them.<br />
<br />
<h2 style="text-align: center;">
<b><u><span style="color: blue;">Pitfalls</span></u></b></h2>
1. Remember to download the mingw dlls from the openBLAS download page and put those dlls somewhere the system can find them, else you wouldn't be able to generate op.h for the cpp-package.<br />
<br />
2. Install Anaconda (recommended) or the python package listed on the mxnet install page on your machine and register the path (the path containing python.exe), else you wouldn't be able to generate op.h for the cpp-package.<br />
<br />
3. Compile the project without the cpp-package first, else you may not be able to generate op.h.<br />
<br />
CMake commands for reference; change them to suit your own needs<br />
<br />
a : Run these commands first<br />
<br />
cmake -G "Visual Studio 14 2015 Win64" ^<br />
-DCUDA_USE_STATIC_CUDA_RUNTIME=ON ^<br />
-DENABLE_CUDA_RTC=ON ^<br />
-DMKLDNN_VERBOSE=ON ^<br />
-DUSE_CUDA=ON ^<br />
-DUSE_CUDNN=ON ^<br />
-DUSE_F16C=ON ^<br />
-DUSE_GPERFTOOLS=ON ^<br />
-DUSE_JEMALLOC=OFF ^<br />
-DUSE_LAPACK=ON ^<br />
-DUSE_MKLDNN=ON ^<br />
-DUSE_MKLML_MKL=ON ^<br />
-DUSE_MKL_IF_AVAILABLE=ON ^<br />
-DUSE_MXNET_LIB_NAMING=ON ^<br />
-DUSE_OPENCV=ON ^<br />
-DUSE_OPENMP=ON ^<br />
-DUSE_PROFILER=ON ^<br />
-DUSE_SSE=ON ^<br />
-DWITH_EXAMPLE=ON ^<br />
-DWITH_TEST=ON ^<br />
-DCMAKE_INSTALL_PREFIX=install ..<br />
<br />
cmake --build . --config Release<br />
<br />
b : Then run these commands, with the cpp package turned on<br />
<br />
cmake -G "Visual Studio 14 2015 Win64" ^<br />
-DCUDA_USE_STATIC_CUDA_RUNTIME=ON ^<br />
-DENABLE_CUDA_RTC=ON ^<br />
-DMKLDNN_VERBOSE=ON ^<br />
-DUSE_CUDA=ON ^<br />
-DUSE_CUDNN=ON ^<br />
-DUSE_F16C=ON ^<br />
-DUSE_GPERFTOOLS=ON ^<br />
-DUSE_CPP_PACKAGE=ON ^<br />
-DUSE_LAPACK=ON ^<br />
-DUSE_MKLDNN=ON ^<br />
-DUSE_MKLML_MKL=ON ^<br />
-DUSE_MKL_IF_AVAILABLE=ON ^<br />
-DUSE_MXNET_LIB_NAMING=ON ^<br />
-DUSE_OPENCV=ON ^<br />
-DUSE_OPENMP=ON ^<br />
-DUSE_PROFILER=ON ^<br />
-DUSE_SSE=ON ^<br />
-DWITH_EXAMPLE=ON ^<br />
-DWITH_TEST=ON ^<br />
-DCMAKE_INSTALL_PREFIX=install ..<br />
<br />
cmake --build . --config Release --target INSTALL<br />
<br />
4. After you compile and install the libs, you may find out some headers are missing from the install path; I missed nnvm and mxnet-cpp. What I did was copy those folders into the install folder by hand.<br />
<br />
Hope this could help someone who is pulling their hair out while compiling the cpp-package of mxnet on windows 10.<br />
<br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-68979830780607393472018-08-04T20:08:00.001-07:002018-12-12T16:56:05.704-08:00Qt and computer vision 2 : Build a simple computer vision application with Qt5 and opencv3<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>
In this post, I will show you how to build a dead simple computer vision application with Qt Creator and opencv3 step by step.<br />
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Install opencv3.4.1(or newer version) on windows</span></u></b></h3>
<div style="text-align: center;">
<br /></div>
0. Go to <a href="https://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.4.2/opencv-3.4.2-vc14_vc15.exe/download">source forge</a> and download the prebuilt binary of <a href="https://opencv.org/releases.html">opencv3.4.2</a>, or you could <a href="http://pythonopencv.com/step-by-step-install-opencv-3-3-with-visual-studio-2015-on-windows-10-x64-2017-diy/">build it by yourself</a><br />
<br />
1. Double click on opencv-3.4.2-vc14_vc15.exe and extract it to your favorite folder (Pic00)<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3ruvpc2F7Cs99E-49RWZEv3W28hzb1D0D9FlzZtjUe7fH_wMEnWUFK6pgAMq8BgmmJxwKWdKTcaGkNpqV-6PyS2PFjwAfwp_FlmVnFnc4Aedf7zSK-w3c8a8cBTGD12-UESLJrsFbN7E/s1600/opencv_install_00a.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="358" data-original-width="1203" height="117" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3ruvpc2F7Cs99E-49RWZEv3W28hzb1D0D9FlzZtjUe7fH_wMEnWUFK6pgAMq8BgmmJxwKWdKTcaGkNpqV-6PyS2PFjwAfwp_FlmVnFnc4Aedf7zSK-w3c8a8cBTGD12-UESLJrsFbN7E/s400/opencv_install_00a.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic00</td></tr>
</tbody></table>
<br />
2. Open the folder you extracted (assume you extracted it to /your_path/opencv_3_4_2). You will see a folder called "opencv".<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj6MdKYdAwPJT3IzpcfQgpxG9H7kXPtYhzB295LYG5URP-K3o6enpj1aOd58fqSEUX1d_r7Dz0C96rwwrOm6rU3DInJkUIbBQlIjy-IJHOlixOPTfTIbgOPmpyb9OAlv0c4lX0juewN3A/s1600/opencv_install_01.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="351" data-original-width="701" height="198" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj6MdKYdAwPJT3IzpcfQgpxG9H7kXPtYhzB295LYG5URP-K3o6enpj1aOd58fqSEUX1d_r7Dz0C96rwwrOm6rU3DInJkUIbBQlIjy-IJHOlixOPTfTIbgOPmpyb9OAlv0c4lX0juewN3A/s400/opencv_install_01.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic01</td></tr>
</tbody></table>
3. Open the QtCreator you <a href="https://qtandopencv.blogspot.com/2018/04/qt-and-computer-vision-0-setup.html">installed</a>.<br />
<br />
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Create a new project by Qt Creator</span></u></b></h3>
<br />
<br />
4. Create a new project<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXC2J-r1RWj4BybynJZlKQcfCJdjsNtHFSOQ0315z92zJH_vRbrGwIU5YqPJL8gT3uSSHhRtjmhf6tul9vbjOBh9WJ_W1tK9IDjCvkfZRTD0XhpCO6FypqiFJZIIUD5JxajJk5I_H4vE8/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="576" data-original-width="1181" height="195" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXC2J-r1RWj4BybynJZlKQcfCJdjsNtHFSOQ0315z92zJH_vRbrGwIU5YqPJL8gT3uSSHhRtjmhf6tul9vbjOBh9WJ_W1tK9IDjCvkfZRTD0XhpCO6FypqiFJZIIUD5JxajJk5I_H4vE8/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic02</td></tr>
</tbody></table>
5. You will see a lot of options; for simplicity, let us choose "Application->Non-Qt project->Plain c++ application". This tells QtCreator that we want to create a c++ program without using any Qt components.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitSxjQjreuq_OA2VJjtfcW2LxU2Fx7tdhHuq38hyphenhyphenzZ1JegCNmqnh2by3WoRd5M9qxIcbT3E3JbNnOpqK2deYNknn_2dZmXZAo-cuj9NOegbeguFXljcG7gJIuPwVrbRVhMCT9S6TPZqOY/s1600/non_qt_project.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="879" data-original-width="1600" height="218" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitSxjQjreuq_OA2VJjtfcW2LxU2Fx7tdhHuq38hyphenhyphenzZ1JegCNmqnh2by3WoRd5M9qxIcbT3E3JbNnOpqK2deYNknn_2dZmXZAo-cuj9NOegbeguFXljcG7gJIuPwVrbRVhMCT9S6TPZqOY/s400/non_qt_project.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic03</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
6. Enter the path of the folder and name of the project.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5u5CVwKwF9SIdXW_pghr5tbdNnyU8acW3QKE-rMOLPm6JQNV2wx7qfM04S-aGoBQ_Ft8mJYVYBJ541GA_rc3PhUdlAes6J-FFjlkTsf3QcUAsI0abXmGkA5241x7_-JuviI0uiBoaJq0/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="499" data-original-width="801" height="248" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5u5CVwKwF9SIdXW_pghr5tbdNnyU8acW3QKE-rMOLPm6JQNV2wx7qfM04S-aGoBQ_Ft8mJYVYBJ541GA_rc3PhUdlAes6J-FFjlkTsf3QcUAsI0abXmGkA5241x7_-JuviI0uiBoaJq0/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic04</td></tr>
</tbody></table>
7. Click the Next button and use qmake as your build system for now (you can pick cmake too, but I always prefer qmake when I am working with Qt).<br />
<br />
8. You will see a page asking you to select your kits; a kit is a tool QtCreator uses to group different settings like device, compiler, Qt version etc.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCz8Z0UlQhimFL9TgXMKyIBq3kdFfwr3apFVAAAVcwkoMJxUS9wAbe_laP-r3ECRdc67okpdaJ4_guVOdAgVzBQRfEdex8UtzEnGavkSUnrOny-f7FtcxsVoqi5t9P_87XdhBp5PbWU4k/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="497" data-original-width="800" height="247" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCz8Z0UlQhimFL9TgXMKyIBq3kdFfwr3apFVAAAVcwkoMJxUS9wAbe_laP-r3ECRdc67okpdaJ4_guVOdAgVzBQRfEdex8UtzEnGavkSUnrOny-f7FtcxsVoqi5t9P_87XdhBp5PbWU4k/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic05</td></tr>
</tbody></table>
9. Click on next. QtCreator may ask whether you want to add the project to version control; for simplicity, select None. Click on finish.<br />
<br />
10. If you see a screen like this, it means you succeeded.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFqJra-whNd3ydhLKA_dXzwmNC5F1Hc_dyWryFGsj3IpBtMLFDD8N1EbVa8-_KqyjzY8MLNDwrUWW47Xo-isQr-COVLirvtYlxL4oSuGL-Y4L69kMpiIwvfM-ce5E8B7GGmlfhlII7xNs/s1600/cpp_project.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="724" data-original-width="1322" height="218" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFqJra-whNd3ydhLKA_dXzwmNC5F1Hc_dyWryFGsj3IpBtMLFDD8N1EbVa8-_KqyjzY8MLNDwrUWW47Xo-isQr-COVLirvtYlxL4oSuGL-Y4L69kMpiIwvfM-ce5E8B7GGmlfhlII7xNs/s400/cpp_project.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic06</td></tr>
</tbody></table>
<br />
<br />
11. Write the code to read an image with opencv<br />
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#include <iostream></span>
<span style="color: #557799;">#include <opencv2/core.hpp></span>
<span style="color: #557799;">#include <opencv2/highgui.hpp></span>
<span style="color: #888888;">//the purposes of namespaces are</span>
<span style="color: #888888;">//1. Decrease the chance of name collisions</span>
<span style="color: #888888;">//2. Help you organize your code into logical groups</span>
<span style="color: #888888;">//Without declaring "using namespace std", every time you use</span>
<span style="color: #888888;">//the classes or functions of the namespace, you have to call them with the</span>
<span style="color: #888888;">//prefix "std::".</span>
<span style="color: #008800; font-weight: bold;">using</span> <span style="color: #008800; font-weight: bold;">namespace</span> cv;
<span style="color: #008800; font-weight: bold;">using</span> <span style="color: #008800; font-weight: bold;">namespace</span> std;
<span style="color: #888888;">/**</span>
<span style="color: #888888;"> * main function is the global, designated start function of c++.</span>
<span style="color: #888888;"> * @param argc Number of the parameters of command line</span>
<span style="color: #888888;"> * @param argv Content of the parameters of command line.</span>
<span style="color: #888888;"> * @return any integer within the range of int, meaning of the return value is</span>
<span style="color: #888888;"> * defined by the users</span>
<span style="color: #888888;"> */</span>
<span style="color: #333399; font-weight: bold;">int</span> <span style="color: #0066bb; font-weight: bold;">main</span>(<span style="color: #333399; font-weight: bold;">int</span> argc, <span style="color: #333399; font-weight: bold;">char</span> <span style="color: #333333;">*</span>argv[])
{
<span style="color: #008800; font-weight: bold;">if</span>(argc <span style="color: #333333;">!=</span> <span style="color: #0000dd; font-weight: bold;">2</span>){
cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"Run this example by invoking it like this: "</span><span style="color: #333333;"><<</span>endl;
cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"./step_02.exe lena.jpg"</span><span style="color: #333333;"><<</span>endl;
cout<span style="color: #333333;"><<</span>endl;
<span style="color: #008800; font-weight: bold;">return</span> <span style="color: #333333;">-</span><span style="color: #0000dd; font-weight: bold;">1</span>;
}
<span style="color: #888888;">//If you execute by Ctrl+R, argv[0] == "step_02.exe", argv[1] == lena.jpg</span>
cout<span style="color: #333333;"><<</span>argv[<span style="color: #0000dd; font-weight: bold;">0</span>]<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">","</span><span style="color: #333333;"><<</span>argv[<span style="color: #0000dd; font-weight: bold;">1</span>]<span style="color: #333333;"><<</span>endl;
<span style="color: #888888;">//Open the image</span>
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> img <span style="color: #333333;">=</span> imread(argv[<span style="color: #0000dd; font-weight: bold;">1</span>]);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>img.empty()){
imshow(<span style="background-color: #fff0f0;">"img"</span>, img); <span style="color: #888888;">//Show the image on screen</span>
waitKey(); <span style="color: #888888;">//Do not exit the program until the user presses a key</span>
}<span style="color: #008800; font-weight: bold;">else</span>{
cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"cannot open image:"</span><span style="color: #333333;"><<</span>argv[<span style="color: #0000dd; font-weight: bold;">1</span>]<span style="color: #333333;"><<</span>endl;
<span style="color: #008800; font-weight: bold;">return</span> <span style="color: #333333;">-</span><span style="color: #0000dd; font-weight: bold;">1</span>;
}
<span style="color: #008800; font-weight: bold;">return</span> <span style="color: #0000dd; font-weight: bold;">0</span>; <span style="color: #888888;">//usually we return 0 if everything is normal</span>
}
</pre>
</div>
<br />
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">How to compile and link the opencv lib with the help of Qt Creator and qmake</span></u></b></h3>
<br />
Before you can execute the app, you will need to compile and link against the libraries of opencv. Let me show you how to do it. If you miss steps 12 or 13, you will see <span style="color: red;">a lot of error messages</span> like the ones shown in Pic07 or Pic09.<br />
<br />
12. Tell the compiler where the header files are; this can be done by adding the following line to step_02.pro.<br />
<br />
<b>INCLUDEPATH += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/include</b><br />
<br />
The compiler will tell you it can't locate the header files if you do not add this line (see Pic07).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbZ6GH_9yn62bEN9_rw79GyDV3IorWrT0wY6xlEMUQkwQzPhyslzWqHWqlkSw3zlYW2vws0-wyNSxWB-jj-w9wOfKO5Jt-oS5O9d1EXu84mztHguc_kVIFWHE9Gdqby5z9mkIYs0-XKLU/s1600/include_file_issue.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="82" data-original-width="478" height="67" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbZ6GH_9yn62bEN9_rw79GyDV3IorWrT0wY6xlEMUQkwQzPhyslzWqHWqlkSw3zlYW2vws0-wyNSxWB-jj-w9wOfKO5Jt-oS5O9d1EXu84mztHguc_kVIFWHE9Gdqby5z9mkIYs0-XKLU/s400/include_file_issue.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic07</td></tr>
</tbody></table>
<br />
If your INCLUDEPATH is correct, QtCreator should be able to find the headers and use auto completion to save you some typing (Pic08).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZGyVuv02YHQSEglkSiaCIJVKklt0VAhLUIse5cirAJ1kpgCEKob4_4r7k99getXFqRfGyoCCprDNWVSxySsGbTXIPdfpjvb0I18CzIYJW5LU_d1tmUKt25fYAMQZ_g4q3PnWU9pXN2JQ/s1600/auto_complete.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="648" data-original-width="1152" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZGyVuv02YHQSEglkSiaCIJVKklt0VAhLUIse5cirAJ1kpgCEKob4_4r7k99getXFqRfGyoCCprDNWVSxySsGbTXIPdfpjvb0I18CzIYJW5LU_d1tmUKt25fYAMQZ_g4q3PnWU9pXN2JQ/s400/auto_complete.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic08</td></tr>
</tbody></table>
<br />
<br />
13. Tell the linker which opencv libraries it should link to with the following line.<br />
<br />
<b>LIBS += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/x64/vc14/lib/opencv_world342.lib</b><br />
<br />
Without this step, you will see "unresolved external symbols" errors (Pic09).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj_bd7egQj8N3QGKrQdHH7rZ28utxe-zHLWsNajjhKFMPgcMzscTiowNBkRfuGBBELEY4pi6huyGHw4ifVE7fZslSTnmF14DyjQV5-OJiOlCek4UboanCLt-VNiNYyV1rgUDyekG8jygE/s1600/symbol_bug.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="304" data-original-width="1430" height="85" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj_bd7egQj8N3QGKrQdHH7rZ28utxe-zHLWsNajjhKFMPgcMzscTiowNBkRfuGBBELEY4pi6huyGHw4ifVE7fZslSTnmF14DyjQV5-OJiOlCek4UboanCLt-VNiNYyV1rgUDyekG8jygE/s400/symbol_bug.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic09</td></tr>
</tbody></table>
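<br />
Putting steps 12 and 13 together, the whole step_02.pro could look like the following minimal sketch (the opencv paths are examples; adjust them to where you extracted opencv):<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"># A minimal step_02.pro for a plain c++ console app linking opencv.
# The INCLUDEPATH and LIBS paths are examples, adjust them to your install.
TEMPLATE = app
CONFIG += console c++11
CONFIG -= app_bundle qt

SOURCES += main.cpp

INCLUDEPATH += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/include
LIBS += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/x64/vc14/lib/opencv_world342.lib
</pre>
</div>
<br />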
14. Change from debug to release.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1fYuzQD6Z_kr3enPxuv__aq47r8N54l-KKxdZZS-RWcPwZKUreQbMyDdSOJXaeXGthvDjB9-n5Bh7HrBw5nZWHu82kxun68fQwHg45-BT-XMnupMZUb-5F5zDRXZuxcHKLodJDUdwl4I/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="847" data-original-width="1271" height="266" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1fYuzQD6Z_kr3enPxuv__aq47r8N54l-KKxdZZS-RWcPwZKUreQbMyDdSOJXaeXGthvDjB9-n5Bh7HrBw5nZWHu82kxun68fQwHg45-BT-XMnupMZUb-5F5zDRXZuxcHKLodJDUdwl4I/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic10</td></tr>
</tbody></table>
<br />
Click the icon surrounded by the red region and change it from debug to release. Why do we do that? Because<br />
<br />
<br />
<ul>
<li>Release mode is much faster than debug mode in many cases</li>
<li>The library we link to is built as a release library; do not mix debug and release libraries in your project unless you are asking for trouble</li>
</ul>
I will introduce more details about compiling, linking, release and debug in the future; for now, just press Ctrl+B to compile and link the app.<br />
<br />
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Execute the app</u></span></b></h3>
After we compile and link the app, the exe already exists in the build folder (the folder shown in Pic11).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLeSLiTExJE_DF3x1NanLVBaz0d6gM62_nMNQamz_lDebvafF3vwG1CvDyB9TPFtqpX_tYEGQIaVWble7_giSCAImh6kAWCZ2I3JBs-coYs6x2ToA-9C-A5FGIlJCK87ahx4nUmtAjG0A/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="599" data-original-width="1299" height="183" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLeSLiTExJE_DF3x1NanLVBaz0d6gM62_nMNQamz_lDebvafF3vwG1CvDyB9TPFtqpX_tYEGQIaVWble7_giSCAImh6kAWCZ2I3JBs-coYs6x2ToA-9C-A5FGIlJCK87ahx4nUmtAjG0A/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic11</td></tr>
</tbody></table>
<br />
We are almost done now; just a few more steps and the app will be up and running.<br />
<br />
15. Copy the dlls <span style="color: orange;">opencv_world342.dll</span> and <span style="color: orange;">opencv_ffmpeg342_64.dll</span> (they are placed in /your_path/opencv/opencv_3_4_2/opencv/build/bin) into a new folder (let us call it global_dll).<br />
<br />
16. Add the path of this folder into the system path. Without steps 15 and 16, the exe wouldn't be able to find the dlls when we execute the app, and you may see the following error when you execute the app from the command line (Pic12). I recommend using the tool Rapid Environment Editor (Pic13) to edit your path on windows.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6EuTx4hh-m_1pf77Gx3me9xhWjQV1_ZZmxfWcXT0g-9jmL1LrorESF1ucwubDOXXXvVxwZ1EU8cMHwI_7oY3pRLOKBnAaa1jx399V-tk9YqHy6w8sNZbjSfmWiumP0YTJ12G7Gt7euxk/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="506" data-original-width="1136" height="177" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6EuTx4hh-m_1pf77Gx3me9xhWjQV1_ZZmxfWcXT0g-9jmL1LrorESF1ucwubDOXXXvVxwZ1EU8cMHwI_7oY3pRLOKBnAaa1jx399V-tk9YqHy6w8sNZbjSfmWiumP0YTJ12G7Gt7euxk/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic12</td></tr>
</tbody></table>
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjx62rwnDFvQYvqbT9xfiadgtSHCEJNwFIZUbcWpIDNjHV7apGC_WDa1odJa18bexsva9kOmxLWvCMp7oX0aLt8gQRM2PCuvn6TfixwZWO7cq3ax1TQUA-by8NOOpaq2TUwZKZ_PoZKRu0/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="132" data-original-width="930" height="56" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjx62rwnDFvQYvqbT9xfiadgtSHCEJNwFIZUbcWpIDNjHV7apGC_WDa1odJa18bexsva9kOmxLWvCMp7oX0aLt8gQRM2PCuvn6TfixwZWO7cq3ax1TQUA-by8NOOpaq2TUwZKZ_PoZKRu0/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic13</td></tr>
</tbody></table>
17. Add the command line argument in QtCreator; without it, the app does not know where the image is when you press Ctrl+R to execute the program.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0RQltCV3BQZXG-4J8MkmvaHMn3pMYTT5kOsTjep1HMVg3gv9YZxu5yaPr10EOqzeu5LOF5qFOdKWYq79bZWMQuJoY88bMKBN-QF6OTMx9pp0mANH26DX1ZuDDClMoJqRZwOrOQf8hfS0/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="470" data-original-width="1306" height="142" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0RQltCV3BQZXG-4J8MkmvaHMn3pMYTT5kOsTjep1HMVg3gv9YZxu5yaPr10EOqzeu5LOF5qFOdKWYq79bZWMQuJoY88bMKBN-QF6OTMx9pp0mANH26DX1ZuDDClMoJqRZwOrOQf8hfS0/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic14.jpg</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
18. If successful, you should see the app open the image specified in the command line arguments list (Pic15).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbRHW0rPlgH76O_dawUG9406rsoMQpbnWGHKSCEoZoNa1k3ytB1JEXD38kKxTJA594Dxv8fqXIymxk1TxnywzERMj7X4HraC_dZ3FmP9rCm2iu2oUMVKPcE9hYuRYYB5vW4a0MDB9D54E/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="508" data-original-width="981" height="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbRHW0rPlgH76O_dawUG9406rsoMQpbnWGHKSCEoZoNa1k3ytB1JEXD38kKxTJA594Dxv8fqXIymxk1TxnywzERMj7X4HraC_dZ3FmP9rCm2iu2oUMVKPcE9hYuRYYB5vW4a0MDB9D54E/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic15</td></tr>
</tbody></table>
<br />
These steps are easy but can be annoying at first; I hope this post alleviates your frustration. You can find the source code at <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/qt_and_cv_step_by_step/step_02">github</a>.<br />
<br />
<br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com1tag:blogger.com,1999:blog-4702230343097536610.post-41986576622010261082018-04-22T09:42:00.002-07:002018-04-22T15:24:18.943-07:00Qt and computer vision 1 : Setup environment of Qt5 on windows step by step Long time haven't updated my blog, today rather than write a newer, advanced deep learning topics like "<a href="http://qtandopencv.blogspot.my/2017/10/modern-way-to-estimate-homography.html">Modern way to estimate homography matrix(by lightweight cnn)</a>" or "<a href="http://qtandopencv.blogspot.my/2017/08/let-us-create-semantic-segmentation.html">Let us create a semantic segmentation model by PyTorch</a>", I prefer to start a series of topics for new comers who struggling to build a computer vision app by c++. I hope my posts could help more people find out <span style="color: blue;">use c++ to develop application could as <b><span style="font-size: large;">easy</span></b> as another "much easier to use languages"(ex : python)</span>.<br />
<br />
Rather than introducing most of the features of Qt and opencv like other books do, these topics will introduce the subset of Qt that can help us develop decent computer vision applications, step by step.<br />
<br />
<div style="text-align: center;">
<u><b><span style="font-size: large;">c++ is as easy to use as python, really?</span></b></u></div>
<br />
Many programmers may find this nonsense, but my own experience tells me it is not, because I <span style="color: blue; font-size: large;">never </span>found languages like python, java or c# "<span style="color: red;">much easier to use compared with c++</span>". What makes our perspectives so different? I think the answers are<br />
<br />
1. Know how to <span style="color: blue;">use c++ effectively</span>.<br />
2. There exist <span style="color: blue;">great libraries</span> for the tasks(ex : <a href="https://www.qt.io/">Qt</a>, <a href="https://www.boost.org/">boost</a>, <a href="https://opencv.org/">opencv</a>, <a href="http://dlib.net/">dlib</a>, <a href="https://github.com/gabime/spdlog">spdlog </a>etc).<br />
<br />
As long as these two conditions are satisfied, I believe many programmers will reach the same conclusion as mine. In this series I will try my best to help you learn how to develop easy-to-maintain applications with c++, and show you how to solve those "small yet annoying issues" which may scare away many newcomers.<br />
<br />
<br />
<div style="text-align: center;">
<b><u><span style="font-size: large;">Install visual c++ 2015</span></u></b></div>
<br />
Some of you may ask "why 2015 but not 2017"? Because at the time I am writing this post, cuda does not have decent support for visual c++ 2017 or mingw yet. cuda is very important to computer vision apps, especially now that deep learning has taken over many computer vision tasks.<br />
<br />
1. Go to this <a href="https://www.visualstudio.com/vs/older-downloads/">page</a>, click on the download button of visual studio 2015.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBcfUkeAYKFOJSEI3g8GxQGwUHyx64ctPM3Jziv3IgyJLnjPNEpZ21M1nLuB8Eu-QlRP_wZLOhi5JbivEp0UHfztKV_CgFtjHhi1QRtyJe7DwQc-TNSRug5gH_d6hy71QRXsGAVjl_klA/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="426" data-original-width="1176" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBcfUkeAYKFOJSEI3g8GxQGwUHyx64ctPM3Jziv3IgyJLnjPNEpZ21M1nLuB8Eu-QlRP_wZLOhi5JbivEp0UHfztKV_CgFtjHhi1QRtyJe7DwQc-TNSRug5gH_d6hy71QRXsGAVjl_klA/s320/Capture.JPG" width="320" /></a></div>
<br />
2. Download visual studio 2015 community (you may need to create an account before you can enter this page).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhclLI9AH8P_O_bj6wnEJzcWZMbacCYQGef7PQFdl04bigNpHlwSLMXOPZWRviSsbem6SSSOxhppaN0NqMfw95zuQIvlChyphenhyphenD1AS6jBUrc8xNmjp0UK_jXr_IRduhkaA16TjagUfquX1XMw/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="417" data-original-width="836" height="159" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhclLI9AH8P_O_bj6wnEJzcWZMbacCYQGef7PQFdl04bigNpHlwSLMXOPZWRviSsbem6SSSOxhppaN0NqMfw95zuQIvlChyphenhyphenD1AS6jBUrc8xNmjp0UK_jXr_IRduhkaA16TjagUfquX1XMw/s320/Capture.JPG" width="320" /></a></div>
3. Double click on the exe "en_visual_studio_community_2015_with_update_3_x86_x64_web_installer_xxxx" and wait a few minutes.<br />
<br />
4. Install the visual c++ toolsets as shown below; make sure you select all of them.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7RGIIZDEmqMbUFPb4GaPahL9puEn6TD66jALw-UMViNeBPPuhjQkNyOk3k5_s91-x2Kj1BoDckbwzh3nCknPFzp4eE1h3_vQn_EDCEvX-rm3bEumJJ8aI1oIvRW5yyLTGc0S4LVPAUtM/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="256" data-original-width="373" height="218" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7RGIIZDEmqMbUFPb4GaPahL9puEn6TD66jALw-UMViNeBPPuhjQkNyOk3k5_s91-x2Kj1BoDckbwzh3nCknPFzp4eE1h3_vQn_EDCEvX-rm3bEumJJ8aI1oIvRW5yyLTGc0S4LVPAUtM/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
<br />
<div style="text-align: center;">
<u><b><span style="font-size: large;">Install Qt5 on windows</span></b></u></div>
<br />
1. Go to the <a href="https://www.qt.io/download">download page of Qt</a><br />
2. Select open source version of Qt<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDSrADF_OHrIwvc9KO0jnIEz_Kl0cWII3_4g1oOJMA_YYJLKAKUUksHZFPMGVb4s_q5zt8eUiUF54HalMe9sV5UaHsvnuZpHKTy79RdrxGT9CFlUkfSzWcxd0XI0nU7CgUB4DbykHhY8Y/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="283" data-original-width="584" height="155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDSrADF_OHrIwvc9KO0jnIEz_Kl0cWII3_4g1oOJMA_YYJLKAKUUksHZFPMGVb4s_q5zt8eUiUF54HalMe9sV5UaHsvnuZpHKTy79RdrxGT9CFlUkfSzWcxd0XI0nU7CgUB4DbykHhY8Y/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
3. Click the download button and wait until qt-unified-windows is downloaded.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAH_xoWmzv1EUE2Fw_aPl8vSfWjPBenqvxZxPwhbeUrwdURrlARer5pDrjpHLMGaq43Tp9CpppBpcGi0CUphJigjWvBCmiyMwClaXfJxhr9VWcV8n9KFCWsrk-cOMWi0o6ehyYilrWO3M/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="641" data-original-width="1171" height="175" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAH_xoWmzv1EUE2Fw_aPl8vSfWjPBenqvxZxPwhbeUrwdURrlARer5pDrjpHLMGaq43Tp9CpppBpcGi0CUphJigjWvBCmiyMwClaXfJxhr9VWcV8n9KFCWsrk-cOMWi0o6ehyYilrWO3M/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
4. Double click on the installer, click next->skip->next<br />
<br />
5. Select the path you want to install Qt<br />
<br />
6. Select the version of Qt you want to install. Every version of Qt(Qt5.x) has a lot of binary files to download, so only select the ones you need. I prefer to install Qt5.9.5 here. Why Qt5.9.5? Because Qt5.9 is a long term support version of Qt, and in theory long term support should be more stable.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiULZPcB0yHXGyMRsuKblJjJe1gfOVpFXKFfYr-Xoi4enJH8cBTVIaZ5fwHUsEB3awXpc_2bm9oa2wjeQDFgnl6jW0EHw0Q696Nzpnp_sVB8_b_t3_4WcWKjTBVSAO9rGLM_9Acmlq8Ts/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="470" data-original-width="486" height="309" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiULZPcB0yHXGyMRsuKblJjJe1gfOVpFXKFfYr-Xoi4enJH8cBTVIaZ5fwHUsEB3awXpc_2bm9oa2wjeQDFgnl6jW0EHw0Q696Nzpnp_sVB8_b_t3_4WcWKjTBVSAO9rGLM_9Acmlq8Ts/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
7. Click next and install.<br />
<br />
<div style="text-align: center;">
<b><u><span style="font-size: large;">Test Qt5 installed or not</span></u></b></div>
<br />
<br />
1. Open QtCreator and run an example. Go to your install path (ex: C:/Qt/3rdLibs/Qt), navigate to your_install_path/Tools/QtCreator/bin and double click on qtcreator.exe.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUOsIBTZyn64mzJmrG1-36wBobpY4-o5jZGaxIrf4gaZzWyQlb_0Yhn43N0WVp6vezRxoHc4fG1KMX8pAByKjdwVZ620G95xXxwDZwIyadFCUUaDcrKVF-s41Eq4RKqX021LkQNlKKmlw/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="168" data-original-width="589" height="91" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUOsIBTZyn64mzJmrG1-36wBobpY4-o5jZGaxIrf4gaZzWyQlb_0Yhn43N0WVp6vezRxoHc4fG1KMX8pAByKjdwVZ620G95xXxwDZwIyadFCUUaDcrKVF-s41Eq4RKqX021LkQNlKKmlw/s320/Capture.JPG" width="320" /></a></div>
<br />
2. Select Welcome->Example->Qt Quick Controls 2 - Gallery<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHbzC1NHzbk4hFM6JJFZc09RktI0r7E6iQbLuLOtki8WTAMRGRUsr2BAyGTAQDeQ61PIZviQpfkjWSD6iN8aIU6k_duxRJhBi78jp_tsGnG1xN8ybXFFd-rbpMCYw4fF71PScUfqSsHJ8/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="499" data-original-width="1600" height="99" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHbzC1NHzbk4hFM6JJFZc09RktI0r7E6iQbLuLOtki8WTAMRGRUsr2BAyGTAQDeQ61PIZviQpfkjWSD6iN8aIU6k_duxRJhBi78jp_tsGnG1xN8ybXFFd-rbpMCYw4fF71PScUfqSsHJ8/s320/Capture.JPG" width="320" /></a></div>
<br />
3. Click on the example. It may pop up a message box asking you some questions; you can click yes or no.<br />
<br />
4. Every example you open pops up a help page like this. Keeping it or not is your choice; sometimes they are helpful.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8lHbn9IdIsp42Ohdjanz-XvEUP8A-JVgLO9dlOJAEF44i37u2blAK8hPa2N5YnV_HWoq8ZQsfzpoMYrUweas833Pc68rF7CNfnhOG7Ia3FBX1ljDj7a-PcZpA6d6jfX0cPxwyD7gfGL4/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="813" data-original-width="1600" height="162" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8lHbn9IdIsp42Ohdjanz-XvEUP8A-JVgLO9dlOJAEF44i37u2blAK8hPa2N5YnV_HWoq8ZQsfzpoMYrUweas833Pc68rF7CNfnhOG7Ia3FBX1ljDj7a-PcZpA6d6jfX0cPxwyD7gfGL4/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
5. First, select the version of Qt you want to use (surrounded by the red bounding box). Second, keep the shadow build option on (surrounded by the green bounding box). Why keep it on? Because shadow build helps you <span style="color: blue;">separate your source codes from the build binaries</span>. Third, select whether you want to build your binary as a debug or release version (surrounded by the blue bounding box). Usually we <span style="color: red;"><b>cannot</b></span> mix debug/release libraries together; I will open another topic to discuss the benefits of debug/release, explain what MT/MD are, which one you should choose, etc.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcrkDdsDHcLofcNGqnTLTbPN-JYCmYP53GVxozri2YVF-c7nTsS2sRan6L9jg3nyKJgLa96LHWrPldEdCXyJAPSekYhei-AVNmULWsRcTc7xn6yzL-Jv-uobrevWaTIPi4CltUX_b0YhM/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="959" data-original-width="1256" height="244" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcrkDdsDHcLofcNGqnTLTbPN-JYCmYP53GVxozri2YVF-c7nTsS2sRan6L9jg3nyKJgLa96LHWrPldEdCXyJAPSekYhei-AVNmULWsRcTc7xn6yzL-Jv-uobrevWaTIPi4CltUX_b0YhM/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
<br />
6. Click on the run button or press Ctrl + R, then you should see the example running on your computer.<br />
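<br />
If you prefer to verify the installation with a few lines of code rather than a bundled example, a minimal sketch like the following is enough. Create a new, empty Qt Widgets project and replace main.cpp; if the window shows up, Qt5 and your compiler work properly:<br />
<br />
<pre class="prettyprint">#include <QApplication>
#include <QLabel>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    //if this window shows up, the Qt5 installation works
    QLabel label("Qt5 is working");
    label.show();

    return app.exec();
}
</pre>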
<br />
<br />
<br />
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-59722289057233420182017-10-10T23:49:00.003-07:002017-10-13T14:16:12.485-07:00Deep learning 11-Modern way to estimate homography matrix(by light weight cnn) Today I want to introduce a modern way to estimate relative homography between a pair of images. It is a solution introduced by the paper titled <a href="https://arxiv.org/pdf/1606.03798.pdf">Deep Image Homography Estimation</a>.<br />
<div>
<br /></div>
<h2 style="text-align: center;">
<u>Introduction</u></h2>
<div>
Q : What is a homography matrix?</div>
<div>
<br /></div>
<div>
A : Homography matrix is a 3x3 transformation matrix that maps the points in one image to the <span style="color: blue;">corresponding points</span> in another image.</div>
<div>
<br /></div>
<div>
Q : What are the use of homography matrix?</div>
<div>
<br /></div>
<div>
A : There are many applications that depend on the homography matrix; a few of them are image stitching, camera calibration and augmented reality.</div>
<div>
<br /></div>
<div>
Q : How to calculate a homography matrix between two images?</div>
<div>
<br /></div>
<div>
A : Traditional solutions are based on two steps: corner estimation and robust homography estimation. In the corner detection step, you need at least 4 point correspondences between the two images; usually we find these points by matching features like AKAZE, SIFT or SURF. Generally, the correspondences found by those algorithms are over-complete, so we prune out the outliers (ex : by RANSAC) in the robust estimation step. If you are interested in the whole process described in c++, take a look at this <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/stitching">project</a>.</div>
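<div>
<br /></div>
<div>
A condensed OpenCV sketch of those two steps (an illustration of the idea, not the code of the linked project):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <opencv2/calib3d.hpp>
#include <opencv2/features2d.hpp>

#include <vector>

cv::Mat estimate_homography(cv::Mat const &img1, cv::Mat const &img2)
{
    //step 1 : corner estimation, detect and describe features by AKAZE
    auto akaze = cv::AKAZE::create();
    std::vector<cv::KeyPoint> kpts1, kpts2;
    cv::Mat desc1, desc2;
    akaze->detectAndCompute(img1, cv::noArray(), kpts1, desc1);
    akaze->detectAndCompute(img2, cv::noArray(), kpts2, desc2);

    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    std::vector<cv::Point2f> pts1, pts2;
    for(auto const &m : matches){
        pts1.emplace_back(kpts1[m.queryIdx].pt);
        pts2.emplace_back(kpts2[m.trainIdx].pt);
    }

    //step 2 : robust estimation, RANSAC prunes the outliers
    return cv::findHomography(pts1, pts2, cv::RANSAC);
}
</pre>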
<div>
<br /></div>
<div>
Q : The traditional solution requires heavy computation; do we have another way to obtain the homography matrix between two images?</div>
<div>
<br /></div>
<div>
A : This is the question the paper wants to answer. Instead of designing the features by hand, this paper designs an algorithm to learn the homography between two images. The biggest selling point of the paper is that they <b><span style="color: blue;">turn </span></b><span style="color: blue;">the homography estimation problem into a</span> <b><span style="color: blue;">machine learning problem</span></b>.</div>
<div>
<br /></div>
<div>
<br /></div>
<h2 style="text-align: center;">
<b><u>HomographyNet</u></b></h2>
<div>
This paper uses a VGG style CNN to estimate the homography matrix between two images; they call it HomographyNet. This model is trained in an end to end fashion, quite simple and neat.</div>
<div>
<br /></div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7TZfMitaijwbJM6cRUr1X77fYlAfIJCOI2lOul7181hbO3AUhPu1tPDL9wbyROcz4W9fuO7TJafpIKJgXmgBXhRwGjNPY3jRvagwwmDMTjNxPk0-ZKJPFLbe_gyVaDGeC2Y_wZwcUbmc/s1600/homonet.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="198" data-original-width="996" height="77" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7TZfMitaijwbJM6cRUr1X77fYlAfIJCOI2lOul7181hbO3AUhPu1tPDL9wbyROcz4W9fuO7TJafpIKJgXmgBXhRwGjNPY3jRvagwwmDMTjNxPk0-ZKJPFLbe_gyVaDGeC2Y_wZwcUbmc/s400/homonet.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig00</td></tr>
</tbody></table>
<div>
</div>
<div>
HomographyNet comes in two versions, regression and classification. The regression network produces eight real-valued numbers and uses L2 loss as the final layer. The classification network uses softmax as the final layer and quantizes every real value into 21 bins. The regression version has better accuracy; while the average accuracy of the classification version is much worse, it can produce confidences. </div>
<div>
<br /></div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4EbnXnjVICccOehF5Xp8cKaTQqE5eU5e0uSastW_vRGsqukERidnhl8MATjpfcblsdphoqf32Inc1YOkvrWblWAGibkaaG6420L8TDi4uU37qEdir_pOqwKWYGgSWMtZkSeBTt9no4d4/s1600/homonet_01.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="301" data-original-width="1014" height="117" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4EbnXnjVICccOehF5Xp8cKaTQqE5eU5e0uSastW_vRGsqukERidnhl8MATjpfcblsdphoqf32Inc1YOkvrWblWAGibkaaG6420L8TDi4uU37qEdir_pOqwKWYGgSWMtZkSeBTt9no4d4/s400/homonet_01.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig01</td></tr>
</tbody></table>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGns8-K3-fLQqZoelH4_D-SSLCpYsV0a7Ki98nvAEOm1S_Lkcl4pLZSkjRHLbePiXdkGpekmbhjOJ9jw3u-Io12vLY17IQCjRrd6Jc-E4X-UJAvSG2LB6uLU2pHGE7jyHmjka0P70u8Ac/s1600/homonet_02.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="394" data-original-width="423" height="297" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGns8-K3-fLQqZoelH4_D-SSLCpYsV0a7Ki98nvAEOm1S_Lkcl4pLZSkjRHLbePiXdkGpekmbhjOJ9jw3u-Io12vLY17IQCjRrd6Jc-E4X-UJAvSG2LB6uLU2pHGE7jyHmjka0P70u8Ac/s320/homonet_02.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig02</td></tr>
</tbody></table>
</div>
<h2 style="text-align: center;">
<b><u>4-Point Homography Parameterization</u></b></h2>
<div>
Instead of using a 3x3 homography matrix as the label (ground truth), this paper uses a 4-point parameterization as the label.</div>
<div>
<br /></div>
<div>
Q : What is 4-point parameterization?</div>
<div>
<br /></div>
<div>
A : The 4-point parameterization stores the differences of the 4 corresponding corner points between the two images; Fig03 and Fig04 explain it well.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZkZ6jxotLzMhjyQxjlB6NbmFXN-USWdazCmQiy8Edg6jdOGxIlvKwqVeVL1kZMyHMryxevglv1GdmGxVqyVz-05iUFxw_3CFhyphenhyphenNa_NKWsQxj0YVJGpw8tCsoILLjffVpIRhJ6sipqEzk/s1600/homonet_04.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="101" data-original-width="205" height="157" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZkZ6jxotLzMhjyQxjlB6NbmFXN-USWdazCmQiy8Edg6jdOGxIlvKwqVeVL1kZMyHMryxevglv1GdmGxVqyVz-05iUFxw_3CFhyphenhyphenNa_NKWsQxj0YVJGpw8tCsoILLjffVpIRhJ6sipqEzk/s320/homonet_04.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig03</td></tr>
</tbody></table>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5l_pr_w9pKyp7WgDDJuL7zw3kM6EuZz-cDNigFJVm43KuqVzQNgfaoyl2RGmlvWOFvA47vNHG_XmCQGqQFeqdAr_2dkBCuUOSkk3hc1I9F4k5qAwXKwxYikwtZ9JEYpP1WnOMSkO5IDs/s1600/homonet_05.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="213" data-original-width="419" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5l_pr_w9pKyp7WgDDJuL7zw3kM6EuZz-cDNigFJVm43KuqVzQNgfaoyl2RGmlvWOFvA47vNHG_XmCQGqQFeqdAr_2dkBCuUOSkk3hc1I9F4k5qAwXKwxYikwtZ9JEYpP1WnOMSkO5IDs/s400/homonet_05.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig04</td></tr>
</tbody></table>
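<div>
<br /></div>
<div>
In code, going back from the 4-point parameterization to the 3x3 matrix is a one-liner with OpenCV; given the four corners of the patch and the offsets predicted by the network, cv::getPerspectiveTransform recovers the homography (a sketch, assuming the offsets are stored in the same order as the corners):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <opencv2/imgproc.hpp>

#include <vector>

//corners : 4 corners of the patch in the first image
//offsets : the 4-point parameterization(delta of every corner)
cv::Mat four_point_to_homography(std::vector<cv::Point2f> const &corners,
                                 std::vector<cv::Point2f> const &offsets)
{
    std::vector<cv::Point2f> moved;
    for(size_t i = 0; i != corners.size(); ++i){
        moved.emplace_back(corners[i] + offsets[i]);
    }
    //solve the 3x3 homography from the 4 point correspondences
    return cv::getPerspectiveTransform(corners, moved);
}
</pre>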
<div>
Q : Why do they use 4-point parameterization but not 3x3 matrix?</div>
<div>
<br /></div>
<div>
A : Because the 3x3 homography is very difficult to train; the problem is that the 3x3 matrix mixes rotation and translation together, and the paper explains why. </div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">The submatrix [H11, H12; H21, H22] represents the rotational terms in the homography., while the vector [H13, H23] is the translational offset. Balancing the rotational and translational terms as part of an optimization problem is difficult.</span></div>
<div>
<i><a href="https://arxiv.org/pdf/1606.03798.pdf">Deep Image Homography Estimation</a></i></div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgCy532pBr2tyB6rlIinVyV6xSG4rfd52NWY-Z4xvQ7IKJfq9bis9jt0qQ52m9gY8sj6BnOJIm7OYQy179yRnbYNgdzlc_LYEawiClTOHEsBtzZ5VWiALiDxl61jN2AXSOEwWos9GGKwY/s1600/homonet_06.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="85" data-original-width="296" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgCy532pBr2tyB6rlIinVyV6xSG4rfd52NWY-Z4xvQ7IKJfq9bis9jt0qQ52m9gY8sj6BnOJIm7OYQy179yRnbYNgdzlc_LYEawiClTOHEsBtzZ5VWiALiDxl61jN2AXSOEwWos9GGKwY/s1600/homonet_06.jpg" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig05</td></tr>
</tbody></table>
<div>
<br /></div>
<h2 style="text-align: center;">
<b> <u>Data Generation</u></b></h2>
<div>
<br /></div>
<div>
Q : Training deep convolution neural networks from scratch requires a large amount of data, where could we obtain the data?</div>
<div>
<br /></div>
<div>
A : The paper invents a smart solution to generate a nearly unlimited number of labeled training examples; Fig06 summarizes the whole process.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEik5ETXmFTNMMqKEYkNVvwcf2HJy2hTi_D0u-9gjC4qVsSRuQx3BfRyVRR-4UqK4SInJD6ia_-nOIgztPfB_OILQKpOZoe3w3b1MByXi44eF8kiYuIMkigqsU-tjDqeGHQwvd_JaF0oO84/s1600/homonet_03.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="733" data-original-width="573" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEik5ETXmFTNMMqKEYkNVvwcf2HJy2hTi_D0u-9gjC4qVsSRuQx3BfRyVRR-4UqK4SInJD6ia_-nOIgztPfB_OILQKpOZoe3w3b1MByXi44eF8kiYuIMkigqsU-tjDqeGHQwvd_JaF0oO84/s400/homonet_03.jpg" width="312" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig06</td></tr>
</tbody></table>
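<div>
<br /></div>
<div>
A sketch of that generation process with OpenCV (my reading of Fig06; the perturbation range and patch size are illustrative): crop a patch, randomly perturb its four corners, warp the whole image with the inverse of the resulting homography, then crop a second patch at the same location. The corner offsets become the label.</div>
<div>
<br /></div>
<pre class="prettyprint">#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>

#include <random>
#include <vector>

//generate one training pair(patch_a, patch_b) and its 4-point label
void generate_pair(cv::Mat const &gray, cv::Rect const &roi, float max_shift,
                   cv::Mat &patch_a, cv::Mat &patch_b,
                   std::vector<cv::Point2f> &label)
{
    static std::mt19937 rng(std::random_device{}());
    std::uniform_real_distribution<float> dist(-max_shift, max_shift);

    std::vector<cv::Point2f> const corners{
        {float(roi.x), float(roi.y)},
        {float(roi.x + roi.width), float(roi.y)},
        {float(roi.x + roi.width), float(roi.y + roi.height)},
        {float(roi.x), float(roi.y + roi.height)}
    };
    std::vector<cv::Point2f> moved;
    label.clear();
    for(auto const &pt : corners){
        label.emplace_back(dist(rng), dist(rng));
        moved.emplace_back(pt + label.back());
    }

    //warp the whole image by the inverse homography, then crop the
    //patch at the same location; the corner offsets are the label
    cv::Mat const hmat = cv::getPerspectiveTransform(corners, moved);
    cv::Mat warped;
    cv::warpPerspective(gray, warped, hmat.inv(), gray.size());
    patch_a = gray(roi).clone();
    patch_b = warped(roi).clone();
}
</pre>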
<h2 style="text-align: center;">
<b><u><br />Results</u></b></h2>
<div>
The results of my implementation <b><span style="color: blue;">outperform </span></b>the paper: my average loss is <span style="color: blue;">2.58</span>, while the paper's is <span style="color: blue;">9.2</span>. The largest loss of my model is 19.53. The performance of my model is better than the paper's by more than <span style="color: blue;">3.5 times</span>(<span style="color: blue;">9.2/2.58 = 3.57</span>). What makes the performance improve so much? A few reasons I could think of are</div>
<div>
<br /></div>
<div>
<div>
1. I <span style="color: blue;">change the network architectures</span> from vgg like to squeezeNet1.1 like.</div>
<div>
2. I <span style="color: blue;">do not</span> apply any data augmentation; maybe the blurring or occlusion used by the paper makes the model harder to train.</div>
<div>
3. The paper uses data augmentation to generate 500000 examples for training, but I use 500032 images from imagenet as my training set. I guess this potentially increases the variety of the data; the end result is that the network becomes easier to train and more robust (but it may not work well for blur or occlusion).</div>
</div>
<div>
<br /></div>
<div>
Following are some of the results; the region estimated by the model (red rectangle) is very close to the real region (blue rectangle).</div>
<div>
<br /></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhafYhq-29W7qtE2w9x4bSNle-3ED4OFtMGdeLkHxSuHOEjk-uUB4UfP3WBYHcqW474XOrL-I1URJafS9blLhoEYfX5TFzcawZQ1idqKd15PHGn5GSOWW5CA_1O4exEHNyTykzeUNN1SoE/s1600/montage.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="800" data-original-width="1600" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhafYhq-29W7qtE2w9x4bSNle-3ED4OFtMGdeLkHxSuHOEjk-uUB4UfP3WBYHcqW474XOrL-I1URJafS9blLhoEYfX5TFzcawZQ1idqKd15PHGn5GSOWW5CA_1O4exEHNyTykzeUNN1SoE/s400/montage.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig07</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
<br /></div>
<h2 style="text-align: center;">
<b><u>Final thoughts</u></b></h2>
<div>
The results look great, but this paper does not answer two important questions.</div>
<div>
<br /></div>
<div>
1. The paper only tests on synthetic images; does it work on real world images?</div>
<div>
2. How should I use the trained model to predict a homography matrix?</div>
<div>
<br /></div>
<div>
I would like to know the answers; if anyone finds out, please leave me a message.</div>
<h2 style="text-align: center;">
<b><u>Codes and model</u></b></h2>
<div>
As usual, I place my codes at <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/deep_homography">github</a>, model at <a href="https://mega.nz/#!8oMW0bpT!IjkEj_2FfmwZ3svvySFrtRZc6mGTxoR4naMshQcDbP8">mega</a>.</div>
<div>
<br /></div>
<div>
<b class="markup--strong markup--blockquote-strong"> If you liked this article, please help others find it by clicking the little g+ icon below. Thanks a lot!</b></div>
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-32738333108958128662017-09-03T00:37:00.002-07:002017-09-03T00:46:22.924-07:00Wrong way to use QThread<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>
There are two ways to use QThread: the first solution is to inherit QThread and override the run function, the other solution is to create a controller. Today I would like to talk about the second solution and show you how QThread is commonly misused (a general gotcha of QThread).<br />
<br />
The most common error I see is <span style="color: red;"><b>calling the function of the worker directly</b></span>. Please <span style="color: red;"><b>do not do that</b></span>, because this way your worker <b><span style="color: red;">will not work on another thread</span></b> but on the thread you are calling it from.<br />
<br />
Allow me to prove this to you with a small example. I do not separate implementation and declaration in this post because this makes the post easier to read.<br />
<br />
case 1 : Call by function<br />
<br />
<br />
1 : Let us create a very simple, naive worker. This worker <span style="color: red;"><b>must be a QObject</b></span>, because we need to move the worker into a QThread.<br />
<br />
<pre class="prettyprint">class naive_worker : public QObject
{
Q_OBJECT
public:
explicit naive_worker(QObject *obj = nullptr);
void print_working_thread()
{
qDebug()<<QThread::currentThread();
}
};
</pre>
<br />
2 : Create a dead simple gui with QtDesigner. The button "Call by normal function" will call the function "print_working_thread" directly, the button "Call by signal and slot" will call "print_working_thread" through signal and slot, and "Print current thread address" will print the address of the main thread (gui thread).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhh0bZFOUDS3bPOmxeJFlhNPe7LCEbYKeRZhoScppYtEXoqrYcO5oBRUp-kNYx_ZLf5CrILTaHqxQPToa2HTJmWTEq1e693VHsBUO630GuKiQ6Ilm0IMIvS3_vTc_a89zRRIaLyuaVMVAQ/s1600/simple_gui.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="404" data-original-width="642" height="251" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhh0bZFOUDS3bPOmxeJFlhNPe7LCEbYKeRZhoScppYtEXoqrYcO5oBRUp-kNYx_ZLf5CrILTaHqxQPToa2HTJmWTEq1e693VHsBUO630GuKiQ6Ilm0IMIvS3_vTc_a89zRRIaLyuaVMVAQ/s400/simple_gui.JPG" width="400" /></a></div>
<br />
3 : Create a controller<br />
<br />
<br />
<pre class="prettyprint">class naive_controller : public QObject
{
Q_OBJECT
public:
explicit naive_controller(QObject *parent = nullptr):
QObject(parent),
worker_(new naive_worker)
{
//move your worker to thread, so Qt know how to handle it
//your worker should not have a parent before calling
//moveToThread
worker_->moveToThread(&thread_);
connect(&thread_, &QThread::finished, worker_, &QObject::deleteLater);
//this connection is very important, in order to make worker work on the thread
//we move to, we have to call it by the mechanism of signal and slot
connect(this, &naive_controller::print_working_thread_by_signal_and_slot,
worker_, &naive_worker::print_working_thread);
thread_.start();
}
~naive_controller()
{
thread_.wait();
thread_.quit();
}
void print_working_thread_by_normal_call()
{
worker_->print_working_thread();
}
signals:
void print_working_thread_by_signal_and_slot();
private:
QThread thread_;
naive_worker *worker_;
};
</pre>
<br />
4 : Call it through the two different paths and compare the thread addresses.<br />
<br />
<br />
<pre class="prettyprint">class MainWindow : public QMainWindow
{
Q_OBJECT
public:
explicit MainWindow(QWidget *parent = nullptr);
~MainWindow();
private slots:
void on_pushButtonPrintCurThread_clicked()
{
//this function will be called when
//"Print current thread address" is clicked
qDebug()<<QThread::currentThread();
}
void on_pushButtonCallNormalFunc_clicked()
{
//this function will be called when
//"Call by normal function" is clicked
controller_->print_working_thread_by_normal_call();
}
void on_pushButtonCallSignalAndSlot_clicked()
{
//this function will be called when
//"Call by signal and slot" is clicked
controller_->print_working_thread_by_signal_and_slot();
}
private:
naive_controller *controller_;
naive_worker *worker_;
Ui::MainWindow *ui;
};
</pre>
<br />
5. Run the app and click the buttons in the following order: "Print current thread address"->"Call by normal function"->"Call by signal and slot", and see what happens. Following are my results:<br />
<br />
QThread(0x1bd25796020) //call "print_working_thread"<br />
QThread(0x1bd25796020) //call "Call by normal function"<br />
QThread(0x1bd2578bf70) //call "Call by signal and slot"<br />
<br />
Apparently, to make our worker run in the QThread we moved it to, we <span style="color: red;"><b>have to</b></span> call it through the signal and slot mechanism, else it will <b><span style="color: red;">execute in the calling thread</span></b>. You may ask: this is too complicated, do we have an easier way to spawn a thread? Yes we do, you can try QtConcurrent::run and std::async; they are easier to use compared with QThread (it is a regret that c++17 failed to include future.then), as the sketch below shows. I use QThread when I need more power, like thread communication and queue operations.<br />
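<br />
A minimal sketch of both alternatives (QtConcurrent requires the concurrent module in your .pro file):<br />
<br />
<pre class="prettyprint">#include <QDebug>
#include <QThread>
#include <QtConcurrent/QtConcurrent>

#include <future>

void easier_ways_to_spawn_thread()
{
    //QtConcurrent::run executes the functor in the global thread pool
    QFuture<void> qt_future = QtConcurrent::run([]()
    {
        qDebug()<<"QtConcurrent work on"<<QThread::currentThread();
    });
    qt_future.waitForFinished();

    //std::async with launch::async runs the task on another thread
    auto std_future = std::async(std::launch::async, []()
    {
        qDebug()<<"std::async work on"<<QThread::currentThread();
    });
    std_future.wait();
}
</pre>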
<br />
<div style="text-align: center;">
<b><u>Source codes</u></b></div>
<br />
Located at <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/qt_thread_tutorial">github</a>.ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-88914696930035356842017-08-28T22:54:00.000-07:002017-10-13T14:15:57.886-07:00Deep learning 10-Let us create a semantic segmentation model(LinkNet) by PyTorch In recent years deep learning has taken over many difficult tasks of computer vision, and semantic segmentation is one of them. The first segmentation net I implemented is LinkNet, a fast and accurate segmentation network. <br />
<br />
<div style="text-align: center;">
<h2>
<u><b>Introduction</b></u></h2>
</div>
<br />
<u><i>Q </i></u>: <i>What is LinkNet?</i><br />
<br />
A : LinkNet is a convolutional neural network designed for semantic segmentation. This network is 10 times faster than <a href="https://arxiv.org/pdf/1511.00561v2.pdf">SegNet</a> and more accurate.<br />
<br />
<i><u>Q</u></i> : <i>What is semantic segmentation? Any difference with segmentation?</i><br />
<br />
A : Of course there are differences. Segmentation partitions an image into several "similar" parts, but you <span style="color: blue;">do not know what those parts represent</span>. On the other hand, semantic segmentation partitions the image into different <span style="color: blue;">pre-determined labels</span>. Those labels are presented as colors in the end results. For example, check out the following images (from <a href="https://github.com/mostafaizz/camvid">camvid</a>).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmK_zaPr_1zyok4A_qUgPcMfIVX6A61oKdeeuOfT5zkWX2Sbwgh4z5AC70dMm8IEsBfpItShh_RLrBuBW8B6cxeHQ4g-2vTa_yz4C5-eC5UKaNaUiokcZA3PCY8rs-RkCPzfIHVavdhMw/s1600/_segmentation_example.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="320" data-original-width="960" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmK_zaPr_1zyok4A_qUgPcMfIVX6A61oKdeeuOfT5zkWX2Sbwgh4z5AC70dMm8IEsBfpItShh_RLrBuBW8B6cxeHQ4g-2vTa_yz4C5-eC5UKaNaUiokcZA3PCY8rs-RkCPzfIHVavdhMw/s320/_segmentation_example.jpg" width="320" /></a></div>
<br />
<br />
<i><u>Q</u></i> : <i>Semantic segmentation sounds like object detection, are they the same thing?</i><br />
<br />
A : No, they are not, although you may achieve the same goal with either of them.<br />
From the technical aspect, they use different approaches. From the view of the end results, semantic segmentation tells you what those pixels are, but it <span style="color: blue;">does not tell you how many instances are in your images</span>; object detection shows you how many instances are in your images by minimal bounding boxes, but it <span style="color: blue;">does not give you </span><span class="comment-copy"><span style="color: blue;">delineation of objects</span>. For example, check out the images below (from <a href="https://pjreddie.com/darknet/yolo/">yolo</a>).</span><br />
<span class="comment-copy"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOsdM9UsBtMdfSs57zajDqmNHKjZ5FsQ6dqGYCBR8eChVSGKRBCfnqvGgeYtXYgUJDzQvTVA_5i1OAxrkanW-nmOYdorhrEfZvH-pKeVJbBbKvDTMhqIKu4OZj1wF_jkHhj0SiX4SceI0/s1600/object_detection.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1328" data-original-width="1600" height="265" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOsdM9UsBtMdfSs57zajDqmNHKjZ5FsQ6dqGYCBR8eChVSGKRBCfnqvGgeYtXYgUJDzQvTVA_5i1OAxrkanW-nmOYdorhrEfZvH-pKeVJbBbKvDTMhqIKu4OZj1wF_jkHhj0SiX4SceI0/s320/object_detection.png" width="320" /></a></div>
<span class="comment-copy"><br /></span>
<br />
<div style="text-align: center;">
<h3>
<u><b>Network architectures</b></u></h3>
</div>
<br />
The LinkNet paper describes its network architecture with excellent graphs and simple descriptions; following are the figures copied shamelessly from the paper.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidY5fMuLldJ0Y4fWTupDgAFv7LihrbwYrwHIuBmsnvR1BymD367BTUVFrKb-HOyiZKUN-j1qOAng36vqm3m5-Nwes_u7vSyjxs34ZL7E18O7ibmZIL1Z4cOAkeRiOkn2jolGY41xTL3Bo/s1600/link_net_00.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="810" data-original-width="602" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidY5fMuLldJ0Y4fWTupDgAFv7LihrbwYrwHIuBmsnvR1BymD367BTUVFrKb-HOyiZKUN-j1qOAng36vqm3m5-Nwes_u7vSyjxs34ZL7E18O7ibmZIL1Z4cOAkeRiOkn2jolGY41xTL3Bo/s320/link_net_00.png" width="237" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOFiakrIui5mpEZ9KUddw-bxQhCs2OkZqXASCXYq44qXNJVncHgkbEe6pQ6hqPSdruitEZDA1ikELg6bKE4vzsZFCz6kQ8CNJ1ezbcuCqZsEt6APDGjO5NeqmPwFXhU9VtCk5kwJ0wwS8/s1600/link_net_01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="653" data-original-width="536" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOFiakrIui5mpEZ9KUddw-bxQhCs2OkZqXASCXYq44qXNJVncHgkbEe6pQ6hqPSdruitEZDA1ikELg6bKE4vzsZFCz6kQ8CNJ1ezbcuCqZsEt6APDGjO5NeqmPwFXhU9VtCk5kwJ0wwS8/s320/link_net_01.png" width="262" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4CZBbF6vpO2cyrAlrM2HIoruF5VBHbGpG7nKfbwwnpTXm3IIhLVKEd8FqjN0kHVTiCRRwtw60L9OSicEUGrC-ZgWS6WeQfj5OIe64QUDg7dyQpR91gXrj9rSsKzWuRyAnTRrp2dgpv4A/s1600/link_net_02.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="386" data-original-width="544" height="227" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4CZBbF6vpO2cyrAlrM2HIoruF5VBHbGpG7nKfbwwnpTXm3IIhLVKEd8FqjN0kHVTiCRRwtw60L9OSicEUGrC-ZgWS6WeQfj5OIe64QUDg7dyQpR91gXrj9rSsKzWuRyAnTRrp2dgpv4A/s320/link_net_02.png" width="320" /></a></div>
<br />
<br />
LinkNet adopts an encoder-decoder architecture. According to the paper, much of LinkNet's performance comes from adding the output of each encoder block to the corresponding decoder block; this helps the decoder recover the spatial information more easily. If you want to know the details, please study section 3 of the paper; it is nicely written and very easy to understand.<br />
<br />
<i><u>Q</u></i> : <i>The paper is easy to read, but they do not explain what is full convolution, could you tell me what that means?</i><br />
<br />
A : <span class="inline_editor_value"><span class="rendered_qtext">Full convolution indicates that the neural network is composed of convolution layers and activations <span style="color: blue;">only</span>, <span style="color: blue;">without any</span> fully connected or pooling layers. </span></span><br />
<span class="inline_editor_value"><span class="rendered_qtext"><br /></span></span>
<span class="inline_editor_value"><span class="rendered_qtext"><u><i>Q</i></u> : <i>How do they perform down-sampling without pooling layers?</i></span></span><br />
<span class="inline_editor_value"><span class="rendered_qtext"><br /></span></span>
<span class="inline_editor_value"><span class="rendered_qtext">A : Make the <span style="color: blue;">stride</span> of convolution as <span style="color: blue;">2 x 2</span> and do <span style="color: blue;">zero padding</span>, if you cannot figure it out why this work, I suggest you create an excel file, write down some data and do some experiment.</span></span><br />
<br />
<u><i>Q</i></u> : <i>Which optimizer work best?</i><br />
<br />
A : According to the paper, <span style="color: blue;">rmsprop</span> is the winner, and my experiments told me the same thing. In case you are interested, below are the graphs of training loss; from left to right they are rmsprop, adam and sgd. The hyper parameters are<br />
<br />
Initial learning rate : adam and rmsprop are 5e-4, sgd is 1e-3<br />
Augmentation : random crop(480,320) and horizontal flip<br />
Normalize : subtract mean(based on imagenet mean value) and divided by 255<br />
Batch size : 16<br />
Epoch : 800<br />
Training examples : 368<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0Wb34EO38D6JJeD7BnZbYf_sDdw_fIbBDIqic9BtAuDFydz4z3kS-9ncerfvsNHXfCqyyXGpeHEE2EnxyrbaX4B6mrEdvJOZOdM3rY9W7T2udF4qONeoaS2Anbd8wSVq6LYsUpGDYQ1k/s1600/segmentation_loss_368.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="400" data-original-width="1600" height="100" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0Wb34EO38D6JJeD7BnZbYf_sDdw_fIbBDIqic9BtAuDFydz4z3kS-9ncerfvsNHXfCqyyXGpeHEE2EnxyrbaX4B6mrEdvJOZOdM3rY9W7T2udF4qONeoaS2Anbd8wSVq6LYsUpGDYQ1k/s400/segmentation_loss_368.jpg" width="400" /></a></div>
<br />
<br />
The results of adam and rmsprop are very close. The loss of sgd decreases steadily, but it converges very slowly even with a higher learning rate; maybe an even higher learning rate would work better for sgd.<br />
<br />
<div style="text-align: center;">
<h3>
<b><u>Data pre-processing</u></b></h3>
</div>
<br />
Almost every computer vision task needs you to pre-process your data, and segmentation is no exception; following are my steps (a small sketch of the first two follows the list).<br />
<br />
1 : Convert colors that do not exist in any category into void (0, 0, 0)<br />
2 : Convert the colors into integer labels<br />
3 : Zero mean (mean values come from imagenet)<br />
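<br />
A sketch of steps 1 and 2 with OpenCV (the color table is illustrative; camvid defines its own):<br />
<br />
<pre class="prettyprint">#include <opencv2/core.hpp>

#include <vector>

//map every pixel color to an integer label(steps 1 and 2), colors
//not in the table are treated as void(label 0)
cv::Mat color_to_label(cv::Mat const &rgb,
                       std::vector<cv::Vec3b> const &colors)
{
    cv::Mat label(rgb.size(), CV_32S, cv::Scalar(0));
    for(int r = 0; r != rgb.rows; ++r){
        for(int c = 0; c != rgb.cols; ++c){
            auto const &pix = rgb.at<cv::Vec3b>(r, c);
            for(size_t i = 0; i != colors.size(); ++i){
                if(pix == colors[i]){
                    //label 0 is reserved for void
                    label.at<int>(r, c) = static_cast<int>(i) + 1;
                    break;
                }
            }
        }
    }
    return label;
}
</pre>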
<br />
<div style="text-align: center;">
<h3>
<u><b>Experiment on camvid</b></u></h3>
</div>
<br />
Enough of Q&A, let us have some benchmark and pictures😊.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdQ5TRpXwxeF17Jh-9mMDKnVdY359jdVhOY_cI292256Zc-9bsol79ekcz562o6bCyL-0A_-UGFQlK2eH-YDsQpUXtQDnFllFfQliTQgnRUopIa1-j2DzchjA7GuRk5hRIHuB1ozcmvV4/s1600/model_accuracy.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="97" data-original-width="1600" height="23" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdQ5TRpXwxeF17Jh-9mMDKnVdY359jdVhOY_cI292256Zc-9bsol79ekcz562o6bCyL-0A_-UGFQlK2eH-YDsQpUXtQDnFllFfQliTQgnRUopIa1-j2DzchjA7GuRk5hRIHuB1ozcmvV4/s400/model_accuracy.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Performance </td></tr>
</tbody></table>
Models 1, 2 and 3 all train with the same parameters and pre-processing but with different input sizes during training: (128,128), (256,256) and (512,512). When testing, the size of the images is (960,720).<br />
<br />
Following are some examples, from left to right is original image, ground truth and predicted image. <br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdqalXH28Aw_YfQ1v6hsjWHJzJegem-PAjYcYk5LesEJaGvPNZp0guuEPTSGNdDldq7zOVlJBFTnnh9gjWIKvcamnrbiwEEBWMK3WGV1pkixFKxaAjE-hWb-J_O3uksTJso2rRMXDaKm4/s1600/0006R0_f03180_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdqalXH28Aw_YfQ1v6hsjWHJzJegem-PAjYcYk5LesEJaGvPNZp0guuEPTSGNdDldq7zOVlJBFTnnh9gjWIKvcamnrbiwEEBWMK3WGV1pkixFKxaAjE-hWb-J_O3uksTJso2rRMXDaKm4/s320/0006R0_f03180_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEig8bNub48Hg3CtfyT_608OVyEaeQxV__KDSdOCx64TOV_uVpgRGcSZyMiUtNjyDKAzk6NlU91F4vWZIL2KhqHG2cKOC7yIBVLKMLlKyAPytXFh3PaGKnC5nGxEeYS6v4Qh0E31eagbI4g/s1600/0006R0_f03810_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEig8bNub48Hg3CtfyT_608OVyEaeQxV__KDSdOCx64TOV_uVpgRGcSZyMiUtNjyDKAzk6NlU91F4vWZIL2KhqHG2cKOC7yIBVLKMLlKyAPytXFh3PaGKnC5nGxEeYS6v4Qh0E31eagbI4g/s320/0006R0_f03810_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6b4Pgvlr2GqLhb09VMO4h8qhGQOO1-llhR3mt7YzHka-riuPcUBKMvkyVNgaOesvs-UggK2uWyjU4MhyAVlCFHDo9PIjlPB7vlxBo1Be6V-LjrVTTxDOPFJsIlJPAvFFkkH0hs3_gfUk/s1600/0006R0_f03870_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6b4Pgvlr2GqLhb09VMO4h8qhGQOO1-llhR3mt7YzHka-riuPcUBKMvkyVNgaOesvs-UggK2uWyjU4MhyAVlCFHDo9PIjlPB7vlxBo1Be6V-LjrVTTxDOPFJsIlJPAvFFkkH0hs3_gfUk/s320/0006R0_f03870_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghtMo4uqMvpbqHqw70VbNL3nyY79OKwR5uEe54vgA5sEDwtUu0rbZOtGmcYnBJkzfjINaQ_Bw-ACYYEABiFv29wISHqzrHKzxtIFMGBR3ziyzaRFd7qXEjTS0aYsvnfzOoii5kLEAmUg0/s1600/0016E5_00540_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghtMo4uqMvpbqHqw70VbNL3nyY79OKwR5uEe54vgA5sEDwtUu0rbZOtGmcYnBJkzfjINaQ_Bw-ACYYEABiFv29wISHqzrHKzxtIFMGBR3ziyzaRFd7qXEjTS0aYsvnfzOoii5kLEAmUg0/s320/0016E5_00540_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpp9Oxnh7vZ7cQhUqwNHr1Di1T2ObZNv3c3Hs68pxt77cnbe6FMrCeARGCDd_iuLM9DemeF1IkN0f_uHwuehOGkwhJg5laOTQKuM_xhMXNJO9Lqbf2Ed9WslNob97HKsHgf1iHpa46THY/s1600/0016E5_01230_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpp9Oxnh7vZ7cQhUqwNHr1Di1T2ObZNv3c3Hs68pxt77cnbe6FMrCeARGCDd_iuLM9DemeF1IkN0f_uHwuehOGkwhJg5laOTQKuM_xhMXNJO9Lqbf2Ed9WslNob97HKsHgf1iHpa46THY/s320/0016E5_01230_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBwe3dgB7jdUOLBAr97qWM8a10qvQifXY-3jYmI2fbXVgbQH5oIEenG4EromSBHA5YuR9RLkK8CSgfIwO5vkiMBIlOMJndirn2cB1MsjWbc6FY5ZClZ6K3rjuTYpA6HAewscYYVrvVPLs/s1600/0016E5_01410_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBwe3dgB7jdUOLBAr97qWM8a10qvQifXY-3jYmI2fbXVgbQH5oIEenG4EromSBHA5YuR9RLkK8CSgfIwO5vkiMBIlOMJndirn2cB1MsjWbc6FY5ZClZ6K3rjuTYpA6HAewscYYVrvVPLs/s320/0016E5_01410_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqOUbgXG2Nc55UT-U2TwJocXkAyfr-uQP77qFBL3Zigw1lTeTHisfmajNCdLvGVI6HN6OmUOkjtPnlvejK_hYuZuFtcT3bE-YOFEpHbqv8Tt0s9mBuCNB_J4nyQ5GVBWeSrK1o8KBq6OU/s1600/0016E5_01620_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqOUbgXG2Nc55UT-U2TwJocXkAyfr-uQP77qFBL3Zigw1lTeTHisfmajNCdLvGVI6HN6OmUOkjtPnlvejK_hYuZuFtcT3bE-YOFEpHbqv8Tt0s9mBuCNB_J4nyQ5GVBWeSrK1o8KBq6OU/s320/0016E5_01620_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinCqebVcS0XIO4P5u0kD77IWvL6RP_pc-fzJggwN9_KHHCiwVV5WfGMkQRvgBCg6kuYpAsdalFc3vRsXrFxrjE_Q5bluGmGYIx_xohyphenhyphenpDW6-Xhveh3kRngwQPMmjs6LRBTQJF-h-71flU/s1600/0016E5_05100_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinCqebVcS0XIO4P5u0kD77IWvL6RP_pc-fzJggwN9_KHHCiwVV5WfGMkQRvgBCg6kuYpAsdalFc3vRsXrFxrjE_Q5bluGmGYIx_xohyphenhyphenpDW6-Xhveh3kRngwQPMmjs6LRBTQJF-h-71flU/s320/0016E5_05100_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4-pC6TzbqciUIUZob6k-U_9m12terTJ9JiF5wS-w1kG9PSN-9E-Y74H-B1FVRK-fJ31k65LFSTzC-cYXsTp3Bs2zjD2_1LX2WVyurb9SUtzANPbM3mFM-aTZ7HnXcTQsF5_JfJ2F9Es4/s1600/0016E5_05370_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4-pC6TzbqciUIUZob6k-U_9m12terTJ9JiF5wS-w1kG9PSN-9E-Y74H-B1FVRK-fJ31k65LFSTzC-cYXsTp3Bs2zjD2_1LX2WVyurb9SUtzANPbM3mFM-aTZ7HnXcTQsF5_JfJ2F9Es4/s320/0016E5_05370_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiy5I5JFTD7Bq4lTqjN-bSsIlumQGj8Eqo0R59kjoGreQ7GIT69ZJyaN_qvKtt-BO37Woukgnx0Ctsg8n3NrX-J6mMFtU3amG7q5JQq5C-ETejYChnspPh_4bG68dUI7yQ-0svj20PTyD4/s1600/Seq05VD_f04890_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiy5I5JFTD7Bq4lTqjN-bSsIlumQGj8Eqo0R59kjoGreQ7GIT69ZJyaN_qvKtt-BO37Woukgnx0Ctsg8n3NrX-J6mMFtU3amG7q5JQq5C-ETejYChnspPh_4bG68dUI7yQ-0svj20PTyD4/s320/Seq05VD_f04890_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXwpzEW3f3CJU53xqlJw2jW52rzfmxBomTJdQR4-WbjTH1Bt5D3ywYuQYaCvLdAkLnYTUQqY4seUgZW5kWVI3SucMJK7-kVYcjuMh4KAZhzU4IIwdrb44rcERyaMkuuuIOrwI4wAnxqsw/s1600/Seq05VD_f03840_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXwpzEW3f3CJU53xqlJw2jW52rzfmxBomTJdQR4-WbjTH1Bt5D3ywYuQYaCvLdAkLnYTUQqY4seUgZW5kWVI3SucMJK7-kVYcjuMh4KAZhzU4IIwdrb44rcERyaMkuuuIOrwI4wAnxqsw/s320/Seq05VD_f03840_mongtage.jpg" width="320" /></a></div>
<br />
The results look quite good and the <span style="color: red;">IoU is much better than the paper's</span>; possible reasons are<br />
<br />
1 : I augment the data by random crop and horizontal flip; the paper may use other methods or not perform augmentation at all(?).<br />
<br />
2 : My pre-processing is different from the paper's<br />
<br />
3 : I did not omit void when training<br />
<br />
4 : My measurement on IoU is wrong<br />
<br />
5 : My model is more complicated than the paper(wrong implementation)<br />
<br />
6 : It is overfit<br />
<br />
7 : Randomly shuffling training and testing data creates data leakage, because many images of camvid are very similar to each other<br />
<br />
<br />
<div style="text-align: center;">
<h3>
<u><b>Trained models and codes</b></u></h3>
</div>
<br />
1 : As usual, located at <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/python_deep_learning/segmentation">github</a>.<br />
2 : <a href="https://mega.nz/#!48M3SBQZ!PZY_1p0kZlyIAs4kCaKuZBJ22E59-wILskgluWYnWbA">Model trained with 368 images, 12 labels(include void), random crop (128x128),800 epoch </a><br />
3 : <a href="https://mega.nz/#!Mh9WxB4A!PZawXq0K3MtzGkRUa6gyQfsHWaxYMjgSWzOQhBOEGQE">Model trained with 368 images, 12 labels(include void), random crop (480x320),800 epoch</a> <br />
4 : <a href="https://mega.nz/#!psEnEC4Q!-TXrNCYDmvyVbWdhxJgrgMPhhX17fpjrLGN5GbiWsYI">Model trained with 368 images, 12 labels(include void), random crop (512x512),800 epoch</a> <br />
<br />
<div style="text-align: center;">
<h3>
<u><b>Miscellaneous</b></u></h3>
</div>
<br />
<u><i>Q</i></u> : <i>Is it possible to create portable model by PyTorch?</i><br />
<br />
A : It is possible, but not easy. You could check out <a href="http://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html">ONNX and caffe2</a> if you want to try it. Someone managed to convert a pytorch model to a caffe model and load it with opencv dnn. Right now opencv dnn does not support <a href="http://pytorch.org/">PyTorch</a>, but it works with <a href="https://github.com/hughperkins/pytorch">pytorch</a>(the python wrapper of torch). Thank god opencv dnn can import models trained by <a href="http://torch.ch/">torch</a> with ease (right now opencv dnn <a href="https://github.com/opencv/opencv/issues/9502">does not support nngraph</a>).<br />
<br />
<u><i>Q</i></u> : <i>What are IoU and iIoU in the paper refer to?</i><br />
<br />
A : This <a href="https://www.cityscapes-dataset.com/benchmarks/">page</a> gives a good definition, although I still can't figure out how to calculate iIoU; a sketch of plain IoU follows below.<br />
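<br />
For reference, a sketch of how per-class IoU can be computed from the predicted and ground truth label maps (my own reading of the definition on that page; iIoU is left out):<br />
<br />
<pre class="prettyprint">#include <opencv2/core.hpp>

//IoU of one class : intersection / union, the label maps store
//integer class ids(CV_32S)
double class_iou(cv::Mat const &predict, cv::Mat const &truth, int class_id)
{
    cv::Mat const pred_mask = predict == class_id;
    cv::Mat const true_mask = truth == class_id;
    double const inter = cv::countNonZero(pred_mask & true_mask);
    double const uni = cv::countNonZero(pred_mask | true_mask);

    return uni > 0 ? inter / uni : 0.0;
}
</pre>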
<br />
<br />
<b class="markup--strong markup--blockquote-strong"> If you liked this article, please help others find it by clicking the little g+ icon below. Thanks a lot!</b>ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com2tag:blogger.com,1999:blog-4702230343097536610.post-31068894190936190822017-08-07T00:40:00.003-07:002017-10-13T20:12:45.209-07:00Deep learning 09-Performance of perceptual losses for super resolution Have you ever scratch your head when upscaling low resolution images? I do, because we all know the quality of the images after upscaling degrade. Thanks to the rise of machine learning in recent years, we are able to upscale single image with better results compare with traditional solutions(ex : bilinear, bicubic. You do not need to know what they are except they are apply widely in many products), we call this technique <i><span style="color: blue;"><b>super resolution</b></span></i>.<br />
<br />
This sounds great, but how could we do it? I did not know either until I studied the tutorials of part2 of the marvelous <a href="http://course.fast.ai/">Practical Deep learning for Coders</a>; this course is fantastic for getting your feet wet in deep learning.<br />
<br />
I will try my best to explain everything with <span style="color: blue;">minimal prerequisite knowledge</span> of machine learning and computer vision; however, some knowledge of convolutional neural networks (cnn) is needed. The course of <a href="http://course.fast.ai/lessons/lessons.html">part1</a> is excellent if you want to learn cnn in depth. If you are in a hurry, <a href="http://www.pyimagesearch.com/2016/08/01/lenet-convolutional-neural-network-in-python/">pyimagesearch</a> and <a href="https://medium.com/towards-data-science/deep-learning-2-f81ebe632d5c">medium</a> have short tutorials about cnn.<br />
<br />
<div style="text-align: center;">
<u><b>What is super resolution and how does it work</b></u></div>
<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>What is super resolution</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : Super resolution is a class of technique to enhance the resolution of images or videos.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>There are many softwares could help us upscale images, why do we need super resolution?</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : Traditional solutions of upscaling image apply interpolation algorithm on one image only(ex: bilinear or bicubic). In the contrast, super resolution <i><span style="color: blue;"><b>exploit info from another source</b></span></i>, either from contiguous frames, from the model trained by machine learning or different scale from one image.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>How does super resolution work</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : Super resolution I want to introduce today is based on <a href="https://arxiv.org/abs/1603.08155">Perceptual losses for Real-Time style Transfer and Super-Resolution</a>.(please consult <a href="https://en.wikipedia.org/wiki/Superresolution">wiki</a> if you want to study another type of super resolution). The most interesting part of this solution is it treat super resolution as an <i><span style="color: blue;"><b>image transformation problem</b></span></i>(it is a process where an input image is transformed into an output image). This mean we may use the same technique to solve colorization, denoising, depth estimation, semantic segmentation and another tasks(It is not a problem if you do not know what they are).<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b><u>Q</u></b></span> : <i>How do we transformed low resolution image to high resolution image?</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : A picture worth a thousand words.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwXDIs51rgLroIXBKQlN_kmvNPbdy4fMZVQQFC54p7OIbwxhj-JYwpBXXiwASKJMS1jgs54Ep8acq0HHyoyvpNWeUBb0Y8vy-y96o3no0Q-glK36n1s8780SVD7gCCSwJIz1U8xaBAI0U/s1600/perceptual_super_res.dot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="131" data-original-width="903" height="92" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwXDIs51rgLroIXBKQlN_kmvNPbdy4fMZVQQFC54p7OIbwxhj-JYwpBXXiwASKJMS1jgs54Ep8acq0HHyoyvpNWeUBb0Y8vy-y96o3no0Q-glK36n1s8780SVD7gCCSwJIz1U8xaBAI0U/s640/perceptual_super_res.dot.png" width="640" /></a></div>
<br />
<br />
This network is composed of two components, an <b><span style="font-family: inherit;"><span style="color: blue;"><i>image transformation network</i></span></span></b> and a <b><span style="font-family: inherit;"><span style="color: blue;"><i>loss network</i></span></span></b>. The image transformation network transforms the low resolution image into a high resolution image, while the loss network measures the difference between the predicted high resolution image and the true high resolution image.<br />
<br />
<u><span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Q</b></span></u> : <i>What is the loss network anyway?Why do we use it to measure the loss?</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : Loss network is an image classification network train on <a href="http://www.image-net.org/">imagenet</a> (ex : vgg16, resnet, densenet). We use it to measure the loss because we want our network to better measure perceptual and semantic difference between images. The paper call the loss measure by this loss network <span style="color: blue;"><i><b>perceptual loss</b></i></span>.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b><u>Q</u></b></span> : <i>What makes the loss network able to generate better loss?</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : The loss network can generate better loss because the convolutional neural network trained for image classification have <a href="https://medium.com/towards-data-science/deep-learning-2-f81ebe632d5c">already learned to encode</a> the perceptual and semantic information we want.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>The color of the image is different after upscale, how could I fixed it?</i><br />
<br />
A : You could apply histogram matching as the paper mentions; this should be able to deal with most of the cases.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>Any draw back of this algorithm?</i><br />
<br />
A : Of course, nothing is perfect.<br />
<br />
1 : Not all images work; some of them may look very ugly after upscaling.<br />
2 : The result may look like eye candy, but the algorithm is not reconstructing the photo exactly; it creates details based on its training from example images. It is impossible to reconstruct the image perfectly, because we have no way to retrieve information that did not exist in the first place.<br />
3 : The colors of parts of some images change after upscaling, and even histogram matching cannot fix it.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>What is histogram matching?</i><br />
<br />
A : It is a way to make the color distribution of image A look like that of image B. A minimal sketch is shown below.<br />
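<br />
openCV does not ship a histogram matching function, so here is a minimal per-channel sketch of the idea (the helper name is mine and the exact variant used by the paper may differ): map every gray level of the source channel to the level of the reference channel whose cumulative distribution is closest.<br />
<br />
<pre class="prettyprint">#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

//match the histogram of one 8-bit channel(src) to another(ref)
cv::Mat match_histogram(cv::Mat const &src, cv::Mat const &ref)
{
    int const hist_size = 256;
    int const channels[] = {0};
    float range[] = {0, 256};
    float const *hist_range = range;
    cv::Mat src_hist, ref_hist;
    cv::calcHist(&src, 1, channels, cv::Mat(), src_hist, 1, &hist_size, &hist_range);
    cv::calcHist(&ref, 1, channels, cv::Mat(), ref_hist, 1, &hist_size, &hist_range);
    //turn the histograms into normalized cumulative distributions(cdf)
    for(int i = 1; i != hist_size; ++i){
        src_hist.at<float>(i) += src_hist.at<float>(i - 1);
        ref_hist.at<float>(i) += ref_hist.at<float>(i - 1);
    }
    src_hist = src_hist / src_hist.at<float>(hist_size - 1);
    ref_hist = ref_hist / ref_hist.at<float>(hist_size - 1);
    //build a lookup table which maps src levels to ref levels
    cv::Mat lut(1, 256, CV_8U);
    for(int i = 0; i != hist_size; ++i){
        int j = 0;
        while(j != hist_size - 1 && ref_hist.at<float>(j) < src_hist.at<float>(i)){
            ++j;
        }
        lut.at<uchar>(i) = static_cast<uchar>(j);
    }
    cv::Mat result;
    cv::LUT(src, lut, result);
    return result;
}
</pre>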
<br />
<div style="text-align: center;">
<u><b>Experiment</b></u></div>
<br />
All of the experiments use the same network architecture and train on 80000 images from <a href="http://www.image-net.org/">imagenet</a> for 2 epochs. From left to right: original image, image upscaled 4x by bicubic, image upscaled 4x by super resolution.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwOi0PgYKdCZge4DII7J7ixWGNOV31PSKohW-9dMOKgkI05z1fcswPbLgdPREfWy0NHsZz_nB_jTvkJwj2oIMkkqM4z462O-PG_2pTM4NUeW2hnoprfDjBcC2hEl4_k_c1axxTU3MqZmc/s1600/baby_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="512" data-original-width="1536" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwOi0PgYKdCZge4DII7J7ixWGNOV31PSKohW-9dMOKgkI05z1fcswPbLgdPREfWy0NHsZz_nB_jTvkJwj2oIMkkqM4z462O-PG_2pTM4NUeW2hnoprfDjBcC2hEl4_k_c1axxTU3MqZmc/s320/baby_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbQDsko1gJ81kK7ylppjKgiRHPCk-d4E3uFnNc5QND3tHeyn1PP692v2FlEwiuTwuWl9w53A4YdnGu15z_u8ppQ43AEMyuhY6reSVGwZ8bEFDlV1XuSaiAneh15JURIV0NBQFS8u6r4q4/s1600/bird_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="288" data-original-width="864" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbQDsko1gJ81kK7ylppjKgiRHPCk-d4E3uFnNc5QND3tHeyn1PP692v2FlEwiuTwuWl9w53A4YdnGu15z_u8ppQ43AEMyuhY6reSVGwZ8bEFDlV1XuSaiAneh15JURIV0NBQFS8u6r4q4/s320/bird_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi30kFH-0R4kI04f0GGhKREqCukFKRxvMHnxCibGNHJEQ1mZOFzqUAM5RLVvH4lZDFzsxSDw_UI_CfwpOEYdgdAhmotDD677QTxxvRJysIlKAmDGmUyXQxMEz5nLfwf_x0KYLNePYoJX9g/s1600/butterfly_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="256" data-original-width="768" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi30kFH-0R4kI04f0GGhKREqCukFKRxvMHnxCibGNHJEQ1mZOFzqUAM5RLVvH4lZDFzsxSDw_UI_CfwpOEYdgdAhmotDD677QTxxvRJysIlKAmDGmUyXQxMEz5nLfwf_x0KYLNePYoJX9g/s320/butterfly_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtKNLhhg6KnfkuRcPNfuUoeq97PRLpe-9isuXZnpd7l-MjWfJkIthrR0a63BshPR0E2giqF5oDFRiZRydrYSEbw2zMdmquYlqSXDXOdcZCxUSMjiAH90GQJQ5kf5D-m4IR6aEhPoTeOSA/s1600/fruits_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="512" data-original-width="1536" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtKNLhhg6KnfkuRcPNfuUoeq97PRLpe-9isuXZnpd7l-MjWfJkIthrR0a63BshPR0E2giqF5oDFRiZRydrYSEbw2zMdmquYlqSXDXOdcZCxUSMjiAH90GQJQ5kf5D-m4IR6aEhPoTeOSA/s320/fruits_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9go6S3_m-fhvCslqRzQAz_RUwpvQ3831KSdJHjQpfhtUg1Tkm7lotC5sodeguyVtjSJEfWRqyVc5K87IW_MIrwEE-4YptSl2l8OErYkW2B6zAq-lB6vDfCwYT1m_kcTEuac5sKhjJd_I/s1600/girl_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="744" height="154" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9go6S3_m-fhvCslqRzQAz_RUwpvQ3831KSdJHjQpfhtUg1Tkm7lotC5sodeguyVtjSJEfWRqyVc5K87IW_MIrwEE-4YptSl2l8OErYkW2B6zAq-lB6vDfCwYT1m_kcTEuac5sKhjJd_I/s320/girl_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3G1wNLzV5lk2R-TRP733Hqv4KI7TMkLKzmiPE05f9IF6PZrzRYoa83jQJ8SOpiyf0TssAxR77nRbbr7btKEWiQwN0GfXPFHfX6ch8-qjcuQfLaC_jg7fVGqeahsm2srcwbAy4cfXBHXQ/s1600/lena_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="512" data-original-width="1536" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3G1wNLzV5lk2R-TRP733Hqv4KI7TMkLKzmiPE05f9IF6PZrzRYoa83jQJ8SOpiyf0TssAxR77nRbbr7btKEWiQwN0GfXPFHfX6ch8-qjcuQfLaC_jg7fVGqeahsm2srcwbAy4cfXBHXQ/s320/lena_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrn6OIm_Cy5RW5pbKcXUXU3YnR6fEuuQnqf5ye17EgX_Ru3SwZ_jaVlQeOXd9XsrBcHKBzYA7VKPZPpdIGCC8PlBdHV1xY-Fcmteea1uEG0dZc6b2XWc5noLnISJka0PPAANNRwrcokes/s1600/monkey_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="480" data-original-width="1500" height="102" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrn6OIm_Cy5RW5pbKcXUXU3YnR6fEuuQnqf5ye17EgX_Ru3SwZ_jaVlQeOXd9XsrBcHKBzYA7VKPZPpdIGCC8PlBdHV1xY-Fcmteea1uEG0dZc6b2XWc5noLnISJka0PPAANNRwrcokes/s320/monkey_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAB3gkxxUXiyZ7tc6ILv_4LdEwVZTprQkZNQ8DYdWqIUIUgXsHPrdv7onEToDaKVIjpKUqpg-EJuiBXNh5DhuKfnXhvkkgulkyuG9ciIIIbxHUwCQvzr6sQzTWe6taapyXb9bYEbSV6FI/s1600/woman_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="344" data-original-width="684" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAB3gkxxUXiyZ7tc6ILv_4LdEwVZTprQkZNQ8DYdWqIUIUgXsHPrdv7onEToDaKVIjpKUqpg-EJuiBXNh5DhuKfnXhvkkgulkyuG9ciIIIbxHUwCQvzr6sQzTWe6taapyXb9bYEbSV6FI/s320/woman_montage.jpg" width="320" /></a></div>
<br />
<br />
<br />
The results are not perfect, but this is not the end; super resolution is a hot research topic, every paper is a stepping stone for the next algorithm, and we will see better, more advanced techniques pop up in the future.<br />
<br />
<div style="text-align: center;">
<u><b>Sharing trained model and codes</b></u></div>
<br />
1 : <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/python_deep_learning/super_res">Notebook to transform the imagenet data to training data</a><br />
2 : <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/python_deep_learning/image_resize">Notebook to train and use the super resolution model</a><br />
3 : <a href="https://mega.nz/#!V9klRDYJ!Qz5eMgnuQh4Km34bystYG6khLSVWO2wF4E_iRLJjdLo">Network model with transformation network and loss network, trained on 80000 images</a><br />
<br />
<b class="markup--strong markup--blockquote-strong"> If you liked this article, please help others find it by clicking the little g+ icon below. Thanks a lot!</b>ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-24377220779337097872017-07-19T08:19:00.003-07:002017-10-13T20:13:08.112-07:00Deep learning 08--Neural style by Keras<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
Today I want to write down how to implement the neural style of the paper <a href="https://arxiv.org/abs/1508.06576">A Neural Algorithm of Artistic Style</a> with Keras, learned from the <a href="http://course.fast.ai/">fast.ai course</a>. You can find the codes at <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/neural_style">github</a>.<br />
<br />
Before I begin to explain how to do it, I want to mention that generating artistic style with a deep neural network is different from image classification; we need to learn new concepts and add them to our toolbox. If you find it hard to understand the first time you see it, do not fear, I had the same feeling too. You can ask me questions or go to the <a href="http://forums.fast.ai/">fast ai forum</a>.<br />
<br />
The paper presents an algorithm to generate an artistic style image by combining two images using a convolutional neural network. Here are examples combining source images (bird, dog, building) with style images like <a href="https://s11.postimg.org/f0a9ugy83/starry.png">starry</a>, <a href="https://s14.postimg.org/dl545skch/alice.jpg">alice</a> and <a href="https://s11.postimg.org/r9s0a380z/tes_teach.jpg">tes_teach</a>. From left to right: style image, source image, image combined by the convolutional neural network.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVHtoy9niNaggDSsM4Oo8f-TqHqmRIV6W-XMD_yjpT5ajj_aAqsd_yDELozx3buExlC1AtUqIhyphenhyphenZrjSQI4ZpyH5PSXrcjLB9016xqgP4xboMSzbYmmqnbBufzZyhcqhRULwJtAikf2g7I/s1600/bird_montage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="873" data-original-width="1449" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVHtoy9niNaggDSsM4Oo8f-TqHqmRIV6W-XMD_yjpT5ajj_aAqsd_yDELozx3buExlC1AtUqIhyphenhyphenZrjSQI4ZpyH5PSXrcjLB9016xqgP4xboMSzbYmmqnbBufzZyhcqhRULwJtAikf2g7I/s320/bird_montage.png" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBNPuuXz5aCUDt9w1fOllTNPQ49Iy7N8HsREpyTSZHjghgcRiBfKJQjSAUrZQFLhadJ4AR3gMH9wfSJEbMf3Ch4EfBAO_5Sa3SKMwd0m1fhWXFJvph61fvzMt2Ino9AA2oBy1QU17sBWA/s1600/building_montage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="630" data-original-width="720" height="280" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBNPuuXz5aCUDt9w1fOllTNPQ49Iy7N8HsREpyTSZHjghgcRiBfKJQjSAUrZQFLhadJ4AR3gMH9wfSJEbMf3Ch4EfBAO_5Sa3SKMwd0m1fhWXFJvph61fvzMt2Ino9AA2oBy1QU17sBWA/s320/building_montage.png" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdCESuOUtHlnXFYOpkagn387K6C1zL9N8G2Vx4QF8wDhUUauSLgXYLWRshuNJnxLb6dhK8c4OpXvAAFwqJ1-8Smwpr4e1X_nr9RqCxlBxKBVlNaEAiA6H_pTGqw1LEndfJ4c1PsPQdrmk/s1600/dog_montage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="483" data-original-width="939" height="164" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdCESuOUtHlnXFYOpkagn387K6C1zL9N8G2Vx4QF8wDhUUauSLgXYLWRshuNJnxLb6dhK8c4OpXvAAFwqJ1-8Smwpr4e1X_nr9RqCxlBxKBVlNaEAiA6H_pTGqw1LEndfJ4c1PsPQdrmk/s320/dog_montage.png" width="320" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
<br />
<br />
Let us begin our journey into the implementation of the algorithm (I assume you know how to install Keras, tensorflow, numpy, cuda and other tools; I recommend using ubuntu 16.04.x as your os, this could save you tons of headaches when setting up your deep learning toolbox).<br />
<br />
<div style="text-align: center;">
<b><u>Step 1 : Import file and modules</u></b></div>
<br />
<pre class="prettyprint">
from PIL import Image
import os
import numpy as np #np is used below but was missing from the imports
import keras.backend as K
import vgg16_avg
from keras.models import Model
from keras.layers import *
from keras import metrics
from scipy.optimize import fmin_l_bfgs_b
from scipy.misc import imsave
</pre>
<br />
<br />
<br />
<div style="text-align: center;">
<b><u>Step 2 : Preprocess our input image</u></b></div>
<br />
<pre class="prettyprint">
#the value of rn_mean comes from the imagenet data set
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
#create image close to zero mean and convert rgb channel to bgr channel
#since the vgg model need bgr channel. ::-1 invert the order of axis 0
preproc = lambda x: (x - rn_mean)[:,:,:,::-1]
#We need to undo the preprocessing before we save it to our hard disk
deproc = lambda x: x[:,:,:,::-1] + rn_mean
</pre>
<br />
<div style="text-align: center;">
<b><u>Step 3 : Read the source image and style image</u></b></div>
<br />
The source image is the image you want to apply the style to. The style image is the style you want to apply to the source image.<br />
<br />
<br />
<pre class="prettyprint">dpath= os.getcwd() + "/"
#I make the size of content image, style image, generated img
#have the same shape, but this is not mandatory
#since we do not use any full connection layer
def read_img(im_name, shp):
style_img = Image.open(im_name)
if len(shp) > 0:
style_img = style_img.resize((shp[2], shp[1]))
style_arr = np.array(style_img)
#The image read by PIL is three dimensions, but the model
#need a four dimensions tensor(first dim is batch size)
style_arr = np.expand_dims(style_arr, 0)
return preproc(style_arr)
content_img_name = "dog"
content_img_arr = read_img(dpath + "img/{}.png".format(content_img_name), [])
content_shp = content_img_arr.shape
style_img_arr = read_img(dpath + "img/starry.png", content_shp)
</pre>
<br />
<div style="text-align: center;">
<b><u>Step 4 : Load vgg16_avg</u></b></div>
<br />
Unlike doing image classification with the pure sequential api of Keras, to build a neural style network we need to use the <a href="https://keras.io/backend/">backend api of Keras</a>.<br />
<br />
<br />
<pre class="prettyprint">content_base = K.variable(content_img_arr)
style_base = K.variable(style_img_arr)
gen_img = K.placeholder(content_shp)
batch = K.concatenate([content_base, style_base, gen_img], 0)
#Feed the batch into the vgg model, every time we call the model/layer to
#generate output, it will generate output of content_base, style_base,
#gen_img. Unlike content_base and style_base, gen_img is a placeholder,
#that means we will need to provide data to this placeholder later on
model = vgg16_avg.VGG16_Avg(input_tensor = batch, include_top=False)
#build a dict of model layers
outputs = {l.name:l.output for l in model.layers}
#I prefer these 1~3 layers hierarchy as my style_layers,
#you can try it out with different range
style_layers = [outputs['block{}_conv1'.format(i)] for i in range(1,4)]
content_layer = outputs['block4_conv2']
</pre>
<br />
If you find K.variable and K.placeholder very confusing, please check the documents of <a href="https://www.tensorflow.org/">TensorFlow</a> and the <a href="https://keras.io/backend/">Keras backend api</a>.<br />
<br />
<div style="text-align: center;">
<b><u>Step 5 : Create the function to find loss and gradient</u></b></div>
<br />
<br />
<pre class="prettyprint">#gram matrix is a matrix collect the correlation of all of the vectors
#in a set. Check wiki(https://en.wikipedia.org/wiki/Gramian_matrix)
#for more details
def gram_matrix(x):
#change height,width,depth to depth, height, width, it could be 2,1,0 too
#maybe 2,0,1 is more efficient due to underlying memory layout
features = K.permute_dimensions(x, (2,0,1))
#batch flatten make features become 2D array
features = K.batch_flatten(features)
return K.dot(features, K.transpose(features)) / x.get_shape().num_elements()
def style_loss(x, targ):
return metrics.mse(gram_matrix(x), gram_matrix(targ))
content_loss = lambda base, gen: metrics.mse(gen, base)
#l[1] is the output(activation) of style_base, l[2] is the
#output of gen_img loss of style image and gen_img. As the
#paper suggest, we add the loss of all convolution layers
loss = sum([style_loss(l[1], l[2]) for l in style_layers])
#content_layer[0] is the output of content_base,
#content_layer[2] is the output of gen_img
#loss of content image and gen_img
loss += content_loss(content_layer[0], content_layer[2]) / 10.
#The loss need two variables but we only pass in one,
#because we only got one placeholder in the graph,
#the other variable already determine by K.variable
grad = K.gradients(loss, gen_img)
#We cannot call loss and grad directly, we need
#to create a function(convert it to symbolic definition)
#before we can feed it into the solver
fn = K.function([gen_img], [loss] + grad)
</pre>
<br />
You can adjust the weights of the style loss and content loss by yourself until you think the image looks good enough. <a href="http://forums.fast.ai/t/lesson-8-discussion/1522/122?u=tham">The function at the end only tells you that the concatenated list of loss and grads is the output that you want to - eventually - minimize. So, when you feed it to the solver bfgs, it will try to minimize the loss and will stop when the gradients are also zero (a minimum, hopefully not just a local one).</a><br />
<br />
<div style="text-align: center;">
<b><u>Step 6 : Create a helper class to separate loss and gradient</u></b></div>
<br />
<br />
<pre class="prettyprint">#fn will return loss and grad, but fmin_l_bfgs need to seperate them
#that is why we need a class to separate loss and gradient and store them
class Evaluator:
def __init__(self, fn_, shp_):
self.fn = fn_
self.shp = shp_
def loss(self, x):
loss_, grads_ = self.fn([x.reshape(self.shp)])
self.grads = grads_.flatten().astype(np.float64)
return loss_.astype(np.float64)
def grad(self, x):
return np.copy(self.grads)
evaluator = Evaluator(fn, content_shp)
</pre>
<br />
<div style="text-align: center;">
<b><u>Step 7 : Generate a random noise image(white noise image mentioned by the paper)</u></b></div>
<br />
<br />
<pre class="prettyprint">#This is the real value of the placeholder--gen_img
rand_img = lambda shape: np.random.uniform(-2.5, 2.5, shape)/100
</pre>
<br />
<br />
<div style="text-align: center;">
<b><u>Step 8 : Minimize the loss of rand_img with the source image and style image</u></b></div>
<br />
<br />
<pre class="prettyprint">def solve_img(evalu, niter, x):
for i in range(0, niter):
x, min_val, info = fmin_l_bfgs_b(evalu.loss, x.flatten(),
fprime=evalu.grad, maxfun = 20)
#value of PIL lie within -127 and 127
x = np.clip(x, -127, 127)
print(i, ',Current loss value:', min_val)
x = x.reshape(content_shp)
simg = deproc(x.copy())
img_name = '{}_{}_neural_style_img_{}.png'.
format(dpath + "gen_img/", content_img_name, i)
imsave(img_name, simg[0])
return x
solve_img(evaluator, 10, rand_img(content_shp)/10.)
</pre>
<br />
You may ask, why use fmin_l_bfgs_b and not stochastic gradient descent? The answer is we could, but we have a better choice. Unlike image classification, we do not have a lot of batches to run; right now we only need to figure out the loss and gradient between three inputs (source image, style image and the random image), so fmin_l_bfgs_b is more than enough.<br />
<br />
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-66000783216401982022017-05-30T19:56:00.003-07:002017-05-31T00:07:22.162-07:00Create a better images downloader(Google, Bing and Yahoo) by Qt5 I mentioned how to create a simple Bing image downloader in <a href="http://qtandopencv.blogspot.my/2017/05/scrape-bing-images-by-qwebengine.html">Download Bing images by Qt5</a>; in this post I will explain how I tackled the challenges I encountered when trying to build a better image downloader app with Qt5. The skills I used are applied in <a href="https://github.com/stereomatchingkiss/QImageScraper/blob/master/README.md">QImageScraper version_1.0</a>; if you want to know the details, please dive into the codes, they are too complicated to write down in this blog.<br />
<br />
<div style="text-align: center;">
<b><u>1 : Show all of the images searched by Bing</u></b></div>
<br />
To show all of the images found by Bing, we need to make sure the page is scrolled to the bottom; unfortunately there is no way to check this with 100% accuracy unless we scroll the page manually, because the <span style="color: blue;">height of the scroll bar keeps changing</span> while you scroll it, which makes it almost impossible for the program to determine when it should stop scrolling the page.<br />
<br />
<div style="text-align: center;">
<b><u>Solution</u></b></div>
<br />
I gave several solutions a try but none of them were optimal, so I had no choice but to seek a compromise. Rather than scrolling the page fully automatically, I adopted a semi-automatic solution as Pic.1 shows.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgQSB0RayKyDFS7xzlzlsV8oYByb877Xs07QQG0WuIw8TVullMdEY3mFYnwbHDQlJ9x-IpYV22Ajh5uMXr-gRifEg9tOY6KNU3wp6mUKcYPMNgNe3KRept4nlqqqvie8t3SR6pslEA290/s1600/scroll_page_flow.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="491" data-original-width="983" height="199" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgQSB0RayKyDFS7xzlzlsV8oYByb877Xs07QQG0WuIw8TVullMdEY3mFYnwbHDQlJ9x-IpYV22Ajh5uMXr-gRifEg9tOY6KNU3wp6mUKcYPMNgNe3KRept4nlqqqvie8t3SR6pslEA290/s400/scroll_page_flow.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic.1</td></tr>
</tbody></table>
<div style="text-align: center;">
<b><u><br />2 : Not all of the images are downloadable</u></b></div>
<br />
There are several reasons that may cause this issue.<br />
<br />
<br />
<ol>
<li>The search engine (Bing, Yahoo, Google etc) fails to find the direct link of the image.</li>
<li>The server "thinks" you are not a real human (a robot?)</li>
<li>Network error</li>
<li>No error happens, but the reply of the server takes too long</li>
</ol>
<div style="text-align: center;">
<b><u> Solution</u></b> </div>
<div>
<br /></div>
<div>
Although I cannot find a perfect solution for problem 2, there are some tricks to alleviate it; let the flow charts (Pic.2, Pic.3) clear the miasma.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_WDnBFxqmih-g9PJlqbITLiHHG6GU3UsIUH5UL-mZiP8kb34NaGPx6z7UkD07z33AjmFmSiiyC2FYrF07yK6mYbCoqvP0CQY5dEhS08h9sb4chqWrDaSFI06SA8-73_fyjsyh40hBulM/s1600/scroll_page_flow.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="544" data-original-width="956" height="227" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_WDnBFxqmih-g9PJlqbITLiHHG6GU3UsIUH5UL-mZiP8kb34NaGPx6z7UkD07z33AjmFmSiiyC2FYrF07yK6mYbCoqvP0CQY5dEhS08h9sb4chqWrDaSFI06SA8-73_fyjsyh40hBulM/s400/scroll_page_flow.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic.2</td></tr>
</tbody></table>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9gy0a2ddpadE1eS-07dGDQsE92xSXMGgaqtMrpfWzJgwbUeIG_dU5XsA0NIfEQYqiawJrFlOq0D4eeD0Ez4kXek1hTccUWX-f-vguRCatOjaxx2Aj2teYM8tLnh9rtDFd1kJ_97BM2LE/s1600/scroll_page_flow.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="641" data-original-width="713" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9gy0a2ddpadE1eS-07dGDQsE92xSXMGgaqtMrpfWzJgwbUeIG_dU5XsA0NIfEQYqiawJrFlOq0D4eeD0Ez4kXek1hTccUWX-f-vguRCatOjaxx2Aj2teYM8tLnh9rtDFd1kJ_97BM2LE/s400/scroll_page_flow.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic.3</td></tr>
</tbody></table>
<br />
<br />
<div>
Simply put, if an error happens, I will try to download the thumbnail; if even the thumbnail cannot be downloaded, I will try the next image. After all, this solution is not too bad; let us see the results of downloading 893 smoke images found by Google.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSEgES7tuE9bKeGWUQ3qT106H79jI8Mxzc-sPVIbBd3oRBwDnSzCzdq8y-d1vtUFv6mBW8qF9Ge9zwxGU4LUB5xIuyZUwyqQvbXXPpxSTg1fUjdWBfjhGDYpPsjlksNtifiCI5kxeA_Xc/s1600/download_statistic.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="647" data-original-width="1041" height="247" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSEgES7tuE9bKeGWUQ3qT106H79jI8Mxzc-sPVIbBd3oRBwDnSzCzdq8y-d1vtUFv6mBW8qF9Ge9zwxGU4LUB5xIuyZUwyqQvbXXPpxSTg1fUjdWBfjhGDYpPsjlksNtifiCI5kxeA_Xc/s400/download_statistic.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic.4</td></tr>
</tbody></table>
<br /></div>
<div>
<br /></div>
<div>
All of the images could be downloaded: 817 of them are big images and 76 of them are small images, not a perfect result but not bad either. Some things I did not mention in Pic.2 and Pic.3 are</div>
<div>
<br /></div>
<div>
<ol>
<li>I always switch user agents</li>
<li>I start the next download after a random period (0.5 second ~ 1.5 second)</li>
</ol>
<div>
The purpose of these "weird" operations is to <span style="color: blue;">emulate the behavior of humans</span>; this lowers the risk of being treated as a "robot" by the servers (a minimal sketch of both tricks is shown below). I cannot find free, trustworthy proxies yet, else I would also like to connect to different proxies randomly from time to time; please tell me where to find those proxies if you know, thanks.</div>
</div>
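<div>
A minimal sketch of the two tricks above (the helper names and user agent strings are mine, not from QImageScraper):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <QNetworkRequest>
#include <QStringList>
#include <QTimer>

#include <functional>
#include <random>

std::mt19937& rand_gen()
{
    static std::mt19937 gen(std::random_device{}());
    return gen;
}

//pick a random user agent for every request
void set_random_user_agent(QNetworkRequest &request)
{
    static QStringList const agents{
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Mozilla/5.0 (X11; Linux x86_64)",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12)"
    };
    std::uniform_int_distribution<int> dist(0, agents.size() - 1);
    request.setHeader(QNetworkRequest::UserAgentHeader, agents[dist(rand_gen())]);
}

//start the next download after 0.5~1.5 seconds
void schedule_next_download(std::function<void()> const &start_next)
{
    std::uniform_int_distribution<int> dist(500, 1500); //milliseconds
    QTimer::singleShot(dist(rand_gen()), start_next);
}
</pre>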
<div>
<br /></div>
<div>
<br /></div>
<div style="text-align: center;">
<b><u>3 : Type of the images are mismatch or did not specify in file extension</u></b></div>
<div>
<br /></div>
<div>
Not all of the images have the correct type (jpg, png, gif etc). I am very lucky that Qt5 provides QImageReader; this class can determine the type of an image from its contents rather than its extension. With it we can change the suffix of the file to the real format and remove the files which are not images, as the sketch below shows.<br />
<br /></div>
<br /></div>
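<div>
A minimal sketch of the idea (the function name is mine; the reader is scoped so it releases the file before rename/remove, see issue 4 below):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <QFile>
#include <QImageReader>
#include <QString>

void fix_suffix(QString const &file_name)
{
    QByteArray format;
    {
        QImageReader reader(file_name);
        format = reader.format(); //deduced from the contents, not the suffix
    }
    if(format.isEmpty()){
        QFile::remove(file_name); //the file is not an image at all
    }else{
        int const dot = file_name.lastIndexOf('.');
        QString const base = dot > 0 ? file_name.left(dot) : file_name;
        QFile::rename(file_name, base + "." + QString(format));
    }
}
</pre>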
<div>
<div style="text-align: center;">
<b><u>4 : QFile fail to rename/remove file</u></b></div>
<br />
QFile::rename and QFile::remove ran into some trouble on windows (they work well on mac); this bugged me for a while, and it cost me one day to find out that QImageReader was blocking the file.<br />
<br /></div>
<div>
<div style="text-align: center;">
<b><u>5 : Invalid file name</u></b> </div>
<br />
Not all of the file names are valid; it is extremely hard to find a perfect way to determine whether a file name is valid or not, so I only do some minimal processing for this issue--remove illegal characters and trim white spaces.<br />
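<br />
A minimal sketch of that processing (the function name is mine; the character set below is the one windows forbids in file names):<br />
<br />
<pre class="prettyprint">#include <QRegularExpression>
#include <QString>

QString sanitize_file_name(QString name)
{
    //drop the characters windows does not allow in file names
    name.remove(QRegularExpression("[<>:\"/\\\\|?*]"));
    return name.trimmed();
}
</pre>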
<br />
<div style="text-align: center;">
<b><u>6 : Deploy app on major platforms</u></b></div>
<br />
One of the strong selling points of Qt is its cross-platform ability; to tell you the truth, I can build the app and run it on windows, mac and linux without changing a single line of code, it works out of the box. The problem is, deploying the app on linux is not fun at all, it is a very complicated task; I will try to deploy this image downloader after linuxdeployqt becomes mature.<br />
<br />
<h3>
<b><u>Summary</u></b></h3>
In this blog post, I reviewed some problems I met when using Qt5 to develop an image downloader; this is by no means exhaustive, it only scratches the surface. If you want to know the details, every nitty-gritty, you had better dive into the source codes.<br />
<br />
<a href="http://qtandopencv.blogspot.my/2017/05/scrape-bing-images-by-qwebengine.html">Download Bing images by Qt5</a><br />
<a href="https://github.com/stereomatchingkiss/QImageScraper">Source codes of QImageScraper</a><br />
<br /></div>
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-34050472015483199602017-05-14T09:23:00.002-07:002017-07-30T18:18:36.939-07:00Download Bing images by Qt5<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioM0irUlGynIPVZT3C4AkEZ5NHD11HoTmxjpvTer4XcBnD_0Tqbt_dAgV5bUf-QctnW0z_XDTX7g4jZcaLx3I4G0ZvXRU9NnzEm17MLryFo6JKDz-rp7GYvtPZykBiT67mZBdR1syMzoc/s1600/more-data-question.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioM0irUlGynIPVZT3C4AkEZ5NHD11HoTmxjpvTer4XcBnD_0Tqbt_dAgV5bUf-QctnW0z_XDTX7g4jZcaLx3I4G0ZvXRU9NnzEm17MLryFo6JKDz-rp7GYvtPZykBiT67mZBdR1syMzoc/s320/more-data-question.png" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
Have you ever needed more data for your image classifier? I have, but downloading images found by Google, Bing or Flickr one by one is very time consuming, so why not write a small, simple image scraper to help us? Sounds like a good idea; as usual, before I start the task, I list out the requirements of this small app.<br />
<div>
<br /></div>
<div>
a : Cross platform, able to work under ubuntu and windows with one code base (no plan for mobiles since this is a tool designed for machine learning)</div>
<div>
b : Support regular expression, because I need to parse the html</div>
<div>
c : Support high level api of networking</div>
<div>
d : Have a decent webEngine; it is very hard (impossible?) to scrape the images from those search engines without it</div>
<div>
e : Support unicode</div>
<div>
f : Easy to create a ui, because I want instant feedback from the website, this could speed up development time<br />
g : Easy to build; solving the dependency problems of different 3rd party libraries is not fun at all<br />
<br /></div>
<div>
</div>
<div>
<div>
<div>
After searching through my toolbox I found that Qt5 is almost ideal for my task. In this post I will use Bing as an example (Google and Yahoo images share the same tricks; the processes of scraping these big 3 image search engines are very similar). If you ever try to study the source codes of the search results of Bing, you will find they are very complicated and difficult to read (maybe MS spends lots of time preventing users from scraping images). Are you afraid? Rest assured, the steps of scraping images from Bing are a little bit complicated but not impossible, as long as you have nice tools to aid you :).</div>
</div>
</div>
<div>
<br /></div>
<div style="text-align: center;">
<b><u>Step 1 : You need a decent, modern browser like firefox or chrome</u></b></div>
<div style="text-align: center;">
<b><u><br /></u></b></div>
<div style="text-align: left;">
Why do we need a decent browser? Because they have a powerful feature--<span style="color: blue;">Inspect Element</span>; this function can help you find the contents (links, buttons etc) of the website. </div>
<div style="text-align: left;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1bxD7gBNQjhejsGSzg30-DlEOY0rRmqBRlL55JjjSi1m4TsjMQIJUDIvHFmzpa-O3rD9xjsmxNUO8hxJMO4VDOfcaXUl_TlvYcniCGMR3Kzfo6DaqPGmEayvy0BlMFV36xzytuqVOxB0/s1600/imsc01.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1bxD7gBNQjhejsGSzg30-DlEOY0rRmqBRlL55JjjSi1m4TsjMQIJUDIvHFmzpa-O3rD9xjsmxNUO8hxJMO4VDOfcaXUl_TlvYcniCGMR3Kzfo6DaqPGmEayvy0BlMFV36xzytuqVOxB0/s400/imsc01.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic1</td></tr>
</tbody></table>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: center;">
<b><u>Step 2 : Click Inspect Element on interesting content</u></b></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Move your mouse to the contents you want to observe and click <span style="color: blue;">Inspect Element</span>.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDZiXybOehBv5VKM47I5CAnV2Ml5fjX9zHCuzLH8Ephdl_84izU9i4y-fAj4AhJZyUeODaJ3UlzYWwug3kcHFlleGkWY5R319NurGciwMagK0XwnxQ4nyVmxliH3aVsGnq5HC6abwThq8/s1600/imsc02.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDZiXybOehBv5VKM47I5CAnV2Ml5fjX9zHCuzLH8Ephdl_84izU9i4y-fAj4AhJZyUeODaJ3UlzYWwug3kcHFlleGkWY5R319NurGciwMagK0XwnxQ4nyVmxliH3aVsGnq5HC6abwThq8/s320/imsc02.JPG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic2</td></tr>
</tbody></table>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
After that the browser should show you the codes of the interesting content.</div>
<div style="text-align: left;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCwMT-SCk1uvK-jo0kmZBhl6qWV47PGPbL1pyY8rxFL6AUqkgTBWyEFkYmdTjsPKSA25e6DkO3ws2qQV3ApoinI6NvWM0IfBAkvI7PsBaFZaacjR1RT7UoUSNXoNbHtmo8Cdfvdh_R4mw/s1600/imsc03.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="117" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCwMT-SCk1uvK-jo0kmZBhl6qWV47PGPbL1pyY8rxFL6AUqkgTBWyEFkYmdTjsPKSA25e6DkO3ws2qQV3ApoinI6NvWM0IfBAkvI7PsBaFZaacjR1RT7UoUSNXoNbHtmo8Cdfvdh_R4mw/s400/imsc03.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic3</td></tr>
</tbody></table>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
The code pointed to by the browser may not be what you want; if this is the case, look around the code pointed to by the browser, as Pic3 shows.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: center;">
<b><u>Step 3 : Create a simple prototype by Qt5</u></b></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
We already know how to inspect the source codes of the web page, so let us create a simple ui to help us. This ui does not need to be professional or beautiful, after all it is just a prototype. The functions we need to create a Bing image scraper are</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
a : Scroll pages</div>
<div style="text-align: left;">
b : Click see more images</div>
<div style="text-align: left;">
c : Parse and get the links of images</div>
<div style="text-align: left;">
d : Download images</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
With the help of <a href="https://www.qt.io/ide/">Qt Designer</a>, I am able to "draw" the ui(Pic4) within 5 minutes(ignore parse_icon_link and next_page buttons for this tutorial).</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjZvF16SHjn5fkWLCB0gG_CEkRqaH4Efy7gfgOERBmatjGmd8S1jf7LZoubGl-mXuF1_Q2MRdkYzx5DCGadBGdCnqET1lI7RwxDqpkvYDFT5yNeSwTsBnGMdly94fPBnQ-o_TykI8i3oY/s1600/imsc04.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="211" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjZvF16SHjn5fkWLCB0gG_CEkRqaH4Efy7gfgOERBmatjGmd8S1jf7LZoubGl-mXuF1_Q2MRdkYzx5DCGadBGdCnqET1lI7RwxDqpkvYDFT5yNeSwTsBnGMdly94fPBnQ-o_TykI8i3oY/s400/imsc04.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic4<br />
<br />
<div style="text-align: left;">
<span style="font-size: small;">Since Qt Designer do not QWebEngineView yet, I add it manually by codes</span></div>
<div style="text-align: left;">
<br /></div>
</td></tr>
</tbody></table>
</div>
<pre class="prettyprint"> ui->gridLayout->addWidget(web_view_, 4, 0, 1, 2);
</pre>
<br />
Pic5 is what it looks like when running.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhh6O3C7rLUUhu_d8tweXY8Kvb-CVSH5Cs_a4c9NHJ-8K-rQSKwhR2z9TKbiOKw3bIyzI2VC-jKQVC0vCLdJrVKPl3pLYjziaZAd6bI-xSnXKVpD9iw3V-vAWFqGm81C2B7tCzz0PddKhI/s1600/imsc06.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhh6O3C7rLUUhu_d8tweXY8Kvb-CVSH5Cs_a4c9NHJ-8K-rQSKwhR2z9TKbiOKw3bIyzI2VC-jKQVC0vCLdJrVKPl3pLYjziaZAd6bI-xSnXKVpD9iw3V-vAWFqGm81C2B7tCzz0PddKhI/s400/imsc06.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic5</td></tr>
</tbody></table>
<br />
<br />
<div style="text-align: center;">
<b><u>Step 4 : Implement scroll function by js</u></b></div>
<br />
<pre class="prettyprint"> //get scroll position of the scroll bar and make it deeper
auto const ypos = web_page_->scrollPosition().ry() + 10000;
//scroll to deeper y position
web_page_->runJavaScript(QString("window.scrollTo(0, %1)").arg(ypos));
</pre>
<br />
<div style="text-align: center;">
<b><u>Step 5 : Implement parse image link function </u></b></div>
<br />
<br />
Before we can get the full links of the images, we need to scrape the image page links from the result page first.<br />
<br />
<pre class="prettyprint"> web_page_->gttoHtml([this](QString const &contents)
{
QRegularExpression reg("(search\\?view=detailV2[^\"]*)");
auto iter = reg.globalMatch(contents);
img_page_links_.clear();
while(iter.hasNext()){
QRegularExpressionMatch match = iter.next();
if(match.captured(1).right(20) != "ipm=vs#enterinsights"){
QString url = QUrl("https://www.bing.com/images/" +
match.captured(1)).toString();
url.replace("&amp", "&");
img_page_links_.push_back(url);
}
}
});
</pre>
<br />
<div style="text-align: center;">
<b><u>Step 6 : Simulate "See more image"</u></b></div>
<br />
This part is a little bit tricky. I tried to find the words "See more images" but found nothing; the reason is that the source code returned by <span style="color: blue;">View Page Source</span> (Pic 6) does not update.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyWhtQ8WUaKK0fRARa3F8VLuorh2vSb35oGES1oDhrD6z9LfCJShEg4GvY1KwihiJy2jcl4OPSfGDvoCmEJHFsyXKkHf3FlJLVlYZOR_zJgnihcCulmqlIUUAwm4mIcUqWsyFaDXe4zsc/s1600/imsc05.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyWhtQ8WUaKK0fRARa3F8VLuorh2vSb35oGES1oDhrD6z9LfCJShEg4GvY1KwihiJy2jcl4OPSfGDvoCmEJHFsyXKkHf3FlJLVlYZOR_zJgnihcCulmqlIUUAwm4mIcUqWsyFaDXe4zsc/s400/imsc05.JPG" width="337" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic 6</td></tr>
</tbody></table>
<br />
The solution is easy: use <span style="color: blue;">Inspect Element</span> instead of <span style="color: blue;">View Page Source</span> (sometimes it is easier to find the contents you want with <span style="color: blue;">View Page Source</span>; both of them are valuable for web scraping).<br />
<br />
<br />
<br />
<pre class="prettyprint">
web_page_->runJavaScript("document.getElementsByClassName"
"(\"btn_seemore\")[0].click()");
</pre>
<br />
<br />
<div style="text-align: center;">
<b><u>Step 7 : Download images</u></b></div>
<br />
Our ultimate goal is to download the images we want, so let us finish the last part of this prototype.<br />
<br />
First, we need to get the html text of the image page (the page with the link to the image source).<br />
<br />
<br />
<pre class="prettyprint"> if(!img_page_links_.isEmpty()){
web_page_->load(img_page_links_[0]);
}
</pre>
<br />
Second, download the image<br />
<pre class="prettyprint">
void experiment_bing::web_page_load_finished(bool ok)
{
    if(!ok){
        qDebug()<<"cannot load webpage";
        return;
    }
    web_page_->toHtml([this](QString const &contents)
    {
        QRegularExpression reg("src2=\"([^\"]*)");
        auto match = reg.match(contents);
        if(match.hasMatch()){
            QNetworkRequest request(match.captured(1));
            QString const header = "msnbot-media/1.1 (+http://search."
                                   "msn.com/msnbot.htm)";
            //without this header, some images cannot be downloaded
            request.setHeader(QNetworkRequest::UserAgentHeader, header);
            downloader_->append(request, ui->lineEditSaveAt->text());
        }else{
            qDebug()<<"cannot capture img link";
        }
        if(!img_page_links_.isEmpty()){
            //this image should not be downloaded again
            img_page_links_.pop_front();
        }
    });
}
</pre>
<br />
Third, download the next image<br />
<br />
<br />
<pre class="prettyprint">
void experiment_bing::download_finished(size_t unique_id, QByteArray)
{
    if(!img_page_links_.isEmpty()){
        web_page_->load(img_page_links_[0]);
    }
}
</pre>
<br />
<div style="text-align: center;">
<b><u> Summary</u></b></div>
<br />
These are the key points of scraping images from Bing search with QtWebEngine. The downloader I use in this post comes from <a href="https://github.com/stereomatchingkiss/qt_enhance/blob/master/network/download_supervisor.hpp">qt_enhance</a>; the whole prototype is placed at <a href="https://mega.nz/#!h9lWRKDS!N-CGsGqBqky3kZjBqz5jrB2ncEIt2RIOH1_PKilwe9s">mega</a>. If you want to know more, visit the following link<br />
<br />
<a href="http://qtandopencv.blogspot.my/2017/05/create-better-images-downloadergoogle.html">Create a better images downloader(Google, Bing and Yahoo) by Qt5</a><br />
<br />
<div style="text-align: center;">
<b><u>Warning</u></b></div>
<br />
Because of <a href="https://bugreports.qt.io/browse/QTBUG-60669">Qt-bug 60669</a>, this prototype does not work under windows 10; unfortunately this bug is rated as P2, which means we may need to wait a while before it is fixed by the Qt community.<br />
<br />
<div style="text-align: center;">
<b><u>Edit</u></b></div>
<br />
Qt5.9 Beta4 fixed Qt-bug 60669, it works on my laptop(win 10 64bits) and desktop(Ubuntu 16.04.1).<br />
<br />
There exists a better solution to scrape image links from Bing; I will mention it in the next post.ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-23378843263042386902017-03-12T00:40:00.001-08:002017-10-13T14:15:15.569-07:00Deep learning 07-Challenge dog vs cat fun competition of kaggle by dlib and mxnet Today I want to record the experiences I gained from the <a href="https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/leaderboard">dog vs cat fun competition of kaggle</a>, which I spent about two weeks on (this is the first competition I have taken part in). My best rank in this competition is 67, which is close to the <span style="color: red;"><b>top 5%</b></span> (there are 1314 teams).<br />
<br />
The first tool I gave a try is dlib; although this library lacks a lot of features compared with other deep learning toolboxes, I still like it very much, especially the fact that dlib can work as a <span style="color: red;">zero dependency library</span>.<br />
<br />
<div style="text-align: center;">
<b><u>What I have learned</u></b></div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: left;">
1 : Remember to record the parameters you used</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
At first I wrote the records down in my header file; this made my codes harder to read as time went on, and I should have saved those records in an excel-like format from the beginning. The other thing I learned is that I should record the parameters even when I am running out of time; I found that without the records I became more panicked as the deadline drew closer.</div>
<div style="text-align: left;">
<br />
2 : Feeding pseudo labels into the mini-batch in a naive way does not work</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I should have split the data of the mini-batch with some ratio, like 2/3 true labels, 1/3 pseudo labels; a minimal sketch is shown below.</div>
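<div>
A minimal sketch of the ratio idea (generic code, not from my competition scripts; in real training both pools should be shuffled every epoch):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <algorithm>
#include <iterator>
#include <vector>

template<typename Sample>
std::vector<Sample> make_mini_batch(std::vector<Sample> const &true_pool,
                                    std::vector<Sample> const &pseudo_pool,
                                    size_t batch_size)
{
    std::vector<Sample> batch;
    size_t const true_size = batch_size * 2 / 3;
    //2/3 of the batch comes from true labels, the rest from pseudo labels
    std::copy_n(std::begin(true_pool), true_size, std::back_inserter(batch));
    std::copy_n(std::begin(pseudo_pool), batch_size - true_size,
                std::back_inserter(batch));
    return batch;
}
</pre>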
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
3 : Leveraging a pretrained model makes it much easier to get good results</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I did not use a pre-trained model but trained the network from scratch; this did not give me great results, especially since I could not afford to train on images of bigger size because my gpu only has 2GB of ram. My score was 0.27468 with the brand new model. To speed things up, I treated resnet34 of dlib as a feature extractor, saved the features extracted by resnet34 and trained a new network on them; this pushed my score to 0.09627, a big improvement.</div>
<div style="text-align: left;">
<br /></div>
4 : Ensembling and k-fold cross validation<br />
<br />
To improve my score, I split the data set into 5 folds and ensembled the results by averaging them (a minimal sketch of the averaging is shown below); this pushed my score to 0.06266. I did not apply stacking because I learned this technique after the competition finished. Maybe I could have gotten better results if I had known it earlier.<br />
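<div>
A minimal sketch of the averaging step (generic code; every inner vector holds the predictions of one fold for all test images):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <vector>

std::vector<double> average_folds(std::vector<std::vector<double>> const &folds)
{
    std::vector<double> avg(folds.front().size(), 0.0);
    for(auto const &fold : folds){
        for(size_t i = 0; i != fold.size(); ++i){
            avg[i] += fold[i];
        }
    }
    for(auto &val : avg){
        val /= folds.size();
    }
    return avg;
}
</pre>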
<div>
<br /></div>
<div style="text-align: left;">
5 : How to use dlib, keras and mxnet</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I put the codes of dlib and mxnet on <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/cnn_playground/dog_vs_cat">github</a>; I removed all of the non-working solutions, that is why you do not see any codes related to keras. Keras did not help me improve my score but mxnet did; what I did with mxnet was finetune all of the resnet pretrained models and ensemble them with the results trained by dlib. This improved my score to 0.05051.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
6 : Read the posts in the forums, they may give you useful info</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I learned that the data set has some "errors" in it, so I removed those false images from the training data set.</div>
<div style="text-align: left;">
<br />
7 : The fast.ai course is awesome, I should have viewed it earlier</div>
<br />
If I had watched the videos before I took part in this competition, I believe I could have performed better. The <a href="http://forums.fast.ai/">forum of this course</a> is very helpful too; it is completely free and open.<br />
<br />
8 : X-Crop validation may help you improve your score<br />
<br />
I found this technique on PyImageSearch and in dlib, but I did not have enough time to try it out.<br />
<br />
9 : Save settings in JSON format<br />
<br />
Rather than hard coding the parameters, saving them in a JSON file is better (a minimal sketch is shown after the list), because<br />
<br />
a : You do not need to change the source codes frequently, which saves compile time<br />
b : Every model and experiment can have its own settings record, making training results easier to reproduce<br />
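<br />
A minimal sketch with Qt (the parameter names are made up examples, not my real records):<br />
<br />
<pre class="prettyprint">#include <QFile>
#include <QJsonDocument>
#include <QJsonObject>
#include <QString>

void save_settings(QString const &file_name)
{
    QJsonObject settings;
    settings["model"] = "resnet34";
    settings["learning_rate"] = 0.01;
    settings["batch_size"] = 64;
    QFile file(file_name);
    if(file.open(QIODevice::WriteOnly)){
        file.write(QJsonDocument(settings).toJson());
    }
}
</pre>
<br />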
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-73066305307051133062016-06-30T17:04:00.003-07:002016-06-30T20:21:49.075-07:00Speed up image hashing of opencv(img_hash) and introduce color moment hash In this post, I would like to show you two things.<br />
<br />
1 : How I <span style="color: magenta; font-size: large;"><b>accelerated the speed</b></span> of the <a href="https://github.com/Itseez/opencv_contrib/pull/688"><span style="color: blue;">img_hash module(click me)</span></a> by roughly <span style="color: magenta; font-size: large;"><b>1.5x~500x</b></span> compared with my <a href="http://qtandopencv.blogspot.my/2016/06/introduction-to-image-hash-module-of.html"><span style="color: blue;">last post(click me)</span></a>.<br />
<br />
2 : A new image hash algorithm which <span style="color: magenta;"><b>works quite well</b></span> under rotation attack.<br />
<h2 style="text-align: center;">
<span style="color: blue;"><b><u>Accelerate the speed of img_hash</u></b></span></h2>
We need only <span style="color: magenta;">one line</span> to gain this huge performance boost, no more, no less.<br />
<br />
<br />
<pre class="prettyprint">cv::ocl::setUseOpenCL(false);
</pre>
<br />
What I do is <span style="color: magenta;">turn off the openCL optimization</span> (I will not discuss why this speeds things up dramatically on my laptop; if you are interested in it, I will open another topic to discuss this phenomenon). Let us measure the performance after the change. The codes are located <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/image_hash/hash_samples.cpp"><span style="color: blue;">at here(click me)</span></a>.<br />
<br />
The following comparison does not list the results of PHash for the Average hash, PHash and Color hash algorithms, because I cannot find these algorithms in the PHash library.<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhk_6H14YNR2phIuLO-RyeRXcvVHg4S9qz5MG9RJm2VNDVT9Y5XZVbTpY5MkYCl-ZqHcYLxlyOomdBXd5i89973AVKptIZjQAGtdMOrMJLo_iO292HiqmkABNNdIREAtzaP3Mv9NvS8EX0/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="75" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhk_6H14YNR2phIuLO-RyeRXcvVHg4S9qz5MG9RJm2VNDVT9Y5XZVbTpY5MkYCl-ZqHcYLxlyOomdBXd5i89973AVKptIZjQAGtdMOrMJLo_iO292HiqmkABNNdIREAtzaP3Mv9NvS8EX0/s640/Capture.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Computation time</td></tr>
</tbody></table>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhv3tJWb3Nir75GEAVagNaEX10G1tZaE74gZDLOm6bKQPFuBrE6R3imchcVen5tRK-qW78KyWVP8rd_G6xcvyzs_jEJt2tFZ5nh_VI96hYHHLc_iOITjX5cmobM1cFSsgt1Cbn2rFQAIeg/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="76" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhv3tJWb3Nir75GEAVagNaEX10G1tZaE74gZDLOm6bKQPFuBrE6R3imchcVen5tRK-qW78KyWVP8rd_G6xcvyzs_jEJt2tFZ5nh_VI96hYHHLc_iOITjX5cmobM1cFSsgt1Cbn2rFQAIeg/s640/Capture.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Comparison time</td></tr>
</tbody></table>
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTP6byyDQPGfL_s1b1rVqesNRazopbsqQ-VW0987iah5eC2I5EAKMpIFK2xvXv29fWy44J15Ms5AfjeGiQaNer1wuM4BCchZa2ktRYusjwQ-E2J84RaEXBOz0cv0JqCu_hWeyKAxovdWo/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="108" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTP6byyDQPGfL_s1b1rVqesNRazopbsqQ-VW0987iah5eC2I5EAKMpIFK2xvXv29fWy44J15Ms5AfjeGiQaNer1wuM4BCchZa2ktRYusjwQ-E2J84RaEXBOz0cv0JqCu_hWeyKAxovdWo/s640/Capture.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Computation of img_hash with and without opencl</td></tr>
</tbody></table>
<br />
As the results show, the computation time of img_hash outperforms PHash after I switch off opencl support on my laptop (y410p); on your computer, switching it on <span style="color: magenta;">may</span> help you gain better performance. Either way, the comparison performance does not change much with or without opencl support.<br />
<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><b><u>Benchmark of Color Moment Hash</u></b></span></h2>
<div>
<span style="color: blue;"><b><u><br /></u></b></span></div>
<div>
In this section, I would like to introduce an image hash algorithm which works quite well under rotation attack, and provide much better test results than my <a href="http://qtandopencv.blogspot.my/2016/06/introduction-to-image-hash-module-of.html"><span style="color: blue;">last post(click me)</span></a>. This algorithm is introduced by <a href="http://www.naturalspublishing.com/files/published/54515x71g3omq1.pdf"><span style="color: blue;">this paper(click me)</span></a>; the class ColorMomentHash of the img_hash module implements it.</div>
<div>
<br /></div>
<div>
My last post only used one image--lena.png--to do the experiments under different attacks; in this post I will use the data set from phash to do the test (use the<span style="color: blue;"> <a href="http://www.phash.org/download/"><span style="color: blue;">miscellaneous data set(click me)</span></a></span> as original images and apply different attacks on them). These 3D bar charts are generated by Qt data visualization; I have not uploaded the chart codes to github yet because they are quite messy. If you need the source codes, please send a request to my <span style="color: blue;"><a href="mailto:thamngapwei@gmail.com">email</a>(thamngapwei@gmail.com)</span> and I will send you a copy, but do not expect me to refine the codes any time soon.<br />
<br />
The names of the images are quite long and do not look good on the charts, so I rename them to a shorter form(001~023). You can download the mapping between the new names and the old names <a href="https://mega.nz/#!Z1kSkByC!SxrrpqGt410X2ea33OSJm1X4qPPKT-dZIN-EcmCivuI"><span style="color: blue;">from mega(click me)</span></a>.<br />
<br />
The threshold of the color moment hash tests is 8: if the L2-norm between two hashes is greater than 8, we treat it as a fail and draw it with a red bar.</div>
<div>
<br /></div>
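<div>
To make the pass/fail criterion concrete, here is a minimal sketch of how one test case could be judged; the file names are placeholders of mine, and the threshold follows the value stated above:</div>
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;
#include &lt;opencv2/highgui.hpp&gt;
#include &lt;opencv2/img_hash.hpp&gt;

#include &lt;iostream&gt;

int main()
{
    //the file names are placeholders, replace them with an original
    //image of the data set and its attacked version
    cv::Mat const input = cv::imread("001.jpg");
    cv::Mat const attacked = cv::imread("001_attacked.jpg");

    auto algo = cv::img_hash::ColorMomentHash::create();
    cv::Mat inHash, attackedHash;
    algo-&gt;compute(input, inHash);
    algo-&gt;compute(attacked, attackedHash);

    //compare gives back the mismatch of the two hashes, for color
    //moment hash this is based on the L2-norm; anything greater
    //than 8 counts as a fail and is drawn as a red bar
    double const mismatch = algo-&gt;compare(inHash, attackedHash);
    std::cout&lt;&lt;(mismatch &gt; 8 ? "fail" : "pass")&lt;&lt;std::endl;
}
</pre>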
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Contrast attack</span></u></b></h3>
<div>
<br /></div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg975XsjLPVyXY7l4SPzpi7CF0a-rF_sHS7dO2gxcpjPk3IZ9Ijl2zmnCNFYoqvNyoaiuivJj7M0m1DSFgNfH19V0RKyNBRTl9nfzKzK0R0GqJKSc0sa8l1QYVylTX_0PNGwQHcYROKaWU/s1600/contrast_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="338" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg975XsjLPVyXY7l4SPzpi7CF0a-rF_sHS7dO2gxcpjPk3IZ9Ijl2zmnCNFYoqvNyoaiuivJj7M0m1DSFgNfH19V0RKyNBRTl9nfzKzK0R0GqJKSc0sa8l1QYVylTX_0PNGwQHcYROKaWU/s640/contrast_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Contrast attack on color moment hash</span></td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
</div>
<div style="text-align: left;">
<span style="color: blue;"> </span><span style="color: blue;"> </span><span style="color: blue;"> </span><br />
<span style="color: blue;"> </span><br />
<span style="color: blue;"> </span>Param is the gamma value of gamma correction.</div>
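<div>
<br /></div>
<div>
In case you wonder how such a contrast attack can be produced, below is a minimal sketch of gamma correction through a look up table; this is an illustration of mine, not the exact code used by the tests:</div>
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;

#include &lt;cmath&gt;

//produce the contrast attack by gamma correction on an 8 bit image,
//gamma is the Param shown on the chart
cv::Mat gamma_attack(cv::Mat const &amp;input, double gamma)
{
    //build a 256 entry look up table of the gamma curve
    cv::Mat table(1, 256, CV_8U);
    for(int i = 0; i != 256; ++i)
    {
        table.at&lt;uchar&gt;(0, i) =
            cv::saturate_cast&lt;uchar&gt;(std::pow(i / 255.0, gamma) * 255.0);
    }
    //map every pixel of every channel through the table
    cv::Mat result;
    cv::LUT(input, table, result);

    return result;
}
</pre>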
<h3 style="text-align: center;">
<b><span style="color: blue;"><u><br /></u></span></b></h3>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Resize attack</u></span></b></h3>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMoQvwFlgfXtpw4wEOLukDFVUhBQgrBBsnkMy6bz5EJTyQ2SuCwSgbNuKfNC-lTTR1OoeUDzGesP5jRVA4p2uMnBhLOaaiydBQ6q-hUB8xxNmDy7aADyaL48B4yyiBlxrffqiGImTm9dY/s1600/resize_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="330" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMoQvwFlgfXtpw4wEOLukDFVUhBQgrBBsnkMy6bz5EJTyQ2SuCwSgbNuKfNC-lTTR1OoeUDzGesP5jRVA4p2uMnBhLOaaiydBQ6q-hUB8xxNmDy7aADyaL48B4yyiBlxrffqiGImTm9dY/s640/resize_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Resize attack on color moment hash<br />
<br />
<br />
<div style="text-align: left;">
<span style="font-size: small;"> </span><br />
<span style="font-size: small;"> Param is the aspect ratio of the horizontal and vertical sides.</span><br />
<span style="font-size: small;"><br /></span></div>
</td></tr>
</tbody></table>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Gaussian noise attack</u></span></b></h3>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbpMedEOnYTu8O-Jbq4GzYh_1x2XHowEupoTfnByN6ZDXTTTj1oJANoJfEiH-6kgWLf8GDRZPLGNlaiQ3HULHggok2N5PFpIwr-sNyINJMk7bkU7tcYXQCpPVgQPo6YX0kdlYj-pX1cBw/s1600/gaussian_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="350" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbpMedEOnYTu8O-Jbq4GzYh_1x2XHowEupoTfnByN6ZDXTTTj1oJANoJfEiH-6kgWLf8GDRZPLGNlaiQ3HULHggok2N5PFpIwr-sNyINJMk7bkU7tcYXQCpPVgQPo6YX0kdlYj-pX1cBw/s640/gaussian_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Gaussian noise attack on color moment hash</span></td></tr>
</tbody></table>
<div>
<span style="color: blue;"> </span><span style="color: blue;"> </span><br />
<span style="color: blue;"> </span><br />
<br />
<span style="color: blue;"> </span>Param is the standard deviation of the Gaussian noise.<br />
<span style="color: blue;"><br /></span></div>
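<div>
Again as an illustration of mine rather than the exact test code, a Gaussian noise attack with a given standard deviation can be sketched like this(assuming an 8 bit, 3 channel input):</div>
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;

//produce the gaussian noise attack, stddev is the Param shown on the
//chart; this sketch assumes an 8 bit, 3 channel input
cv::Mat gaussian_noise_attack(cv::Mat const &amp;input, double stddev)
{
    //the noise must be signed, an 8 bit mat cannot hold negative values
    cv::Mat noise(input.size(), CV_16SC3);
    cv::randn(noise, 0, stddev);

    //add with saturation and store the sum back as an 8 bit image
    cv::Mat result;
    cv::add(input, noise, result, cv::noArray(), CV_8U);

    return result;
}
</pre>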
<div>
<b><span style="color: blue;"><u><br /></u></span></b></div>
<div>
<b><span style="color: blue;"><u><br /></u></span></b></div>
<div>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Salt and pepper noise attack</u></span></b></h3>
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKjJ4dMAnuNzhAbfu3LSW7CzYg0zAKnUu7wQyeX78lUFuGIxPwTjxyPb2ye6hn5hs-lyJP_QpsBjJ3h5dwlnHAgFkgUXiM7kHLW7D-Ww5SIs0nc3JnsV2CYJGBV1-UqdCZXNrL4vAdiQk/s1600/salt_pepper_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="336" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKjJ4dMAnuNzhAbfu3LSW7CzYg0zAKnUu7wQyeX78lUFuGIxPwTjxyPb2ye6hn5hs-lyJP_QpsBjJ3h5dwlnHAgFkgUXiM7kHLW7D-Ww5SIs0nc3JnsV2CYJGBV1-UqdCZXNrL4vAdiQk/s640/salt_pepper_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Salt and pepper noise attack on color moment hash</span></td></tr>
</tbody></table>
<div>
Param is the threshold of the salt and pepper noise.</div>
<div>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Rotation attack</u></span></b></h3>
</div>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBVp6gIGHadyW6ZY9b0hQiVtwR8uSK4MacSLWquhFV4m9GzLzFV6c7JAfF8NGAjNcg5Kn7kRlRKRxFhZlq3KU1rt1BqJimjjnQAgImhGwPFApArtOyjPypjp2CXeP7fzMClFOeM0pG7Ow/s1600/rotation_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="362" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBVp6gIGHadyW6ZY9b0hQiVtwR8uSK4MacSLWquhFV4m9GzLzFV6c7JAfF8NGAjNcg5Kn7kRlRKRxFhZlq3KU1rt1BqJimjjnQAgImhGwPFApArtOyjPypjp2CXeP7fzMClFOeM0pG7Ow/s640/rotation_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8px;"><span style="font-size: 12.8px;">Rotation attack on color moment hash</span></td></tr>
</tbody></table>
<br />
<div style="text-align: justify;">
<span style="font-size: small; text-align: start;"> Param is the angle of rotation.</span></div>
<div style="font-size: medium; text-align: start;">
</div>
</td></tr>
</tbody></table>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Gaussian blur attack</u></span></b></h3>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPHgGJW9TAM3wzARWZdnPV-ujcHkjVTkzVddOQhtQbHBbtedHNkhT07R1NK7Cpm_Rx67OKx0FYtpdB5F5xdFC-Sx_eaTTHpT0Np2Bo3O2s-x3G-SDR3DW5p1V9wBRKOdYHsZpM80q95EE/s1600/gaussian_blur_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="368" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPHgGJW9TAM3wzARWZdnPV-ujcHkjVTkzVddOQhtQbHBbtedHNkhT07R1NK7Cpm_Rx67OKx0FYtpdB5F5xdFC-Sx_eaTTHpT0Np2Bo3O2s-x3G-SDR3DW5p1V9wBRKOdYHsZpM80q95EE/s640/gaussian_blur_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Gaussian blur attack on color moment hash</span></td></tr>
</tbody></table>
<br />
Param is the standard deviation of the 3x3 Gaussian filter.<br />
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Jpeg compression attack</u></span></b></h3>
</div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZL06U9LJWG6gmeBZf1tdG5lP9pLsUeBM1BCM4ONdyUB8IpZotuKmlVk6ryAni1M2rSJxeFk8HrBtO3lj5l9rezivI0PhDEeAPDameh6x0N2CUL6k5GuOWnI0ggMp4B2Pj8cY98SLAue0/s1600/jpeg_compression_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZL06U9LJWG6gmeBZf1tdG5lP9pLsUeBM1BCM4ONdyUB8IpZotuKmlVk6ryAni1M2rSJxeFk8HrBtO3lj5l9rezivI0PhDEeAPDameh6x0N2CUL6k5GuOWnI0ggMp4B2Pj8cY98SLAue0/s640/jpeg_compression_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Jpeg compression attack on color moment hash</span></td></tr>
</tbody></table>
</div>
<div>
<br /></div>
<div>
Param is the quality factor of jpeg compression; 100 means no compression.</div>
<div>
<br /></div>
<div>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Watermark attack</u></span></b></h3>
</div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhEGGckjte7gSffLysEtKNvIvidaEp0AQ1PkFNV2ZbNwWJBpyFF3YoWAtnmnlwK6h9T-EPWml3-p2t_fF2vUcJMbC3rL04xSre7BQerb6kc_XQGRlrQmbGc3ujDb_-SDWQ0euPHZRULkc/s1600/watermark_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="354" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhEGGckjte7gSffLysEtKNvIvidaEp0AQ1PkFNV2ZbNwWJBpyFF3YoWAtnmnlwK6h9T-EPWml3-p2t_fF2vUcJMbC3rL04xSre7BQerb6kc_XQGRlrQmbGc3ujDb_-SDWQ0euPHZRULkc/s640/watermark_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Watermark attack on color moment hash</span></td></tr>
</tbody></table>
</div>
<div>
Param is the strength of the watermark; 1.0 means the mark is 100% opaque. Image 017 and image 023 perform very poorly because they are grayscale images.</div>
<div>
<br /></div>
<div>
From these experimental data, we can say color moment hash performs very well under various attacks except Gaussian noise, salt and pepper noise and contrast attacks.</div>
<div>
<br /></div>
<div>
<h2 style="text-align: center;">
<b><span style="color: blue;"><u>Overall results of different algorithms</u></span></b></h2>
</div>
<div>
<br /></div>
<div>
There are too many data points to show for all of the algorithms, so to make things more intuitive I created charts to help you compare the performance of these algorithms under different attacks. Their thresholds are the same as in the <a href="http://qtandopencv.blogspot.my/2016/06/introduction-to-image-hash-module-of.html"><span style="color: blue;">last post(click me)</span></a>.</div>
<div>
<br /></div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuPTvNpAoHZAJ5Kzu-GNxfHeRskVrlSjzvTv9x0SXPP0WnCSgaVzT16flxd8or3bHf1BsHaylWpAahPwDhR6Zb2GgEBBh0E1Zth0GLxCF3SrZuYBQFsKn4XHxG4Q8uf2rchsDnNxFuEnw/s1600/average_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="516" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuPTvNpAoHZAJ5Kzu-GNxfHeRskVrlSjzvTv9x0SXPP0WnCSgaVzT16flxd8or3bHf1BsHaylWpAahPwDhR6Zb2GgEBBh0E1Zth0GLxCF3SrZuYBQFsKn4XHxG4Q8uf2rchsDnNxFuEnw/s640/average_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Average algorithm performance</td></tr>
</tbody></table>
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjV7XtSu7s1ObG52RluQ7E6U-9dhuYrG68PWcAd_ERC13CJJuyfzc3UJm2bJkP15QTPxh-VM1E3tkhmAUfFJAQYh6fbss7HeykVvbLJP9EF0bAtw8AiSpLyo_om1uaDHSNtV5-RXoqK1u0/s1600/phash_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="548" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjV7XtSu7s1ObG52RluQ7E6U-9dhuYrG68PWcAd_ERC13CJJuyfzc3UJm2bJkP15QTPxh-VM1E3tkhmAUfFJAQYh6fbss7HeykVvbLJP9EF0bAtw8AiSpLyo_om1uaDHSNtV5-RXoqK1u0/s640/phash_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">PHash algorithm performance</span></td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSpA7NS_3obrxnnBi29GJbtvXs0sM0IM2x4_pXMrURlD6Lh-0H8Ksa8QgIrlbJGxWeVRFeyWbLQyUJqdaC0yPKM5FBpWorHq-sHnBYjwlngWX7Ax6PBezHnE1Q55_pi8eoi-QTZWTbRq8/s1600/marr_hash_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="538" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSpA7NS_3obrxnnBi29GJbtvXs0sM0IM2x4_pXMrURlD6Lh-0H8Ksa8QgIrlbJGxWeVRFeyWbLQyUJqdaC0yPKM5FBpWorHq-sHnBYjwlngWX7Ax6PBezHnE1Q55_pi8eoi-QTZWTbRq8/s640/marr_hash_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Marr Hildreth algorithm performance</span></td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiHeVsHtFEzwBIn7EDg5_Yk6CYeHiJbpEKlU7UeTXjSNfFPbYJmXdalI-xnjcawvztx4Ka3lmiHx7RPjnXKzH27ZXhuV21bpp1w0Zpfg5MImUEB0SYJwOlXZ_1x6dtND6vuz7TFSmIl9U/s1600/radial_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="542" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiHeVsHtFEzwBIn7EDg5_Yk6CYeHiJbpEKlU7UeTXjSNfFPbYJmXdalI-xnjcawvztx4Ka3lmiHx7RPjnXKzH27ZXhuV21bpp1w0Zpfg5MImUEB0SYJwOlXZ_1x6dtND6vuz7TFSmIl9U/s640/radial_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Radial hash algorithm performance</span></td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSgKoTkm5_kqAcLHhI_TBway5OXca719glN0RF4DpQi0HOmDARVl4VEpUiAGaRVskgL7pHitPSvsgvSqLZwYw1wGNAC_Sib3dCd4tthfQQsDjcpbkimVWGa5OaIviUqKU6DGfMPgAGVDU/s1600/bmh_zero_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="542" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSgKoTkm5_kqAcLHhI_TBway5OXca719glN0RF4DpQi0HOmDARVl4VEpUiAGaRVskgL7pHitPSvsgvSqLZwYw1wGNAC_Sib3dCd4tthfQQsDjcpbkimVWGa5OaIviUqKU6DGfMPgAGVDU/s640/bmh_zero_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">BMH zero algorithm performance</span></td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLkNJwlLIiMBhHFHbOa7MlXMtTmaVlxZqE8Xl3p6zNemaMib-Qh8VaQTeKiucRwAYqKnNFlH7ydYpko1MC_F7dFsMNo11paK_MzE-CDhClvh_lB7Od_oyK1lnKTKMTrFsrbEYVY03kj_Y/s1600/bmh_one_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="552" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLkNJwlLIiMBhHFHbOa7MlXMtTmaVlxZqE8Xl3p6zNemaMib-Qh8VaQTeKiucRwAYqKnNFlH7ydYpko1MC_F7dFsMNo11paK_MzE-CDhClvh_lB7Od_oyK1lnKTKMTrFsrbEYVY03kj_Y/s640/bmh_one_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">BMH one algorithm performance</span></td></tr>
</tbody></table>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisy-jDlNLFihugY4zsDZOgyYUz2OYngWfJx9l_2HT92oD-ootua9vfXykNuOl5a6LkF_5iAbYBezrrQX0RStTKEbfipegJReNpuK23EIZfA0bvw1dejY3Pvt1-sVFRd1FFgSyJG5-v42A/s1600/color_moment_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="540" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisy-jDlNLFihugY4zsDZOgyYUz2OYngWfJx9l_2HT92oD-ootua9vfXykNuOl5a6LkF_5iAbYBezrrQX0RStTKEbfipegJReNpuK23EIZfA0bvw1dejY3Pvt1-sVFRd1FFgSyJG5-v42A/s640/color_moment_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Color moment algorithm performance</span></td></tr>
</tbody></table>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgT15qN7JU2G1dHGf7lhXT_N50NtZCwBKifp6zEZk0mpsyjvAF3QgN_JIMxWM8ManSmvSL931GCDpj6n0wfDWel8nVw_lLiF_bXtUMPQx1ZfbIBydbTguznuv6l04Q1ZyfuCPa8OkBb1HM/s1600/overall_result.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="556" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgT15qN7JU2G1dHGf7lhXT_N50NtZCwBKifp6zEZk0mpsyjvAF3QgN_JIMxWM8ManSmvSL931GCDpj6n0wfDWel8nVw_lLiF_bXtUMPQx1ZfbIBydbTguznuv6l04Q1ZyfuCPa8OkBb1HM/s640/overall_result.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Overall results</td></tr>
</tbody></table>
</div>
<div>
These are the results of all of the algorithms. From the Overall results chart, it is easy to see that every algorithm has its pros and cons; you need to pick the one that suits your database. If speed is crucial, average hash may be your best choice, because it is the <span style="color: magenta;">fastest algorithm</span> of them all and performs very well under different attacks except rotation and salt and pepper noise. If you need rotation resistance, color moment hash is your only choice, because the other algorithms fail badly under rotation attack. You can find the codes of these test cases <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/image_hash/better_test.cpp"><span style="color: blue;">from here(click me)</span></a>.</div>
<h2 style="text-align: center;">
<b><span style="color: blue;"><u>Compare with PHash library</u></span></b></h2>
<div>
<b><span style="color: blue;"><u><br /></u></span></b></div>
<div>
As this post shows, the img_hash module possesses <span style="color: magenta;">five advantages</span> over the <a href="http://www.phash.org/"><span style="color: blue;">PHash library(click me)</span></a>.</div>
<div>
<br /></div>
<div>
1 : The processing <span style="color: magenta;">speed</span> of this module <span style="color: magenta; font-size: large;">outperforms PHash</span>.<br />
<br /></div>
<div>
2 : This module adopts the same license as <a href="http://opencv.org/"><span style="color: blue;">opencv(click me)</span></a>, which means you can do whatever you like with it <span style="color: magenta; font-size: large;">free of charge</span>.<br />
<br /></div>
<div>
3 : The codes are much more modern and easier to use; img_hash <span style="color: magenta; font-size: large;">frees you from memory management chores</span> once and for all. <span style="color: magenta;">A good, modern c++ library <span style="font-size: large;">should not</span> force its users to take care of the resources by themselves</span>.<br />
<br /></div>
<div>
4 : The api of img_hash is consistent and much easier to use than the PHash library. Do not believe it? Let us look at some examples.<br />
<br />
<h3 style="text-align: center;">
<u><span style="color: blue;">Case 1a : Compute Radial Hash by PHash library
</span></u></h3>
<br />
<pre class="prettyprint">Digest digest_0, digest_1;
digest_0.coeffs = 0;
digest_1.coeffs = 0;
ph_image_digest(img_0, 1.0, 1.0, digest_0);
ph_image_digest(img_1, 1.0, 1.0, digest_1);
double pcc = 0;
ph_crosscorr(digest_0, digest_1, pcc, 0.9);
//do something, remember to free your memory :(
free(digest_0.coeffs);
free(digest_1.coeffs);
</pre>
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Case 1b : Compare Radial Hash by img_hash
</span></u></b></div>
<br />
<pre class="prettyprint">auto algo = RadialVarianceHash::create();
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself
</pre>
<br />
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Case 2a : Compute Marr Hash by PHash library
</span></u></b></div>
<br />
<pre class="prettyprint">int N = 0;
uint8_t *hash_0 = ph_mh_imagehash(img_0, N);
uint8_t *hash_1 = ph_mh_imagehash(img_1, N);
double const value = ph_hammingdistance2(hash_0 , 72, hash_1, 72);
//do something, remember to free your memory :(
free(hash_0);
free(hash_1);
</pre>
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Case 2b : Compare Marr Hash by img_hash
</span></u></b></div>
<br />
<pre class="prettyprint">auto algo = MarrHildrethHash::create();
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself
</pre>
<br />
<br />
<div style="text-align: center;">
<span style="color: blue;"><b><u>Case 3a : Compute Block mean Hash by PHash library
</u></b></span></div>
<br />
<pre class="prettyprint">BinHash *hash_0 = 0;
BinHash *hash_1 = 0;
ph_bmb_imagehash(img_0, 1, &hash_0);
ph_bmb_imagehash(img_1, 1, &hash_1);
double const value = ph_hammingdistance2(hash_0->hash,
hash_0->bytelength,
hash_1->hash,
hash_1->bytelength);
//do something, remember to free your memory :(
ph_bmb_free(hash_0);
ph_bmb_free(hash_1);
</pre>
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Case 3b : Compare Block mean Hash by img_hash
</span></u></b></div>
<br />
<pre class="prettyprint">auto algo = BlockMeanHash::create(0);
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself
</pre>
<br />
As you can see, img_hash is not only faster, it also provides a cleaner, more concise way to write your codes; you never need to remember different ways to compute your hashes and compare them anymore, because the <span style="color: magenta; font-size: large;"><b>api of img_hash is consistent</b></span>.<br />
<br />
<br />
5 : This module only depends on opencv_core and opencv_imgproc, which means you should be able to <span style="color: magenta;">compile it with ease</span> on every major platform without scratching your head.</div>
<div>
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><b><u>Next move</u></b></span></h2>
Develop an application--<span style="color: magenta;"><b>Similar Vision</b></span>--to show the capability of img_hash. The functions of this app are finding similar images in an image set(of course, it will leverage the power of the img_hash module) and finding similar video clips among videos. </div>
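<div>
A rough sketch of the image part of that idea could look like the codes below; this only illustrates the concept under assumptions of mine(the folder name, the jpg pattern and the threshold 5 are placeholders), it is not the actual codes of Similar Vision:</div>
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;
#include &lt;opencv2/core/utility.hpp&gt;
#include &lt;opencv2/highgui.hpp&gt;
#include &lt;opencv2/img_hash.hpp&gt;

#include &lt;iostream&gt;
#include &lt;vector&gt;

int main()
{
    //collect the image names of the folder, "img_set" is a placeholder
    std::vector&lt;cv::String&gt; names;
    cv::glob("img_set/*.jpg", names);

    //compute the hash of every image of the set
    auto algo = cv::img_hash::AverageHash::create();
    std::vector&lt;cv::Mat&gt; hashes(names.size());
    for(size_t i = 0; i != names.size(); ++i)
    {
        algo-&gt;compute(cv::imread(names[i]), hashes[i]);
    }

    //report the pairs which look similar; the threshold 5 is a
    //placeholder, tune it for your own data set
    for(size_t i = 0; i != hashes.size(); ++i)
    {
        for(size_t j = i + 1; j != hashes.size(); ++j)
        {
            if(algo-&gt;compare(hashes[i], hashes[j]) &lt;= 5)
            {
                std::cout&lt;&lt;names[i]&lt;&lt;" looks similar to "&lt;&lt;names[j]&lt;&lt;std::endl;
            }
        }
    }
}
</pre>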
<div>
<br /></div>
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com1tag:blogger.com,1999:blog-4702230343097536610.post-66412412627479919012016-06-19T11:59:00.001-07:002023-02-03T21:08:26.165-08:00Introduction to image hash module of opencv Anyone using the de facto standard computer vision library--<a href="http://opencv.org/"><span style="color: blue;">opencv</span></a>--have you ever hoped opencv would provide ready to use image hash algorithms like the <a href="http://www.phash.org/docs/design.html">average hash, perceptual hash, block mean hash, radial variance hash and marr hildreth hash</a> that PHash does? PHash sounds like a robust solution and runs quite fast, but preferring PHash means you <span style="color: red;">need to add more dependencies</span> into your project and <span style="color: red;">open your source codes</span>, and open source is not a viable option for most commercial products. Do you, like me, <span style="color: red;">not want to add more dependencies</span> into your codes, yet want royalty free, robust and high performance image hash algorithms for your project? Let us admit it, we do not like to solve dependency issues related to programming; beyond that, many commercial projects need to remain closed source. It would be much better if opencv provided us an image hash module.<br />
<br />
If opencv does not have one, why not create one for it?<br />
<br />
1 : The algorithms of image hashing are <span style="color: orange;">not too complicated</span>.<br />
2 : The PHash library already implements many image hash algorithms; we can port them to opencv and use it as the golden model.<br />
3 : opencv is an<span style="color: orange;"> <b>open source computer vision library</b></span>. If we ever find any bugs, missing features or poor performance, we can do something to make it better.<br />
<br />
The good news is I have implemented all of the algorithms mentioned above, <b><span style="color: orange;">refined</span></b> the performance(ex : block mean hash is able to process single channel images) and <span style="color: orange;"><b>freed you from memory management chores</b></span>. The bad news is that the pull request hasn't been merged yet as I write this post, so you need to clone/pull it down and build it by yourself. Fear not, this module <span style="color: orange;"><b>only depends on</b></span> the core and imgproc of opencv, so it should be fairly easy to build(opencv is quite easy to build to begin with :)).<br />
<br />
The following examples will show you how to use img_hash; you will find it much easier to use than the PHash library because the api is <span style="color: orange;"><b>more consistent</b></span> and you <span style="color: orange;"><b>do not need to manage the memory</b></span> by yourself.<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><b><u>How to use it</u></b></span></h3>
<br />
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;
#include &lt;opencv2/core/ocl.hpp&gt;
#include &lt;opencv2/highgui.hpp&gt;
#include &lt;opencv2/img_hash.hpp&gt;
#include &lt;opencv2/imgproc.hpp&gt;

#include &lt;iostream&gt;

void computeHash(cv::Ptr&lt;cv::img_hash::ImgHashBase&gt; algo)
{
    cv::Mat const input = cv::imread("lena.png");
    cv::Mat const target = cv::imread("lena_blur.png");

    cv::Mat inHash; //hash of input image
    cv::Mat targetHash; //hash of target image

    //compute hash of input and target
    algo-&gt;compute(input, inHash);
    algo-&gt;compute(target, targetHash);
    //compare the similarity of inHash and targetHash
    //recommended thresholds are written in the header files
    //of every class
    double const mismatch = algo-&gt;compare(inHash, targetHash);
    std::cout&lt;&lt;mismatch&lt;&lt;std::endl;
}

int main()
{
    using namespace cv::img_hash;

    //disabling opencl acceleration may boost up the speed of img_hash,
    //however, in this post I do not disable the optimization of opencl
    //cv::ocl::setUseOpenCL(false);

    computeHash(AverageHash::create());
    computeHash(PHash::create());
    computeHash(MarrHildrethHash::create());
    computeHash(RadialVarianceHash::create());
    //BlockMeanHash supports mode 0 and mode 1, they associate to
    //mode 1 and mode 2 of the PHash library
    computeHash(BlockMeanHash::create(0));
    computeHash(BlockMeanHash::create(1));
    computeHash(ColorMomentHash::create());
}
</pre>
<br />
<br />
With these functions, we can measure the performance of our algorithms under different "attacks", like resize, contrast, noise and rotation. Before we start the tests, let me define the thresholds of "pass" and "fail". One thing to remember: to keep things simple, I only use lena to show the results; different data sets may need different thresholds/algorithms to get the best results.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqlEYSEO9uiOuhzyOUur4umrVK8UBf_ciEx9eVplNem50epce_oRTw_w86czuNtRUBH35y_BXBRFKQKVrvNZHOn90RAP_qspW0OFjOOp_iyq2nF-zGTwaeM2U2IFPQvFgCU4_uwypu6TE/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqlEYSEO9uiOuhzyOUur4umrVK8UBf_ciEx9eVplNem50epce_oRTw_w86czuNtRUBH35y_BXBRFKQKVrvNZHOn90RAP_qspW0OFjOOp_iyq2nF-zGTwaeM2U2IFPQvFgCU4_uwypu6TE/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Threshold</td></tr>
</tbody></table>
<br />
<br />
After we determine our threshold, we could use our beloved lena to do the test :).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGE6UwUZDYNrosBwZ-6c897uwLRd2KsJELfCh6Nf69B_CrfHVKDGSih-S1IOflqhiezlYXitsMcTUIg_yMw9f40bVRNAT_1TvNKjEoPyMy8iWA-UZe5Btjfc-d06yJBNDBF-9K3D8Vb60/s1600/lena.png" style="margin-left: auto; margin-right: auto;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGE6UwUZDYNrosBwZ-6c897uwLRd2KsJELfCh6Nf69B_CrfHVKDGSih-S1IOflqhiezlYXitsMcTUIg_yMw9f40bVRNAT_1TvNKjEoPyMy8iWA-UZe5Btjfc-d06yJBNDBF-9K3D8Vb60/s320/lena.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">lena.png</td></tr>
</tbody></table>
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Resize attack</span></u></b></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFrfUNE-hiEBBhNssNMZTAY1704791Mv6b1r-OBawL4aHGrsSy9TtBqr4Th4LZdOXHOAU3RBoCgqFm9UNvLo_zDx6XWzVM5uoxw1UCeUMam7ZnwE4TIILXKcjEeAJMCFWXrOiGQ94jr1U/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="237" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFrfUNE-hiEBBhNssNMZTAY1704791Mv6b1r-OBawL4aHGrsSy9TtBqr4Th4LZdOXHOAU3RBoCgqFm9UNvLo_zDx6XWzVM5uoxw1UCeUMam7ZnwE4TIILXKcjEeAJMCFWXrOiGQ94jr1U/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Resize attack</td></tr>
</tbody></table>
<br />
<br />
Every algorithm(BMH means block mean hash) works very well under different sizes and aspect ratios <span style="color: red;">except</span> radial variance hash; this algorithm works under different sizes, but we need to keep the aspect ratio.<br />
<br />
<br />
<div style="text-align: center;">
<span style="color: blue;"><b><u>Contrast Attack</u></b></span></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEUz1EZD2x-R4eH5WUAOqFsKpNjzwNM0lZuxi4ViES78NP_r9tu-6qQamFGGYGCdEcnBuAoP5sr6wG7saUPfAkD-Bn_781AfRz0TLexVa7ofbyC9YAhRHRRQzYQE50zT9z4W4yIB3ll_c/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="121" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEUz1EZD2x-R4eH5WUAOqFsKpNjzwNM0lZuxi4ViES78NP_r9tu-6qQamFGGYGCdEcnBuAoP5sr6wG7saUPfAkD-Bn_781AfRz0TLexVa7ofbyC9YAhRHRRQzYQE50zT9z4W4yIB3ll_c/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Contrast Attack</td></tr>
</tbody></table>
<br />
Every algorithm works quite well under different contrast, although radial variance hash, BMH zero and BMH one do not work well under very low contrast.<br />
<br />
<div style="text-align: center;">
<span style="color: blue;"><b><u> Gaussian Noise Attack</u></b></span></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiapaFOItYWID5iq3U900djYoEkBJWA9AlaqE3IL9GP9sn6D4qh0ddavJS6VeQfIBkxuZZE00jKf317W6vsl4wNbdr-bx2dR47i-z7SmaekNyJK2Kcye5VzXRrvWtw7Ua638_sUhzQjn0w/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="121" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiapaFOItYWID5iq3U900djYoEkBJWA9AlaqE3IL9GP9sn6D4qh0ddavJS6VeQfIBkxuZZE00jKf317W6vsl4wNbdr-bx2dR47i-z7SmaekNyJK2Kcye5VzXRrvWtw7Ua638_sUhzQjn0w/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Gaussian noise attack</td></tr>
</tbody></table>
Very fortunately, every algorithm survives the Gaussian noise attack.<br />
<br />
<div style="text-align: center;">
<span style="color: blue;"><b><u>Salt And Pepper Noise Attack</u></b></span></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy1B8LnhLhJyVTK6WdFOhUrHF6N54kmI2c9nD2VAT8BamIgnLcgJ-d8tJ1kTyEJxrZSikArpj3CCnKXP4CYu1DmV_xFE10s3hury2jthBG5E5rkZsZJYEgOONcOa_ggTs9lnJWnfxJW5s/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy1B8LnhLhJyVTK6WdFOhUrHF6N54kmI2c9nD2VAT8BamIgnLcgJ-d8tJ1kTyEJxrZSikArpj3CCnKXP4CYu1DmV_xFE10s3hury2jthBG5E5rkZsZJYEgOONcOa_ggTs9lnJWnfxJW5s/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Salt and pepper noise attack</td></tr>
</tbody></table>
As we can see, only radial variance hash and BMH perform well under the salt and pepper noise attack.<br />
<br />
<br />
<div style="text-align: center;">
<b><span style="color: blue;"><u>Rotation Attack</u></span></b></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigDWw3ec5pu2mWqtGgqCa9sO4lMDj-E5IQTp_h5BXl-sCSU139ZhIxkvbBCTaK494SWZzIYR1Q6QYziTtwhtLJhX2mL32o-f26UBQTffsMxGVjUx3NhLCca5xAGbc2AvjRe-C-rPlFGUg/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigDWw3ec5pu2mWqtGgqCa9sO4lMDj-E5IQTp_h5BXl-sCSU139ZhIxkvbBCTaK494SWZzIYR1Q6QYziTtwhtLJhX2mL32o-f26UBQTffsMxGVjUx3NhLCca5xAGbc2AvjRe-C-rPlFGUg/s400/Capture.JPG" width="368" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Rotation attack</td></tr>
</tbody></table>
Apparently, none of the algorithms survive the rotation attack. But does this really matter? I guess not(how often do you need to search for an image after rotating it on google?). If you really need to deal with rotation attacks, I suggest you give BOVW(bag of visual words) a try; I used it to construct a <a href="http://qtandopencv.blogspot.my/2016/04/content-based-image-retrievalcbir-00.html"><span style="color: blue;">robust CBIR system</span></a> before. The defects of a robust BOVW based CBIR system are long computation time, high memory consumption, and that it is much harder to scale to a large data set(you would need to build a distributed system in that case).<br />
<br />
We have gone through all of the tests; now let us measure the hash computation time and hash comparison time of the different algorithms(my laptop is a Y410P, the os is windows 10 64bits, the compiler is vc2015 64bits with update 2 installed).<br />
<br />
You can find all the details of different attacks <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/image_hash/hash_samples.cpp"><span style="color: blue;">at here(click me)</span></a>.<br />
<br />
<div style="text-align: center;">
<b><span style="color: blue;"><u>Computation Performance Test--img_hash vs PHash library</u></span></b></div>
<br />
I use the different algorithms to compute the hashes of 100 images from ukbench(ukbench03000.jpg~ukbench03099.jpg). The source codes of the opencv benchmark are located <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/image_hash/hash_samples.cpp"><span style="color: blue;">at here</span></a>(check the functions measure_computation_time and measure_comparison_time; I am using <span style="color: purple;">img_hash_1_0</span> as I write this post), and the source codes of the PHash performance test(<span style="color: purple;">version 0.94</span>, since I am on windows) are located <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/image_hash/phash_performance.cpp"><span style="color: blue;">at here</span></a>.<br />
<br />
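If you are curious how such timings can be taken, below is a simplified sketch of mine using std::chrono; the real benchmark functions are the ones named above in the linked repository, this is not their actual code.<br />
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;
#include &lt;opencv2/img_hash.hpp&gt;

#include &lt;chrono&gt;
#include &lt;iostream&gt;
#include &lt;vector&gt;

//time how long an algorithm takes to hash a batch of loaded images,
//a simplified sketch only, not the benchmark functions named above
void time_hash_computation(cv::Ptr&lt;cv::img_hash::ImgHashBase&gt; algo,
                           std::vector&lt;cv::Mat&gt; const &amp;imgs)
{
    using namespace std::chrono;

    cv::Mat hash;
    auto const start = steady_clock::now();
    for(auto const &amp;img : imgs)
    {
        algo-&gt;compute(img, hash);
    }
    auto const elapsed =
        duration_cast&lt;milliseconds&gt;(steady_clock::now() - start);
    std::cout&lt;&lt;"elapsed time : "&lt;&lt;elapsed.count()&lt;&lt;" ms"&lt;&lt;std::endl;
}
</pre>
<br />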
<br />
<div style="text-align: center;">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXxHLlFozJY3CXT9-uyDFLEJDQGdcqEQtqLPfPy19YF-6WBvloeYyJsQ5EFP_gElESo2MmGgDEzzFPB5L1UGOX9B5qW9MoydDAiMOEkG_Co4IRkzr-DfDghCCulnNCAWrQhoopWTOpe7g/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="58" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXxHLlFozJY3CXT9-uyDFLEJDQGdcqEQtqLPfPy19YF-6WBvloeYyJsQ5EFP_gElESo2MmGgDEzzFPB5L1UGOX9B5qW9MoydDAiMOEkG_Co4IRkzr-DfDghCCulnNCAWrQhoopWTOpe7g/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Computation performance test</td></tr>
</tbody></table>
<br /></div>
<div style="text-align: center;">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8RZ3i9nuRU_RPPMgfLd7cKyRcKMqdtFFqRJUR-e1KAwM0Y2Fl0vtFRu8svOP9W2RXrfg0mq5suIj0mCCqbBi4EfaWEV3C0ZYVPH_ORBcO4A2y8BgPe4SJl-swqeTMo6XHoX-WTKayeN0/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="52" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8RZ3i9nuRU_RPPMgfLd7cKyRcKMqdtFFqRJUR-e1KAwM0Y2Fl0vtFRu8svOP9W2RXrfg0mq5suIj0mCCqbBi4EfaWEV3C0ZYVPH_ORBcO4A2y8BgPe4SJl-swqeTMo6XHoX-WTKayeN0/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Comparison performance test</td></tr>
</tbody></table>
<br />
<br /></div>
In most cases, img_hash is faster than PHash, but BMH zero and BMH one are almost <span style="color: red;">30% or 40%</span> slower than their PHash versions. The <span style="color: red;">bottleneck is cv::resize</span>(over 95% of the time is spent on it); to speed things up, we need a faster resize function.<br />
<br />
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Find similar image from ukbench</span></u></b></div>
<br />
The results look good, but can it find similar images? Of course, let me show you how we can compare the hash value of our target against the images from ukbench(for simplicity, I only pick 100 images from ukbench).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYVsLbDm3NrWZ0TgYta9pjbVxq5rG2yGcAp8XMawrRzUTnqPjRcgWyAJ5q9grPpIiBdGBNkslwNrVp897SIgR4nnfb8ZWYTMqXgfW1xQ6WVkDVu7A47MyULoRlo7q1E4eIs13KQJjaft0/s1600/ukbench03037.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYVsLbDm3NrWZ0TgYta9pjbVxq5rG2yGcAp8XMawrRzUTnqPjRcgWyAJ5q9grPpIiBdGBNkslwNrVp897SIgR4nnfb8ZWYTMqXgfW1xQ6WVkDVu7A47MyULoRlo7q1E4eIs13KQJjaft0/s320/ukbench03037.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">target</td></tr>
</tbody></table>
<br />
<br />
<pre class="prettyprint">void find_target(cv::Ptr<cv::img_hash::ImgHashBase> algo, bool smaller = true)
{
using namespace cv::img_hash;
cv::Mat input = cv::imread("ukbench/ukbench03037.jpg");
//not a good way to reuse the codes by calling
//measure_comparison_time, please bear with me
std::vector<cv::Mat> targets = measure_comparison_time(algo, "");
double idealValue;
if(smaller)
{
idealValue = std::numeric_limits<double>::max();
}
else
{
idealValue = std::numeric_limits<double>::min();
}
size_t targetIndex = 0;
cv::Mat inputHash;
algo->compute(input, inputHash);
for(size_t i = 0; i != targets.size(); ++i)
{
double const value = algo->compare(inputHash, targets[i]);
if(smaller)
{
if(value < idealValue)
{
idealValue = value;
targetIndex = i;
}
}
else
{
if(value > idealValue)
{
idealValue = value;
targetIndex = i;
}
}
}
std::cout<<"mismatch value : "<<idealValue<<std::endl;
cv::Mat result = cv::imread("ukbench/ukbench0" +
std::to_string(targetIndex + 3000) +
".jpg");
cv::imshow("input", input);
cv::imshow("found img " + std::to_string(targetIndex + 3000), result);
cv::waitKey();
cv::destroyAllWindows();
}
void find_target()
{
using namespace cv::img_hash;
find_target(AverageHash::create());
find_target(PHash::create());
find_target(MarrHildrethHash::create());
find_target(RadialVarianceHash::create(), false);
find_target(BlockMeanHash::create(0));
find_target(BlockMeanHash::create(1));
}
</pre>
<br />
You will find that every algorithm <span style="color: orange;"><b>gives you back the same image you are looking for</b></span>.<br />
<br />
<div style="text-align: center;">
<b><span style="color: blue;"><u>Conclusion</u></span></b></div>
<br />
Average hash and PHash are the fastest algorithms, but if you want a more robust one, pick BMH zero. BMH zero and BMH one give similar results, but BMH one is slower since it needs more computation power. Hash comparison of radial variance hash is much slower than the others', because it needs to find the peak cross-correlation value among 40 combinations. If you want to know how to speed things up and learn more about rotation invariant image hash algorithms, give <a href="http://qtandopencv.blogspot.my/2016/06/speed-up-image-hashing-of-opencvimghash.html"><span style="color: blue;">this link(click me)</span></a> a try.<br />
<br />
You can find the test cases <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/image_hash"><span style="color: blue;">at here</span></a>. If you think this post is helpful, please give my repositories(<a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/image_hash"><span style="color: blue;">blogCodes2</span></a> and my img_hash branch of <a href="https://github.com/stereomatchingkiss/opencv_contrib/tree/img_hash_rotation/modules/img_hash/src"><span style="color: blue;">opencv_contrib</span></a>) a <span style="color: orange;"><b>star </b></span>:). If you want to join the development, please open a pull request, thanks.ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com7tag:blogger.com,1999:blog-4702230343097536610.post-579623576635772372016-06-17T21:07:00.004-07:002016-06-17T21:07:43.379-07:00Remove annoying trailing white space by c++ If you ever try to commit something to opencv(I am porting/implementing various image hash algorithms to <a href="https://github.com/Itseez/opencv_contrib/pull/688">opencv_contrib</a> as I write this post; you can find my branch <a href="https://github.com/stereomatchingkiss/opencv_contrib/tree/img_hash/modules/img_hash">here</a>), you will likely find some extremely annoying messages such as<br />
<br />
<pre><span class="stdout">modules/tracking/include/opencv2/tracking/tracker.hpp:857: trailing whitespace.
+
modules/tracking/include/opencv2/tracking/tracker.hpp:880: trailing whitespace.
+
modules/tracking/include/opencv2/tracking/tracker.hpp:890: trailing whitespace.
+
modules/tracking/include/opencv2/tracking/tracker.hpp:1433: trailing whitespace.
+ Params();
modules/tracking/include/opencv2/tracking/tracker.hpp:1434: trailing whitespace.
+
modules/tracking/include/opencv2/tracking/tracker.hpp:1444: trailing whitespace.</span></pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;">blablabla. They pop out in your files from time to time, cost you extra time to fix, pollute your commit history, and on top of that, those trailing white spaces are <b><span style="color: red;">hard to spot by human eyes</span></b>.</span></pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;">Apparently, eliminating those trailing white spaces is not a job suited for humans; we had better leave such tedious tasks to our friend--the computer.</span></pre>
<pre>
</pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;"> To teach our friend what I want to do, I wrote a small program to help us(source codes located <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/kill_trailing_white_space/main.cpp">at here</a>); you should be able to compile and run it if you are familiar with c++ and boost.</span></pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;">
</span></pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;"> Enough talk, let me show you an example:</span></pre>
<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqzW2sMhWdBEjOs1U84_e1N-ynaoo32F4XTA_zaJvW04yLTjKa5OKV5KkkEYFsIIW76K4uTzDRnhuIHdG2TZHbJaR8IJ5c-zDzCMIgKvBkQKVWjl56RnqaQB-L8iYTaPy_Ko7_VzE3M-E/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="241" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqzW2sMhWdBEjOs1U84_e1N-ynaoo32F4XTA_zaJvW04yLTjKa5OKV5KkkEYFsIIW76K4uTzDRnhuIHdG2TZHbJaR8IJ5c-zDzCMIgKvBkQKVWjl56RnqaQB-L8iYTaPy_Ko7_VzE3M-E/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Example 00<br /><br /><br /></td></tr>
</tbody></table>
<pre><span style="font-family: "Times New Roman"; white-space: normal;"> As you can see, Example 00 contains a lot of tabs and trailing white spaces; not only that, there is a tab we should not remove(the tab of std::string("\t")). This is where my small tool--<b><span style="color: blue;">kill_trailing_white_space</span></b>--comes in. All you need to do is specify whether you want to remove the tabs and trailing white spaces of <b><span style="color: blue;">a file</span></b> <span style="color: blue;"><b>or of the files inside a folder(it will scan the folders recursively)</b></span>. Examples:</span></pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;">
</span></pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;">"kill_trailing_white_space --input_file main.cpp"</span></span></pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;">"kill_trailing_white_space --input_folder img_hash"</span></span></pre>
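<pre><span style="font-family: "Times New Roman"; white-space: normal;"> If you are curious about the heart of the tool, below is a minimal sketch of mine covering only the trailing white space part; the real tool also removes tabs, scans folders recursively and relies on boost, see the linked source codes for the details.</span></pre>
<pre class="prettyprint">#include &lt;fstream&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;

//strip the trailing white spaces of every line of one file; this
//never touches anything in the middle of a line, so the "\t" inside
//std::string("\t") survives
void strip_trailing_white_space(std::string const &amp;file_name)
{
    std::ifstream in(file_name);
    std::ostringstream buffer;
    std::string line;
    while(std::getline(in, line))
    {
        auto const pos = line.find_last_not_of(" \t");
        //npos means the whole line is white space, clear it completely
        line.erase(pos == std::string::npos ? 0 : pos + 1);
        buffer&lt;&lt;line&lt;&lt;"\n";
    }
    in.close();

    //write the cleaned contents back to the same file
    std::ofstream out(file_name);
    out&lt;&lt;buffer.str();
}
</pre>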
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;">
</span></span></pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;"> After the process, we get a clean file, as shown in Example 01.</span></span></pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;">
</span></span></pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;">
</span></span></pre>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaXNMr6eQnn4vBjFIYUzJnGZYLpoAizHJPoU6KSHShxtiTSMMCkhy3QfUj5SjyZBbhgX8AMwpdkdRQCGrc_O-C22XCh_rI0pA3UMK7kcgRwSI0ILJvgTiunlMKQfw4U5bYIyMfVknxr7A/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="326" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaXNMr6eQnn4vBjFIYUzJnGZYLpoAizHJPoU6KSHShxtiTSMMCkhy3QfUj5SjyZBbhgX8AMwpdkdRQCGrc_O-C22XCh_rI0pA3UMK7kcgRwSI0ILJvgTiunlMKQfw4U5bYIyMfVknxr7A/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Example 01</td></tr>
</tbody></table>
<pre>
</pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;"> You can see the help menu if you enter --help. </span></span><span style="font-family: "Times New Roman"; white-space: normal;">For now this small tool only supports files with the extensions ".hpp" and ".cpp". Feel free to modify the codes to suit your needs.</span></pre>
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0