Qt and openCV: 2016

Thursday, 30 June 2016

Speed up image hashing of opencv(img_hash) and introduce color moment hash

In this post, I would like to show you two things.

1 : How I sped up the img_hash module(click me) by roughly 1.5x~500x compared with my last post(click me).

2 : A new image hash algorithm which works quite well under rotation attack.

Accelerate the speed of img_hash

One line is all we need to get this huge speedup, no more, no less.

cv::ocl::setUseOpenCL(false);

All I do is switch off the OpenCL optimization(I will not discuss here why this speeds things up so dramatically on my laptop; if you are interested, I can open another topic to discuss this phenomenon). Let us measure the performance after the change. The code is located here(click me).

The following comparison does not list PHash library results for the average hash, PHash and color moment hash algorithms, because I cannot find these algorithms in the PHash library.

Computation time

Comparison time

Computation of img_hash with and without opencl

As the results show, the computation time of img_hash outperforms PHash once I switch off OpenCL support on my laptop(Y410P); on your computer, switching it on may give you better performance. The comparison performance, however, does not change much with or without OpenCL support.

Benchmark of Color Moment Hash

In this section, I would like to introduce an image hash algorithm which works quite well under rotation attack and provide much better test results than my last post(click me). The algorithm was introduced by this paper(click me); the class ColorMomentHash of the img_hash module implements it.

My last post only used one image, lena.png, for the experiments under different attacks. In this post I use a data set from phash for the tests(the miscellaneous data set(click me) serves as the original images, and the different attacks are applied to it). The 3D bar charts are generated with Qt data visualization. I have not uploaded that code to github yet because it is quite messy; if you need the source code, please send a request to my email(thamngapwei@gmail.com) and I will send you a copy, but do not expect me to refine it any time soon.

The names of the images are quite long and do not look good when drawn on a chart, so I renamed them to a shorter form(001~023). The following is the mapping of those images. You can download the mapping between the new names and the old names from mega(click me).

The threshold for the color moment hash tests is 8: if the L2 norm between two hashes is greater than 8, we treat it as a failure and draw it with a red bar.
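The pass/fail rule above can be sketched in a few lines. This is only an illustration under my own assumptions: the hash is treated as a plain vector of doubles, and the names l2_distance and is_similar are hypothetical, they are not part of img_hash.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean (L2) distance between two hashes.
// Assumption: both vectors have the same length.
double l2_distance(std::vector<double> const &a, std::vector<double> const &b)
{
    double sum = 0.0;
    for(std::size_t i = 0; i != a.size(); ++i){
        double const diff = a[i] - b[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// A pair passes when the distance does not exceed the threshold
// (8 in the tests of this post); anything greater is a failure.
bool is_similar(std::vector<double> const &a, std::vector<double> const &b,
                double threshold = 8.0)
{
    return l2_distance(a, b) <= threshold;
}
```

With the real module you would call compare of the algorithm instead; the sketch only shows what "L2 norm greater than 8" means.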

Contrast attack

Contrast attack on color moment hash

  Param is the gamma value of gamma correction.
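For readers who are not familiar with gamma correction, here is a minimal sketch of one common form of it, out = 255 * (in / 255)^gamma, applied to 8-bit intensities. The function name gamma_correct is hypothetical, and whether the test code uses gamma or 1/gamma as the exponent is an assumption on my part.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Apply gamma correction to 8-bit intensities.
// Assumption: the exponent is gamma itself (some tools use 1 / gamma).
std::vector<std::uint8_t> gamma_correct(std::vector<std::uint8_t> const &pixels,
                                        double gamma)
{
    std::vector<std::uint8_t> result;
    result.reserve(pixels.size());
    for(std::uint8_t const p : pixels){
        // normalize to [0, 1], raise to gamma, scale back to [0, 255]
        double const normalized = p / 255.0;
        double const corrected = std::pow(normalized, gamma) * 255.0;
        result.push_back(static_cast<std::uint8_t>(std::lround(corrected)));
    }
    return result;
}
```

With this convention a gamma of 1.0 leaves the image unchanged, and values above 1.0 darken the mid tones.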

Resize attack

Resize attack on color moment hash

Param is the aspect ratio of the horizontal and vertical sides.

Gaussian noise attack

Gaussian noise attack on color moment hash

Param is the standard deviation of the Gaussian.

Salt and pepper noise attack

Salt and pepper noise attack on color moment hash

Param is the threshold of pepper and salt.

Rotation attack

Rotation attack on color moment hash

Param is the angle of rotation.

Gaussian blur attack

Gaussian blur attack on color moment hash

Param is the standard deviation of 3x3 gaussian filter.

Jpeg compression attack

Jpeg compression attack on color moment hash

Param is the quality factor of jpeg compression, 100 means the highest quality(least compression).

Watermark attack

Watermark attack on color moment hash

Param is the strength of the watermark, 1.0 means the mark is 100% opaque. Image 017 and image 023 perform very poorly because they are gray scale images.

From these experimental data, we can say that color moment hash performs very well under various attacks, except for Gaussian noise, salt and pepper noise and contrast attacks.

Overall results of different algorithms

Apparently, there is too much data to show for all of the algorithms. To make things more intuitive, I created charts to help you judge the performance of these algorithms under the different attacks. Their thresholds are the same as in the last post(click me).

Average algorithm performance

PHash algorithm performance

Marr Hildreth algorithm performance

Radial hash algorithm performance

BMH zero algorithm performance

BMH one algorithm performance

Color moment algorithm performance

Overall results

These are the results of all of the algorithms. From the overall results chart, it is easy to see that every algorithm has its pros and cons; you need to pick the one that suits your database. If speed is crucial, average hash may be your best choice, because it is the fastest of these algorithms and performs very well under the different attacks except rotation and salt and pepper noise. If you need rotation resistance, color moment hash is your only choice, because the other algorithms perform poorly under rotation attack. You can find the code of these test cases here(click me).
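To give a feeling for why average hash is so cheap, here is a minimal sketch of the idea: shrink the image to 8x8 gray pixels, take the mean, and set one bit for every pixel brighter than the mean. The sketch assumes the 8x8 gray image is already given, and average_hash_bits is a hypothetical name, not the api of img_hash.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Average hash of an image that has already been resized to 8x8
// and converted to gray scale (both steps are assumed to be done).
std::uint64_t average_hash_bits(std::array<std::uint8_t, 64> const &gray8x8)
{
    unsigned const sum = std::accumulate(gray8x8.begin(), gray8x8.end(), 0u);
    double const mean = sum / 64.0;

    std::uint64_t bits = 0;
    for(std::size_t i = 0; i != gray8x8.size(); ++i){
        // bit i is set when pixel i is brighter than the mean
        if(gray8x8[i] > mean){
            bits |= (std::uint64_t{1} << i);
        }
    }
    return bits;
}
```

Only 64 pixels survive the resize, so the hash says very little about where things are after a rotation, which fits the weak rotation results above.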

Comparison with the PHash library

As this post shows, the img_hash module possesses five advantages over the PHash library(click me).

1 : The processing speed of this module outperforms PHash.

2 : This module adopts the same license as opencv(click me), which means you can use it however you like, free of charge.

3 : The code is much more modern and easier to use; img_hash frees you from memory management chores once and for all. A good, modern c++ library should not force its users to take care of resources by themselves.

4 : The api of img_hash is consistent and much easier to use than that of the PHash library. Do not believe it? Let us see some examples.

Case 1a : Compute Radial Hash by PHash library

Digest digest_0, digest_1;
digest_0.coeffs = 0;
digest_1.coeffs = 0;
ph_image_digest(img_0, 1.0, 1.0, digest_0);
ph_image_digest(img_1, 1.0, 1.0, digest_1);

double pcc = 0;
ph_crosscorr(digest_0, digest_1, pcc, 0.9);
//do something, remember to free your memory :(
free(digest_0.coeffs);
free(digest_1.coeffs);

Case 1b : Compare Radial Hash by img_hash

auto algo = RadialVarianceHash::create();
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself

Case 2a : Compute Marr Hash by PHash library

int N = 0;
uint8_t *hash_0 = ph_mh_imagehash(img_0, N);
uint8_t *hash_1 = ph_mh_imagehash(img_1, N);
double const value = ph_hammingdistance2(hash_0 , 72, hash_1, 72);   
//do something, remember to free your memory :(
free(hash_0);
free(hash_1);

Case 2b : Compare Marr Hash by img_hash

auto algo = MarrHildrethHash::create();
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself

Case 3a : Compute Block mean Hash by PHash library

BinHash *hash_0 = 0;
BinHash *hash_1 = 0;
ph_bmb_imagehash(img_0, 1, &hash_0);
ph_bmb_imagehash(img_1, 1, &hash_1);

double const value = ph_hammingdistance2(hash_0->hash,
                hash_0->bytelength,
                hash_1->hash,
                hash_1->bytelength); 
//do something, remember to free your memory :(
ph_bmb_free(hash_0);
ph_bmb_free(hash_1);

Case 3b : Compare Block mean Hash by img_hash

auto algo = BlockMeanHash::create(0);
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself

As you can see, img_hash is not only faster, it also gives you a cleaner, more concise way to write your code. You never need to remember different ways to compute your hashes and compare them anymore, because the api of img_hash is consistent.

5 : This module only depends on opencv_core and opencv_imgproc, which means you should be able to compile it with ease on every major platform without scratching your head.

Next move

Develop an application, Similar Vision, to show the capability of img_hash. The app will find similar images within an image set(of course, it will leverage the power of the img_hash module) and similar video clips within videos.

Sunday, 19 June 2016

Introduction to image hash module of opencv

If you use the de facto standard computer vision library, opencv, have you ever hoped that it provided ready-to-use image hash algorithms such as average hash, perceptual hash, block mean hash, radial variance hash and marr hildreth hash, like PHash does? PHash sounds like a robust solution and runs quite fast, but choosing PHash means you need to add more dependencies to your project and open your source code, and open source is not a viable option for most commercial products. Do you, like me, dislike adding more dependencies to your code? Do you want royalty free, robust and high performance image hash algorithms for your project? Let us admit it, nobody likes solving dependency issues; beyond that, many commercial projects need to remain closed source, so it would be much better if opencv provided an image hash module.

If opencv does not have one, why not just create one for it?

1 : Image hash algorithms are not too complicated.
2 : The PHash library already implements many image hash algorithms; we can port them to opencv and use PHash as the golden model.
3 : opencv is an open source computer vision library. If we ever find bugs, missing features or poor performance, we can do something to make it better.

The good news is that I have implemented all of the algorithms mentioned above, refined their performance(ex : block mean hash is able to process single channel images) and freed you from memory management chores. The bad news is that this pull request had not been merged yet when I wrote this post, so you need to clone/pull it and build it yourself. Fear not, this module only depends on the core and imgproc modules of opencv, so it should be fairly easy to build(opencv has been quite easy to build from the beginning :)).

The following examples show how to use img_hash. You will find it much easier to use than the PHash library, because the api is more consistent and you do not need to manage memory yourself.

How to use it

#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/img_hash.hpp>
#include <opencv2/imgproc.hpp>

#include <iostream>

void computeHash(cv::Ptr<cv::img_hash::ImgHashBase> algo)
{
    cv::Mat const input = cv::imread("lena.png");
    cv::Mat const target = cv::imread("lena_blur.png");
    
    cv::Mat inHash; //hash of input image
    cv::Mat targetHash; //hash of target image

    //compute hash of input and target
    algo->compute(input, inHash);
    algo->compute(target, targetHash);
    //Compare the similarity of inHash and targetHash
    //recommended thresholds are written in the header files
    //of every classes
    double const mismatch = algo->compare(inHash, targetHash);
    std::cout<<mismatch<<std::endl;
}

int main()
{
    //the classes used below live in cv::img_hash
    using namespace cv;

    //disabling opencl acceleration may speed up img_hash,
    //however, in this post I do not disable the opencl optimization
    //cv::ocl::setUseOpenCL(false);

    computeHash(img_hash::AverageHash::create());
    computeHash(img_hash::PHash::create());
    computeHash(img_hash::MarrHildrethHash::create());
    computeHash(img_hash::RadialVarianceHash::create());
    //BlockMeanHash supports mode 0 and mode 1, they correspond to
    //mode 1 and mode 2 of the PHash library
    computeHash(img_hash::BlockMeanHash::create(0));
    computeHash(img_hash::BlockMeanHash::create(1));
    computeHash(img_hash::ColorMomentHash::create());
}

With these functions, we can measure the performance of our algorithms under different "attacks" such as resize, contrast, noise and rotation. Before we start the tests, let me define the thresholds of "pass" and "fail". One thing to remember: to keep things simple, I only use lena to show the results; different data sets may need different thresholds/algorithms to get the best results.

Threshold

After we determine our threshold, we could use our beloved lena to do the test :).

lena.png

Resize attack

Resize attack

Every algorithm(BMH means block mean hash) works very well across different sizes and aspect ratios except radial variance hash, which works across different sizes only as long as we keep the aspect ratio.

Contrast Attack

Contrast Attack

Every algorithm works quite well under different contrast, although radial variance hash, BMH zero and BMH one do not work well under very low contrast.

Gaussian Noise Attack

Gaussian noise attack

Fortunately, every algorithm survives the Gaussian noise attack.

Salt And Pepper Noise Attack

Salt and pepper noise attack

As we can see, only radial hash and BMH perform well under the salt and pepper attack.

Rotation Attack

Rotation attack

Apparently, none of the algorithms survives the rotation attack. But does this really matter? I guess not(do you always need to search google for an image after rotating it?). If you really need to deal with rotation attacks, I suggest you give BOVW(bag of visual words) a try. I have used it to build a robust CBIR system before; the drawbacks of a robust BOVW based CBIR are long computation time, high memory consumption and being much harder to scale to large data sets(you will need to build a distributed system in that case).

We have gone through all of the tests; now let us measure the hash computation time and comparison time of the different algorithms(my laptop is a Y410P, the os is windows 10 64bits, the compiler is vc2015 64bits with update 2 installed).

You can find all the details of the different attacks here(click me).

Computation Performance Test--img_hash vs PHash library

I use the different algorithms to compute the hashes of 100 images from ukbench(ukbench03000.jpg~ukbench03099.jpg). The source code of the opencv comparison is located here(check the functions measure_computation_time and measure_comparison_time; I was using img_hash_1_0 when I wrote this post), and the source code of the PHash performance test(version 0.94, since I am on windows) is located here.

Computation performance test

Comparison performance test

In most cases img_hash is faster than PHash, but BMH zero and BMH one are almost 30%~40% slower than the PHash versions. The bottleneck is cv::resize(over 95% of the time is spent in it); to speed things up, we need a faster resize function.

Find similar image from ukbench

The results look good, but can it find similar images? Of course, let me show you how to measure the hash values of our target against ukbench(for simplicity, I only pick 100 images from ukbench).

target

void find_target(cv::Ptr<cv::img_hash::ImgHashBase> algo, bool smaller = true)
{
    using namespace cv::img_hash;

    cv::Mat input = cv::imread("ukbench/ukbench03037.jpg");
    //calling measure_comparison_time is not a good way
    //to reuse the code, please bear with me
    std::vector<cv::Mat> targets = measure_comparison_time(algo, "");

    double idealValue;
    if(smaller)
    {
        idealValue = std::numeric_limits<double>::max();
    }
    else
    {
        idealValue = std::numeric_limits<double>::lowest();
    }
    size_t targetIndex = 0;
    cv::Mat inputHash;
    algo->compute(input, inputHash);
    for(size_t i = 0; i != targets.size(); ++i)
    {
        double const value = algo->compare(inputHash, targets[i]);
        if(smaller)
        {
            if(value < idealValue)
            {
                idealValue = value;
                targetIndex = i;
            }
        }
        else
        {
            if(value > idealValue)
            {
                idealValue = value;
                targetIndex = i;
            }
        }
    }
    std::cout<<"mismatch value : "<<idealValue<<std::endl;
    cv::Mat result = cv::imread("ukbench/ukbench0" +
                                std::to_string(targetIndex + 3000) +
                                ".jpg");
    cv::imshow("input", input);
    cv::imshow("found img " + std::to_string(targetIndex + 3000), result);
    cv::waitKey();
    cv::destroyAllWindows();
}

void find_target()
{
    using namespace cv::img_hash;

    find_target(AverageHash::create());
    find_target(PHash::create());
    find_target(MarrHildrethHash::create());
    find_target(RadialVarianceHash::create(), false);
    find_target(BlockMeanHash::create(0));
    find_target(BlockMeanHash::create(1));
}

You will find that every algorithm gives you back the same image you are looking for.

Conclusion

Average hash and PHash are the fastest algorithms, but if you want a more robust one, pick BMH zero. BMH zero and BMH one give similar results, but BMH one is slower because it needs more computation. Hash comparison of radial hash is much slower than the others because it needs to find the peak cross-correlation value among 40 combinations. If you want to know how to speed things up and learn more about a rotation invariant image hash algorithm, give this link(click me) a try.

You can find the test cases here. If you find this post helpful, please give my repositories(blogCodes2 and my img_hash of opencv_contrib) a star :). If you want to join the development, please open a pull request, thanks.
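The difference in comparison cost can be sketched as follows. This is my own simplified illustration, not the code of img_hash: hamming_distance stands for the single pass used by bit-string hashes, and peak_cross_correlation tries every circular shift of the second hash and keeps the best correlation, so a hash of 40 values needs 40 passes. Both function names and the use of plain vectors are assumptions.

```cpp
#include <algorithm>
#include <bitset>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit-string hashes: one pass, count the differing bits.
// Assumption: both hashes have the same number of bytes.
std::size_t hamming_distance(std::vector<std::uint8_t> const &a,
                             std::vector<std::uint8_t> const &b)
{
    std::size_t distance = 0;
    for(std::size_t i = 0; i != a.size(); ++i){
        distance += std::bitset<8>(static_cast<unsigned>(a[i] ^ b[i])).count();
    }
    return distance;
}

// Radial-style hashes: correlation for every circular shift of b,
// keep the peak. Assumption: same, non-zero length for a and b.
double peak_cross_correlation(std::vector<double> const &a,
                              std::vector<double> const &b)
{
    std::size_t const n = a.size();
    double mean_a = 0.0, mean_b = 0.0;
    for(std::size_t i = 0; i != n; ++i){
        mean_a += a[i];
        mean_b += b[i];
    }
    mean_a /= static_cast<double>(n);
    mean_b /= static_cast<double>(n);

    double peak = -1.0;
    for(std::size_t shift = 0; shift != n; ++shift){
        double num = 0.0, den_a = 0.0, den_b = 0.0;
        for(std::size_t i = 0; i != n; ++i){
            double const da = a[i] - mean_a;
            double const db = b[(i + shift) % n] - mean_b;
            num += da * db;
            den_a += da * da;
            den_b += db * db;
        }
        double const den = std::sqrt(den_a * den_b);
        if(den > 0.0){
            peak = std::max(peak, num / den);
        }
    }
    return peak;
}
```

Note that the two measures point in opposite directions: a smaller Hamming distance means more similar, while a larger peak correlation means more similar, which is why find_target takes the smaller flag.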

Friday, 17 June 2016

Remove annoying trailing white space by c++

If you have ever tried to commit something to opencv(I was porting/implementing various image hash algorithms to opencv_contrib when I wrote this post, you can find my branch here), you have likely run into some extremely annoying messages like

modules/tracking/include/opencv2/tracking/tracker.hpp:857: trailing whitespace.
+  
modules/tracking/include/opencv2/tracking/tracker.hpp:880: trailing whitespace.
+ 
modules/tracking/include/opencv2/tracking/tracker.hpp:890: trailing whitespace.
+ 
modules/tracking/include/opencv2/tracking/tracker.hpp:1433: trailing whitespace.
+        Params(); 
modules/tracking/include/opencv2/tracking/tracker.hpp:1434: trailing whitespace.
+        
modules/tracking/include/opencv2/tracking/tracker.hpp:1444: trailing whitespace.

and so on. They pop up in your files from time to time, cost you extra time to fix and pollute your commit history; on top of that, trailing white spaces are hard to spot with human eyes.

Apparently, eliminating trailing white space is not a job suited for humans; we had better leave such tedious tasks to our friend, the computer.

To teach our friend what I want done, I wrote a small program to help us(source code located here); you should be able to compile and run it if you are familiar with c++ and boost.

Enough talk, let me show you an example.

Example 00

As you can see, Example 00 contains a lot of tabs and trailing white space; not only that, there is a tab we should not remove(the tab of std::string("\t")). This is where my small tool, kill_trailing_white_space, comes in. All you need to do is specify the file whose tabs and trailing white space you want to remove, or a folder containing the files(folders are scanned recursively). Example

"kill_trailing_white_space --input_file main.cpp"

"kill_trailing_white_space --input_folder img_hash"

After the process, we have a clean file, as shown in Example 01.

Example 01

You can see the help menu by entering --help. For now this small tool only supports files with the extensions ".hpp" and ".cpp". Feel free to modify the code to suit your needs.
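If you only want the core idea without boost, trimming the end of a line takes a few lines of standard c++. This is a minimal sketch and not the code of kill_trailing_white_space: the function name strip_trailing_whitespace is hypothetical, and it handles one line at a time and leaves tabs in the middle of the line alone.

```cpp
#include <string>

// Remove spaces and tabs at the end of a single line.
// Leading and inner characters are left untouched.
std::string strip_trailing_whitespace(std::string const &line)
{
    auto const last = line.find_last_not_of(" \t");
    if(last == std::string::npos){
        // the line is empty or contains white space only
        return std::string();
    }
    return line.substr(0, last + 1);
}
```

To clean a whole file you would read it line by line with std::getline, apply the function to each line and write the results back.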

Friday, 29 April 2016

Content based image retrieval(CBIR) 02--Flow of CBIR, part B

This is the second part of the flow of CBIR. In this post I record step 6 and step 7; although there are only two steps, the last one is a little bit complicated.

Step 6 : Build inverted index

void cbir_bovw::build_code_book(size_t code_size)
{
   hist_type hist;
   hist.load(setting_["hist"].GetString() +
             std::string("_") +
             std::to_string(code_size));

   invert_index invert;
   ocv::cbir::build_inverted_index(hist, invert);
   invert.save(setting_["inverted_index"].GetString() +
              std::string("_") +
              std::to_string(code_size));
}

This part is quite straightforward; invert_index is simply an encapsulation of std::map and std::vector. Applying an inverted index may improve the accuracy of the CBIR system, but this needs to be measured.
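To show what "an encapsulation of std::map and std::vector" can look like, here is a minimal sketch of an inverted index: for every visual word it stores the indices of the images whose histogram contains that word. The type alias, the function name build_index_sketch and the histogram layout are my assumptions for illustration, the real invert_index class may be organized differently.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical minimal inverted index:
// visual word id -> indices of the images containing that word
using inverted_index = std::map<std::size_t, std::vector<std::size_t>>;

// histograms[i][w] is the count of visual word w in image i
inverted_index build_index_sketch(std::vector<std::vector<int>> const &histograms)
{
    inverted_index index;
    for(std::size_t i = 0; i != histograms.size(); ++i){
        for(std::size_t w = 0; w != histograms[i].size(); ++w){
            if(histograms[i][w] > 0){
                index[w].push_back(i);
            }
        }
    }
    return index;
}
```

When searching, you only need to look at the images listed under the words that appear in the query histogram instead of comparing against every image.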

Step 7 : Search image

After step 6, I have prepared most of the tools of this CBIR system, so it is time to start searching. I have four ways to search for an image; they are shown in pic00.

pic00

As usual, a picture is worth a thousand words. The first solution(pic01) is the easiest one, without IDF(inverse document frequency) or spatial information.

pic01

//the api of this function is poor, but I think it is
//acceptable in this small example. In real projects,
//we should not let this kind of code exist: bad code
//attracts more bad code, and in the end your project
//becomes extremely hard to maintain
double measure_impl(Searcher &searcher,
                    ocv::cbir::f2d_detector &f2d,
                    BOVW const &bovw,
                    hist_type const &hist,
                    arma::Mat<cbir_bovw::feature_type> const &code_book,                    
                    rapidjson::Document const &doc,
                    rapidjson::Document const &setting)
{
    //total_score saves the number of "hit" images of ukbench
    double total_score = 0;
    auto const folder =
            std::string(setting["img_folder"].GetString());
    
    auto const files = ocv::file::get_directory_files(folder);
    for(size_t i = 0; i != files.size(); ++i){
        cv::Mat gray =
                cv::imread(folder + "/" + files[i],
                           cv::IMREAD_GRAYSCALE);
        //f2d.get_descriptor is the bottleneck
        //of the program, more than 85% of the
        //computation time is spent in it
        auto describe =
                f2d.get_descriptor(gray);
        //transfer cv::Mat to arma::Mat without copy
        arma::Mat<cbir_bovw::feature_type> const
                arma_features(describe.second.ptr<cbir_bovw::feature_type>(0),
                              describe.second.cols,
                              describe.second.rows,
                              false);
        //build the histogram of the image we want to search
        auto const target_hist =
                bovw.describe(arma_features,
                              code_book);   
        //search the image     
        auto const result =
                searcher.search(target_hist, hist);        

        //find relevant file of the image "files[i]"
        auto const &value = doc[files[i].c_str()];
        std::set<std::string> relevant;
        for(rapidjson::SizeType j = 0;
            j != value.Size(); ++j){
            relevant.insert(value[j].GetString());
        }
        //increment total_score if the first 4 images
        //of the search result belongs to relevant image
        for(size_t j = 0; j != relevant.size(); ++j){
            auto it = relevant.find(files[result[j]]);
            if(it != std::end(relevant)){
                ++total_score;         
            }
        }        
    }

    return total_score;
}

That is it. I wrote down how to apply IDF and spatial info on github.
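The scoring part of measure_impl can be isolated into a tiny function. The sketch below counts how many of the first top_k search results belong to the relevant set of the query; count_hits and its parameters are hypothetical names, and the assumption that the reported score is this count averaged over all query images is mine.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Count how many of the first top_k ranked file names are relevant.
// ranked: search results, best match first
// relevant: file names that show the same object as the query
std::size_t count_hits(std::vector<std::string> const &ranked,
                       std::set<std::string> const &relevant,
                       std::size_t top_k = 4)
{
    std::size_t hits = 0;
    std::size_t const limit = std::min(top_k, ranked.size());
    for(std::size_t i = 0; i != limit; ++i){
        if(relevant.find(ranked[i]) != relevant.end()){
            ++hits;
        }
    }
    return hits;
}
```

Under that assumption a perfect system would score 4.0, since every ukbench object comes with four pictures.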

Results

Without inverse document frequency(IDF) and spatial verification(pic01) : 3.044

With inverse document frequency : 3.035

With spatial verification : 3.082

With inverse document frequency and spatial verification : 3.13

In conclusion, I get the best results when I apply both IDF and spatial verification. The results could be improved if I invested more time in tuning the parameters, such as the size of the code book or the parameters of kaze, or in trying another feature extractor.
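For readers who have not met IDF before, here is a minimal sketch of one common form of it, idf(w) = log(N / df(w)), where N is the number of images and df(w) is the number of images that contain visual word w. The function name compute_idf and the histogram layout are assumptions, and the code on github may use a different variant of the formula.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// histograms[i][w] is the count of visual word w in image i.
// Assumption: every histogram has the same number of bins.
std::vector<double> compute_idf(std::vector<std::vector<int>> const &histograms)
{
    if(histograms.empty()){
        return {};
    }
    std::size_t const words = histograms.front().size();

    // df[w] = number of images that contain word w at least once
    std::vector<std::size_t> df(words, 0);
    for(auto const &hist : histograms){
        for(std::size_t w = 0; w != words; ++w){
            if(hist[w] > 0){
                ++df[w];
            }
        }
    }

    // words that never occur keep the weight 0
    std::vector<double> idf(words, 0.0);
    double const n = static_cast<double>(histograms.size());
    for(std::size_t w = 0; w != words; ++w){
        if(df[w] != 0){
            idf[w] = std::log(n / static_cast<double>(df[w]));
        }
    }
    return idf;
}
```

Words that appear in every image get a weight of zero, so they no longer influence the comparison of two histograms.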

Problems of this solution

1 : It is slow; it takes about 300ms~500ms to extract kaze keypoints and features from a single channel 640x480 image.

2 : It consumes a lot of memory; kaze uses about 150MB to extract keypoints and features.

If your application only runs on a local machine, this is not a problem, but if you want to develop a web app, it is a serious one. We need a much faster yet still quite accurate CBIR system if we want to deploy it in a high traffic web app, just like TinEye and Google did.

Wednesday, 13 April 2016

Content based image retrieval(CBIR) 01--Flow of CBIR, part A

Before I dive into the code, let me summarize the flow of CBIR; it is quite straightforward(pic00).

pic00

pic00 gives the general idea of CBIR. In this post I record how to implement steps 1~5 with the code located at github. There are too many variables to pass into this example, so I prefer to save them in a json file, setting.json.

Step 1 ~ 4

cv::Mat cbir_bovw::
read_img(const std::string &name, bool to_gray) const
{
    if(to_gray){
        return cv::imread(name, cv::IMREAD_GRAYSCALE);
    }else{
        return cv::imread(name);
    }
}

void cbir_bovw::
add_data()
{
    using namespace ocv;

    //use kaze as feature detector and descriptor
    cv::Ptr<cv::KAZE> detector = cv::KAZE::create();
    cv::Ptr<cv::KAZE> descriptor = detector;
    cbir::f2d_detector f2d(detector, descriptor);

    //read the folder path from setting.json
    auto const folder =
            std::string(setting_["img_folder"].GetString());
    //iterate through the image inside the folder,
    //extract features and keypoints
    for(auto const &name : file::get_directory_files(folder)){
        auto const img = read_img(folder + "/" + name);
        if(!img.empty()){
            //find the keypoints and features by detector
            //and descriptor
            auto const result = f2d.get_descriptor(img);
            //first is keypoints, second is features
            fi_.add_features(name, result.first,
                             result.second);
        }else{
            throw std::runtime_error("image is empty");
        }
    }
}

In this example, I store the features, keypoints and other info in the hdf5 format, because these data can be very big and the ram of a pc may not be able to hold them all at once.

Step 5 : Build code book

After saving the features and keypoints into hdf5, it is time to build the code book. What is a code book? In this case, it is just a bunch of features grouped by a clustering algorithm. I pick kmeans for this task because it is fast, robust and supported by armadillo and opencv.

void cbir_bovw::build_code_book(size_t code_size)
{
    ocv::cbir::code_book_builder<feature_type>
            cb(fi_, setting_["features_ratio"].GetDouble(),
            cv_type_map<feature_type>::type);
    cb.create_code_book(arma::uword(code_size),
                        arma::uword(15), true);
    cb.get_code_book().save(setting_["code_book"].GetString() +
            std::string("_") +
            std::to_string(code_size),
            arma::arma_ascii);
}

After generating the code book, I tried to view what the codes of the code book represent. Visualizing the code book is not necessary, but it can be helpful for debugging. The following(pic01, pic02, pic03) are part of the visualization results of the code book.

pic01

pic02

pic03
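Once the code book exists, an image is described by counting which code word each of its features is closest to. The sketch below shows that quantization step with plain std::vector; the names feature, nearest_code and build_histogram are hypothetical, and the real code works on arma::Mat instead.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using feature = std::vector<double>;

// Squared Euclidean distance, assuming equal dimensions.
double squared_distance(feature const &a, feature const &b)
{
    double sum = 0.0;
    for(std::size_t i = 0; i != a.size(); ++i){
        double const diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Index of the code word closest to f (code_book must not be empty).
std::size_t nearest_code(feature const &f, std::vector<feature> const &code_book)
{
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for(std::size_t i = 0; i != code_book.size(); ++i){
        double const dist = squared_distance(f, code_book[i]);
        if(dist < best_dist){
            best_dist = dist;
            best = i;
        }
    }
    return best;
}

// One bin per code word, each feature votes for its nearest code word.
std::vector<int> build_histogram(std::vector<feature> const &features,
                                 std::vector<feature> const &code_book)
{
    std::vector<int> hist(code_book.size(), 0);
    for(auto const &f : features){
        ++hist[nearest_code(f, code_book)];
    }
    return hist;
}
```

The resulting histogram has a fixed length no matter how many keypoints the image has, which is what makes images comparable later on.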

The code of this post is located at github.

Monday, 11 April 2016

Content based image retrieval(CBIR) 00--Use CBIR to find similar images of ukbench

Content based image retrieval(CBIR) is also called query by image content(QBIC); google search by image and TinEye are probably the most famous examples in our daily life. In short, CBIR searches for images based on the content of the image, not the name, date, meta data or other info.

I studied how to implement CBIR from PyImageSearch Gurus. The algorithms are almost the same, but my code is written in c++ and built on top of opencv, hdf5, armadillo, boost and rapidjson. I picked c++ rather than python(PyImageSearch uses python) for this project because

1 : c++ is well suited for building stand alone packages
2 : I like c++

This series of posts will not discuss the implementation details of the code(located at github) but will summarize the key points I learned from the CBIR lessons of PyImageSearch Gurus.

The keys of this CBIR system

1 : Feature detector--kaze
2 : Feature descriptor--kaze
3 : Bag of visual words
4 : Data structure(hdf5, inverted index)
5 : Create Code book(I prefer kmeans)
6 : Quantization(build a histogram)
7 : Tf-idf(Term frequency and inverse document frequency)
8 : Spatial verification
9 : Evaluation

ukbench contains 6376 images, so it would be a tedious job to find relevant images by hand; this is why we need CBIR to save us from this kind of labor. Before I summarize the keys of this CBIR system, I will post some examples, since a picture is worth a thousand words.

Case 1 : Find similar image of the camera(pic00) within ukbench

pic00

Search result of pic00

Case 2 : Find similar image of the toy(pic01) within ukbench

pic01

Search result of pic01

The most similar images are shown in the first row. By now, it should be clear what CBIR is intended to solve. We can use it to deal with a lot of problems, like object recognition(however, cnn was the state of the art when I wrote this post), searching web sites by image(like google and TinEye), removing duplicate images and so on.

In the next post, I will record part of the flow of this CBIR system and write down how to use the code located on github(without explaining the implementation details).

Monday, 1 February 2016

Deep learning 06-Classify car and non-car by convolution neural network

Convolution neural network(cnn) is a powerful tool for object recognition tasks in the computer vision field. You can find good explanations of this powerful technique on cs231n; it is the best free tutorial I could find with google.

Most of the famous cnn libraries(theano, caffe, torch etc) are hard to install on the windows platform; the exceptions I found are mxnet and tiny-cnn. mxnet supports cpu/gpu modes and distributed training, so it is a nice tool for large-scale deep learning(though my laptop is not suited for large-scale training). The drawback(for me) is that mxnet does not provide a good c++ api yet; instead it provides rich python bindings. python is a decent tool for creating prototypes and a nice environment for research, but it is not a good option for creating a stand alone binary which can run on a machine without asking the users to install a bunch of tools(anaconda, virtual machine etc). This is why I chose tiny-cnn to train the binary classifier.

Object classification is a difficult task; there are many variations you need to deal with, like intra-class variation, different viewpoints, occlusion, background clutter, illumination variation and deformation.

Intra-class variation

Different view point

Background clutter

Variant illumination

Deformation

It is hard to solve all of these challenges at once(however, CS231n claims that cnn can solve all of the problems mentioned above), so instead we make some assumptions about the objects we want to classify(to create a successful image classifier, it is very important to make assumptions before you write a single line of code). The following are my assumptions(preconditions) for this binary classifier.

Assumptions on the classifier

1 : This classifier is only able to classify car and non-car
2 : This classifier assumes good lighting conditions
3 : This classifier can deal with different viewpoints of cars
4 : This classifier does not rely on color information
5 : This classifier can deal with intra-class variation

With the assumptions made, we can start coding. The car data set comes from the stanford AI lab, and the non-car examples come from caltech101. I randomly pick 6000 cars and 6000 non-cars from these data sets and do some augmentation to increase the size of the training set. I use the trained classifier to classify 1000 car and 1000 non-car images(different from the training data), and the best accuracy is 1956/2000(97.8%). Not bad, but there is still room to improve.

The code is located at github. I do not intend to explain the details of the code(I can understand what I wrote even after several years), but will summarize the key points I learned from this tiny classifier.

Tips of training cnn by tiny-cnn

1 : Shuffle your training set, otherwise the accuracy stays at 50%.
2 : Initial weights have a big impact on the training results. You may get bad results several times because the initial weights are bad, especially when you are using adagrad as the optimizer; remember to run the training process again if the accuracy is ridiculously low.
3 : Augment your data. cnn is a resource hungry(cpu, gpu, ram, samples) machine learning algorithm; try out different augmentation schemes(rotation, horizontal/vertical flip, illumination variation, shifting etc) and find out which ones help you get better results.
4 : Try different optimization algorithms and error functions; for this data set, mse and adagrad work best for me.
5 : Try different batch sizes and alpha values(learning rate).
6 : Log your results.
7 : Start from a shallow network; a deeper network does not mean better results, especially for a small data set.
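Tip 1 is easy to get wrong, because the images and their labels must be shuffled with the same permutation. Here is a minimal sketch in standard c++; shuffle_together is a hypothetical helper of mine and not part of tiny-cnn, and it assumes samples and labels have the same length.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Shuffle samples and labels with the same random permutation
// so that every sample keeps its own label.
template<typename Sample, typename Label>
void shuffle_together(std::vector<Sample> &samples,
                      std::vector<Label> &labels,
                      unsigned seed)
{
    // order = 0, 1, 2, ..., then shuffled
    std::vector<std::size_t> order(samples.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    std::vector<Sample> shuffled_samples;
    std::vector<Label> shuffled_labels;
    shuffled_samples.reserve(samples.size());
    shuffled_labels.reserve(labels.size());
    for(std::size_t const idx : order){
        shuffled_samples.push_back(std::move(samples[idx]));
        shuffled_labels.push_back(std::move(labels[idx]));
    }
    samples.swap(shuffled_samples);
    labels.swap(shuffled_labels);
}
```

If the cars are loaded first and the non-cars second, every mini batch without shuffling contains one class only, which is probably why the network learns nothing useful in that case.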