Saturday 7 November 2015

Analyze tic-tac-toe by computer vision tool--contours

  Recently I am taking a course from PyImageSearch, this tutorial is based on it, I want to write down what I have learned from it and re-implement the example by c++, this post omit a lot of details from the original post, if you want to know more, please take the courses.

  Contours is a convenient and powerful tools in computer vision, with it, we can achieve many interesting tasks, like analyze the famous tic-tac-toe(pic00) game. Original tutorial of PyImageSearch do not show us how to detect the X lie in the center, in this example I try to do that with the help of cv::approxPolyDP.


  How could contour help us analyze this picture? Well, we can begin with the properties of contours.

1 : contour area, which equal to the pixel number of the contour
2 : convex hull, smallest possible pixel enclosing contour.More mathematically, it is a minimum set which contain the set X(in our case, it is contour) in euclidean space. Convex hull can be found by greedy algorithm, but that is another topic, let us back to tic-tac-toe
3 : solidity == (contour area) / (convex hull area), this value will <= 1 since the area of contour area will never greater than convex hull area.

  Now we have all the tools, let us party.

Step 1 : Parse command line

std::string parse_command_line(int argc, char **argv)
    using namespace boost::program_options;

        options_description desc{"Options"};
                //set up --help or -h to show help information
                ("help,h", "Help screen")
                //set up --image or -i as a required argument 
                //to store the name of image
                ("image,i", value()->required(), "Image to process");

        //parse the command line and store it in containers
        variables_map vm;
        store(parse_command_line(argc, argv, desc), vm);

        if (vm.count("help")){
            return {};
            return vm["image"].as<std::string>();

        //the magic to make the other required arguments
        //become optional if the users input "help"
    catch (const error &ex)
        std::cerr << ex.what() << '\n';

    return {};

  This part is fairly easy, just give boost program options a try, you will find out that parsing command line options by c++ is a piece of cake. To run this program, open your command prompt and enter command like "tic_tac_toe.exe --image ../tic_tac_toe/tic_tac_toe_small.jpg". Make sure your pc know where to find the dll/so.

Step 2 : Preprocessing

cv::Mat preprocess(cv::Mat &color_img)
    //resize the image to width 480 and keep the aspect ratio
    ocv::resize_aspect_ratio(color_img, color_img, {480, 0});

    //find coutours need a binary image, so we must 
    //transfer the origin img to binary image.
    cv::Mat gray_img;
    cv::cvtColor(color_img, gray_img, CV_BGR2GRAY);

    //this would not copy the value of gray_img but create a new header
    //I declare an alias to make code easier to read
    cv::Mat binary_img = gray_img;
    cv::threshold(gray_img, binary_img, 0, 255, 
    cv::Mat const Kernel = 
           cv::getStructuringElement( cv::MORPH_RECT, {5,5});
    cv::morphologyEx(binary_img, binary_img, cv::MORPH_CLOSE, Kernel);

    return binary_img;

Pretty standard preprocess, resize the image(codes can find at here), binarize it and use morphology to remove some small holes.

pic01--Binary image of tic-tac-toe

Step 3 : Find contour

std::vector contour_vec;
//findContours will change the input image, provide a copy if you
//want to resue original image. In pyimage serach guru,
//it use CV_RETR_EXTERNAL, but this will omit inner
//contour, to detect center X we need to retrieve deeper contour
                 contour_vec, CV_RETR_CCOMP,

  Find the contour of , to get the center contour of X, we have to declare it as CV_RETR_CCOMP.

Step 4 : Show the properties of contours

Contour get_approx_poly(Contour const &contour)
    Contour poly_contour;
    double const Epsillon = cv::arcLength(contour, true) * 0.02;
    bool const Close = true;
    cv::approxPolyDP(contour, poly_contour, Epsillon, Close);

    return poly_contour;

void print_contour_properties(Contour const &contour)
    double const ContourArea = cv::contourArea(contour);

    Contour convex_hull;
    cv::convexHull(contour, convex_hull);
    double const ConvexHullArea = cv::contourArea(convex_hull);

    auto const BoundingRect = cv::boundingRect(contour);

    Contour const Poly = get_approx_poly(contour);    

    std::cout<<"Contour area : "<<ContourArea<<std::endl;
    std::cout<<"Aspect ratio : "<<
               (BoundingRect.width / static_cast(BoundingRect.height))
    std::cout<<"Extend : "<<ContourArea/BoundingRect.area()<<std::endl;
    std::cout<<"Solidity : "<<ContourArea/ConvexHullArea<<std::endl;
    std::cout<<"Poly size : "<<Poly .size()<<", is convex "

void show_contours_properties(cv::Mat const &input,
                              std::vector<Contour> const &contour_vec)
    cv::Mat input_cpy;
    for(size_t i = 0; i != contour_vec.size(); ++i){
        cv::Scalar const Color{255};
        int const ThickNess = 3;
        cv::drawContours(input_cpy, contour_vec,
                         static_cast<int>(i), Color, ThickNess);

        auto const WindowName = "contour " + std::to_string(i);
        cv::imshow(WindowName, input_cpy);
        //without this line, previous contour image will not be closed
        //before the program end

  This step will analyze the properties of contours, print them out on the command prompt with the detected contour image(pic02).

pic02--Properties of contour

Step 5 : Recognize the type of contour

ContourType recognize_countour_type(Contour const &contour)
    double const ContourArea = cv::contourArea(contour);

    Contour const Poly = get_approx_poly(contour);    

    ContourType type = ContourType::unknown;
    if((Poly .size() > 7 && Poly .size() < 10) && ContourArea > 1000){
        type = ContourType::o_type;
    }else if(Poly .size() >= 10 && ContourArea < 10000){
        type = ContourType::x_type;

    return type;

  According to the properties shown by print_contour_properties, we could recognize the X and O by the size of approximate polygon and contour area.

Step 6 : Draw X and O on the image

void write_x_and_o(cv::Mat const &input,
                       std::vector<Contour> const &contour_vec)
    cv::Mat input_cpy = input.clone();
    std::string const TypeName[] = {"X", "O"};

    for(size_t i = 0; i != contour_vec.size(); ++i){
        ContourType type = recognize_countour_type(contour_vec[i]);

        if(type != ContourType::unknown){
            cv::Scalar const Color{255};
            int const ThickNess = 3;
            cv::drawContours(input_cpy, contour_vec, static_cast(i),
                             Color, ThickNess);
            auto point = cv::boundingRect(contour_vec[i]).tl();
            point.y -= 10;
            double const Scale = 1;
            cv::putText(input_cpy, TypeName[static_cast(type)],
                        point, cv::FONT_HERSHEY_COMPLEX, Scale, Color,
    cv::imshow("result", input_cpy);

  The last step is fairy easy, draw out X and O on the original image and verify the result(pic03).


  The source codes could be found at github

As PyimageSearch mentioned, there are another solutions, like machine learning, if I haven't read the tutorial, I may jump to the route of picking up machine learning from my tool box(ex : give dlib a try). Same as programming, pick easier solution first, keep things simple and stupid unless you cannot avoid more complicated solutions. This is one of the reason why all of my posts avoid old-style c or c with classes, because they are too verbose, easier to commit errors, harder to debug, maintain, low level codes do not mean the compiler will generate faster and smaller binary(In contrast, they could be slower and buggier) .


Wednesday 4 November 2015

Deep learning 05--Finetune deep network

  Rather than build a machine learning libraries by myself, prefer to improve an already exist, well written machine learning libraries looks like a more reasonable solution for me. Recently I begin to contribute some codes to mlpack, a fast, modular and scalable c++ machine learning library.

  I implement a fine tune algorithms of deep network on top of mlpack and try to open a pull request, but I guess it will take a long time before it could be merged, so I clone the whole project and do my customization on it, the implementation details of the fne tune class is place at github.

   I learn a lot from contribute to open source community, the experiences like code reviews are precious and hardly happen in my daily jobs. In Asia, many software companies do not care about the quality of codes, they over underestimate how important the quality of codes could affect the maintenance fees, always rush for "fast coding" and generate extremely crappy codes(most of the programmers/leaders/managers cannot understand their own codes within a few months, this suck). Almost no one has a will to keep studying after graduate, that is why so many of them hardly improve even they have 20 years experiences.

  If you are same as me, cannot find a good company really care about codes quality yet, give open source a try, you should have a higher chance to gain real world experiences of "what is a good software project looks like" rather than just study the "story" or "theorem" from the books. mlpack is not perfect, but the quality surpass any legacy codes I maintained for.

Tuesday 15 September 2015

Deep learning 04--Compile mlpack-1.0.12 on windows 8.1 by visual studio 2015(64bits)

    As far as I know, most of the machine learning libraries of c++ are difficult to compile on windows, mlpack is one of them too(this lib implement sparse autoencoder and sparse coding, I would like to contribute something to this library in the future).If you want to do large scale machine learning, windows really is not a good platform for c++ since many libraries are hard to build or cannot get maximum performance on windows. However, your apps may need to run on windows since it is the most popular desktop OS.

     After a tedious journey of making mlpack work on windows, I want to write down the steps of how to compile mlpack, so I will never forget it.

   The steps to compile mlpack-1.0.12 are:
    1 : visual studio 2015 community--this version fixed a bug of vc, this bug will bring some trouble when compile armadillo(there are work around, like replace () by [] and use pointer to access data)
    • Visual studio 2015 would not install the c++ compiler by default, you need to select the custom install and select c++ by yourself
    • After you install vs2015, execute following command on command window,[
      copy "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\mspdbsrv.exe" 
      "C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE"
      ], this could fix a link issues( link.exe complains that MSPDB140.dll has the wrong version installed)
    2 : libxml2-2.9.2--deprecated, you can compile mlpack without it start from 2.x 
    • extract source codes(ex : c:/libxml2-2.9.2)
    • go to the folder c:/libxml2-2.9.2 and copy the to
    • go to the folder c:/libxml2-2.9.2/win32 
    • open command prompt and enter cscript configure.js compiler=msvc iconv=no zlib=no debug=no
    • enter command "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" x86_amd64
    • enter command nmake /f Makefile.msvc install
    3 : zlib-1.2.8--deprecated, you can compile mlpack without it start from 2.x 
    • extract source codes(ex : c:/zlib-1.2.8)
    • go to the folder c:/zlib-1.2.8
    • enter command "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" x86_amd64
    • enter command  nmake -f win32/Makefile.msc AS=ml64 LOC="-DASMV -DASMINF -I." OBJA="inffasx64.obj gvmat64.obj inffas8664.obj"
    4 : libiconv-1.14-deprecated, you can compile mlpack without it start from 2.x 

        Download the zip file from source forge, open visual studio 2015 and compile, you will need an account of source forge to download this file

    5 : cmake3.3.2 or newer(start from 3.3.2, cmake support CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS)

    6 : install mingw-w64(I use mingw-w64 5.1.0 in this post)

    7 : lapack3.5.0--I have heard that openBLAS or intel mkl are faster than the blas come with lapack, but in this post I will use the blas come with lapack3.5.0
    • extract source codes(ex : c:/lapack3.5.0)
    • go to the folder c:/lapack3.5.0
    • add commands in CMakeLists.txt
    • open the CMakeLists.txt by cmake-gui
    • setup the native compilers of c, c++ and fortran as "x86_64-w64-mingw32-gcc.exe", "x86_64-w64-mingw32-g++.exe", "x86_64-w64-mingw32-gfortran.exe"
    • Disable BUILD_STATIC_LIBS and enable BUILD_SHARED_LIBS under the lable BUILD
    • Set the value(under label Ungrouped Entries)  VCVARSAMD64 as "C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/vcvarsx86_amd64.bat"
    • Set the value(under label CMake) CMake_GNUtoMS_VCVARS as "C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/vcvarsx86_amd64.bat"
    • Click Configure until all white
    • Click generate
    • Open the vcproject files and build
    7.1 : build openBLAS
    • BLAS is good, but openBLAS is much more faster than BLAS, it is almost three times faster on my laptop(Y410P)
    • Download msys
    • Clone openBLAS(git clone git://
    • Setup the environment path of MSYS(ex : C:\msys)
    • Open command prompt
    • Go to the folder of openBLAS(ex : C:\OpenBLAS)
    • Type mingw32-make
    • You will find the .a and .dll under the folder C:\OpenBLAS
    8 : armadillo-5.600.2
    • extract source codes(ex : c:/armadillo-5.600.2)
    • go to the folder c:/aramadillo-5.600.2
    • open CMakeLists.txt by nodepad and add three lines set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
    • open the CMakeLists.txt by cmake-gui
    • Set the value(under label Ungrouped Entries)  BLAS_LIBRARY(ex : "C:/Users/yyyy/Qt/3rdLibs/lapack/lapack-3.5.0/bin/vc2015_x86_amd64/release/libopenblas.dll.a")
    • Set the value(under label Ungrouped Entries)  LAPACK_LIBRARY(ex : "C:/Users/yyyy/Qt/3rdLibs/lapack/lapack-3.5.0/bin/vc2015_x86_amd64/release/liblapack.lib")
    • Click Configure until all white
    • Click generate
    • Open the vcproject files and build 
    9 : boost_1_59_0-msvc-14.0-64
    • Just download and unzip, the community already build it for us
    10 : mlpack-1.0.12
    • extract source codes(ex : c:/mlpack-1.0.12)
    • go to the folder c:/mlpack-1.0.12 
    • Specify the path of the libraries, dll and setup some definition, the details can found at here(start from line 66~87)
    • Click Configure until all white
    • Click generate
    • Open the vcproject files and build 
    • If there are link error, specify the path of openblas, lapack, libxml2 by cmake-gui(under label Ungrouped Entries, I do not know why the set command can not work yet), configure and generate again
         Ok, after so much trouble, the mlpack finally work.I will use it to solve one of the exercise of UFLDL later on.

    Sunday 13 September 2015

    Deep learning 03--Self-Taught Learning and Unsupervised Feature Learning with Shark

        Shark is a fast, modular, feature-rich open-source C++ machine learning library. Is it? At least it is not fast at all without the support of Atlas. Without Atlas, I took almost 2.5 hours to train about 29000 samples of MNIST on windows 8 64bits using sparse autoencoder with 200 iterations. So why not I compile and link to Atlas? The bad news is, Atlas is difficult to build under windows and not well optimized for windows 64bits.My conclusion is, if you really want to use shark to do some serious training, change your OS.I will use ubuntu or other OS if I want to continue to use Shark, without Atlas the performance is unable to accept.The other lesson I learn from Shark is, if you want your libraries/apps portable, never ever develop your libraries/apps on top of those hard to compile libraries.

        Even the speed of Shark is quite slow without Atlas, it is still a modular, feature-rich machine learning library, I am quite impress about how good it split and combine different piece of concepts together. With the good design architecture of Shark, you can try out different results of algorithms with a few lines of codes, they are easy to read and elegant.

        I use Shark to solve one of the exercise of UFLDL , a pleasant experience except of the speed(without Atlas). I use 6 different Autoencoder to train the features than feed into random forest to classify the hand written digits from 5~9. The beauty of Shark is, I only need to do some changes to try out these 5(at first it is 8, but three of them are too damn slow or buggy) algorithms.

        First, you need to define the type of the autoencoder.  
    using Autoencoder1 =
    shark::Autoencoder<shark::LogisticNeuron, shark::LogisticNeuron>;
    using Autoencoder2 =
    shark::TiedAutoencoder<shark::LogisticNeuron, shark::LogisticNeuron>;
    using Autoencoder3 =
    using Autoencoder4 =

    2 :  the sparse autoencoder and autoencoder use different cost functions, so I need to change the cost functions from ErrorFunction to SparseAutoEncoderError.

    3 :  sparse autoencoder cannot get good results with IRpropPlusFull, I need to use LBFGS to replace it.

    4 : The last thing is, the initial bias value of sparse autoencoder should be zero.

        Combine 2,3,4,  I decided to put them into different functions.

    template<typename Optimizer, typename Error, typename Model>
    std::string optimize_params(std::string const &encoder_name,
                                size_t iterate,
                                Optimizer *optimizer,
                                Error *error,
                                Model *model)
        using namespace shark;
        Timer timer;
        std::ostringstream str;
        for (size_t i = 0; i != iterate; ++i) {
            str<<i<<" Error: "<<optimizer->solution().value <<"\n";
        str<<"Elapsed time: " <<timer.stop()<<"\n";
        str<<"Function evaluations: "<<error->evaluationCounter()<<"\n";
        exportFiltersToPGMGrid(encoder_name, model->encoderMatrix(), 28, 28);
        std::ofstream out(encoder_name);
        boost::archive::polymorphic_text_oarchive oa(out);
        return str.str();
    template<typename Model>
    std::string train_autoencoder(std::vector<shark::RealVector> const &unlabel_data,
                                  std::string const &encoder_name,
                                  Model *model)
        using namespace shark;
        model->setStructure(unlabel_data[0].size(), 200);   
        initRandomUniform(*model, -0.1*std::sqrt(1.0/unlabel_data[0].size()),
        SquaredLoss<RealVector> loss;
        UnlabeledData<RealVector> const Samples = createDataFromRange(unlabel_data);
        RegressionDataset data(Samples, Samples);
        ErrorFunction error(data, model, &loss);
        // Add weight regularization
        const double lambda = 0.01; // Weight decay paramater
        TwoNormRegularizer regularizer(error.numberOfVariables());
        error.setRegularizer(lambda, &regularizer);
        //output some info of model, like number of params, input size etc
        IRpropPlusFull optimizer;
        return optimize_params(encoder_name, 200, &optimizer, &error, model);
    template<typename Model>
    std::string train_sparse_autoencoder(std::vector<shark::RealVector> const &unlabel_data,
                                         std::string const &encoder_name,
                                         Model *model)
        using namespace shark;
        model->setStructure(unlabel_data[0].size(), 200);    
        if(std::is_same<Model, Autoencoder2>::value ||
                std::is_same<Model, Autoencoder4>::value){            
            initRandomUniform(*model, -0.1*std::sqrt(1.0/unlabel_data[0].size()),
        SquaredLoss<RealVector> loss;
        UnlabeledData<RealVector> const Samples = createDataFromRange(unlabel_data);
        RegressionDataset data(Samples, Samples);
        const double Rho = 0.01; // Sparsity parameter
        const double Beta = 6.0; // Regularization parameter
        SparseAutoencoderError error(data, model, &loss, Rho, Beta);
        // Add weight regularization
        const double lambda = 0.01; // Weight decay paramater
        TwoNormRegularizer regularizer(error.numberOfVariables());
        error.setRegularizer(lambda, &regularizer);
        //output some info of model, like number of params, input size etc
        LBFGS optimizer;
        optimizer.lineSearch().lineSearchType() = LineSearch::WolfeCubic;
        return optimize_params(encoder_name, 400, &optimizer, &error, model);

        After I have the train function, the training process become quite easy to write.

    void autoencoder_prediction(std::vector<shark::RealVector> const &train_data,
                                std::vector<unsigned int> const &train_label)
            Autoencoder1 model;
            train_autoencoder(train_data, "ls_ls.txt", &model);
            prediction("ls_ls.txt", "ls_ls_rtree.txt", train_data,
                       train_label, &model);
            Autoencoder2 model;
            train_autoencoder(train_data, "tied_ls_ls.txt", &model);
            prediction("tied_ls_ls.txt", "tied_ls_ls_rtree.txt", train_data,
                       train_label, &model);
        //Autoencoder3 has bug, the prediction will stuck and cannot complete
        //Do not know it is cause by Shark3.0 beta or my fault
            Autoencoder3 model;
            train_autoencoder(train_data, "dropls_ls.txt", &model);
            prediction("dropls_ls.txt", "dropls_ls_rtree.txt", train_data,
                       train_label, &model);
            Autoencoder4 model;
            train_autoencoder(train_data, "tied_dropls_ls.txt", &model);
            prediction("tied_dropls_ls.txt", "tied_dropls_ls_rtree.txt", train_data,
                       train_label, &model);

        All of the algorithms use same function to train and predict, I only need to change the type of the autocoder and file names(the files will save the result of training).

        The part of prediction is almost same as the example of Shark, the different part is this example train and test on the data of MNIST. The codes are locate at github.

        Results of autoencoder

    Autoencoder1: iterate 200 times
    Random Forest on training set accuracy: 1
    Random Forest on test set accuracy: 0.978194

    Autoencoder2: iterate 200 times
    Random Forest on training set accuracy: 1
    Random Forest on test set accuracy: 0.9784
    Autoencoder4:iterate 200 times
    Random Forest on training set accuracy: 1
    Random Forest on test set accuracy: 0.918741

        Results of sparse autoencoder
    Autoencoder1: iterate 400 times
    Random Forest on training set accuracy: 1
    Random Forest on test set accuracy: 0.920798

    Autoencoder2: iterate 200 times
    Random Forest on training set accuracy: 1
    Random Forest on test set accuracy: 0.95721

        Visualization of the train results of autoencoder



        Visualization of the train results of sparse autoencoder

         Okay, Shark is slow on windows, it is almost impossible to expect Shark can manage real time image recognition task, is it possible to develop real time image recognition application with deep learning algo without rewrite?Maybe caffe can save my ass, according to stackoverflow, CNN perform better than deep belief network if you are dealing with computer vision tasks. What if I really need a fast and stable autoencoder on windows?Maybe mlpack could be an option(no guarantee it could be build on windows). If I only want to do some research, using R or python maybe is a better solution, since they are easier to install and provide good performance, but I need to use c++ to create real world products, that is why a decent machine learning library is crucial for me.

    Wednesday 9 September 2015

    Deep learning 02--deep learning and sparse autoencoder

       Extract meaningful features from interesting object(ex : cat, dog, smoke, car, human face, bird and so on) is crucial, because it can dominate the prediction results. But as you can see, find out good features from the images may not an easy task and always require expertise knowledge, you may need to study tons of papers before you can decided which kind of features you should feed into the machine learning algorithms.Deep learning give use another option, rather than extract the features by human, deep learning let the computer figure out which features to use.

        The idea of deep learning could be explain by graph_00

        To tell you the truth, the first time I know there are some algorithms could extract features for us, I am very exciting about it. Not to mention there are many research results already proved that deep learning is a very powerful beast in the field of image recognition, in order to leverage the power of deep learning, I begin to study the tutorial of UFLDL before I use shark(shark is quite easy to compile on unix, linux, mac and windows, but could be very slow without atlas) and caffe(I do not know this one is easier to compile or not when I write this post, but the performance of caffe is awesome) to help me finish the tasks, because this could help me gain more sense about what are those algorithms doing about.

        The first deep learning algorithm introduced by UFLDL is sparse autoencoder, the implementation details of sparse autoencoder is quite daunting, even it may be the most easiest algorithm to understand in the deep learning algorithms.

        At the first glance, sparse autoencoder looks quite similar to the traditional neural network, the different part are

    1. It is train layer by layer
    2. The dimension of the hidden layers usually less than the input layers
    3. Sparse autoencoder add sparsity constraint on the hidden unit(autoencoder do not have this constraint)
    4. The purpose of sparse autoencoder is  find out how to use less features to reconstruct the input
       Autoencoder(or sparse autoencoder) is combine with three layers(graph_01), there are input layer, hidden layer and output layer.

        The output of the hidden layers are the features we want to feed into the other machine learning algorithms(ex : svm, random forest, softmax regression and so on) for classification tasks. Sometimes we also say the output of the hidden layers are "encoder", the output of the output layer are "decoder", because the hidden layer will extract the features from input layer, the output layer will reconstruct the input by the output of the hidden layer.

        The meaning of train it layer by layer is that we can use the output of the hidden layer as the next input of the autoencoder layer(this autoencoder also contain three layers). By this way we can build efficient deep network.

        The implementation details of sparse autoencoder of mine can be found at here, the example and the images file. graph_02 show the results train by my implementation. Since the mini-batch do not work well in this case, I need to use the full batch size to do the gradient descent, without algorithm like L-BFGS, the speed could be quite slow. In the next post, I will use shark to recognize the hand written digts from MNIST.


        Atlas is difficult to build on windows and is not well optimized for Windows 64 bit, if you want to get maximum speed from shark, better train the data under unix/linux.

    Sunday 30 August 2015

    Deep learning 01--opencv, eigen and softmax regression, part 2--implementation details of softmax

        This post will record some implementation details of softmax regression, I will map the equation introduced by UFLDL(all of the images are come from here) and source codes on this post.

        First of all, to finish the classification tasks, we need a train function to train the data.

     * @brief Train the input data by softmax algorithm
     * @param train Training data, input contains one\n
     *  training example per column
     * @param labels The label of each training example
    template<typename T>
    void softmax<T>::train(const Eigen::Ref<const EigenMat> &train,
                           const std::vector<int> &labels)
        //#1 generate unique labels, because we need the
        //NumClass and generate the ground truth table
        auto const UniqueLabels = get_unique_labels(labels);
        auto const NumClass = UniqueLabels.size();
        //#2 initialize weight and gradient
        weight_ = EigenMat::Random(NumClass, train.rows());
        grad_ = EigenMat::Zero(NumClass, train.rows());
        //#3 initialize ground truth
        auto const TrainCols = static_cast<int>(train.cols());
        EigenMat const GroundTruth = get_ground_truth(NumClass, TrainCols,
        //#4 create the random generator for mini-batch algorithm
        std::random_device rd;
        std::default_random_engine re(rd());
        int const Batch = (get_batch_size(TrainCols));
        int const RandomSize = TrainCols != Batch ?
                    TrainCols - Batch - 1 : 0;
                uni_int(0, RandomSize);
        for(size_t i = 0; i != params_.max_iter_; ++i){
            auto const Cols = uni_int(re);
            auto const &TrainBlock =
                    train.block(0, Cols, train.rows(), Batch);
            auto const &GTBlock =
                    GroundTruth.block(0, Cols, NumClass, Batch);
            //#5 compute the cost of the cost function
            auto const Cost = compute_cost(TrainBlock, weight_, GTBlock);
            //#6 break the loop if meet the criteria
            if(std::abs(params_.cost_ - Cost) < params_.epsillon_ ||
                    Cost < 0){
            params_.cost_ = Cost;
            //#7 compute gradient
            compute_gradient(TrainBlock, GTBlock);
            //#8 update weight
            weight_.array() -= grad_.array() * params_.lrate_;

        The most complicated part is #5 and #7, other part is trivial. To make #5 work, I need to finish the cost function(graph_00) and gradient descent(graph_01).

        We can notice the hypothesis part(rounded by red suqare) may introduce very large value, this may cause the problem of overflow, to prevent this, we can do some preprocessing on it. The solution of UFLDL is subtract the largest large constant value from each of the \theta_j^T x^{(i)} terms before computing the exponential, this is trivial to be done in Eigen.

    template<typename T>
    void softmax<T>::compute_hypothesis(Eigen::Ref<const EigenMat> const &train,
                                        Eigen::Ref<const EigenMat> const &weight)
        hypothesis_.noalias() = weight * train;
        max_exp_power_ = hypothesis_.colwise().maxCoeff();
        for(size_t i = 0; i != hypothesis_.cols(); ++i){
            hypothesis_.col(i).array() -= max_exp_power_(0, i);
        hypothesis_ = hypothesis_.array().exp();
        weight_sum_ = hypothesis_.array().colwise().sum();
        for(size_t i = 0; i != hypothesis_.cols(); ++i){
            if(weight_sum_(0, i) != T(0)){
                hypothesis_.col(i) /= weight_sum_(0, i);
        //prevent feeding 0 to log function
        hypothesis_ = (hypothesis_.array() != 0 ).
                select(hypothesis_, T(0.1));

        After I have the hypothesis matrix, I can compute the cost and the gradient at ease.

    template<typename T>
    double softmax<T>::compute_cost(const Eigen::Ref<const EigenMat> &train,
                                    const Eigen::Ref<const EigenMat> &weight,
                                    const Eigen::Ref<const EigenMat> &ground_truth)
        compute_hypothesis(train, weight);
        double const NSamples = static_cast<double>(train.cols());
        return  -1.0 * (hypothesis_.array().log() *
                        ground_truth.array()).sum() / NSamples +
                weight.array().pow(2.0).sum() * params_.lambda_ / 2.0;
    template<typename T>
    void softmax<T>::compute_gradient(Eigen::Ref<const EigenMat> const &train,
                                      Eigen::Ref<const EigenMat> const &weight,
                                      Eigen::Ref<const EigenMat> const &ground_truth)
        grad_.noalias() =
                (ground_truth.array() - hypothesis_.array())
                .matrix() * train.transpose();
        auto const NSamples = static_cast<double>(train.cols());
        grad_.array() = grad_.array() / -NSamples +
                params_.lambda_ * weight.array();

       The test results could see on this post.

    Thursday 27 August 2015

    Deep learning 00--opencv, eigen and softmax regression, part 1--use softmax to clasisify data

        Recently I am studying deep learning algorithms and try to implement some of them, there are many deep learning framework of c++ out there(ex : caffe), why do I try to build them by myself?the reasons push me to implement they are

    1. This is a good way to study the algorithm
    2. Not all of the deep learning library developed by c++ community are easy to build, more precisely, it is a pain to build them on some major platform(ex : windows), c and c++ community do not have a standard build system really is a big problem.

        The first algorithm I am trying to build is softmax regression based on the tutorial of UFLDL and softmax regression with opencv(I implement it with Eigen and opencv rather than matlab build in function).In this post I want to record how do I implement the softmax regression and show some results. I pick Eigen to help me implement the algorithms because it is

    1. Portable and very easy to compile
    2. Api are clean, easy to use and expressive
    3. Performance is quite good
    4. Well maintain, nice document 
    5. Expression template rock 
        Softmax regression is a more general logistic regression, that is, logistic regression can classify two lables only, but sofmax regression can classify more than two labels. If this is the first time you try to implement an algorithm which involve a lot of matrix manipulation, you may overwhelm with the math equations, wondering how to write fast, clean vectorization codes. Following are some steps help me to develop the algorithms I want to share with you, take it as some reference materials but not golden rules.

    1. Find a good matrix library, like Eigen or Armadillo
    2. Study the algorithm carefully, make sure you understand every steps of it
    3. Write down the matrix operations on a white paper, check the dimensions, the operations results of each step is reasonable or not, if you can not persuade yourself this result is meaningful, go back to step 2, do not bet on luck
    4. Implement the algorithms
    5. Run the gradient checking algorithms to check the result, if error, go back to step 2 or step 3
    6.  Run on clean examples like MNIST, those examples already do preprocess for you
    7. If the result are poor, try to tune the parameters or go back to step 2
        Ok, enough of talk, just show you some codes.

    using namespace ocv::ml;
    using EMat = ocv::ml::softmax<>::EigenMat;
    EMat read_data(std::string const &file)
        std::ifstream in(file);
        std::cout<<"read file\n";
            std::cout<<"is open\n";
            EMat output(30, 284);
            for(size_t col = 0; col != output.cols(); ++col){
                for(size_t row = 0; row != output.rows(); ++row){
                    in>>output(row, col);
            return output;
            std::cout<<"cannot open file : "<<file<<"\n";
        return softmax<>::EigenMat();
    std::vector<int> read_label(std::string const &file)
        std::ifstream in(file);
        std::vector<double> output;
        //not the most efficient way, but easier to write
        return std::vector<int>(output.begin(),
    void softmax_test()
        ocv::ml::softmax<> sm;
        auto const TestData =
        auto const TestLabel =
        double correct = 0;
        for(size_t i = 0; i != TestLabel.size(); ++i){
            auto const Result =
                    sm.predict(TestData.block(0, i, TestData.rows(),
            if(Result == TestLabel[i]){
        std::cout<<"true positive pro : "<<

       This class use mini-batch to train the data, the results should within 89%~94%.The example use the ocv_libs of v1.1, the test data is located at here(I download from eric yuan).  The test example(softmax_test) is located at here(v1.0).

        Next post of deep learning will talk about the implementation details of this softmax class.