tag:blogger.com,1999:blog-47022303430975366102024-03-08T01:11:26.393-08:00Qt and openCVThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.comBlogger71125tag:blogger.com,1999:blog-4702230343097536610.post-21477149083753884812021-03-13T03:08:00.003-08:002021-03-13T10:05:55.627-08:00Enter non-ascii characters into the app created by Qt for webassembly<p> With the help of wasm, we are able to port apps written in Qt to the browser; this saves us a lot of time in many cases. But just as with android/ios, Qt for wasm has its limitations, and the Qt company does not have much interest in fixing those issues. I guess this is because the major markets of Qt are automobile, IoT etc., not mobile phones or browsers. </p><p> In this post I would like to write down how I get around issue <a href="https://bugreports.qt.io/browse/QTBUG-78826">78826</a>--cannot enter non-ascii characters into the TextField and QLineEdit from the browser (wasm). <br /></p><p><br /></p><h1 style="text-align: center;"><u><b>Limitations</b></u></h1><p><br /></p><h2 style="text-align: left;"><u><b>1. Cannot access system fonts</b></u></h2><div style="text-align: left;"><br /></div><div style="text-align: left;"> Unlike desktop/mobile, Qt for wasm cannot access system fonts, so download the font you want to display/enter first. Today I will use the font downloaded from <a href="https://github.com/wordshub/free-font">oppo</a> as an example. After you download the ttf, compress it with qCompress in order to reduce its size.</div><div style="text-align: left;"><br /></div>
<!--HTML generated using hilite.me--><div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-color: gray; border-image: none 100% / 1 / 0 stretch; border-style: solid; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;"><pre style="line-height: 125%; margin: 0px;">QFile <span style="color: #0066bb; font-weight: bold;">file</span>(<span style="background-color: #fff0f0;">"OPPOSans-H.ttf"</span>);
<span style="color: #008800; font-weight: bold;">if</span>(file.open(QIODevice<span style="color: #333333;">::</span>ReadOnly)){
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> contents <span style="color: #333333;">=</span> qCompress(file.readAll(), <span style="color: #0000dd; font-weight: bold;">9</span>);
QFile <span style="color: #0066bb; font-weight: bold;">fwrite</span>(<span style="background-color: #fff0f0;">"OPPOSans-H.zip"</span>);
<span style="color: #008800; font-weight: bold;">if</span>(fwrite.open(QIODevice<span style="color: #333333;">::</span>WriteOnly)){
fwrite.write(contents);
}<span style="color: #008800; font-weight: bold;">else</span>{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": cannot write file"</span>;
}
}<span style="color: #008800; font-weight: bold;">else</span>{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": cannot open file"</span>;
}
</pre></div><p>
<br /> After that, add the zip file into the Qt resource system, load it and set it as the font of the QPlainTextEdit.</p><p><br /></p>
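<p>For reference, the resource path used in the loading code below is ":/assets/OPPOSans-H.zip", so the entry in the .qrc file would look something like this (a sketch of mine; adjust the prefix and file name to match your own project):</p>
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><RCC>
    <qresource prefix="/assets">
        <file>OPPOSans-H.zip</file>
    </qresource>
</RCC>
</pre>
</div>
<p><br /></p>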
<!--HTML generated using hilite.me--><div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-color: gray; border-image: none 100% / 1 / 0 stretch; border-style: solid; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;"><pre style="line-height: 125%; margin: 0px;"><span style="color: #888888;">//MainWindow.cpp, set font for QPlainTextEdit</span>
QFont <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #0066bb; font-weight: bold;">monospace</span>(font_manager().get_font_family());
ui<span style="color: #333333;">-></span>plainTextEdit<span style="color: #333333;">-></span>setFont(monospace);
<span style="color: #888888;">//font_manager.cpp, a class use to register the font and access the font id</span>
font_manager<span style="color: #333333;">::</span>font_manager()
{
QFile ifile(<span style="background-color: #fff0f0;">":/assets/OPPOSans-H.zip"</span>);
<span style="color: #008800; font-weight: bold;">if</span>(ifile.open(QIODevice<span style="color: #333333;">::</span>ReadOnly)){
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> font_id <span style="color: #333333;">=</span> QFontDatabase<span style="color: #333333;">::</span>addApplicationFontFromData(qUncompress(ifile.readAll()));
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"font id:"</span><span style="color: #333333;"><<</span>font_id;
font_family_ <span style="color: #333333;">=</span> QFontDatabase<span style="color: #333333;">::</span>applicationFontFamilies(font_id).at(<span style="color: #0000dd; font-weight: bold;">0</span>);
}
}
QFont font_manager<span style="color: #333333;">::</span>get_font() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> QFont(get_font_family());
}
QString font_manager<span style="color: #333333;">::</span>get_font_family() <span style="color: #008800; font-weight: bold;">const</span> noexcept
{
<span style="color: #008800; font-weight: bold;">return</span> font_family_;
}
</pre></div><p>
<br /></p><h2 style="text-align: left;"><u><b>2. Cannot enter Chinese into QPlainTextEdit</b></u></h2><p> This issue is quite annoying, but we do have a "stupid" way to overcome it: use the prompt dialog of a js library to input the text. The library I picked is called <a href="http://bootboxjs.com/examples.html#bb-prompt">bootbox</a>. You can get around this limitation with the following steps.</p><h3 style="text-align: left;"><u><b>a. Use js to enter the text</b></u></h3><p><br /></p>
<!--HTML generated using hilite.me--><div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-color: gray; border-image: none 100% / 1 / 0 stretch; border-style: solid; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;"><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">var</span> _global_text_process_result <span style="color: #333333;">=</span> <span style="background-color: #fff0f0;">''</span>; </pre><pre style="line-height: 125%; margin: 0px;">//You need to allocate memory on the wasm heap before returning the string to the c++ side </pre><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">function</span> getStringFromJS(targetStr, msg)
{
console.log(msg <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">":"</span> <span style="color: #333333;">+</span> targetStr);
<span style="color: #008800; font-weight: bold;">var</span> jsString <span style="color: #333333;">=</span> targetStr;
<span style="color: #008800; font-weight: bold;">var</span> lengthBytes <span style="color: #333333;">=</span> lengthBytesUTF8(jsString)<span style="color: #333333;">+</span><span style="color: #0000dd; font-weight: bold;">1</span>;
<span style="color: #008800; font-weight: bold;">var</span> stringOnWasmHeap <span style="color: #333333;">=</span> _malloc(lengthBytes);
stringToUTF8(jsString, stringOnWasmHeap, lengthBytes);
console.log(msg <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">" heap:"</span> <span style="color: #333333;">+</span> stringOnWasmHeap);
<span style="color: #008800; font-weight: bold;">return</span> stringOnWasmHeap;
}</pre><pre style="line-height: 125%; margin: 0px;"> </pre><pre style="line-height: 125%; margin: 0px;">//call the prompt dialog of bootbox</pre><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">function</span> multiLinesPrompt(inputString, inputMode, inputTitle)
{
bootbox.prompt({
title<span style="color: #333333;">:</span> inputTitle,
inputType<span style="color: #333333;">:</span> <span style="background-color: #fff0f0;">"textarea"</span>,
value<span style="color: #333333;">:</span> inputString,
callback<span style="color: #333333;">:</span> <span style="color: #008800; font-weight: bold;">function</span> (result) { console.log(result); _global_text_process_result <span style="color: #333333;">=</span> result; }
});
} </pre><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;"> </span></pre><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">//This function is used to avoid updating the text repeatedly</span></pre><pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">function</span> globalTextProcessResultIsValid()
{
<span style="color: #008800; font-weight: bold;">return</span> _global_text_process_result <span style="color: #333333;">!==</span> <span style="background-color: #fff0f0;">""</span>
}
<span style="color: #008800; font-weight: bold;">function</span> getGlobalTextProcessResult()
{
    <span style="color: #008800; font-weight: bold;">var</span> results <span style="color: #333333;">=</span> getStringFromJS(_global_text_process_result, <span style="background-color: #fff0f0;">"js getGlobalTextProcessResult"</span>)
_global_text_process_result <span style="color: #333333;">=</span> <span style="background-color: #fff0f0;">""</span>
<span style="color: #008800; font-weight: bold;">return</span> results
}
</pre></div><p>
<br /></p><h3 style="text-align: left;"><u><b>b. Call the js function from c++</b></u></h3><p><br /></p>
<!--HTML generated using hilite.me--><div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-color: gray; border-image: none 100% / 1 / 0 stretch; border-style: solid; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;"><pre style="line-height: 125%; margin: 0px;"><span style="color: #557799;">#include "custom_qplain_text_edit.hpp"</span>
<span style="color: #557799;">#include <QDebug></span>
<span style="color: #557799;">#include <QTimer></span>
<span style="color: #557799;">#ifdef Q_OS_WASM</span>
<span style="color: #557799;">#include <emscripten.h></span>
<span style="color: #008800; font-weight: bold;">namespace</span>{
EM_JS(<span style="color: #333399; font-weight: bold;">char</span><span style="color: #333333;">*</span>, global_text_process_result_is_valid, (), {
<span style="color: #008800; font-weight: bold;">return</span> globalTextProcessResultIsValid();
})
EM_JS(<span style="color: #333399; font-weight: bold;">char</span><span style="color: #333333;">*</span>, get_global_text_process_result, (), {
<span style="color: #008800; font-weight: bold;">return</span> getGlobalTextProcessResult();
})
EM_JS(<span style="color: #333399; font-weight: bold;">void</span>, multi_lines_prompt, (<span style="color: #333399; font-weight: bold;">char</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">*</span>input_strings, <span style="color: #333399; font-weight: bold;">char</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">*</span>input_mode, <span style="color: #333399; font-weight: bold;">char</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">*</span>title), {
multiLinesPrompt(UTF8ToString(input_strings), UTF8ToString(input_mode), UTF8ToString(title));
})
}
<span style="color: #557799;">#endif</span>
custom_qplain_text_edit<span style="color: #333333;">::</span>custom_qplain_text_edit(QWidget <span style="color: #333333;">*</span>parent) <span style="color: #333333;">:</span>
QPlainTextEdit(parent)
{
<span style="color: #557799;">#ifdef Q_OS_WASM</span> </pre><pre style="line-height: 125%; margin: 0px;"> timer_ <span style="color: #333333;">=</span> <span style="color: #008800; font-weight: bold;">new</span> QTimer(<span style="color: #008800; font-weight: bold;">this</span>);
timer_<span style="color: #333333;">-></span>setInterval(<span style="color: #0000dd; font-weight: bold;">100</span>);
connect(timer_, <span style="color: #333333;">&</span>QTimer<span style="color: #333333;">::</span>timeout, [<span style="color: #008800; font-weight: bold;">this</span>]()
{
<span style="color: #008800; font-weight: bold;">if</span>(global_text_process_result_is_valid()){
<span style="color: #333399; font-weight: bold;">char</span> <span style="color: #333333;">*</span>msg <span style="color: #333333;">=</span> get_global_text_process_result();
setPlainText(QString(msg));
free(msg);
timer_<span style="color: #333333;">-></span>stop();
}
});
<span style="color: #557799;">#endif</span>
}
<span style="color: #333399; font-weight: bold;">void</span> custom_qplain_text_edit<span style="color: #333333;">::</span>mousePressEvent(QMouseEvent <span style="color: #333333;">*</span>e)
{
<span style="color: #557799;">#ifdef Q_OS_WASM</span>
<span style="color: #008800; font-weight: bold;">if</span>(e<span style="color: #333333;">-></span>button() <span style="color: #333333;">==</span> Qt<span style="color: #333333;">::</span>RightButton){
QPlainTextEdit<span style="color: #333333;">::</span>mousePressEvent(e);
}<span style="color: #008800; font-weight: bold;">else</span>{
multi_lines_prompt(toPlainText().toUtf8().data(), <span style="background-color: #fff0f0;">"textarea"</span>, <span style="background-color: #fff0f0;">"Input plain text"</span>);
timer_<span style="color: #333333;">-></span>start();
}
<span style="color: #557799;">#else</span>
QPlainTextEdit<span style="color: #333333;">::</span>mousePressEvent(e);
<span style="color: #557799;">#endif</span>
}
</pre></div><h3 style="text-align: left;">
<u><b>c. Include the js scripts and css in the generated html file</b></u></h3><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGBQuh7EiC3dCTivL5Jtgad8E2H5K_Iae7ZGu2ahyphenhyphenybXi2Z9Mfe3t13nnuunn96s35x4CYFDJh0MTFskFdomTQF016T6m1OI3mrE-3VXnwDET5-p0Of8R0gDoOHFyxgHt6wBKVkjJ1XvA/s974/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="808" data-original-width="974" height="530" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGBQuh7EiC3dCTivL5Jtgad8E2H5K_Iae7ZGu2ahyphenhyphenybXi2Z9Mfe3t13nnuunn96s35x4CYFDJh0MTFskFdomTQF016T6m1OI3mrE-3VXnwDET5-p0Of8R0gDoOHFyxgHt6wBKVkjJ1XvA/w640-h530/Capture.JPG" width="640" /></a></div><br /><p><br /></p>
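<p>The screenshot above shows the generated html file of my project. In text form, the change boils down to loading the css and js that bootbox depends on (jquery and bootstrap), plus bootbox itself and the helper functions of step a, before the Qt loader script runs. The file names below are assumptions taken from my local setup; adjust them to wherever you place the libraries:</p>
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><head>
  <!--bootbox needs jquery and bootstrap, so load them first-->
  <link rel="stylesheet" href="bootstrap.min.css">
  <script src="jquery.min.js"></script>
  <script src="bootstrap.min.js"></script>
  <script src="bootbox.min.js"></script>
  <!--hypothetical file holding the js functions of step a(multiLinesPrompt etc.)-->
  <script src="text_input.js"></script>
</head>
</pre>
</div>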
<p>To be honest, this is really troublesome, and I hope that one day this shortcoming can be resolved.</p><h2 style="text-align: center;"><u><b>Source codes</b></u></h2><p><a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/qt_for_wasm_enter_chinese">Github</a></p><p><a href="https://drive.google.com/file/d/1O6qxW795ZIU7CLa-il6ZLThfTeqNyXne/view?usp=sharing">Full package, including the fonts and js libraries</a><br /></p>ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-76806755202054183282020-06-25T13:31:00.000-07:002020-06-25T20:38:48.467-07:00Dense extreme inception edge detection with opencv and deep learning This tutorial introduces how to perform edge detection with opencv and deep learning. You will learn how to apply the Dense Extreme Inception Network (DexiNed) and Holistically-Nested Edge Detection (HED) to images and videos. If you are interested in HED, I recommend you study a brilliant post by <a href="https://www.pyimagesearch.com/2019/03/04/holistically-nested-edge-detection-with-opencv-and-deep-learning/">pyimagesearch</a>; here I will explain the main points of how to perform edge detection with these networks using c++ and opencv.<br />
<br />
<br />
<h2 style="text-align: center;">
<u><b>Apply HED by opencv and c++</b></u></h2>
<br />
This <a href="https://berak.github.io/smallfry/dnn_edge.html">page</a> explain how to do it, you can find the link of the model and prototxt from there too. The author register a new layer, without it opencv cannot generate proper results.<br />
<br />
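<p>To give a concrete feel for what that registration involves, here is a minimal sketch of such a crop layer, written from the description on the linked page; the class name and the centered-crop details are my own assumptions, so treat the linked page as the authoritative version. The "Crop" string passed to the registration macro must match the layer type used in deploy.prototxt.</p>
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">#include <opencv2/dnn.hpp>
#include <opencv2/dnn/layer.details.hpp>

//center-crops the first input blob to the spatial size of the second one,
//which is what the Crop layers of the HED prototxt expect
class crop_layer : public cv::dnn::Layer
{
public:
    explicit crop_layer(cv::dnn::LayerParams const &params) : cv::dnn::Layer(params) {}

    static cv::Ptr<cv::dnn::Layer> create(cv::dnn::LayerParams &params)
    {
        return cv::Ptr<cv::dnn::Layer>(new crop_layer(params));
    }

    bool getMemoryShapes(std::vector<cv::dnn::MatShape> const &inputs, int,
                         std::vector<cv::dnn::MatShape> &outputs,
                         std::vector<cv::dnn::MatShape> &) const CV_OVERRIDE
    {
        //keep batch and channels of the first input, take height/width of the second
        cv::dnn::MatShape shape = inputs[0];
        shape[2] = inputs[1][2];
        shape[3] = inputs[1][3];
        outputs.assign(1, shape);
        return false;
    }

    void forward(cv::InputArrayOfArrays inputs_arr, cv::OutputArrayOfArrays outputs_arr,
                 cv::OutputArrayOfArrays) CV_OVERRIDE
    {
        std::vector<cv::Mat> inputs, outputs;
        inputs_arr.getMatVector(inputs);
        outputs_arr.getMatVector(outputs);
        //offsets that center the crop over the larger blob
        int const y_start = (inputs[0].size[2] - inputs[1].size[2]) / 2;
        int const x_start = (inputs[0].size[3] - inputs[1].size[3]) / 2;
        std::vector<cv::Range> const ranges{cv::Range::all(), cv::Range::all(),
                                            cv::Range(y_start, y_start + inputs[1].size[2]),
                                            cv::Range(x_start, x_start + inputs[1].size[3])};
        inputs[0](ranges).copyTo(outputs[0]);
    }
};

int main()
{
    //register before readNet so the "Crop" layers of the prototxt resolve to our class
    CV_DNN_REGISTER_LAYER_CLASS(Crop, crop_layer);
    auto net = cv::dnn::readNet("hed_pretrained_bsds.caffemodel", "deploy.prototxt");
    //... blobFromImage + forward, as shown in the code of the next section
}
</pre>
</div>
<br />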
<h2 style="text-align: center;">
<u><b>Apply DexiNed by opencv and c++</b></u></h2>
<br />
<br />
You can find the explanation of DexiNed on <a href="https://github.com/xavysp/DexiNed">this page</a>. In order to perform edge detection with DexiNed, we need to convert the model to onnx; I prefer pytorch for this purpose. Why do I not prefer tensorflow? Because, from my past experience, converting a tensorflow model to a format opencv can read is much more complicated; tensorflow is feature rich, but I always feel like they are trying very hard to make things unnecessarily complicated, and their notoriously bad api design explains this very well.<br />
<br />
<h3 style="text-align: center;">
<u>1.Convert pytorch model of DexiNed to onnx</u></h3>
<ol>
<li>Clone the project <a href="https://github.com/stereomatchingkiss/blogCodes2">blogCodes2</a></li>
<li>Navigate into edges_detection_with_deep_learning</li>
<li>Clone the project <a href="https://github.com/xavysp/DexiNed">DexiNed </a></li>
<li>Copy the file model.py in edges_detection_with_deep_learning/model.py into DexiNed/DexiNed-Pytorch</li>
<li>Run the script to_onnx.py</li>
</ol>
If you do not want to go through the trouble, just download the model from <a href="https://drive.google.com/file/d/1wjST2tD3nnwKPebwiKnLXJp-ymgioTay/view?usp=sharing">here</a>. <br />
<br />
<h3 style="text-align: center;">
<u><b>2.Load and forward image by DexiNed and HED</b></u></h3>
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">switch_to_cuda</span>(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>Net <span style="color: #333333;">&</span>net)
{
try {
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>vpair <span style="color: #333333;">:</span> cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>getAvailableBackends()){
std<span style="color: #333333;">::</span>cout<span style="color: #333333;"><<</span>vpair.first<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", "</span><span style="color: #333333;"><<</span>vpair.second<span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
<span style="color: #008800; font-weight: bold;">if</span>(vpair.first <span style="color: #333333;">==</span> cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_BACKEND_CUDA <span style="color: #333333;">&&</span> vpair.second <span style="color: #333333;">==</span> cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_TARGET_CUDA){
std<span style="color: #333333;">::</span>cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"can switch to cuda"</span><span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
net.setPreferableBackend(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_BACKEND_CUDA);
net.setPreferableTarget(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_TARGET_CUDA);
<span style="color: #008800; font-weight: bold;">break</span>;
}
}
}<span style="color: #008800; font-weight: bold;">catch</span>(std<span style="color: #333333;">::</span>exception <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ex){
net.setPreferableBackend(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_BACKEND_DEFAULT);
net.setPreferableTarget(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>DNN_TARGET_CPU);
<span style="color: #008800; font-weight: bold;">throw</span> std<span style="color: #333333;">::</span>runtime_error(ex.what());
}
}
std<span style="color: #333333;">::</span>tuple<span style="color: #333333;"><</span>cv<span style="color: #333333;">::</span>Mat, <span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span><span style="color: #333333;">></span> forward_utils(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>Net <span style="color: #333333;">&</span>net, cv<span style="color: #333333;">::</span>Mat <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>input, cv<span style="color: #333333;">::</span>Size <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>blob_size)
{
<span style="color: #008800; font-weight: bold;">using</span> <span style="color: #008800; font-weight: bold;">namespace</span> std<span style="color: #333333;">::</span>chrono;
<span style="color: #888888;">//measure duration</span>
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> start <span style="color: #333333;">=</span> high_resolution_clock<span style="color: #333333;">::</span>now();
cv<span style="color: #333333;">::</span>Mat blob <span style="color: #333333;">=</span> cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>blobFromImage(input, <span style="color: #6600ee; font-weight: bold;">1.0</span>, blob_size,
cv<span style="color: #333333;">::</span>Scalar(<span style="color: #6600ee; font-weight: bold;">104.00698793</span>, <span style="color: #6600ee; font-weight: bold;">116.66876762</span>, <span style="color: #6600ee; font-weight: bold;">122.67891434</span>), <span style="color: #007020;">false</span>, <span style="color: #007020;">false</span>);
net.setInput(blob);
cv<span style="color: #333333;">::</span>Mat out <span style="color: #333333;">=</span> net.forward();
cv<span style="color: #333333;">::</span>resize(out.reshape(<span style="color: #0000dd; font-weight: bold;">1</span>, blob_size.height), out, input.size());
<span style="color: #888888;">//the data type of out is CV_32F(single channel, floating point) so we need to upscale the value and convert</span>
<span style="color: #888888;">//it to CV_8U(single channel, uchar)</span>
out <span style="color: #333333;">*=</span> <span style="color: #0000dd; font-weight: bold;">255</span>;
out.convertTo(out, CV_8U);
<span style="color: #888888;">//convert gray to bgr because we need to create montage(1 row, 3 column of images in our case)</span>
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> finish <span style="color: #333333;">=</span> high_resolution_clock<span style="color: #333333;">::</span>now();
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> elapsed <span style="color: #333333;">=</span> duration_cast<span style="color: #333333;"><</span>milliseconds<span style="color: #333333;">></span>(finish <span style="color: #333333;">-</span> start).count();
cv<span style="color: #333333;">::</span>cvtColor(out, out, cv<span style="color: #333333;">::</span>COLOR_GRAY2BGR);
<span style="color: #008800; font-weight: bold;">return</span> {out, elapsed};
}
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">hed_edges_detector</span>
{
<span style="color: #997700; font-weight: bold;">public:</span>
hed_edges_detector(std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>weights, std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config) <span style="color: #333333;">:</span>
net_(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>readNet(config, weights))
{
switch_to_cuda(net_);
}
<span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span> elapsed() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> elapsed_;
}
cv<span style="color: #333333;">::</span>Mat forward(cv<span style="color: #333333;">::</span>Mat <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>input)
{
<span style="color: #008800; font-weight: bold;">auto</span> result <span style="color: #333333;">=</span> forward_utils(net_, input, {<span style="color: #0000dd; font-weight: bold;">500</span>, <span style="color: #0000dd; font-weight: bold;">500</span>});
elapsed_ <span style="color: #333333;">+=</span> std<span style="color: #333333;">::</span>get<span style="color: #333333;"><</span><span style="color: #0000dd; font-weight: bold;">1</span><span style="color: #333333;">></span>(result);
<span style="color: #008800; font-weight: bold;">return</span> std<span style="color: #333333;">::</span>get<span style="color: #333333;"><</span><span style="color: #0000dd; font-weight: bold;">0</span><span style="color: #333333;">></span>(result);
}
<span style="color: #997700; font-weight: bold;">private:</span>
<span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span> elapsed_ <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>Net net_;
};
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">dexi_edges_detector</span>
{
<span style="color: #997700; font-weight: bold;">public:</span>
<span style="color: #008800; font-weight: bold;">explicit</span> dexi_edges_detector(std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>model) <span style="color: #333333;">:</span>
net_(cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>readNet(model))
{
switch_to_cuda(net_);
}
<span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span> elapsed() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> elapsed_;
}
cv<span style="color: #333333;">::</span>Mat forward(cv<span style="color: #333333;">::</span>Mat <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>input)
{
<span style="color: #008800; font-weight: bold;">auto</span> result <span style="color: #333333;">=</span> forward_utils(net_, input, {<span style="color: #0000dd; font-weight: bold;">400</span>, <span style="color: #0000dd; font-weight: bold;">400</span>});
elapsed_ <span style="color: #333333;">+=</span> std<span style="color: #333333;">::</span>get<span style="color: #333333;"><</span><span style="color: #0000dd; font-weight: bold;">1</span><span style="color: #333333;">></span>(result);
<span style="color: #008800; font-weight: bold;">return</span> std<span style="color: #333333;">::</span>get<span style="color: #333333;"><</span><span style="color: #0000dd; font-weight: bold;">0</span><span style="color: #333333;">></span>(result);
}
<span style="color: #997700; font-weight: bold;">private:</span>
<span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span> elapsed_ <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
cv<span style="color: #333333;">::</span>dnn<span style="color: #333333;">::</span>Net net_;
};
</pre>
</div>
<br />
<br />
<h3 style="text-align: center;">
<u>3. Detect edges of image</u></h3>
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">test_image</span>(std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>mpath)
{
cv<span style="color: #333333;">::</span>Mat img <span style="color: #333333;">=</span> cv<span style="color: #333333;">::</span>imread(<span style="background-color: #fff0f0;">"2007_000129.jpg"</span>);
hed_edges_detector hed(mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"hed_pretrained_bsds.caffemodel"</span>, mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"deploy.prototxt"</span>);
<span style="color: #008800; font-weight: bold;">auto</span> hed_out <span style="color: #333333;">=</span> hed.forward(img);
dexi_edges_detector dexi(mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"24_model.onnx"</span>);
<span style="color: #008800; font-weight: bold;">auto</span> dexi_out <span style="color: #333333;">=</span> dexi.forward(img);
cv<span style="color: #333333;">::</span>Size <span style="color: #008800; font-weight: bold;">const</span> frame_size(img.cols, img.rows);
<span style="color: #333399; font-weight: bold;">int</span> constexpr grid_x <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">3</span>;
<span style="color: #333399; font-weight: bold;">int</span> constexpr grid_y <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">1</span>;
ocv<span style="color: #333333;">::</span>montage mt(frame_size, grid_x, grid_y);
mt.add_image(img);
mt.add_image(hed_out);
mt.add_image(dexi_out);
cv<span style="color: #333333;">::</span>imshow(<span style="background-color: #fff0f0;">"results"</span>, mt.get_montage());
cv<span style="color: #333333;">::</span>imwrite(<span style="background-color: #fff0f0;">"results2.jpg"</span>, mt.get_montage());
cv<span style="color: #333333;">::</span>waitKey();
}
</pre>
</div>
<br />
<h3 style="text-align: center;">
<u><b>4. Detect edges of video</b></u></h3>
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">test_video</span>(std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>mpath)
{
cv<span style="color: #333333;">::</span>VideoCapture cap(<span style="background-color: #fff0f0;">"pedestrian.mp4"</span>);
<span style="color: #008800; font-weight: bold;">if</span>(cap.isOpened()){
hed_edges_detector hed(mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"hed_pretrained_bsds.caffemodel"</span>, mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"deploy.prototxt"</span>);
dexi_edges_detector dexi(mpath <span style="color: #333333;">+</span> <span style="background-color: #fff0f0;">"24_model.onnx"</span>);
<span style="color: #888888;">//unique_ptr is a resource manager class(smart pointer) of c++,</span>
<span style="color: #888888;">//we allocate memory by the reset(or make_unique) api,</span>
<span style="color: #888888;">//after leaving the scope(scope is surrounded by {}), the memory will be released. In c++, the</span>
<span style="color: #888888;">//best way of manage the resource is avoid explicit memory allocation, if you really need to do it,</span>
<span style="color: #888888;">//guard your memory by smart pointer. I use unique_ptr at here because I cannot</span>
<span style="color: #888888;">//initialize the objects before I know the frame size of the video.</span>
std<span style="color: #333333;">::</span>unique_ptr<span style="color: #333333;"><</span>ocv<span style="color: #333333;">::</span>montage<span style="color: #333333;">></span> mt;
std<span style="color: #333333;">::</span>unique_ptr<span style="color: #333333;"><</span>cv<span style="color: #333333;">::</span>VideoWriter<span style="color: #333333;">></span> vwriter;
cv<span style="color: #333333;">::</span>Mat frame;
<span style="color: #333399; font-weight: bold;">float</span> frame_count <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">while</span>(<span style="color: #0000dd; font-weight: bold;">1</span>){
cap<span style="color: #333333;">>></span>frame;
<span style="color: #008800; font-weight: bold;">if</span>(frame.empty()){
<span style="color: #008800; font-weight: bold;">break</span>;
}
<span style="color: #333333;">++</span>frame_count;
cv<span style="color: #333333;">::</span>resize(frame, frame, {}, <span style="color: #6600ee; font-weight: bold;">0.5</span>, <span style="color: #6600ee; font-weight: bold;">0.5</span>);
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> hed_out <span style="color: #333333;">=</span> hed.forward(frame);
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> dexi_out <span style="color: #333333;">=</span> dexi.forward(frame);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>mt){
<span style="color: #888888;">//initialize the class to create montage</span>
<span style="color: #888888;">//First arguments tell the class the size of each frame</span>
cv<span style="color: #333333;">::</span>Size <span style="color: #008800; font-weight: bold;">const</span> frame_size(frame.cols, frame.rows);
<span style="color: #333399; font-weight: bold;">int</span> constexpr grid_x <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">3</span>;
<span style="color: #333399; font-weight: bold;">int</span> constexpr grid_y <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">1</span>;
mt.reset(<span style="color: #008800; font-weight: bold;">new</span> ocv<span style="color: #333333;">::</span>montage(frame_size, grid_x, grid_y));
}
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>vwriter){
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> fourcc <span style="color: #333333;">=</span> cv<span style="color: #333333;">::</span>VideoWriter<span style="color: #333333;">::</span>fourcc(<span style="color: #0044dd;">'F'</span>, <span style="color: #0044dd;">'M'</span>, <span style="color: #0044dd;">'P'</span>, <span style="color: #0044dd;">'4'</span>);
<span style="color: #333399; font-weight: bold;">int</span> constexpr fps <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">30</span>;
<span style="color: #888888;">//because the montage is 3 columns and 1 row, so the cols need to multiply by 3</span>
vwriter.reset(<span style="color: #008800; font-weight: bold;">new</span> cv<span style="color: #333333;">::</span>VideoWriter(<span style="background-color: #fff0f0;">"out.avi"</span>, fourcc, fps, {frame.cols <span style="color: #333333;">*</span> <span style="color: #0000dd; font-weight: bold;">3</span>, frame.rows}));
}
mt<span style="color: #333333;">-></span>add_image(frame);
mt<span style="color: #333333;">-></span>add_image(hed_out);
mt<span style="color: #333333;">-></span>add_image(dexi_out);
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> montage <span style="color: #333333;">=</span> mt<span style="color: #333333;">-></span>get_montage();
cv<span style="color: #333333;">::</span>imshow(<span style="background-color: #fff0f0;">"out"</span>, mt<span style="color: #333333;">-></span>get_montage());
vwriter<span style="color: #333333;">-></span>write(montage);
cv<span style="color: #333333;">::</span>waitKey(<span style="color: #0000dd; font-weight: bold;">10</span>);
mt<span style="color: #333333;">-></span>clear();
}
std<span style="color: #333333;">::</span>cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"hed elapsed time = "</span><span style="color: #333333;"><<</span>hed.elapsed()<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", frame count = "</span><span style="color: #333333;"><<</span>frame_count
<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", fps = "</span><span style="color: #333333;"><<</span><span style="color: #6600ee; font-weight: bold;">1000.0f</span><span style="color: #333333;">/</span>(hed.elapsed()<span style="color: #333333;">/</span>frame_count)<span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
std<span style="color: #333333;">::</span>cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"dexi elapsed time = "</span><span style="color: #333333;"><<</span>dexi.elapsed()<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", frame count = "</span><span style="color: #333333;"><<</span>frame_count
<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", fps = "</span><span style="color: #333333;"><<</span><span style="color: #6600ee; font-weight: bold;">1000.0f</span><span style="color: #333333;">/</span>(dexi.elapsed()<span style="color: #333333;">/</span>frame_count)<span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
}<span style="color: #008800; font-weight: bold;">else</span>{
std<span style="color: #333333;">::</span>cerr<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"cannot open video pedestrian.mp4"</span><span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
}
}
</pre>
</div>
<br />
<h2 style="text-align: center;">
<u><b>Results of image detection</b></u></h2>
<span id="goog_726847565"></span><span id="goog_726847566"></span><br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAfIH6vrBthaJHo9NXPda0Y1ozBmDjVFfslNy3laWalxowufsQGCCKYyR4rcPnz_0g9iWh2NpWc7jNaD8ccEy8ti98_aQn7GVwC-261YsP0F0XtaYs-BpeF8YZT9iXRXTPoWmyKOIFmj8/s1600/results2.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="500" data-original-width="1002" height="159" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAfIH6vrBthaJHo9NXPda0Y1ozBmDjVFfslNy3laWalxowufsQGCCKYyR4rcPnz_0g9iWh2NpWc7jNaD8ccEy8ti98_aQn7GVwC-261YsP0F0XtaYs-BpeF8YZT9iXRXTPoWmyKOIFmj8/s320/results2.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">img_00</td></tr>
</tbody></table>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhPHjFNT-EeZV70QUPimEtDHsqXjaZorOmrrIHWXJrB-FoDzUStecRZ8Guw3AS2yuYBUvBe-2p7BDHsJ1s6sHrnPiaQoeiADI0cHnju9kXPxZqAlmhuAZJ-Qg9ia5qrQ5QGlT0hJ4dswHY/s1600/results.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="538" data-original-width="1119" height="153" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhPHjFNT-EeZV70QUPimEtDHsqXjaZorOmrrIHWXJrB-FoDzUStecRZ8Guw3AS2yuYBUvBe-2p7BDHsJ1s6sHrnPiaQoeiADI0cHnju9kXPxZqAlmhuAZJ-Qg9ia5qrQ5QGlT0hJ4dswHY/s320/results.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">img_01</td></tr>
</tbody></table>
<br />
<h2 style="text-align: center;">
<u><b>Results of video detection</b></u></h2>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/5AKMoczWcX4/0.jpg" src="https://www.youtube.com/embed/5AKMoczWcX4?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
<br />
<h2 style="text-align: center;">
<u>Runtime performance on gpu (gtx 1060)</u></h2>
The following results are based on the video I posted on <a href="https://www.youtube.com/watch?v=5AKMoczWcX4">youtube</a>. The video has 733 frames. From left to right: the original frame, the frame processed by HED, and the frame processed by DexiNed.<br />
<br />
HED elapsed time is 43870ms, fps is 16.7085.<br />
DexiNed elapsed time is 45149ms, fps is 16.2351.<br />
<br />
The crop layer of HED does not support cuda; it should become faster once the cuda implementation of that layer is done.<br />
<h2 style="text-align: center;">
<u><b>Source codes</b></u></h2>
<br />
Located at <a href="http://hed elapsed time = 42973, frame count = 733, fps = 58.6262 dexi elapsed time = 44285, frame count = 733, fps = 60.4161">github</a>.ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com2tag:blogger.com,1999:blog-4702230343097536610.post-47101571209926774532020-06-09T12:05:00.003-07:002020-06-09T12:05:50.385-07:00Asynchronous video capture written by opencv and Qt<h2 style="text-align: center;">
<u><b>Before we start </b></u></h2>
<br />
<ol>
<li>If you are not familiar with QThread, the Qt5 documentation already shows us how to use
QThread properly; please check it out via google (the keyword is "QThread doc", or
you can open the page via this <a href="https://doc.qt.io/qt-5/qthread.html">link</a>), and read this <a href="https://qtandopencv.blogspot.com/2017/09/wrong-way-to-use-qthread.html">post</a>, it may save you a lot of trouble.</li>
<li>If you are not familiar with threads, I suggest you read the book <a href="https://www.amazon.com/C-Concurrency-Action-Practical-Multithreading/dp/1933988770">c++ concurrency in action</a>; chapters 1~4 and basic atomics knowledge from this <a href="https://riptutorial.com/cplusplus/example/24687/atomic-types">site</a> should be more than enough for most tasks.</li>
</ol>
<br />
<h2 style="text-align: center;">
<u><b>Why do we need asynchronous video capture</b></u></h2>
<br />
<ol>
<li><span style="color: blue;">Performance</span>, VideoCapture could be slow when capturing the frame from rtsp protocol, especially when the frame size is big, the ideal solution is capture the frame and process the frame in different thread.</li>
<li>Do not <span style="color: blue;">freeze</span> the ui. cv::waitKey is a blocking operation, it is a bad idea to use it directly in gui programming.</li>
</ol>
<br />
<h2 style="text-align: center;">
<u><b>Dependencies </b></u></h2>
<br />
<a href="https://opencv.org/releases/">opencv4</a>(using 4.3.0 in this tutorial)<br />
<a href="https://www.qt.io/download">Qt5</a>(using 5.13.2 in this tutorial) <br />
<br />
If you do not want to register a new account in order to download Qt5 (they did a great job of pissing off open source communities), you can try <a href="https://github.com/miurahr/aqtinstall">this link</a>.<br />
<br />
<br />
<h2 style="text-align: center;">
<u><b>Define interfaces for worker</b></u></h2>
In order to make it easier to switch the frame capturer in the future, I created a worker_base class for this purpose.<br />
<br />
<h4 style="text-align: center;">
<u>frame_capture_config.hpp</u></h4>
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#ifndef FRAME_CAPTURE_CONFIG_HPP</span>
<span style="color: #557799;">#define FRAME_CAPTURE_CONFIG_HPP</span>
<span style="color: #557799;">#include <QString></span>
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
<span style="color: #008800; font-weight: bold;">struct</span> frame_capture_config
{
<span style="color: #888888;">//True will copy the frame captured, useful if the functors</span>
<span style="color: #888888;">//you add work in different thread</span>
<span style="color: #333399; font-weight: bold;">bool</span> deep_copy_ <span style="color: #333333;">=</span> <span style="color: #007020;">false</span>;
<span style="color: #333399; font-weight: bold;">int</span> fps_ <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">30</span>;
QString url_;
};
}
<span style="color: #557799;">#endif </span><span style="color: #888888;">// FRAME_CAPTURE_CONFIG_HPP</span>
</pre>
</div>
<br />
<br />
<h4 style="text-align: center;">
<u>frame_worker_base.hpp</u></h4>
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#ifndef FRAME_WORKER_BASE_HPP</span>
<span style="color: #557799;">#define FRAME_WORKER_BASE_HPP</span>
<span style="color: #557799;">#include <opencv2/core.hpp></span>
<span style="color: #557799;">#include <QObject></span>
<span style="color: #557799;">#include <functional></span>
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
<span style="color: #008800; font-weight: bold;">struct</span> frame_capture_config;
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">worker_base</span> <span style="color: #333333;">:</span> <span style="color: #008800; font-weight: bold;">public</span> QObject
{
Q_OBJECT
<span style="color: #997700; font-weight: bold;">public:</span>
<span style="color: #008800; font-weight: bold;">explicit</span> worker_base(QObject <span style="color: #333333;">*</span>parent <span style="color: #333333;">=</span> nullptr);
<span style="color: #888888;">/**</span>
<span style="color: #888888;"> * Add listener to process the frame</span>
<span style="color: #888888;"> * @param functor Process the frame</span>
<span style="color: #888888;"> * @param key Key of the functor, we need it to remove the functor</span>
<span style="color: #888888;"> * @return True if able to add the listener and vice versa</span>
<span style="color: #888888;"> */</span>
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">bool</span> add_image_listener(std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span>(cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">></span> functor, <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key) <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> frame_capture_config get_params() <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> QString get_url() <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">bool</span> is_stop() <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #888888;">/**</span>
<span style="color: #888888;"> * This function will stop the frame capturer, release all functor etc</span>
<span style="color: #888888;"> */</span>
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> release() <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #888888;">/**</span>
<span style="color: #888888;"> * Remove the listener</span>
<span style="color: #888888;"> * @param key The key same as add_image_listener when you add the functor</span>
<span style="color: #888888;"> * @return True if able to remove the listener and vice versa</span>
<span style="color: #888888;"> * @warning Remember to remove_image_listener before the resources of the register functor</span>
<span style="color: #888888;"> * released, else the app may crash.</span>
<span style="color: #888888;"> */</span>
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">bool</span> remove_image_listener(<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key) <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> set_max_fps(<span style="color: #333399; font-weight: bold;">int</span> input) <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> set_params(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config) <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #888888;">/**</span>
<span style="color: #888888;"> * Will start the frame captured with the url set by the start_url api</span>
<span style="color: #888888;"> */</span>
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> start() <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> start_url(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>url) <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">virtual</span> <span style="color: #333399; font-weight: bold;">void</span> stop() <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #997700; font-weight: bold;">signals:</span>
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">cannot_open</span>(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>media_url);
};
}
<span style="color: #557799;">#endif </span><span style="color: #888888;">// FRAME_WORKER_BASE_HPP</span>
</pre>
</div>
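<p>Before looking at the opencv implementation, here is a hedged sketch of how a consumer could drive this interface; the url and the key variable are placeholders of mine, not part of the project:</p>
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">#include "frame_worker_base.hpp"
#include "frame_capture_config.hpp"
#include <QDebug>

void usage_sketch(frame_capture::worker_base &worker)
{
    frame_capture::frame_capture_config config;
    config.url_ = "rtsp://127.0.0.1:8554/live"; //hypothetical url
    config.fps_ = 30;
    config.deep_copy_ = false; //set to true if the functor hands the frame to another thread
    worker.set_params(config);

    static int key = 0; //any stable address works as the key of the listener
    worker.add_image_listener([](cv::Mat frame)
    {
        qDebug()<<"frame size ="<<frame.cols<<"x"<<frame.rows;
    }, &key);
    worker.start();

    //... later, remove the listener before the resources it captured are
    //released(see the warning in the header), then stop the worker
    worker.remove_image_listener(&key);
    worker.stop();
}
</pre>
</div>
<br />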
<br />
<br />
<h2 style="text-align: center;">
<u>Implement frame capture by opencv</u></h2>
<br />
Compared with other libraries like gstreamer, ffmpeg, libvlc, Qt etc., cv::VideoCapture has the simplest api for capturing frames, although the c++ api of this class does not work on android yet (you have to use jni, which makes porting the app to android more troublesome).<br />
<br />
<br />
<h4 style="text-align: center;">
<u>frame_capture_opencv_worker.hpp</u></h4>
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#ifndef FRAME_CAPTURED_OPENCV_WORKER_HPP</span>
<span style="color: #557799;">#define FRAME_CAPTURED_OPENCV_WORKER_HPP</span>
<span style="color: #557799;">#include "frame_worker_base.hpp"</span>
<span style="color: #557799;">#include <QObject></span>
<span style="color: #557799;">#include <opencv2/core.hpp></span>
<span style="color: #557799;">#include <opencv2/highgui.hpp></span>
<span style="color: #557799;">#include <opencv2/video.hpp></span>
<span style="color: #557799;">#include <atomic></span>
<span style="color: #557799;">#include <functional></span>
<span style="color: #557799;">#include <map></span>
<span style="color: #557799;">#include <mutex></span>
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">QTimer</span>;
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
<span style="color: #008800; font-weight: bold;">struct</span> frame_capture_config;
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">capture_opencv_worker</span> <span style="color: #333333;">:</span> <span style="color: #008800; font-weight: bold;">public</span> worker_base
{
Q_OBJECT
<span style="color: #997700; font-weight: bold;">public:</span>
<span style="color: #008800; font-weight: bold;">explicit</span> capture_opencv_worker(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config);
<span style="color: #333333;">~</span>capture_opencv_worker() override;
<span style="color: #333399; font-weight: bold;">bool</span> add_image_listener(std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span>(cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">></span> functor, <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key) override;
frame_capture_config get_params() <span style="color: #008800; font-weight: bold;">const</span> override;
QString get_url() <span style="color: #008800; font-weight: bold;">const</span> override;
<span style="color: #333399; font-weight: bold;">void</span> set_max_fps(<span style="color: #333399; font-weight: bold;">int</span> input) override;
<span style="color: #333399; font-weight: bold;">bool</span> is_stop() <span style="color: #008800; font-weight: bold;">const</span> override;
<span style="color: #333399; font-weight: bold;">void</span> release() override;
<span style="color: #333399; font-weight: bold;">bool</span> remove_image_listener(<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key) override;
<span style="color: #333399; font-weight: bold;">void</span> set_params(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config) override;
<span style="color: #333399; font-weight: bold;">void</span> start() override;
<span style="color: #333399; font-weight: bold;">void</span> start_url(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>url) override;
<span style="color: #333399; font-weight: bold;">void</span> stop() override;
<span style="color: #997700; font-weight: bold;">private:</span>
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">open_media</span>(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>media_url);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">captured_frame</span>();
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">set_max_fps_non_ts</span>(<span style="color: #333399; font-weight: bold;">int</span> input);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">time_out</span>();
cv<span style="color: #333333;">::</span>VideoCapture capture_;
<span style="color: #333399; font-weight: bold;">bool</span> deep_copy_ <span style="color: #333333;">=</span> <span style="color: #007020;">false</span>;
<span style="color: #333399; font-weight: bold;">int</span> frame_duration_ <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">30</span>;
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span><span style="color: #333333;">*</span>, std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span>(cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">>></span> functors_;
<span style="color: #333399; font-weight: bold;">int</span> max_fps_;
QString media_url_;
<span style="color: #008800; font-weight: bold;">mutable</span> std<span style="color: #333333;">::</span>mutex mutex_;
<span style="color: #333399; font-weight: bold;">bool</span> stop_;
QTimer <span style="color: #333333;">*</span>timer_ <span style="color: #333333;">=</span> nullptr;
<span style="color: #333399; font-weight: bold;">int</span> webcam_index_ <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
};
}
<span style="color: #557799;">#endif </span><span style="color: #888888;">// FRAME_CAPTURED_OPENCV_WORKER_HPP</span>
</pre>
</div>
<br />
<br />
<h4 style="text-align: center;">
<u><b>frame_capture_opencv_worker.cpp</b></u></h4>
<br />
Please read the comments carefully; they may help you avoid subtle bugs.<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#include "frame_capture_opencv_worker.hpp"</span>
<span style="color: #557799;">#include "frame_capture_config.hpp"</span>
<span style="color: #557799;">#include <QDebug></span>
<span style="color: #557799;">#include <QElapsedTimer></span>
<span style="color: #557799;">#include <QThread></span>
<span style="color: #557799;">#include <QTimer></span>
<span style="color: #557799;">#include <chrono></span>
<span style="color: #557799;">#include <thread></span>
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
capture_opencv_worker<span style="color: #333333;">::</span>capture_opencv_worker(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config) <span style="color: #333333;">:</span>
worker_base(),
deep_copy_(config.deep_copy_),
media_url_(config.url_),
stop_(<span style="color: #007020;">true</span>)
{
set_max_fps(config.fps_);
}
capture_opencv_worker<span style="color: #333333;">::~</span>capture_opencv_worker()
{
release();
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_opencv_worker<span style="color: #333333;">::</span>add_image_listener(std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span> (cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">></span> functor, <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key)
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">return</span> functors_.insert(std<span style="color: #333333;">::</span>make_pair(key, std<span style="color: #333333;">::</span>move(functor))).second;
}
frame_capture_config capture_opencv_worker<span style="color: #333333;">::</span>get_params() <span style="color: #008800; font-weight: bold;">const</span>
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
frame_capture_config config;
config.deep_copy_ <span style="color: #333333;">=</span> deep_copy_;
config.fps_ <span style="color: #333333;">=</span> max_fps_;
config.url_ <span style="color: #333333;">=</span> media_url_;
<span style="color: #008800; font-weight: bold;">return</span> config;
}
QString capture_opencv_worker<span style="color: #333333;">::</span>get_url() <span style="color: #008800; font-weight: bold;">const</span>
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">return</span> media_url_;
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>set_max_fps(<span style="color: #333399; font-weight: bold;">int</span> input)
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
set_max_fps_non_ts(input);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>open_media(<span style="color: #008800; font-weight: bold;">const</span> QString <span style="color: #333333;">&</span>media_url)
{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": "</span><span style="color: #333333;"><<</span>media_url;
<span style="color: #333399; font-weight: bold;">bool</span> can_convert_to_int <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>;
<span style="color: #008800; font-weight: bold;">if</span>(timer_){
timer_<span style="color: #333333;">-></span>stop();
}
media_url.toInt(<span style="color: #333333;">&</span>can_convert_to_int);
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">true</span>;
try{
capture_.release();
<span style="color: #888888;">//If you pass in int, opencv will open webcam if it could</span>
<span style="color: #008800; font-weight: bold;">if</span>(can_convert_to_int){
capture_.open(media_url.toInt());
}<span style="color: #008800; font-weight: bold;">else</span>{
capture_.open(media_url.toStdString());
}
}<span style="color: #008800; font-weight: bold;">catch</span>(std<span style="color: #333333;">::</span>exception <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ex){
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span>ex.what();
}
<span style="color: #008800; font-weight: bold;">if</span>(capture_.isOpened()){
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">false</span>;
}<span style="color: #008800; font-weight: bold;">else</span>{
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">true</span>;
emit <span style="color: #0066bb; font-weight: bold;">cannot_open</span>(media_url);
}
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_opencv_worker<span style="color: #333333;">::</span>is_stop() <span style="color: #008800; font-weight: bold;">const</span>
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">return</span> stop_;
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>release()
{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": delete cam with url = "</span><span style="color: #333333;"><<</span>media_url_;
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": enter lock region"</span>;
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">true</span>;
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": clear functor"</span>;
functors_.clear();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": release capture"</span>;
capture_.release();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": delete timer later"</span>;
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_opencv_worker<span style="color: #333333;">::</span>remove_image_listener(<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key)
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">return</span> functors_.erase(key) <span style="color: #333333;">></span> <span style="color: #0000dd; font-weight: bold;">0</span>;
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>set_params(<span style="color: #008800; font-weight: bold;">const</span> frame_capture_config <span style="color: #333333;">&</span>config)
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
deep_copy_ <span style="color: #333333;">=</span> config.deep_copy_;
media_url_ <span style="color: #333333;">=</span> config.url_;
set_max_fps_non_ts(config.fps_);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>start()
{
start_url(media_url_);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>start_url(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>url)
{
stop();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": stop = "</span><span style="color: #333333;"><<</span>stop_<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">", url = "</span><span style="color: #333333;"><<</span>url;
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
open_media(url);
<span style="color: #008800; font-weight: bold;">if</span>(capture_.isOpened()){
media_url_ <span style="color: #333333;">=</span> url;
captured_frame();
}
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>stop()
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">true</span>;
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": stop the worker = "</span><span style="color: #333333;"><<</span>stop_;
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>captured_frame()
{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": capture_.isOpened()"</span>;
<span style="color: #888888;">//You must initialize and delete timer in the same thread</span>
<span style="color: #888888;">//of the VideoCapture running, else you may trigger undefined</span>
<span style="color: #888888;">//behavior</span>
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>timer_){
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": init timer"</span>;
timer_ <span style="color: #333333;">=</span> <span style="color: #008800; font-weight: bold;">new</span> QTimer;
timer_<span style="color: #333333;">-></span>setSingleShot(<span style="color: #007020;">true</span>);
connect(timer_, <span style="color: #333333;">&</span>QTimer<span style="color: #333333;">::</span>timeout, <span style="color: #008800; font-weight: bold;">this</span>, <span style="color: #333333;">&</span>capture_opencv_worker<span style="color: #333333;">::</span>time_out);
}
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": start timer"</span>;
timer_<span style="color: #333333;">-></span>start();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": called start timer"</span>;
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>set_max_fps_non_ts(<span style="color: #333399; font-weight: bold;">int</span> input)
{
max_fps_ <span style="color: #333333;">=</span> std<span style="color: #333333;">::</span>max(input, <span style="color: #0000dd; font-weight: bold;">1</span>);
frame_duration_ <span style="color: #333333;">=</span> std<span style="color: #333333;">::</span>max(<span style="color: #0000dd; font-weight: bold;">1000</span> <span style="color: #333333;">/</span> max_fps_, <span style="color: #0000dd; font-weight: bold;">1</span>);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_opencv_worker<span style="color: #333333;">::</span>time_out()
{
QElapsedTimer elapsed;
elapsed.start();
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>stop_ <span style="color: #333333;">&&</span> timer_){
capture_.grab();
cv<span style="color: #333333;">::</span>Mat frame;
capture_.retrieve(frame);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>frame.empty()){
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #333333;">&</span>iter <span style="color: #333333;">:</span> functors_){
iter.second(deep_copy_ <span style="color: #333333;">?</span> frame.clone() <span style="color: #333333;">:</span> frame);
}
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> interval <span style="color: #333333;">=</span> frame_duration_ <span style="color: #333333;">-</span> elapsed.elapsed();
timer_<span style="color: #333333;">-></span>start(std<span style="color: #333333;">::</span>max(<span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">int</span><span style="color: #333333;">></span>(interval), <span style="color: #0000dd; font-weight: bold;">10</span>));
}<span style="color: #008800; font-weight: bold;">else</span>{
open_media(media_url_);
timer_<span style="color: #333333;">-></span>start();
}
}<span style="color: #008800; font-weight: bold;">else</span>{
capture_.release();
<span style="color: #008800; font-weight: bold;">if</span>(timer_){
<span style="color: #888888;">//you must delete timer in the thread where you initiaize it</span>
<span style="color: #008800; font-weight: bold;">delete</span> timer_;
timer_ <span style="color: #333333;">=</span> nullptr;
}
}
}
}
</pre>
</div>
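<br />
Just to double check the fps math in set_max_fps_non_ts and time_out above, here is a small worked example; the variables below are local copies, only for illustration.<br />
<br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">//a worked example of the fps math, the names are local copies for illustration
int const max_fps = std::max(25, 1);                    //request 25 fps
int const frame_duration = std::max(1000 / max_fps, 1); //40 ms per frame
//if grab + retrieve + the listeners took 15 ms, time_out restarts the timer
//after std::max(40 - 15, 10) == 25 ms, so the effective rate stays near 25 fps;
//the clamp to 10 ms prevents a slow listener from starving the event loop
</pre>
</div>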
<br />
<br />
<h2 style="text-align: center;">
<u>Create controller associate with the worker</u></h2>
<br />
<h4 style="text-align: center;">
<u>frame_capture_controller.hpp</u></h4>
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#ifndef FRAME_CAPTURE_OPENCV_CONTROLLER_HPP</span>
<span style="color: #557799;">#define FRAME_CAPTURE_OPENCV_CONTROLLER_HPP</span>
<span style="color: #557799;">#include <QObject></span>
<span style="color: #557799;">#include <QThread></span>
<span style="color: #557799;">#include <QVariant></span>
<span style="color: #557799;">#include <opencv2/core.hpp></span>
<span style="color: #557799;">#include <functional></span>
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
<span style="color: #008800; font-weight: bold;">struct</span> frame_capture_config;
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">worker_base</span>;
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">capture_controller</span> <span style="color: #333333;">:</span> <span style="color: #008800; font-weight: bold;">public</span> QObject
{
Q_OBJECT
<span style="color: #997700; font-weight: bold;">public:</span>
<span style="color: #008800; font-weight: bold;">explicit</span> capture_controller(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config);
<span style="color: #333333;">~</span>capture_controller() override;
<span style="color: #333399; font-weight: bold;">bool</span> <span style="color: #0066bb; font-weight: bold;">add_image_listener</span>(std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span>(cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">></span> functor, <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key);
frame_capture_config get_params() <span style="color: #008800; font-weight: bold;">const</span>;
QString get_url() <span style="color: #008800; font-weight: bold;">const</span>;
<span style="color: #333399; font-weight: bold;">bool</span> is_stop() <span style="color: #008800; font-weight: bold;">const</span>;
<span style="color: #333399; font-weight: bold;">bool</span> <span style="color: #0066bb; font-weight: bold;">remove_image_listener</span>(<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">set_max_fps</span>(<span style="color: #333399; font-weight: bold;">int</span> input);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">set_params</span>(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">stop</span>();
<span style="color: #997700; font-weight: bold;">signals:</span>
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">cannot_open</span>(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>media_url);
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">reach_the_end</span>();
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">start</span>();
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">start_url</span>(QString <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>url);
<span style="color: #997700; font-weight: bold;">private:</span>
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">init_frame_capture</span>();
worker_base <span style="color: #333333;">*</span>frame_capture_;
QThread thread_;
};
}
<span style="color: #557799;">#endif </span><span style="color: #888888;">// FRAME_CAPTURE_OPENCV_CONTROLLER_HPP</span>
</pre>
</div>
<br />
<br />
<h4 style="text-align: center;">
<u>frame_capture_controller.cpp</u></h4>
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#include "frame_capture_controller.hpp"</span>
<span style="color: #557799;">#include "frame_capture_config.hpp"</span>
<span style="color: #557799;">#include "frame_capture_opencv_worker.hpp"</span>
<span style="color: #557799;">#include <QDebug></span>
<span style="color: #008800; font-weight: bold;">namespace</span> frame_capture{
<span style="color: #333399; font-weight: bold;">void</span> capture_controller<span style="color: #333333;">::</span>init_frame_capture()
{
frame_capture_<span style="color: #333333;">-></span>moveToThread(<span style="color: #333333;">&</span>thread_);
connect(<span style="color: #333333;">&</span>thread_, <span style="color: #333333;">&</span>QThread<span style="color: #333333;">::</span>finished, frame_capture_, <span style="color: #333333;">&</span>QObject<span style="color: #333333;">::</span>deleteLater);
connect(frame_capture_, <span style="color: #333333;">&</span>worker_base<span style="color: #333333;">::</span>cannot_open, <span style="color: #008800; font-weight: bold;">this</span>, <span style="color: #333333;">&</span>capture_controller<span style="color: #333333;">::</span>cannot_open);
connect(<span style="color: #008800; font-weight: bold;">this</span>, <span style="color: #333333;">&</span>capture_controller<span style="color: #333333;">::</span>start, frame_capture_, <span style="color: #333333;">&</span>worker_base<span style="color: #333333;">::</span>start);
connect(<span style="color: #008800; font-weight: bold;">this</span>, <span style="color: #333333;">&</span>capture_controller<span style="color: #333333;">::</span>start_url, frame_capture_, <span style="color: #333333;">&</span>worker_base<span style="color: #333333;">::</span>start_url);
thread_.start();
}
capture_controller<span style="color: #333333;">::</span>capture_controller(frame_capture_config <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>config) <span style="color: #333333;">:</span>
QObject(),
frame_capture_(<span style="color: #008800; font-weight: bold;">new</span> capture_opencv_worker(config))
{
init_frame_capture();
}
capture_controller<span style="color: #333333;">::~</span>capture_controller()
{
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": quit"</span>;
<span style="color: #888888;">//must called release or before quit and wait, else the</span>
<span style="color: #888888;">//frame capture will fall into infinite loop</span>
frame_capture_<span style="color: #333333;">-></span>release();
thread_.quit();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": wait"</span>;
thread_.wait();
qDebug()<span style="color: #333333;"><<</span>__func__<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">": wait exit"</span>;
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_controller<span style="color: #333333;">::</span>add_image_listener(std<span style="color: #333333;">::</span>function<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">void</span> (cv<span style="color: #333333;">::</span>Mat)<span style="color: #333333;">></span> functor, <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key)
{
<span style="color: #008800; font-weight: bold;">return</span> frame_capture_<span style="color: #333333;">-></span>add_image_listener(std<span style="color: #333333;">::</span>move(functor), key);
}
frame_capture_config capture_controller<span style="color: #333333;">::</span>get_params() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> frame_capture_<span style="color: #333333;">-></span>get_params();
}
QString capture_controller<span style="color: #333333;">::</span>get_url() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> frame_capture_<span style="color: #333333;">-></span>get_url();
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_controller<span style="color: #333333;">::</span>is_stop() <span style="color: #008800; font-weight: bold;">const</span>
{
<span style="color: #008800; font-weight: bold;">return</span> frame_capture_<span style="color: #333333;">-></span>is_stop();
}
<span style="color: #333399; font-weight: bold;">bool</span> capture_controller<span style="color: #333333;">::</span>remove_image_listener(<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #333333;">*</span>key)
{
<span style="color: #008800; font-weight: bold;">return</span> frame_capture_<span style="color: #333333;">-></span>remove_image_listener(key);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_controller<span style="color: #333333;">::</span>set_max_fps(<span style="color: #333399; font-weight: bold;">int</span> input)
{
frame_capture_<span style="color: #333333;">-></span>set_max_fps(input);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_controller<span style="color: #333333;">::</span>set_params(<span style="color: #008800; font-weight: bold;">const</span> frame_capture_config <span style="color: #333333;">&</span>config)
{
frame_capture_<span style="color: #333333;">-></span>set_params(config);
}
<span style="color: #333399; font-weight: bold;">void</span> capture_controller<span style="color: #333333;">::</span>stop()
{
frame_capture_<span style="color: #333333;">-></span>stop();
}
}
</pre>
</div>
<br />
<br />
<h2 style="text-align: center;">
<u>Do we need mutex?</u></h2>
With the current solution, yes, we need it. But we could avoid the mutex if we declared the other APIs as signals and connected them to the worker, just like the start and start_url signals of frame_capture_controller do. <br />
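Here is a minimal sketch of what that could look like for set_max_fps; note this is a hypothetical change to the classes above, not what the codes actually do.<br />
<br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">//in capture_controller, declare a signal instead of touching the worker directly
signals:
    void max_fps_changed(int input);

//in init_frame_capture(), the cross thread connection queues the call onto the
//worker thread, so max_fps_ is only ever touched in that thread and needs no mutex
connect(this, &amp;capture_controller::max_fps_changed,
        frame_capture_, &amp;worker_base::set_max_fps);

//the public api now only emits
void capture_controller::set_max_fps(int input)
{
    emit max_fps_changed(input);
}
</pre>
</div>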
<br />
<h2 style="text-align: center;">
<u>How to use it?</u></h2>
You can find the answer in this file--mainwindow.cpp. For simplicity, I did not put the functor into another thread.<br />
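In case the link goes stale, here is a minimal sketch of the typical flow; I assume frame_capture_config only carries the fields shown earlier (url_, fps_, deep_copy_).<br />
<br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">frame_capture::frame_capture_config config;
config.url_ = "0";        //an integer string opens the webcam at that index
config.fps_ = 30;
config.deep_copy_ = false;

frame_capture::capture_controller controller(config);
controller.add_image_listener([](cv::Mat frame)
{
    //keep this short, it runs in the capture thread
}, &amp;controller);
//start is a signal, the worker will pick it up in its own thread
emit controller.start();
</pre>
</div>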
<br />
<h2 style="text-align: center;">
<u>Example</u></h2>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjB-AB1VnJ2mNHsXfdONVl6UU0r2QUAsRTWX47xH80__VtIO-blwU5LdQPBQCdUhxtr0zpvxqvILRX_JkhyiF67pcyZvMnA_JLgtDTbL6dDpvr-M0n2zp-KCk0UOVe4y9-UzZZuYjkpvI/s1600/Capture.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="498" data-original-width="754" height="211" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjB-AB1VnJ2mNHsXfdONVl6UU0r2QUAsRTWX47xH80__VtIO-blwU5LdQPBQCdUhxtr0zpvxqvILRX_JkhyiF67pcyZvMnA_JLgtDTbL6dDpvr-M0n2zp-KCk0UOVe4y9-UzZZuYjkpvI/s320/Capture.PNG" width="320" /></a></div>
<br />
<br />
<br />
<br />
<br />
<h2 style="text-align: center;">
<u>Warning</u></h2>
Remember to remove the listener from the frame capture if the resources it is associated with will be deleted. <br />
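Here is a minimal sketch of one way to enforce this, by tying the lifetime of the listener to the object which owns the resources; frame_consumer is a hypothetical class, not part of the codes above.<br />
<br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">class frame_consumer
{
public:
    explicit frame_consumer(frame_capture::capture_controller &amp;controller) :
        controller_(controller)
    {
        controller_.add_image_listener([this](cv::Mat frame)
        {
            //consume the frame
        }, this);
    }
    ~frame_consumer()
    {
        //remove the listener before this object dies, else the capture
        //thread may call into freed memory
        controller_.remove_image_listener(this);
    }
private:
    frame_capture::capture_controller &amp;controller_;
};
</pre>
</div>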
<br />
<br />
<br />
<br />
<br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-61268155743263078812019-03-27T02:13:00.001-07:002021-02-26T08:35:56.460-08:00Asynchronous computer vision algorithm In the last post I introduced how to <a href="https://qtandopencv.blogspot.com/2019/03/asynchronous-videocapture-of-opencv.html">create an asynchronous class to capture frames by cv::VideoCapture</a>; today I will show you how to create an asynchronous algorithm which can be called many times without spawning a new thread. <br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u>Main flow of async_to_gray_algo</u></span> </h2>
async_to_gray_algo is a small class which converts an image from BGR
channels to a gray image in another thread. If you have used any thread
pool library before, you will find they all use similar logic under the hood,
just with a more generic, flexible API.<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">async_to_gray_algo<span style="color: #333333;">::</span>async_to_gray_algo(cv<span style="color: #333333;">::</span>Mat <span style="color: #333333;">&</span>result, std<span style="color: #333333;">::</span>mutex <span style="color: #333333;">&</span>result_mutex) <span style="color: #333333;">:</span>
result_(result),
result_mutex_(result_mutex),
stop_(<span style="color: #007020;">false</span>)
{
<span style="color: #008800; font-weight: bold;">auto</span> func <span style="color: #333333;">=</span> [<span style="color: #333333;">&</span>]()
{
<span style="color: #888888;">//1. In order to reuse the thread, we need to keep it alive</span>
<span style="color: #888888;">//that is why we should put it in an infinite for loop</span>
<span style="color: #008800; font-weight: bold;">for</span>(;;){
unique_lock<span style="color: #333333;"><</span>mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #888888;">//2. use condition_variable to replace sleep(x milliseconds) is more efficient</span>
wait_.wait(lock, [<span style="color: #333333;">&</span>]() <span style="color: #888888;">//wait_ will acquire the lock if condition satisfied</span>
{
<span style="color: #008800; font-weight: bold;">return</span> stop_ <span style="color: #333333;">||</span> <span style="color: #333333;">!</span>input_.empty();
});
<span style="color: #888888;">//3. stop the thread in destructor</span>
<span style="color: #008800; font-weight: bold;">if</span>(stop_){
<span style="color: #008800; font-weight: bold;">return</span>;
}
<span style="color: #888888;">//4. convert and write the results into result_</span>
<span style="color: #888888;">//we need gmutex to synchronize the result_, else it may incur</span>
<span style="color: #888888;">//race condition in the main thread.</span>
{
lock_guard<span style="color: #333333;"><</span>mutex<span style="color: #333333;">></span> glock(result_mutex);
cv<span style="color: #333333;">::</span>cvtColor(input_, result_, COLOR_BGR2GRAY);
}
<span style="color: #888888;">//5: clear the input_, else the wait_ variable may wake up and continue the task</span>
<span style="color: #888888;">//due to spurious wake up</span>
input_.resize(<span style="color: #0000dd; font-weight: bold;">0</span>);
}
};
thread_ <span style="color: #333333;">=</span> std<span style="color: #333333;">::</span><span style="color: #008800; font-weight: bold;">thread</span>(func);
}
</pre>
</div>
<br />
After we initialize the thread, all we need to do is call the process api whenever we need to convert an image from BGR channels to a gray image.<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> async_to_gray_algo<span style="color: #333333;">::</span>process(Mat input)
{
{
lock_guard<span style="color: #333333;"><</span>mutex<span style="color: #333333;">></span> lock(mutex_);
input_ <span style="color: #333333;">=</span> input;
}</pre>
<pre style="line-height: 125%; margin: 0px;"> //wait condition will acquire the mutex after it receive notification </pre>
<pre style="line-height: 125%; margin: 0px;"> wait_.notify_one();
}
</pre>
</div>
<br />
If we do not need this class anymore, we can and should stop it in the destructor. Following the rule of RAII whenever you can is a best practice that keeps your code clean, robust and (much) easier to maintain (let the machine do the bookkeeping for humans).<br />
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">async_to_gray_algo<span style="color: #333333;">::~</span>async_to_gray_algo()
{
{
lock_guard<span style="color: #333333;"><</span>mutex<span style="color: #333333;">></span> lock(mutex_);
stop_ <span style="color: #333333;">=</span> <span style="color: #007020;">true</span>;
}
wait_.notify_one();
thread_.join();
}
</pre>
</div>
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><b><u>What is spurious wake up?</u></b></span></h2>
It means the condition_variable may wake up even though no notification (notify_one or notify_all) happened. This is one of the reasons why we should not wait without a predicate (another reason is the lost wakeup).<br />
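Here is a minimal sketch of the difference, reusing the members of async_to_gray_algo.<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">std::unique_lock&lt;std::mutex&gt; lock(mutex_);
//unsafe, it may return on a spurious wakeup, or block forever if the
//notification fired before we started waiting(a lost wakeup)
wait_.wait(lock);
//safe, the predicate is rechecked every time the thread wakes up
wait_.wait(lock, [&amp;](){ return stop_ || !input_.empty(); });
</pre>
</div>
<br />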
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Do we have a better way to reuse the thread?</b></u></span></h2>
Yes, we do. The easiest solution is to create a generic thread pool; you can check the code of a simple thread pool <a href="https://github.com/stereomatchingkiss/ocv_libs/tree/master/thread">here</a>. I will show you how to use it in the future. <br />
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Better way to pass the variable between different thread?</b></u></span></h2>
As you can see, the way I communicate between the main thread and the worker thread is awkward; it would be hell to maintain source code like that as your program grows bigger and bigger. Fortunately, we have a better way to pass variables between threads with the help of <a href="https://www.qt.io/">Qt5</a>: its signal and slot mechanism. Not to mention, Qt5 makes the code much easier to maintain.<br />
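Here is a minimal sketch of how the Qt5 version could look; to_gray_worker is a hypothetical class, and cv::Mat must be registered as a metatype before it can travel through a queued(cross thread) connection.<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">#include &lt;QObject&gt;
#include &lt;opencv2/imgproc.hpp&gt;

class to_gray_worker : public QObject
{
    Q_OBJECT
public slots:
    void process(cv::Mat input)
    {
        cv::Mat result;
        cv::cvtColor(input, result, cv::COLOR_BGR2GRAY);
        emit done(result);
    }
signals:
    void done(cv::Mat result);
};
Q_DECLARE_METATYPE(cv::Mat)

int main(int, char**)
{
    //register cv::Mat before the first queued emission crosses a thread
    qRegisterMetaType&lt;cv::Mat&gt;();
    //create a QThread, moveToThread the worker, connect the signals as usual...
    return 0;
}
</pre>
</div>
<br />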
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Summary </b></u></span></h2>
<span style="color: blue;"><span style="color: #666666;"><span style="color: black;"> The source codes of async_opencv_video_capture could find on <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/async_vision_algorithm">github</a>.</span></span></span><br /><br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-37266269867354007552019-03-23T05:51:00.001-07:002021-02-26T08:36:07.141-08:00Asynchronous videoCapture of opencv Today I would like to introduce how to create an asynchronous videoCapture by opencv and standard library of c++. Captured video from HD video, especially the HD video from internet could be a time consuming task, it is not a good idea to waste the cpu cycle to wait the frame arrive, in order to speed up our app, or keep the gui alive, we better put the video capture part into another thread.<br />
<br />
With the help of the thread facilities added since c++11, making the VideoCapture of opencv support cross platform asynchronous read operations becomes a simple task, <span style="color: #00677c;">let us look at a simple example.</span><br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #557799;">#include <ocv_libs/camera/async_opencv_video_capture.hpp></span>
<span style="color: #557799;">#include <opencv2/core.hpp></span>
<span style="color: #557799;">#include <opencv2/highgui.hpp></span>
<span style="color: #557799;">#include <iostream></span>
<span style="color: #557799;">#include <mutex></span>
<span style="color: #333399; font-weight: bold;">int</span> <span style="color: #0066bb; font-weight: bold;">main</span>(<span style="color: #333399; font-weight: bold;">int</span> argc, <span style="color: #333399; font-weight: bold;">char</span> <span style="color: #333333;">*</span>argv[])
{
<span style="color: #008800; font-weight: bold;">if</span>(argc <span style="color: #333333;">!=</span> <span style="color: #0000dd; font-weight: bold;">2</span>){
std<span style="color: #333333;">::</span>cerr<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"must enter url of media</span><span style="background-color: #fff0f0; color: #666666; font-weight: bold;">\n</span><span style="background-color: #fff0f0;">"</span>;
<span style="color: #008800; font-weight: bold;">return</span> <span style="color: #333333;">-</span><span style="color: #0000dd; font-weight: bold;">1</span>;
}
std<span style="color: #333333;">::</span>mutex emutex;
<span style="color: #888888;">//create the functor to handle the exception when cv::VideoCapture fail</span>
<span style="color: #888888;">//to capture the frame and wait 30 msec between each frame</span>
<span style="color: #333399; font-weight: bold;">long</span> <span style="color: #333399; font-weight: bold;">long</span> constexpr wait_msec <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">30</span>;
ocv<span style="color: #333333;">::</span>camera<span style="color: #333333;">::</span>async_opencv_video_capture<span style="color: #333333;"><></span> cl([<span style="color: #333333;">&</span>](std<span style="color: #333333;">::</span>exception <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ex)
{
<span style="color: #888888;">//cerr of c++ is not a thread safe class, so we need to lock the mutex</span>
std<span style="color: #333333;">::lock_guard</span><span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(emutex);
std<span style="color: #333333;">::</span>cerr<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"camera exception:"</span><span style="color: #333333;"><<</span>ex.what()<span style="color: #333333;"><<</span>std<span style="color: #333333;">::</span>endl;
<span style="color: #008800; font-weight: bold;">return</span> <span style="color: #007020;">true</span>;
}, wait_msec);
cl.open_url(argv[<span style="color: #0000dd; font-weight: bold;">1</span>]);
<span style="color: #888888;">//add listener to process captured frame</span>
<span style="color: #888888;">//the listener could process the task in another thread too,</span>
<span style="color: #888888;">//to make things easier to explain, I prefer to process it in</span>
<span style="color: #888888;">//the same thread of videoCapture</span>
cv<span style="color: #333333;">::</span>Mat img;
cl.add_listener([<span style="color: #333333;">&</span>](cv<span style="color: #333333;">::</span>Mat input)
{
std<span style="color: #333333;">::</span><span style="color: #333333;">lock_guard</span><span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(emutex);
img <span style="color: #333333;">=</span> input;
}, <span style="color: #333333;">&</span>emutex);
<span style="color: #888888;">//execute the task(s)</span>
cl.run();
<span style="color: #888888;">//We must display the captured image at main thread but not</span>
<span style="color: #888888;">//in the listener, because every manipulation related to gui</span>
<span style="color: #888888;">//must perform in the main thread(it also called gui thread)</span>
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">int</span> finished <span style="color: #333333;">=</span> <span style="color: #007020;">false</span>; finished <span style="color: #333333;">!=</span> <span style="color: #0044dd;">'q'</span>;){
finished <span style="color: #333333;">=</span> std<span style="color: #333333;">::</span>tolower(cv<span style="color: #333333;">::</span>waitKey(<span style="color: #0000dd; font-weight: bold;">30</span>));
std<span style="color: #333333;">::</span><span style="color: #333333;">lock_guard</span><span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>mutex<span style="color: #333333;">></span> lock(emutex);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>img.empty()){
cv<span style="color: #333333;">::</span>imshow(<span style="background-color: #fff0f0;">"frame"</span>, img);
}
}
}
</pre>
</div>
<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Important details of async_opencv_video_capture</b></u></span></h2>
<div style="text-align: center;">
<span style="color: #0b5394;"><b><span style="color: blue;"><u>1. Create an infinite for loop to read the frame in another thread</u></span></b></span></div>
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">run</span>()
{
<span style="color: #008800; font-weight: bold;">if</span>(thread_){
<span style="color: #888888;">//before we start the thread,</span>
<span style="color: #888888;">//we need to stop it</span>
set_stop(<span style="color: #007020;">true</span>);
<span style="color: #888888;">//call join before task(s)</span>
<span style="color: #888888;">//of the thread done</span>
thread_<span style="color: #333333;">-></span>join();
set_stop(<span style="color: #007020;">false</span>);
}
<span style="color: #888888;">//create a new thread</span>
create_thread();
}
</pre>
</div>
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"> <span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">create_thread</span>()
{
thread_ <span style="color: #333333;">=</span> std<span style="color: #333333;">::</span>make_unique<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span><span style="color: #008800; font-weight: bold;">thread</span><span style="color: #333333;">></span>([<span style="color: #008800; font-weight: bold;">this</span>]()
{
<span style="color: #888888;">//read the frames in infinite for loop</span>
<span style="color: #008800; font-weight: bold;">for</span>(cv<span style="color: #333333;">::</span>Mat frame;;){
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>Mutex<span style="color: #333333;">></span> lock(mutex_);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>stop_ <span style="color: #333333;">&&</span> <span style="color: #333333;">!</span>listeners_.empty()){
try{
cap_<span style="color: #333333;">>></span>frame;
}<span style="color: #008800; font-weight: bold;">catch</span>(std<span style="color: #333333;">::</span>exception <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ex){
<span style="color: #888888;">//reopen the camera if exception thrown ,this may happen frequently when you</span>
<span style="color: #888888;">//receive frames from network</span>
cap_.open(url_);
cam_exception_listener_(ex);
}
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>frame.empty()){
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #333333;">&</span>val <span style="color: #333333;">:</span> listeners_){
val.second(frame);
}
}<span style="color: #008800; font-weight: bold;">else</span>{
<span style="color: #008800; font-weight: bold;">if</span>(replay_){
cap_.open(url_);
}<span style="color: #008800; font-weight: bold;">else</span>{
<span style="color: #008800; font-weight: bold;">break</span>;
}
}
std<span style="color: #333333;">::</span>this_thread<span style="color: #333333;">::</span>sleep_for(wait_for_);
}<span style="color: #008800; font-weight: bold;">else</span>{
<span style="color: #008800; font-weight: bold;">break</span>;
}
}
});
}
</pre>
</div>
<br />
The listeners_ member stores the std::function&lt;void(cv::Mat)&gt; callbacks, keyed by the pointer passed to add_listener, to be called in the infinite loop whenever the frame read by the VideoCapture is not empty. The users must handle the exceptions thrown by those functors by themselves, else the app will crash.<br />
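For instance, here is a minimal sketch of a self-guarding listener, reusing the cl and emutex of the example above; listener_key only exists to act as a unique key.<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">int listener_key = 0;
cl.add_listener([&amp;](cv::Mat input)
{
    try{
        //do the real work with input here
    }catch(std::exception const &amp;ex){
        //the capture loop will not catch exceptions thrown by listeners
        std::lock_guard&lt;std::mutex&gt; lock(emutex);
        std::cerr&lt;&lt;"listener exception:"&lt;&lt;ex.what()&lt;&lt;std::endl;
    }
}, &amp;listener_key);
</pre>
</div>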
<br />
<div style="text-align: center;">
<span style="color: #0b5394;"><u><span style="color: blue;">2. Stop the thread in the destructor</span></u></span></div>
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">set_stop</span>(<span style="color: #333399; font-weight: bold;">bool</span> val)
{
std<span style="color: #333333;">::</span>lock_guard<span style="color: #333333;"><</span>Mutex<span style="color: #333333;">></span> lock(mutex_);
stop_ <span style="color: #333333;">=</span> val;
}
<span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">stop</span>()
{
set_stop(<span style="color: #007020;">true</span>);
}
<span style="color: #008800; font-weight: bold;">template</span><span style="color: #333333;"><</span><span style="color: #008800; font-weight: bold;">typename</span> Mutex<span style="color: #333333;">></span>
async_opencv_video_capture<span style="color: #333333;"><</span>Mutex<span style="color: #333333;">>::~</span>async_opencv_video_capture()
{
stop();
thread_<span style="color: #333333;">-></span>join();
}
</pre>
</div>
<br />
<br />
We must stop and join the thread in the destructor, else the thread may never end and cause the app freeze.<br />
<br />
<div style="text-align: center;">
<span style="color: #0b5394;"><u><b><span style="color: blue;">3. Select mutex type by template</span></b></u></span></div>
<br />
By default, async_opencv_video_capture uses std::mutex; it is more efficient but may cause a deadlock if you call the API of async_opencv_video_capture from within the listeners. If you want to avoid this deadlock issue, use std::recursive_mutex instead of std::mutex.<br />
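Here is a minimal sketch of the instantiation, following the constructor signature used in the example above.<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">//the default std::mutex is cheaper, but a listener which calls back into the
//api of the same object would deadlock, so we pick std::recursive_mutex here
ocv::camera::async_opencv_video_capture&lt;std::recursive_mutex&gt; cl(
[](std::exception const &amp;ex)
{
    std::cerr&lt;&lt;"camera exception:"&lt;&lt;ex.what()&lt;&lt;std::endl;
    return true;
}, 30);
//now a listener may safely call the api of cl, e.g. cl.stop(), inside the callback
</pre>
</div>
<br />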
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Summary </b></u></span></h2>
<span style="color: blue;"><span style="color: #666666;"><span style="color: black;"> The source codes of async_opencv_video_capture could find on <a href="https://github.com/stereomatchingkiss/ocv_libs/tree/master/camera">github</a>.</span></span></span><br /><br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-78850447376645078142019-03-09T17:07:00.002-08:002019-03-09T17:07:18.917-08:00Build mxnet 1.3.1 on windows If you were like me, tried to build mxnet 1.3.1 on windows, you may suffer a lot of pains since mxnet do not have decent support on windows, apparently the developers of mxnet do not perform enough tests(maybe none) on windows before they release the stable version. Despite of all of the troubles mxnet brought, it is still a nice tool of deep learning, that is why I am still prefer to work with it.<br />
<br />
I believe one of the best ways to make an open source project better is to contribute something back to it; that is why I would like to write down how to build mxnet 1.3.1 on windows step by step.<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u>1. Do not build mxnet on windows with intel mkl</u></span></h2>
Do not do this unless you are asking for trouble; please check the details on <a href="https://stackoverflow.com/questions/55017740/mxnet-build-with-intel-mkl-always-throw-error-intel-mkl-fatal-error-cannot-loa">stackoverflow</a> and <a href="https://github.com/apache/incubator-mxnet/issues/14343#issuecomment-470043731">issue 14343</a>.<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>2. Build openBLAS with native msvc ABI</b></u></span></h2>
<br />
The openBLAS binaries posted <a href="https://sourceforge.net/projects/openblas/files/v0.2.14/">here</a> do not work with vc2015 anymore (if you updated your vc2015); the ABI is not compatible with msvc. The easiest way to solve this issue is to build openBLAS by yourself. The steps are:<br />
<br />
a. Clone openBLAS of xianyi from <a href="https://github.com/xianyi/OpenBLAS">github</a><br />
b. Compile openBLAS following the instructions shown <a href="https://github.com/xianyi/OpenBLAS/wiki/How-to-use-OpenBLAS-in-Microsoft-Visual-Studio">here</a>. Do not install Anaconda and miniconda together; just pick one of them. If you do not know where your vcvars64.bat is on your pc, I suggest you use <a href="https://filehippo.com/download_everything/">Everything</a> to find the path.<br />
c. Copy the files (cblas.h, f77blas.h) from the generated folder into the build folder.<br />
<h2 style="text-align: center;">
<u><span style="color: blue;">3. Clone mxnet fork by me</span></u></h2>
<br />
<pre><code>git clone --recursive https://github.com/stereomatchingkiss/incubator-mxnet</code></pre>
<pre><code>cd mxnet </code></pre>
<pre><code>git checkout 1.3.1_win_compile_fix</code></pre>
<pre><code><span style="font-family: Times, &quot;Times New Roman&quot;, serif;"> This branch fixes some type mismatch errors.</span></code></pre>
<div style="text-align: center;">
<h2>
<u><span style="color: blue;"><b><code><span style="font-family: Times, "Times New Roman", serif;">4. Comment out codes in shuffel_op.cu</span></code></b></span></u></h2>
</div>
<pre><code><span style="font-family: Times, "Times New Roman", serif;"> This file is under the folder "mxnet\src\operator\random", there is a function ShuffleForwardGPU, </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">comment out the implementation else there will have a lot of compile times errors(no suitable </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">user-defined conversion from "mshadow::Tensor<mxnet::gpu, 1, mxnet::index_t>" to </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">"const mshadow::Tensor<mshadow::gpu, 1, unsigned int>" exists).</span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;"> </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;"> I guess this function would not be called when doing inference task, after all who would like to </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">make their inference results become unpredictable? If you were like me, only want to use </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">cpp_package to do the inference task, you should be safe to comment out the codes. </span></code></pre>
<div style="text-align: center;">
<h2>
<span style="color: blue;"><u><code><span style="font-family: Times, "Times New Roman", serif;">5.</span> <b><span style="font-family: Times, "Times New Roman", serif;">Open cmake</span></b></code></u></span></h2>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOkKu2QONNOH7vpXzkGWzW1JnL8GM2V6RXZmcy_8sSVGddpJPACgZtCHlB0jy_DCNtd3iFaEJHZgR3bctxE6b-Gh1riMrQVXYDHqhyphenhyphenHltFOPxXvqnkE2ws4elMQWwYf3DmkkjKYz-EWiQ/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="665" data-original-width="1041" height="255" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOkKu2QONNOH7vpXzkGWzW1JnL8GM2V6RXZmcy_8sSVGddpJPACgZtCHlB0jy_DCNtd3iFaEJHZgR3bctxE6b-Gh1riMrQVXYDHqhyphenhyphenHltFOPxXvqnkE2ws4elMQWwYf3DmkkjKYz-EWiQ/s400/Capture.JPG" width="400" /></a></div>
<pre><code><span style="font-family: Times, &quot;Times New Roman&quot;, serif;"> Open your cmake and select msvc with 64 bits.</span></code></pre>
<div style="text-align: center;">
<h2>
<code><span style="font-family: Times, "Times New Roman", serif;"><span style="color: blue;"><u><b>6. Configuration</b></u></span></span></code></h2>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi944Affvwzdm6Nsj7qcTQyGIUETScvfjw-EZddYQwNtAI9bPqF1SDLvyJZEgwz_2b6i6sXrqDQkw10OkbB92bfhezsgzSY769eu7wSyCQHNZmvGalvoClCjkQPF4d6L7QutYL5_8VW0LE/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="532" data-original-width="1600" height="132" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi944Affvwzdm6Nsj7qcTQyGIUETScvfjw-EZddYQwNtAI9bPqF1SDLvyJZEgwz_2b6i6sXrqDQkw10OkbB92bfhezsgzSY769eu7wSyCQHNZmvGalvoClCjkQPF4d6L7QutYL5_8VW0LE/s400/Capture.JPG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiABMmS3q2tIlLJ_UJMQMp9pxitpctYJc6p-s-xo7XfyG2O3XQhH5wVCc1074YvDurANJ9_jcDM0MllpdyljptkxTbdGR_pdEf3vMmrR5SGGtsPprCqccaacPdRxMEPCd2PA78tolWnigQ/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="543" data-original-width="1600" height="135" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiABMmS3q2tIlLJ_UJMQMp9pxitpctYJc6p-s-xo7XfyG2O3XQhH5wVCc1074YvDurANJ9_jcDM0MllpdyljptkxTbdGR_pdEf3vMmrR5SGGtsPprCqccaacPdRxMEPCd2PA78tolWnigQ/s400/Capture.JPG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<h3>
<code><span style="font-family: Times, "Times New Roman", serif;"> The most important note are</span></code></h3>
<h3>
<code><span style="font-family: Times, "Times New Roman", serif;">1. <span style="color: red;"><b>Do not use anything related to intel MKL</b></span>.</span></code><span style="color: red;"><code></code></span></h3>
<h3>
<span style="color: red;"><code><span style="font-family: Times, "Times New Roman", serif;"><span style="color: black;">2.</span> Do not build cpp_package at the first time</span></code></span></h3>
<pre><code><span style="font-family: Times, "Times New Roman", serif;"> Without mkl mxnet cannot exploit full power of the cpu, but with it your app cannot run at all, </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">depending on how you build it, your app may throw the error </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">"</span></code><code><span style="font-family: Times, "Times New Roman", serif;"><strong>Intel MKL FATAL ERROR: Cannot load mkl_intel_thread.dll.</strong>" or </span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">"Check failed: MXNDArrayWaitToRead(blob_ptr_->handle_) == 0 (-1 vs. 0)"</span></code></pre>
<pre><code><span style="font-family: Times, "Times New Roman", serif;">
</span></code></pre>
If you do not need CUDA, uncheck the options USE_CUDA and USE_CUDNN. After that, click Configure, uncheck BUILD_TESTING, click Configure again, then click Generate. The command-line equivalent is sketched below.<br />
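<br />
For those who prefer the command line, the same configuration looks roughly like this (a sketch; the option names are the ones mentioned above):<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">cmake -G "Visual Studio 14 2015 Win64" -DUSE_CUDA=OFF -DUSE_CUDNN=OFF -DBUILD_TESTING=OFF ..
</pre>
</div>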
<br />
<h2 style="text-align: center;">
<u><span style="color: blue;">7. Build mxnet without cpp_package</span></u></h2>
<br />
a. Open your ALL_BUILD.vcxproj<br />
b. Navigate to the project "mxnet"<br />
c. Right click your mouse, select "Properties"<br />
d. Select Linker->Input<br />
e. Link to flangmain.lib, flangrti.lib, flang.lib, ompstub.lib. For example, my paths are<br />
<br />
C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\lib\flangmain.lib <br /> C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\lib\flangrti.lib<br /> C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\lib\flang.lib<br /> C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\lib\ompstub.lib<br /><br />
If you do not know where they are, use <a href="https://filehippo.com/download_everything/">Everything </a>to find the paths.<br />
<br />
f. Navigate to the project "ALL_Build"<br />
g. Right click your mouse, click "Build". A command-line equivalent is sketched below.<br />
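<br />
If you prefer building from the command line, something like the following should be equivalent (a sketch; run it from the developer command prompt inside the build folder):<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">msbuild ALL_BUILD.vcxproj /p:Configuration=Release /p:Platform=x64
</pre>
</div>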
<h2 style="text-align: center;">
<span style="color: blue;"><u>8. Configure cmake to build with cpp_package</u></span></h2>
<br />
Now we can build mxnet with cpp_package; let us go back to CMake and change some settings. <br />
<br />
a. (Optional) Change your install path, otherwise you may not be able to install (e.g., change it to C:/Users/yyyy/programs/Qt/3rdLibs/mxnet/build_gpu_1_3_1_temp/install).<br />
<br />
b. Make sure python is on your PATH. If you are building the 32/64-bit version of mxnet, you need a 32/64-bit python respectively, otherwise you will not be able to generate op.h. I suggest you use Rapid Environment Editor to manage your PATH on windows. You can verify which python windows will pick up as shown below.<br />
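<br />
To check which python executable windows will pick up, and whether its bitness matches the build, you can run the following in a command prompt:<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">where python
rem prints 64 for a 64-bit python, 32 for a 32-bit one
python -c "import struct; print(struct.calcsize('P') * 8)"
</pre>
</div>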
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-SAMlW9QwhID05QuJuzHXwJlYRneAGXXu-0X73WBU0tDTKDUHTt-AFTbKWJ8excUCND4mBMIaNIMSrd0wCeMfgl1H1z8ZEPDuyiF9wJryVvkn8Gbn4beqo2h3E1FRabuwdrqv7E1MtCQ/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="516" height="230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-SAMlW9QwhID05QuJuzHXwJlYRneAGXXu-0X73WBU0tDTKDUHTt-AFTbKWJ8excUCND4mBMIaNIMSrd0wCeMfgl1H1z8ZEPDuyiF9wJryVvkn8Gbn4beqo2h3E1FRabuwdrqv7E1MtCQ/s320/Capture.JPG" width="320" /></a></div>
<br />
If Visual Studio complains that it cannot find the python exe, reopen Visual Studio.<br />
<br />
c. Check USE_CPP_PACKAGE, uncheck BUILD_TESTING, click Configure, then Generate<br />
<br />
d. Remove the example projects since they will hinder the build process; those projects are alexnet, charRNN, googleNet, inception_bn, lenet, lenet_with_mxdataiter, mlp, mlp_cpu, mlp_gpu, resnet.<br />
<br />
e. Go to your build/Release folder and copy libmxnet.dll into any folder which windows can find (i.e., a folder listed in the PATH); let us call that folder global_path.<br />
<br />
f. Open your developer command prompt (mine is the developer command prompt for vs2015); let us call it DCP.<br />
<br />
g. Navigate your DCP to global_path<br />
<br />
h. Enter "dumpbin /dependents libmxnet.dll", this command will show you the dependencies of this<br />
dll. In my case, it show <br />
<br />
flangrti.dll<br />
flang.dll<br />
ompstub.dll<br />
cudnn64_7.dll<br />
cublas64_92.dll<br />
cufft64_92.dll<br />
cusolver64_92.dll<br />
curand64_92.dll<br />
nvrtc64_92.dll<br />
nvcuda.dll<br />
KERNEL32.dll<br />
VCOMP140.DLL<br />
<br />
We only need to copy flangrti.dll, flang.dll and ompstub.dll into global_path in order to generate op.h, because the other dlls already exist in the PATH. Again, please use <a href="https://filehippo.com/download_everything/">Everything </a>to find the paths; a sketch of the copy step follows.<br />
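<br />
Assuming the flang dlls live under the conda package's Library\bin folder (an assumption based on the lib paths from step 7e; confirm the real locations with Everything first), the copy step in the DCP, already navigated to global_path, would look like this:<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">:: the dll locations are assumptions, verify them with Everything
copy C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\bin\flangrti.dll .
copy C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\bin\flang.dll .
copy C:\Users\yyyy\Anaconda3\pkgs\flang-5.0.0-he025d50_20180525\Library\bin\ompstub.dll .
</pre>
</div>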
<br />
i. You need to link mxnet to flangmain.lib, flangrti.lib, flang.lib and ompstub.lib again (as in step 7), since Generate clears them.<br />
<br />
j. Navigate to the project "ALL_Build" <br />
<br />
k. Right click your mouse, click build.<br />
<br />
l. Navigate to the project "INSTALL" <br />
<br />
m. Right click your mouse, click build.<br />
<br />
n. Copy cpp-package\include\mxnet-cpp into build/install/include<br />
<br />
o. Copy mxnet\3rdparty\tvm\nnvm\include\nnvm into build/install/include<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>9. Add mx_float for scale, int for num_filter</b></u></span></h2>
<span style="color: #092e64;"> The op.h generated by this solution, there are two parameters lack type declaration, you need to add them by yourself.</span><br />
<span style="color: #092e64;"><br /></span>
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>Conclusion</b></u></span></h2>
<span style="color: #092e64;"> Congratulation, now you have build the mxnet successfully, to tell you the truth, this is not a pleasant journey, there are too many bugs/issues when I try to build mxnet1.3.1on windows(</span><span style="color: #092e64;"><span style="color: #092e64;">1.4.0 got more bugs on windows when you try to build it</span>) , there are many bugs should be found before they release the major version if them have tried to build mxnet on windows. I believe windows and cpp_package are not their main concern yet, let us hope that </span><br />
<br />
<span style="color: #092e64;">a. Someday they can put more love into windows and cpp_package. Windows still dominate market of desktop/laptop and cpp_package is a much better choice than python if you want to do edge deployment.</span><br />
<br />
<span style="color: #092e64;">b. Adopt a commit system like opencv(whenever you commit your codes, opencv build it on every single platforms they support), this could prevent a log of bugs, the later you adopt, the more cost you need to pay for cross-platform.</span><br />
<span style="color: #092e64;"><br /></span>
<span style="color: #092e64;">c. Let us cross our finger, hope them can fix all of these bugs before next version release</span><br />
<span style="color: #092e64;"><br /></span>
<span style="color: #092e64;"><br /></span>ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com3tag:blogger.com,1999:blog-4702230343097536610.post-35888634169113684472019-02-24T01:45:00.002-08:002021-02-26T08:36:47.470-08:00Age and gender classification by opencv, dlib and mxnet In this post, I will show you how to build an age gender classification application with the infrastructures I created in the <a href="https://qtandopencv.blogspot.com/2019/02/face-recognition-with-mxnet-dlib-and.html"><b>last post</b></a>. Almost everything are same as before, except the part of parsing the NDArray in the forward function.<br />
<br />
Before we dive into the source codes, let us look at some examples. The images below are predicted by two networks and concatenated: the left side is predicted by the <a href="https://github.com/wordshub/free-font">light model</a>, the right side by a heavy model based on <a href="https://github.com/deepinsight/insightface/tree/master/gender-age">resnet50</a>. Both models come from <a href="https://github.com/deepinsight/insightface/tree/master/gender-age">insightface</a>.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy9ZL2CL756QeoXYq-5ISSahKNycX1C4iBgN7yQukRved5jJ-Empd8TSRKoPWXtBc2cDPLIdGB-yZnSzF1Lu-14V3Ud6-Q6P0fzWA-heJXmcFJrVszeBJkblQUM-awAA-yyBdOPlT4PIU/s1600/40dcef12da3505d4dba0cd70d1a7fd6b.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="532" data-original-width="1600" height="132" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy9ZL2CL756QeoXYq-5ISSahKNycX1C4iBgN7yQukRved5jJ-Empd8TSRKoPWXtBc2cDPLIdGB-yZnSzF1Lu-14V3Ud6-Q6P0fzWA-heJXmcFJrVszeBJkblQUM-awAA-yyBdOPlT4PIU/s400/40dcef12da3505d4dba0cd70d1a7fd6b.png" width="400" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwxhNEjVZgSqFX-lzm621cG2N95l_cOoa_UsuX0me04hW4FKWj073HCov7MjO7FwWvkrsfbHyNjkCIBH8Xv6MKdTpcLxaJ6xTtpF9d0EA5aaSFvSC-_wu_R2tC32fXIqyTpnO5AYxwqRs/s1600/2018-11-06t054310z-1334124005-rc1be15a8050-rtrmadp-3-people-sexiest-man.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="350" data-original-width="1240" height="112" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwxhNEjVZgSqFX-lzm621cG2N95l_cOoa_UsuX0me04hW4FKWj073HCov7MjO7FwWvkrsfbHyNjkCIBH8Xv6MKdTpcLxaJ6xTtpF9d0EA5aaSFvSC-_wu_R2tC32fXIqyTpnO5AYxwqRs/s400/2018-11-06t054310z-1334124005-rc1be15a8050-rtrmadp-3-people-sexiest-man.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYoB2azpiIetd-IoQDsNjJZsaSL9KivwJzuBEFP9msExyzaIBEJg1tkrs7RYAlao6zn_0oV1cDCntfBgP89-E9Nw778yc2LMzlS0UUMU91Xzb4k2sn5UM5tpWewX2nkH9uLi8s5KCWrvs/s1600/getty_486119835_367858.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="372" data-original-width="1600" height="92" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYoB2azpiIetd-IoQDsNjJZsaSL9KivwJzuBEFP9msExyzaIBEJg1tkrs7RYAlao6zn_0oV1cDCntfBgP89-E9Nw778yc2LMzlS0UUMU91Xzb4k2sn5UM5tpWewX2nkH9uLi8s5KCWrvs/s400/getty_486119835_367858.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8-ZoGATHKdbQtQ7AbYqx_gqyYN3PKuFgHpA3KZb6E_8fM7fbUYwFGj9r_mZ1_U6A9fzG4sJXKOutFh5zom6m8lMEWtJxh-Gl8kX8SOe6CA60Y_UkyIp_rpkKeU2lGsqdawB4UEvucjlU/s1600/maxresdefault.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="1600" height="112" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8-ZoGATHKdbQtQ7AbYqx_gqyYN3PKuFgHpA3KZb6E_8fM7fbUYwFGj9r_mZ1_U6A9fzG4sJXKOutFh5zom6m8lMEWtJxh-Gl8kX8SOe6CA60Y_UkyIp_rpkKeU2lGsqdawB4UEvucjlU/s400/maxresdefault.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWdlvu3SHk_xEztFD8aTQYV1pGk9pJg8DUSWKLNNIJNbnmzGkjQyCehqoyK7OlnyikpMsvG_s3Oa_SO9raZxtWNjbHJogwdeVFukdZbz5G6lqAgecoQQjd0kYLlCrDae6cbddQrPJ2Bas/s1600/t.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="375" data-original-width="1000" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWdlvu3SHk_xEztFD8aTQYV1pGk9pJg8DUSWKLNNIJNbnmzGkjQyCehqoyK7OlnyikpMsvG_s3Oa_SO9raZxtWNjbHJogwdeVFukdZbz5G6lqAgecoQQjd0kYLlCrDae6cbddQrPJ2Bas/s400/t.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3-Qu5AeCcmPkx4S-zkSGD7ieTtmxg5LPQ0U0KEE4BgAeSr0jSOzXNbq7GXzdp66prhpMFErt_D-R1PCazeKv__XKfrwsEAH61gNIGF8XmSsh_8AW-tsiqGzIEJgMbiFHYUv0KA2mgxAM/s1600/b2d1fd70ee.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="335" data-original-width="960" height="138" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3-Qu5AeCcmPkx4S-zkSGD7ieTtmxg5LPQ0U0KEE4BgAeSr0jSOzXNbq7GXzdp66prhpMFErt_D-R1PCazeKv__XKfrwsEAH61gNIGF8XmSsh_8AW-tsiqGzIEJgMbiFHYUv0KA2mgxAM/s400/b2d1fd70ee.png" width="400" /></a></div>
<br />
The results do not look bad for either model if we do not know the real ages :). Let us use the model to predict the ages of a famous person, like Trump (with resnet50).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/Q0CnnBqu1P8/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/Q0CnnBqu1P8?feature=player_embedded" width="320"></iframe></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/gQrm5m5-uXs/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/gQrm5m5-uXs?feature=player_embedded" width="320"></iframe></div>
<br />
Unfortunately, the results of age classification are not that good under different angles and expressions. This is because age classification from an image is very difficult; even a human <b><span style="color: red;">cannot </span></b>accurately predict the age of a person by looking at a single image.<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u>1. Difference of face recognition and age gender classification</u></span></h2>
The code for parsing face recognition features is<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">std<span style="color: #333333;">::</span>vector<span style="color: #333333;"><</span>insight_face_key<span style="color: #333333;">></span> result;
<span style="color: #333399; font-weight: bold;">size_t</span> constexpr feature_size <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">512</span>;
Shape <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #0066bb; font-weight: bold;">shape</span>(<span style="color: #0000dd; font-weight: bold;">1</span>, feature_size);
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">size_t</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>; i <span style="color: #333333;">!=</span> batch_size; <span style="color: #333333;">++</span>i){
NDArray feature(features.GetData() <span style="color: #333333;">+</span> i <span style="color: #333333;">*</span> feature_size, shape, Context(kCPU, <span style="color: #0000dd; font-weight: bold;">0</span>));
result.emplace_back(std<span style="color: #333333;">::</span>move(feature));
}
</pre>
</div>
<br />
The code for parsing age and gender classification results is<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">std<span style="color: #333333;">::</span>vector<span style="color: #333333;"><</span>insight_age_gender_info<span style="color: #333333;">></span> result;
<span style="color: #333399; font-weight: bold;">int</span> constexpr features_size <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">202</span>;
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">size_t</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>; i <span style="color: #333333;">!=</span> batch_size; <span style="color: #333333;">++</span>i){
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">*</span>ptr <span style="color: #333333;">=</span> features.GetData() <span style="color: #333333;">+</span> i <span style="color: #333333;">*</span> features_size;
insight_age_gender_info info;
info.gender_ <span style="color: #333333;">=</span> ptr[<span style="color: #0000dd; font-weight: bold;">0</span>] <span style="color: #333333;">></span> ptr[<span style="color: #0000dd; font-weight: bold;">1</span>] <span style="color: #333333;">?</span> gender_info<span style="color: #333333;">::</span>female_ <span style="color: #333333;">:</span> gender_info<span style="color: #333333;">::</span>male_;
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">int</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">2</span>; i <span style="color: #333333;"><</span> features_size; i <span style="color: #333333;">+=</span> <span style="color: #0000dd; font-weight: bold;">2</span>){
<span style="color: #008800; font-weight: bold;">if</span>(ptr[i <span style="color: #333333;">+</span> <span style="color: #0000dd; font-weight: bold;">1</span>] <span style="color: #333333;">></span> ptr[i]){
info.age_ <span style="color: #333333;">+=</span> <span style="color: #0000dd; font-weight: bold;">1</span>;
}
}
result.emplace_back(info);
}
</pre>
</div>
<br />
<br />
Except for this part, <span style="color: blue;"><b>everything is the same</b></span> as before.<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u>2. Make codes easier to reuse</u></span></h2>
It is a <span style="color: blue;"><b>pain </b></span>to maintain similar codes with <span style="color: blue;"><b>minor </b><b>difference</b></span>, in order to alleviate the prices of maintenance, I create a generic predictor as a template class with three policies, implement the face recognition and age/gender classification with this generic predictor.<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">template</span><span style="color: #333333;"><</span><span style="color: #008800; font-weight: bold;">typename</span> Return, <span style="color: #008800; font-weight: bold;">typename</span> ProcessFeature, <span style="color: #008800; font-weight: bold;">typename</span> ImageConvert <span style="color: #333333;">=</span> dlib_mat_to_separate_rgb<span style="color: #333333;">></span>
<span style="color: #008800; font-weight: bold;">class</span> <span style="color: #bb0066; font-weight: bold;">generic_predictor</span>
{
<span style="color: #888888;">/*please check the details on github*/</span>
}
</pre>
</div>
<br />
We can use it to create an age/gender predictor as follows<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">struct</span> predict_age_gender_functor
{
std<span style="color: #333333;">::</span>vector<span style="color: #333333;"><</span>insight_age_gender_info<span style="color: #333333;">></span>
<span style="color: #008800; font-weight: bold;">operator</span>()(<span style="color: #008800; font-weight: bold;">const</span> mxnet<span style="color: #333333;">::</span>cpp<span style="color: #333333;">::</span>NDArray <span style="color: #333333;">&</span>features, <span style="color: #333399; font-weight: bold;">size_t</span> batch_size) <span style="color: #008800; font-weight: bold;">const</span>
{
std<span style="color: #333333;">::</span>vector<span style="color: #333333;"><</span>insight_age_gender_info<span style="color: #333333;">></span> result;
<span style="color: #333399; font-weight: bold;">int</span> constexpr features_size <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">202</span>;
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">size_t</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>; i <span style="color: #333333;">!=</span> batch_size; <span style="color: #333333;">++</span>i){
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">*</span>ptr <span style="color: #333333;">=</span> features.GetData() <span style="color: #333333;">+</span> i <span style="color: #333333;">*</span> features_size;
insight_age_gender_info info;
info.gender_ <span style="color: #333333;">=</span> ptr[<span style="color: #0000dd; font-weight: bold;">0</span>] <span style="color: #333333;">></span> ptr[<span style="color: #0000dd; font-weight: bold;">1</span>] <span style="color: #333333;">?</span> gender_info<span style="color: #333333;">::</span>female_ <span style="color: #333333;">:</span> gender_info<span style="color: #333333;">::</span>male_;
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">int</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">2</span>; i <span style="color: #333333;"><</span> features_size; i <span style="color: #333333;">+=</span> <span style="color: #0000dd; font-weight: bold;">2</span>){
<span style="color: #008800; font-weight: bold;">if</span>(ptr[i <span style="color: #333333;">+</span> <span style="color: #0000dd; font-weight: bold;">1</span>] <span style="color: #333333;">></span> ptr[i]){
info.age_ <span style="color: #333333;">+=</span> <span style="color: #0000dd; font-weight: bold;">1</span>;
}
}
result.emplace_back(info);
}
<span style="color: #008800; font-weight: bold;">return</span> result;
}
};
<span style="color: #008800; font-weight: bold;">using</span> insight_age_gender_predict <span style="color: #333333;">=</span> mxnet_aux<span style="color: #333333;">::</span>generic_predictor<span style="color: #333333;"><</span>insight_age_gender_info, predict_age_gender_functor<span style="color: #333333;">></span>;
</pre>
</div>
<br />
Please check <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/libs/mxnet/generic_predictor.hpp">github </a>if you want to know the implementation details. A minimal usage sketch follows.<br />
<br />
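The snippet below shows how the predictor could be used; the constructor arguments (model files, context and input shape) are my assumptions, please check the repository for the real signature:<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">//a sketch only : the constructor arguments are assumptions, see github for the real api
insight_age_gender_predict predict("model.params", "model.json",
                                   Context(kCPU, 0), Shape(2, 3, 112, 112));
//aligned_faces is a std::vector<dlib::matrix<dlib::rgb_pixel>> of aligned faces
auto const infos = predict.forward(aligned_faces);
for(auto const &info : infos){
    std::cout<<"age = "<<info.age_<<", gender = "
             <<(info.gender_ == gender_info::male_ ? "male" : "female")<<std::endl;
}
</pre>
</div>
<br />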
<h2 style="text-align: center;">
<span style="color: blue;"><u><b>3. Summary </b></u></span></h2>
Gender prediction works very well; unfortunately, age prediction is far from ideal.<span style="color: blue;"><span style="color: #666666;"><span style="color: black;"><span style="font-family: "times" , "times new roman" , serif;"> If we could obtain a huge data set which contains faces of the same person across different ages, expressions and angles, and in which the races are not super imbalanced</span></span>, <span style="color: black;">the accuracy of age prediction might improve very much, but a huge data set like this is very hard to collect. </span></span></span><br />
<br />
<span style="color: blue;"><span style="color: #666666;"><span style="color: black;"> The source codes could find on <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_age_gender">github</a>.</span></span></span><br /><br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-48301397047316075302019-02-18T04:21:00.001-08:002021-02-26T08:36:56.789-08:00Face recognition with mxnet, dlib and opencv In this post I will show you how to implement an <b><span style="color: blue;">industrial level, </span></b><b><span style="color: blue;">portable</span></b><span style="color: blue;"><b> face recognition application</b></span> with a <b><span style="color: blue;">small, reuseable example</span></b>, without relying on any commercial library(except of Qt5, unless the module I use in this example support LGPL license).<br />
<br />
Before deep learning became the mainstream technology in computer vision, 2D face recognition <b><span style="color: red;">only </span></b>worked well under strict environments, which made it an <span style="color: red;"><b>impractical</b></span> technology.<br />
<br />
Thanks to the contributions of open source communities like <b><a href="http://dlib.net/">dlib</a></b>, <a href="https://opencv.org/"><b>opencv</b> </a>and <b><a href="https://mxnet.apache.org/">mxnet</a></b>, today, high accuracy 2D face recognition is not a difficult problem anymore.<br />
<br />
Before we start, let us see an interesting example(video_00).<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/eHAJh74z-fA/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/eHAJh74z-fA?feature=player_embedded" width="320"></iframe></div>
<div style="text-align: center;">
video_00</div>
<br />
Although different angles and expressions affect the confidence value a lot, most of the time the algorithm is still able to find the most similar face out of the 25 faces. <br />
<br />
The flow of face recognition on <b><a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_face_recognition">github</a></b> is composed of 4 critical steps.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEju-Y175Fs7zfu4xI97dmxYlqCTLC6ZhhzpZR5Ca0umVbDToQfaFeAdslwICASDPw2cucZarTLnxyF8jZQbTLLuCiX8bwOoAh46AwfccD-jPWXVpP5lFo7qXBUnSPKlVsBiz2POonqxahY/s1600/main_flow.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="377" data-original-width="399" height="302" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEju-Y175Fs7zfu4xI97dmxYlqCTLC6ZhhzpZR5Ca0umVbDToQfaFeAdslwICASDPw2cucZarTLnxyF8jZQbTLLuCiX8bwOoAh46AwfccD-jPWXVpP5lFo7qXBUnSPKlVsBiz2POonqxahY/s320/main_flow.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="text-align: center;">
pic_00</div>
<br />
<div style="text-align: center;">
<h2>
<u><b><span style="color: blue;">Detect face by dlib</span></b></u> </h2>
</div>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>mmod_rect<span style="color: purple;">></span> face_detector<span style="color: purple;">::</span>forward_lazy<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> cv<span style="color: purple;">::</span>Mat <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: dimgrey;">//make sure input image got 3 channels</span>
CV_Assert<span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>channels<span style="color: #808030;">(</span><span style="color: #808030;">)</span> <span style="color: #808030;">=</span><span style="color: #808030;">=</span> <span style="color: #008c00;">3</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//Resize the input image to certain width, </span>
<span style="color: dimgrey;">//The bigger the face_detect_width_, more </span>
<span style="color: dimgrey;">//faces could be detected, but will consume</span>
<span style="color: dimgrey;">//more memory, and slower</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>cols <span style="color: #808030;">!</span><span style="color: #808030;">=</span> face_detect_width_<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//resize_cache_ is a simple trick to reduce the</span>
<span style="color: dimgrey;">//number of memory allocation</span>
<span style="color: maroon; font-weight: bold;">double</span> <span style="color: maroon; font-weight: bold;">const</span> ratio <span style="color: #808030;">=</span> face_detect_width_ <span style="color: #808030;">/</span>
<span style="color: maroon; font-weight: bold;">static_cast</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">double</span><span style="color: purple;">></span><span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>cols<span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span>resize<span style="color: #808030;">(</span>input<span style="color: #808030;">,</span> resize_cache_<span style="color: #808030;">,</span> <span style="background: rgb(221, 0, 0) none repeat scroll 0% 0%; color: white; font-style: italic; font-weight: bold;">{</span><span style="background: rgb(221, 0, 0) none repeat scroll 0% 0%; color: white; font-style: italic; font-weight: bold;">}</span><span style="color: #808030;">,</span> ratio<span style="color: #808030;">,</span> ratio<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span><span style="color: maroon; font-weight: bold;">else</span><span style="color: purple;">{</span>
resize_cache_ <span style="color: #808030;">=</span> input<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: dimgrey;">//1. convert cv::Mat to dlib::matrix</span>
<span style="color: dimgrey;">//2. Swap bgr channel to rgb</span>
img_<span style="color: #808030;">.</span>set_size<span style="color: #808030;">(</span>resize_cache_<span style="color: #808030;">.</span>rows<span style="color: #808030;">,</span> resize_cache_<span style="color: #808030;">.</span>cols<span style="color: #808030;">)</span><span style="color: purple;">;</span>
dlib<span style="color: purple;">::</span>assign_image<span style="color: #808030;">(</span>img_<span style="color: #808030;">,</span> dlib<span style="color: purple;">::</span>cv_image<span style="color: purple;"><</span>bgr_pixel<span style="color: purple;">></span><span style="color: #808030;">(</span>resize_cache_<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">return</span> net_<span style="color: #808030;">(</span>img_<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 07:41:45 UTC-->
<br />
The face detector of dlib performs very well; you can check the results in their <b><a href="http://blog.dlib.net/2016/10/easily-create-high-quality-object.html">post</a></b>.<br />
<br />
If you want to know the details, please study the <b><a href="http://dlib.net/dnn_mmod_ex.cpp.html">example provided by dlib</a></b>, if you want to know more options, please study the excellent post of <b><a href="https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/">Learn Opencv</a></b>.<br />
<br />
<h2 style="text-align: center;">
<u><b><span style="color: blue;">Perform face alignment by dlib</span></b></u> </h2>
We can treat face alignment as a data normalization skill developed for face recognition; usually you would align the faces before training your model, and align the faces again when predicting. This could help you obtain higher accuracy.<br />
<br />
With dlib, face alignment becomes very simple, just a few lines of code.<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: dimgrey;">//rect contain the roi of the face</span>
dlib<span style="color: purple;">::</span>matrix<span style="color: purple;"><</span>rgb_pixel<span style="color: purple;">></span> face_detector<span style="color: purple;">::</span>
get_aligned_face<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> mmod_rect <span style="color: #808030;">&</span>rect<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: dimgrey;">//Type of pose_model_ is dlib::shape_predictor</span>
<span style="color: dimgrey;">//It return the landmarks of the face</span>
<span style="color: maroon; font-weight: bold;">auto</span> shape <span style="color: #808030;">=</span> pose_model_<span style="color: #808030;">(</span>img_<span style="color: #808030;">,</span> rect<span style="color: #808030;">)</span><span style="color: purple;">;</span>
matrix<span style="color: purple;"><</span>rgb_pixel<span style="color: purple;">></span> face_chip<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> details <span style="color: #808030;">=</span>
get_face_chip_details<span style="color: #808030;">(</span>shape<span style="color: #808030;">,</span> face_aligned_size_<span style="color: #808030;">,</span> <span style="color: green;">0.25</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//extract face after aligned from the image</span>
extract_image_chip<span style="color: #808030;">(</span>img_<span style="color: #808030;">,</span> details<span style="color: #808030;">,</span> face_chip<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">return</span> face_chip<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 09:06:37 UTC-->
<br />
<h2 style="text-align: center;">
<u><b><span style="color: blue;">Extract features of face by mxnet</span></b></u></h2>
This section needs to load a model with mxnet; unlike dlib or opencv, the c++ api of mxnet is more complicated. If you do not know how to load an mxnet model yet, I recommend you study <b><a href="https://qtandopencv.blogspot.com/2018/10/person-detectionyolo-v3-with-helps-of.html">this post</a></b>.<br />
<br />
This section is the most complicated part, because it contains<b> <span style="color: blue;">three </span></b><span style="color: blue;"><b>main points</b></span>. <br />
<br />
1. Extract the features of faces.<br />
2. Perform batch processing.<br />
3. Convert the aligned face of dlib (stored as <span style="color: purple;">matrix</span><<span style="color: purple;">rgb_pixel></span>) to a <span style="color: blue;">memory continuous</span> float array with <br />
the format expected by the mxnet model.<br />
<br />
<br />
<div style="text-align: center;">
<h3>
<span style="color: blue;"><b><u>A.Load the model with variable batch size</u></b></span></h3>
</div>
<br />
In order to load a model which supports variable batch size, all we need to do is add one more argument to the argument list.<br />
<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">unique_ptr</span><span style="color: purple;"><</span>Executor<span style="color: purple;">></span> create_executor<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span> <span style="color: #808030;">&</span>model_params<span style="color: #808030;">,</span>
<span style="color: maroon; font-weight: bold;">const</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span> <span style="color: #808030;">&</span>model_symbols<span style="color: #808030;">,</span>
<span style="color: maroon; font-weight: bold;">const</span> Context <span style="color: #808030;">&</span>context<span style="color: #808030;">,</span>
<span style="color: maroon; font-weight: bold;">const</span> Shape <span style="color: #808030;">&</span>input_shape<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
Symbol net<span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">map</span><span style="color: purple;"><</span><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span><span style="color: #808030;">,</span> NDArray<span style="color: purple;">></span> args<span style="color: #808030;">,</span> auxs<span style="color: purple;">;</span>
load_check_point<span style="color: #808030;">(</span>model_params<span style="color: #808030;">,</span> model_symbols<span style="color: #808030;">,</span> <span style="color: #808030;">&</span>net<span style="color: #808030;">,</span>
<span style="color: #808030;">&</span>args<span style="color: #808030;">,</span> <span style="color: #808030;">&</span>auxs<span style="color: #808030;">,</span> context<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//if "data" throw exception, try another key, like "data0"</span>
args<span style="color: #808030;">[</span><span style="color: maroon;">"</span><span style="color: #0000e6;">data</span><span style="color: maroon;">"</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> NDArray<span style="color: #808030;">(</span>input_shape<span style="color: #808030;">,</span> context<span style="color: #808030;">,</span> <span style="color: maroon; font-weight: bold;">false</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//we only need to add the new key if batch size larger than 1</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>input_shape<span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span> <span style="color: #808030;">></span> <span style="color: #008c00;">1</span><span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//all we need is the new key "data1"</span>
args<span style="color: #808030;">[</span><span style="color: maroon;">"</span><span style="color: #0000e6;">data1</span><span style="color: maroon;">"</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> NDArray<span style="color: #808030;">(</span>Shape<span style="color: #808030;">(</span><span style="color: #008c00;">1</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span> context<span style="color: #808030;">,</span> <span style="color: maroon; font-weight: bold;">false</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">unique_ptr</span><span style="color: purple;"><</span>Executor<span style="color: purple;">></span> executor<span style="color: purple;">;</span>
executor<span style="color: #808030;">.</span>reset<span style="color: #808030;">(</span>net<span style="color: #808030;">.</span>SimpleBind<span style="color: #808030;">(</span>context<span style="color: #808030;">,</span>
args<span style="color: #808030;">,</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">map</span><span style="color: purple;"><</span><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span><span style="color: #808030;">,</span> NDArray<span style="color: purple;">></span><span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">map</span><span style="color: purple;"><</span><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span><span style="color: #808030;">,</span> OpReqType<span style="color: purple;">></span><span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span>
auxs<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">return</span> executor<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 12:11:43 UTC-->
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u><b>B.Convert aligned face to array</b></u></span></h3>
Unlike the <b><a href="https://qtandopencv.blogspot.com/2018/10/person-detectionyolo-v3-with-helps-of.html">example of yolo v3</a></b>, the input data of <b><span style="color: blue;"><a href="https://github.com/deepinsight/insightface">deepsight</a></span></b> needs more preprocessing before you can feed the aligned face into the model. Instead of arranging the pixels in rgb order, you need to split each channel of the face into a separate "page". Simply put, instead of arranging the pixels as<br />
<br />
R1G1B1R2G2B2......RNGNBN<br />
<br />
We should arrange the pixels as<br />
<br />
R1R2....RNG1G2......GNB1B2.....BN<br />
<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: dimgrey;">//using dlib_const_images_ptr = std::vector<matrix<rgb_pixel> const*>;</span>
<span style="color: maroon; font-weight: bold;">void</span> face_key_extractor<span style="color: purple;">::</span>
dlib_matrix_to_float_array<span style="color: #808030;">(</span>dlib_const_images_ptr <span style="color: maroon; font-weight: bold;">const</span> <span style="color: #808030;">&</span>rgb_image<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: #603000;">size_t</span> index <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> i <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> i <span style="color: #808030;">!</span><span style="color: #808030;">=</span> rgb_image<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>i<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> ch <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> ch <span style="color: #808030;">!</span><span style="color: #808030;">=</span> <span style="color: #008c00;">3</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>ch<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">long</span> row <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> row <span style="color: #808030;">!</span><span style="color: #808030;">=</span> rgb_image<span style="color: #808030;">[</span>i<span style="color: #808030;">]</span><span style="color: #808030;">-</span><span style="color: #808030;">></span>nr<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>row<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">long</span> col <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> col <span style="color: #808030;">!</span><span style="color: #808030;">=</span> rgb_image<span style="color: #808030;">[</span>i<span style="color: #808030;">]</span><span style="color: #808030;">-</span><span style="color: #808030;">></span>nc<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>col<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> <span style="color: #808030;">&</span>pix <span style="color: #808030;">=</span> <span style="color: #808030;">(</span><span style="color: #808030;">*</span>rgb_image<span style="color: #808030;">[</span>i<span style="color: #808030;">]</span><span style="color: #808030;">)</span><span style="color: #808030;">(</span>row<span style="color: #808030;">,</span> col<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">switch</span><span style="color: #808030;">(</span>ch<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">case </span><span style="color: #008c00;">0</span><span style="color: #e34adc;">:</span>
<span style="color: dimgrey;">//image_vector_ is a std::vector<float>, resized in </span>
<span style="color: dimgrey;">//constructor.</span>
<span style="color: dimgrey;">//image_vector_.resize(params_->shape_.Size())</span>
<span style="color: dimgrey;">//params_->shape_.Size() return total number </span>
<span style="color: dimgrey;">//of elements in the tenso</span>
image_vector_<span style="color: #808030;">[</span>index<span style="color: #808030;">+</span><span style="color: #808030;">+</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> pix<span style="color: #808030;">.</span>red<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">break</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">case </span><span style="color: #008c00;">1</span><span style="color: #e34adc;">:</span>
image_vector_<span style="color: #808030;">[</span>index<span style="color: #808030;">+</span><span style="color: #808030;">+</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> pix<span style="color: #808030;">.</span>green<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">break</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">case </span><span style="color: #008c00;">2</span><span style="color: #e34adc;">:</span>
image_vector_<span style="color: #808030;">[</span>index<span style="color: #808030;">+</span><span style="color: #808030;">+</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> pix<span style="color: #808030;">.</span>blue<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">break</span><span style="color: purple;">;</span>
<span style="color: #e34adc;"> </span><span style="color: maroon; font-weight: bold;">default</span><span style="color: #e34adc;">:</span>
<span style="color: maroon; font-weight: bold;">break</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 12:13:47 UTC-->
<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><b><u>C.Forward aligned faces with variable batch size</u></b></span></h3>
There are two things you must know before we dive into the source codes.<br />
<br />
1. To avoid memory reallocation, we <span style="color: blue;"><b>must</b> </span>allocate memory for the <span style="color: blue;"><b>largest</b></span> possible batch size and reuse that same memory when the batch size is smaller.<br />
2. The batch size of the float array input to the model <span style="color: blue;"><b>must be the same</b></span> as the largest possible batch size.<br />
<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: dimgrey;">//input contains all of the aligned faces detected from the image</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>face_key<span style="color: purple;">></span> face_key_extractor<span style="color: purple;">::</span>
forward<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>dlib<span style="color: purple;">::</span>matrix<span style="color: purple;"><</span>dlib<span style="color: purple;">::</span>rgb_pixel<span style="color: purple;">></span> <span style="color: purple;">></span> <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>empty<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">return</span> <span style="color: purple;">{</span><span style="color: purple;">}</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: dimgrey;">//Size of the input may not divisible by batch size</span>
<span style="color: dimgrey;">//That is why we need some preprocess job to make sure</span>
<span style="color: dimgrey;">//features of every faces are extracted</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> forward_count <span style="color: #808030;">=</span> <span style="color: maroon; font-weight: bold;">static_cast</span><span style="color: purple;"><</span><span style="color: #603000;">size_t</span><span style="color: purple;">></span><span style="color: #808030;">(</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">ceil</span><span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span> <span style="color: #808030;">/</span> <span style="color: maroon; font-weight: bold;">static_cast</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span><span style="color: #808030;">(</span>params_<span style="color: #808030;">-</span><span style="color: #808030;">></span>shape_<span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>face_key<span style="color: purple;">></span> result<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> i <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: #808030;">,</span> index <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> i <span style="color: #808030;">!</span><span style="color: #808030;">=</span> forward_count<span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>i<span style="color: #808030;">)</span><span style="color: purple;">{</span>
dlib_const_images_ptr faces<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> j <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span>
j <span style="color: #808030;">!</span><span style="color: #808030;">=</span> params_<span style="color: #808030;">-</span><span style="color: #808030;">></span>shape_<span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span> <span style="color: #808030;">&</span><span style="color: #808030;">&</span> index <span style="color: #808030;"><</span> input<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>j<span style="color: #808030;">)</span><span style="color: purple;">{</span>
faces<span style="color: #808030;">.</span>emplace_back<span style="color: #808030;">(</span><span style="color: #808030;">&</span>input<span style="color: #808030;">[</span>index<span style="color: #808030;">+</span><span style="color: #808030;">+</span><span style="color: #808030;">]</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
dlib_matrix_to_float_array<span style="color: #808030;">(</span>faces<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> features <span style="color: #808030;">=</span>
forward<span style="color: #808030;">(</span>image_vector_<span style="color: #808030;">,</span> <span style="color: maroon; font-weight: bold;">static_cast</span><span style="color: purple;"><</span><span style="color: #603000;">size_t</span><span style="color: purple;">></span><span style="color: #808030;">(</span>faces<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">move</span><span style="color: #808030;">(</span><span style="color: #666616;">std</span><span style="color: purple;">::</span>begin<span style="color: #808030;">(</span>features<span style="color: #808030;">)</span><span style="color: #808030;">,</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span>end<span style="color: #808030;">(</span>features<span style="color: #808030;">)</span><span style="color: #808030;">,</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span>back_inserter<span style="color: #808030;">(</span>result<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 12:16:29 UTC-->
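For example, if shape_[0] is 4 and the input holds 10 faces, forward_count is ceil(10 / 4) = 3, so the loop forwards batches of 4, 4 and 2 faces.<br />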
<br />
<div style="text-align: center;">
<h3 style="background: rgb(255, 255, 255) none repeat scroll 0% 0%;">
<span style="color: blue;"><u><b>D.Extract features of faces</b></u></span></h3>
<div style="text-align: left;">
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>face_key<span style="color: purple;">></span> face_key_extractor<span style="color: purple;">::</span>
forward<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> <span style="color: #808030;">&</span>input<span style="color: #808030;">,</span> <span style="color: #603000;">size_t</span> batch_size<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>arg_dict<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: maroon;">"</span><span style="color: #0000e6;">data</span><span style="color: maroon;">"</span><span style="color: #808030;">]</span><span style="color: #808030;">.</span>SyncCopyFromCPU<span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>data<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span>
input<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//data1 tell the executor, how many face(s) need to process</span>
executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>arg_dict<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: maroon;">"</span><span style="color: #0000e6;">data1</span><span style="color: maroon;">"</span><span style="color: #808030;">]</span> <span style="color: #808030;">=</span> batch_size<span style="color: purple;">;</span>
executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>Forward<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">false</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>face_key<span style="color: purple;">></span> result<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span><span style="color: #808030;">!</span>executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>outputs<span style="color: #808030;">.</span>empty<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//shape of features is [batch_size, 512]</span>
<span style="color: maroon; font-weight: bold;">auto</span> features <span style="color: #808030;">=</span> executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>outputs<span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: #808030;">.</span>Copy<span style="color: #808030;">(</span>Context<span style="color: #808030;">(</span>kCPU<span style="color: #808030;">,</span> <span style="color: #008c00;">0</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
Shape <span style="color: maroon; font-weight: bold;">const</span> shape<span style="color: #808030;">(</span><span style="color: #008c00;">1</span><span style="color: #808030;">,</span> step_per_feature<span style="color: #808030;">)</span><span style="color: purple;">;</span>
features<span style="color: #808030;">.</span>WaitToRead<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//split features into and array</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> i <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> i <span style="color: #808030;">!</span><span style="color: #808030;">=</span> batch_size<span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>i<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//step_per_feature is 512, memory </span>
<span style="color: dimgrey;">//of NDArray is continuous make things easier</span>
NDArray feature<span style="color: #808030;">(</span>features<span style="color: #808030;">.</span>GetData<span style="color: #808030;">(</span><span style="color: #808030;">)</span> <span style="color: #808030;">+</span> i <span style="color: #808030;">*</span> step_per_feature<span style="color: #808030;">,</span>
shape<span style="color: #808030;">,</span> Context<span style="color: #808030;">(</span>kCPU<span style="color: #808030;">,</span> <span style="color: #008c00;">0</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
result<span style="color: #808030;">.</span>emplace_back<span style="color: #808030;">(</span><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">move</span><span style="color: #808030;">(</span>feature<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span> </pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"> </pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"> </pre>
<h2 style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; text-align: center;">
<span style="color: blue;"><u>Find most similar faces from database</u></span></h2>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;"> I use cosine similarity to compare similarity in this small example, it is quite easy with the help of </span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;">opencv. </span></pre>
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u>A.Similarity compare</u></span></h3>
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: maroon; font-weight: bold;">double</span> face_key<span style="color: purple;">::</span>similarity<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> face_key <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span> <span style="color: maroon; font-weight: bold;">const</span>
<span style="color: purple;">{</span>
CV_Assert<span style="color: #808030;">(</span>key_<span style="color: #808030;">.</span>GetData<span style="color: #808030;">(</span><span style="color: #808030;">)</span> <span style="color: #808030;">!</span><span style="color: #808030;">=</span> <span style="color: maroon; font-weight: bold;">nullptr</span> <span style="color: #808030;">&</span><span style="color: #808030;">&</span>
input<span style="color: #808030;">.</span>key_<span style="color: #808030;">.</span>GetData<span style="color: #808030;">(</span><span style="color: #808030;">)</span> <span style="color: #808030;">!</span><span style="color: #808030;">=</span> <span style="color: maroon; font-weight: bold;">nullptr</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> <span style="color: maroon; font-weight: bold;">const</span> key1<span style="color: #808030;">(</span><span style="color: #008c00;">1</span><span style="color: #808030;">,</span> <span style="color: #008c00;">512</span><span style="color: #808030;">,</span>
<span style="color: maroon; font-weight: bold;">const_cast</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: #808030;">*</span><span style="color: purple;">></span><span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>key_<span style="color: #808030;">.</span>GetData<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span> <span style="color: #008c00;">0</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> <span style="color: maroon; font-weight: bold;">const</span> key2<span style="color: #808030;">(</span><span style="color: #008c00;">1</span><span style="color: #808030;">,</span> <span style="color: #008c00;">512</span><span style="color: #808030;">,</span>
<span style="color: maroon; font-weight: bold;">const_cast</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: #808030;">*</span><span style="color: purple;">></span><span style="color: #808030;">(</span>key_<span style="color: #808030;">.</span>GetData<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span> <span style="color: #008c00;">0</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> denominator <span style="color: #808030;">=</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">sqrt</span><span style="color: #808030;">(</span>key1<span style="color: #808030;">.</span>dot<span style="color: #808030;">(</span>key1<span style="color: #808030;">)</span> <span style="color: #808030;">*</span> key2<span style="color: #808030;">.</span>dot<span style="color: #808030;">(</span>key2<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>denominator <span style="color: #808030;">!</span><span style="color: #808030;">=</span> <span style="color: green;">0.0</span><span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">return</span> key2<span style="color: #808030;">.</span>dot<span style="color: #808030;">(</span>key1<span style="color: #808030;">)</span> <span style="color: #808030;">/</span> denominator<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 12:19:23 UTC-->
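<br />
A minimal usage sketch (key_a and key_b stand for two extracted keys; the 0.5 threshold is an assumption for illustration, tune it on your own data):<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;">//key_a and key_b are face_key instances produced by face_key_extractor
double const score = key_a.similarity(key_b);
//cosine similarity lies in [-1, 1]; the closer to 1, the more alike
if(score > 0.5){
    //treat the two keys as the same person
}</pre>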
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u>B.Find most similar face</u></span></h3>
Finding the most similar face is really easy: all we need to do is compare the features stored in the array one by one and return the one with the highest confidence.<br />
<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: dimgrey;">//for simplicity, I put struct at here in this blog</span>
<span style="color: maroon; font-weight: bold;">struct</span> id_info
<span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">double</span> confident_ <span style="color: #808030;">=</span> <span style="color: #808030;">-</span><span style="color: green;">1.0</span><span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span> id_<span style="color: purple;">;</span>
<span style="color: purple;">}</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">struct</span> face_info
<span style="color: purple;">{</span>
face_key key_<span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">string</span> id_<span style="color: purple;">;</span>
<span style="color: purple;">}</span><span style="color: purple;">;</span>
face_reg_db<span style="color: purple;">::</span>id_info face_reg_db<span style="color: purple;">::</span>
find_most_similar_face<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> face_key <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span> <span style="color: maroon; font-weight: bold;">const</span>
<span style="color: purple;">{</span>
id_info result<span style="color: purple;">;</span>
<span style="color: dimgrey;">//type of face_keys_ is std::vector<face_info></span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: #603000;">size_t</span> i <span style="color: #808030;">=</span> <span style="color: #008c00;">0</span><span style="color: purple;">;</span> i <span style="color: #808030;">!</span><span style="color: #808030;">=</span> face_keys_<span style="color: #808030;">.</span>size<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: purple;">;</span> <span style="color: #808030;">+</span><span style="color: #808030;">+</span>i<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> confident <span style="color: #808030;">=</span>
face_keys_<span style="color: #808030;">[</span>i<span style="color: #808030;">]</span><span style="color: #808030;">.</span>key_<span style="color: #808030;">.</span>similarity<span style="color: #808030;">(</span>input<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>confident <span style="color: #808030;">></span> result<span style="color: #808030;">.</span>confident_<span style="color: #808030;">)</span><span style="color: purple;">{</span>
result<span style="color: #808030;">.</span>confident_ <span style="color: #808030;">=</span> confident<span style="color: purple;">;</span>
result<span style="color: #808030;">.</span>id_ <span style="color: #808030;">=</span> face_keys_<span style="color: #808030;">[</span>i<span style="color: #808030;">]</span><span style="color: #808030;">.</span>id_<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
<!--Created using ToHtml.com on 2019-02-18 11:46:04 UTC-->
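Note that find_most_similar_face is a plain linear scan over every stored key; that is perfectly fine for a small database like the one in this example, while a huge database would call for batching the comparisons or an approximate nearest neighbour index.<br />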
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><u>Summary</u></span></h2>
In today's post, I show you the most critical parts of face recognition with opencv, dlib and mxnet. I believe this is a great starting point if you want to build a high quality face recognition app with c++.<br />
<br />
Real world applications are <b><span style="color: blue;">much more complicated</span></b> than this small example, since they always need to support <span style="color: blue;"><b>more features</b></span> and are required to be <span style="color: blue;"><b>efficient</b></span>; but no matter how complex they are, the main flow of 2D face recognition is almost the same as what this post shows you.<br /><br /></div>
</div>
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com5tag:blogger.com,1999:blog-4702230343097536610.post-92151919199474288652019-02-05T00:10:00.002-08:002021-02-26T08:37:05.311-08:00Use person re-id model to identify person do not exist in the data set by c++ Person re-id compares two images of a person captured under different conditions. Recently this field has achieved big improvements with the help of deep learning, but <b><span style="color: red;">is it good enough to identify a person who does not exist in the data set?</span></b> This is the question I want to figure out in this post.<br />
<br />
Let me show you an example before we start.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/kxTCZQfH6gc/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/kxTCZQfH6gc?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
The results are not perfect yet; let us hope that better techniques and larger data sets will be released in the future. The algorithm itself is very simple, and its main flow is drawn in pic00<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiSbgf_s03n2u8ltrf69Wde4rcRU28LDTj7P9UHPTDYUe9pZHkEPp1LyN3wTfSIO_YZLlajls-4ftn7KKDpjsm_TjkoJPI6tkMb1qGyq4h7JJeCIt-Eny44AnOcNuHWIE6NemipYCjIpQ/s1600/main_flow.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="497" data-original-width="815" height="243" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiSbgf_s03n2u8ltrf69Wde4rcRU28LDTj7P9UHPTDYUe9pZHkEPp1LyN3wTfSIO_YZLlajls-4ftn7KKDpjsm_TjkoJPI6tkMb1qGyq4h7JJeCIt-Eny44AnOcNuHWIE6NemipYCjIpQ/s400/main_flow.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic00</td></tr>
</tbody></table>
For those who want to read the source code directly, please go to <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_person_re_id">github</a>; in order to compile it, you will need opencv3.4.2 and mxnet. You can pick any build tool you like; I use qmake in this example. If you want to know how to reproduce the results, please read on.<br />
<br />
<h3 style="text-align: center;">
<u><b><span style="color: blue;">1. Download pretrained model of person re-id</span></b></u></h3>
<br />
Download the pretrained model from <a href="https://mega.nz/#!NxFVWI5R!qRozzEIPtnnC1ALB9NsGC_-6ssZaq2M-TChg2_FttaM">here</a>. The precision and mAP of this model measured on <a href="https://drive.google.com/file/d/0B8-rUzbwVRk0c054eEozWG9COHM/view">market1501</a> are<br />
<br />
top1:0.923100<br />
top5:0.972090<br />
top10:0.984264<br />
mAP:0.797564<br />
<br />
If you want to train it by yourself, please follow the guide of <a href="https://github.com/dmlc/gluon-cv/tree/master/scripts/re-id/baseline">gluoncv</a>; it is quite easy.<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u><b>2. Download pretrained model of yolo v3</b></u></span></h3>
<br />
Download the pretrained model from <a href="https://mega.nz/#!h4NThaTZ!W8Awa0Ord1A_Qgykpf_RXJrHhrOwoc13Qlb7ULPMYx8">here</a>. This is the model converted from the pretrained model of gluoncv.<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u>3. Detect person from video by yolo v3</u></span></h3>
<br />
Before we perform person re-id, we need to detect the persons in the video; yolo v3 works well for this task, and you can find more details in this <a href="https://qtandopencv.blogspot.com/2018/10/person-detectionyolo-v3-with-helps-of.html">blog</a>. It also shows you how to load the models trained by gluoncv (or mxnet); you will need that skill to load the person re-id model too.<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u><b>4. Extract features of person</b></u></span></h3>
<br />
After we find the bounding boxes of the persons, we need to extract their features; this can be done with mxnet without much trouble.<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;">cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> person_feautres_extractor<span style="color: purple;">::</span>get_features<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> cv<span style="color: purple;">::</span>Mat <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: dimgrey;">//convert cv::Mat to ndarray</span>
<span style="color: maroon; font-weight: bold;">auto</span> data <span style="color: #808030;">=</span> to_ndarray_<span style="color: #808030;">-</span><span style="color: #808030;">></span>convert<span style="color: #808030;">(</span>input<span style="color: #808030;">)</span><span style="color: purple;">;</span>
data<span style="color: #808030;">.</span>CopyTo<span style="color: #808030;">(</span><span style="color: #808030;">&</span>executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>arg_dict<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: maroon;">"</span><span style="color: #0000e6;">data</span><span style="color: maroon;">"</span><span style="color: #808030;">]</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>Forward<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">false</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> result<span style="color: #808030;">(</span><span style="color: #008c00;">1</span><span style="color: #808030;">,</span> <span style="color: #008c00;">2048</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span><span style="color: #808030;">!</span>executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>outputs<span style="color: #808030;">.</span>empty<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//copy data to cpu by synchronize api since </span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: dimgrey;"> //Forward api of mxnet is async</span>
executor_<span style="color: #808030;">-</span><span style="color: #808030;">></span>outputs<span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: #808030;">.</span>SyncCopyToCPU<span style="color: #808030;">(</span>result<span style="color: #808030;">.</span>ptr<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span><span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span> <span style="color: #008c00;">2048</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: purple;"> </span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"></pre>
<h3 style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; text-align: center;">
<u><span style="color: blue;"><span style="font-family: "times" , "times new roman" , serif;">5. Find out most similar persons from the features pool</span></span></u></h3>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;"> I use cosine similarity to compare two features in this experiment.</span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: maroon; font-weight: bold;">float</span> cosine_similarity<span style="color: purple;">::</span>
compare_feature<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> <span style="color: #808030;">&</span>lhs<span style="color: #808030;">,</span> <span style="color: maroon; font-weight: bold;">const</span> cv<span style="color: purple;">::</span>Mat_<span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span> <span style="color: #808030;">&</span>rhs<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
cv<span style="color: purple;">::</span>multiply<span style="color: #808030;">(</span>lhs<span style="color: #808030;">,</span> rhs<span style="color: #808030;">,</span> numerator_<span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span><span style="color: #603000;">pow</span><span style="color: #808030;">(</span>lhs<span style="color: #808030;">,</span> <span style="color: #008c00;">2</span><span style="color: #808030;">,</span> lhs_pow_<span style="color: #808030;">)</span><span style="color: purple;">;</span>
cv<span style="color: purple;">::</span><span style="color: #603000;">pow</span><span style="color: #808030;">(</span>rhs<span style="color: #808030;">,</span> <span style="color: #008c00;">2</span><span style="color: #808030;">,</span> rhs_pow_<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> numerator <span style="color: #808030;">=</span> cv<span style="color: purple;">::</span>sum<span style="color: #808030;">(</span>numerator_<span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> denom_lhs <span style="color: #808030;">=</span> cv<span style="color: purple;">::</span>sum<span style="color: #808030;">(</span>lhs_pow_<span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> denom_rhs <span style="color: #808030;">=</span> cv<span style="color: purple;">::</span>sum<span style="color: #808030;">(</span>rhs_pow_<span style="color: #808030;">)</span><span style="color: #808030;">[</span><span style="color: #008c00;">0</span><span style="color: #808030;">]</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> denominator <span style="color: #808030;">=</span> <span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">sqrt</span><span style="color: #808030;">(</span>denom_lhs <span style="color: #808030;">*</span> denom_rhs<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">return</span> <span style="color: maroon; font-weight: bold;">static_cast</span><span style="color: purple;"><</span><span style="color: maroon; font-weight: bold;">float</span><span style="color: purple;">></span><span style="color: #808030;">(</span>denominator <span style="color: #808030;">!</span><span style="color: #808030;">=</span> <span style="color: green;">0.0</span> <span style="color: purple;">?</span> numerator <span style="color: #808030;">/</span>
denominator <span style="color: purple;">:</span> <span style="color: green;">0.0</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
</pre>
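Notice that compare_feature writes into preallocated member buffers (numerator_, lhs_pow_, rhs_pow_) instead of allocating temporaries, so comparing one feature against the whole features pool does not reallocate memory on every call.<br />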
<!--Created using ToHtml.com on 2019-02-05 11:37:15 UTC-->
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;"> </span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;"> Then find out the most similar features in the db, return the id in the db if similarity value greater</span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;">than threshold, else create a new id and return it.</span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="font-family: "times" , "times new roman" , serif;"> </span></pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>visitor_identify<span style="color: purple;">::</span>visitor_info<span style="color: purple;">></span> visitor_identify<span style="color: purple;">::</span>
detect_and_identify_visitors<span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">const</span> cv<span style="color: purple;">::</span>Mat <span style="color: #808030;">&</span>input<span style="color: #808030;">)</span>
<span style="color: purple;">{</span>
<span style="color: dimgrey;">//detect persons in the input</span>
obj_det_<span style="color: #808030;">-</span><span style="color: #808030;">></span>forward<span style="color: #808030;">(</span>input<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> input_size <span style="color: #808030;">=</span> cv<span style="color: purple;">::</span>Size<span style="color: #808030;">(</span>input<span style="color: #808030;">.</span>cols<span style="color: #808030;">,</span> input<span style="color: #808030;">.</span>rows<span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> detect_results <span style="color: #808030;">=</span> obj_filter_<span style="color: #808030;">-</span><span style="color: #808030;">></span>filter<span style="color: #808030;">(</span>obj_det_<span style="color: #808030;">-</span><span style="color: #808030;">></span>get_outputs<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">,</span> input_size <span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">vector</span><span style="color: purple;"><</span>visitor_info<span style="color: purple;">></span> result<span style="color: purple;">;</span>
<span style="color: maroon; font-weight: bold;">for</span><span style="color: #808030;">(</span><span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> <span style="color: #808030;">&</span>det <span style="color: purple;">:</span> detect_results<span style="color: #808030;">)</span><span style="color: purple;">{</span>
<span style="color: dimgrey;">//extract features from the person</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> feature <span style="color: #808030;">=</span>
feature_extract_<span style="color: #808030;">-</span><span style="color: #808030;">></span>get_features<span style="color: #808030;">(</span>input<span style="color: #808030;">(</span>det<span style="color: #808030;">.</span>roi_<span style="color: #808030;">)</span><span style="color: #808030;">.</span>clone<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: dimgrey;">//find most similar features in the database</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> id_info <span style="color: #808030;">=</span> db_<span style="color: #808030;">-</span><span style="color: #808030;">></span>find_most_similar_id<span style="color: #808030;">(</span>feature<span style="color: #808030;">)</span><span style="color: purple;">;</span>
visitor_info vinfo<span style="color: purple;">;</span>
vinfo<span style="color: #808030;">.</span>roi_ <span style="color: #808030;">=</span> det<span style="color: #808030;">.</span>roi_<span style="color: purple;">;</span>
<span style="color: dimgrey;">//if the confident(similarity) of the most similar features </span>
<span style="color: dimgrey;">//were greather than the threshold</span>
<span style="color: dimgrey;">//return the id found in the db, else add a new id and return it</span>
<span style="color: maroon; font-weight: bold;">if</span><span style="color: #808030;">(</span>id_info<span style="color: #808030;">.</span>confident_ <span style="color: #808030;">></span> re_id_threshold_<span style="color: #808030;">)</span><span style="color: purple;">{</span>
vinfo<span style="color: #808030;">.</span>id_ <span style="color: #808030;">=</span> id_info<span style="color: #808030;">.</span>id_<span style="color: purple;">;</span>
vinfo<span style="color: #808030;">.</span>confidence_ <span style="color: #808030;">=</span> id_info<span style="color: #808030;">.</span>confident_<span style="color: purple;">;</span>
<span style="color: purple;">}</span><span style="color: maroon; font-weight: bold;">else</span><span style="color: purple;">{</span>
<span style="color: maroon; font-weight: bold;">auto</span> <span style="color: maroon; font-weight: bold;">const</span> new_id <span style="color: #808030;">=</span> db_<span style="color: #808030;">-</span><span style="color: #808030;">></span>add_new_id<span style="color: #808030;">(</span>feature<span style="color: #808030;">)</span><span style="color: purple;">;</span>
vinfo<span style="color: #808030;">.</span>id_ <span style="color: #808030;">=</span> new_id<span style="color: purple;">;</span>
vinfo<span style="color: #808030;">.</span>confidence_ <span style="color: #808030;">=</span> <span style="color: green;">1.0</span><span style="color: #006600;">f</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
result<span style="color: #808030;">.</span>emplace_back<span style="color: #808030;">(</span><span style="color: #666616;">std</span><span style="color: purple;">::</span><span style="color: #603000;">move</span><span style="color: #808030;">(</span>vinfo<span style="color: #808030;">)</span><span style="color: #808030;">)</span><span style="color: purple;">;</span>
<span style="color: purple;">}</span>
<span style="color: maroon; font-weight: bold;">return</span> result<span style="color: purple;">;</span>
<span style="color: purple;">}</span> </pre>
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"> </pre><br /><pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;"> </pre>
<!--Created using ToHtml.com on 2019-02-05 11:41:13 UTC-->ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com3tag:blogger.com,1999:blog-4702230343097536610.post-51882732530705908502019-01-13T00:47:00.002-08:002021-02-26T08:37:14.183-08:00Deep learning 12-Train a detector based on yolo v3(by gluoncv) by custom data GluonCV comes with lots of useful pretrained models for object detection, including ssd, yolo v3 and faster-rcnn. Their website comes with an example showing how to fine tune your own data set with ssd, but they do not show us how to do it with yolo v3. If you are like me, struggling to train your custom data with yolo v3, this post may ease your pain, since I have already modified the script to help you train your custom data.<br />
<br />
<h3 style="text-align: center;">
<u><span style="color: blue;">1.Select a tool to draw your bounding box</span></u></h3>
<br />
I use <a href="https://github.com/tzutalin/labelImg">labelImg</a> for this purpose; it is easy to install and use on windows and ubuntu.<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u>2.Convert the xml files generated by labelImg to lst format</u></span></h3>
<br />
I wrote some small classes to perform the conversion task; you can find them on <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_train_object_detector/generate_face_and_human_labels">github</a>. If you cannot compile them, please open an issue at github. You don't need opencv or mxnet for this task after all.<br />
<br />
<h3 style="text-align: center;">
<u><span style="color: blue;">3.Convert lst file to rec format</span></u></h3>
<br />
Follow the instructions <a href="https://gluon-cv.mxnet.io/build/examples_datasets/detection_custom.html#sphx-glr-build-examples-datasets-detection-custom-py">here</a>; learning how to use im2rec.py should be enough.<br />
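As a sketch, the lst-to-rec conversion usually looks like the command below; the file and folder names are placeholders, and the exact flags may differ between mxnet versions, so please check <code>python im2rec.py --help</code> on your install.<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;">python im2rec.py face_person.lst ./images --no-shuffle --pass-through --pack-label</pre>
<br />
As far as I know, --pass-through saves each image as is without re-encoding, and --pack-label packs the bounding box labels into the rec file.<br />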
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;"> 4. Adjust the train_yolo3.py, make it able to read file with rec format</span></u></b></h3>
<br />
I do this part for you already, you can download the script(train_yolo3_custom.py) from <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/mxnet_train_object_detector/trainer/train_yolo3_custom.py">github</a>. Before you use that, you will need to<br />
<br />
<ol>
<li>Copy voc_detection.py on <a href="https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/metrics/voc_detection.py" rel="nofollow noopener">github</a>
</li>
<li>Change the file name to voc_detection_2.py</li>
<li>Move it to the folder of gluoncv.utils.metrics(mine is C:\my_folder\Anaconda3\Lib\site-packages\gluoncv\utils\metrics)</li>
<li>Change the code from <code>from gluoncv.utils.metrics.voc_detection import VOC07MApMetric</code> to <code>from gluoncv.utils.metrics.voc_detection_2 import VOC07MApMetric</code>
</li>
</ol>
This is because the voc_detection.py shipped with Anaconda on windows has a bug; if your voc_detection.py is fine, you can omit this step and change <code>from gluoncv.utils.metrics.voc_detection_2 import VOC07MApMetric</code> back to <code>from gluoncv.utils.metrics.voc_detection import VOC07MApMetric</code><br />
<br />
<span style="font-family: "georgia" , "times new roman" , serif;">You can also install the nightly release instead if you want to save yourself some trouble.</span><br />
<br />
<h3 style="text-align: center;">
<code><code></code></code><u><span style="color: blue;"><b>5.Enter command to train your data</b></span></u></h3>
<br />
I added a few command line options to this script; they are<br />
<br />
--train_dataset : Location of the rec file for training<br />
--validate_dataset : Location of the rec file for validation<br />
--pretrained : If you enter this, it will use the pretrained weights of the coco dataset, else only the pretrained weights of imageNet.<br />
--classes_list : Location of the file with the names of the classes. Every line presents one class, and each class should match its own id. Example :<br />
<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheJXfh9Gau8lnpQdNmc-D1YX5E73HkBBaJ6wIT_mUkiXo5uI7BJa8PqL94A0HQqIvt0uxJ3F2nAXNIuRBE27AWdb3mAo1rErRH2hDck6bJSg4vqPEIoyXTNulEYXegAlfzj6F4X2DwDGg/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="150" data-original-width="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheJXfh9Gau8lnpQdNmc-D1YX5E73HkBBaJ6wIT_mUkiXo5uI7BJa8PqL94A0HQqIvt0uxJ3F2nAXNIuRBE27AWdb3mAo1rErRH2hDck6bJSg4vqPEIoyXTNulEYXegAlfzj6F4X2DwDGg/s1600/Capture.PNG" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span face=""trebuchet ms" , sans-serif">pic00</span></td></tr>
</tbody></table>
<br />
<br />
The ID of face is 0, so it is put at line 0; the ID of person is 1, so it is put at line 1<br />
<br />
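So for the face/person example, the classes_list file (the face_person_list.txt used in the command below) would simply contain:<br />
<br />
<pre style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; color: black;">face
person</pre>
<br />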
Example : python train_yolo3_custom.py --epochs 20 --lr 0.0001 --train_dataset face_person.rec --validate_dataset face_person.rec --classes_list face_person_list.txt --batch-size 3 --val-interval 5 --mixup<br />
<div style="text-align: center;">
<h3>
<u><span style="color: blue;"><b>6.Tips</b></span></u></h3>
</div>
<br />
1. If you do not enter --no-random-shape, you had better make your learning rate lower (ex : 0.0001 instead of 0.001), else it is very easy to explode (the loss becomes nan).<br />
2. Not every dataset works better with random-shape; run a few epochs (ex : 5) with smaller data (ex : 300~400 images) to find out which parameters work well.<br />
3. Enabling random-shape will eat much more ram; without it I can set my batch-size to 8, with it I could only set my batch-size to 3.<br />
<br />
<h3 style="text-align: center;">
<u><span style="color: blue;"><b>7. Measure performance</b></span></u></h3>
<br />
In order to measure the performance, we need a test set; unfortunately there does not exist a test set designed for both human and face detection, therefore I picked two data sets to measure the performance of the trained model: <a href="http://vis-www.cs.umass.edu/fddb/">FDDB</a> for face detection (the labels of FDDB are more like heads rather than faces) and <a href="http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit">Pascal Voc</a> for human detection. You can find validate_yolo3.py at github.<br />
<br />
The model I used was trained for 40 epochs; its mAP on the training set is close to 0.9. Both of the experiments are based on IOU = 0.5.<br />
<h4 style="text-align: center;">
<u><b><span style="color: blue;">7.1 Performance of face detection</span></b></u></h4>
<br />
The mAP is close to 1.0 when IOU is 0.5; this looks too good to be true, so let us check the inference results with our own eyes to find out what is happening. The following images are inferred with input-shape 320 by the model trained for 40 epochs.<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuA90MTbOjeV0U8ZIxJBWkEwqp-Mkx6Vo_Sk_DtVe4iapcSfXWrcUY8pr8L10hf0_K4_Yp0qVD4hj2PIoiNYjtw6oyQ5cAu7KwMXBTEuBSGvniwR2ISmsvPXCIx9JfnKTMYgpEx795EYk/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="335" data-original-width="427" height="251" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuA90MTbOjeV0U8ZIxJBWkEwqp-Mkx6Vo_Sk_DtVe4iapcSfXWrcUY8pr8L10hf0_K4_Yp0qVD4hj2PIoiNYjtw6oyQ5cAu7KwMXBTEuBSGvniwR2ISmsvPXCIx9JfnKTMYgpEx795EYk/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic01(2002/08/11/big/img_534.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhE9ZZuUX_VmrS-7bGS_XU51NMaNOs8A_y4Jq95iExzX6cntFfcfLJJO-9C68Yc4PFfp0KgMxMxD_6Df5PJoGP19i99sr_bRBzudP3-MsAhKxlw62x7KolJSWwXJi_LO3C3kRQQrFhE-hM/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="375" data-original-width="250" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhE9ZZuUX_VmrS-7bGS_XU51NMaNOs8A_y4Jq95iExzX6cntFfcfLJJO-9C68Yc4PFfp0KgMxMxD_6Df5PJoGP19i99sr_bRBzudP3-MsAhKxlw62x7KolJSWwXJi_LO3C3kRQQrFhE-hM/s320/Capture.PNG" width="213" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic02(2002/08/11/big/img_558.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDp73TJ_NwCjTxLjOnz_iI56FUMuN8XTBuIPzPW1873o9g3CiARRpZ6pUP5B-aj0DONm2uSz-1decvnOUySAYnbGhLGeVn0MQ-RnT5RbXPWzkepgqqcP83Lznb5BaGe6vqwwmdHY5qszI/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="321" data-original-width="496" height="207" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDp73TJ_NwCjTxLjOnz_iI56FUMuN8XTBuIPzPW1873o9g3CiARRpZ6pUP5B-aj0DONm2uSz-1decvnOUySAYnbGhLGeVn0MQ-RnT5RbXPWzkepgqqcP83Lznb5BaGe6vqwwmdHY5qszI/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic03(2002/08/11/big/img_570.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVub-A03LzACVANQYMGbUwBIM7PTsGe9rtZ39_Wxs97f11_OrSI9wCqHlZNST8F37KP2DGORPwE5kh_gN88_Cpy81J2d9b1JmMi8I3gcY3wTcA-i0yxu9YrwRT5O0vOypllqkbriQDB8k/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="375" data-original-width="468" height="256" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVub-A03LzACVANQYMGbUwBIM7PTsGe9rtZ39_Wxs97f11_OrSI9wCqHlZNST8F37KP2DGORPwE5kh_gN88_Cpy81J2d9b1JmMi8I3gcY3wTcA-i0yxu9YrwRT5O0vOypllqkbriQDB8k/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic04(2002/08/11/big/img_58.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg84MEgRD1JTMZm0fVntBLg5CVfFYKBFpy60IBxF5hnT23eTAt4AJARPP3N7x13Pf7KAuB_cun2vGpc_UkJrnbe2PmsfXLj9IQSN4sO0qgWjed7qZsIIx4ZM1h403vL3XK6t-017r2dVMM/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="369" data-original-width="258" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg84MEgRD1JTMZm0fVntBLg5CVfFYKBFpy60IBxF5hnT23eTAt4AJARPP3N7x13Pf7KAuB_cun2vGpc_UkJrnbe2PmsfXLj9IQSN4sO0qgWjed7qZsIIx4ZM1h403vL3XK6t-017r2dVMM/s320/Capture.PNG" width="223" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic05(2002/08/11/big/img_726.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNxDl-4cNQN_kjCNf3DlLb4c3jAffNg8hzsIjpcQDq6R-8X51KjxZ1FYfwW0-7ispfawT7DtTtEgOvD08mtgAs-UWdkPulwPJcWawV-O321fE40T1RiNgjVWLSz4KJFgqAo0a81sFx3fM/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="372" data-original-width="336" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNxDl-4cNQN_kjCNf3DlLb4c3jAffNg8hzsIjpcQDq6R-8X51KjxZ1FYfwW0-7ispfawT7DtTtEgOvD08mtgAs-UWdkPulwPJcWawV-O321fE40T1RiNgjVWLSz4KJFgqAo0a81sFx3fM/s320/Capture.PNG" width="289" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic06(2002/08/11/big/img_752.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikVG0ERMsQ5XpmutbYok6sL0nlCglEuZvJ8tQOR8qEXeXtDe38Vo49cD5HH-f5WgQg0wie4pEfkb-GBAno-0AUUDenqA3d5yLuqLGga9wUXMl6kJXFMuJ-_TY0cfWWkzbuCEZCXbcDQ0s/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="353" data-original-width="498" height="226" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikVG0ERMsQ5XpmutbYok6sL0nlCglEuZvJ8tQOR8qEXeXtDe38Vo49cD5HH-f5WgQg0wie4pEfkb-GBAno-0AUUDenqA3d5yLuqLGga9wUXMl6kJXFMuJ-_TY0cfWWkzbuCEZCXbcDQ0s/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic07(2002/08/11/big/img_478.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWpDBw14LUFX9DgWhFynQeQZv93QKJrV5TQav32h5gxoPc2rRBMCeuEGDQ9tO3TUt9ldkVWq6wysu2SN4TGr7NsvaJ9bYbO5cbwB3Narq6vrCfgKCzlIc34RHUq9WTA2aZeTbkBwjfl2E/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="306" data-original-width="496" height="197" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWpDBw14LUFX9DgWhFynQeQZv93QKJrV5TQav32h5gxoPc2rRBMCeuEGDQ9tO3TUt9ldkVWq6wysu2SN4TGr7NsvaJ9bYbO5cbwB3Narq6vrCfgKCzlIc34RHUq9WTA2aZeTbkBwjfl2E/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic08(2002/08/11/big/img_492.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjB_wuk71_6rEYlb-KWMsojQOeXl95LojexvaJZgP-tldA4jbBVk0jU2DqUCPsWi8xmMgWSm6ncgW3sDXGdOj8Zf2bNYX9iRxL9BR3NaqZZ3b7a-osQ5V98cHFth4IzhmbxwCRugF4j6JI/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="292" data-original-width="503" height="185" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjB_wuk71_6rEYlb-KWMsojQOeXl95LojexvaJZgP-tldA4jbBVk0jU2DqUCPsWi8xmMgWSm6ncgW3sDXGdOj8Zf2bNYX9iRxL9BR3NaqZZ3b7a-osQ5V98cHFth4IzhmbxwCRugF4j6JI/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic09(2002/08/11/big/img_496.jpg)</td></tr>
</tbody></table>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
From these pictures (pic01~pic09), we know the model works quite well.<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
Usually the mAP on a test set would not be higher than on the training set; either this test set is far easier than the training set, or this detector overfits this test set. No matter what, this test set is not good enough to measure the performance of the face detector. <br />
<h4 style="text-align: center;">
<span style="color: blue;"><u><b>7.2 Performance of person detection</b></u></span></h4>
<br />
Unlike the face detector, the person detector only got 0.583 mAP on the images listed in person_val.txt (I only ran it on the images that contain a person); there is still big room to improve the accuracy. <br />
<br />
Adding more data may improve the performance, since these test results tell us this model has high variance. In order to find out what kind of data we should add, one of the solutions is to study, by eye, the mis-classified persons or the persons that cannot be detected, then write down the reasons.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgptG1CdxsnBixqo6qINEMW-N_i2jvRTpXl7hI3jcdgfZLwZSnGcVd7w2nWtQuIBNCU00I9RdtjvwFSrZ28UWqGo5Z7ZbMb5gyLoaOm8G_ZExD90pgWRseZfJVe1u90pkkeN1iEXw0Cq3M/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="334" data-original-width="504" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgptG1CdxsnBixqo6qINEMW-N_i2jvRTpXl7hI3jcdgfZLwZSnGcVd7w2nWtQuIBNCU00I9RdtjvwFSrZ28UWqGo5Z7ZbMb5gyLoaOm8G_ZExD90pgWRseZfJVe1u90pkkeN1iEXw0Cq3M/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic10(2008_000003.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhA0uhWxsvGZZipXM0h7JsP5rw6wZydqIFUJOkZw0v5_mWMrO7pPy7qc3zxvryYHrKRU56IrTSr_Z4F9tq9-gKsk7SVqtabKCHa_R_PehIwbbxjZB-jSTlM6mEUbyT39aJFJmbvI5WirlU/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="354" data-original-width="494" height="229" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhA0uhWxsvGZZipXM0h7JsP5rw6wZydqIFUJOkZw0v5_mWMrO7pPy7qc3zxvryYHrKRU56IrTSr_Z4F9tq9-gKsk7SVqtabKCHa_R_PehIwbbxjZB-jSTlM6mEUbyT39aJFJmbvI5WirlU/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic11(2008_000032.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhqLCEzjaO22A2Zf7C5JVX8Am4Sw-XgmsPT_kGkJ0mI09IM52i0czRgwPfGKsi9T4cpNI0my3aIADg27oXb4_oLzDMxPWFDfMW1nUdrD7fOHB7VGy4Q7eVL1O71CFiQd_8FxyAwRayjHE/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="368" data-original-width="498" height="236" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhqLCEzjaO22A2Zf7C5JVX8Am4Sw-XgmsPT_kGkJ0mI09IM52i0czRgwPfGKsi9T4cpNI0my3aIADg27oXb4_oLzDMxPWFDfMW1nUdrD7fOHB7VGy4Q7eVL1O71CFiQd_8FxyAwRayjHE/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic12(2008_000051.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDjK9HH6NKO03k0ebKlzjESH7OH-6KzwiCYZZHtTr-wnryjxKJg8ixDwSNX8lLWKGaquUeSHZCbl6DlwyZc6SAwChZIT83cRmScdsBRcXq_sUT7kmjRV91xMuuteh3mnzC92H7yxtF0hk/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="372" data-original-width="494" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDjK9HH6NKO03k0ebKlzjESH7OH-6KzwiCYZZHtTr-wnryjxKJg8ixDwSNX8lLWKGaquUeSHZCbl6DlwyZc6SAwChZIT83cRmScdsBRcXq_sUT7kmjRV91xMuuteh3mnzC92H7yxtF0hk/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic13(2008_000082.jpg)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnksi0AC8OokwPfwH1UQiBzc0k-UVtAkF8dKwv0MhwSeCbdk08H9dRzePNiub5GRoDiusK2wpk1bTkFS6Ohui7cwGxXpbCfgM0pkCbYwDG43qMXrDcBSHegb299nzESP6S9iAqUuAa3Ns/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="368" data-original-width="398" height="295" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnksi0AC8OokwPfwH1UQiBzc0k-UVtAkF8dKwv0MhwSeCbdk08H9dRzePNiub5GRoDiusK2wpk1bTkFS6Ohui7cwGxXpbCfgM0pkCbYwDG43qMXrDcBSHegb299nzESP6S9iAqUuAa3Ns/s320/Capture.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic14(2008_000138.jpg)</td></tr>
</tbody></table>
After we gather the data, we can create a table that lists the names of the images and describes the errors (pic15 shows a small example).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlGPbZVfw9zJPTlRbEZWr7ICI6E73E8urqrqdjur3R1XM-56tZ_JakHM6fSvkI3_Xu44Wqa_WtDPWNH4ysHGJodlbkb63IYSGUOhjfQDEo8P2nkkDH034lOyf0-1KHPHqMghCwW_xPh4U/s1600/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="152" data-original-width="1047" height="56" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlGPbZVfw9zJPTlRbEZWr7ICI6E73E8urqrqdjur3R1XM-56tZ_JakHM6fSvkI3_Xu44Wqa_WtDPWNH4ysHGJodlbkb63IYSGUOhjfQDEo8P2nkkDH034lOyf0-1KHPHqMghCwW_xPh4U/s400/Capture.PNG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">pic15</td></tr>
</tbody></table>
With the help of error analysis like this, we can find out which parts we should focus on and what kind of data we should collect. From the experiments we can see the accuracy of yolo v3 is very high, although recall still has a lot of room to improve.<br />
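<br />
As a small illustration, such a table can also be kept as a plain csv file, which is easy to sort and filter. Below is a minimal sketch of one way to write it; the error_entry struct and the column names are my own invention, not part of this project:<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">//A minimal sketch of saving the error analysis table(like pic15) as csv.
//error_entry and the column names are hypothetical, adjust them to your needs.
#include <fstream>
#include <string>
#include <vector>

struct error_entry
{
    std::string image_name;
    std::string error_description;
};

void write_error_table(std::string const &file_name,
                       std::vector<error_entry> const &entries)
{
    std::ofstream out(file_name);
    out<<"image,error\n";
    for(auto const &entry : entries){
        out<<entry.image_name<<","<<entry.error_description<<"\n";
    }
}
</pre>
</div>
<br />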
<h3 style="text-align: center;">
<u><b><span style="color: blue;">8. Model and data</span></b></u></h3>
<br />
You can find the <b><a href="https://mega.nz/#F!5kM3zIQS!O8tO8iGePnx60ACFoiWWLQ">model and data at mega</a></b>. I only publish the annotations, not the images; you need to download the images by yourself (I worry it may cause legal issues if I publish the images). Besides adding bounding boxes for persons, I also adjusted the bounding boxes of the faces a lot; the original bounding boxes provided by kaggle look like they were designed for a "head detector" rather than a "face detector".<br />
<br />
<h3 style="text-align: center;">
<u><b><span style="color: blue;">9. Conclusion</span></b></u></h3>
<br />
This detector still has a lot of room to improve, especially the mAP for persons, but doing so would take a lot of time, so I decided to stop here. In my annotations of persons you will find that many bounding boxes heavily overlap other persons; bounding boxes with less overlap may help the model detect more persons.<br />
<br />
You can use the annotations and model at your free will; please do me a favor and reference this site if you do use them, thanks.<br />
<br />
The source code can be found at <b><a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_train_object_detector">github</a></b>.<br />
<br />
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com1tag:blogger.com,1999:blog-4702230343097536610.post-83101557631083097722018-10-04T05:04:00.001-07:002021-02-26T08:37:36.775-08:00Person detection(Yolo v3) with the help of mxnet, able to run on gpu/cpu In this post I will show you how to do object detection with the help of the cpp-package of mxnet. Why do I introduce mxnet? Because the following advantages make it a decent library for standalone project development:<br />
<br />
1. It is open source and royalty free<br />
2. Decent support for GPU and CPU<br />
3. Scales efficiently to multiple GPUs and machines<br />
4. Supports a cpp api, which means you do not need to ask your users to install a python environment or ship your source code in order to run your apps<br />
5. mxnet supports many platforms, including windows, linux, mac, aws, android, ios<br />
6. It has a lot of pre-trained models<br />
7. <a href="https://github.com/Microsoft/MMdnn">MMDNN </a>supports mxnet, which means we can convert models trained by other libraries to mxnet (although not all models can be converted).<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/Uz-G-Zyb458/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/Uz-G-Zyb458?feature=player_embedded" width="320"></iframe></div>
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Step 1 : Download the model and convert it to a format the cpp package can load</span></u></b></h3>
1. Install anaconda(the version that comes with python3)<br />
2. <a href="https://anaconda.org/anaconda/mxnet">Install mxnet from the terminal of anaconda</a><br />
3. <a href="https://github.com/dmlc/gluon-cv">Install gluon-cv from the terminal of anaconda</a><br />
4. Download the model and convert it with the following script<br />
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #008800; font-weight: bold;">import</span> <span style="color: #0e84b5; font-weight: bold;">gluoncv</span> <span style="color: #008800; font-weight: bold;">as</span> <span style="color: #0e84b5; font-weight: bold;">gcv</span>
<span style="color: #008800; font-weight: bold;">from</span> <span style="color: #0e84b5; font-weight: bold;">gluoncv.utils</span> <span style="color: #008800; font-weight: bold;">import</span> export_block
net <span style="color: #333333;">=</span> gcv<span style="color: #333333;">.</span>model_zoo<span style="color: #333333;">.</span>get_model(<span style="background-color: #fff0f0;">'yolo3_darknet53_coco'</span>, pretrained<span style="color: #333333;">=</span><span style="color: #008800; font-weight: bold;">True</span>)
export_block(<span style="background-color: #fff0f0;">'yolo3_darknet53_coco'</span>, net)
</pre>
</div>
<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><u> Step 2 : Load the model after conversion</u></span></h3>
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> <span style="color: #0066bb; font-weight: bold;">load_check_point</span>(std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>model_params,
std<span style="color: #333333;">::</span>string <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>model_symbol,
Symbol <span style="color: #333333;">*</span>symbol,
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> <span style="color: #333333;">*</span>arg_params,
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> <span style="color: #333333;">*</span>aux_params,
Context <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ctx)
{
Symbol new_symbol <span style="color: #333333;">=</span> Symbol<span style="color: #333333;">::</span>Load(model_symbol);
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> params <span style="color: #333333;">=</span> NDArray<span style="color: #333333;">::</span>LoadToMap(model_params);
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> args;
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> auxs;
<span style="color: #008800; font-weight: bold;">for</span> (<span style="color: #008800; font-weight: bold;">auto</span> iter <span style="color: #333333;">:</span> params) {
std<span style="color: #333333;">::</span>string type <span style="color: #333333;">=</span> iter.first.substr(<span style="color: #0000dd; font-weight: bold;">0</span>, <span style="color: #0000dd; font-weight: bold;">4</span>);
std<span style="color: #333333;">::</span>string name <span style="color: #333333;">=</span> iter.first.substr(<span style="color: #0000dd; font-weight: bold;">4</span>);
<span style="color: #008800; font-weight: bold;">if</span> (type <span style="color: #333333;">==</span> <span style="background-color: #fff0f0;">"arg:"</span>)
args[name] <span style="color: #333333;">=</span> iter.second.Copy(ctx);
<span style="color: #008800; font-weight: bold;">else</span> <span style="color: #008800; font-weight: bold;">if</span> (type <span style="color: #333333;">==</span> <span style="background-color: #fff0f0;">"aux:"</span>)
auxs[name] <span style="color: #333333;">=</span> iter.second.Copy(ctx);
<span style="color: #008800; font-weight: bold;">else</span>
<span style="color: #008800; font-weight: bold;">continue</span>;
}
<span style="color: #333333;">*</span>symbol <span style="color: #333333;">=</span> new_symbol;
<span style="color: #333333;">*</span>arg_params <span style="color: #333333;">=</span> args;
<span style="color: #333333;">*</span>aux_params <span style="color: #333333;">=</span> auxs;
}
</pre>
</div>
<br />
<br />
You could use the load_check_point function as follows<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"> Symbol net;
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span> args, auxs;
load_check_point(model_params, model_symbols, <span style="color: #333333;">&</span>net, <span style="color: #333333;">&</span>args, <span style="color: #333333;">&</span>auxs, context);
<span style="color: #888888;">//The shape of the input data must stay the same; if you need a different size,</span>
<span style="color: #888888;">//you could rebind the Executor or create a pool of Executors.</span>
<span style="color: #888888;">//In order to create the input layer of the Executor, I make a dummy NDArray.</span>
<span style="color: #888888;">//The value of "data" can be changed later</span>
args[<span style="background-color: #fff0f0;">"data"</span>] <span style="color: #333333;">=</span> NDArray(Shape(<span style="color: #0000dd; font-weight: bold;">1</span>, <span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">unsigned</span><span style="color: #333333;">></span>(input_size.height),
<span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">unsigned</span><span style="color: #333333;">></span>(input_size.width), <span style="color: #0000dd; font-weight: bold;">3</span>), context);
executor_.reset(net.SimpleBind(context, args, std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, NDArray<span style="color: #333333;">></span>(),
std<span style="color: #333333;">::</span>map<span style="color: #333333;"><</span>std<span style="color: #333333;">::</span>string, OpReqType<span style="color: #333333;">></span>(), auxs));
</pre>
</div>
<br />
model_params is the location of the weights (ex : yolo3_darknet53_coco.params), and model_symbols is the location of the symbols saved as json (ex : yolo3_darknet53_coco.json).<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Step 3: Convert image format</span></u></b></h3>
Before we feed the image into the executor of mxnet, we need to convert it from a bgr cv::Mat into an rgb, floating point NDArray.<br />
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">NDArray <span style="color: #0066bb; font-weight: bold;">cvmat_to_ndarray</span>(cv<span style="color: #333333;">::</span>Mat <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>bgr_image, Context <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>ctx)
{
cv<span style="color: #333333;">::</span>Mat rgb_image;
cv<span style="color: #333333;">::</span>cvtColor(bgr_image, rgb_image, cv<span style="color: #333333;">::</span>COLOR_BGR2RGB);
rgb_image.convertTo(rgb_image, CV_32FC3);
<span style="color: #888888;">//This api copies the data of rgb_image into the NDArray. As far as I know,</span>
<span style="color: #888888;">//opencv guarantees a cv::Mat is continuous unless it is a sub matrix of another cv::Mat</span>
<span style="color: #008800; font-weight: bold;">return</span> NDArray(rgb_image.ptr<span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">float</span><span style="color: #333333;">></span>(),
Shape(<span style="color: #0000dd; font-weight: bold;">1</span>, <span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">unsigned</span><span style="color: #333333;">></span>(rgb_image.rows), <span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">unsigned</span><span style="color: #333333;">></span>(rgb_image.cols), <span style="color: #0000dd; font-weight: bold;">3</span>),
ctx);
}
</pre>
</div>
<br />
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Step 4 : Perform object detection on video</span></u></b></h3>
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> object_detector<span style="color: #333333;">::</span>forward(<span style="color: #008800; font-weight: bold;">const</span> cv<span style="color: #333333;">::</span>Mat <span style="color: #333333;">&</span>input)
{
<span style="color: #888888;">//By default, input_size_.height equals 256 and input_size_.width equals 320.</span>
<span style="color: #888888;">//Yolo v3 has a limitation : width and height of the input must be divisible by 32.</span>
<span style="color: #008800; font-weight: bold;">if</span>(input.rows <span style="color: #333333;">!=</span> input_size_.height <span style="color: #333333;">||</span> input.cols <span style="color: #333333;">!=</span> input_size_.width){
cv<span style="color: #333333;">::</span>resize(input, resize_img_, input_size_);
}<span style="color: #008800; font-weight: bold;">else</span>{
resize_img_ <span style="color: #333333;">=</span> input;
}
<span style="color: #008800; font-weight: bold;">auto</span> data <span style="color: #333333;">=</span> cvmat_to_ndarray(resize_img_, <span style="color: #333333;">*</span>context_);
<span style="color: #888888;">//Copy the data of the image to the "data"</span>
data.CopyTo(<span style="color: #333333;">&</span>executor_<span style="color: #333333;">-></span>arg_dict()[<span style="background-color: #fff0f0;">"data"</span>]);
<span style="color: #888888;">//Forward is an async api.</span>
executor_<span style="color: #333333;">-></span>Forward(<span style="color: #007020;">false</span>);
}
</pre>
</div>
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Step 5 : Draw bounding boxes on image</span></u></b></h3>
<br />
<br />
<!--HTML generated using hilite.me--><br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"><span style="color: #333399; font-weight: bold;">void</span> plot_object_detector_bboxes<span style="color: #333333;">::</span>plot(cv<span style="color: #333333;">::</span>Mat <span style="color: #333333;">&</span>inout,
std<span style="color: #333333;">::</span>vector<span style="color: #333333;"><</span>mxnet<span style="color: #333333;">::</span>cpp<span style="color: #333333;">::</span>NDArray<span style="color: #333333;">></span> <span style="color: #008800; font-weight: bold;">const</span> <span style="color: #333333;">&</span>predict_results,
<span style="color: #333399; font-weight: bold;">bool</span> normalize)
{
<span style="color: #008800; font-weight: bold;">using</span> <span style="color: #008800; font-weight: bold;">namespace</span> mxnet<span style="color: #333333;">::</span>cpp;
<span style="color: #888888;">//1. predict_results get from the output of Executor(executor_->outputs)</span>
<span style="color: #888888;">//2. Must set the Context to cpu because we need to process the data on the cpu later</span>
<span style="color: #008800; font-weight: bold;">auto</span> labels <span style="color: #333333;">=</span> predict_results[<span style="color: #0000dd; font-weight: bold;">0</span>].Copy(Context(kCPU, <span style="color: #0000dd; font-weight: bold;">0</span>));
<span style="color: #008800; font-weight: bold;">auto</span> scores <span style="color: #333333;">=</span> predict_results[<span style="color: #0000dd; font-weight: bold;">1</span>].Copy(Context(kCPU, <span style="color: #0000dd; font-weight: bold;">0</span>));
<span style="color: #008800; font-weight: bold;">auto</span> bboxes <span style="color: #333333;">=</span> predict_results[<span style="color: #0000dd; font-weight: bold;">2</span>].Copy(Context(kCPU, <span style="color: #0000dd; font-weight: bold;">0</span>));
<span style="color: #888888;">//1. Should call wait because Forward api of Executor is async</span>
<span style="color: #888888;">//2. scores and labels could treat as one dimension array</span>
<span style="color: #888888;">//3. BBoxes can treat as 2 dimensions array</span>
bboxes.WaitToRead();
scores.WaitToRead();
labels.WaitToRead();
<span style="color: #333399; font-weight: bold;">size_t</span> <span style="color: #008800; font-weight: bold;">const</span> num <span style="color: #333333;">=</span> bboxes.GetShape()[<span style="color: #0000dd; font-weight: bold;">1</span>];
<span style="color: #008800; font-weight: bold;">for</span>(<span style="color: #333399; font-weight: bold;">size_t</span> i <span style="color: #333333;">=</span> <span style="color: #0000dd; font-weight: bold;">0</span>; i <span style="color: #333333;"><</span> num; <span style="color: #333333;">++</span>i) {
<span style="color: #333399; font-weight: bold;">float</span> <span style="color: #008800; font-weight: bold;">const</span> score <span style="color: #333333;">=</span> scores.At(<span style="color: #0000dd; font-weight: bold;">0</span>, <span style="color: #0000dd; font-weight: bold;">0</span>, i);
<span style="color: #008800; font-weight: bold;">if</span> (score <span style="color: #333333;"><</span> thresh_) <span style="color: #008800; font-weight: bold;">break</span>;
<span style="color: #333399; font-weight: bold;">size_t</span> <span style="color: #008800; font-weight: bold;">const</span> cls_id <span style="color: #333333;">=</span> <span style="color: #008800; font-weight: bold;">static_cast</span><span style="color: #333333;"><</span><span style="color: #333399; font-weight: bold;">size_t</span><span style="color: #333333;">></span>(labels.At(<span style="color: #0000dd; font-weight: bold;">0</span>, <span style="color: #0000dd; font-weight: bold;">0</span>, i));
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> color <span style="color: #333333;">=</span> colors_[cls_id];
<span style="color: #888888;">//pt1 : top left; pt2 : bottom right</span>
cv<span style="color: #333333;">::</span>Point pt1, pt2;
<span style="color: #888888;">//normalize_points converts the raw box coordinates into points on the image</span>
std<span style="color: #333333;">::</span>tie(pt1, pt2) <span style="color: #333333;">=</span> normalize_points(bboxes.At(<span style="color: #0000dd; font-weight: bold;">0</span>, i, <span style="color: #0000dd; font-weight: bold;">0</span>), bboxes.At(<span style="color: #0000dd; font-weight: bold;">0</span>, i, <span style="color: #0000dd; font-weight: bold;">1</span>),
bboxes.At(<span style="color: #0000dd; font-weight: bold;">0</span>, i, <span style="color: #0000dd; font-weight: bold;">2</span>), bboxes.At(<span style="color: #0000dd; font-weight: bold;">0</span>, i, <span style="color: #0000dd; font-weight: bold;">3</span>),
normalize, cv<span style="color: #333333;">::</span>Size(inout.cols, inout.rows));
cv<span style="color: #333333;">::</span>rectangle(inout, pt1, pt2, color, <span style="color: #0000dd; font-weight: bold;">2</span>);
std<span style="color: #333333;">::</span>string txt;
<span style="color: #008800; font-weight: bold;">if</span> (labels_.size() <span style="color: #333333;">></span> cls_id) {
txt <span style="color: #333333;">+=</span> labels_[cls_id];
}
std<span style="color: #333333;">::</span>stringstream ss;
ss <span style="color: #333333;"><<</span> std<span style="color: #333333;">::</span>fixed <span style="color: #333333;"><<</span> std<span style="color: #333333;">::</span>setprecision(<span style="color: #0000dd; font-weight: bold;">3</span>) <span style="color: #333333;"><<</span> score;
txt <span style="color: #333333;">+=</span> <span style="background-color: #fff0f0;">" "</span> <span style="color: #333333;">+</span> ss.str();
put_label(inout, txt, pt1, color);
}
}
</pre>
</div>
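<br />
To show how step 4 and step 5 fit together, here is a minimal sketch of the main loop over a video. The constructor signatures, the coco_labels vector and the get_outputs() accessor (returning executor_->outputs) are assumptions of mine for illustration; please check the real interfaces on github:<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;">//A minimal sketch of the detection loop. The constructors, coco_labels and
//get_outputs() are assumptions--check the github project for the real interfaces.
object_detector detector(model_params, model_symbols, Context::gpu(0));
plot_object_detector_bboxes plotter(coco_labels, 0.3f);
cv::VideoCapture capture("input_video.mp4");
cv::Mat frame;
while(capture.read(frame)){
    //resize the frame, copy it into the executor and call the async Forward
    detector.forward(frame);
    //plot copies the outputs back to cpu and waits on them before drawing
    plotter.plot(frame, detector.get_outputs(), true);
    cv::imshow("detection", frame);
    if(cv::waitKey(30) == 'q'){ break; }
}
</pre>
</div>
<br />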
<br />
<br />
I only mention the key points in this post; if you want to study the details, please check the project on <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/mxnet_cpp_object_detection">github</a>.<br />
<br /><br />
<br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-89579440260886508502018-09-29T20:26:00.000-07:002018-09-29T20:26:16.839-07:00Install cpp package of mxnet on windows 10, with cuda and opencv Compile and install cpp-package of mxnet on windows 10 is a little bit tricky when I writing this post.<br />
<br />
The <a href="http://mxnet.incubator.apache.org/install/windows_setup.html">install page of mxnet</a> tells us almost everything we need to know, but some details haven't been written into the page yet. Today I would like to write down the pitfalls I met and share with you how I solved them.<br />
<br />
<h2 style="text-align: center;">
<b><u><span style="color: blue;">Pitfalls</span></u></b></h2>
1. Remember to download the mingw dlls from the openBLAS download page and put those dlls somewhere the system can find them, else you wouldn't be able to generate op.h for the cpp-package.<br />
<br />
2. Install Anaconda (recommended) or the python package listed on the mxnet install page on your machine and register the path (the path containing python.exe), else you wouldn't be able to generate op.h for the cpp-package.<br />
<br />
3. Compile the project without the cpp-package first, else you may not be able to generate op.h.<br />
<br />
CMake commands for reference; change them to suit your own needs<br />
<br />
a : Run these commands first<br />
<br />
cmake -G "Visual Studio 14 2015 Win64" ^<br />
-DCUDA_USE_STATIC_CUDA_RUNTIME=ON ^<br />
-DENABLE_CUDA_RTC=ON ^<br />
-DMKLDNN_VERBOSE=ON ^<br />
-DUSE_CUDA=ON ^<br />
-DUSE_CUDNN=ON ^<br />
-DUSE_F16C=ON ^<br />
-DUSE_GPERFTOOLS=ON ^<br />
-DUSE_JEMALLOC=OFF ^<br />
-DUSE_LAPACK=ON ^<br />
-DUSE_MKLDNN=ON ^<br />
-DUSE_MKLML_MKL=ON ^<br />
-DUSE_MKL_IF_AVAILABLE=ON ^<br />
-DUSE_MXNET_LIB_NAMING=ON ^<br />
-DUSE_OPENCV=ON ^<br />
-DUSE_OPENMP=ON ^<br />
-DUSE_PROFILER=ON ^<br />
-DUSE_SSE=ON ^<br />
-DWITH_EXAMPLE=ON ^<br />
-DWITH_TEST=ON ^<br />
-DCMAKE_INSTALL_PREFIX=install ..<br />
<br />
cmake --build . --config Release<br />
<br />
b : Then run these commands, with the cpp package turned on<br />
<br />
cmake -G "Visual Studio 14 2015 Win64" ^<br />
-DCUDA_USE_STATIC_CUDA_RUNTIME=ON ^<br />
-DENABLE_CUDA_RTC=ON ^<br />
-DMKLDNN_VERBOSE=ON ^<br />
-DUSE_CUDA=ON ^<br />
-DUSE_CUDNN=ON ^<br />
-DUSE_F16C=ON ^<br />
-DUSE_GPERFTOOLS=ON ^<br />
-DUSE_CPP_PACKAGE=ON ^<br />
-DUSE_LAPACK=ON ^<br />
-DUSE_MKLDNN=ON ^<br />
-DUSE_MKLML_MKL=ON ^<br />
-DUSE_MKL_IF_AVAILABLE=ON ^<br />
-DUSE_MXNET_LIB_NAMING=ON ^<br />
-DUSE_OPENCV=ON ^<br />
-DUSE_OPENMP=ON ^<br />
-DUSE_PROFILER=ON ^<br />
-DUSE_SSE=ON ^<br />
-DWITH_EXAMPLE=ON ^<br />
-DWITH_TEST=ON ^<br />
-DCMAKE_INSTALL_PREFIX=install ..<br />
<br />
cmake --build . --config Release --target INSTALL<br />
<br />
4. After you compile and install the libs, you may find out some headers are missing from the install path; I missed nnvm and mxnet-cpp. What I did was copy those folders into the install folder by hand.<br />
<br />
Hope this could help someone who is pulling their hair out while compiling the cpp-package of mxnet on windows 10.<br />
<br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-68979830780607393472018-08-04T20:08:00.001-07:002018-12-12T16:56:05.704-08:00Qt and computer vision 2 : Build a simple computer vision application with Qt5 and opencv3<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>
In this post, I will show you how to build a dead simple computer vision application with Qt Creator and opencv3 step by step.<br />
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Install opencv3.4.1(or newer version) on windows</span></u></b></h3>
<div style="text-align: center;">
<br /></div>
0. Go to <a href="https://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.4.2/opencv-3.4.2-vc14_vc15.exe/download">source forge</a> and download the prebuilt binary of <a href="https://opencv.org/releases.html">opencv3.4.2</a>, or you could <a href="http://pythonopencv.com/step-by-step-install-opencv-3-3-with-visual-studio-2015-on-windows-10-x64-2017-diy/">build it by yourself</a><br />
<br />
1. Double click on opencv-3.4.2-vc14_vc15.exe and extract it to your favorite folder (Pic00)<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3ruvpc2F7Cs99E-49RWZEv3W28hzb1D0D9FlzZtjUe7fH_wMEnWUFK6pgAMq8BgmmJxwKWdKTcaGkNpqV-6PyS2PFjwAfwp_FlmVnFnc4Aedf7zSK-w3c8a8cBTGD12-UESLJrsFbN7E/s1600/opencv_install_00a.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="358" data-original-width="1203" height="117" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3ruvpc2F7Cs99E-49RWZEv3W28hzb1D0D9FlzZtjUe7fH_wMEnWUFK6pgAMq8BgmmJxwKWdKTcaGkNpqV-6PyS2PFjwAfwp_FlmVnFnc4Aedf7zSK-w3c8a8cBTGD12-UESLJrsFbN7E/s400/opencv_install_00a.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic00</td></tr>
</tbody></table>
<br />
2. Open the folder you extracted (assume you extracted it to /your_path/opencv_3_4_2). You will see a folder called "opencv".<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj6MdKYdAwPJT3IzpcfQgpxG9H7kXPtYhzB295LYG5URP-K3o6enpj1aOd58fqSEUX1d_r7Dz0C96rwwrOm6rU3DInJkUIbBQlIjy-IJHOlixOPTfTIbgOPmpyb9OAlv0c4lX0juewN3A/s1600/opencv_install_01.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="351" data-original-width="701" height="198" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj6MdKYdAwPJT3IzpcfQgpxG9H7kXPtYhzB295LYG5URP-K3o6enpj1aOd58fqSEUX1d_r7Dz0C96rwwrOm6rU3DInJkUIbBQlIjy-IJHOlixOPTfTIbgOPmpyb9OAlv0c4lX0juewN3A/s400/opencv_install_01.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic01</td></tr>
</tbody></table>
3. Open the QtCreator you <a href="https://qtandopencv.blogspot.com/2018/04/qt-and-computer-vision-0-setup.html">installed</a>.<br />
<br />
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Create a new project by Qt Creator</span></u></b></h3>
<br />
<br />
4. Create a new project<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXC2J-r1RWj4BybynJZlKQcfCJdjsNtHFSOQ0315z92zJH_vRbrGwIU5YqPJL8gT3uSSHhRtjmhf6tul9vbjOBh9WJ_W1tK9IDjCvkfZRTD0XhpCO6FypqiFJZIIUD5JxajJk5I_H4vE8/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="576" data-original-width="1181" height="195" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXC2J-r1RWj4BybynJZlKQcfCJdjsNtHFSOQ0315z92zJH_vRbrGwIU5YqPJL8gT3uSSHhRtjmhf6tul9vbjOBh9WJ_W1tK9IDjCvkfZRTD0XhpCO6FypqiFJZIIUD5JxajJk5I_H4vE8/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic02</td></tr>
</tbody></table>
5. You will see a lot of options; for simplicity, let us choose "Application->Non-Qt project->Plain c++ application". This tells QtCreator that we want to create a c++ program without using any Qt components.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitSxjQjreuq_OA2VJjtfcW2LxU2Fx7tdhHuq38hyphenhyphenzZ1JegCNmqnh2by3WoRd5M9qxIcbT3E3JbNnOpqK2deYNknn_2dZmXZAo-cuj9NOegbeguFXljcG7gJIuPwVrbRVhMCT9S6TPZqOY/s1600/non_qt_project.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="879" data-original-width="1600" height="218" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitSxjQjreuq_OA2VJjtfcW2LxU2Fx7tdhHuq38hyphenhyphenzZ1JegCNmqnh2by3WoRd5M9qxIcbT3E3JbNnOpqK2deYNknn_2dZmXZAo-cuj9NOegbeguFXljcG7gJIuPwVrbRVhMCT9S6TPZqOY/s400/non_qt_project.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic03</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
6. Enter the path of the folder and name of the project.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5u5CVwKwF9SIdXW_pghr5tbdNnyU8acW3QKE-rMOLPm6JQNV2wx7qfM04S-aGoBQ_Ft8mJYVYBJ541GA_rc3PhUdlAes6J-FFjlkTsf3QcUAsI0abXmGkA5241x7_-JuviI0uiBoaJq0/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="499" data-original-width="801" height="248" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5u5CVwKwF9SIdXW_pghr5tbdNnyU8acW3QKE-rMOLPm6JQNV2wx7qfM04S-aGoBQ_Ft8mJYVYBJ541GA_rc3PhUdlAes6J-FFjlkTsf3QcUAsI0abXmGkA5241x7_-JuviI0uiBoaJq0/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic04</td></tr>
</tbody></table>
7. Click the Next button and use qmake as your build system for now (you can pick cmake too, but I always prefer qmake when I am working with Qt).<br />
<br />
8. You will see a page asking you to select your kits; a kit is a tool QtCreator uses to group different settings like device, compiler, Qt version etc.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCz8Z0UlQhimFL9TgXMKyIBq3kdFfwr3apFVAAAVcwkoMJxUS9wAbe_laP-r3ECRdc67okpdaJ4_guVOdAgVzBQRfEdex8UtzEnGavkSUnrOny-f7FtcxsVoqi5t9P_87XdhBp5PbWU4k/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="497" data-original-width="800" height="247" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCz8Z0UlQhimFL9TgXMKyIBq3kdFfwr3apFVAAAVcwkoMJxUS9wAbe_laP-r3ECRdc67okpdaJ4_guVOdAgVzBQRfEdex8UtzEnGavkSUnrOny-f7FtcxsVoqi5t9P_87XdhBp5PbWU4k/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic05</td></tr>
</tbody></table>
9. Click on next. QtCreator may ask whether you want to add the project to version control; for simplicity, select None. Click on finish.<br />
<br />
10. If you see a screen like this, it means you succeeded.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFqJra-whNd3ydhLKA_dXzwmNC5F1Hc_dyWryFGsj3IpBtMLFDD8N1EbVa8-_KqyjzY8MLNDwrUWW47Xo-isQr-COVLirvtYlxL4oSuGL-Y4L69kMpiIwvfM-ce5E8B7GGmlfhlII7xNs/s1600/cpp_project.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="724" data-original-width="1322" height="218" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFqJra-whNd3ydhLKA_dXzwmNC5F1Hc_dyWryFGsj3IpBtMLFDD8N1EbVa8-_KqyjzY8MLNDwrUWW47Xo-isQr-COVLirvtYlxL4oSuGL-Y4L69kMpiIwvfM-ce5E8B7GGmlfhlII7xNs/s400/cpp_project.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic06</td></tr>
</tbody></table>
<br />
<br />
11. Write the code to read an image with opencv<br />
<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #557799;">#include <iostream></span>
<span style="color: #557799;">#include <opencv2/core.hpp></span>
<span style="color: #557799;">#include <opencv2/highgui.hpp></span>
<span style="color: #888888;">//the purposes of namespaces are</span>
<span style="color: #888888;">//1. Decrease the chance of name collisions</span>
<span style="color: #888888;">//2. Help you organize your code into logical groups</span>
<span style="color: #888888;">//Without declaring "using namespace std", every time you use</span>
<span style="color: #888888;">//the classes or functions of the namespace, you have to call them with the</span>
<span style="color: #888888;">//prefix "std::".</span>
<span style="color: #008800; font-weight: bold;">using</span> <span style="color: #008800; font-weight: bold;">namespace</span> cv;
<span style="color: #008800; font-weight: bold;">using</span> <span style="color: #008800; font-weight: bold;">namespace</span> std;
<span style="color: #888888;">/**</span>
<span style="color: #888888;"> * main function is the global, designated start function of c++.</span>
<span style="color: #888888;"> * @param argc Number of the parameters of command line</span>
<span style="color: #888888;"> * @param argv Content of the parameters of command line.</span>
<span style="color: #888888;"> * @return any integer within the range of int, meaning of the return value is</span>
<span style="color: #888888;"> * defined by the users</span>
<span style="color: #888888;"> */</span>
<span style="color: #333399; font-weight: bold;">int</span> <span style="color: #0066bb; font-weight: bold;">main</span>(<span style="color: #333399; font-weight: bold;">int</span> argc, <span style="color: #333399; font-weight: bold;">char</span> <span style="color: #333333;">*</span>argv[])
{
<span style="color: #008800; font-weight: bold;">if</span>(argc <span style="color: #333333;">!=</span> <span style="color: #0000dd; font-weight: bold;">2</span>){
cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"Run this example by invoking it like this: "</span><span style="color: #333333;"><<</span>endl;
cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"./step_02.exe lena.jpg"</span><span style="color: #333333;"><<</span>endl;
cout<span style="color: #333333;"><<</span>endl;
<span style="color: #008800; font-weight: bold;">return</span> <span style="color: #333333;">-</span><span style="color: #0000dd; font-weight: bold;">1</span>;
}
<span style="color: #888888;">//If you execute by Ctrl+R, argv[0] == "step_02.exe", argv[1] == lena.jpg</span>
cout<span style="color: #333333;"><<</span>argv[<span style="color: #0000dd; font-weight: bold;">0</span>]<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">","</span><span style="color: #333333;"><<</span>argv[<span style="color: #0000dd; font-weight: bold;">1</span>]<span style="color: #333333;"><<</span>endl;
<span style="color: #888888;">//Open the image</span>
<span style="color: #008800; font-weight: bold;">auto</span> <span style="color: #008800; font-weight: bold;">const</span> img <span style="color: #333333;">=</span> imread(argv[<span style="color: #0000dd; font-weight: bold;">1</span>]);
<span style="color: #008800; font-weight: bold;">if</span>(<span style="color: #333333;">!</span>img.empty()){
imshow(<span style="background-color: #fff0f0;">"img"</span>, img); <span style="color: #888888;">//Show the image on screen</span>
waitKey(); <span style="color: #888888;">//Do not exit the program until the user presses a key</span>
}<span style="color: #008800; font-weight: bold;">else</span>{
cout<span style="color: #333333;"><<</span><span style="background-color: #fff0f0;">"cannot open image:"</span><span style="color: #333333;"><<</span>argv[<span style="color: #0000dd; font-weight: bold;">1</span>]<span style="color: #333333;"><<</span>endl;
<span style="color: #008800; font-weight: bold;">return</span> <span style="color: #333333;">-</span><span style="color: #0000dd; font-weight: bold;">1</span>;
}
<span style="color: #008800; font-weight: bold;">return</span> <span style="color: #0000dd; font-weight: bold;">0</span>; <span style="color: #888888;">//usually we return 0 if everything is normal</span>
}
</pre>
</div>
<br />
<br />
<h3 style="text-align: center;">
<b><u><span style="color: blue;">How to compile and link the opencv lib with the help of Qt Creator and qmake</span></u></b></h3>
<br />
Before you can execute the app, you will need to compile and link against the libraries of opencv. Let me show you how to do it. If you miss steps 12 or 13, you will see <span style="color: red;">a lot of error messages</span> like the ones shown in Pic07 or Pic09.<br />
<br />
12. Tell the compiler where the header files are; this can be done by adding the following line to step_02.pro.<br />
<br />
<b>INCLUDEPATH += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/include</b><br />
<br />
The compiler will tell you it can't locate the header files if you do not add this line (see Pic07).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbZ6GH_9yn62bEN9_rw79GyDV3IorWrT0wY6xlEMUQkwQzPhyslzWqHWqlkSw3zlYW2vws0-wyNSxWB-jj-w9wOfKO5Jt-oS5O9d1EXu84mztHguc_kVIFWHE9Gdqby5z9mkIYs0-XKLU/s1600/include_file_issue.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="82" data-original-width="478" height="67" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbZ6GH_9yn62bEN9_rw79GyDV3IorWrT0wY6xlEMUQkwQzPhyslzWqHWqlkSw3zlYW2vws0-wyNSxWB-jj-w9wOfKO5Jt-oS5O9d1EXu84mztHguc_kVIFWHE9Gdqby5z9mkIYs0-XKLU/s400/include_file_issue.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic07</td></tr>
</tbody></table>
<br />
If your INCLUDEPATH is correct, QtCreator should be able to find the headers and use auto completion to save you some typing (Pic08).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZGyVuv02YHQSEglkSiaCIJVKklt0VAhLUIse5cirAJ1kpgCEKob4_4r7k99getXFqRfGyoCCprDNWVSxySsGbTXIPdfpjvb0I18CzIYJW5LU_d1tmUKt25fYAMQZ_g4q3PnWU9pXN2JQ/s1600/auto_complete.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="648" data-original-width="1152" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZGyVuv02YHQSEglkSiaCIJVKklt0VAhLUIse5cirAJ1kpgCEKob4_4r7k99getXFqRfGyoCCprDNWVSxySsGbTXIPdfpjvb0I18CzIYJW5LU_d1tmUKt25fYAMQZ_g4q3PnWU9pXN2JQ/s400/auto_complete.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic08</td></tr>
</tbody></table>
<br />
<br />
13. Tell the linker which opencv libraries it should link to with the following line.<br />
<br />
<b>LIBS += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/x64/vc14/lib/opencv_world342.lib</b><br />
<br />
Without this step, you will see "unresolved external symbols" errors (Pic09).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj_bd7egQj8N3QGKrQdHH7rZ28utxe-zHLWsNajjhKFMPgcMzscTiowNBkRfuGBBELEY4pi6huyGHw4ifVE7fZslSTnmF14DyjQV5-OJiOlCek4UboanCLt-VNiNYyV1rgUDyekG8jygE/s1600/symbol_bug.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="304" data-original-width="1430" height="85" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj_bd7egQj8N3QGKrQdHH7rZ28utxe-zHLWsNajjhKFMPgcMzscTiowNBkRfuGBBELEY4pi6huyGHw4ifVE7fZslSTnmF14DyjQV5-OJiOlCek4UboanCLt-VNiNYyV1rgUDyekG8jygE/s400/symbol_bug.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic09</td></tr>
</tbody></table>
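<br />
Putting steps 12 and 13 together, the whole step_02.pro could look like the following minimal sketch (the opencv paths are examples; adjust them to where you extracted opencv):<br />
<br />
<div style="background: rgb(255, 255, 255) none repeat scroll 0% 0%; border-width: 0.1em 0.1em 0.1em 0.8em; border: medium solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre style="line-height: 125%; margin: 0px;"># A minimal step_02.pro for a plain c++ console app linking opencv.
# The INCLUDEPATH and LIBS paths are examples, adjust them to your install.
TEMPLATE = app
CONFIG += console c++11
CONFIG -= app_bundle qt

SOURCES += main.cpp

INCLUDEPATH += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/include
LIBS += your_install_path_of_opencv/opencv/opencv_3_4_2/opencv/build/x64/vc14/lib/opencv_world342.lib
</pre>
</div>
<br />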
14. Change from debug to release.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1fYuzQD6Z_kr3enPxuv__aq47r8N54l-KKxdZZS-RWcPwZKUreQbMyDdSOJXaeXGthvDjB9-n5Bh7HrBw5nZWHu82kxun68fQwHg45-BT-XMnupMZUb-5F5zDRXZuxcHKLodJDUdwl4I/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="847" data-original-width="1271" height="266" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1fYuzQD6Z_kr3enPxuv__aq47r8N54l-KKxdZZS-RWcPwZKUreQbMyDdSOJXaeXGthvDjB9-n5Bh7HrBw5nZWHu82kxun68fQwHg45-BT-XMnupMZUb-5F5zDRXZuxcHKLodJDUdwl4I/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic10</td></tr>
</tbody></table>
<br />
Click the icon surrounded by the red region and change it from debug to release. Why do we do that? Because<br />
<br />
<br />
<ul>
<li>Release mode is much faster than debug mode in many cases</li>
<li>The library we link to is built as a release library; do not mix debug and release libraries in your project unless you are asking for trouble</li>
</ul>
I will introduce more details about compiling, linking, release and debug in the future; for now, just press Ctrl+B to compile and link the app.<br />
<br />
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Execute the app</u></span></b></h3>
After we compile and link the app, the exe already exists in the build folder (the folder shown in Pic11).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLeSLiTExJE_DF3x1NanLVBaz0d6gM62_nMNQamz_lDebvafF3vwG1CvDyB9TPFtqpX_tYEGQIaVWble7_giSCAImh6kAWCZ2I3JBs-coYs6x2ToA-9C-A5FGIlJCK87ahx4nUmtAjG0A/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="599" data-original-width="1299" height="183" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLeSLiTExJE_DF3x1NanLVBaz0d6gM62_nMNQamz_lDebvafF3vwG1CvDyB9TPFtqpX_tYEGQIaVWble7_giSCAImh6kAWCZ2I3JBs-coYs6x2ToA-9C-A5FGIlJCK87ahx4nUmtAjG0A/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic11</td></tr>
</tbody></table>
<br />
We are almost done now; just a few more steps and the app will be up and running.<br />
<br />
15. Copy the dlls <span style="color: orange;">opencv_world342.dll</span> and <span style="color: orange;">opencv_ffmpeg342_64.dll</span> (they are placed in /your_path/opencv/opencv_3_4_2/opencv/build/bin) into a new folder (let us call it global_dll).<br />
<br />
16. Add the path of this folder into the system path. Without steps 15 and 16, the exe wouldn't be able to find the dlls when we execute the app, and you may see the following error when you execute the app from the command line (Pic12). I recommend using the tool Rapid Environment Editor (Pic13) to edit your path on windows.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6EuTx4hh-m_1pf77Gx3me9xhWjQV1_ZZmxfWcXT0g-9jmL1LrorESF1ucwubDOXXXvVxwZ1EU8cMHwI_7oY3pRLOKBnAaa1jx399V-tk9YqHy6w8sNZbjSfmWiumP0YTJ12G7Gt7euxk/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="506" data-original-width="1136" height="177" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6EuTx4hh-m_1pf77Gx3me9xhWjQV1_ZZmxfWcXT0g-9jmL1LrorESF1ucwubDOXXXvVxwZ1EU8cMHwI_7oY3pRLOKBnAaa1jx399V-tk9YqHy6w8sNZbjSfmWiumP0YTJ12G7Gt7euxk/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic12</td></tr>
</tbody></table>
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjx62rwnDFvQYvqbT9xfiadgtSHCEJNwFIZUbcWpIDNjHV7apGC_WDa1odJa18bexsva9kOmxLWvCMp7oX0aLt8gQRM2PCuvn6TfixwZWO7cq3ax1TQUA-by8NOOpaq2TUwZKZ_PoZKRu0/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="132" data-original-width="930" height="56" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjx62rwnDFvQYvqbT9xfiadgtSHCEJNwFIZUbcWpIDNjHV7apGC_WDa1odJa18bexsva9kOmxLWvCMp7oX0aLt8gQRM2PCuvn6TfixwZWO7cq3ax1TQUA-by8NOOpaq2TUwZKZ_PoZKRu0/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic13</td></tr>
</tbody></table>
17. Add the command line argument in QtCreator; without it, the app does not know where the image is when you press Ctrl+R to execute the program.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0RQltCV3BQZXG-4J8MkmvaHMn3pMYTT5kOsTjep1HMVg3gv9YZxu5yaPr10EOqzeu5LOF5qFOdKWYq79bZWMQuJoY88bMKBN-QF6OTMx9pp0mANH26DX1ZuDDClMoJqRZwOrOQf8hfS0/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="470" data-original-width="1306" height="142" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0RQltCV3BQZXG-4J8MkmvaHMn3pMYTT5kOsTjep1HMVg3gv9YZxu5yaPr10EOqzeu5LOF5qFOdKWYq79bZWMQuJoY88bMKBN-QF6OTMx9pp0mANH26DX1ZuDDClMoJqRZwOrOQf8hfS0/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic14.jpg</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
18. If successful, you should see the app open the image specified in the command line arguments list (Pic15).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbRHW0rPlgH76O_dawUG9406rsoMQpbnWGHKSCEoZoNa1k3ytB1JEXD38kKxTJA594Dxv8fqXIymxk1TxnywzERMj7X4HraC_dZ3FmP9rCm2iu2oUMVKPcE9hYuRYYB5vW4a0MDB9D54E/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="508" data-original-width="981" height="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbRHW0rPlgH76O_dawUG9406rsoMQpbnWGHKSCEoZoNa1k3ytB1JEXD38kKxTJA594Dxv8fqXIymxk1TxnywzERMj7X4HraC_dZ3FmP9rCm2iu2oUMVKPcE9hYuRYYB5vW4a0MDB9D54E/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic15</td></tr>
</tbody></table>
<br />
These steps are easy but can be annoying at first; I hope this post alleviates your frustration. You can find the source code at <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/qt_and_cv_step_by_step/step_02">github</a>.<br />
<br />
<br />ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com1tag:blogger.com,1999:blog-4702230343097536610.post-41986576622010261082018-04-22T09:42:00.002-07:002018-04-22T15:24:18.943-07:00Qt and computer vision 1 : Setup environment of Qt5 on windows step by step Long time haven't updated my blog, today rather than write a newer, advanced deep learning topics like "<a href="http://qtandopencv.blogspot.my/2017/10/modern-way-to-estimate-homography.html">Modern way to estimate homography matrix(by lightweight cnn)</a>" or "<a href="http://qtandopencv.blogspot.my/2017/08/let-us-create-semantic-segmentation.html">Let us create a semantic segmentation model by PyTorch</a>", I prefer to start a series of topics for new comers who struggling to build a computer vision app by c++. I hope my posts could help more people find out <span style="color: blue;">use c++ to develop application could as <b><span style="font-size: large;">easy</span></b> as another "much easier to use languages"(ex : python)</span>.<br />
<br />
Rather than introducing most of the features of Qt and opencv like other books do, these topics will introduce the subset of Qt that can help us develop decent computer vision applications, step by step.<br />
<br />
<div style="text-align: center;">
<u><b><span style="font-size: large;">c++ is as easy to use as python, really?</span></b></u></div>
<br />
Many programmers may find this nonsense, but my own experience tells me it is not, because I <span style="color: blue; font-size: large;">never </span>found languages like python, java or c# "<span style="color: red;">much easier to use compared with c++</span>". What makes our perspectives so different? I think the answers are<br />
<br />
1. Know how to <span style="color: blue;">use c++ effectively</span>.<br />
2. There exist <span style="color: blue;">great libraries</span> for the tasks(ex : <a href="https://www.qt.io/">Qt</a>, <a href="https://www.boost.org/">boost</a>, <a href="https://opencv.org/">opencv</a>, <a href="http://dlib.net/">dlib</a>, <a href="https://github.com/gabime/spdlog">spdlog </a>etc).<br />
<br />
As long as these two conditions are satisfied, I believe many programmers will reach the same conclusion as mine. In this series I will try my best to help you learn how to develop easy-to-maintain applications with c++, and show you how to solve those "small yet annoying issues" which may scare away many newcomers.<br />
<br />
<br />
<div style="text-align: center;">
<b><u><span style="font-size: large;">Install visual c++ 2015</span></u></b></div>
<br />
Some of you may ask "why 2015 but not 2017"? Because at the time I am writing this post, cuda does not have decent support for visual c++ 2017 or mingw yet. cuda is very important to computer vision apps, especially now that deep learning has taken over many computer vision tasks.<br />
<br />
1. Go to this <a href="https://www.visualstudio.com/vs/older-downloads/">page</a>, click on the download button of visual studio 2015.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBcfUkeAYKFOJSEI3g8GxQGwUHyx64ctPM3Jziv3IgyJLnjPNEpZ21M1nLuB8Eu-QlRP_wZLOhi5JbivEp0UHfztKV_CgFtjHhi1QRtyJe7DwQc-TNSRug5gH_d6hy71QRXsGAVjl_klA/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="426" data-original-width="1176" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBcfUkeAYKFOJSEI3g8GxQGwUHyx64ctPM3Jziv3IgyJLnjPNEpZ21M1nLuB8Eu-QlRP_wZLOhi5JbivEp0UHfztKV_CgFtjHhi1QRtyJe7DwQc-TNSRug5gH_d6hy71QRXsGAVjl_klA/s320/Capture.JPG" width="320" /></a></div>
<br />
2. Download visual studio 2015 community (you may need to create an account before you can enter this page).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhclLI9AH8P_O_bj6wnEJzcWZMbacCYQGef7PQFdl04bigNpHlwSLMXOPZWRviSsbem6SSSOxhppaN0NqMfw95zuQIvlChyphenhyphenD1AS6jBUrc8xNmjp0UK_jXr_IRduhkaA16TjagUfquX1XMw/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="417" data-original-width="836" height="159" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhclLI9AH8P_O_bj6wnEJzcWZMbacCYQGef7PQFdl04bigNpHlwSLMXOPZWRviSsbem6SSSOxhppaN0NqMfw95zuQIvlChyphenhyphenD1AS6jBUrc8xNmjp0UK_jXr_IRduhkaA16TjagUfquX1XMw/s320/Capture.JPG" width="320" /></a></div>
3. Double click on the exe "en_visual_studio_community_2015_with_update_3_x86_x64_web_installer_xxxx" and wait a few minutes.<br />
<br />
4. Install the visual c++ toolsets as shown below; make sure you select all of them.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7RGIIZDEmqMbUFPb4GaPahL9puEn6TD66jALw-UMViNeBPPuhjQkNyOk3k5_s91-x2Kj1BoDckbwzh3nCknPFzp4eE1h3_vQn_EDCEvX-rm3bEumJJ8aI1oIvRW5yyLTGc0S4LVPAUtM/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="256" data-original-width="373" height="218" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7RGIIZDEmqMbUFPb4GaPahL9puEn6TD66jALw-UMViNeBPPuhjQkNyOk3k5_s91-x2Kj1BoDckbwzh3nCknPFzp4eE1h3_vQn_EDCEvX-rm3bEumJJ8aI1oIvRW5yyLTGc0S4LVPAUtM/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
<br />
<div style="text-align: center;">
<u><b><span style="font-size: large;">Install Qt5 on windows</span></b></u></div>
<br />
1. Go to the <a href="https://www.qt.io/download">download page of Qt</a><br />
2. Select open source version of Qt<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDSrADF_OHrIwvc9KO0jnIEz_Kl0cWII3_4g1oOJMA_YYJLKAKUUksHZFPMGVb4s_q5zt8eUiUF54HalMe9sV5UaHsvnuZpHKTy79RdrxGT9CFlUkfSzWcxd0XI0nU7CgUB4DbykHhY8Y/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="283" data-original-width="584" height="155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDSrADF_OHrIwvc9KO0jnIEz_Kl0cWII3_4g1oOJMA_YYJLKAKUUksHZFPMGVb4s_q5zt8eUiUF54HalMe9sV5UaHsvnuZpHKTy79RdrxGT9CFlUkfSzWcxd0XI0nU7CgUB4DbykHhY8Y/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
3. Click the download button and wait until qt-unified-windows is downloaded.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAH_xoWmzv1EUE2Fw_aPl8vSfWjPBenqvxZxPwhbeUrwdURrlARer5pDrjpHLMGaq43Tp9CpppBpcGi0CUphJigjWvBCmiyMwClaXfJxhr9VWcV8n9KFCWsrk-cOMWi0o6ehyYilrWO3M/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="641" data-original-width="1171" height="175" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAH_xoWmzv1EUE2Fw_aPl8vSfWjPBenqvxZxPwhbeUrwdURrlARer5pDrjpHLMGaq43Tp9CpppBpcGi0CUphJigjWvBCmiyMwClaXfJxhr9VWcV8n9KFCWsrk-cOMWi0o6ehyYilrWO3M/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
4. Double click on the installer, click next->skip->next<br />
<br />
5. Select the path you want to install Qt<br />
<br />
6. Select the version of Qt you want to install. Every version of Qt(Qt5.x) has a lot of binary files to download, so only select the ones you need. I prefer to install Qt5.9.5 here. Why Qt5.9.5? Because Qt5.9 is a long term support version of Qt, and in theory long term support should be more stable.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiULZPcB0yHXGyMRsuKblJjJe1gfOVpFXKFfYr-Xoi4enJH8cBTVIaZ5fwHUsEB3awXpc_2bm9oa2wjeQDFgnl6jW0EHw0Q696Nzpnp_sVB8_b_t3_4WcWKjTBVSAO9rGLM_9Acmlq8Ts/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="470" data-original-width="486" height="309" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiULZPcB0yHXGyMRsuKblJjJe1gfOVpFXKFfYr-Xoi4enJH8cBTVIaZ5fwHUsEB3awXpc_2bm9oa2wjeQDFgnl6jW0EHw0Q696Nzpnp_sVB8_b_t3_4WcWKjTBVSAO9rGLM_9Acmlq8Ts/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
7. Click next and install.<br />
<br />
<div style="text-align: center;">
<b><u><span style="font-size: large;">Test Qt5 installed or not</span></u></b></div>
<br />
<br />
1. Open QtCreator and run an example. Go to your install path (ex: C:/Qt/3rdLibs/Qt), navigate to your_install_path/Tools/QtCreator/bin and double click on qtcreator.exe.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUOsIBTZyn64mzJmrG1-36wBobpY4-o5jZGaxIrf4gaZzWyQlb_0Yhn43N0WVp6vezRxoHc4fG1KMX8pAByKjdwVZ620G95xXxwDZwIyadFCUUaDcrKVF-s41Eq4RKqX021LkQNlKKmlw/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="168" data-original-width="589" height="91" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUOsIBTZyn64mzJmrG1-36wBobpY4-o5jZGaxIrf4gaZzWyQlb_0Yhn43N0WVp6vezRxoHc4fG1KMX8pAByKjdwVZ620G95xXxwDZwIyadFCUUaDcrKVF-s41Eq4RKqX021LkQNlKKmlw/s320/Capture.JPG" width="320" /></a></div>
<br />
2. Select Welcome->Example->Qt Quick Controls 2 - Gallery<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHbzC1NHzbk4hFM6JJFZc09RktI0r7E6iQbLuLOtki8WTAMRGRUsr2BAyGTAQDeQ61PIZviQpfkjWSD6iN8aIU6k_duxRJhBi78jp_tsGnG1xN8ybXFFd-rbpMCYw4fF71PScUfqSsHJ8/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="499" data-original-width="1600" height="99" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHbzC1NHzbk4hFM6JJFZc09RktI0r7E6iQbLuLOtki8WTAMRGRUsr2BAyGTAQDeQ61PIZviQpfkjWSD6iN8aIU6k_duxRJhBi78jp_tsGnG1xN8ybXFFd-rbpMCYw4fF71PScUfqSsHJ8/s320/Capture.JPG" width="320" /></a></div>
<br />
3. Click on the example. It may pop up a message box asking you some questions; you can click yes or no.<br />
<br />
4. Every example you open pops up a help page like this. Keeping it or not is your choice; sometimes they are helpful.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8lHbn9IdIsp42Ohdjanz-XvEUP8A-JVgLO9dlOJAEF44i37u2blAK8hPa2N5YnV_HWoq8ZQsfzpoMYrUweas833Pc68rF7CNfnhOG7Ia3FBX1ljDj7a-PcZpA6d6jfX0cPxwyD7gfGL4/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="813" data-original-width="1600" height="162" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8lHbn9IdIsp42Ohdjanz-XvEUP8A-JVgLO9dlOJAEF44i37u2blAK8hPa2N5YnV_HWoq8ZQsfzpoMYrUweas833Pc68rF7CNfnhOG7Ia3FBX1ljDj7a-PcZpA6d6jfX0cPxwyD7gfGL4/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
5. First, select the version of Qt you want to use (surrounded by the red bounding box). Second, keep the shadow build option on (surrounded by the green bounding box). Why keep it on? Because shadow build helps you <span style="color: blue;">separate your source codes from the build binaries</span>. Third, select whether you want to build your binary as a debug or release version (surrounded by the blue bounding box). Usually we <span style="color: red;"><b>cannot</b></span> mix debug/release libraries together; I will open another topic to discuss the benefits of debug/release, explain what MT/MD are, which one you should choose, etc.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcrkDdsDHcLofcNGqnTLTbPN-JYCmYP53GVxozri2YVF-c7nTsS2sRan6L9jg3nyKJgLa96LHWrPldEdCXyJAPSekYhei-AVNmULWsRcTc7xn6yzL-Jv-uobrevWaTIPi4CltUX_b0YhM/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="959" data-original-width="1256" height="244" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcrkDdsDHcLofcNGqnTLTbPN-JYCmYP53GVxozri2YVF-c7nTsS2sRan6L9jg3nyKJgLa96LHWrPldEdCXyJAPSekYhei-AVNmULWsRcTc7xn6yzL-Jv-uobrevWaTIPi4CltUX_b0YhM/s320/Capture.JPG" width="320" /></a></div>
<br />
<br />
<br />
6. Click on the run button or press Ctrl + R, then you should see the example running on your computer.<br />
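<br />
If you prefer to verify the installation with a few lines of code rather than a bundled example, a minimal sketch like the following is enough. Create a new, empty Qt Widgets project and replace main.cpp; if the window shows up, Qt5 and your compiler work properly:<br />
<br />
<pre class="prettyprint">#include <QApplication>
#include <QLabel>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    //if this window shows up, the Qt5 installation works
    QLabel label("Qt5 is working");
    label.show();

    return app.exec();
}
</pre>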
<br />
<br />
<br />
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-59722289057233420182017-10-10T23:49:00.003-07:002017-10-13T14:16:12.485-07:00Deep learning 11-Modern way to estimate homography matrix(by light weight cnn) Today I want to introduce a modern way to estimate relative homography between a pair of images. It is a solution introduced by the paper titled <a href="https://arxiv.org/pdf/1606.03798.pdf">Deep Image Homography Estimation</a>.<br />
<div>
<br /></div>
<h2 style="text-align: center;">
<u>Introduction</u></h2>
<div>
Q : What is a homography matrix?</div>
<div>
<br /></div>
<div>
A : Homography matrix is a 3x3 transformation matrix that maps the points in one image to the <span style="color: blue;">corresponding points</span> in another image.</div>
<div>
<br /></div>
<div>
Q : What are the use of homography matrix?</div>
<div>
<br /></div>
<div>
A : There are many applications that depend on the homography matrix; a few of them are image stitching, camera calibration and augmented reality.</div>
<div>
<br /></div>
<div>
Q : How to calculate a homography matrix between two images?</div>
<div>
<br /></div>
<div>
A : Traditional solutions are based on two steps: corner estimation and robust homography estimation. In the corner detection step, you need at least 4 point correspondences between the two images; usually we find these points by matching features like AKAZE, SIFT or SURF. Generally, the correspondences found by those algorithms are over-complete, so we prune out the outliers (ex : by RANSAC) in the robust estimation step. If you are interested in the whole process described in c++, take a look at this <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/stitching">project</a>.</div>
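<div>
<br /></div>
<div>
A condensed OpenCV sketch of those two steps (an illustration of the idea, not the code of the linked project):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <opencv2/calib3d.hpp>
#include <opencv2/features2d.hpp>

#include <vector>

cv::Mat estimate_homography(cv::Mat const &img1, cv::Mat const &img2)
{
    //step 1 : corner estimation, detect and describe features by AKAZE
    auto akaze = cv::AKAZE::create();
    std::vector<cv::KeyPoint> kpts1, kpts2;
    cv::Mat desc1, desc2;
    akaze->detectAndCompute(img1, cv::noArray(), kpts1, desc1);
    akaze->detectAndCompute(img2, cv::noArray(), kpts2, desc2);

    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    std::vector<cv::Point2f> pts1, pts2;
    for(auto const &m : matches){
        pts1.emplace_back(kpts1[m.queryIdx].pt);
        pts2.emplace_back(kpts2[m.trainIdx].pt);
    }

    //step 2 : robust estimation, RANSAC prunes the outliers
    return cv::findHomography(pts1, pts2, cv::RANSAC);
}
</pre>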
<div>
<br /></div>
<div>
Q : The traditional solution requires heavy computation; do we have another way to obtain the homography matrix between two images?</div>
<div>
<br /></div>
<div>
A : This is the question the paper wants to answer. Instead of designing the features by hand, this paper designs an algorithm to learn the homography between two images. The biggest selling point of the paper is that they <b><span style="color: blue;">turn </span></b><span style="color: blue;">the homography estimation problem into a</span> <b><span style="color: blue;">machine learning problem</span></b>.</div>
<div>
<br /></div>
<div>
<br /></div>
<h2 style="text-align: center;">
<b><u>HomographyNet</u></b></h2>
<div>
This paper uses a VGG style CNN to estimate the homography matrix between two images; they call it HomographyNet. This model is trained in an end to end fashion, quite simple and neat.</div>
<div>
<br /></div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7TZfMitaijwbJM6cRUr1X77fYlAfIJCOI2lOul7181hbO3AUhPu1tPDL9wbyROcz4W9fuO7TJafpIKJgXmgBXhRwGjNPY3jRvagwwmDMTjNxPk0-ZKJPFLbe_gyVaDGeC2Y_wZwcUbmc/s1600/homonet.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="198" data-original-width="996" height="77" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7TZfMitaijwbJM6cRUr1X77fYlAfIJCOI2lOul7181hbO3AUhPu1tPDL9wbyROcz4W9fuO7TJafpIKJgXmgBXhRwGjNPY3jRvagwwmDMTjNxPk0-ZKJPFLbe_gyVaDGeC2Y_wZwcUbmc/s400/homonet.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig00</td></tr>
</tbody></table>
<div>
</div>
<div>
HomographyNet comes in two versions, regression and classification. The regression network produces eight real-valued numbers and uses L2 loss as the final layer. The classification network uses softmax as the final layer and quantizes every real value into 21 bins. The regression version has better accuracy; while the average accuracy of the classification version is much worse, it can produce confidences. </div>
<div>
<br /></div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4EbnXnjVICccOehF5Xp8cKaTQqE5eU5e0uSastW_vRGsqukERidnhl8MATjpfcblsdphoqf32Inc1YOkvrWblWAGibkaaG6420L8TDi4uU37qEdir_pOqwKWYGgSWMtZkSeBTt9no4d4/s1600/homonet_01.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="301" data-original-width="1014" height="117" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4EbnXnjVICccOehF5Xp8cKaTQqE5eU5e0uSastW_vRGsqukERidnhl8MATjpfcblsdphoqf32Inc1YOkvrWblWAGibkaaG6420L8TDi4uU37qEdir_pOqwKWYGgSWMtZkSeBTt9no4d4/s400/homonet_01.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig01</td></tr>
</tbody></table>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGns8-K3-fLQqZoelH4_D-SSLCpYsV0a7Ki98nvAEOm1S_Lkcl4pLZSkjRHLbePiXdkGpekmbhjOJ9jw3u-Io12vLY17IQCjRrd6Jc-E4X-UJAvSG2LB6uLU2pHGE7jyHmjka0P70u8Ac/s1600/homonet_02.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="394" data-original-width="423" height="297" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGns8-K3-fLQqZoelH4_D-SSLCpYsV0a7Ki98nvAEOm1S_Lkcl4pLZSkjRHLbePiXdkGpekmbhjOJ9jw3u-Io12vLY17IQCjRrd6Jc-E4X-UJAvSG2LB6uLU2pHGE7jyHmjka0P70u8Ac/s320/homonet_02.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig02</td></tr>
</tbody></table>
</div>
<h2 style="text-align: center;">
<b><u>4-Point Homography Parameterization</u></b></h2>
<div>
Instead of using a 3x3 homography matrix as the label (ground truth), this paper uses a 4-point parameterization as the label.</div>
<div>
<br /></div>
<div>
Q : What is 4-point parameterization?</div>
<div>
<br /></div>
<div>
A : The 4-point parameterization stores the differences of the 4 corresponding corner points between the two images; Fig03 and Fig04 explain it well.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZkZ6jxotLzMhjyQxjlB6NbmFXN-USWdazCmQiy8Edg6jdOGxIlvKwqVeVL1kZMyHMryxevglv1GdmGxVqyVz-05iUFxw_3CFhyphenhyphenNa_NKWsQxj0YVJGpw8tCsoILLjffVpIRhJ6sipqEzk/s1600/homonet_04.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="101" data-original-width="205" height="157" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZkZ6jxotLzMhjyQxjlB6NbmFXN-USWdazCmQiy8Edg6jdOGxIlvKwqVeVL1kZMyHMryxevglv1GdmGxVqyVz-05iUFxw_3CFhyphenhyphenNa_NKWsQxj0YVJGpw8tCsoILLjffVpIRhJ6sipqEzk/s320/homonet_04.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig03</td></tr>
</tbody></table>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5l_pr_w9pKyp7WgDDJuL7zw3kM6EuZz-cDNigFJVm43KuqVzQNgfaoyl2RGmlvWOFvA47vNHG_XmCQGqQFeqdAr_2dkBCuUOSkk3hc1I9F4k5qAwXKwxYikwtZ9JEYpP1WnOMSkO5IDs/s1600/homonet_05.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="213" data-original-width="419" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5l_pr_w9pKyp7WgDDJuL7zw3kM6EuZz-cDNigFJVm43KuqVzQNgfaoyl2RGmlvWOFvA47vNHG_XmCQGqQFeqdAr_2dkBCuUOSkk3hc1I9F4k5qAwXKwxYikwtZ9JEYpP1WnOMSkO5IDs/s400/homonet_05.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig04</td></tr>
</tbody></table>
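<div>
<br /></div>
<div>
In code, going back from the 4-point parameterization to the 3x3 matrix is a one-liner with OpenCV; given the four corners of the patch and the offsets predicted by the network, cv::getPerspectiveTransform recovers the homography (a sketch, assuming the offsets are stored in the same order as the corners):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <opencv2/imgproc.hpp>

#include <vector>

//corners : 4 corners of the patch in the first image
//offsets : the 4-point parameterization(delta of every corner)
cv::Mat four_point_to_homography(std::vector<cv::Point2f> const &corners,
                                 std::vector<cv::Point2f> const &offsets)
{
    std::vector<cv::Point2f> moved;
    for(size_t i = 0; i != corners.size(); ++i){
        moved.emplace_back(corners[i] + offsets[i]);
    }
    //solve the 3x3 homography from the 4 point correspondences
    return cv::getPerspectiveTransform(corners, moved);
}
</pre>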
<div>
Q : Why do they use 4-point parameterization but not 3x3 matrix?</div>
<div>
<br /></div>
<div>
A : Because the 3x3 homography is very difficult to train; the problem is that the 3x3 matrix mixes rotation and translation together, and the paper explains why. </div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">The submatrix [H11, H12; H21, H22] represents the rotational terms in the homography., while the vector [H13, H23] is the translational offset. Balancing the rotational and translational terms as part of an optimization problem is difficult.</span></div>
<div>
<i><a href="https://arxiv.org/pdf/1606.03798.pdf">Deep Image Homography Estimation</a></i></div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgCy532pBr2tyB6rlIinVyV6xSG4rfd52NWY-Z4xvQ7IKJfq9bis9jt0qQ52m9gY8sj6BnOJIm7OYQy179yRnbYNgdzlc_LYEawiClTOHEsBtzZ5VWiALiDxl61jN2AXSOEwWos9GGKwY/s1600/homonet_06.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="85" data-original-width="296" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgCy532pBr2tyB6rlIinVyV6xSG4rfd52NWY-Z4xvQ7IKJfq9bis9jt0qQ52m9gY8sj6BnOJIm7OYQy179yRnbYNgdzlc_LYEawiClTOHEsBtzZ5VWiALiDxl61jN2AXSOEwWos9GGKwY/s1600/homonet_06.jpg" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig05</td></tr>
</tbody></table>
<div>
<br /></div>
<h2 style="text-align: center;">
<b> <u>Data Generation</u></b></h2>
<div>
<br /></div>
<div>
Q : Training deep convolution neural networks from scratch requires a large amount of data, where could we obtain the data?</div>
<div>
<br /></div>
<div>
A : The paper invents a smart solution to generate a nearly unlimited number of labeled training examples; Fig06 summarizes the whole process.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEik5ETXmFTNMMqKEYkNVvwcf2HJy2hTi_D0u-9gjC4qVsSRuQx3BfRyVRR-4UqK4SInJD6ia_-nOIgztPfB_OILQKpOZoe3w3b1MByXi44eF8kiYuIMkigqsU-tjDqeGHQwvd_JaF0oO84/s1600/homonet_03.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="733" data-original-width="573" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEik5ETXmFTNMMqKEYkNVvwcf2HJy2hTi_D0u-9gjC4qVsSRuQx3BfRyVRR-4UqK4SInJD6ia_-nOIgztPfB_OILQKpOZoe3w3b1MByXi44eF8kiYuIMkigqsU-tjDqeGHQwvd_JaF0oO84/s400/homonet_03.jpg" width="312" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig06</td></tr>
</tbody></table>
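<div>
<br /></div>
<div>
A sketch of that generation process with OpenCV (my reading of Fig06; the perturbation range and patch size are illustrative): crop a patch, randomly perturb its four corners, warp the whole image with the inverse of the resulting homography, then crop a second patch at the same location. The corner offsets become the label.</div>
<div>
<br /></div>
<pre class="prettyprint">#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>

#include <random>
#include <vector>

//generate one training pair(patch_a, patch_b) and its 4-point label
void generate_pair(cv::Mat const &gray, cv::Rect const &roi, float max_shift,
                   cv::Mat &patch_a, cv::Mat &patch_b,
                   std::vector<cv::Point2f> &label)
{
    static std::mt19937 rng(std::random_device{}());
    std::uniform_real_distribution<float> dist(-max_shift, max_shift);

    std::vector<cv::Point2f> const corners{
        {float(roi.x), float(roi.y)},
        {float(roi.x + roi.width), float(roi.y)},
        {float(roi.x + roi.width), float(roi.y + roi.height)},
        {float(roi.x), float(roi.y + roi.height)}
    };
    std::vector<cv::Point2f> moved;
    label.clear();
    for(auto const &pt : corners){
        label.emplace_back(dist(rng), dist(rng));
        moved.emplace_back(pt + label.back());
    }

    //warp the whole image by the inverse homography, then crop the
    //patch at the same location; the corner offsets are the label
    cv::Mat const hmat = cv::getPerspectiveTransform(corners, moved);
    cv::Mat warped;
    cv::warpPerspective(gray, warped, hmat.inv(), gray.size());
    patch_a = gray(roi).clone();
    patch_b = warped(roi).clone();
}
</pre>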
<h2 style="text-align: center;">
<b><u><br />Results</u></b></h2>
<div>
The results of my implementation <b><span style="color: blue;">outperform </span></b>the paper: my average loss is <span style="color: blue;">2.58</span>, while the paper's is <span style="color: blue;">9.2</span>. The largest loss of my model is 19.53. The performance of my model is better than the paper's by more than <span style="color: blue;">3.5 times</span>(<span style="color: blue;">9.2/2.58 = 3.57</span>). What makes the performance improve so much? A few reasons I could think of are</div>
<div>
<br /></div>
<div>
<div>
1. I <span style="color: blue;">change the network architectures</span> from vgg like to squeezeNet1.1 like.</div>
<div>
2. I <span style="color: blue;">do not</span> apply any data augmentation; maybe the blurring or occlusion used by the paper makes the model harder to train.</div>
<div>
3. The paper uses data augmentation to generate 500000 examples for training, but I use 500032 images from imagenet as my training set. I guess this potentially increases the variety of the data; the end result is that the network becomes easier to train and more robust (but it may not work well for blur or occlusion).</div>
</div>
<div>
<br /></div>
<div>
Following are some of the results; the region estimated by the model (red rectangle) is very close to the real region (blue rectangle).</div>
<div>
<br /></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhafYhq-29W7qtE2w9x4bSNle-3ED4OFtMGdeLkHxSuHOEjk-uUB4UfP3WBYHcqW474XOrL-I1URJafS9blLhoEYfX5TFzcawZQ1idqKd15PHGn5GSOWW5CA_1O4exEHNyTykzeUNN1SoE/s1600/montage.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="800" data-original-width="1600" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhafYhq-29W7qtE2w9x4bSNle-3ED4OFtMGdeLkHxSuHOEjk-uUB4UfP3WBYHcqW474XOrL-I1URJafS9blLhoEYfX5TFzcawZQ1idqKd15PHGn5GSOWW5CA_1O4exEHNyTykzeUNN1SoE/s400/montage.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig07</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
<br /></div>
<h2 style="text-align: center;">
<b><u>Final thoughts</u></b></h2>
<div>
The results look great, but this paper does not answer two important questions.</div>
<div>
<br /></div>
<div>
1. The paper only tests on synthetic images; does it work on real world images?</div>
<div>
2. How should I use the trained model to predict a homography matrix?</div>
<div>
<br /></div>
<div>
I would like to know the answers; if anyone finds out, please leave me a message.</div>
<h2 style="text-align: center;">
<b><u>Codes and model</u></b></h2>
<div>
As usual, I place my codes at <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/deep_homography">github</a>, model at <a href="https://mega.nz/#!8oMW0bpT!IjkEj_2FfmwZ3svvySFrtRZc6mGTxoR4naMshQcDbP8">mega</a>.</div>
<div>
<br /></div>
<div>
<b class="markup--strong markup--blockquote-strong"> If you liked this article, please help others find it by clicking the little g+ icon below. Thanks a lot!</b></div>
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-32738333108958128662017-09-03T00:37:00.002-07:002017-09-03T00:46:22.924-07:00Wrong way to use QThread<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>
There are two ways to use QThread: the first solution is to inherit QThread and override the run function, the other solution is to create a controller. Today I would like to talk about the second solution and show you how QThread is commonly misused (a general gotcha of QThread).<br />
<br />
The most common error I see is <span style="color: red;"><b>calling the function of the worker directly</b></span>. Please <span style="color: red;"><b>do not do that</b></span>, because this way your worker <b><span style="color: red;">will not work on another thread</span></b> but on the thread you are calling it from.<br />
<br />
Allow me to prove this to you with a small example. I do not separate implementation and declaration in this post because this makes the post easier to read.<br />
<br />
case 1 : Call by function<br />
<br />
<br />
1 : Let us create a very simple, naive worker. This worker <span style="color: red;"><b>must be a QObject</b></span>, because we need to move the worker into a QThread.<br />
<br />
<pre class="prettyprint">class naive_worker : public QObject
{
Q_OBJECT
public:
explicit naive_worker(QObject *obj = nullptr);
void print_working_thread()
{
qDebug()<<QThread::currentThread();
}
};
</pre>
<br />
2 : Create a dead simple gui with QtDesigner. The button "Call by normal function" will call the function "print_working_thread" directly, the button "Call by signal and slot" will call "print_working_thread" through signal and slot, and "Print current thread address" will print the address of the main thread (gui thread).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhh0bZFOUDS3bPOmxeJFlhNPe7LCEbYKeRZhoScppYtEXoqrYcO5oBRUp-kNYx_ZLf5CrILTaHqxQPToa2HTJmWTEq1e693VHsBUO630GuKiQ6Ilm0IMIvS3_vTc_a89zRRIaLyuaVMVAQ/s1600/simple_gui.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="404" data-original-width="642" height="251" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhh0bZFOUDS3bPOmxeJFlhNPe7LCEbYKeRZhoScppYtEXoqrYcO5oBRUp-kNYx_ZLf5CrILTaHqxQPToa2HTJmWTEq1e693VHsBUO630GuKiQ6Ilm0IMIvS3_vTc_a89zRRIaLyuaVMVAQ/s400/simple_gui.JPG" width="400" /></a></div>
<br />
3 : Create a controller<br />
<br />
<br />
<pre class="prettyprint">class naive_controller : public QObject
{
Q_OBJECT
public:
explicit naive_controller(QObject *parent = nullptr):
QObject(parent),
worker_(new naive_worker)
{
//move your worker to thread, so Qt know how to handle it
//your worker should not have a parent before calling
//moveToThread
worker_->moveToThread(&thread_);
connect(&thread_, &QThread::finished, worker_, &QObject::deleteLater);
//this connection is very important, in order to make worker work on the thread
//we move to, we have to call it by the mechanism of signal and slot
connect(this, &naive_controller::print_working_thread_by_signal_and_slot,
worker_, &naive_worker::print_working_thread);
thread_.start();
}
~naive_controller()
{
thread_.wait();
thread_.quit();
}
void print_working_thread_by_normal_call()
{
worker_->print_working_thread();
}
signals:
void print_working_thread_by_signal_and_slot();
private:
QThread thread_;
naive_worker *worker_;
};
</pre>
<br />
4 : Call it through the two different paths and compare the thread addresses.<br />
<br />
<br />
<pre class="prettyprint">class MainWindow : public QMainWindow
{
Q_OBJECT
public:
explicit MainWindow(QWidget *parent = nullptr);
~MainWindow();
private slots:
void on_pushButtonPrintCurThread_clicked()
{
//this function will be called when
//"Print current thread address" is clicked
qDebug()<<QThread::currentThread();
}
void on_pushButtonCallNormalFunc_clicked()
{
//this function will be called when
//"Call by normal function" is clicked
controller_->print_working_thread_by_normal_call();
}
void on_pushButtonCallSignalAndSlot_clicked()
{
//this function will be called when
//"Call by signal and slot" is clicked
controller_->print_working_thread_by_signal_and_slot();
}
private:
naive_controller *controller_;
naive_worker *worker_;
Ui::MainWindow *ui;
};
</pre>
<br />
5. Run the app and click the buttons in the following order: "Print current thread address"->"Call by normal function"->"Call by signal and slot", and see what happens. Following are my results:<br />
<br />
QThread(0x1bd25796020) //call "print_working_thread"<br />
QThread(0x1bd25796020) //call "Call by normal function"<br />
QThread(0x1bd2578bf70) //call "Call by signal and slot"<br />
<br />
Apparently, to make our worker run in the QThread we moved it to, we <span style="color: red;"><b>have to</b></span> call it through the signal and slot mechanism, else it will <b><span style="color: red;">execute in the calling thread</span></b>. You may ask: this is too complicated, do we have an easier way to spawn a thread? Yes we do, you can try QtConcurrent::run and std::async; they are easier to use compared with QThread (it is a regret that c++17 failed to include future.then), as the sketch below shows. I use QThread when I need more power, like thread communication and queue operations.<br />
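<br />
A minimal sketch of both alternatives (QtConcurrent requires the concurrent module in your .pro file):<br />
<br />
<pre class="prettyprint">#include <QDebug>
#include <QThread>
#include <QtConcurrent/QtConcurrent>

#include <future>

void easier_ways_to_spawn_thread()
{
    //QtConcurrent::run executes the functor in the global thread pool
    QFuture<void> qt_future = QtConcurrent::run([]()
    {
        qDebug()<<"QtConcurrent work on"<<QThread::currentThread();
    });
    qt_future.waitForFinished();

    //std::async with launch::async runs the task on another thread
    auto std_future = std::async(std::launch::async, []()
    {
        qDebug()<<"std::async work on"<<QThread::currentThread();
    });
    std_future.wait();
}
</pre>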
<br />
<div style="text-align: center;">
<b><u>Source codes</u></b></div>
<br />
Located at <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/qt_thread_tutorial">github</a>.ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-88914696930035356842017-08-28T22:54:00.000-07:002017-10-13T14:15:57.886-07:00Deep learning 10-Let us create a semantic segmentation model(LinkNet) by PyTorch In recent years deep learning has taken over many difficult tasks of computer vision, and semantic segmentation is one of them. The first segmentation net I implemented is LinkNet, a fast and accurate segmentation network. <br />
<br />
<div style="text-align: center;">
<h2>
<u><b>Introduction</b></u></h2>
</div>
<br />
<u><i>Q </i></u>: <i>What is LinkNet?</i><br />
<br />
A : LinkNet is a convolutional neural network designed for semantic segmentation. This network is 10 times faster than <a href="https://arxiv.org/pdf/1511.00561v2.pdf">SegNet</a> and more accurate.<br />
<br />
<i><u>Q</u></i> : <i>What is semantic segmentation? Any difference with segmentation?</i><br />
<br />
A : Of course there are differences. Segmentation partitions an image into several "similar" parts, but you <span style="color: blue;">do not know what those parts represent</span>. On the other hand, semantic segmentation partitions the image into different <span style="color: blue;">pre-determined labels</span>. Those labels are presented as colors in the end results. For example, check out the following images (from <a href="https://github.com/mostafaizz/camvid">camvid</a>).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmK_zaPr_1zyok4A_qUgPcMfIVX6A61oKdeeuOfT5zkWX2Sbwgh4z5AC70dMm8IEsBfpItShh_RLrBuBW8B6cxeHQ4g-2vTa_yz4C5-eC5UKaNaUiokcZA3PCY8rs-RkCPzfIHVavdhMw/s1600/_segmentation_example.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="320" data-original-width="960" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmK_zaPr_1zyok4A_qUgPcMfIVX6A61oKdeeuOfT5zkWX2Sbwgh4z5AC70dMm8IEsBfpItShh_RLrBuBW8B6cxeHQ4g-2vTa_yz4C5-eC5UKaNaUiokcZA3PCY8rs-RkCPzfIHVavdhMw/s320/_segmentation_example.jpg" width="320" /></a></div>
<br />
<br />
<i><u>Q</u></i> : <i>Semantic segmentation sounds like object detection, are they the same thing?</i><br />
<br />
A : No, they are not, although you may achieve the same goal with either of them.<br />
From the technical aspect, they use different approaches. From the view of the end results, semantic segmentation tells you what those pixels are, but it <span style="color: blue;">does not tell you how many instances are in your images</span>; object detection shows you how many instances are in your images by minimal bounding boxes, but it <span style="color: blue;">does not give you </span><span class="comment-copy"><span style="color: blue;">delineation of objects</span>. For example, check out the images below (from <a href="https://pjreddie.com/darknet/yolo/">yolo</a>).</span><br />
<span class="comment-copy"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOsdM9UsBtMdfSs57zajDqmNHKjZ5FsQ6dqGYCBR8eChVSGKRBCfnqvGgeYtXYgUJDzQvTVA_5i1OAxrkanW-nmOYdorhrEfZvH-pKeVJbBbKvDTMhqIKu4OZj1wF_jkHhj0SiX4SceI0/s1600/object_detection.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1328" data-original-width="1600" height="265" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOsdM9UsBtMdfSs57zajDqmNHKjZ5FsQ6dqGYCBR8eChVSGKRBCfnqvGgeYtXYgUJDzQvTVA_5i1OAxrkanW-nmOYdorhrEfZvH-pKeVJbBbKvDTMhqIKu4OZj1wF_jkHhj0SiX4SceI0/s320/object_detection.png" width="320" /></a></div>
<span class="comment-copy"><br /></span>
<br />
<div style="text-align: center;">
<h3>
<u><b>Network architectures</b></u></h3>
</div>
<br />
The LinkNet paper describes its network architecture with excellent graphs and simple descriptions; following are the figures copied shamelessly from the paper.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidY5fMuLldJ0Y4fWTupDgAFv7LihrbwYrwHIuBmsnvR1BymD367BTUVFrKb-HOyiZKUN-j1qOAng36vqm3m5-Nwes_u7vSyjxs34ZL7E18O7ibmZIL1Z4cOAkeRiOkn2jolGY41xTL3Bo/s1600/link_net_00.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="810" data-original-width="602" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidY5fMuLldJ0Y4fWTupDgAFv7LihrbwYrwHIuBmsnvR1BymD367BTUVFrKb-HOyiZKUN-j1qOAng36vqm3m5-Nwes_u7vSyjxs34ZL7E18O7ibmZIL1Z4cOAkeRiOkn2jolGY41xTL3Bo/s320/link_net_00.png" width="237" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOFiakrIui5mpEZ9KUddw-bxQhCs2OkZqXASCXYq44qXNJVncHgkbEe6pQ6hqPSdruitEZDA1ikELg6bKE4vzsZFCz6kQ8CNJ1ezbcuCqZsEt6APDGjO5NeqmPwFXhU9VtCk5kwJ0wwS8/s1600/link_net_01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="653" data-original-width="536" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOFiakrIui5mpEZ9KUddw-bxQhCs2OkZqXASCXYq44qXNJVncHgkbEe6pQ6hqPSdruitEZDA1ikELg6bKE4vzsZFCz6kQ8CNJ1ezbcuCqZsEt6APDGjO5NeqmPwFXhU9VtCk5kwJ0wwS8/s320/link_net_01.png" width="262" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4CZBbF6vpO2cyrAlrM2HIoruF5VBHbGpG7nKfbwwnpTXm3IIhLVKEd8FqjN0kHVTiCRRwtw60L9OSicEUGrC-ZgWS6WeQfj5OIe64QUDg7dyQpR91gXrj9rSsKzWuRyAnTRrp2dgpv4A/s1600/link_net_02.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="386" data-original-width="544" height="227" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4CZBbF6vpO2cyrAlrM2HIoruF5VBHbGpG7nKfbwwnpTXm3IIhLVKEd8FqjN0kHVTiCRRwtw60L9OSicEUGrC-ZgWS6WeQfj5OIe64QUDg7dyQpR91gXrj9rSsKzWuRyAnTRrp2dgpv4A/s320/link_net_02.png" width="320" /></a></div>
<br />
<br />
LinkNet adopts an encoder-decoder architecture. According to the paper, much of LinkNet's performance comes from adding the output of each encoder block to the corresponding decoder block; this helps the decoder recover the spatial information more easily. If you want to know the details, please study section 3 of the paper; it is nicely written and very easy to understand.<br />
<br />
<i><u>Q</u></i> : <i>The paper is easy to read, but they do not explain what is full convolution, could you tell me what that means?</i><br />
<br />
A : <span class="inline_editor_value"><span class="rendered_qtext">Full convolution indicates that the neural network is composed of convolution layers and activations <span style="color: blue;">only</span>, <span style="color: blue;">without any</span> fully connected or pooling layers. </span></span><br />
<span class="inline_editor_value"><span class="rendered_qtext"><br /></span></span>
<span class="inline_editor_value"><span class="rendered_qtext"><u><i>Q</i></u> : <i>How do they perform down-sampling without pooling layers?</i></span></span><br />
<span class="inline_editor_value"><span class="rendered_qtext"><br /></span></span>
<span class="inline_editor_value"><span class="rendered_qtext">A : Make the <span style="color: blue;">stride</span> of convolution as <span style="color: blue;">2 x 2</span> and do <span style="color: blue;">zero padding</span>, if you cannot figure it out why this work, I suggest you create an excel file, write down some data and do some experiment.</span></span><br />
<br />
<u><i>Q</i></u> : <i>Which optimizer work best?</i><br />
<br />
A : According to the paper, <span style="color: blue;">rmsprop</span> is the winner, and my experiments told me the same thing. In case you are interested, below are the graphs of training loss; from left to right they are rmsprop, adam and sgd. The hyper parameters are<br />
<br />
Initial learning rate : adam and rmsprop are 5e-4, sgd is 1e-3<br />
Augmentation : random crop(480,320) and horizontal flip<br />
Normalize : subtract mean(based on imagenet mean value) and divided by 255<br />
Batch size : 16<br />
Epoch : 800<br />
Training examples : 368<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0Wb34EO38D6JJeD7BnZbYf_sDdw_fIbBDIqic9BtAuDFydz4z3kS-9ncerfvsNHXfCqyyXGpeHEE2EnxyrbaX4B6mrEdvJOZOdM3rY9W7T2udF4qONeoaS2Anbd8wSVq6LYsUpGDYQ1k/s1600/segmentation_loss_368.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="400" data-original-width="1600" height="100" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0Wb34EO38D6JJeD7BnZbYf_sDdw_fIbBDIqic9BtAuDFydz4z3kS-9ncerfvsNHXfCqyyXGpeHEE2EnxyrbaX4B6mrEdvJOZOdM3rY9W7T2udF4qONeoaS2Anbd8wSVq6LYsUpGDYQ1k/s400/segmentation_loss_368.jpg" width="400" /></a></div>
<br />
<br />
The results of adam and rmsprop are very close. The loss of sgd decreases steadily, but it converges very slowly even with a higher learning rate; maybe an even higher learning rate would work better for sgd.<br />
<br />
<div style="text-align: center;">
<h3>
<b><u>Data pre-processing</u></b></h3>
</div>
<br />
Almost every computer vision task needs you to pre-process your data, and segmentation is no exception; following are my steps (a small sketch of the first two follows the list).<br />
<br />
1 : Convert colors that do not exist in any category into void (0, 0, 0)<br />
2 : Convert the colors into integer labels<br />
3 : Zero mean (mean values come from imagenet)<br />
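<br />
A sketch of steps 1 and 2 with OpenCV (the color table is illustrative; camvid defines its own):<br />
<br />
<pre class="prettyprint">#include <opencv2/core.hpp>

#include <vector>

//map every pixel color to an integer label(steps 1 and 2), colors
//not in the table are treated as void(label 0)
cv::Mat color_to_label(cv::Mat const &rgb,
                       std::vector<cv::Vec3b> const &colors)
{
    cv::Mat label(rgb.size(), CV_32S, cv::Scalar(0));
    for(int r = 0; r != rgb.rows; ++r){
        for(int c = 0; c != rgb.cols; ++c){
            auto const &pix = rgb.at<cv::Vec3b>(r, c);
            for(size_t i = 0; i != colors.size(); ++i){
                if(pix == colors[i]){
                    //label 0 is reserved for void
                    label.at<int>(r, c) = static_cast<int>(i) + 1;
                    break;
                }
            }
        }
    }
    return label;
}
</pre>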
<br />
<div style="text-align: center;">
<h3>
<u><b>Experiment on camvid</b></u></h3>
</div>
<br />
Enough of Q&A, let us have some benchmark and pictures😊.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdQ5TRpXwxeF17Jh-9mMDKnVdY359jdVhOY_cI292256Zc-9bsol79ekcz562o6bCyL-0A_-UGFQlK2eH-YDsQpUXtQDnFllFfQliTQgnRUopIa1-j2DzchjA7GuRk5hRIHuB1ozcmvV4/s1600/model_accuracy.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="97" data-original-width="1600" height="23" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdQ5TRpXwxeF17Jh-9mMDKnVdY359jdVhOY_cI292256Zc-9bsol79ekcz562o6bCyL-0A_-UGFQlK2eH-YDsQpUXtQDnFllFfQliTQgnRUopIa1-j2DzchjA7GuRk5hRIHuB1ozcmvV4/s400/model_accuracy.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Performance </td></tr>
</tbody></table>
Models 1, 2 and 3 all train with the same parameters and pre-processing but with different input sizes during training: (128,128), (256,256) and (512,512). When testing, the size of the images is (960,720).<br />
<br />
Following are some examples, from left to right is original image, ground truth and predicted image. <br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdqalXH28Aw_YfQ1v6hsjWHJzJegem-PAjYcYk5LesEJaGvPNZp0guuEPTSGNdDldq7zOVlJBFTnnh9gjWIKvcamnrbiwEEBWMK3WGV1pkixFKxaAjE-hWb-J_O3uksTJso2rRMXDaKm4/s1600/0006R0_f03180_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdqalXH28Aw_YfQ1v6hsjWHJzJegem-PAjYcYk5LesEJaGvPNZp0guuEPTSGNdDldq7zOVlJBFTnnh9gjWIKvcamnrbiwEEBWMK3WGV1pkixFKxaAjE-hWb-J_O3uksTJso2rRMXDaKm4/s320/0006R0_f03180_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEig8bNub48Hg3CtfyT_608OVyEaeQxV__KDSdOCx64TOV_uVpgRGcSZyMiUtNjyDKAzk6NlU91F4vWZIL2KhqHG2cKOC7yIBVLKMLlKyAPytXFh3PaGKnC5nGxEeYS6v4Qh0E31eagbI4g/s1600/0006R0_f03810_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEig8bNub48Hg3CtfyT_608OVyEaeQxV__KDSdOCx64TOV_uVpgRGcSZyMiUtNjyDKAzk6NlU91F4vWZIL2KhqHG2cKOC7yIBVLKMLlKyAPytXFh3PaGKnC5nGxEeYS6v4Qh0E31eagbI4g/s320/0006R0_f03810_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6b4Pgvlr2GqLhb09VMO4h8qhGQOO1-llhR3mt7YzHka-riuPcUBKMvkyVNgaOesvs-UggK2uWyjU4MhyAVlCFHDo9PIjlPB7vlxBo1Be6V-LjrVTTxDOPFJsIlJPAvFFkkH0hs3_gfUk/s1600/0006R0_f03870_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6b4Pgvlr2GqLhb09VMO4h8qhGQOO1-llhR3mt7YzHka-riuPcUBKMvkyVNgaOesvs-UggK2uWyjU4MhyAVlCFHDo9PIjlPB7vlxBo1Be6V-LjrVTTxDOPFJsIlJPAvFFkkH0hs3_gfUk/s320/0006R0_f03870_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghtMo4uqMvpbqHqw70VbNL3nyY79OKwR5uEe54vgA5sEDwtUu0rbZOtGmcYnBJkzfjINaQ_Bw-ACYYEABiFv29wISHqzrHKzxtIFMGBR3ziyzaRFd7qXEjTS0aYsvnfzOoii5kLEAmUg0/s1600/0016E5_00540_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghtMo4uqMvpbqHqw70VbNL3nyY79OKwR5uEe54vgA5sEDwtUu0rbZOtGmcYnBJkzfjINaQ_Bw-ACYYEABiFv29wISHqzrHKzxtIFMGBR3ziyzaRFd7qXEjTS0aYsvnfzOoii5kLEAmUg0/s320/0016E5_00540_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpp9Oxnh7vZ7cQhUqwNHr1Di1T2ObZNv3c3Hs68pxt77cnbe6FMrCeARGCDd_iuLM9DemeF1IkN0f_uHwuehOGkwhJg5laOTQKuM_xhMXNJO9Lqbf2Ed9WslNob97HKsHgf1iHpa46THY/s1600/0016E5_01230_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpp9Oxnh7vZ7cQhUqwNHr1Di1T2ObZNv3c3Hs68pxt77cnbe6FMrCeARGCDd_iuLM9DemeF1IkN0f_uHwuehOGkwhJg5laOTQKuM_xhMXNJO9Lqbf2Ed9WslNob97HKsHgf1iHpa46THY/s320/0016E5_01230_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBwe3dgB7jdUOLBAr97qWM8a10qvQifXY-3jYmI2fbXVgbQH5oIEenG4EromSBHA5YuR9RLkK8CSgfIwO5vkiMBIlOMJndirn2cB1MsjWbc6FY5ZClZ6K3rjuTYpA6HAewscYYVrvVPLs/s1600/0016E5_01410_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBwe3dgB7jdUOLBAr97qWM8a10qvQifXY-3jYmI2fbXVgbQH5oIEenG4EromSBHA5YuR9RLkK8CSgfIwO5vkiMBIlOMJndirn2cB1MsjWbc6FY5ZClZ6K3rjuTYpA6HAewscYYVrvVPLs/s320/0016E5_01410_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqOUbgXG2Nc55UT-U2TwJocXkAyfr-uQP77qFBL3Zigw1lTeTHisfmajNCdLvGVI6HN6OmUOkjtPnlvejK_hYuZuFtcT3bE-YOFEpHbqv8Tt0s9mBuCNB_J4nyQ5GVBWeSrK1o8KBq6OU/s1600/0016E5_01620_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqOUbgXG2Nc55UT-U2TwJocXkAyfr-uQP77qFBL3Zigw1lTeTHisfmajNCdLvGVI6HN6OmUOkjtPnlvejK_hYuZuFtcT3bE-YOFEpHbqv8Tt0s9mBuCNB_J4nyQ5GVBWeSrK1o8KBq6OU/s320/0016E5_01620_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinCqebVcS0XIO4P5u0kD77IWvL6RP_pc-fzJggwN9_KHHCiwVV5WfGMkQRvgBCg6kuYpAsdalFc3vRsXrFxrjE_Q5bluGmGYIx_xohyphenhyphenpDW6-Xhveh3kRngwQPMmjs6LRBTQJF-h-71flU/s1600/0016E5_05100_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinCqebVcS0XIO4P5u0kD77IWvL6RP_pc-fzJggwN9_KHHCiwVV5WfGMkQRvgBCg6kuYpAsdalFc3vRsXrFxrjE_Q5bluGmGYIx_xohyphenhyphenpDW6-Xhveh3kRngwQPMmjs6LRBTQJF-h-71flU/s320/0016E5_05100_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4-pC6TzbqciUIUZob6k-U_9m12terTJ9JiF5wS-w1kG9PSN-9E-Y74H-B1FVRK-fJ31k65LFSTzC-cYXsTp3Bs2zjD2_1LX2WVyurb9SUtzANPbM3mFM-aTZ7HnXcTQsF5_JfJ2F9Es4/s1600/0016E5_05370_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4-pC6TzbqciUIUZob6k-U_9m12terTJ9JiF5wS-w1kG9PSN-9E-Y74H-B1FVRK-fJ31k65LFSTzC-cYXsTp3Bs2zjD2_1LX2WVyurb9SUtzANPbM3mFM-aTZ7HnXcTQsF5_JfJ2F9Es4/s320/0016E5_05370_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiy5I5JFTD7Bq4lTqjN-bSsIlumQGj8Eqo0R59kjoGreQ7GIT69ZJyaN_qvKtt-BO37Woukgnx0Ctsg8n3NrX-J6mMFtU3amG7q5JQq5C-ETejYChnspPh_4bG68dUI7yQ-0svj20PTyD4/s1600/Seq05VD_f04890_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiy5I5JFTD7Bq4lTqjN-bSsIlumQGj8Eqo0R59kjoGreQ7GIT69ZJyaN_qvKtt-BO37Woukgnx0Ctsg8n3NrX-J6mMFtU3amG7q5JQq5C-ETejYChnspPh_4bG68dUI7yQ-0svj20PTyD4/s320/Seq05VD_f04890_mongtage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXwpzEW3f3CJU53xqlJw2jW52rzfmxBomTJdQR4-WbjTH1Bt5D3ywYuQYaCvLdAkLnYTUQqY4seUgZW5kWVI3SucMJK7-kVYcjuMh4KAZhzU4IIwdrb44rcERyaMkuuuIOrwI4wAnxqsw/s1600/Seq05VD_f03840_mongtage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1440" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXwpzEW3f3CJU53xqlJw2jW52rzfmxBomTJdQR4-WbjTH1Bt5D3ywYuQYaCvLdAkLnYTUQqY4seUgZW5kWVI3SucMJK7-kVYcjuMh4KAZhzU4IIwdrb44rcERyaMkuuuIOrwI4wAnxqsw/s320/Seq05VD_f03840_mongtage.jpg" width="320" /></a></div>
<br />
The results look quite good and the <span style="color: red;">IoU is much better than the paper's</span>; possible reasons are<br />
<br />
1 : I augment the data by random crop and horizontal flip; the paper may use other methods or not perform augmentation at all(?).<br />
<br />
2 : My pre-processing is different from the paper's<br />
<br />
3 : I did not omit void when training<br />
<br />
4 : My measurement on IoU is wrong<br />
<br />
5 : My model is more complicated than the paper(wrong implementation)<br />
<br />
6 : It is overfit<br />
<br />
7 : Randomly shuffling training and testing data creates data leakage, because many images of camvid are very similar to each other<br />
<br />
<br />
<div style="text-align: center;">
<h3>
<u><b>Trained models and codes</b></u></h3>
</div>
<br />
1 : As usual, located at <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/python_deep_learning/segmentation">github</a>.<br />
2 : <a href="https://mega.nz/#!48M3SBQZ!PZY_1p0kZlyIAs4kCaKuZBJ22E59-wILskgluWYnWbA">Model trained with 368 images, 12 labels(include void), random crop (128x128),800 epoch </a><br />
3 : <a href="https://mega.nz/#!Mh9WxB4A!PZawXq0K3MtzGkRUa6gyQfsHWaxYMjgSWzOQhBOEGQE">Model trained with 368 images, 12 labels(include void), random crop (480x320),800 epoch</a> <br />
4 : <a href="https://mega.nz/#!psEnEC4Q!-TXrNCYDmvyVbWdhxJgrgMPhhX17fpjrLGN5GbiWsYI">Model trained with 368 images, 12 labels(include void), random crop (512x512),800 epoch</a> <br />
<br />
<div style="text-align: center;">
<h3>
<u><b>Miscellaneous</b></u></h3>
</div>
<br />
<u><i>Q</i></u> : <i>Is it possible to create portable model by PyTorch?</i><br />
<br />
A : It is possible, but not easy. You could check out <a href="http://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html">ONNX and caffe2</a> if you want to try it. Someone managed to convert a pytorch model to a caffe model and load it with opencv dnn. Right now opencv dnn does not support <a href="http://pytorch.org/">PyTorch</a>, but it works with <a href="https://github.com/hughperkins/pytorch">pytorch</a>(the python wrapper of torch). Thank god opencv dnn can import models trained by <a href="http://torch.ch/">torch</a> with ease (right now opencv dnn <a href="https://github.com/opencv/opencv/issues/9502">does not support nngraph</a>).<br />
<br />
<u><i>Q</i></u> : <i>What are IoU and iIoU in the paper refer to?</i><br />
<br />
A : This <a href="https://www.cityscapes-dataset.com/benchmarks/">page</a> gives a good definition, although I still can't figure out how to calculate iIoU; a sketch of plain IoU follows below.<br />
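<br />
For reference, a sketch of how per-class IoU can be computed from the predicted and ground truth label maps (my own reading of the definition on that page; iIoU is left out):<br />
<br />
<pre class="prettyprint">#include <opencv2/core.hpp>

//IoU of one class : intersection / union, the label maps store
//integer class ids(CV_32S)
double class_iou(cv::Mat const &predict, cv::Mat const &truth, int class_id)
{
    cv::Mat const pred_mask = predict == class_id;
    cv::Mat const true_mask = truth == class_id;
    double const inter = cv::countNonZero(pred_mask & true_mask);
    double const uni = cv::countNonZero(pred_mask | true_mask);

    return uni > 0 ? inter / uni : 0.0;
}
</pre>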
<br />
<br />
<b class="markup--strong markup--blockquote-strong"> If you liked this article, please help others find it by clicking the little g+ icon below. Thanks a lot!</b>ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com2tag:blogger.com,1999:blog-4702230343097536610.post-31068894190936190822017-08-07T00:40:00.003-07:002017-10-13T20:12:45.209-07:00Deep learning 09-Performance of perceptual losses for super resolution Have you ever scratch your head when upscaling low resolution images? I do, because we all know the quality of the images after upscaling degrade. Thanks to the rise of machine learning in recent years, we are able to upscale single image with better results compare with traditional solutions(ex : bilinear, bicubic. You do not need to know what they are except they are apply widely in many products), we call this technique <i><span style="color: blue;"><b>super resolution</b></span></i>.<br />
<br />
This sounds great, but how could we do it? I did not know either until I studied the tutorials of part2 of the marvelous <a href="http://course.fast.ai/">Practical Deep learning for Coders</a>; this course is fantastic for getting your feet wet in deep learning.<br />
<br />
I will try my best to explain everything with <span style="color: blue;">minimal prerequisite knowledge</span> of machine learning and computer vision; however, some knowledge of convolutional neural networks (cnn) is needed. The course of <a href="http://course.fast.ai/lessons/lessons.html">part1</a> is excellent if you want to learn cnn in depth. If you are in a hurry, <a href="http://www.pyimagesearch.com/2016/08/01/lenet-convolutional-neural-network-in-python/">pyimagesearch</a> and <a href="https://medium.com/towards-data-science/deep-learning-2-f81ebe632d5c">medium</a> have short tutorials about cnn.<br />
<br />
<div style="text-align: center;">
<u><b>What is super resolution and how does it work</b></u></div>
<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>What is super resolution</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : Super resolution is a class of technique to enhance the resolution of images or videos.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>There are many softwares could help us upscale images, why do we need super resolution?</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : Traditional solutions of upscaling image apply interpolation algorithm on one image only(ex: bilinear or bicubic). In the contrast, super resolution <i><span style="color: blue;"><b>exploit info from another source</b></span></i>, either from contiguous frames, from the model trained by machine learning or different scale from one image.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>How does super resolution work</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : Super resolution I want to introduce today is based on <a href="https://arxiv.org/abs/1603.08155">Perceptual losses for Real-Time style Transfer and Super-Resolution</a>.(please consult <a href="https://en.wikipedia.org/wiki/Superresolution">wiki</a> if you want to study another type of super resolution). The most interesting part of this solution is it treat super resolution as an <i><span style="color: blue;"><b>image transformation problem</b></span></i>(it is a process where an input image is transformed into an output image). This mean we may use the same technique to solve colorization, denoising, depth estimation, semantic segmentation and another tasks(It is not a problem if you do not know what they are).<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b><u>Q</u></b></span> : <i>How do we transformed low resolution image to high resolution image?</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : A picture worth a thousand words.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwXDIs51rgLroIXBKQlN_kmvNPbdy4fMZVQQFC54p7OIbwxhj-JYwpBXXiwASKJMS1jgs54Ep8acq0HHyoyvpNWeUBb0Y8vy-y96o3no0Q-glK36n1s8780SVD7gCCSwJIz1U8xaBAI0U/s1600/perceptual_super_res.dot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="131" data-original-width="903" height="92" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwXDIs51rgLroIXBKQlN_kmvNPbdy4fMZVQQFC54p7OIbwxhj-JYwpBXXiwASKJMS1jgs54Ep8acq0HHyoyvpNWeUBb0Y8vy-y96o3no0Q-glK36n1s8780SVD7gCCSwJIz1U8xaBAI0U/s640/perceptual_super_res.dot.png" width="640" /></a></div>
<br />
<br />
This network is composed of two components, an <b><span style="font-family: inherit;"><span style="color: blue;"><i>image transformation network</i></span></span></b> and a <b><span style="font-family: inherit;"><span style="color: blue;"><i>loss network</i></span></span></b>. The image transformation network transforms the low resolution image into a high resolution image, while the loss network measures the difference between the predicted high resolution image and the true high resolution image.<br />
<br />
<u><span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Q</b></span></u> : <i>What is the loss network anyway?Why do we use it to measure the loss?</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : Loss network is an image classification network train on <a href="http://www.image-net.org/">imagenet</a> (ex : vgg16, resnet, densenet). We use it to measure the loss because we want our network to better measure perceptual and semantic difference between images. The paper call the loss measure by this loss network <span style="color: blue;"><i><b>perceptual loss</b></i></span>.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b><u>Q</u></b></span> : <i>What makes the loss network able to generate better loss?</i><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A</b></span> : The loss network can generate better loss because the convolutional neural network trained for image classification have <a href="https://medium.com/towards-data-science/deep-learning-2-f81ebe632d5c">already learned to encode</a> the perceptual and semantic information we want.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>The color of the image is different after upscale, how could I fixed it?</i><br />
<br />
A : You could apply histogram matching as the paper mentions; this should be able to deal with most of the cases.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>Any draw back of this algorithm?</i><br />
<br />
A : Of course, nothing is perfect.<br />
<br />
1 : Not all images work; some of them may look very ugly after upscaling.<br />
2 : The result may look like eye candy, but the algorithm is not reconstructing the photo exactly; it creates details based on its training from example images. It is impossible to reconstruct the image perfectly, because we have no way to retrieve information that did not exist in the first place.<br />
3 : The colors of parts of some images change after upscaling, and even histogram matching cannot fix it.<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><u><b>Q</b></u></span> : <i>What is histogram matching?</i><br />
<br />
A : It is a way to make the color distribution of image A look like that of image B. A minimal sketch is shown below.<br />
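<br />
openCV does not ship a histogram matching function, so here is a minimal per-channel sketch of the idea (the helper name is mine and the exact variant used by the paper may differ): map every gray level of the source channel to the level of the reference channel whose cumulative distribution is closest.<br />
<br />
<pre class="prettyprint">#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

//match the histogram of one 8-bit channel(src) to another(ref)
cv::Mat match_histogram(cv::Mat const &src, cv::Mat const &ref)
{
    int const hist_size = 256;
    int const channels[] = {0};
    float range[] = {0, 256};
    float const *hist_range = range;
    cv::Mat src_hist, ref_hist;
    cv::calcHist(&src, 1, channels, cv::Mat(), src_hist, 1, &hist_size, &hist_range);
    cv::calcHist(&ref, 1, channels, cv::Mat(), ref_hist, 1, &hist_size, &hist_range);
    //turn the histograms into normalized cumulative distributions(cdf)
    for(int i = 1; i != hist_size; ++i){
        src_hist.at<float>(i) += src_hist.at<float>(i - 1);
        ref_hist.at<float>(i) += ref_hist.at<float>(i - 1);
    }
    src_hist = src_hist / src_hist.at<float>(hist_size - 1);
    ref_hist = ref_hist / ref_hist.at<float>(hist_size - 1);
    //build a lookup table which maps src levels to ref levels
    cv::Mat lut(1, 256, CV_8U);
    for(int i = 0; i != hist_size; ++i){
        int j = 0;
        while(j != hist_size - 1 && ref_hist.at<float>(j) < src_hist.at<float>(i)){
            ++j;
        }
        lut.at<uchar>(i) = static_cast<uchar>(j);
    }
    cv::Mat result;
    cv::LUT(src, lut, result);
    return result;
}
</pre>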
<br />
<div style="text-align: center;">
<u><b>Experiment</b></u></div>
<br />
All of the experiments use the same network architecture and train on 80000 images from <a href="http://www.image-net.org/">imagenet</a> for 2 epochs. From left to right: original image, image upscaled 4x by bicubic, image upscaled 4x by super resolution.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwOi0PgYKdCZge4DII7J7ixWGNOV31PSKohW-9dMOKgkI05z1fcswPbLgdPREfWy0NHsZz_nB_jTvkJwj2oIMkkqM4z462O-PG_2pTM4NUeW2hnoprfDjBcC2hEl4_k_c1axxTU3MqZmc/s1600/baby_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="512" data-original-width="1536" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwOi0PgYKdCZge4DII7J7ixWGNOV31PSKohW-9dMOKgkI05z1fcswPbLgdPREfWy0NHsZz_nB_jTvkJwj2oIMkkqM4z462O-PG_2pTM4NUeW2hnoprfDjBcC2hEl4_k_c1axxTU3MqZmc/s320/baby_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbQDsko1gJ81kK7ylppjKgiRHPCk-d4E3uFnNc5QND3tHeyn1PP692v2FlEwiuTwuWl9w53A4YdnGu15z_u8ppQ43AEMyuhY6reSVGwZ8bEFDlV1XuSaiAneh15JURIV0NBQFS8u6r4q4/s1600/bird_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="288" data-original-width="864" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbQDsko1gJ81kK7ylppjKgiRHPCk-d4E3uFnNc5QND3tHeyn1PP692v2FlEwiuTwuWl9w53A4YdnGu15z_u8ppQ43AEMyuhY6reSVGwZ8bEFDlV1XuSaiAneh15JURIV0NBQFS8u6r4q4/s320/bird_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi30kFH-0R4kI04f0GGhKREqCukFKRxvMHnxCibGNHJEQ1mZOFzqUAM5RLVvH4lZDFzsxSDw_UI_CfwpOEYdgdAhmotDD677QTxxvRJysIlKAmDGmUyXQxMEz5nLfwf_x0KYLNePYoJX9g/s1600/butterfly_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="256" data-original-width="768" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi30kFH-0R4kI04f0GGhKREqCukFKRxvMHnxCibGNHJEQ1mZOFzqUAM5RLVvH4lZDFzsxSDw_UI_CfwpOEYdgdAhmotDD677QTxxvRJysIlKAmDGmUyXQxMEz5nLfwf_x0KYLNePYoJX9g/s320/butterfly_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtKNLhhg6KnfkuRcPNfuUoeq97PRLpe-9isuXZnpd7l-MjWfJkIthrR0a63BshPR0E2giqF5oDFRiZRydrYSEbw2zMdmquYlqSXDXOdcZCxUSMjiAH90GQJQ5kf5D-m4IR6aEhPoTeOSA/s1600/fruits_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="512" data-original-width="1536" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtKNLhhg6KnfkuRcPNfuUoeq97PRLpe-9isuXZnpd7l-MjWfJkIthrR0a63BshPR0E2giqF5oDFRiZRydrYSEbw2zMdmquYlqSXDXOdcZCxUSMjiAH90GQJQ5kf5D-m4IR6aEhPoTeOSA/s320/fruits_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9go6S3_m-fhvCslqRzQAz_RUwpvQ3831KSdJHjQpfhtUg1Tkm7lotC5sodeguyVtjSJEfWRqyVc5K87IW_MIrwEE-4YptSl2l8OErYkW2B6zAq-lB6vDfCwYT1m_kcTEuac5sKhjJd_I/s1600/girl_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="744" height="154" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9go6S3_m-fhvCslqRzQAz_RUwpvQ3831KSdJHjQpfhtUg1Tkm7lotC5sodeguyVtjSJEfWRqyVc5K87IW_MIrwEE-4YptSl2l8OErYkW2B6zAq-lB6vDfCwYT1m_kcTEuac5sKhjJd_I/s320/girl_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3G1wNLzV5lk2R-TRP733Hqv4KI7TMkLKzmiPE05f9IF6PZrzRYoa83jQJ8SOpiyf0TssAxR77nRbbr7btKEWiQwN0GfXPFHfX6ch8-qjcuQfLaC_jg7fVGqeahsm2srcwbAy4cfXBHXQ/s1600/lena_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="512" data-original-width="1536" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3G1wNLzV5lk2R-TRP733Hqv4KI7TMkLKzmiPE05f9IF6PZrzRYoa83jQJ8SOpiyf0TssAxR77nRbbr7btKEWiQwN0GfXPFHfX6ch8-qjcuQfLaC_jg7fVGqeahsm2srcwbAy4cfXBHXQ/s320/lena_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrn6OIm_Cy5RW5pbKcXUXU3YnR6fEuuQnqf5ye17EgX_Ru3SwZ_jaVlQeOXd9XsrBcHKBzYA7VKPZPpdIGCC8PlBdHV1xY-Fcmteea1uEG0dZc6b2XWc5noLnISJka0PPAANNRwrcokes/s1600/monkey_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="480" data-original-width="1500" height="102" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrn6OIm_Cy5RW5pbKcXUXU3YnR6fEuuQnqf5ye17EgX_Ru3SwZ_jaVlQeOXd9XsrBcHKBzYA7VKPZPpdIGCC8PlBdHV1xY-Fcmteea1uEG0dZc6b2XWc5noLnISJka0PPAANNRwrcokes/s320/monkey_montage.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAB3gkxxUXiyZ7tc6ILv_4LdEwVZTprQkZNQ8DYdWqIUIUgXsHPrdv7onEToDaKVIjpKUqpg-EJuiBXNh5DhuKfnXhvkkgulkyuG9ciIIIbxHUwCQvzr6sQzTWe6taapyXb9bYEbSV6FI/s1600/woman_montage.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="344" data-original-width="684" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAB3gkxxUXiyZ7tc6ILv_4LdEwVZTprQkZNQ8DYdWqIUIUgXsHPrdv7onEToDaKVIjpKUqpg-EJuiBXNh5DhuKfnXhvkkgulkyuG9ciIIIbxHUwCQvzr6sQzTWe6taapyXb9bYEbSV6FI/s320/woman_montage.jpg" width="320" /></a></div>
<br />
<br />
<br />
The results are not perfect, but this is not the end; super resolution is a hot research topic, every paper is a stepping stone for the next algorithm, and we will see better, more advanced techniques pop up in the future.<br />
<br />
<div style="text-align: center;">
<u><b>Sharing trained model and codes</b></u></div>
<br />
1 : <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/python_deep_learning/super_res">Notebook to transform the imagenet data to training data</a><br />
2 : <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/python_deep_learning/image_resize">Notebook to train and use the super resolution model</a><br />
3 : <a href="https://mega.nz/#!V9klRDYJ!Qz5eMgnuQh4Km34bystYG6khLSVWO2wF4E_iRLJjdLo">Network model with transformation network and loss network, trained on 80000 images</a><br />
<br />
<b class="markup--strong markup--blockquote-strong"> If you liked this article, please help others find it by clicking the little g+ icon below. Thanks a lot!</b>ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-24377220779337097872017-07-19T08:19:00.003-07:002017-10-13T20:13:08.112-07:00Deep learning 08--Neural style by Keras<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
Today I want to write down how to implement the neural style of the paper <a href="https://arxiv.org/abs/1508.06576">A Neural Algorithm of Artistic Style</a> with Keras, learned from the <a href="http://course.fast.ai/">fast.ai course</a>. You can find the codes at <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/neural_style">github</a>.<br />
<br />
Before I begin to explain how to do it, I want to mention that generating artistic style with a deep neural network is different from image classification; we need to learn new concepts and add them to our toolbox. If you find it hard to understand the first time you see it, do not fear, I had the same feeling too. You can ask me questions or go to the <a href="http://forums.fast.ai/">fast ai forum</a>.<br />
<br />
The paper presents an algorithm to generate an artistic style image by combining two images using a convolutional neural network. Here are examples combining source images (bird, dog, building) with style images like <a href="https://s11.postimg.org/f0a9ugy83/starry.png">starry</a>, <a href="https://s14.postimg.org/dl545skch/alice.jpg">alice</a> and <a href="https://s11.postimg.org/r9s0a380z/tes_teach.jpg">tes_teach</a>. From left to right: style image, source image, image combined by the convolutional neural network.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVHtoy9niNaggDSsM4Oo8f-TqHqmRIV6W-XMD_yjpT5ajj_aAqsd_yDELozx3buExlC1AtUqIhyphenhyphenZrjSQI4ZpyH5PSXrcjLB9016xqgP4xboMSzbYmmqnbBufzZyhcqhRULwJtAikf2g7I/s1600/bird_montage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="873" data-original-width="1449" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVHtoy9niNaggDSsM4Oo8f-TqHqmRIV6W-XMD_yjpT5ajj_aAqsd_yDELozx3buExlC1AtUqIhyphenhyphenZrjSQI4ZpyH5PSXrcjLB9016xqgP4xboMSzbYmmqnbBufzZyhcqhRULwJtAikf2g7I/s320/bird_montage.png" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBNPuuXz5aCUDt9w1fOllTNPQ49Iy7N8HsREpyTSZHjghgcRiBfKJQjSAUrZQFLhadJ4AR3gMH9wfSJEbMf3Ch4EfBAO_5Sa3SKMwd0m1fhWXFJvph61fvzMt2Ino9AA2oBy1QU17sBWA/s1600/building_montage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="630" data-original-width="720" height="280" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBNPuuXz5aCUDt9w1fOllTNPQ49Iy7N8HsREpyTSZHjghgcRiBfKJQjSAUrZQFLhadJ4AR3gMH9wfSJEbMf3Ch4EfBAO_5Sa3SKMwd0m1fhWXFJvph61fvzMt2Ino9AA2oBy1QU17sBWA/s320/building_montage.png" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdCESuOUtHlnXFYOpkagn387K6C1zL9N8G2Vx4QF8wDhUUauSLgXYLWRshuNJnxLb6dhK8c4OpXvAAFwqJ1-8Smwpr4e1X_nr9RqCxlBxKBVlNaEAiA6H_pTGqw1LEndfJ4c1PsPQdrmk/s1600/dog_montage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="483" data-original-width="939" height="164" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdCESuOUtHlnXFYOpkagn387K6C1zL9N8G2Vx4QF8wDhUUauSLgXYLWRshuNJnxLb6dhK8c4OpXvAAFwqJ1-8Smwpr4e1X_nr9RqCxlBxKBVlNaEAiA6H_pTGqw1LEndfJ4c1PsPQdrmk/s320/dog_montage.png" width="320" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
<br />
<br />
Let us begin our journey into the implementation of the algorithm (I assume you know how to install Keras, tensorflow, numpy, cuda and other tools; I recommend using ubuntu 16.04.x as your os, this could save you tons of headaches when setting up your deep learning toolbox).<br />
<br />
<div style="text-align: center;">
<b><u>Step 1 : Import file and modules</u></b></div>
<br />
<pre class="prettyprint">
from PIL import Image
import os
import numpy as np #np is used below but was missing from the imports
import keras.backend as K
import vgg16_avg
from keras.models import Model
from keras.layers import *
from keras import metrics
from scipy.optimize import fmin_l_bfgs_b
from scipy.misc import imsave
</pre>
<br />
<br />
<br />
<div style="text-align: center;">
<b><u>Step 2 : Preprocess our input image</u></b></div>
<br />
<pre class="prettyprint">
#the value of rn_mean comes from the imagenet data set
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
#create image close to zero mean and convert rgb channel to bgr channel
#since the vgg model need bgr channel. ::-1 invert the order of axis 0
preproc = lambda x: (x - rn_mean)[:,:,:,::-1]
#We need to undo the preprocessing before we save it to our hard disk
deproc = lambda x: x[:,:,:,::-1] + rn_mean
</pre>
<br />
<div style="text-align: center;">
<b><u>Step 3 : Read the source image and style image</u></b></div>
<br />
The source image is the image you want to apply the style to. The style image is the style you want to apply to the source image.<br />
<br />
<br />
<pre class="prettyprint">dpath= os.getcwd() + "/"
#I make the size of content image, style image, generated img
#have the same shape, but this is not mandatory
#since we do not use any full connection layer
def read_img(im_name, shp):
style_img = Image.open(im_name)
if len(shp) > 0:
style_img = style_img.resize((shp[2], shp[1]))
style_arr = np.array(style_img)
#The image read by PIL is three dimensions, but the model
#need a four dimensions tensor(first dim is batch size)
style_arr = np.expand_dims(style_arr, 0)
return preproc(style_arr)
content_img_name = "dog"
content_img_arr = read_img(dpath + "img/{}.png".format(content_img_name), [])
content_shp = content_img_arr.shape
style_img_arr = read_img(dpath + "img/starry.png", content_shp)
</pre>
<br />
<div style="text-align: center;">
<b><u>Step 4 : Load vgg16_avg</u></b></div>
<br />
Unlike doing image classification with the pure sequential api of Keras, to build a neural style network we need to use the <a href="https://keras.io/backend/">backend api of Keras</a>.<br />
<br />
<br />
<pre class="prettyprint">content_base = K.variable(content_img_arr)
style_base = K.variable(style_img_arr)
gen_img = K.placeholder(content_shp)
batch = K.concatenate([content_base, style_base, gen_img], 0)
#Feed the batch into the vgg model, every time we call the model/layer to
#generate output, it will generate output of content_base, style_base,
#gen_img. Unlike content_base and style_base, gen_img is a placeholder,
#that means we will need to provide data to this placeholder later on
model = vgg16_avg.VGG16_Avg(input_tensor = batch, include_top=False)
#build a dict of model layers
outputs = {l.name:l.output for l in model.layers}
#I prefer these 1~3 layers hierarchy as my style_layers,
#you can try it out with different range
style_layers = [outputs['block{}_conv1'.format(i)] for i in range(1,4)]
content_layer = outputs['block4_conv2']
</pre>
<br />
If you find K.variable and K.placeholder very confusing, please check the documents of <a href="https://www.tensorflow.org/">TensorFlow</a> and the <a href="https://keras.io/backend/">Keras backend api</a>.<br />
<br />
<div style="text-align: center;">
<b><u>Step 5 : Create the function to find loss and gradient</u></b></div>
<br />
<br />
<pre class="prettyprint">#gram matrix is a matrix collect the correlation of all of the vectors
#in a set. Check wiki(https://en.wikipedia.org/wiki/Gramian_matrix)
#for more details
def gram_matrix(x):
#change height,width,depth to depth, height, width, it could be 2,1,0 too
#maybe 2,0,1 is more efficient due to underlying memory layout
features = K.permute_dimensions(x, (2,0,1))
#batch flatten make features become 2D array
features = K.batch_flatten(features)
return K.dot(features, K.transpose(features)) / x.get_shape().num_elements()
def style_loss(x, targ):
return metrics.mse(gram_matrix(x), gram_matrix(targ))
content_loss = lambda base, gen: metrics.mse(gen, base)
#l[1] is the output(activation) of style_base, l[2] is the
#output of gen_img loss of style image and gen_img. As the
#paper suggest, we add the loss of all convolution layers
loss = sum([style_loss(l[1], l[2]) for l in style_layers])
#content_layer[0] is the output of content_base,
#content_layer[2] is the output of gen_img
#loss of content image and gen_img
loss += content_loss(content_layer[0], content_layer[2]) / 10.
#The loss need two variables but we only pass in one,
#because we only got one placeholder in the graph,
#the other variable already determine by K.variable
grad = K.gradients(loss, gen_img)
#We cannot call loss and grad directly, we need
#to create a function(convert it to symbolic definition)
#before we can feed it into the solver
fn = K.function([gen_img], [loss] + grad)
</pre>
<br />
You can adjust the weights of the style loss and content loss by yourself until you think the image looks good enough. <a href="http://forums.fast.ai/t/lesson-8-discussion/1522/122?u=tham">The function at the end only tells you that the concatenated list of loss and grads is the output that you want to - eventually - minimize. So, when you feed it to the solver bfgs, it will try to minimize the loss and will stop when the gradients are also zero (a minimum, hopefully not just a local one).</a><br />
<br />
<div style="text-align: center;">
<b><u>Step 6 : Create a helper class to separate loss and gradient</u></b></div>
<br />
<br />
<pre class="prettyprint">#fn will return loss and grad, but fmin_l_bfgs need to seperate them
#that is why we need a class to separate loss and gradient and store them
class Evaluator:
def __init__(self, fn_, shp_):
self.fn = fn_
self.shp = shp_
def loss(self, x):
loss_, grads_ = self.fn([x.reshape(self.shp)])
self.grads = grads_.flatten().astype(np.float64)
return loss_.astype(np.float64)
def grad(self, x):
return np.copy(self.grads)
evaluator = Evaluator(fn, content_shp)
</pre>
<br />
<div style="text-align: center;">
<b><u>Step 7 : Generate a random noise image(white noise image mentioned by the paper)</u></b></div>
<br />
<br />
<pre class="prettyprint">#This is the real value of the placeholder--gen_img
rand_img = lambda shape: np.random.uniform(-2.5, 2.5, shape)/100
</pre>
<br />
<br />
<div style="text-align: center;">
<b><u>Step 8 : Minimize the loss of rand_img with the source image and style image</u></b></div>
<br />
<br />
<pre class="prettyprint">def solve_img(evalu, niter, x):
for i in range(0, niter):
x, min_val, info = fmin_l_bfgs_b(evalu.loss, x.flatten(),
fprime=evalu.grad, maxfun = 20)
#value of PIL lie within -127 and 127
x = np.clip(x, -127, 127)
print(i, ',Current loss value:', min_val)
x = x.reshape(content_shp)
simg = deproc(x.copy())
img_name = '{}_{}_neural_style_img_{}.png'.
format(dpath + "gen_img/", content_img_name, i)
imsave(img_name, simg[0])
return x
solve_img(evaluator, 10, rand_img(content_shp)/10.)
</pre>
<br />
You may ask, why use fmin_l_bfgs_b and not stochastic gradient descent? The answer is we could, but we have a better choice. Unlike image classification, we do not have a lot of batches to run; right now we only need to figure out the loss and gradient between three inputs (source image, style image and the random image), so fmin_l_bfgs_b is more than enough.<br />
<br />
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-66000783216401982022017-05-30T19:56:00.003-07:002017-05-31T00:07:22.162-07:00Create a better images downloader(Google, Bing and Yahoo) by Qt5 I mentioned how to create a simple Bing image downloader in <a href="http://qtandopencv.blogspot.my/2017/05/scrape-bing-images-by-qwebengine.html">Download Bing images by Qt5</a>; in this post I will explain how I tackled the challenges I encountered when trying to build a better image downloader app with Qt5. The skills I used are applied in <a href="https://github.com/stereomatchingkiss/QImageScraper/blob/master/README.md">QImageScraper version_1.0</a>; if you want to know the details, please dive into the codes, they are too complicated to write down in this blog.<br />
<br />
<div style="text-align: center;">
<b><u>1 : Show all of the images searched by Bing</u></b></div>
<br />
To show all of the images found by Bing, we need to make sure the page is scrolled to the bottom; unfortunately there is no way to check this with 100% accuracy unless we scroll the page manually, because the <span style="color: blue;">height of the scroll bar keeps changing</span> while you scroll it, which makes it almost impossible for the program to determine when it should stop scrolling the page.<br />
<br />
<div style="text-align: center;">
<b><u>Solution</u></b></div>
<br />
I gave several solutions a try but none of them were optimal, so I had no choice but to seek a compromise. Rather than scrolling the page fully automatically, I adopted a semi-automatic solution as Pic.1 shows.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgQSB0RayKyDFS7xzlzlsV8oYByb877Xs07QQG0WuIw8TVullMdEY3mFYnwbHDQlJ9x-IpYV22Ajh5uMXr-gRifEg9tOY6KNU3wp6mUKcYPMNgNe3KRept4nlqqqvie8t3SR6pslEA290/s1600/scroll_page_flow.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="491" data-original-width="983" height="199" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgQSB0RayKyDFS7xzlzlsV8oYByb877Xs07QQG0WuIw8TVullMdEY3mFYnwbHDQlJ9x-IpYV22Ajh5uMXr-gRifEg9tOY6KNU3wp6mUKcYPMNgNe3KRept4nlqqqvie8t3SR6pslEA290/s400/scroll_page_flow.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic.1</td></tr>
</tbody></table>
<div style="text-align: center;">
<b><u><br />2 : Not all of the images are downloadable</u></b></div>
<br />
There are several reasons that may cause this issue.<br />
<br />
<br />
<ol>
<li>The search engine (Bing, Yahoo, Google etc) fails to find the direct link of the image.</li>
<li>The server "thinks" you are not a real human (a robot?)</li>
<li>Network error</li>
<li>No error happens, but the reply of the server takes too long</li>
</ol>
<div style="text-align: center;">
<b><u> Solution</u></b> </div>
<div>
<br /></div>
<div>
Although I cannot find a perfect solution for problem 2, there are some tricks to alleviate it; let the flow charts (Pic.2, Pic.3) clear the miasma.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_WDnBFxqmih-g9PJlqbITLiHHG6GU3UsIUH5UL-mZiP8kb34NaGPx6z7UkD07z33AjmFmSiiyC2FYrF07yK6mYbCoqvP0CQY5dEhS08h9sb4chqWrDaSFI06SA8-73_fyjsyh40hBulM/s1600/scroll_page_flow.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="544" data-original-width="956" height="227" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_WDnBFxqmih-g9PJlqbITLiHHG6GU3UsIUH5UL-mZiP8kb34NaGPx6z7UkD07z33AjmFmSiiyC2FYrF07yK6mYbCoqvP0CQY5dEhS08h9sb4chqWrDaSFI06SA8-73_fyjsyh40hBulM/s400/scroll_page_flow.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic.2</td></tr>
</tbody></table>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9gy0a2ddpadE1eS-07dGDQsE92xSXMGgaqtMrpfWzJgwbUeIG_dU5XsA0NIfEQYqiawJrFlOq0D4eeD0Ez4kXek1hTccUWX-f-vguRCatOjaxx2Aj2teYM8tLnh9rtDFd1kJ_97BM2LE/s1600/scroll_page_flow.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="641" data-original-width="713" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9gy0a2ddpadE1eS-07dGDQsE92xSXMGgaqtMrpfWzJgwbUeIG_dU5XsA0NIfEQYqiawJrFlOq0D4eeD0Ez4kXek1hTccUWX-f-vguRCatOjaxx2Aj2teYM8tLnh9rtDFd1kJ_97BM2LE/s400/scroll_page_flow.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic.3</td></tr>
</tbody></table>
<br />
<br />
<div>
Simply put, if an error happens, I will try to download the thumbnail; if even the thumbnail cannot be downloaded, I will try the next image. After all, this solution is not too bad; let us see the results of downloading 893 smoke images found by Google.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSEgES7tuE9bKeGWUQ3qT106H79jI8Mxzc-sPVIbBd3oRBwDnSzCzdq8y-d1vtUFv6mBW8qF9Ge9zwxGU4LUB5xIuyZUwyqQvbXXPpxSTg1fUjdWBfjhGDYpPsjlksNtifiCI5kxeA_Xc/s1600/download_statistic.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="647" data-original-width="1041" height="247" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSEgES7tuE9bKeGWUQ3qT106H79jI8Mxzc-sPVIbBd3oRBwDnSzCzdq8y-d1vtUFv6mBW8qF9Ge9zwxGU4LUB5xIuyZUwyqQvbXXPpxSTg1fUjdWBfjhGDYpPsjlksNtifiCI5kxeA_Xc/s400/download_statistic.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic.4</td></tr>
</tbody></table>
<br /></div>
<div>
<br /></div>
<div>
All of the images could be downloaded: 817 of them are big images and 76 of them are small images, not a perfect result but not bad either. Some things I did not mention in Pic.2 and Pic.3 are</div>
<div>
<br /></div>
<div>
<ol>
<li>I always switch user agents</li>
<li>I start the next download after a random period (0.5 second ~ 1.5 second)</li>
</ol>
<div>
The purpose of these "weird" operations is to <span style="color: blue;">emulate the behavior of humans</span>; this lowers the risk of being treated as a "robot" by the servers (a minimal sketch of both tricks is shown below). I cannot find free, trustworthy proxies yet, else I would also like to connect to different proxies randomly from time to time; please tell me where to find those proxies if you know, thanks.</div>
</div>
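<div>
A minimal sketch of the two tricks above (the helper names and user agent strings are mine, not from QImageScraper):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <QNetworkRequest>
#include <QStringList>
#include <QTimer>

#include <functional>
#include <random>

std::mt19937& rand_gen()
{
    static std::mt19937 gen(std::random_device{}());
    return gen;
}

//pick a random user agent for every request
void set_random_user_agent(QNetworkRequest &request)
{
    static QStringList const agents{
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Mozilla/5.0 (X11; Linux x86_64)",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12)"
    };
    std::uniform_int_distribution<int> dist(0, agents.size() - 1);
    request.setHeader(QNetworkRequest::UserAgentHeader, agents[dist(rand_gen())]);
}

//start the next download after 0.5~1.5 seconds
void schedule_next_download(std::function<void()> const &start_next)
{
    std::uniform_int_distribution<int> dist(500, 1500); //milliseconds
    QTimer::singleShot(dist(rand_gen()), start_next);
}
</pre>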
<div>
<br /></div>
<div>
<br /></div>
<div style="text-align: center;">
<b><u>3 : Type of the images are mismatch or did not specify in file extension</u></b></div>
<div>
<br /></div>
<div>
Not all of the images have the correct type (jpg, png, gif etc). I am very lucky that Qt5 provides QImageReader; this class can determine the type of an image from its contents rather than its extension. With it we can change the suffix of the file to the real format and remove the files which are not images, as the sketch below shows.<br />
<br /></div>
<br /></div>
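<div>
A minimal sketch of the idea (the function name is mine; the reader is scoped so it releases the file before rename/remove, see issue 4 below):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <QFile>
#include <QImageReader>
#include <QString>

void fix_suffix(QString const &file_name)
{
    QByteArray format;
    {
        QImageReader reader(file_name);
        format = reader.format(); //deduced from the contents, not the suffix
    }
    if(format.isEmpty()){
        QFile::remove(file_name); //the file is not an image at all
    }else{
        int const dot = file_name.lastIndexOf('.');
        QString const base = dot > 0 ? file_name.left(dot) : file_name;
        QFile::rename(file_name, base + "." + QString(format));
    }
}
</pre>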
<div>
<div style="text-align: center;">
<b><u>4 : QFile fail to rename/remove file</u></b></div>
<br />
QFile::rename and QFile::remove ran into some trouble on windows (they work well on mac); this bugged me for a while, and it cost me one day to find out that QImageReader was blocking the file.<br />
<br /></div>
<div>
<div style="text-align: center;">
<b><u>5 : Invalid file name</u></b> </div>
<br />
Not all of the file names are valid; it is extremely hard to find a perfect way to determine whether a file name is valid or not, so I only do some minimal processing for this issue--remove illegal characters and trim white spaces.<br />
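<br />
A minimal sketch of that processing (the function name is mine; the character set below is the one windows forbids in file names):<br />
<br />
<pre class="prettyprint">#include <QRegularExpression>
#include <QString>

QString sanitize_file_name(QString name)
{
    //drop the characters windows does not allow in file names
    name.remove(QRegularExpression("[<>:\"/\\\\|?*]"));
    return name.trimmed();
}
</pre>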
<br />
<div style="text-align: center;">
<b><u>6 : Deploy app on major platforms</u></b></div>
<br />
One of the strong selling points of Qt is its cross-platform ability; to tell you the truth, I can build the app and run it on windows, mac and linux without changing a single line of code, it works out of the box. The problem is, deploying the app on linux is not fun at all, it is a very complicated task; I will try to deploy this image downloader after linuxdeployqt becomes mature.<br />
<br />
<h3>
<b><u>Summary</u></b></h3>
In this blog post, I reviewed some problems I met when using Qt5 to develop an image downloader; this is by no means exhaustive, it only scratches the surface. If you want to know the details, every nitty-gritty, you had better dive into the source codes.<br />
<br />
<a href="http://qtandopencv.blogspot.my/2017/05/scrape-bing-images-by-qwebengine.html">Download Bing images by Qt5</a><br />
<a href="https://github.com/stereomatchingkiss/QImageScraper">Source codes of QImageScraper</a><br />
<br /></div>
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-34050472015483199602017-05-14T09:23:00.002-07:002017-07-30T18:18:36.939-07:00Download Bing images by Qt5<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioM0irUlGynIPVZT3C4AkEZ5NHD11HoTmxjpvTer4XcBnD_0Tqbt_dAgV5bUf-QctnW0z_XDTX7g4jZcaLx3I4G0ZvXRU9NnzEm17MLryFo6JKDz-rp7GYvtPZykBiT67mZBdR1syMzoc/s1600/more-data-question.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioM0irUlGynIPVZT3C4AkEZ5NHD11HoTmxjpvTer4XcBnD_0Tqbt_dAgV5bUf-QctnW0z_XDTX7g4jZcaLx3I4G0ZvXRU9NnzEm17MLryFo6JKDz-rp7GYvtPZykBiT67mZBdR1syMzoc/s320/more-data-question.png" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
Have you ever needed more data for your image classifier? I have, but downloading images found by Google, Bing or Flickr one by one is very time consuming, so why not write a small, simple image scraper to help us? Sounds like a good idea; as usual, before I start the task, I list out the requirements of this small app.<br />
<div>
<br /></div>
<div>
a : Cross platform, able to work under ubuntu and windows with one code base (no plan for mobiles since this is a tool designed for machine learning)</div>
<div>
b : Support regular expression, because I need to parse the html</div>
<div>
c : Support high level api of networking</div>
<div>
d : Have a decent webEngine; it is very hard (impossible?) to scrape the images from those search engines without it</div>
<div>
e : Support unicode</div>
<div>
f : Easy to create a ui, because I want instant feedback from the website, this could speed up development time<br />
g : Easy to build; solving the dependency problems of different 3rd party libraries is not fun at all<br />
<br /></div>
<div>
</div>
<div>
<div>
<div>
After searching through my toolbox I found that Qt5 is almost ideal for my task. In this post I will use Bing as an example (Google and Yahoo images share the same tricks; the processes of scraping these big 3 image search engines are very similar). If you ever try to study the source codes of the search results of Bing, you will find they are very complicated and difficult to read (maybe MS spends lots of time preventing users from scraping images). Are you afraid? Rest assured, the steps of scraping images from Bing are a little bit complicated but not impossible, as long as you have nice tools to aid you :).</div>
</div>
</div>
<div>
<br /></div>
<div style="text-align: center;">
<b><u>Step 1 : You need a decent, modern browser like firefox or chrome</u></b></div>
<div style="text-align: center;">
<b><u><br /></u></b></div>
<div style="text-align: left;">
Why do we need a decent browser? Because they have a powerful feature--<span style="color: blue;">Inspect Element</span>; this function can help you find the contents (links, buttons etc) of the website. </div>
<div style="text-align: left;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1bxD7gBNQjhejsGSzg30-DlEOY0rRmqBRlL55JjjSi1m4TsjMQIJUDIvHFmzpa-O3rD9xjsmxNUO8hxJMO4VDOfcaXUl_TlvYcniCGMR3Kzfo6DaqPGmEayvy0BlMFV36xzytuqVOxB0/s1600/imsc01.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1bxD7gBNQjhejsGSzg30-DlEOY0rRmqBRlL55JjjSi1m4TsjMQIJUDIvHFmzpa-O3rD9xjsmxNUO8hxJMO4VDOfcaXUl_TlvYcniCGMR3Kzfo6DaqPGmEayvy0BlMFV36xzytuqVOxB0/s400/imsc01.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic1</td></tr>
</tbody></table>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: center;">
<b><u>Step 2 : Click Inspect Element on interesting content</u></b></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Move your mouse to the contents you want to observe and click <span style="color: blue;">Inspect Element</span>.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDZiXybOehBv5VKM47I5CAnV2Ml5fjX9zHCuzLH8Ephdl_84izU9i4y-fAj4AhJZyUeODaJ3UlzYWwug3kcHFlleGkWY5R319NurGciwMagK0XwnxQ4nyVmxliH3aVsGnq5HC6abwThq8/s1600/imsc02.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDZiXybOehBv5VKM47I5CAnV2Ml5fjX9zHCuzLH8Ephdl_84izU9i4y-fAj4AhJZyUeODaJ3UlzYWwug3kcHFlleGkWY5R319NurGciwMagK0XwnxQ4nyVmxliH3aVsGnq5HC6abwThq8/s320/imsc02.JPG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic2</td></tr>
</tbody></table>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
After that the browser should show you the codes of the interesting content.</div>
<div style="text-align: left;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCwMT-SCk1uvK-jo0kmZBhl6qWV47PGPbL1pyY8rxFL6AUqkgTBWyEFkYmdTjsPKSA25e6DkO3ws2qQV3ApoinI6NvWM0IfBAkvI7PsBaFZaacjR1RT7UoUSNXoNbHtmo8Cdfvdh_R4mw/s1600/imsc03.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="117" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCwMT-SCk1uvK-jo0kmZBhl6qWV47PGPbL1pyY8rxFL6AUqkgTBWyEFkYmdTjsPKSA25e6DkO3ws2qQV3ApoinI6NvWM0IfBAkvI7PsBaFZaacjR1RT7UoUSNXoNbHtmo8Cdfvdh_R4mw/s400/imsc03.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic3</td></tr>
</tbody></table>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
The code pointed to by the browser may not be what you want; if this is the case, look around the code pointed to by the browser, as Pic3 shows.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: center;">
<b><u>Step 3 : Create a simple prototype by Qt5</u></b></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
We already know how to inspect the source codes of the web page, so let us create a simple ui to help us. This ui does not need to be professional or beautiful, after all it is just a prototype. The functions we need to create a Bing image scraper are</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
a : Scroll pages</div>
<div style="text-align: left;">
b : Click see more images</div>
<div style="text-align: left;">
c : Parse and get the links of images</div>
<div style="text-align: left;">
d : Download images</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
With the help of <a href="https://www.qt.io/ide/">Qt Designer</a>, I am able to "draw" the ui(Pic4) within 5 minutes(ignore parse_icon_link and next_page buttons for this tutorial).</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjZvF16SHjn5fkWLCB0gG_CEkRqaH4Efy7gfgOERBmatjGmd8S1jf7LZoubGl-mXuF1_Q2MRdkYzx5DCGadBGdCnqET1lI7RwxDqpkvYDFT5yNeSwTsBnGMdly94fPBnQ-o_TykI8i3oY/s1600/imsc04.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="211" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjZvF16SHjn5fkWLCB0gG_CEkRqaH4Efy7gfgOERBmatjGmd8S1jf7LZoubGl-mXuF1_Q2MRdkYzx5DCGadBGdCnqET1lI7RwxDqpkvYDFT5yNeSwTsBnGMdly94fPBnQ-o_TykI8i3oY/s400/imsc04.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic4<br />
<br />
<div style="text-align: left;">
<span style="font-size: small;">Since Qt Designer do not QWebEngineView yet, I add it manually by codes</span></div>
<div style="text-align: left;">
<br /></div>
</td></tr>
</tbody></table>
</div>
<pre class="prettyprint"> ui->gridLayout->addWidget(web_view_, 4, 0, 1, 2);
</pre>
<br />
Pic5 is what it looks like when running.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhh6O3C7rLUUhu_d8tweXY8Kvb-CVSH5Cs_a4c9NHJ-8K-rQSKwhR2z9TKbiOKw3bIyzI2VC-jKQVC0vCLdJrVKPl3pLYjziaZAd6bI-xSnXKVpD9iw3V-vAWFqGm81C2B7tCzz0PddKhI/s1600/imsc06.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhh6O3C7rLUUhu_d8tweXY8Kvb-CVSH5Cs_a4c9NHJ-8K-rQSKwhR2z9TKbiOKw3bIyzI2VC-jKQVC0vCLdJrVKPl3pLYjziaZAd6bI-xSnXKVpD9iw3V-vAWFqGm81C2B7tCzz0PddKhI/s400/imsc06.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic5</td></tr>
</tbody></table>
<br />
<br />
<div style="text-align: center;">
<b><u>Step 4 : Implement scroll function by js</u></b></div>
<br />
<pre class="prettyprint"> //get scroll position of the scroll bar and make it deeper
auto const ypos = web_page_->scrollPosition().ry() + 10000;
//scroll to deeper y position
web_page_->runJavaScript(QString("window.scrollTo(0, %1)").arg(ypos));
</pre>
<br />
<div style="text-align: center;">
<b><u>Step 5 : Implement parse image link function </u></b></div>
<br />
<br />
Before we can get the full links of the images, we need to scrape the image page links from the result page first.<br />
<br />
<pre class="prettyprint"> web_page_->gttoHtml([this](QString const &contents)
{
QRegularExpression reg("(search\\?view=detailV2[^\"]*)");
auto iter = reg.globalMatch(contents);
img_page_links_.clear();
while(iter.hasNext()){
QRegularExpressionMatch match = iter.next();
if(match.captured(1).right(20) != "ipm=vs#enterinsights"){
QString url = QUrl("https://www.bing.com/images/" +
match.captured(1)).toString();
url.replace("&amp", "&");
img_page_links_.push_back(url);
}
}
});
</pre>
<br />
<div style="text-align: center;">
<b><u>Step 6 : Simulate "See more image"</u></b></div>
<br />
This part is a little bit tricky. I tried to find the words "See more images" but found nothing; the reason is that the source code returned by <span style="color: blue;">View Page Source</span> (Pic 6) does not update.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyWhtQ8WUaKK0fRARa3F8VLuorh2vSb35oGES1oDhrD6z9LfCJShEg4GvY1KwihiJy2jcl4OPSfGDvoCmEJHFsyXKkHf3FlJLVlYZOR_zJgnihcCulmqlIUUAwm4mIcUqWsyFaDXe4zsc/s1600/imsc05.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyWhtQ8WUaKK0fRARa3F8VLuorh2vSb35oGES1oDhrD6z9LfCJShEg4GvY1KwihiJy2jcl4OPSfGDvoCmEJHFsyXKkHf3FlJLVlYZOR_zJgnihcCulmqlIUUAwm4mIcUqWsyFaDXe4zsc/s400/imsc05.JPG" width="337" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pic 6</td></tr>
</tbody></table>
<br />
The solution is easy: use <span style="color: blue;">Inspect Element</span> instead of <span style="color: blue;">View Page Source</span> (sometimes it is easier to find the contents you want with <span style="color: blue;">View Page Source</span>; both of them are valuable for web scraping).<br />
<br />
<br />
<br />
<pre class="prettyprint">
web_page_->runJavaScript("document.getElementsByClassName"
"(\"btn_seemore\")[0].click()");
</pre>
<br />
<br />
<div style="text-align: center;">
<b><u>Step 7 : Download images</u></b></div>
<br />
Our ultimate goal is to download the images we want, so let us finish the last part of this prototype.<br />
<br />
First, we need to get the html text of the image page (the page with the link to the image source).<br />
<br />
<br />
<pre class="prettyprint"> if(!img_page_links_.isEmpty()){
web_page_->load(img_page_links_[0]);
}
</pre>
<br />
Second, download the image<br />
<pre class="prettyprint">
void experiment_bing::web_page_load_finished(bool ok)
{
    if(!ok){
        qDebug()<<"cannot load webpage";
        return;
    }
    web_page_->toHtml([this](QString const &contents)
    {
        QRegularExpression reg("src2=\"([^\"]*)");
        auto match = reg.match(contents);
        if(match.hasMatch()){
            QNetworkRequest request(match.captured(1));
            QString const header = "msnbot-media/1.1 (+http://search."
                                   "msn.com/msnbot.htm)";
            //without this header, some images cannot be downloaded
            request.setHeader(QNetworkRequest::UserAgentHeader, header);
            downloader_->append(request, ui->lineEditSaveAt->text());
        }else{
            qDebug()<<"cannot capture img link";
        }
        if(!img_page_links_.isEmpty()){
            //this image should not be downloaded again
            img_page_links_.pop_front();
        }
    });
}
</pre>
<br />
Third, download the next image<br />
<br />
<br />
<pre class="prettyprint">
void experiment_bing::download_finished(size_t unique_id, QByteArray)
{
    if(!img_page_links_.isEmpty()){
        web_page_->load(img_page_links_[0]);
    }
}
</pre>
<br />
<div style="text-align: center;">
<b><u> Summary</u></b></div>
<br />
These are the key points of scraping images from Bing search with QtWebEngine. The downloader I use in this post comes from <a href="https://github.com/stereomatchingkiss/qt_enhance/blob/master/network/download_supervisor.hpp">qt_enhance</a>; the whole prototype is placed at <a href="https://mega.nz/#!h9lWRKDS!N-CGsGqBqky3kZjBqz5jrB2ncEIt2RIOH1_PKilwe9s">mega</a>. If you want to know more, visit the following link<br />
<br />
<a href="http://qtandopencv.blogspot.my/2017/05/create-better-images-downloadergoogle.html">Create a better images downloader(Google, Bing and Yahoo) by Qt5</a><br />
<br />
<div style="text-align: center;">
<b><u>Warning</u></b></div>
<br />
Because of <a href="https://bugreports.qt.io/browse/QTBUG-60669">Qt-bug 60669</a>, this prototype does not work under windows 10; unfortunately this bug is rated as P2, which means we may need to wait a while before it is fixed by the Qt community.<br />
<br />
<div style="text-align: center;">
<b><u>Edit</u></b></div>
<br />
Qt5.9 Beta4 fixed Qt-bug 60669, it works on my laptop(win 10 64bits) and desktop(Ubuntu 16.04.1).<br />
<br />
There exists a better solution to scrape image links from Bing; I will mention it in the next post.ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-23378843263042386902017-03-12T00:40:00.001-08:002017-10-13T14:15:15.569-07:00Deep learning 07-Challenge dog vs cat fun competition of kaggle by dlib and mxnet Today I want to record the experiences I gained from the <a href="https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/leaderboard">dog vs cat fun competition of kaggle</a>, which I spent about two weeks on (this is the first competition I have taken part in). My best rank in this competition is 67, which is close to the <span style="color: red;"><b>top 5%</b></span> (there are 1314 teams).<br />
<br />
The first tool I gave a try is dlib; although this library lacks a lot of features compared with other deep learning toolboxes, I still like it very much, especially the fact that dlib can work as a <span style="color: red;">zero dependency library</span>.<br />
<br />
<div style="text-align: center;">
<b><u>What I have learned</u></b></div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: left;">
1 : Remember to record the parameters you used</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
At first I wrote the records down in my header file; this made my codes harder to read as time went on, and I should have saved those records in an excel-like format from the beginning. The other thing I learned is that I should record the parameters even when I am running out of time; I found that without the records I became more panicked as the deadline drew closer.</div>
<div style="text-align: left;">
<br />
2 : Feeding pseudo labels into the mini-batch in a naive way does not work</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I should have split the data of the mini-batch with some ratio, like 2/3 true labels, 1/3 pseudo labels; a minimal sketch is shown below.</div>
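<div>
A minimal sketch of the ratio idea (generic code, not from my competition scripts; in real training both pools should be shuffled every epoch):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <algorithm>
#include <iterator>
#include <vector>

template<typename Sample>
std::vector<Sample> make_mini_batch(std::vector<Sample> const &true_pool,
                                    std::vector<Sample> const &pseudo_pool,
                                    size_t batch_size)
{
    std::vector<Sample> batch;
    size_t const true_size = batch_size * 2 / 3;
    //2/3 of the batch comes from true labels, the rest from pseudo labels
    std::copy_n(std::begin(true_pool), true_size, std::back_inserter(batch));
    std::copy_n(std::begin(pseudo_pool), batch_size - true_size,
                std::back_inserter(batch));
    return batch;
}
</pre>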
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
3 : Leveraging a pretrained model makes it much easier to get good results</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I did not use a pre-trained model but trained the network from scratch; this did not give me great results, especially since I could not afford to train on images of bigger size because my gpu only has 2GB of ram. My score was 0.27468 with the brand new model. To speed things up, I treated resnet34 of dlib as a feature extractor, saved the features extracted by resnet34 and trained a new network on them; this pushed my score to 0.09627, a big improvement.</div>
<div style="text-align: left;">
<br /></div>
4 : Ensembling and k-fold cross validation<br />
<br />
To improve my score, I split the data set into 5 folds and ensembled the results by averaging them (a minimal sketch of the averaging is shown below); this pushed my score to 0.06266. I did not apply stacking because I learned this technique after the competition finished. Maybe I could have gotten better results if I had known it earlier.<br />
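<div>
A minimal sketch of the averaging step (generic code; every inner vector holds the predictions of one fold for all test images):</div>
<div>
<br /></div>
<pre class="prettyprint">#include <vector>

std::vector<double> average_folds(std::vector<std::vector<double>> const &folds)
{
    std::vector<double> avg(folds.front().size(), 0.0);
    for(auto const &fold : folds){
        for(size_t i = 0; i != fold.size(); ++i){
            avg[i] += fold[i];
        }
    }
    for(auto &val : avg){
        val /= folds.size();
    }
    return avg;
}
</pre>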
<div>
<br /></div>
<div style="text-align: left;">
5 : How to use dlib, keras and mxnet</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I put the codes of dlib and mxnet on <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/cnn_playground/dog_vs_cat">github</a>; I removed all of the non-working solutions, that is why you do not see any codes related to keras. Keras did not help me improve my score but mxnet did; what I did with mxnet was finetune all of the resnet pretrained models and ensemble them with the results trained by dlib. This improved my score to 0.05051.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
6 : Read the posts in the forums, they may give you useful info</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I learned that the data set has some "errors" in it, so I removed those false images from the training data set.</div>
<div style="text-align: left;">
<br />
7 : The fast.ai course is awesome, I should have viewed it earlier</div>
<br />
If I had watched the videos before I took part in this competition, I believe I could have performed better. The <a href="http://forums.fast.ai/">forum of this course</a> is very helpful too; it is completely free and open.<br />
<br />
8 : X-Crop validation may help you improve your score<br />
<br />
I found this technique on PyImageSearch and in dlib, but I did not have enough time to try it out.<br />
<br />
9 : Save settings in JSON format<br />
<br />
Rather than hard coding the parameters, saving them in a JSON file is better (a minimal sketch is shown after the list), because<br />
<br />
a : You do not need to change the source codes frequently, which saves compile time<br />
b : Every model and experiment can have its own settings record, making training results easier to reproduce<br />
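<br />
A minimal sketch with Qt (the parameter names are made up examples, not my real records):<br />
<br />
<pre class="prettyprint">#include <QFile>
#include <QJsonDocument>
#include <QJsonObject>
#include <QString>

void save_settings(QString const &file_name)
{
    QJsonObject settings;
    settings["model"] = "resnet34";
    settings["learning_rate"] = 0.01;
    settings["batch_size"] = 64;
    QFile file(file_name);
    if(file.open(QIODevice::WriteOnly)){
        file.write(QJsonDocument(settings).toJson());
    }
}
</pre>
<br />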
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0tag:blogger.com,1999:blog-4702230343097536610.post-73066305307051133062016-06-30T17:04:00.003-07:002016-06-30T20:21:49.075-07:00Speed up image hashing of opencv(img_hash) and introduce color moment hash In this post, I would like to show you two things.<br />
<br />
1 : How I <span style="color: magenta; font-size: large;"><b>accelerated the speed</b></span> of the <a href="https://github.com/Itseez/opencv_contrib/pull/688"><span style="color: blue;">img_hash module(click me)</span></a> by roughly <span style="color: magenta; font-size: large;"><b>1.5x~500x</b></span> compared with my <a href="http://qtandopencv.blogspot.my/2016/06/introduction-to-image-hash-module-of.html"><span style="color: blue;">last post(click me)</span></a>.<br />
<br />
2 : A new image hash algorithm which <span style="color: magenta;"><b>works quite well</b></span> under rotation attack.<br />
<h2 style="text-align: center;">
<span style="color: blue;"><b><u>Accelerate the speed of img_hash</u></b></span></h2>
We need only <span style="color: magenta;">one line</span> to gain this huge performance boost, no more, no less.<br />
<br />
<br />
<pre class="prettyprint">cv::ocl::setUseOpenCL(false);
</pre>
<br />
What I do is <span style="color: magenta;">turn off the openCL optimization</span> (I will not discuss why this speeds things up dramatically on my laptop; if you are interested in it, I will open another topic to discuss this phenomenon). Let us measure the performance after the change. The codes are located <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/image_hash/hash_samples.cpp"><span style="color: blue;">at here(click me)</span></a>.<br />
<br />
The following comparison does not list the results of PHash for the Average hash, PHash and Color hash algorithms, because I cannot find these algorithms in the PHash library.<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhk_6H14YNR2phIuLO-RyeRXcvVHg4S9qz5MG9RJm2VNDVT9Y5XZVbTpY5MkYCl-ZqHcYLxlyOomdBXd5i89973AVKptIZjQAGtdMOrMJLo_iO292HiqmkABNNdIREAtzaP3Mv9NvS8EX0/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="75" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhk_6H14YNR2phIuLO-RyeRXcvVHg4S9qz5MG9RJm2VNDVT9Y5XZVbTpY5MkYCl-ZqHcYLxlyOomdBXd5i89973AVKptIZjQAGtdMOrMJLo_iO292HiqmkABNNdIREAtzaP3Mv9NvS8EX0/s640/Capture.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Computation time</td></tr>
</tbody></table>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhv3tJWb3Nir75GEAVagNaEX10G1tZaE74gZDLOm6bKQPFuBrE6R3imchcVen5tRK-qW78KyWVP8rd_G6xcvyzs_jEJt2tFZ5nh_VI96hYHHLc_iOITjX5cmobM1cFSsgt1Cbn2rFQAIeg/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="76" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhv3tJWb3Nir75GEAVagNaEX10G1tZaE74gZDLOm6bKQPFuBrE6R3imchcVen5tRK-qW78KyWVP8rd_G6xcvyzs_jEJt2tFZ5nh_VI96hYHHLc_iOITjX5cmobM1cFSsgt1Cbn2rFQAIeg/s640/Capture.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Comparison time</td></tr>
</tbody></table>
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTP6byyDQPGfL_s1b1rVqesNRazopbsqQ-VW0987iah5eC2I5EAKMpIFK2xvXv29fWy44J15Ms5AfjeGiQaNer1wuM4BCchZa2ktRYusjwQ-E2J84RaEXBOz0cv0JqCu_hWeyKAxovdWo/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="108" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTP6byyDQPGfL_s1b1rVqesNRazopbsqQ-VW0987iah5eC2I5EAKMpIFK2xvXv29fWy44J15Ms5AfjeGiQaNer1wuM4BCchZa2ktRYusjwQ-E2J84RaEXBOz0cv0JqCu_hWeyKAxovdWo/s640/Capture.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Computation of img_hash with and without opencl</td></tr>
</tbody></table>
<br />
As the results show, the computation time of img_hash outperforms PHash after I switch off opencl support on my laptop (y410p); on your computer, switching it on <span style="color: magenta;">may</span> help you gain better performance. Either way, the comparison performance does not change much with or without opencl support.<br />
<br />
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><b><u>Benchmark of Color Moment Hash</u></b></span></h2>
<div>
<span style="color: blue;"><b><u><br /></u></b></span></div>
<div>
In this section, I would like to introduce an image hash algorithm which works quite well under rotation attack, and provide much better test results than my <a href="http://qtandopencv.blogspot.my/2016/06/introduction-to-image-hash-module-of.html"><span style="color: blue;">last post(click me)</span></a>. This algorithm is introduced by <a href="http://www.naturalspublishing.com/files/published/54515x71g3omq1.pdf"><span style="color: blue;">this paper(click me)</span></a>; the class ColorMomentHash of the img_hash module implements it.</div>
<div>
<br /></div>
<div>
My last post only used one image--lena.png--to do the experiments under different attacks; in this post I will use the data set from phash to do the test (use the<span style="color: blue;"> <a href="http://www.phash.org/download/"><span style="color: blue;">miscellaneous data set(click me)</span></a></span> as original images and apply different attacks on them). These 3D bar charts are generated by Qt data visualization; I have not uploaded the chart codes to github yet because they are quite messy. If you need the source codes, please send a request to my <span style="color: blue;"><a href="mailto:thamngapwei@gmail.com">email</a>(thamngapwei@gmail.com)</span> and I will send you a copy, but do not expect me to refine the codes any time soon.<br />
<br />
The names of the images are quite long and do not look good on the charts, so I rename them to a shorter form(001~023). You can download the mapping between the new names and the old names <a href="https://mega.nz/#!Z1kSkByC!SxrrpqGt410X2ea33OSJm1X4qPPKT-dZIN-EcmCivuI"><span style="color: blue;">from mega(click me)</span></a>.<br />
<br />
The threshold of the color moment hash tests is 8: if the L2-norm between two hashes is greater than 8, we treat it as a fail and draw it with a red bar.</div>
<div>
<br /></div>
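<div>
To make the pass/fail criterion concrete, here is a minimal sketch of how one test case could be judged; the file names are placeholders of mine, and the threshold follows the value stated above:</div>
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;
#include &lt;opencv2/highgui.hpp&gt;
#include &lt;opencv2/img_hash.hpp&gt;

#include &lt;iostream&gt;

int main()
{
    //the file names are placeholders, replace them with an original
    //image of the data set and its attacked version
    cv::Mat const input = cv::imread("001.jpg");
    cv::Mat const attacked = cv::imread("001_attacked.jpg");

    auto algo = cv::img_hash::ColorMomentHash::create();
    cv::Mat inHash, attackedHash;
    algo-&gt;compute(input, inHash);
    algo-&gt;compute(attacked, attackedHash);

    //compare gives back the mismatch of the two hashes, for color
    //moment hash this is based on the L2-norm; anything greater
    //than 8 counts as a fail and is drawn as a red bar
    double const mismatch = algo-&gt;compare(inHash, attackedHash);
    std::cout&lt;&lt;(mismatch &gt; 8 ? "fail" : "pass")&lt;&lt;std::endl;
}
</pre>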
<h3 style="text-align: center;">
<b><u><span style="color: blue;">Contrast attack</span></u></b></h3>
<div>
<br /></div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg975XsjLPVyXY7l4SPzpi7CF0a-rF_sHS7dO2gxcpjPk3IZ9Ijl2zmnCNFYoqvNyoaiuivJj7M0m1DSFgNfH19V0RKyNBRTl9nfzKzK0R0GqJKSc0sa8l1QYVylTX_0PNGwQHcYROKaWU/s1600/contrast_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="338" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg975XsjLPVyXY7l4SPzpi7CF0a-rF_sHS7dO2gxcpjPk3IZ9Ijl2zmnCNFYoqvNyoaiuivJj7M0m1DSFgNfH19V0RKyNBRTl9nfzKzK0R0GqJKSc0sa8l1QYVylTX_0PNGwQHcYROKaWU/s640/contrast_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Contrast attack on color moment hash</span></td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
</div>
<div style="text-align: left;">
<span style="color: blue;"> </span><span style="color: blue;"> </span><span style="color: blue;"> </span><br />
<span style="color: blue;"> </span><br />
<span style="color: blue;"> </span>Param is the gamma value of gamma correction.</div>
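<div>
<br /></div>
<div>
In case you wonder how such a contrast attack can be produced, below is a minimal sketch of gamma correction through a look up table; this is an illustration of mine, not the exact code used by the tests:</div>
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;

#include &lt;cmath&gt;

//produce the contrast attack by gamma correction on an 8 bit image,
//gamma is the Param shown on the chart
cv::Mat gamma_attack(cv::Mat const &amp;input, double gamma)
{
    //build a 256 entry look up table of the gamma curve
    cv::Mat table(1, 256, CV_8U);
    for(int i = 0; i != 256; ++i)
    {
        table.at&lt;uchar&gt;(0, i) =
            cv::saturate_cast&lt;uchar&gt;(std::pow(i / 255.0, gamma) * 255.0);
    }
    //map every pixel of every channel through the table
    cv::Mat result;
    cv::LUT(input, table, result);

    return result;
}
</pre>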
<h3 style="text-align: center;">
<b><span style="color: blue;"><u><br /></u></span></b></h3>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Resize attack</u></span></b></h3>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMoQvwFlgfXtpw4wEOLukDFVUhBQgrBBsnkMy6bz5EJTyQ2SuCwSgbNuKfNC-lTTR1OoeUDzGesP5jRVA4p2uMnBhLOaaiydBQ6q-hUB8xxNmDy7aADyaL48B4yyiBlxrffqiGImTm9dY/s1600/resize_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="330" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMoQvwFlgfXtpw4wEOLukDFVUhBQgrBBsnkMy6bz5EJTyQ2SuCwSgbNuKfNC-lTTR1OoeUDzGesP5jRVA4p2uMnBhLOaaiydBQ6q-hUB8xxNmDy7aADyaL48B4yyiBlxrffqiGImTm9dY/s640/resize_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Resize attack on color moment hash<br />
<br />
<br />
<div style="text-align: left;">
<span style="font-size: small;"> </span><br />
<span style="font-size: small;"> Param is the aspect ratio of the horizontal and vertical sides.</span><br />
<span style="font-size: small;"><br /></span></div>
</td></tr>
</tbody></table>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Gaussian noise attack</u></span></b></h3>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbpMedEOnYTu8O-Jbq4GzYh_1x2XHowEupoTfnByN6ZDXTTTj1oJANoJfEiH-6kgWLf8GDRZPLGNlaiQ3HULHggok2N5PFpIwr-sNyINJMk7bkU7tcYXQCpPVgQPo6YX0kdlYj-pX1cBw/s1600/gaussian_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="350" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbpMedEOnYTu8O-Jbq4GzYh_1x2XHowEupoTfnByN6ZDXTTTj1oJANoJfEiH-6kgWLf8GDRZPLGNlaiQ3HULHggok2N5PFpIwr-sNyINJMk7bkU7tcYXQCpPVgQPo6YX0kdlYj-pX1cBw/s640/gaussian_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Gaussian noise attack on color moment hash</span></td></tr>
</tbody></table>
<div>
<span style="color: blue;"> </span><span style="color: blue;"> </span><br />
<span style="color: blue;"> </span><br />
<br />
<span style="color: blue;"> </span>Param is the standard deviation of the Gaussian noise.<br />
<span style="color: blue;"><br /></span></div>
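<div>
Again as an illustration of mine rather than the exact test code, a Gaussian noise attack with a given standard deviation can be sketched like this(assuming an 8 bit, 3 channel input):</div>
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;

//produce the gaussian noise attack, stddev is the Param shown on the
//chart; this sketch assumes an 8 bit, 3 channel input
cv::Mat gaussian_noise_attack(cv::Mat const &amp;input, double stddev)
{
    //the noise must be signed, an 8 bit mat cannot hold negative values
    cv::Mat noise(input.size(), CV_16SC3);
    cv::randn(noise, 0, stddev);

    //add with saturation and store the sum back as an 8 bit image
    cv::Mat result;
    cv::add(input, noise, result, cv::noArray(), CV_8U);

    return result;
}
</pre>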
<div>
<b><span style="color: blue;"><u><br /></u></span></b></div>
<div>
<b><span style="color: blue;"><u><br /></u></span></b></div>
<div>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Salt and pepper noise attack</u></span></b></h3>
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKjJ4dMAnuNzhAbfu3LSW7CzYg0zAKnUu7wQyeX78lUFuGIxPwTjxyPb2ye6hn5hs-lyJP_QpsBjJ3h5dwlnHAgFkgUXiM7kHLW7D-Ww5SIs0nc3JnsV2CYJGBV1-UqdCZXNrL4vAdiQk/s1600/salt_pepper_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="336" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKjJ4dMAnuNzhAbfu3LSW7CzYg0zAKnUu7wQyeX78lUFuGIxPwTjxyPb2ye6hn5hs-lyJP_QpsBjJ3h5dwlnHAgFkgUXiM7kHLW7D-Ww5SIs0nc3JnsV2CYJGBV1-UqdCZXNrL4vAdiQk/s640/salt_pepper_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Salt and pepper noise attack on color moment hash</span></td></tr>
</tbody></table>
<div>
Param is the threshold of the salt and pepper noise.</div>
<div>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Rotation attack</u></span></b></h3>
</div>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBVp6gIGHadyW6ZY9b0hQiVtwR8uSK4MacSLWquhFV4m9GzLzFV6c7JAfF8NGAjNcg5Kn7kRlRKRxFhZlq3KU1rt1BqJimjjnQAgImhGwPFApArtOyjPypjp2CXeP7fzMClFOeM0pG7Ow/s1600/rotation_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="362" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBVp6gIGHadyW6ZY9b0hQiVtwR8uSK4MacSLWquhFV4m9GzLzFV6c7JAfF8NGAjNcg5Kn7kRlRKRxFhZlq3KU1rt1BqJimjjnQAgImhGwPFApArtOyjPypjp2CXeP7fzMClFOeM0pG7Ow/s640/rotation_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8px;"><span style="font-size: 12.8px;">Rotation attack on color moment hash</span></td></tr>
</tbody></table>
<br />
<div style="text-align: justify;">
<span style="font-size: small; text-align: start;"> Param is the angle of rotation.</span></div>
<div style="font-size: medium; text-align: start;">
</div>
</td></tr>
</tbody></table>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Gaussian blur attack</u></span></b></h3>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPHgGJW9TAM3wzARWZdnPV-ujcHkjVTkzVddOQhtQbHBbtedHNkhT07R1NK7Cpm_Rx67OKx0FYtpdB5F5xdFC-Sx_eaTTHpT0Np2Bo3O2s-x3G-SDR3DW5p1V9wBRKOdYHsZpM80q95EE/s1600/gaussian_blur_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="368" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPHgGJW9TAM3wzARWZdnPV-ujcHkjVTkzVddOQhtQbHBbtedHNkhT07R1NK7Cpm_Rx67OKx0FYtpdB5F5xdFC-Sx_eaTTHpT0Np2Bo3O2s-x3G-SDR3DW5p1V9wBRKOdYHsZpM80q95EE/s640/gaussian_blur_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Gaussian blur attack on color moment hash</span></td></tr>
</tbody></table>
<br />
Param is the standard deviation of the 3x3 Gaussian filter.<br />
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Jpeg compression attack</u></span></b></h3>
</div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZL06U9LJWG6gmeBZf1tdG5lP9pLsUeBM1BCM4ONdyUB8IpZotuKmlVk6ryAni1M2rSJxeFk8HrBtO3lj5l9rezivI0PhDEeAPDameh6x0N2CUL6k5GuOWnI0ggMp4B2Pj8cY98SLAue0/s1600/jpeg_compression_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZL06U9LJWG6gmeBZf1tdG5lP9pLsUeBM1BCM4ONdyUB8IpZotuKmlVk6ryAni1M2rSJxeFk8HrBtO3lj5l9rezivI0PhDEeAPDameh6x0N2CUL6k5GuOWnI0ggMp4B2Pj8cY98SLAue0/s640/jpeg_compression_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Jpeg compression attack on color moment hash</span></td></tr>
</tbody></table>
</div>
<div>
<br /></div>
<div>
Param is the quality factor of jpeg compression; 100 means no compression.</div>
<div>
<br /></div>
<div>
<h3 style="text-align: center;">
<b><span style="color: blue;"><u>Watermark attack</u></span></b></h3>
</div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhEGGckjte7gSffLysEtKNvIvidaEp0AQ1PkFNV2ZbNwWJBpyFF3YoWAtnmnlwK6h9T-EPWml3-p2t_fF2vUcJMbC3rL04xSre7BQerb6kc_XQGRlrQmbGc3ujDb_-SDWQ0euPHZRULkc/s1600/watermark_attack_color_moment.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="354" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhEGGckjte7gSffLysEtKNvIvidaEp0AQ1PkFNV2ZbNwWJBpyFF3YoWAtnmnlwK6h9T-EPWml3-p2t_fF2vUcJMbC3rL04xSre7BQerb6kc_XQGRlrQmbGc3ujDb_-SDWQ0euPHZRULkc/s640/watermark_attack_color_moment.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Watermark attack on color moment hash</span></td></tr>
</tbody></table>
</div>
<div>
Param is the strength of the watermark; 1.0 means the mark is 100% opaque. Image 017 and image 023 perform very poorly because they are grayscale images.</div>
<div>
<br /></div>
<div>
From these experimental data, we can say color moment hash performs very well under various attacks except Gaussian noise, salt and pepper noise and contrast attacks.</div>
<div>
<br /></div>
<div>
<h2 style="text-align: center;">
<b><span style="color: blue;"><u>Overall results of different algorithms</u></span></b></h2>
</div>
<div>
<br /></div>
<div>
There are too many data points to show for all of the algorithms, so to make things more intuitive I created charts to help you compare the performance of these algorithms under different attacks. Their thresholds are the same as in the <a href="http://qtandopencv.blogspot.my/2016/06/introduction-to-image-hash-module-of.html"><span style="color: blue;">last post(click me)</span></a>.</div>
<div>
<br /></div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuPTvNpAoHZAJ5Kzu-GNxfHeRskVrlSjzvTv9x0SXPP0WnCSgaVzT16flxd8or3bHf1BsHaylWpAahPwDhR6Zb2GgEBBh0E1Zth0GLxCF3SrZuYBQFsKn4XHxG4Q8uf2rchsDnNxFuEnw/s1600/average_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="516" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuPTvNpAoHZAJ5Kzu-GNxfHeRskVrlSjzvTv9x0SXPP0WnCSgaVzT16flxd8or3bHf1BsHaylWpAahPwDhR6Zb2GgEBBh0E1Zth0GLxCF3SrZuYBQFsKn4XHxG4Q8uf2rchsDnNxFuEnw/s640/average_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Average algorithm performance</td></tr>
</tbody></table>
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjV7XtSu7s1ObG52RluQ7E6U-9dhuYrG68PWcAd_ERC13CJJuyfzc3UJm2bJkP15QTPxh-VM1E3tkhmAUfFJAQYh6fbss7HeykVvbLJP9EF0bAtw8AiSpLyo_om1uaDHSNtV5-RXoqK1u0/s1600/phash_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="548" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjV7XtSu7s1ObG52RluQ7E6U-9dhuYrG68PWcAd_ERC13CJJuyfzc3UJm2bJkP15QTPxh-VM1E3tkhmAUfFJAQYh6fbss7HeykVvbLJP9EF0bAtw8AiSpLyo_om1uaDHSNtV5-RXoqK1u0/s640/phash_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">PHash algorithm performance</span></td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSpA7NS_3obrxnnBi29GJbtvXs0sM0IM2x4_pXMrURlD6Lh-0H8Ksa8QgIrlbJGxWeVRFeyWbLQyUJqdaC0yPKM5FBpWorHq-sHnBYjwlngWX7Ax6PBezHnE1Q55_pi8eoi-QTZWTbRq8/s1600/marr_hash_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="538" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSpA7NS_3obrxnnBi29GJbtvXs0sM0IM2x4_pXMrURlD6Lh-0H8Ksa8QgIrlbJGxWeVRFeyWbLQyUJqdaC0yPKM5FBpWorHq-sHnBYjwlngWX7Ax6PBezHnE1Q55_pi8eoi-QTZWTbRq8/s640/marr_hash_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Marr Hildreth algorithm performance</span></td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiHeVsHtFEzwBIn7EDg5_Yk6CYeHiJbpEKlU7UeTXjSNfFPbYJmXdalI-xnjcawvztx4Ka3lmiHx7RPjnXKzH27ZXhuV21bpp1w0Zpfg5MImUEB0SYJwOlXZ_1x6dtND6vuz7TFSmIl9U/s1600/radial_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="542" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiHeVsHtFEzwBIn7EDg5_Yk6CYeHiJbpEKlU7UeTXjSNfFPbYJmXdalI-xnjcawvztx4Ka3lmiHx7RPjnXKzH27ZXhuV21bpp1w0Zpfg5MImUEB0SYJwOlXZ_1x6dtND6vuz7TFSmIl9U/s640/radial_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Radial hash algorithm performance</span></td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSgKoTkm5_kqAcLHhI_TBway5OXca719glN0RF4DpQi0HOmDARVl4VEpUiAGaRVskgL7pHitPSvsgvSqLZwYw1wGNAC_Sib3dCd4tthfQQsDjcpbkimVWGa5OaIviUqKU6DGfMPgAGVDU/s1600/bmh_zero_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="542" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSgKoTkm5_kqAcLHhI_TBway5OXca719glN0RF4DpQi0HOmDARVl4VEpUiAGaRVskgL7pHitPSvsgvSqLZwYw1wGNAC_Sib3dCd4tthfQQsDjcpbkimVWGa5OaIviUqKU6DGfMPgAGVDU/s640/bmh_zero_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">BMH zero algorithm performance</span></td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLkNJwlLIiMBhHFHbOa7MlXMtTmaVlxZqE8Xl3p6zNemaMib-Qh8VaQTeKiucRwAYqKnNFlH7ydYpko1MC_F7dFsMNo11paK_MzE-CDhClvh_lB7Od_oyK1lnKTKMTrFsrbEYVY03kj_Y/s1600/bmh_one_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="552" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLkNJwlLIiMBhHFHbOa7MlXMtTmaVlxZqE8Xl3p6zNemaMib-Qh8VaQTeKiucRwAYqKnNFlH7ydYpko1MC_F7dFsMNo11paK_MzE-CDhClvh_lB7Od_oyK1lnKTKMTrFsrbEYVY03kj_Y/s640/bmh_one_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">BMH one algorithm performance</span></td></tr>
</tbody></table>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisy-jDlNLFihugY4zsDZOgyYUz2OYngWfJx9l_2HT92oD-ootua9vfXykNuOl5a6LkF_5iAbYBezrrQX0RStTKEbfipegJReNpuK23EIZfA0bvw1dejY3Pvt1-sVFRd1FFgSyJG5-v42A/s1600/color_moment_performanceJPG.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="540" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisy-jDlNLFihugY4zsDZOgyYUz2OYngWfJx9l_2HT92oD-ootua9vfXykNuOl5a6LkF_5iAbYBezrrQX0RStTKEbfipegJReNpuK23EIZfA0bvw1dejY3Pvt1-sVFRd1FFgSyJG5-v42A/s640/color_moment_performanceJPG.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Color moment algorithm performance</span></td></tr>
</tbody></table>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgT15qN7JU2G1dHGf7lhXT_N50NtZCwBKifp6zEZk0mpsyjvAF3QgN_JIMxWM8ManSmvSL931GCDpj6n0wfDWel8nVw_lLiF_bXtUMPQx1ZfbIBydbTguznuv6l04Q1ZyfuCPa8OkBb1HM/s1600/overall_result.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="556" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgT15qN7JU2G1dHGf7lhXT_N50NtZCwBKifp6zEZk0mpsyjvAF3QgN_JIMxWM8ManSmvSL931GCDpj6n0wfDWel8nVw_lLiF_bXtUMPQx1ZfbIBydbTguznuv6l04Q1ZyfuCPa8OkBb1HM/s640/overall_result.JPG" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Overall results</td></tr>
</tbody></table>
</div>
<div>
These are the results of all of the algorithms. From the Overall results chart, it is easy to see that every algorithm has its pros and cons; you need to pick the one that suits your database. If speed is crucial, average hash may be your best choice, because it is the <span style="color: magenta;">fastest algorithm</span> of them all and performs very well under different attacks except rotation and salt and pepper noise. If you need rotation resistance, color moment hash is your only choice, because the other algorithms fail badly under rotation attack. You can find the codes of these test cases <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/image_hash/better_test.cpp"><span style="color: blue;">from here(click me)</span></a>.</div>
<h2 style="text-align: center;">
<b><span style="color: blue;"><u>Compare with PHash library</u></span></b></h2>
<div>
<b><span style="color: blue;"><u><br /></u></span></b></div>
<div>
As this post shows, the img_hash module possesses <span style="color: magenta;">five advantages</span> over the <a href="http://www.phash.org/"><span style="color: blue;">PHash library(click me)</span></a>.</div>
<div>
<br /></div>
<div>
1 : The processing <span style="color: magenta;">speed</span> of this module <span style="color: magenta; font-size: large;">outperforms PHash</span>.<br />
<br /></div>
<div>
2 : This module adopts the same license as <a href="http://opencv.org/"><span style="color: blue;">opencv(click me)</span></a>, which means you can do whatever you like with it <span style="color: magenta; font-size: large;">free of charge</span>.<br />
<br /></div>
<div>
3 : The codes are much more modern and easier to use; img_hash <span style="color: magenta; font-size: large;">frees you from memory management chores</span> once and for all. <span style="color: magenta;">A good, modern c++ library <span style="font-size: large;">should not</span> force its users to take care of the resources by themselves</span>.<br />
<br /></div>
<div>
4 : The api of img_hash is consistent and much easier to use than the PHash library. Do not believe it? Let us look at some examples.<br />
<br />
<h3 style="text-align: center;">
<u><span style="color: blue;">Case 1a : Compute Radial Hash by PHash library
</span></u></h3>
<br />
<pre class="prettyprint">Digest digest_0, digest_1;
digest_0.coeffs = 0;
digest_1.coeffs = 0;
ph_image_digest(img_0, 1.0, 1.0, digest_0);
ph_image_digest(img_1, 1.0, 1.0, digest_1);
double pcc = 0;
ph_crosscorr(digest_0, digest_1, pcc, 0.9);
//do something, remember to free your memory :(
free(digest_0.coeffs);
free(digest_1.coeffs);
</pre>
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Case 1b : Compare Radial Hash by img_hash
</span></u></b></div>
<br />
<pre class="prettyprint">auto algo = RadialVarianceHash::create();
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself
</pre>
<br />
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Case 2a : Compute Marr Hash by PHash library
</span></u></b></div>
<br />
<pre class="prettyprint">int N = 0;
uint8_t *hash_0 = ph_mh_imagehash(img_0, N);
uint8_t *hash_1 = ph_mh_imagehash(img_1, N);
double const value = ph_hammingdistance2(hash_0 , 72, hash_1, 72);
//do something, remember to free your memory :(
free(hash_0);
free(hash_1);
</pre>
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Case 2b : Compare Marr Hash by img_hash
</span></u></b></div>
<br />
<pre class="prettyprint">auto algo = MarrHildrethHash::create();
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself
</pre>
<br />
<br />
<div style="text-align: center;">
<span style="color: blue;"><b><u>Case 3a : Compute Block mean Hash by PHash library
</u></b></span></div>
<br />
<pre class="prettyprint">BinHash *hash_0 = 0;
BinHash *hash_1 = 0;
ph_bmb_imagehash(img_0, 1, &hash_0);
ph_bmb_imagehash(img_1, 1, &hash_1);
double const value = ph_hammingdistance2(hash_0->hash,
hash_0->bytelength,
hash_1->hash,
hash_1->bytelength);
//do something, remember to free your memory :(
ph_bmb_free(hash_0);
ph_bmb_free(hash_1);
</pre>
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Case 3b : Compare Block mean Hash by img_hash
</span></u></b></div>
<br />
<pre class="prettyprint">auto algo = BlockMeanHash::create(0);
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself
</pre>
<br />
As you can see, img_hash is not only faster, it also provides a cleaner, more concise way to write your codes; you never need to remember different ways to compute your hashes and compare them anymore, because the <span style="color: magenta; font-size: large;"><b>api of img_hash is consistent</b></span>.<br />
<br />
<br />
5 : This module only depends on opencv_core and opencv_imgproc, which means you should be able to <span style="color: magenta;">compile it with ease</span> on every major platform without scratching your head.</div>
<div>
<br />
<h2 style="text-align: center;">
<span style="color: blue;"><b><u>Next move</u></b></span></h2>
Develop an application--<span style="color: magenta;"><b>Similar Vision</b></span>--to show the capability of img_hash. The functions of this app are finding similar images in an image set(of course, it will leverage the power of the img_hash module) and finding similar video clips among videos. </div>
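<div>
A rough sketch of the image part of that idea could look like the codes below; this only illustrates the concept under assumptions of mine(the folder name, the jpg pattern and the threshold 5 are placeholders), it is not the actual codes of Similar Vision:</div>
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;
#include &lt;opencv2/core/utility.hpp&gt;
#include &lt;opencv2/highgui.hpp&gt;
#include &lt;opencv2/img_hash.hpp&gt;

#include &lt;iostream&gt;
#include &lt;vector&gt;

int main()
{
    //collect the image names of the folder, "img_set" is a placeholder
    std::vector&lt;cv::String&gt; names;
    cv::glob("img_set/*.jpg", names);

    //compute the hash of every image of the set
    auto algo = cv::img_hash::AverageHash::create();
    std::vector&lt;cv::Mat&gt; hashes(names.size());
    for(size_t i = 0; i != names.size(); ++i)
    {
        algo-&gt;compute(cv::imread(names[i]), hashes[i]);
    }

    //report the pairs which look similar; the threshold 5 is a
    //placeholder, tune it for your own data set
    for(size_t i = 0; i != hashes.size(); ++i)
    {
        for(size_t j = i + 1; j != hashes.size(); ++j)
        {
            if(algo-&gt;compare(hashes[i], hashes[j]) &lt;= 5)
            {
                std::cout&lt;&lt;names[i]&lt;&lt;" looks similar to "&lt;&lt;names[j]&lt;&lt;std::endl;
            }
        }
    }
}
</pre>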
<div>
<br /></div>
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com1tag:blogger.com,1999:blog-4702230343097536610.post-66412412627479919012016-06-19T11:59:00.001-07:002023-02-03T21:08:26.165-08:00Introduction to image hash module of opencv Anyone using the de facto standard computer vision library--<a href="http://opencv.org/"><span style="color: blue;">opencv</span></a>--have you ever hoped opencv would provide ready to use image hash algorithms like the <a href="http://www.phash.org/docs/design.html">average hash, perceptual hash, block mean hash, radial variance hash and marr hildreth hash</a> that PHash does? PHash sounds like a robust solution and runs quite fast, but preferring PHash means you <span style="color: red;">need to add more dependencies</span> into your project and <span style="color: red;">open your source codes</span>, and open source is not a viable option for most commercial products. Do you, like me, <span style="color: red;">not want to add more dependencies</span> into your codes, yet want royalty free, robust and high performance image hash algorithms for your project? Let us admit it, we do not like to solve dependency issues related to programming; beyond that, many commercial projects need to remain closed source. It would be much better if opencv provided us an image hash module.<br />
<br />
If opencv does not have one, why not create one for it?<br />
<br />
1 : The algorithms of image hashing are <span style="color: orange;">not too complicated</span>.<br />
2 : The PHash library already implements many image hash algorithms; we can port them to opencv and use it as the golden model.<br />
3 : opencv is an<span style="color: orange;"> <b>open source computer vision library</b></span>. If we ever find any bugs, missing features or poor performance, we can do something to make it better.<br />
<br />
The good news is I have implemented all of the algorithms mentioned above, <b><span style="color: orange;">refined</span></b> the performance(ex : block mean hash is able to process single channel images) and <span style="color: orange;"><b>freed you from memory management chores</b></span>. The bad news is that the pull request hasn't been merged yet as I write this post, so you need to clone/pull it down and build it by yourself. Fear not, this module <span style="color: orange;"><b>only depends on</b></span> the core and imgproc of opencv, so it should be fairly easy to build(opencv is quite easy to build to begin with :)).<br />
<br />
The following examples will show you how to use img_hash; you will find it much easier to use than the PHash library because the api is <span style="color: orange;"><b>more consistent</b></span> and you <span style="color: orange;"><b>do not need to manage the memory</b></span> by yourself.<br />
<br />
<h3 style="text-align: center;">
<span style="color: blue;"><b><u>How to use it</u></b></span></h3>
<br />
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;
#include &lt;opencv2/core/ocl.hpp&gt;
#include &lt;opencv2/highgui.hpp&gt;
#include &lt;opencv2/img_hash.hpp&gt;
#include &lt;opencv2/imgproc.hpp&gt;

#include &lt;iostream&gt;

void computeHash(cv::Ptr&lt;cv::img_hash::ImgHashBase&gt; algo)
{
    cv::Mat const input = cv::imread("lena.png");
    cv::Mat const target = cv::imread("lena_blur.png");

    cv::Mat inHash; //hash of input image
    cv::Mat targetHash; //hash of target image

    //compute hash of input and target
    algo-&gt;compute(input, inHash);
    algo-&gt;compute(target, targetHash);
    //compare the similarity of inHash and targetHash
    //recommended thresholds are written in the header files
    //of every class
    double const mismatch = algo-&gt;compare(inHash, targetHash);
    std::cout&lt;&lt;mismatch&lt;&lt;std::endl;
}

int main()
{
    using namespace cv::img_hash;

    //disabling opencl acceleration may boost up the speed of img_hash,
    //however, in this post I do not disable the optimization of opencl
    //cv::ocl::setUseOpenCL(false);

    computeHash(AverageHash::create());
    computeHash(PHash::create());
    computeHash(MarrHildrethHash::create());
    computeHash(RadialVarianceHash::create());
    //BlockMeanHash supports mode 0 and mode 1, they associate to
    //mode 1 and mode 2 of the PHash library
    computeHash(BlockMeanHash::create(0));
    computeHash(BlockMeanHash::create(1));
    computeHash(ColorMomentHash::create());
}
</pre>
<br />
<br />
With these functions, we can measure the performance of our algorithms under different "attacks", like resize, contrast, noise and rotation. Before we start the tests, let me define the thresholds of "pass" and "fail". One thing to remember: to keep things simple, I only use lena to show the results; different data sets may need different thresholds/algorithms to get the best results.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqlEYSEO9uiOuhzyOUur4umrVK8UBf_ciEx9eVplNem50epce_oRTw_w86czuNtRUBH35y_BXBRFKQKVrvNZHOn90RAP_qspW0OFjOOp_iyq2nF-zGTwaeM2U2IFPQvFgCU4_uwypu6TE/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqlEYSEO9uiOuhzyOUur4umrVK8UBf_ciEx9eVplNem50epce_oRTw_w86czuNtRUBH35y_BXBRFKQKVrvNZHOn90RAP_qspW0OFjOOp_iyq2nF-zGTwaeM2U2IFPQvFgCU4_uwypu6TE/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Threshold</td></tr>
</tbody></table>
<br />
<br />
After we determine our threshold, we could use our beloved lena to do the test :).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGE6UwUZDYNrosBwZ-6c897uwLRd2KsJELfCh6Nf69B_CrfHVKDGSih-S1IOflqhiezlYXitsMcTUIg_yMw9f40bVRNAT_1TvNKjEoPyMy8iWA-UZe5Btjfc-d06yJBNDBF-9K3D8Vb60/s1600/lena.png" style="margin-left: auto; margin-right: auto;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGE6UwUZDYNrosBwZ-6c897uwLRd2KsJELfCh6Nf69B_CrfHVKDGSih-S1IOflqhiezlYXitsMcTUIg_yMw9f40bVRNAT_1TvNKjEoPyMy8iWA-UZe5Btjfc-d06yJBNDBF-9K3D8Vb60/s320/lena.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">lena.png</td></tr>
</tbody></table>
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Resize attack</span></u></b></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFrfUNE-hiEBBhNssNMZTAY1704791Mv6b1r-OBawL4aHGrsSy9TtBqr4Th4LZdOXHOAU3RBoCgqFm9UNvLo_zDx6XWzVM5uoxw1UCeUMam7ZnwE4TIILXKcjEeAJMCFWXrOiGQ94jr1U/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="237" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFrfUNE-hiEBBhNssNMZTAY1704791Mv6b1r-OBawL4aHGrsSy9TtBqr4Th4LZdOXHOAU3RBoCgqFm9UNvLo_zDx6XWzVM5uoxw1UCeUMam7ZnwE4TIILXKcjEeAJMCFWXrOiGQ94jr1U/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Resize attack</td></tr>
</tbody></table>
<br />
<br />
Every algorithm(BMH means block mean hash) works very well under different sizes and aspect ratios <span style="color: red;">except</span> radial variance hash; this algorithm works under different sizes, but we need to keep the aspect ratio.<br />
<br />
<br />
<div style="text-align: center;">
<span style="color: blue;"><b><u>Contrast Attack</u></b></span></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEUz1EZD2x-R4eH5WUAOqFsKpNjzwNM0lZuxi4ViES78NP_r9tu-6qQamFGGYGCdEcnBuAoP5sr6wG7saUPfAkD-Bn_781AfRz0TLexVa7ofbyC9YAhRHRRQzYQE50zT9z4W4yIB3ll_c/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="121" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEUz1EZD2x-R4eH5WUAOqFsKpNjzwNM0lZuxi4ViES78NP_r9tu-6qQamFGGYGCdEcnBuAoP5sr6wG7saUPfAkD-Bn_781AfRz0TLexVa7ofbyC9YAhRHRRQzYQE50zT9z4W4yIB3ll_c/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Contrast Attack</td></tr>
</tbody></table>
<br />
Every algorithm works quite well under different contrast, although radial variance hash, BMH zero and BMH one do not work well under very low contrast.<br />
<br />
<div style="text-align: center;">
<span style="color: blue;"><b><u> Gaussian Noise Attack</u></b></span></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiapaFOItYWID5iq3U900djYoEkBJWA9AlaqE3IL9GP9sn6D4qh0ddavJS6VeQfIBkxuZZE00jKf317W6vsl4wNbdr-bx2dR47i-z7SmaekNyJK2Kcye5VzXRrvWtw7Ua638_sUhzQjn0w/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="121" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiapaFOItYWID5iq3U900djYoEkBJWA9AlaqE3IL9GP9sn6D4qh0ddavJS6VeQfIBkxuZZE00jKf317W6vsl4wNbdr-bx2dR47i-z7SmaekNyJK2Kcye5VzXRrvWtw7Ua638_sUhzQjn0w/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Gaussian noise attack</td></tr>
</tbody></table>
Very fortunately, every algorithm survives the Gaussian noise attack.<br />
<br />
<div style="text-align: center;">
<span style="color: blue;"><b><u>Salt And Pepper Noise Attack</u></b></span></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy1B8LnhLhJyVTK6WdFOhUrHF6N54kmI2c9nD2VAT8BamIgnLcgJ-d8tJ1kTyEJxrZSikArpj3CCnKXP4CYu1DmV_xFE10s3hury2jthBG5E5rkZsZJYEgOONcOa_ggTs9lnJWnfxJW5s/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy1B8LnhLhJyVTK6WdFOhUrHF6N54kmI2c9nD2VAT8BamIgnLcgJ-d8tJ1kTyEJxrZSikArpj3CCnKXP4CYu1DmV_xFE10s3hury2jthBG5E5rkZsZJYEgOONcOa_ggTs9lnJWnfxJW5s/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Salt and pepper noise attack</td></tr>
</tbody></table>
As we can see, only radial variance hash and BMH perform well under the salt and pepper noise attack.<br />
<br />
<br />
<div style="text-align: center;">
<b><span style="color: blue;"><u>Rotation Attack</u></span></b></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigDWw3ec5pu2mWqtGgqCa9sO4lMDj-E5IQTp_h5BXl-sCSU139ZhIxkvbBCTaK494SWZzIYR1Q6QYziTtwhtLJhX2mL32o-f26UBQTffsMxGVjUx3NhLCca5xAGbc2AvjRe-C-rPlFGUg/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigDWw3ec5pu2mWqtGgqCa9sO4lMDj-E5IQTp_h5BXl-sCSU139ZhIxkvbBCTaK494SWZzIYR1Q6QYziTtwhtLJhX2mL32o-f26UBQTffsMxGVjUx3NhLCca5xAGbc2AvjRe-C-rPlFGUg/s400/Capture.JPG" width="368" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Rotation attack</td></tr>
</tbody></table>
Apparently, none of the algorithms survive the rotation attack. But does this really matter? I guess not(how often do you need to search for an image after rotating it on google?). If you really need to deal with rotation attacks, I suggest you give BOVW(bag of visual words) a try; I used it to construct a <a href="http://qtandopencv.blogspot.my/2016/04/content-based-image-retrievalcbir-00.html"><span style="color: blue;">robust CBIR system</span></a> before. The defects of a robust BOVW based CBIR system are long computation time, high memory consumption, and that it is much harder to scale to a large data set(you would need to build a distributed system in that case).<br />
<br />
We have gone through all of the tests; now let us measure the hash computation time and hash comparison time of the different algorithms(my laptop is a Y410P, the os is windows 10 64bits, the compiler is vc2015 64bits with update 2 installed).<br />
<br />
You can find all the details of different attacks <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/image_hash/hash_samples.cpp"><span style="color: blue;">at here(click me)</span></a>.<br />
<br />
<div style="text-align: center;">
<b><span style="color: blue;"><u>Computation Performance Test--img_hash vs PHash library</u></span></b></div>
<br />
I use the different algorithms to compute the hashes of 100 images from ukbench(ukbench03000.jpg~ukbench03099.jpg). The source codes of the opencv benchmark are located <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/image_hash/hash_samples.cpp"><span style="color: blue;">at here</span></a>(check the functions measure_computation_time and measure_comparison_time; I am using <span style="color: purple;">img_hash_1_0</span> as I write this post), and the source codes of the PHash performance test(<span style="color: purple;">version 0.94</span>, since I am on windows) are located <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/image_hash/phash_performance.cpp"><span style="color: blue;">at here</span></a>.<br />
<br />
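If you are curious how such timings can be taken, below is a simplified sketch of mine using std::chrono; the real benchmark functions are the ones named above in the linked repository, this is not their actual code.<br />
<pre class="prettyprint">#include &lt;opencv2/core.hpp&gt;
#include &lt;opencv2/img_hash.hpp&gt;

#include &lt;chrono&gt;
#include &lt;iostream&gt;
#include &lt;vector&gt;

//time how long an algorithm takes to hash a batch of loaded images,
//a simplified sketch only, not the benchmark functions named above
void time_hash_computation(cv::Ptr&lt;cv::img_hash::ImgHashBase&gt; algo,
                           std::vector&lt;cv::Mat&gt; const &amp;imgs)
{
    using namespace std::chrono;

    cv::Mat hash;
    auto const start = steady_clock::now();
    for(auto const &amp;img : imgs)
    {
        algo-&gt;compute(img, hash);
    }
    auto const elapsed =
        duration_cast&lt;milliseconds&gt;(steady_clock::now() - start);
    std::cout&lt;&lt;"elapsed time : "&lt;&lt;elapsed.count()&lt;&lt;" ms"&lt;&lt;std::endl;
}
</pre>
<br />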
<br />
<div style="text-align: center;">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXxHLlFozJY3CXT9-uyDFLEJDQGdcqEQtqLPfPy19YF-6WBvloeYyJsQ5EFP_gElESo2MmGgDEzzFPB5L1UGOX9B5qW9MoydDAiMOEkG_Co4IRkzr-DfDghCCulnNCAWrQhoopWTOpe7g/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="58" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXxHLlFozJY3CXT9-uyDFLEJDQGdcqEQtqLPfPy19YF-6WBvloeYyJsQ5EFP_gElESo2MmGgDEzzFPB5L1UGOX9B5qW9MoydDAiMOEkG_Co4IRkzr-DfDghCCulnNCAWrQhoopWTOpe7g/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Computation performance test</td></tr>
</tbody></table>
<br /></div>
<div style="text-align: center;">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8RZ3i9nuRU_RPPMgfLd7cKyRcKMqdtFFqRJUR-e1KAwM0Y2Fl0vtFRu8svOP9W2RXrfg0mq5suIj0mCCqbBi4EfaWEV3C0ZYVPH_ORBcO4A2y8BgPe4SJl-swqeTMo6XHoX-WTKayeN0/s1600/Capture.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="52" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8RZ3i9nuRU_RPPMgfLd7cKyRcKMqdtFFqRJUR-e1KAwM0Y2Fl0vtFRu8svOP9W2RXrfg0mq5suIj0mCCqbBi4EfaWEV3C0ZYVPH_ORBcO4A2y8BgPe4SJl-swqeTMo6XHoX-WTKayeN0/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Comparison performance test</td></tr>
</tbody></table>
<br />
<br /></div>
In most cases, img_hash is faster than PHash, but BMH zero and BMH one are almost <span style="color: red;">30% or 40%</span> slower than their PHash versions. The <span style="color: red;">bottleneck is cv::resize</span>(over 95% of the time is spent on it); to speed things up, we need a faster resize function.<br />
<br />
<br />
<div style="text-align: center;">
<b><u><span style="color: blue;">Find similar image from ukbench</span></u></b></div>
<br />
The results look good, but can it find similar images? Of course, let me show you how we can compare the hash value of our target against the images from ukbench(for simplicity, I only pick 100 images from ukbench).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYVsLbDm3NrWZ0TgYta9pjbVxq5rG2yGcAp8XMawrRzUTnqPjRcgWyAJ5q9grPpIiBdGBNkslwNrVp897SIgR4nnfb8ZWYTMqXgfW1xQ6WVkDVu7A47MyULoRlo7q1E4eIs13KQJjaft0/s1600/ukbench03037.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYVsLbDm3NrWZ0TgYta9pjbVxq5rG2yGcAp8XMawrRzUTnqPjRcgWyAJ5q9grPpIiBdGBNkslwNrVp897SIgR4nnfb8ZWYTMqXgfW1xQ6WVkDVu7A47MyULoRlo7q1E4eIs13KQJjaft0/s320/ukbench03037.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">target</td></tr>
</tbody></table>
<br />
<br />
<pre class="prettyprint">void find_target(cv::Ptr<cv::img_hash::ImgHashBase> algo, bool smaller = true)
{
using namespace cv::img_hash;
cv::Mat input = cv::imread("ukbench/ukbench03037.jpg");
//not a good way to reuse the codes by calling
//measure_comparison_time, please bear with me
std::vector<cv::Mat> targets = measure_comparison_time(algo, "");
double idealValue;
if(smaller)
{
idealValue = std::numeric_limits<double>::max();
}
else
{
idealValue = std::numeric_limits<double>::min();
}
size_t targetIndex = 0;
cv::Mat inputHash;
algo->compute(input, inputHash);
for(size_t i = 0; i != targets.size(); ++i)
{
double const value = algo->compare(inputHash, targets[i]);
if(smaller)
{
if(value < idealValue)
{
idealValue = value;
targetIndex = i;
}
}
else
{
if(value > idealValue)
{
idealValue = value;
targetIndex = i;
}
}
}
std::cout<<"mismatch value : "<<idealValue<<std::endl;
cv::Mat result = cv::imread("ukbench/ukbench0" +
std::to_string(targetIndex + 3000) +
".jpg");
cv::imshow("input", input);
cv::imshow("found img " + std::to_string(targetIndex + 3000), result);
cv::waitKey();
cv::destroyAllWindows();
}
void find_target()
{
using namespace cv::img_hash;
find_target(AverageHash::create());
find_target(PHash::create());
find_target(MarrHildrethHash::create());
find_target(RadialVarianceHash::create(), false);
find_target(BlockMeanHash::create(0));
find_target(BlockMeanHash::create(1));
}
</pre>
<br />
You will find that every algorithm <span style="color: orange;"><b>gives you back the same image you are looking for</b></span>.<br />
<br />
<div style="text-align: center;">
<b><span style="color: blue;"><u>Conclusion</u></span></b></div>
<br />
Average hash and PHash are the fastest algorithms, but if you want a more robust one, pick BMH zero. BMH zero and BMH one give similar results, but BMH one is slower since it needs more computation power. Hash comparison of radial variance hash is much slower than the others', because it needs to find the peak cross-correlation value among 40 combinations. If you want to know how to speed things up and learn more about rotation invariant image hash algorithms, give <a href="http://qtandopencv.blogspot.my/2016/06/speed-up-image-hashing-of-opencvimghash.html"><span style="color: blue;">this link(click me)</span></a> a try.<br />
<br />
You can find the test cases <a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/image_hash"><span style="color: blue;">at here</span></a>. If you think this post is helpful, please give my repositories(<a href="https://github.com/stereomatchingkiss/blogCodes2/tree/master/image_hash"><span style="color: blue;">blogCodes2</span></a> and my img_hash branch of <a href="https://github.com/stereomatchingkiss/opencv_contrib/tree/img_hash_rotation/modules/img_hash/src"><span style="color: blue;">opencv_contrib</span></a>) a <span style="color: orange;"><b>star </b></span>:). If you want to join the development, please open a pull request, thanks.ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com7tag:blogger.com,1999:blog-4702230343097536610.post-579623576635772372016-06-17T21:07:00.004-07:002016-06-17T21:07:43.379-07:00Remove annoying trailing white space by c++ If you ever try to commit something to opencv(I am porting/implementing various image hash algorithms to <a href="https://github.com/Itseez/opencv_contrib/pull/688">opencv_contrib</a> as I write this post; you can find my branch <a href="https://github.com/stereomatchingkiss/opencv_contrib/tree/img_hash/modules/img_hash">here</a>), you will likely find some extremely annoying messages such as<br />
<br />
<pre><span class="stdout">modules/tracking/include/opencv2/tracking/tracker.hpp:857: trailing whitespace.
+
modules/tracking/include/opencv2/tracking/tracker.hpp:880: trailing whitespace.
+
modules/tracking/include/opencv2/tracking/tracker.hpp:890: trailing whitespace.
+
modules/tracking/include/opencv2/tracking/tracker.hpp:1433: trailing whitespace.
+ Params();
modules/tracking/include/opencv2/tracking/tracker.hpp:1434: trailing whitespace.
+
modules/tracking/include/opencv2/tracking/tracker.hpp:1444: trailing whitespace.</span></pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;">blablabla. They pop out in your files from time to time, cost you extra time to fix, pollute your commit history, and on top of that, those trailing white spaces are <b><span style="color: red;">hard to spot by human eyes</span></b>.</span></pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;">Apparently, eliminating those trailing white spaces is not a job suited for humans; we had better leave such tedious tasks to our friend--the computer.</span></pre>
<pre>
</pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;"> To teach our friend what I want to do, I wrote a small program to help us(source codes located <a href="https://github.com/stereomatchingkiss/blogCodes2/blob/master/kill_trailing_white_space/main.cpp">at here</a>); you should be able to compile and run it if you are familiar with c++ and boost.</span></pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;">
</span></pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;"> Enough talk, let me show you an example:</span></pre>
<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqzW2sMhWdBEjOs1U84_e1N-ynaoo32F4XTA_zaJvW04yLTjKa5OKV5KkkEYFsIIW76K4uTzDRnhuIHdG2TZHbJaR8IJ5c-zDzCMIgKvBkQKVWjl56RnqaQB-L8iYTaPy_Ko7_VzE3M-E/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="241" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqzW2sMhWdBEjOs1U84_e1N-ynaoo32F4XTA_zaJvW04yLTjKa5OKV5KkkEYFsIIW76K4uTzDRnhuIHdG2TZHbJaR8IJ5c-zDzCMIgKvBkQKVWjl56RnqaQB-L8iYTaPy_Ko7_VzE3M-E/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Example 00<br /><br /><br /></td></tr>
</tbody></table>
<pre><span style="font-family: "Times New Roman"; white-space: normal;"> As you can see, Example 00 contains a lot of tabs and trailing white spaces; not only that, there is a tab we should not remove(the tab of std::string("\t")). This is where my small tool--<b><span style="color: blue;">kill_trailing_white_space</span></b>--comes in. All you need to do is specify whether you want to remove the tabs and trailing white spaces of <b><span style="color: blue;">a file</span></b> <span style="color: blue;"><b>or of the files inside a folder(it will scan the folders recursively)</b></span>. Examples:</span></pre>
<pre><span style="font-family: "Times New Roman"; white-space: normal;">
</span></pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;">"kill_trailing_white_space --input_file main.cpp"</span></span></pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;">"kill_trailing_white_space --input_folder img_hash"</span></span></pre>
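<pre><span style="font-family: "Times New Roman"; white-space: normal;"> If you are curious about the heart of the tool, below is a minimal sketch of mine covering only the trailing white space part; the real tool also removes tabs, scans folders recursively and relies on boost, see the linked source codes for the details.</span></pre>
<pre class="prettyprint">#include &lt;fstream&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;

//strip the trailing white spaces of every line of one file; this
//never touches anything in the middle of a line, so the "\t" inside
//std::string("\t") survives
void strip_trailing_white_space(std::string const &amp;file_name)
{
    std::ifstream in(file_name);
    std::ostringstream buffer;
    std::string line;
    while(std::getline(in, line))
    {
        auto const pos = line.find_last_not_of(" \t");
        //npos means the whole line is white space, clear it completely
        line.erase(pos == std::string::npos ? 0 : pos + 1);
        buffer&lt;&lt;line&lt;&lt;"\n";
    }
    in.close();

    //write the cleaned contents back to the same file
    std::ofstream out(file_name);
    out&lt;&lt;buffer.str();
}
</pre>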
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;">
</span></span></pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;"> After the process, we get a clean file, as shown in Example 01.</span></span></pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;">
</span></span></pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;">
</span></span></pre>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaXNMr6eQnn4vBjFIYUzJnGZYLpoAizHJPoU6KSHShxtiTSMMCkhy3QfUj5SjyZBbhgX8AMwpdkdRQCGrc_O-C22XCh_rI0pA3UMK7kcgRwSI0ILJvgTiunlMKQfw4U5bYIyMfVknxr7A/s1600/Capture.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="326" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaXNMr6eQnn4vBjFIYUzJnGZYLpoAizHJPoU6KSHShxtiTSMMCkhy3QfUj5SjyZBbhgX8AMwpdkdRQCGrc_O-C22XCh_rI0pA3UMK7kcgRwSI0ILJvgTiunlMKQfw4U5bYIyMfVknxr7A/s400/Capture.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Example 01</td></tr>
</tbody></table>
<pre>
</pre>
<pre><span style="font-family: Times New Roman;"><span style="white-space: normal;"> You can see the help menu if you enter --help. </span></span><span style="font-family: "Times New Roman"; white-space: normal;">For now this small tool only supports files with the extensions ".hpp" and ".cpp". Feel free to modify the codes to suit your needs.</span></pre>
ThamBloghttp://www.blogger.com/profile/12617350461752184349noreply@blogger.com0