The first tool I tried was dlib. Although this library lacks a lot of features compared with other deep learning toolboxes, I still like it very much, especially the fact that dlib can work as a zero-dependency library.
What I have learned
1 : Remember to record the parameters you used
At first I wrote the records down in my header file, which made my code harder to read as time went on. I should have saved those records in an Excel-like format from the beginning. The other thing I learned is that I should record the parameters even when I am running out of time. Without the records, I panicked more as the deadline got closer.
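A minimal sketch of keeping such records in a spreadsheet-friendly CSV file. The function name `append_run`, the file path, and the parameter names are hypothetical, and the values in the usage note are made up for illustration. It assumes every run logs the same set of parameter names, so the columns stay aligned.

```python
import csv
import os


def append_run(csv_path, params, score):
    """Append one experiment (its parameters plus the score) as a CSV row."""
    row = dict(params, score=score)
    is_new_file = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        # Sort the column names so every run writes them in the same order.
        writer = csv.DictWriter(f, fieldnames=sorted(row))
        if is_new_file:
            writer.writeheader()
        writer.writerow(row)
```

Calling something like `append_run("runs.csv", {"lr": 0.01, "batch_size": 32}, 0.5)` after each training run takes a few seconds, and the file opens directly in Excel or LibreOffice.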
2 : Feeding pseudo labels into the mini-batch in a naive way does not work
I should split the data of each mini-batch by some ratio, such as 2/3 true labels and 1/3 pseudo labels.
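One way to enforce such a ratio is sketched below. This is not the code I used in the competition. The name `mixed_batches` and its arguments are hypothetical, and it assumes both data sets are plain Python lists of samples, with at least as many pseudo-labeled samples as one batch needs.

```python
import random


def mixed_batches(labeled, pseudo, batch_size, pseudo_fraction=1 / 3, seed=0):
    """Yield mini-batches that contain a fixed share of pseudo-labeled samples."""
    rng = random.Random(seed)
    n_pseudo = int(round(batch_size * pseudo_fraction))
    n_true = batch_size - n_pseudo
    labeled = labeled[:]  # copy, so the caller's list is not shuffled
    rng.shuffle(labeled)
    # One pass over the true labels; an incomplete last batch is dropped.
    for start in range(0, len(labeled) - n_true + 1, n_true):
        batch = labeled[start:start + n_true] + rng.sample(pseudo, n_pseudo)
        rng.shuffle(batch)
        yield batch
```

With `batch_size=6` every batch holds 4 samples with true labels and 2 with pseudo labels, so the pseudo labels can never dominate a gradient step.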
3 : Leveraging a pretrained model makes it much easier to get good results
At first I did not use a pretrained model but trained the network from scratch. This did not give me great results, especially since I could not afford to train on larger images because my GPU has only 2GB of RAM. My score was 0.27468 with the brand new model. To speed things up, I treated the resnet34 of dlib as a feature extractor, saved the features extracted by resnet34, and trained a new network on them. This pushed my score to 0.09627, a big improvement.
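The "extract once, save, then train on the saved features" idea can be sketched as follows. This is only an illustration in Python, not my dlib code: `cached_features` is a hypothetical helper, and `extract` stands in for whatever pretrained network turns one image into a feature vector.

```python
import os
import pickle


def cached_features(images, extract, cache_path):
    """Run the expensive extractor once and reuse the saved features afterwards."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    features = [extract(image) for image in images]
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features
```

The small network trained on top then only reads the cached vectors, so every later experiment skips the forward pass through the big pretrained model.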
To improve my score further, I split the data set into 5 cross-validation folds and ensembled the results by averaging them. This pushed my score to 0.06266. I did not apply stacking because I learned about this technique after the competition had finished. Maybe I could have gotten better results if I had known about it earlier.
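A minimal sketch of the two pieces involved, splitting into folds and averaging the predictions of the per-fold models. The names `k_fold_indices` and `average_predictions` are hypothetical, and it assumes each model outputs one list of class probabilities per test image.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def average_predictions(model_predictions):
    """Average class probabilities; model_predictions[m][i][c] is the
    probability of class c for image i according to model m."""
    n_models = len(model_predictions)
    averaged = []
    for per_image in zip(*model_predictions):
        # per_image holds one probability vector per model for the same image.
        averaged.append([sum(p) / n_models for p in zip(*per_image)])
    return averaged
```

Each fold trains one model on its `train` indices, every model predicts the whole test set, and `average_predictions` merges those outputs into the final submission.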
5 : How to use dlib, keras and mxnet
I put the dlib and mxnet code on github. I removed all of the solutions that did not work, which is why you do not see any code related to keras. keras did not help me improve my score but mxnet did: with mxnet I fine-tuned all of the pretrained resnet models and ensembled them with the results trained by dlib. This improved my score to 0.05051.
6 : Read the posts on the forums, they may give you useful info
I learned from the forums that the data set has some "errors" in it, so I removed those faulty images from the training data set.
7 : The Fast ai course is awesome, I should have watched it earlier
If I had watched the videos before I entered this competition, I believe I would have performed better. The forum of this course is very helpful too, and it is free and open.
8 : X-Crop validation may help you improve your score
I found this technique through PyImageSearch and dlib, but I did not have enough time to try it out.
9 : Save settings in JSON format
Rather than hard-coding the parameters, it is better to save them in a JSON file, because
a : I do not need to change the source code frequently, which saves compile time
b : Every model and experiment can have its own settings record, which makes the training results easier to reproduce
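A minimal sketch of loading such a settings file, written in Python for brevity. The keys in `DEFAULTS` and the function name `load_settings` are hypothetical examples, not the settings I actually used.

```python
import json

# Hypothetical defaults; a settings file only needs to list what it changes.
DEFAULTS = {"learning_rate": 0.01, "batch_size": 32, "epochs": 10}


def load_settings(path):
    """Read one experiment's settings from JSON and fill in the defaults."""
    with open(path) as f:
        loaded = json.load(f)
    unknown = set(loaded) - set(DEFAULTS)
    if unknown:
        # Fail loudly on typos instead of silently ignoring a misspelled key.
        raise KeyError("unknown settings: %s" % sorted(unknown))
    settings = dict(DEFAULTS)
    settings.update(loaded)
    return settings
```

Keeping one JSON file next to each trained model means a result can be reproduced later by pointing the program at that file, without touching or recompiling the source.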