Experience in Training CNNs

We usually see two problems when training a CNN. The first is that the model performs poorly on the training data (an optimization problem). The second is that the model performs well on the training data but poorly on the validation data (overfitting).

For overfitting, one possible cause is that your training data is very different from your validation data. This can happen for many reasons: you may have collected the two datasets at different times, or used different data collection strategies. To diagnose this kind of problem, try visualizing the data to get an intuitive feel for it. In the simplest case, you may find that the validation dataset contains more images collected at night. Then you can add this kind of data to your training set, or use data augmentation techniques to generate similar images from your training set. How to solve a dataset inconsistency problem depends heavily on your dataset. Here is an example from a Kaggle winner.
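As a sketch of the night-image idea above, you can darken daytime images to mimic nighttime conditions. This is a minimal example, assuming 8-bit RGB arrays; the darkening factor range is a hypothetical choice you would tune against your actual validation images:

```python
import numpy as np

rng = np.random.default_rng(0)

def darken(image, factor_range=(0.2, 0.5)):
    """Simulate a nighttime image by scaling pixel intensities down.

    factor_range is a made-up default; pick it to match how dark
    your real validation images are."""
    factor = rng.uniform(*factor_range)
    return np.clip(image * factor, 0, 255).astype(image.dtype)

# A stand-in "daytime" image; in practice this comes from your training set.
day_image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
night_like = darken(day_image)
print("mean brightness:", day_image.mean(), "->", night_like.mean())
```

In a real pipeline you would apply this (with random factors) on the fly during training rather than materializing a darkened copy of the dataset.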

After solving the dataset inconsistency problem, the next step is to decrease the loss on the training dataset.

Sometimes you will see that some activation layers are dead (their output is always zero). First check whether you have added batch normalization before the activation layer. If you already have a batch normalization layer, check whether its eps is set too large. If the problem is caused by ReLU, you can try modifying the initialization of the bias (and of beta in BN), e.g. setting it to +1. If the first activation layer is outputting zeros, check your input data's preprocessing and your parameter initialization.
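A minimal NumPy illustration of why shifting the bias (or BN's beta) toward +1 helps: if the pre-activations sit mostly below zero, ReLU zeroes out nearly everything, and those units receive no gradient. The bias values here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Simulated pre-activations: 1000 inputs through a layer of 64 units.
pre = rng.standard_normal((1000, 64))

# A strongly negative bias pushes almost all pre-activations below zero,
# so the layer outputs (almost) all zeros: a "dead" layer.
dead_fraction_bad = (relu(pre - 3.0) == 0).mean()

# A bias (or BN beta) initialized at +1 keeps most units active.
dead_fraction_good = (relu(pre + 1.0) == 0).mean()

print(f"dead fraction, bias=-3: {dead_fraction_bad:.3f}")
print(f"dead fraction, bias=+1: {dead_fraction_good:.3f}")
```

In a real network you would measure this the same way: hook each activation layer's output and report the fraction of exact zeros per batch.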

After building a good dataset and reducing the loss on the training set, we should focus on the model's performance on the validation set. Keep increasing the capacity of the model until you see overfitting. If the model cannot perfectly overfit the training dataset, there may be mislabeled data in it.
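A quick way to run this "can the model overfit?" check is to train on a tiny, hand-verified subset and confirm the training loss goes to near zero. A minimal sketch, with a small logistic-regression model standing in for a CNN and made-up data and hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny, cleanly labeled dataset: any model with enough capacity
# should drive the training loss on it close to zero.
X = rng.standard_normal((16, 4))
w_true = rng.standard_normal(4)
y = (X @ w_true > 0).astype(float)

# Plain gradient descent on the logistic loss.
w = np.zeros(4)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.5 * X.T @ (p - y) / len(y)

p = 1.0 / (1.0 + np.exp(-(X @ w)))
train_loss = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()
train_acc = ((p > 0.5) == (y == 1)).mean()
print(f"train loss: {train_loss:.4f}, train accuracy: {train_acc:.2f}")
```

If your real model fails the equivalent test on a small batch of your actual data, inspecting the labels of that batch by hand is usually the fastest way to find mislabeled examples.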