The additional gain in performance obtained by adding dropout in the convolutional
layers (3.02% to 2.55%) is worth noting. One may have presumed that since the convolutional layers don’t have a lot of parameters, overfitting is not a problem and therefore
dropout would not have much effect. However, dropout in the lower layers still helps because it provides noisy inputs for the higher fully connected layers which prevents them
from overfitting.
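The mechanism can be sketched with a toy example: applying (inverted) dropout to convolutional feature maps means the downstream fully connected layer sees a different noisy version of its input at every training step. The shapes, dropout rate, and layer sizes below are illustrative, not taken from the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, training=True):
    """Inverted dropout: zero each unit with probability p and
    rescale survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# Toy "convolutional" output for one example: 8 feature maps of 4x4.
conv_out = rng.standard_normal((8, 4, 4))

# Dropout on the conv output injects fresh noise into the input that
# the fully connected layer sees on every pass, which regularizes it.
noisy = dropout(conv_out, p=0.25)

# The fully connected layer consumes the flattened, noised maps.
W = rng.standard_normal((10, 8 * 4 * 4))
fc_out = W @ noisy.reshape(-1)
```

At test time the `training=False` path returns the activations unchanged, so no rescaling is needed at inference.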