Takeaways from Deep Residual Learning for Image Recognition
- The first problem is vanishing/exploding gradients: as networks get deeper, gradients fail to update the weights effectively, which makes convergence hard. This has largely been addressed by normalized initialization and intermediate normalization layers (e.g. batch normalization).
- Although (1) is largely solved, a degradation problem shows up: as network depth increases, accuracy saturates and then degrades. A deeper network yields higher training error than a shallower one, so the cause is not overfitting.
- This paper proposes residual learning to address the degradation issue (2). The picture below shows the concept. Normally, a layer takes the input from the previous layer, applies an internal (weighted) function, and passes the result through an activation function to produce its output. A residual block additionally adds an identity shortcut from the input to the output, e.g. ReLU(W1X1) → ReLU(W1X1 + X1). This significantly mitigates the degradation issue and improves accuracy. (See pic 1)
- The results show a clear improvement over “plain” deep networks of the same depth.
- They propose 50-, 101-, and 152-layer ResNets in the paper and outperform prior benchmarks such as VGG and GoogLeNet.
- They evaluated on the ImageNet and CIFAR-10 datasets and observed consistent results across classification, detection, and localization tasks.
- They leave an open problem: the 1202-layer ResNet reaches similarly low training error but higher test error than the 110-layer one, which they infer is due to overfitting.
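The identity shortcut described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's actual implementation: `plain_layer` and `residual_layer` are hypothetical names, and a single weight matrix stands in for the block's stacked layers. It shows how the shortcut lets the input pass through even when the weighted path contributes nothing.

```python
import numpy as np

def relu(x):
    # Elementwise rectified linear activation
    return np.maximum(0.0, x)

def plain_layer(x, W):
    # "Plain" layer: output = ReLU(Wx)
    return relu(W @ x)

def residual_layer(x, W):
    # Residual block: add the identity shortcut before the
    # activation, output = ReLU(Wx + x)
    return relu(W @ x + x)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
# Degenerate weights: the weighted path contributes nothing
W = np.zeros((3, 3))

print(plain_layer(x, W))     # all zeros: the signal is lost
print(residual_layer(x, W))  # ReLU(x): the identity survives
```

Because the block only has to learn the residual on top of the identity, a stack of such blocks can never do worse than simply copying its input forward, which is the intuition behind why degradation eases.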
Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot