ResNet

What makes a residual network different is the identity connection between the layers. The identity connection appears as the curved arrow that originates from the input and skips over the weight layers to the end of the residual block. The residual block is designed to handle the vanishing gradient problem.

Figure: ResNet residual block (figure from cv-tricks.com)

Why does the residual function work?

The creators of ResNet reasoned as follows:

“Rather than expecting stacked layers to learn to approximate H(x), the authors let the layers approximate a residual function, i.e. F(x) = H(x) − x.”

The statement above explains that when training a deep residual network, the main focus is on learning the residual function F(x). If the network can learn the difference F(x) between the desired output H(x) and the input x, the overall accuracy can improve. In other words, when the identity mapping is already close to optimal, the residual only needs to be driven towards zero, which is easier for the stacked layers than fitting a full identity mapping from scratch.
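To make the idea concrete, below is a minimal sketch of a basic residual block, assuming PyTorch; the BasicResidualBlock class, layer sizes, and names are illustrative and not taken from the original paper. The stacked convolutions learn F(x), and the identity connection adds x back before the final activation.

```python
# A minimal sketch of a basic residual block (assuming PyTorch);
# layer widths and names are illustrative, not from the original post.
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # The stacked weight layers learn the residual F(x).
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)  # H(x) = F(x) + x via the identity connection
```

For example, BasicResidualBlock(64) maps a (N, 64, H, W) tensor to a tensor of the same shape, so such blocks can be stacked repeatedly without reshaping the identity path.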

Why does the identity mapping work? How does the identity connection affect the performance of the network?

From the following figure, we can see that the gradient has two pathways: pathway 1 is the identity mapping and pathway 2 is the residual mapping. The gradient through pathway 2 is scaled by the weight layers, whereas the gradient through pathway 1 passes directly from the output back to the input, so its value is not changed by any weight layer and earlier layers always receive a usable gradient.

Figure: Gradient flow through the identity and residual pathways of a residual block (resnet_backprop)
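The effect of the identity pathway on the computed gradients can be checked with a toy example. This is only an illustrative sketch assuming PyTorch autograd, with arbitrary tensor values: because y = x + F(x), the derivative is dy/dx = 1 + dF/dx, so even when the residual branch contributes almost nothing, the identity pathway still carries a gradient of 1 back to the input.

```python
# Toy illustration (assuming PyTorch): for y = x + F(x), dy/dx = 1 + dF/dx,
# so the identity path always contributes 1 to the gradient.
import torch

x = torch.randn(4, requires_grad=True)
w = torch.zeros(4)          # a "weight layer" whose contribution is negligible
y = x + w * x               # pathway 1: x (identity), pathway 2: w * x (residual)
y.sum().backward()

print(x.grad)               # all ones: the identity pathway keeps the gradient alive
```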

ResNet-v2

Figure: ResNet-v2 residual unit (resnet-V2)
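ResNet-v2 rearranges the residual unit into a pre-activation design, where Batch Normalization and ReLU are applied before each convolution and the identity path passes through the block completely unchanged. The sketch below, again assuming PyTorch with illustrative names, shows one such pre-activation unit; it is a rough illustration, not the authors' exact implementation.

```python
# A minimal sketch of a ResNet-v2 style pre-activation residual unit
# (assuming PyTorch); BN and ReLU come before each convolution and the
# identity path carries x through unchanged.
import torch
import torch.nn as nn


class PreActResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        return out + x  # no activation after the addition: the identity path stays clean
```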

Key Features of ResNet:

- ResNet uses Batch Normalization at its core. Batch Normalization normalizes the inputs of each layer, which improves the performance of the network and mitigates the problem of internal covariate shift.
- ResNet makes use of the identity connection, which helps protect the network from the vanishing gradient problem.
- Deep residual networks use the bottleneck residual block design to keep very deep networks efficient while preserving performance (see the sketch below).
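As a rough illustration of the bottleneck design mentioned above, the sketch below (assuming PyTorch; the BottleneckBlock name, channel widths, and the projection shortcut are illustrative) uses a 1x1 convolution to reduce the channel count, a 3x3 convolution on the reduced width, and a final 1x1 convolution to expand the channels again before the addition.

```python
# A minimal sketch of a bottleneck residual block (assuming PyTorch):
# 1x1 conv reduces channels, 3x3 conv operates on the reduced width,
# and a final 1x1 conv restores the channel count before the addition.
import torch
import torch.nn as nn


class BottleneckBlock(nn.Module):
    expansion = 4  # output channels = bottleneck channels * expansion

    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        out_channels = bottleneck * self.expansion
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(bottleneck)
        self.conv = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(bottleneck)
        self.expand = nn.Conv2d(bottleneck, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Project the identity with a 1x1 conv when the channel counts differ.
        self.shortcut = (nn.Identity() if channels == out_channels
                         else nn.Conv2d(channels, out_channels, kernel_size=1, bias=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.conv(out)))
        out = self.bn3(self.expand(out))
        return self.relu(out + self.shortcut(x))
```

Because most of the computation happens at the reduced channel width, this design keeps the cost of very deep networks manageable while the identity connection still bypasses the whole block.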