Dive into Deep Learning Notes 5 (ResNet)

Deep Residual Learning for Image Recognition

Model Principles

Derivation

Starting from a plain CNN

  • Suppose the computation of one layer is simplified to \(y = f(x) = wx + b\). One weight-update step is:

    \[ \text{gradient: } \frac{\partial{y}}{\partial{w}} \]

    \[ w = w - \alpha \frac{\partial{y}}{\partial{w}}\tag{1} \]

  • When the output is passed on to the next layer and we compute the gradient there: \[ y'=g(f(x)) \]

    \[ \text{gradient: } \frac{\partial{y'}}{\partial{w}}=\frac{\partial{y'}}{\partial{y}}\frac{\partial{y}}{\partial{w}}\tag{2} \]

Comparing \((1)\) with \((2)\): in a plain CNN the gradient reaching an early layer is a product of per-layer derivative factors. When those factors are smaller than one, the product shrinks layer by layer until the gradient effectively vanishes, the parameters stop updating, and deep networks become untrainable.
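The multiplicative shrinkage in \((2)\) is easy to see numerically. A minimal sketch in plain Python (the per-layer derivative of 0.5 is an illustrative assumption, not a value from the text) chains the factors the way backpropagation does:

```python
# Sketch: a chain of per-layer derivatives shrinks multiplicatively.
# The factor 0.5 is an illustrative assumption.

def chained_gradient(local_grad, per_layer_derivative, depth):
    """Multiply the local gradient by one factor per upstream layer,
    as the chain rule in equation (2) does."""
    grad = local_grad
    for _ in range(depth):
        grad *= per_layer_derivative
    return grad

shallow = chained_gradient(1.0, 0.5, 5)   # 0.5**5  = 0.03125
deep = chained_gradient(1.0, 0.5, 50)     # 0.5**50 ~ 8.9e-16
print(shallow, deep)
```

At 50 layers the gradient is already below float32 resolution, which matches the claim that deep plain CNNs stop updating.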

How ResNet solves this

  • Again taking the parameters passed to the second layer as an example, a skip connection adds the input of the new layer to its output: \[ y'=g(f(x))+f(x) \]

    \[ \text{gradient: } \frac{\partial{y'}}{\partial{w}}=\frac{\partial{g}}{\partial{y}}\frac{\partial{y}}{\partial{w}}+\frac{\partial{y}}{\partial{w}}\tag{2'} \]

Looking at \((2')\), and setting aside the degenerate case where the gradient is zero from the very start: even if the newly added layer suffers gradient vanishing during training, i.e. \(\frac{\partial{g}}{\partial{y}}\frac{\partial{y}}{\partial{w}}=0\) because \(\frac{\partial{g}}{\partial{y}}=0\), the gradient still contains the skip term \(\frac{\partial{y}}{\partial{w}}\) coming from the layer below. Tracing back layer by layer, the identity path always contributes its own term, so the gradient is a sum rather than a pure product of per-layer factors and no longer shrinks multiplicatively. This is what lets ResNet train much deeper networks.
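The effect of the extra term in \((2')\) can be sketched with toy numbers (both values below are illustrative assumptions): even when the new layer's derivative is driven to zero, the sum still carries the skip term.

```python
def plain_gradient(g_prime, dy_dw):
    # Equation (2): the gradient flows only through the new layer g
    return g_prime * dy_dw

def residual_gradient(g_prime, dy_dw):
    # Equation (2'): the identity path adds dy_dw regardless of g
    return g_prime * dy_dw + dy_dw

# Worst case for the plain network: the new layer's derivative is zero
print(plain_gradient(0.0, 0.8))     # 0.0 -> gradient vanishes
print(residual_gradient(0.0, 0.8))  # 0.8 -> the skip term keeps it alive
```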

Implementation

Download the dataset for this experiment and store it in Google Drive at /Dataset/leaves.zip.

View in Colab
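In Colab the Drive is typically mounted with `google.colab.drive.mount` and the archive unpacked once. A hedged sketch using only the standard library (the commented Drive path mirrors the /Dataset/leaves.zip location above but is an assumption; adjust it to your own layout):

```python
import zipfile
from pathlib import Path

def extract_dataset(zip_path, target_dir):
    """Unpack the leaves archive once; skip if already extracted."""
    target = Path(target_dir)
    if not target.exists():
        target.mkdir(parents=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
    return target

# In Colab, after drive.mount('/content/drive'), this would look like:
# extract_dataset('/content/drive/MyDrive/Dataset/leaves.zip', '/content/leaves')
```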

Implementing the residual connection

import tensorflow as tf

class Residual(tf.keras.Model):
    """Bottleneck residual unit: 1x1 -> 3x3 -> 1x1 convs plus a skip connection."""
    def __init__(self, num_channels, use_Residual=False, strides=1):
        super().__init__()
        self.conv1 = tf.keras.layers.Conv2D(
            num_channels, kernel_size=1, strides=strides)
        self.conv2 = tf.keras.layers.Conv2D(
            num_channels, kernel_size=3, padding='same', strides=1)
        # The final 1x1 conv widens the output to 4 * num_channels
        self.conv3 = tf.keras.layers.Conv2D(
            num_channels * 4, kernel_size=1, strides=1)
        self.conv4 = None
        if use_Residual:
            # 1x1 conv on the shortcut so its shape matches the main path
            self.conv4 = tf.keras.layers.Conv2D(
                num_channels * 4, kernel_size=1, strides=strides)
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.bn3 = tf.keras.layers.BatchNormalization()

    def call(self, X):
        Y = tf.keras.activations.relu(self.bn1(self.conv1(X)))
        Y = tf.keras.activations.relu(self.bn2(self.conv2(Y)))
        Y = self.bn3(self.conv3(Y))  # normalize before the addition
        if self.conv4 is not None:
            X = self.conv4(X)
        Y += X  # the residual (skip) connection
        return tf.keras.activations.relu(Y)
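A quick shape check of the bottleneck pattern, assuming TensorFlow is available. This sketch rebuilds the three convolutions directly from Keras layers (rather than reusing the class above) to show why a stride-2 unit needs the 1×1 shortcut conv: the main path ends with 4×num_channels channels at half the spatial size, so the shortcut must match before the addition. The input shape is an illustrative assumption.

```python
import tensorflow as tf

num_channels, strides = 64, 2
X = tf.random.normal((1, 56, 56, 256))  # assumed input, e.g. an earlier stage's output

# Main path: 1x1 conv (stride 2) -> 3x3 conv -> 1x1 conv widening to 4 * num_channels
Y = tf.keras.layers.Conv2D(num_channels, kernel_size=1, strides=strides)(X)
Y = tf.keras.layers.Conv2D(num_channels, kernel_size=3, padding='same')(Y)
Y = tf.keras.layers.Conv2D(num_channels * 4, kernel_size=1)(Y)

# Shortcut: 1x1 conv with the same stride, so `Y + shortcut` is well defined
shortcut = tf.keras.layers.Conv2D(
    num_channels * 4, kernel_size=1, strides=strides)(X)

out = Y + shortcut
print(out.shape)  # (1, 28, 28, 256)
```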

Implementing the residual block

class ResnetBlock(tf.keras.layers.Layer):
    """A ResNet stage: `num_residuals` bottleneck units of the same width."""
    def __init__(self, num_channels, num_residuals,
                 first_block=False, **kwargs):
        super(ResnetBlock, self).__init__(**kwargs)
        self.residual_layers = []
        for i in range(num_residuals):
            if i == 0:
                if not first_block:
                    # First unit of a later stage halves the spatial size
                    self.residual_layers.append(
                        Residual(num_channels, use_Residual=True, strides=2))
                else:
                    # First stage: no downsampling, but the shortcut still
                    # needs a 1x1 conv to widen to 4 * num_channels
                    self.residual_layers.append(
                        Residual(num_channels, use_Residual=True))
            else:
                self.residual_layers.append(Residual(num_channels))

    def call(self, X):
        for layer in self.residual_layers:  # plain Python list, not `.layers`
            X = layer(X)
        return X
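Because only the first unit of each non-first stage uses stride 2, the feature map halves exactly once per stage. The spatial sizes through the whole network can be sketched arithmetically (assuming a 224×224 input; the stem conv and max-pool each use stride 2 with 'same' padding, i.e. ceiling division):

```python
import math

def same_padding_stride2(size):
    """Output size of a stride-2 conv/pool with 'same' padding."""
    return math.ceil(size / 2)

size = 224
size = same_padding_stride2(size)  # stem 7x7 conv, stride 2 -> 112
size = same_padding_stride2(size)  # 3x3 max-pool, stride 2  -> 56
sizes = [size]                     # first stage keeps the size (first_block=True)
for _ in range(3):                 # the other three stages each halve it once
    size = same_padding_stride2(size)
    sizes.append(size)
print(sizes)  # [56, 28, 14, 7]
```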

Building ResNet-50

def net():
    return tf.keras.Sequential([
        tf.keras.layers.Resizing(224, 224),
        tf.keras.layers.Rescaling(1. / 255),
        tf.keras.layers.Conv2D(64, kernel_size=7, strides=2, padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'),
        ResnetBlock(64, 3, first_block=True),
        ResnetBlock(128, 4),
        ResnetBlock(256, 6),
        ResnetBlock(512, 3),
        tf.keras.layers.GlobalAvgPool2D(),
        tf.keras.layers.Dense(units=176)])  # one output per leaf class
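As a sanity check on the name, the conventional layer count for this configuration: the stem conv, three convs per bottleneck unit across the 3+4+6+3 units, plus the final Dense layer (pooling and batch-norm layers are conventionally not counted):

```python
units_per_stage = [3, 4, 6, 3]  # ResnetBlock(64,3), (128,4), (256,6), (512,3)
convs_per_unit = 3              # each bottleneck unit: 1x1 -> 3x3 -> 1x1
depth = 1 + sum(units_per_stage) * convs_per_unit + 1  # stem conv + final Dense
print(depth)  # 50
```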

[Figure: resnet-layer]

[Figure: resnet]


Dive into Deep Learning Notes 5 (ResNet)
https://blog.potential.icu/2024/02/26/2024-2-26-动手深度学习note-5(ResNet)/
Author: Xt-Zhu
Posted on: February 26, 2024
Licensed under