Dive into Deep Learning Note 1 (Weight Decay)

Weight Decay

  • A regularization technique used to mitigate overfitting

Derivation

For the usual squared loss of a linear model \[ L(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^n \frac{1}{2}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)^2.\tag{1} \]
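A minimal numeric sketch of equation \((1)\) may help; the data and parameter values below are purely illustrative, not from the original post:

```python
import numpy as np

# Illustrative weights, bias, inputs, and labels
w = np.array([1.0, -2.0])
b = 0.5
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, -1.0, 0.0])

# Equation (1): average of (1/2) * (w^T x^(i) + b - y^(i))^2
residual = X @ w + b - y
loss = np.mean(0.5 * residual**2)
print(loss)  # 0.125
```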

Using the squared norm as a hard constraint

\[ \min_{\mathbf{w},b}\ L(\mathbf{w},b) \quad \text{subject to} \quad \|\mathbf{w}\|^2 \le \theta \]

  • A small value of \(\theta\) means stronger regularization

Using weight decay as a soft constraint

\[ \min_{\mathbf{w},b}\ L(\mathbf{w},b)+\frac{\lambda}{2}\|\mathbf{w}\|^2\tag{2} \]

Compute the gradient: \[ \frac{\partial}{\partial \mathbf{w}}\left(L(\mathbf{w},b)+\frac{\lambda}{2}\|\mathbf{w}\|^2\right)=\frac{\partial L(\mathbf{w},b)}{\partial \mathbf{w}}+\lambda \mathbf{w} \tag{3} \] The parameter update at step \(t+1\) is: \[ \mathbf{w}_{t+1}=\mathbf{w}_{t}-\eta\,\frac{\partial}{\partial \mathbf{w}_t}\left(L(\mathbf{w}_t,b)+\frac{\lambda}{2}\|\mathbf{w}_t\|^2\right)\tag{4} \] Substituting \((3)\) into \((4)\): \[ \mathbf{w}_{t+1}=(1-\eta \lambda)\mathbf{w}_{t}-\eta\frac{\partial}{\partial \mathbf{w}_{t}}L(\mathbf{w}_t,b)\tag{I} \] Compare this with plain gradient descent applied to \((1)\): \[ \mathbf{w}_{t+1}=\mathbf{w}_{t}-\eta\frac{\partial}{\partial \mathbf{w}_{t}}L(\mathbf{w}_t,b)\tag{II} \] Since in practice \(\eta \lambda<1\), each update in \((I)\) first shrinks the weights by the factor \(1-\eta\lambda\) before taking the usual gradient step; this shrinkage at every update is why the method is called weight decay.
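The equivalence between stepping on the regularized objective (equation \((4)\)) and decaying the weights before the plain gradient step (equation \((I)\)) can be checked numerically; the values below are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)         # current weights w_t
grad_L = rng.normal(size=3)    # stand-in for dL/dw at w_t
eta, lam = 0.1, 0.05           # learning rate and weight-decay coefficient

# Equation (4): one step on the regularized objective L + (lam/2)||w||^2
w_next_a = w - eta * (grad_L + lam * w)

# Equation (I): decay the weights by (1 - eta*lam), then step on plain L
w_next_b = (1 - eta * lam) * w - eta * grad_L

print(np.allclose(w_next_a, w_next_b))  # True
```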

Code Implementation

import tensorflow as tf

def generate_data(num_inputs, dimension, true_w, true_b, stddev=0.01):
    # Broadcast the scalar true_w into a (dimension, 1) weight vector
    true_w = tf.zeros((dimension, 1)) + true_w
    X = tf.random.normal(shape=(num_inputs, dimension))
    y = tf.matmul(X, true_w) + true_b
    # Add Gaussian noise to the labels
    y += tf.random.normal(shape=y.shape, stddev=stddev)
    return X, y

def load_data(n_train, n_test, is_train, batch_size, true_w, true_b, dimension=10):
    train_set = tf.data.Dataset.from_tensor_slices(
        generate_data(n_train, dimension, true_w, true_b))
    test_set = tf.data.Dataset.from_tensor_slices(
        generate_data(n_test, dimension, true_w, true_b))
    if is_train:
        train_set = train_set.shuffle(buffer_size=1000)
        test_set = test_set.shuffle(buffer_size=1000)
    train_set = train_set.batch(batch_size)
    test_set = test_set.batch(batch_size)
    return train_set, test_set

def train(regularizers, n_epochs=50):
    # Single linear layer; the L2 kernel regularizer implements weight decay.
    # Uses the global train_set produced by load_data.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(
            units=1,
            kernel_regularizer=tf.keras.regularizers.l2(regularizers))
    ])
    model.compile(optimizer='sgd', loss='mse', metrics=['mae'])
    model.fit(train_set, epochs=n_epochs)
    return model

def test(model):
    # Compare the loss on the (small) training set against the test set
    loss_train = model.evaluate(train_set)
    loss_test = model.evaluate(test_set)
    return loss_train, loss_test
train_set, test_set = load_data(n_train=20, n_test=200, is_train=False, batch_size=5, true_w=1, true_b=0.5)
model = train(regularizers=0)
loss_train, loss_test = test(model)
loss_train, loss_test

Adjust the regularization coefficient to a suitable value and verify that weight decay narrows the gap between training and test loss.
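One detail worth noting when tuning: `tf.keras.regularizers.l2(c)` adds \(c\,\sum w^2\) to the loss, with no \(\frac{1}{2}\) factor, so matching the \(\frac{\lambda}{2}\|\mathbf{w}\|^2\) penalty of equation \((2)\) means passing \(c=\lambda/2\). A small sketch to check this (the constant initializer and shapes here are illustrative):

```python
import numpy as np
import tensorflow as tf

lam = 3.0
layer = tf.keras.layers.Dense(
    units=1, use_bias=False,
    kernel_initializer=tf.keras.initializers.Constant(2.0),
    kernel_regularizer=tf.keras.regularizers.l2(lam / 2),
)
layer.build(input_shape=(None, 4))  # creates a (4, 1) kernel filled with 2.0

# Keras collects the regularization term in layer.losses
penalty = tf.add_n(layer.losses).numpy()
expected = lam / 2 * np.sum(np.square(layer.kernel.numpy()))
print(penalty, expected)  # both 24.0: (3/2) * 4 weights * 2^2
```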


Dive into Deep Learning Note 1 (Weight Decay)
https://blog.potential.icu/2024/01/25/动手深度学习note-1(权重衰退)/
Author: Xt-Zhu
Posted on: January 25, 2024