Repost:
Batch gradient descent and stochastic gradient descent differ mainly in how theta is updated. Batch gradient descent uses all of the training samples to compute each update of theta, whereas stochastic gradient descent picks a single sample at random and takes the derivative with respect to theta using that sample alone. As a result, each stochastic update is much cheaper to compute, but the updates are noisier and the method may end up at a locally (or only approximately) optimal solution.
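To make the two update rules concrete, they can be written out for the linear least-squares model used in the code below (these are the standard formulas, using the same symbols as the code; they are spelled out here for reference, not taken from the original post):

\theta \leftarrow \theta - \frac{\alpha}{m}\, X^{\top}(X\theta - y) \qquad \text{(batch: all } m \text{ samples per update)}

\theta \leftarrow \theta - \alpha\,\big(x_k^{\top}\theta - y_k\big)\, x_k, \quad k \text{ drawn uniformly at random from } \{1,\dots,m\} \qquad \text{(stochastic: one sample per update)}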
The code implementation is as follows:
# coding:utf-8
import numpy as np
import random


def BGD(x, y, theta, alpha, m, max_iteration):
    """
    Batch gradient descent (batch_Gradient_Descent)
    :param x: train_data
    :param y: train_label
    :param theta: initial weights
    :param alpha: learning rate
    :param m: number of training samples
    :param max_iteration: number of iterations
    :return: learned weights
    """
    x_train = x.transpose()
    for i in range(0, max_iteration):
        hypothesis = np.dot(x, theta)  # model output (no activation)
        loss = hypothesis - y          # error term
        # gradient computed over all m samples
        gradient = np.dot(x_train, loss) / m
        # update theta along the negative gradient
        theta = theta - alpha * gradient
    return theta


def SGD(x, y, theta, alpha, m, max_iteration):
    """
    Stochastic gradient descent (stochastic_Gradient_Descent)
    :param x: train_data
    :param y: train_label
    :param theta: initial weights
    :param alpha: learning rate
    :param m: number of training samples
    :param max_iteration: number of iterations
    :return: learned weights
    """
    data = list(range(m))  # candidate sample indices (originally hard-coded as range(4))
    for i in range(0, max_iteration):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y  # error term
        index = random.sample(data, 1)[0]  # pick one sample index at random
        # gradient from the single chosen sample
        gradient = loss[index] * x[index]
        # update theta along the negative gradient
        theta = theta - alpha * gradient
    return theta

    # The following version also works and does less computation,
    # since it only evaluates the model on the chosen sample:
    # data = list(range(m))
    # for i in range(0, max_iteration):
    #     index = random.sample(data, 1)[0]     # pick one sample index at random
    #     hypothesis = np.dot(x[index], theta)  # model output (no activation)
    #     loss = hypothesis - y[index]          # error term
    #     gradient = loss * x[index]            # gradient from the single sample
    #     theta = theta - alpha * gradient
    # return theta


def main():
    train_data = np.array([[1, 4, 2], [2, 5, 3], [5, 1, 6], [4, 2, 8]])
    train_label = np.array([19, 26, 19, 20])
    m, n = np.shape(train_data)  # m: number of samples, n: number of features

    theta = np.ones(n)   # initialize all weights to 1 (np.ones returns float64 by default)
    max_iteration = 500  # number of iterations
    alpha = 0.01         # learning rate
    # --------------------------------------------------------------------------------
    theta1 = BGD(train_data, train_label, theta, alpha, m, max_iteration)
    print(theta1)

    theta2 = SGD(train_data, train_label, theta, alpha, m, max_iteration)
    print(theta2)


if __name__ == "__main__":
    main()
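As a quick sanity check (not part of the original post), both results can be compared against NumPy's closed-form least-squares solution for the same data; this is only a verification sketch and assumes the same train_data and train_label as above:

import numpy as np

train_data = np.array([[1, 4, 2], [2, 5, 3], [5, 1, 6], [4, 2, 8]])
train_label = np.array([19, 26, 19, 20])

# Closed-form solution of min ||X * theta - y||^2; BGD/SGD should converge toward it.
theta_exact, residuals, rank, sv = np.linalg.lstsq(train_data, train_label, rcond=None)
print(theta_exact)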
Output: