Logistic regression (对数几率回归, also commonly written as 逻辑回归 in Chinese) is used for classification tasks.
Pros: low computational cost; easy to understand and implement.
Cons: prone to underfitting; classification accuracy may not be high.
When using a linear model for a binary classification task, we need a monotonic, differentiable function that links the true label y to the prediction of the linear regression model. In binary classification the output is either 0 or 1, while a linear model produces an arbitrary continuous value, so the two must be connected by a mapping. One candidate is the Heaviside step function (unit step function):

$$
f(x) = \begin{cases} 0 & x<0 \\ 0.5 & x=0 \\ 1 & x>0 \end{cases}
$$

However, this function is discontinuous, which makes it awkward to work with later, so we use a monotonic, differentiable function that approximates the step function instead: the logistic function (sigmoid):

$$
y = \frac{1}{1+e^{-z}}
$$

Feeding the output z of the linear model into this function gives g(z); the sample is classified as 1 when g(z) > 0.5 (i.e. z > 0), and as 0 otherwise.
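As a quick numerical illustration of this decision rule (this snippet is mine, not from the book; only numpy is assumed):

import numpy as np

def sigmoid(z):
    # the logistic (sigmoid) function maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

for z in (-2.0, 0.0, 2.0):
    p = sigmoid(z)
    label = 1 if p > 0.5 else 0   # classify as 1 when g(z) > 0.5, i.e. z > 0
    print("z=%+.1f  g(z)=%.3f  label=%d" % (z, p, label))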
Let z = θᵀx be the linear regression model, so that

$$
g(z) = \frac{1}{1+e^{-\theta^{T}x}} \Longrightarrow \frac{1-g(z)}{g(z)} = e^{-\theta^{T}x}
$$

Interpreting g(z) as the probability that the sample belongs to class 1:

$$
P\{y = 1 \mid x;\theta\} = \frac{1}{1+e^{-\theta^{T}x}} = h_\theta(x), \qquad
P\{y = 0 \mid x;\theta\} = \frac{e^{-\theta^{T}x}}{1+e^{-\theta^{T}x}} = 1-h_\theta(x)
$$

The likelihood of a single sample is then

$$
L(h_\theta(x),y) = h_\theta(x)^{y}\,(1-h_\theta(x))^{1-y}
$$

Following maximum likelihood, we maximize the likelihood over all n samples:

$$
L(\theta) = \prod_{i=1}^{n} h_\theta(x^{i})^{y^{i}}\,(1-h_\theta(x^{i}))^{1-y^{i}}
$$

Taking the logarithm gives the log-likelihood:

$$
J(\theta) = \sum_{i=1}^{n}\Big[y^{i}\log\big(h_\theta(x^{i})\big)+(1-y^{i})\log\big(1-h_\theta(x^{i})\big)\Big]
$$

Gradient ascent is used to find the maximum likelihood estimate of θ:

$$
\theta_j := \theta_j + \alpha\,\frac{\partial J(\theta)}{\partial \theta_j}
$$

Applying the chain rule (for a single sample):

$$
\frac{\partial J(\theta)}{\partial \theta_j}
= \frac{\partial J(\theta)}{\partial h_\theta(x)}\,
  \frac{\partial h_\theta(x)}{\partial \theta^{T}x}\,
  \frac{\partial \theta^{T}x}{\partial \theta_j}
$$

$$
\frac{\partial J(\theta)}{\partial h_\theta(x)} = \frac{y}{h_\theta(x)} - \frac{1-y}{1-h_\theta(x)}, \qquad
\frac{\partial h_\theta(x)}{\partial \theta^{T}x} = \frac{e^{-\theta^{T}x}}{(1+e^{-\theta^{T}x})^{2}} = h_\theta(x)\big(1-h_\theta(x)\big), \qquad
\frac{\partial \theta^{T}x}{\partial \theta_j} = x_j
$$

Multiplying these factors together gives

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \big(y-h_\theta(x)\big)\,x_j
\quad\Longrightarrow\quad
\theta_j := \theta_j + \alpha\sum_{i=1}^{m}\big(y^{i} - h_\theta(x^{i})\big)\,x_j^{i}
$$

This update rule is used to find the optimal regression parameters.
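To make the connection with the matrix-based implementation explicit, the per-component update above can also be written in vector form (this is just a restatement of the derived update, where X is the m×n data matrix whose first column is all 1s for the intercept, y is the m×1 label vector, and h_θ(X) = sigmoid(Xθ)):

$$
\theta := \theta + \alpha\, X^{T}\big(y - h_\theta(X)\big)
$$

This is exactly what the expression weights + alpha * dataMat.transpose() * error computes in the code below.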
Here we again use the example from 《机器学习实战》 (Machine Learning in Action); the implementation mainly turns the parameter estimation into matrix operations, which greatly simplifies the computation:
# -*- coding:utf-8 -*-
"""
@author liuning
@date 2019/10/26 16:13
@File logRegres.py
@Description logistic regression with batch gradient ascent
"""
from numpy import mat, shape, ones, exp, array, arange


def loadDataSet():
    dataMat = []; labelMat = []
    with open('testSet.txt', 'r') as f:
        for line in f.readlines():
            lineArr = line.strip().split()
            dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])  # prepend 1.0 for the intercept term
            labelMat.append(int(lineArr[2]))
    return dataMat, labelMat


def sigmoid(inX):
    """
    Regression function
    :param inX:
    :return: 1/(1+exp(-inX)), the sigmoid (S-shaped) function
    """
    return 1 / (1 + exp(-inX))


def gradAscent(dataMat, labelMat):
    """
    Batch gradient ascent
    """
    dataMat = mat(dataMat)
    labelMat = mat(labelMat).transpose()
    m, n = shape(dataMat)      # m samples, n features (including the intercept column)
    alpha = 0.001              # step size
    generations = 500
    weights = ones((n, 1))
    # weights update rule from the derivation above (maximum likelihood / gradient ascent)
    for k in range(generations):
        h = sigmoid(dataMat * weights)
        error = labelMat - h
        weights = weights + alpha * dataMat.transpose() * error
    return weights


def plotBestFit(wei):
    """
    Plot the data points and the fitted decision boundary
    """
    import matplotlib.pyplot as plt
    weights = wei.getA()
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1]); ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1]); ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    y = (-weights[0] - weights[1] * x) / weights[2]   # decision boundary: w0 + w1*x1 + w2*x2 = 0
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()


if __name__ == '__main__':
    inMat, labelMat = loadDataSet()
    weights = gradAscent(inMat, labelMat)
    print(weights)
    plotBestFit(weights)

When the data set is very large, computing the update over all samples at once is too expensive. Stochastic gradient ascent updates the parameter values with a single sample at a time; because the parameters can be updated incrementally, it is an "online learning" algorithm.
A related algorithm is mini-batch gradient ascent, which updates the parameters with batch_size samples at a time. It is a compromise between the two approaches above and is commonly used in deep learning; a rough sketch is given below.
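Mini-batch gradient ascent is not implemented in the book; the following is only my own illustrative sketch of the idea (the batch_size default, learning rate, and shuffling scheme are assumptions):

import numpy as np

def miniBatchGradAscent(dataMatrix, classLabels, batch_size=20, numIter=150, alpha=0.01):
    """Illustrative mini-batch gradient ascent: each update uses batch_size samples."""
    dataMatrix = np.array(dataMatrix, dtype=float)
    labels = np.array(classLabels, dtype=float)
    m, n = dataMatrix.shape
    weights = np.ones(n)
    for _ in range(numIter):
        idx = np.random.permutation(m)                 # shuffle the sample order each pass
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            h = 1.0 / (1.0 + np.exp(-dataMatrix[batch].dot(weights)))
            error = labels[batch] - h
            # average the gradient over the mini-batch before updating
            weights = weights + alpha * dataMatrix[batch].T.dot(error) / len(batch)
    return weights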
import random   # needed for random.uniform below


def stocGradAscent0(dataMatrix, classLabel, numIter=150):
    """
    Stochastic gradient ascent
    :param dataMatrix:
    :param classLabel:
    :param numIter:
    :return: weights, ndarray
    """
    dataMatrix = array(dataMatrix)
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            alpha = 4 / (1.0 + j + i) + 0.01                       # step size shrinks as training proceeds
            randIndex = int(random.uniform(0, len(dataIndex)))     # pick one of the remaining samples at random
            sampleIndex = dataIndex[randIndex]                     # map back to the original sample index
            h = sigmoid(sum(dataMatrix[sampleIndex] * weights))
            error = classLabel[sampleIndex] - h
            weights = weights + alpha * error * dataMatrix[sampleIndex]   # update the parameters
            del(dataIndex[randIndex])                              # sample without replacement within one pass
    return weights

Here alpha changes with the number of iterations: it gradually decreases, so each new update has a smaller and smaller effect on the parameter values, but because of the constant term 0.01 it never reaches 0. The constant should be tuned to the actual problem; in general, if the problem being modeled changes over time, the constant term can be increased so that new samples have a larger influence on the regression coefficients.
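For concreteness, here are a few values of the step size produced by the same formula (plain arithmetic, nothing beyond the schedule above):

for j, i in [(0, 0), (1, 0), (0, 50), (50, 50), (149, 99)]:
    alpha = 4 / (1.0 + j + i) + 0.01
    print("j=%3d i=%3d alpha=%.4f" % (j, i, alpha))
# alpha decays toward the constant 0.01 as training proceeds, but never reaches 0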
Selecting the samples at random also reduces periodic fluctuations in the coefficients.
The case study is taken from Chapter 5 of 《机器学习实战》 (Machine Learning in Action).
https://archive.ics.uci.edu/ml/datasets.php
Horse-Colic dataset
Fill in missing values in the training data; remove entries with missing values from the test data (a preprocessing sketch is given after this list).
Visualize and inspect the data.
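The horseColicTraining.txt and horseColicTest.txt files used below are already preprocessed, so the loading code does not handle missing values itself. The snippet below is only my own sketch of the kind of preprocessing described above; in particular, using '?' as the missing-value marker is an assumption:

def fillMissing(rows, placeholder='?'):
    # training data: replace each missing feature value with 0.0 so it does not affect the weight update
    return [[0.0 if x == placeholder else float(x) for x in row] for row in rows]

def dropMissing(rows, placeholder='?'):
    # test data: drop any record that still contains a missing value
    return [[float(x) for x in row] for row in rows if placeholder not in row]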
def loadData():
    """
    Load the training data
    :return:
    """
    dataMat = []; labelMat = []
    with open("horseColicTraining.txt", 'r') as f:
        for line in f.readlines():
            lineArr = line.strip().split()
            lineArr = [float(x) for x in lineArr]
            dataMat.append([1.0] + lineArr[0:21])   # prepend 1.0 for the intercept term
            labelMat.append(lineArr[21])
    return dataMat, labelMat


def loadTestData():
    """
    Load the test data
    :return:
    """
    dataMat = []; labelMat = []
    with open("horseColicTest.txt", 'r') as f:
        for line in f.readlines():
            lineArr = line.strip().split()
            lineArr = [float(x) for x in lineArr]
            dataMat.append([1.0] + lineArr[0:21])
            labelMat.append(lineArr[21])
    return dataMat, labelMat

Stochastic gradient ascent is then used to find the best regression parameters:
from numpy import array, shape, ones, exp
import random


def sigmoid(x):
    return 1 / (1 + exp(-x))


def stocGradAscent(inMat, labels, numIter=150):
    """
    Stochastic gradient ascent
    :param inMat:
    :param labels:
    :param numIter:
    :return:
    """
    inMat = array(inMat)
    m, n = shape(inMat)
    weights = ones(n)
    for i in range(numIter):
        dataIndex = list(range(m))
        for j in range(m):
            alpha = 4.0 / (1.0 + i + j) + 0.01
            randIndex = int(random.uniform(0, len(dataIndex)))   # pick one of the remaining samples at random
            sampleIndex = dataIndex[randIndex]
            h = sigmoid(sum(inMat[sampleIndex] * weights))
            error = labels[sampleIndex] - h
            weights = weights + alpha * error * inMat[sampleIndex]
            del(dataIndex[randIndex])                            # sample without replacement within one pass
    return weights
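To actually evaluate the trained weights, the book tests them on the held-out test set. The following is a minimal sketch of such an evaluation, written as my own wrapper around the functions defined above (the 0.5 threshold matches the decision rule derived earlier):

def classifyVector(inX, weights):
    # classify as 1 when sigmoid(w·x) > 0.5, otherwise 0
    prob = sigmoid(sum(inX * weights))
    return 1.0 if prob > 0.5 else 0.0


def colicTest(numIter=500):
    # train on horseColicTraining.txt, then report the error rate on horseColicTest.txt
    trainMat, trainLabels = loadData()
    testMat, testLabels = loadTestData()
    weights = stocGradAscent(trainMat, trainLabels, numIter)
    errorCount = 0
    for x, y in zip(array(testMat), testLabels):
        if classifyVector(x, weights) != y:
            errorCount += 1
    errorRate = float(errorCount) / len(testLabels)
    print("the error rate of this test is: %f" % errorRate)
    return errorRate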
Finally, here is a solution to a programming assignment from Andrew Ng's course (Logistic Regression with a Neural Network Mindset):

# -*- coding:utf-8 -*-
"""
Logistic regression for Andrew Ng's course assignment (cat vs. non-cat images)
"""
import numpy as np
import matplotlib.pyplot as plt
import h5py
from andrew.logistic.lr_utils import load_dataset

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

img_width, img_height = train_set_x_orig.shape[1], train_set_x_orig.shape[2]
print("image size: " + str(img_width) + "*" + str(img_height) + "*3")
num_pics_train = train_set_y.shape[1]
num_pics_test = test_set_y.shape[1]
print("training images: " + str(num_pics_train))
print("test images: " + str(num_pics_test))

# -1 lets reshape merge the remaining dimensions automatically
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
print("training set shape: " + str(train_set_x_flatten.shape))
print("test set shape: " + str(test_set_x_flatten.shape))

# normalize the pixel values to [0, 1]
train_set_x = train_set_x_flatten / 255
test_set_x = test_set_x_flatten / 255
print("training label shape: " + str(train_set_y.shape))


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def gradient(learning_rate=0.5, iteration=1000):
    w = np.zeros(shape=(train_set_x.shape[0], 1))
    b = 0
    costs = []
    epsilon = 1e-5
    m = train_set_x.shape[1]
    for i in range(iteration):
        A = sigmoid(np.dot(w.T, train_set_x) + b)
        if i % 100 == 0:
            costs.append(-(1 / m) * np.sum(np.dot(train_set_y, np.log(A + epsilon).T)
                                           + np.dot((1 - train_set_y), np.log(1 - A + epsilon).T)))
        dw = (1 / m) * np.dot(train_set_x, (A - train_set_y).T)   # gradient of the cost w.r.t. w
        db = (1 / m) * np.sum(A - train_set_y)                    # gradient of the cost w.r.t. b
        w = w - learning_rate * dw
        b = b - learning_rate * db
    return w, b, costs


def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0
    return Y_prediction


w, b, costs = gradient(learning_rate=0.1, iteration=2000)
Y_predict_test = predict(w, b, test_set_x)
Y_predict_train = predict(w, b, train_set_x)
print(Y_predict_test)
print("train accuracy:", format(100 - np.mean(np.abs(Y_predict_train - train_set_y)) * 100), "%")
print("test accuracy:", format(100 - np.mean(np.abs(Y_predict_test - test_set_y)) * 100), "%")
print(costs)

costs = np.squeeze(costs)
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.show()

References:
《机器学习实战》(Machine Learning in Action), Chapter 5
《机器学习》(Machine Learning), Section 3.3
机器学习实战教程(六):梯度上升算法
具有神经网络思维的Logistic回归 (Logistic Regression with a Neural Network Mindset)