【PyTorch】6. Building a Logistic Regression Model with PyTorch


I. Theoretical Introduction

Logistic regression can solve not only binary classification problems but also multi-class problems. Its goal is to find a decision boundary with enough discriminative power to separate the classes. Given an input feature vector $x \in R^{n}$, we look for a decision boundary $\sum_{i=1}^{n} w_{i} x_{i} + b = 0$ such that a sample is classified as positive when $\sum_{i=1}^{n} w_{i} x_{i} + b \in (0, +\infty)$ and as negative when $\sum_{i=1}^{n} w_{i} x_{i} + b \in (-\infty, 0)$.

1. Decision Boundary

$h(x) = wx + b = 0$ (where $w$ and $x$ are vectors)
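As a quick sanity check of this decision rule, the short sketch below classifies a single sample by the sign of $w \cdot x + b$. The weight vector and bias here are hand-picked, purely illustrative values, not parameters learned from data:

```python
import torch

# Hypothetical, hand-picked parameters -- purely for illustration, not learned values
w = torch.tensor([1.0, -2.0])    # weight vector, one entry per feature
b = 0.5                          # bias term

x = torch.tensor([3.0, 1.0])     # a single 2-dimensional sample
score = torch.dot(w, x) + b      # h(x) = w . x + b

# score > 0 -> positive class, score < 0 -> negative class
pred = 1 if score > 0 else 0
print(score.item(), pred)        # 1.5 1
```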

2. The Sigmoid Function

The sigmoid function maps the decision score for positive/negative samples from $(-\infty, +\infty)$ into $(0, 1)$: $h(z) = \frac{1}{1+e^{-z}}$

$h(x) = \frac{1}{1+e^{-wx-b}}$
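As a minimal sketch, `torch.sigmoid` (PyTorch's built-in $1/(1+e^{-z})$) applied to a few hand-picked decision scores shows how values from $(-\infty, +\infty)$ are squashed into $(0, 1)$:

```python
import torch

scores = torch.tensor([-5.0, -1.0, 0.0, 1.0, 5.0])  # decision scores w*x + b, unbounded
probs = torch.sigmoid(scores)                        # squashed into (0, 1)
print(probs)
# tensor([0.0067, 0.2689, 0.5000, 0.7311, 0.9933])  (approximate values)
```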

3. The Loss Function

For a positive sample, the larger the score $wx+b$, the closer $h(x)$ is to 1, i.e. the higher the predicted probability of the positive class, and hence the larger $\log(h(x))$. For a negative sample, the smaller $wx+b$, the closer $h(x)$ is to 0, so $1-h(x)$ and therefore $\log(1-h(x))$ are larger. Since $y$ only takes the two values 0 and 1, the loss to minimize can be written compactly as: $L(w) = \frac{1}{m}\sum_{i=1}^{m}\big[-y^{(i)} \log( h_w(x^{(i)})) - (1-y^{(i)}) \log(1-h_w(x^{(i)}))\big]$
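This is exactly the binary cross-entropy used later in the code (`nn.BCELoss`). As a small sketch with made-up predictions and labels, evaluating the formula by hand gives the same number as the built-in loss:

```python
import torch
import torch.nn as nn

# Made-up predictions h_w(x) and labels y, only to illustrate the formula
h = torch.tensor([0.9, 0.2, 0.7, 0.4])
y = torch.tensor([1.0, 0.0, 1.0, 0.0])

# The loss written out exactly as above: mean of -y*log(h) - (1-y)*log(1-h)
manual = (-y * torch.log(h) - (1 - y) * torch.log(1 - h)).mean()

# PyTorch's built-in binary cross-entropy gives the same value
builtin = nn.BCELoss()(h, y)
print(manual.item(), builtin.item())   # both approx. 0.2990
```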

4. Mathematical Explanation of the Logistic Regression Loss Function:

(1) Find a direct relationship between the class probability $P(Y=1)$ and the input score $z$, then decide the class by comparing probabilities. In short, compute the following two conditional distributions: $P(Y=0 \mid x) = \frac{1}{1+e^{wx+b}}$

$P(Y=1 \mid x) = \frac{e^{wx+b}}{1+e^{wx+b}}$

(2) The odds of an event are the ratio of the probability that it happens to the probability that it does not. If an event happens with probability $p$, its odds are $\frac{p}{1-p}$ and its log-odds, the logit function, are $\mathrm{logit}(p) = \log\frac{p}{1-p}$.

(3) For a positive sample, i.e. $Y=1$, the log-odds are $\log\frac{P(Y=1\mid x)}{1-P(Y=1\mid x)} = wx+b$.

(4) Maximum likelihood estimation. For a given training set $T = \{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$ with $x_i \in R^n$ and $y_i \in \{0,1\}$, let $P(Y=1 \mid x) = h(x)$, so that $P(Y=0 \mid x) = 1-h(x)$. The likelihood function is then: $\prod_{i=1}^{n}[h(x_i)]^{y_i}[1-h(x_i)]^{1-y_i}$
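The identity in step (3) is easy to verify numerically. The sketch below uses arbitrary, hand-picked values of $w$, $b$ and $x$ (purely illustrative) and checks that the log-odds of $P(Y=1 \mid x)$ recover the linear score $w \cdot x + b$:

```python
import torch

# Arbitrary, hand-picked parameters and input -- only to check the identity numerically
w = torch.tensor([0.8, -1.5])
b = 0.3
x = torch.tensor([2.0, 1.0])

z = torch.dot(w, x) + b                  # w . x + b
p1 = torch.exp(z) / (1 + torch.exp(z))   # P(Y = 1 | x)
log_odds = torch.log(p1 / (1 - p1))      # logit(P(Y = 1 | x))

print(z.item(), log_odds.item())         # both 0.4 (up to floating-point error)
```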

(5) Take the logarithm of the likelihood:

$\ell(w) = \sum_{i=1}^{m}\big[y^{(i)} \log h_w(x^{(i)}) + (1-y^{(i)}) \log(1-h_w(x^{(i)}))\big] = \sum_{i=1}^{m}\big[y^{(i)} \log\frac{h_w(x^{(i)})}{1-h_w(x^{(i)})} + \log(1-h_w(x^{(i)}))\big] = \sum_{i=1}^{m}\big[y^{(i)}(w \cdot x_i + b) - \log(1+e^{w \cdot x_i + b})\big]$

Maximizing this log-likelihood $\ell(w)$ is equivalent to minimizing the loss $L(w) = -\frac{1}{m}\ell(w)$ defined in section 3.

(6) Differentiate $L(w)$ with respect to $w$ and $b$:

$\frac{\partial L(w)}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\Big(\frac{e^{w x_i+b}}{1+e^{w x_i+b}} - y_i\Big)x_i = \frac{1}{m}\sum_{i=1}^{m}\big(h_w(x_i) - y_i\big)x_i$

$\frac{\partial L(w)}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\Big(\frac{e^{w x_i+b}}{1+e^{w x_i+b}} - y_i\Big) = \frac{1}{m}\sum_{i=1}^{m}\big(h_w(x_i) - y_i\big)$

(7) Choose initial parameters and a learning rate $\alpha$, then repeatedly update the parameters by gradient descent: $w = w - \alpha \frac{\partial L(w)}{\partial w}$

$b = b - \alpha \frac{\partial L(w)}{\partial b}$
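Before switching to PyTorch, here is a minimal NumPy sketch of steps (6) and (7): plain batch gradient descent on $L(w)$ for toy data in the same spirit as the example in the next section (two Gaussian blobs). The data, learning rate, and iteration count are illustrative choices, not taken from the original code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian blobs: class 0 centred at +2, class 1 centred at -2 (as in the PyTorch example below)
x = np.vstack([rng.normal(2, 1, size=(100, 2)), rng.normal(-2, 1, size=(100, 2))])
y = np.hstack([np.zeros(100), np.ones(100)])

w = np.zeros(2)
b = 0.0
alpha = 0.1                          # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    h = sigmoid(x @ w + b)           # h_w(x_i) for every sample
    grad_w = (h - y) @ x / len(y)    # dL/dw from step (6)
    grad_b = np.mean(h - y)          # dL/db from step (6)
    w -= alpha * grad_w              # updates from step (7)
    b -= alpha * grad_b

print(w, b)                          # both weights should come out negative for this labelling
```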

II. Code Implementation

```python
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

# Build the toy data set
n_data = torch.ones(100, 2)                          # all-ones tensor of size (100, 2)
x0 = torch.normal(2 * n_data, 1)                     # class-0 samples drawn from N(2, 1), size (100, 2)
y0 = torch.zeros(100, 1)                             # class-0 labels, size (100, 1)
x1 = torch.normal(-2 * n_data, 1)                    # class-1 samples drawn from N(-2, 1), size (100, 2)
y1 = torch.ones(100, 1)                              # class-1 labels, size (100, 1)
x = torch.cat((x0, x1), 0).type(torch.FloatTensor)   # concatenate along dim 0 -> full sample set (200, 2)
y = torch.cat((y0, y1), 0).type(torch.FloatTensor)   # labels (200, 1), matching the model output shape

# Define the model: one linear layer followed by a sigmoid
class LogisticRegression(nn.Module):
    def __init__(self):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(2, 1)
        self.sm = nn.Sigmoid()

    def forward(self, x):
        out = self.linear(x)
        out = self.sm(out)
        return out

logistic_model = LogisticRegression()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
logistic_model.to(device)
x, y = x.to(device), y.to(device)

# Loss and optimizer: binary cross-entropy (BCELoss) + SGD with momentum
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(logistic_model.parameters(), lr=1e-3, momentum=0.9)

# Training loop
for epoch in range(1000):
    # forward pass
    out = logistic_model(x)
    loss = criterion(out, y)
    print_loss = loss.item()

    mask = out.ge(0.5).float()      # threshold at 0.5: outputs >= 0.5 are class 1, otherwise class 0
    correct = (mask == y).sum()     # number of correctly predicted samples
    acc = correct.item() / y.size(0)

    # backward pass and parameter update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # print loss and accuracy every 100 epochs
    if (epoch + 1) % 100 == 0:
        print('*' * 10)
        print(f'epoch:{epoch + 1}')
        print(f'loss is {print_loss:.4f}')
        print(f'acc is {acc:.4f}')

# Visualization: plot the learned decision boundary w0*x + w1*y + b = 0
w0, w1 = logistic_model.linear.weight[0]
w0 = w0.item()
w1 = w1.item()
b = logistic_model.linear.bias.item()
plot_x = np.arange(-5, 5, 0.1)
plot_y = (-w0 * plot_x - b) / w1

# Move the data back to the CPU before converting to numpy (needed when training on the GPU)
x_np = x.data.cpu().numpy()
y_np = y.data.cpu().numpy().squeeze()
plt.scatter(x_np[:, 0], x_np[:, 1], c=y_np, s=100, lw=0, cmap='RdYlGn')
plt.plot(plot_x, plot_y)
plt.show()
```

The resulting decision boundary is shown in the figure below:
