1. The sigmoid function

In logistic regression we introduce a function: the sigmoid function.

y=\frac{1}{1+e^{-x}}

This function has several nice properties: over the entire real line, y takes values in (0, 1); it is symmetric about the point (0, 0.5); and it is insensitive to extreme inputs (as x tends to either positive or negative infinity, y saturates and barely changes). Since the sigmoid's range is (0, 1), its output can be interpreted directly as a probability. Let

P(Y=1|X)=\frac{1}{1+e^{-(\theta x+b)}}

where \theta and x are vectors. This expression gives the predicted probability that y = 1 for a given x, so we have the hypothesis function

h_{\theta}(x)=\frac{1}{1+e^{-(\theta x+b)}}

When Y = 1, P(Y=1)=\frac{1}{1+e^{-(\theta x+b)}}=h_{\theta}(x),
and when Y = 0, clearly P(Y=0)=1-P(Y=1)=1-h_{\theta}(x).
Combining the two cases into a single formula:

P(Y=y)=y\,h_{\theta}(x)+(1-y)(1-h_{\theta}(x)),\quad y\in\{0,1\}
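To make the range and saturation properties above concrete, here is a small numerical check (illustrative only, not part of the original derivation):

import numpy as np

def sigmoid(x):
    return 1.0/(1+np.exp(-x))

for v in [-100, -10, 0, 10, 100]:
    print(v, sigmoid(v))
# the outputs stay strictly inside (0, 1) and barely move once |x| is large:
# roughly 3.7e-44, 4.5e-05, 0.5, 0.99995, 1.0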

2. Loss function

1. Log loss function

Here we introduce the log loss (the negative log-likelihood):

L(Y,P(Y|X))=-\log P(Y|X)

When Y = 1, the loss is -\log P(Y=1|X);
when Y = 0, the loss is -\log(1-P(Y=1|X)).
Merging the two formulas gives the loss for a single sample:

L(Y|X)=-y\log P(Y=1|X)-(1-y)\log(1-P(Y=1|X))=-\left(y\log(h_{\theta}(x))+(1-y)\log(1-h_{\theta}(x))\right)

So the loss over m samples is:

J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}\log(h_{\theta}(x_{i}))+(1-y_{i})\log(1-h_{\theta}(x_{i}))\right)
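As a quick numerical illustration of this loss (the predicted probabilities below are made up for the example), a confident correct prediction costs little while a confident wrong one is penalized heavily:

import numpy as np

def log_loss(y, h):  # per-sample log loss from the formula above
    return -(y*np.log(h) + (1-y)*np.log(1-h))

print(log_loss(1, 0.9))   # ~0.105 (y=1 predicted with probability 0.9)
print(log_loss(1, 0.1))   # ~2.303 (y=1 predicted with probability 0.1)
print(log_loss(0, 0.1))   # ~0.105 (y=0 predicted with probability 0.1)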

2. Maximum likelihood function

Here we merge the two probability cases for y into a single expression:

P(y_{i}|x_{i})=(h_{\theta}(x_{i}))^{y_{i}}(1-h_{\theta}(x_{i}))^{1-y_{i}}

Since the samples are drawn independently from the same distribution, the likelihood of the m samples is the product of their individual probabilities, giving the likelihood function:

L(\theta \mid (x_{1},y_{1}),(x_{2},y_{2}),\dots,(x_{m},y_{m}))=\prod_{i=1}^{m}(h_{\theta}(x_{i}))^{y_{i}}(1-h_{\theta}(x_{i}))^{1-y_{i}}

Taking the logarithm of the likelihood gives the log-likelihood:

\log L(\theta)=\sum_{i=1}^{m}\left(y_{i}\log(h_{\theta}(x_{i}))+(1-y_{i})\log(1-h_{\theta}(x_{i}))\right)

Maximum likelihood estimation maximizes L(\theta); instead, we introduce the cost function J(\theta)=-\frac{1}{m}\log L(\theta) and minimize it with gradient descent.

J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}\log(h_{\theta}(x_{i}))+(1-y_{i})\log(1-h_{\theta}(x_{i}))\right)

3. Partial derivative with respect to θ

\frac{\partial}{\partial\theta}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}\frac{1}{h_{\theta}(x_{i})}\frac{\partial}{\partial\theta}h_{\theta}(x_{i})-(1-y_{i})\frac{1}{1-h_{\theta}(x_{i})}\frac{\partial}{\partial\theta}h_{\theta}(x_{i})\right)

=-\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}\frac{1}{h_{\theta}(x_{i})}-(1-y_{i})\frac{1}{1-h_{\theta}(x_{i})}\right)\frac{\partial}{\partial\theta}h_{\theta}(x_{i})

Derivative of the sigmoid function:

g(z)=\frac{1}{1+e^{-z}}

g'(z)=\frac{e^{-z}}{(1+e^{-z})^{2}}

=\frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}}

=\frac{1}{1+e^{-z}}\left(\frac{1+e^{-z}}{1+e^{-z}}-\frac{1}{1+e^{-z}}\right)

=g(z)(1-g(z))
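As a quick sanity check of this identity (not part of the original derivation), we can compare it against a central-difference approximation at an arbitrary point:

import numpy as np

def g(z):
    return 1.0/(1+np.exp(-z))

z, eps = 0.7, 1e-6
numeric = (g(z+eps) - g(z-eps)) / (2*eps)   # central difference
analytic = g(z)*(1-g(z))
print(numeric, analytic)   # the two values agree to roughly 1e-10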

Since h_{\theta}(x)=g(\theta x)=\frac{1}{1+e^{-\theta x}},

we have \frac{\partial}{\partial\theta}h_{\theta}(x)=h_{\theta}(x)(1-h_{\theta}(x))\,x.

\frac{\partial}{\partial\theta}J(\theta)

=-\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}\frac{1}{h_{\theta}(x_{i})}-(1-y_{i})\frac{1}{1-h_{\theta}(x_{i})}\right)h_{\theta}(x_{i})(1-h_{\theta}(x_{i}))\,x_{i}

=-\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}(1-h_{\theta}(x_{i}))-(1-y_{i})h_{\theta}(x_{i})\right)x_{i}

=-\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}-y_{i}h_{\theta}(x_{i})-h_{\theta}(x_{i})+y_{i}h_{\theta}(x_{i})\right)x_{i}

=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x_{i})-y_{i}\right)x_{i}

The final expression has the same form as the gradient obtained in linear regression.
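This gradient can also be verified numerically. The sketch below (the toy data and helper names are my own, not from the post) compares the analytic gradient \frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x_{i})-y_{i})x_{i} with a finite-difference approximation of J(\theta):

import numpy as np

def sigmoid(z):
    return 1.0/(1+np.exp(-z))

def J(theta, X, y):  # cross-entropy cost
    h = sigmoid(X @ theta)
    return -np.mean(y*np.log(h) + (1-y)*np.log(1-h))

def analytic_grad(theta, X, y):  # (1/m) * X^T (h - y)
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])  # toy data with a bias column
y = np.array([0., 1., 1., 0., 1.])
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.array([(J(theta + eps*e, X, y) - J(theta - eps*e, X, y)) / (2*eps)
                    for e in np.eye(3)])
print(np.allclose(numeric, analytic_grad(theta, X, y), atol=1e-6))  # expected: True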

4. Code demonstration

import numpy as np
import matplotlib.pyplot as plt
x=np.array([0.50,0.75,1.00,1.25,1.50,1.75,1.75,2.00,2.25,2.50,
2.75,3.00,3.25,3.50,4.00,4.25,4.50,4.75,5.00,5.50])
x=x.reshape(-1,1)
y=np.array([0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1])
x=np.concatenate((np.ones((x.shape[0],1)),x),axis=1)  # add a bias (intercept) column
y=y.reshape(-1,1)
y.shape
(20, 1)
plt.scatter(x[:,1],y)
plt.xlabel('Hours')
plt.ylabel('pass')
plt.title('exam-data')
plt.show()


(Figure: scatter plot of the exam data, study hours vs. pass/fail)

J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}\log(h_{\theta}(x_{i}))+(1-y_{i})\log(1-h_{\theta}(x_{i}))\right)

\frac{\partial}{\partial\theta}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x_{i})-y_{i})x_{i}

def sigmoid(x):  # sigmoid function
    return 1.0/(1+np.exp(-x))

def cost(x,y,theta):  # cross-entropy cost function
    x=np.matrix(x)
    y=np.matrix(y)
    theta=np.matrix(theta)
    first=np.multiply(y,np.log(sigmoid(x*theta)))
    second=np.multiply(1-y,np.log(1-sigmoid(x*theta)))
    return np.sum(first+second)/(-len(x))

def grad(x,y,theta,epochs=1000,lr=0.001):  # batch gradient descent
    x=np.matrix(x)
    y=np.matrix(y)
    theta=np.matrix(theta)
    m=x.shape[0]
    costList=[]
    for i in range(epochs+1):
        h=sigmoid(x*theta)
        delta=x.T*(h-y)/m        # gradient: (1/m) * X^T (h - y)
        theta=theta-lr*delta
        if(i%50==0):
            costList.append(cost(x,y,theta))  # record the loss every 50 epochs
    return theta,costList
theta=np.ones((x.shape[1],1))
theta,costList=grad(x,y,theta,3000,0.3)
a=np.linspace(0,3000,61)  # 61 points: the loss was recorded every 50 of the 3000 epochs
plt.plot(a,costList,c='y')
plt.show()

(Figure: training cost curve)
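As an optional follow-up (not in the original post), the learned parameters can be visualized by drawing the fitted probability curve h_{\theta}(x) over the scatter plot:

hours = np.linspace(0, 6, 100).reshape(-1, 1)
hours_b = np.concatenate((np.ones((hours.shape[0], 1)), hours), axis=1)  # add a bias column
prob = sigmoid(hours_b @ np.asarray(theta))  # predicted P(pass | hours)
plt.scatter(x[:, 1], y)
plt.plot(hours, prob, c='r')
plt.xlabel('Hours')
plt.ylabel('P(pass)')
plt.show()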

from sklearn.linear_model import LogisticRegression
x=np.array([0.50,0.75,1.00,1.25,1.50,1.75,1.75,2.00,2.25,2.50,
2.75,3.00,3.25,3.50,4.00,4.25,4.50,4.75,5.00,5.50])
x=x.reshape(-1,1)
y=np.array([0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1])
y=y.reshape(-1,1)
model=LogisticRegression()
model.fit(x,y.ravel())  # sklearn expects a 1-D target array
b=model.intercept_
a=model.coef_
print(a,b)
print(theta)
[[1.14860386]] [-3.13952411]
[[-4.07770898]
 [ 1.50464392]]
from sklearn.metrics import classification_report

def predict(x,theta):  # classify as 1 when the predicted probability exceeds 0.5
    x=np.matrix(x)
    theta=np.matrix(theta)
    return [1 if i>0.5 else 0 for i in sigmoid(x*theta)]

x2=np.concatenate((np.ones((x.shape[0],1)),x),axis=1)  # add a bias (intercept) column
prediction=predict(x2,theta)
print(classification_report(y,prediction))

              precision    recall  f1-score   support

           0       0.80      0.80      0.80        10
           1       0.80      0.80      0.80        10

    accuracy                           0.80        20
   macro avg       0.80      0.80      0.80        20
weighted avg       0.80      0.80      0.80        20

The accuracy is 80%.
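The sklearn coefficients differ from the hand-rolled result mainly because LogisticRegression applies L2 regularization by default (C=1.0). A simple way to compare the two models (a sketch, not from the original post) is the decision boundary, i.e. the number of study hours at which the predicted probability crosses 0.5:

t = np.asarray(theta)       # theta from the hand-rolled gradient descent above
print(-t[0, 0] / t[1, 0])   # boundary where theta0 + theta1*hours = 0
print(-b[0] / a[0, 0])      # same boundary for the sklearn model
# with the outputs printed above, both land at roughly 2.7 hours of study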

5. (Supplement) The reshape function

① numpy.arange(n).reshape(a, b): generates the integers 0 to n-1 and displays them as an array with a rows and b columns.

② mat (or array).reshape(c, -1): can only be called on a matrix or array; it reshapes the data into c rows, with the number of columns computed automatically.
③ reshape(1,-1) converts to 1 row;
reshape(2,-1) converts to 2 rows;
reshape(-1,1) converts to 1 column;
reshape(-1,2) converts to 2 columns.

>>> import numpy as np
>>> np.arange(16).reshape(2,8)  # 16 numbers, displayed as 2 rows and 8 columns
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15]])
>>> a=np.arange(16).reshape(2,8)  # 16 numbers, displayed as 2 rows and 8 columns
>>> a.shape
(2, 8)
>>> a.reshape(4,-1)  # reshape to 4 rows; -1 means the column count is computed automatically (16/4)
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
>>> a.reshape(-1,2)  # reshape to 2 columns; -1 means the row count is computed automatically (16/2)
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])
>>> np.arange(1,12,2)  # arange(a, b, c): start at a, step by c, stop before b
array([ 1,  3,  5,  7,  9, 11])
>>> a.reshape(1,-1)  # 1 row
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15]])
>>> a.reshape(2,-1)  # 2 rows
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15]])
>>> a.reshape(-1,1)  # 1 column
array([[ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12],
       [13],
       [14],
       [15]])
>>> a.reshape(-1,2)  # 2 columns
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])