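$$f(W, X) = W \cdot X = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \cdot \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}$$
(a 2×2 matrix times a 2×3 matrix gives a 2×3 matrix)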
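A quick NumPy check of the shapes in this product. The matrices below are made-up values chosen only to illustrate that a (2×2)·(2×3) product yields a 2×3 result:

```python
import numpy as np

# Hypothetical values: any 2x2 W and 2x3 X show the same shape rule.
W = np.array([[1, 2],
              [3, 4]])
X = np.array([[1, 0, 2],
              [0, 1, 3]])

# Entry (i, j) of the result is row i of W dotted with column j of X.
result = W.dot(X)
print(result.shape)  # (2, 3)
print(result)
```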
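Cost function (also called the objective function): $\sum_{i=1}^{n} \left(h(x^{(i)}) - y^{(i)}\right)^2$
Least squares: a method for solving linear regression problems by minimizing the sum of squared residuals.
Supervised learning model: every sample has a computed (predicted) value and an actual (labeled) value.
Residual: computed value - actual value.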
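A minimal sketch of residuals and their squared sum with NumPy. The predicted and actual values are hypothetical:

```python
import numpy as np

# Hypothetical predictions h(x) and observed targets y
y_pred = np.array([2.5, 4.0, 6.5])  # computed values
y_true = np.array([3.0, 4.0, 6.0])  # actual values

residuals = y_pred - y_true    # residual = computed - actual
cost = np.sum(residuals ** 2)  # sum of squared residuals (the cost)
print(residuals, cost)
```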
kmeans:
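Hypothesis function: a user-defined model equation, $h(x) = w_0 + w_1 x$
The cost is a function of $w_0$ and $w_1$:
$$f(w_0, w_1) = \sum_{i=1}^{n} \left(h(x^{(i)}) - y^{(i)}\right)^2 = \sum_{i=1}^{n} \left(w_0 + w_1 x^{(i)} - y^{(i)}\right)^2 = (w_0 + w_1 x_1 - y_1)^2 + (w_0 + w_1 x_2 - y_2)^2 + \dots + (w_0 + w_1 x_n - y_n)^2$$
To find the minimum of $f(w_0, w_1)$, take its partial derivatives (the gradient, which is also what gradient descent follows) and set them to zero:
$$\frac{\partial f}{\partial w_0} = 2(w_0 + w_1 x_1 - y_1) + 2(w_0 + w_1 x_2 - y_2) + \dots + 2(w_0 + w_1 x_n - y_n) = 0$$
$$\frac{\partial f}{\partial w_1} = 2x_1(w_0 + w_1 x_1 - y_1) + 2x_2(w_0 + w_1 x_2 - y_2) + \dots + 2x_n(w_0 + w_1 x_n - y_n) = 0$$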
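The two partial derivatives can be sanity-checked numerically. This sketch compares them with central finite differences on hypothetical data at an arbitrary point; since f is quadratic, the two should agree up to rounding error:

```python
import numpy as np

def cost(w0, w1, x, y):
    # f(w0, w1) = sum_i (w0 + w1*x_i - y_i)^2
    return np.sum((w0 + w1 * x - y) ** 2)

def grad(w0, w1, x, y):
    r = w0 + w1 * x - y                     # residuals
    return 2 * np.sum(r), 2 * np.sum(x * r)  # (df/dw0, df/dw1)

# Hypothetical data and evaluation point
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
w0, w1 = 0.5, 1.5

# Central finite differences approximate each partial derivative
eps = 1e-6
num_d0 = (cost(w0 + eps, w1, x, y) - cost(w0 - eps, w1, x, y)) / (2 * eps)
num_d1 = (cost(w0, w1 + eps, x, y) - cost(w0, w1 - eps, x, y)) / (2 * eps)
print(grad(w0, w1, x, y), (num_d0, num_d1))
```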
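From $\partial f / \partial w_0 = 0$:
$$2(w_0 + w_1 x_1 - y_1) + 2(w_0 + w_1 x_2 - y_2) + \dots + 2(w_0 + w_1 x_n - y_n) = 0$$
Dividing by 2 and collecting terms:
$$n w_0 + (w_1 x_1 + w_1 x_2 + \dots + w_1 x_n) - (y_1 + y_2 + \dots + y_n) = 0$$
$$n w_0 = (y_1 + y_2 + \dots + y_n) - w_1 (x_1 + x_2 + \dots + x_n)$$
$$w_0 = \frac{(y_1 + y_2 + \dots + y_n) - w_1 (x_1 + x_2 + \dots + x_n)}{n} = \text{mean}(y) - w_1 \cdot \text{mean}(x)$$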
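From $\partial f / \partial w_1 = 0$:
$$2x_1(w_0 + w_1 x_1 - y_1) + 2x_2(w_0 + w_1 x_2 - y_2) + \dots + 2x_n(w_0 + w_1 x_n - y_n) = 0$$
Dividing by 2 and collecting terms:
$$w_0 (x_1 + x_2 + \dots + x_n) + w_1 (x_1^2 + x_2^2 + \dots + x_n^2) - (x_1 y_1 + x_2 y_2 + \dots + x_n y_n) = 0$$
Writing $\bar{x} = \text{mean}(x)$, $\bar{y} = \text{mean}(y)$, substituting $w_0 = \bar{y} - w_1 \bar{x}$ and using $x_1 + \dots + x_n = n\bar{x}$:
$$w_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$$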
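Solving the two equations gives a closed form that is straightforward to code. A sketch on hypothetical noiseless data (y = 2x + 1), with NumPy's own degree-1 `np.polyfit` printed alongside for comparison:

```python
import numpy as np

def fit_simple(x, y):
    # Closed-form least squares for h(x) = w0 + w1*x (from the two equations above)
    n = len(x)
    w1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)
    w0 = y.mean() - w1 * x.mean()  # w0 = mean(y) - w1*mean(x)
    return w0, w1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1  # hypothetical data with w0 = 1, w1 = 2

w0, w1 = fit_simple(x, y)
print(w0, w1)
print(np.polyfit(x, y, 1))  # NumPy's fit, ordered [slope, intercept]
```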
Residual entropy: entropy increase, entropy decrease
Features: $x_1$ (size): 3, 4, 5, 7; $x_2$ (distance): 0, 1, 2, 3; and so on up to $x_m$.
$h(x_1, x_2) = w_1 x_1 + w_2 x_2 + w_0$, here with $w_1 = 2$, $w_2 = 1$, $w_0 = 3$
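Samples: X = [[3, 0], [4, 1], [5, 2], [7, 3]] (each row is [x1, x2])
y = [9, 12, 15, 20]
With these weights the hypothesis reproduces y exactly, e.g. h(3, 0) = 2·3 + 1·0 + 3 = 9.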
Sum of squared residuals: $\text{cost} = \sum_{i} \left(h(x^{(i)}_1, x^{(i)}_2) - y^{(i)}\right)^2$
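A sketch that evaluates this cost directly for the sample above with w1 = 2, w2 = 1, w0 = 3, then lets `np.linalg.lstsq` recover the weights from the data. Appending a column of 1s so that its weight plays the role of w0 is the convention assumed here:

```python
import numpy as np

X = np.array([[3, 0], [4, 1], [5, 2], [7, 3]])  # columns: x1 (size), x2 (distance)
y = np.array([9, 12, 15, 20])

def h(x1, x2, w1=2, w2=1, w0=3):
    # Hypothesis h(x1, x2) = w1*x1 + w2*x2 + w0 with the weights from the notes
    return w1 * x1 + w2 * x2 + w0

# cost = sum over samples of (h(x1, x2) - y)^2
cost = sum((h(x1, x2) - yi) ** 2 for (x1, x2), yi in zip(X, y))
print(cost)  # 0: the weights fit the sample exactly

# Least squares on [x1, x2, 1]; the solution is ordered [w1, w2, w0]
Xa = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(Xa, y, rcond=None)
print(w)
```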
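Generalized to $m$ features: $h(x_1, x_2, \dots, x_m) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_m x_m$
Cost: $\text{cost} = \sum_{i} \left(h(x^{(i)}_1, x^{(i)}_2, \dots, x^{(i)}_m) - y^{(i)}\right)^2$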
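Higher-order terms: $f(x_1, x_2) = x_1^2 + x_2 + 2x_2^2 + x_1 + w_0$
Reduce the order by adding dimensions: introduce $x_3 = x_1^2$ and $x_4 = x_2^2$. Then, for example with $w_0 = 10$, $f(x_1, x_2) = x_1^2 + x_2 + 2x_2^2 + x_1 + 10$ becomes $f = x_3 + x_2 + 2x_4 + x_1 + 10$, which is linear in $x_1, \dots, x_4$.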
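Example: X = [[2, 3], [4, 6], [7, 8]], y = [10, 20, 40]
Expanded samples, each row [x1, x2, x1^2, x2^2, 1] (the trailing 1 multiplies w0):
XE = [[2, 3, 4, 9, 1], [4, 6, 16, 36, 1], [7, 8, 49, 64, 1]]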
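A sketch of this feature expansion with NumPy, assuming the column order [x1, x2, x1^2, x2^2, 1] read off XE. It also checks that the example f (with w0 = 10) is an ordinary linear combination of the expanded columns:

```python
import numpy as np

X = np.array([[2, 3], [4, 6], [7, 8]])

# Expand [x1, x2] -> [x1, x2, x1^2, x2^2, 1]
x1, x2 = X[:, 0], X[:, 1]
XE = np.column_stack([x1, x2, x1 ** 2, x2 ** 2, np.ones(len(X), dtype=int)])
print(XE)

# f(x1, x2) = x1^2 + x2 + 2*x2^2 + x1 + 10 as weights on [x1, x2, x1^2, x2^2, 1]
w = np.array([1, 1, 1, 2, 10])
print(XE.dot(w))  # same values as evaluating f directly
```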
General gradient-descent solution for linear regression with $n$ features. Hypothesis: $f(x_1, x_2, \dots, x_n) = w_0 + w_1 x_1 + \dots + w_n x_n$
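Samples ($m$ rows, $n$ features):
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}, \qquad y = [y_1, y_2, \dots, y_m]$$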
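Cost function:
$$F(w_0, w_1, \dots, w_n) = \sum_{i=1}^{m} \left(f(x^{(i)}_1, x^{(i)}_2, \dots, x^{(i)}_n) - y^{(i)}\right)^2 = \sum_{i=1}^{m} \left(w_0 + w_1 x^{(i)}_1 + \dots + w_n x^{(i)}_n - y^{(i)}\right)^2$$
Minimizing the cost function (gradient descent) requires its partial derivatives: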
$$\frac{\partial F}{\partial w_j} = 2 \sum_{i=1}^{m} \left(w_0 + w_1 x_{i1} + \dots + w_n x_{in} - y_i\right) x_{ij}, \qquad j = 1, 2, \dots, n$$
$$\frac{\partial F}{\partial w_0} = 2 \sum_{i=1}^{m} \left(w_0 + w_1 x_{i1} + \dots + w_n x_{in} - y_i\right) \cdot 1$$
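Vectorized gradient (X has a column of 1s appended as its last column and w = [w1, ..., wn, w0], so X.dot(w) gives every prediction at once):
2 * X.T.dot(X.dot(w) - y)      # as a column: 2 X^T (Xw - y)
2 * (X.dot(w) - y).T.dot(X)    # same values as a row: 2 (Xw - y)^T X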
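To confirm that the vectorized expressions match the component-wise sums, a sketch comparing them on the four-sample data set from earlier, augmented with a trailing column of 1s; the point w is arbitrary:

```python
import numpy as np

# Augmented X (last column all 1s), so w = [w1, w2, w0]
X = np.array([[3., 0., 1.], [4., 1., 1.], [5., 2., 1.], [7., 3., 1.]])
y = np.array([9., 12., 15., 20.])
w = np.array([0.5, -1.0, 2.0])  # arbitrary evaluation point

# Component-wise: dF/dw_j = 2 * sum_i (x_i . w - y_i) * x_ij
m, k = X.shape
grad_loop = np.array([2 * sum((X[i].dot(w) - y[i]) * X[i, j] for i in range(m))
                      for j in range(k)])

# Vectorized forms; for 1-D arrays .T is a no-op, so both give the same vector
grad_vec = 2 * X.T.dot(X.dot(w) - y)
grad_vec2 = 2 * (X.dot(w) - y).T.dot(X)
print(grad_loop, grad_vec, grad_vec2)
```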
Expanded form:
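$$\frac{\partial F}{\partial w_1} = 2 \sum_{i=1}^{m} \left(w_0 + w_1 x_{i1} + \dots + w_n x_{in} - y_i\right) x_{i1} = 2\left[(w_0 + w_1 x_{11} + \dots + w_n x_{1n} - y_1)\, x_{11} + (w_0 + w_1 x_{21} + \dots + w_n x_{2n} - y_2)\, x_{21} + \dots + (w_0 + w_1 x_{m1} + \dots + w_n x_{mn} - y_m)\, x_{m1}\right]$$
In NumPy, with X[:, j] denoting column j of X (0-based): ∂F/∂w1 = 2 * (X.dot(w) - y).T.dot(X[:, 0])
$$\frac{\partial F}{\partial w_2} = 2 \sum_{i=1}^{m} \left(w_0 + w_1 x_{i1} + \dots + w_n x_{in} - y_i\right) x_{i2}, \quad \dots, \quad \frac{\partial F}{\partial w_n} = 2 \sum_{i=1}^{m} \left(w_0 + w_1 x_{i1} + \dots + w_n x_{in} - y_i\right) x_{in}$$
In NumPy: ∂F/∂w2 = 2 * (X.dot(w) - y).T.dot(X[:, 1]), ..., ∂F/∂wn = 2 * (X.dot(w) - y).T.dot(X[:, n-1])
$$\frac{\partial F}{\partial w_0} = 2 \sum_{i=1}^{m} \left(w_0 + w_1 x_{i1} + \dots + w_n x_{in} - y_i\right) \cdot 1$$
In NumPy: ∂F/∂w0 = 2 * (X.dot(w) - y).T.dot(X[:, n]), since the last column X[:, n] is all 1s.
Stacking all components: ∇F(w) = 2 * (X.dot(w) - y).T.dot(X), i.e. $\nabla F(w) = 2 X^\top (Xw - y)$.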
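A minimal batch gradient descent loop built on ∇F(w) = 2 (Xw - y)^T X, run on the four-sample data set from earlier with a trailing column of 1s. The learning rate and iteration count are hand-picked assumptions for this particular data: its columns are strongly correlated, so convergence is slow and needs many steps:

```python
import numpy as np

def gradient_descent(X, y, lr, n_iters):
    # Minimizes F(w) = sum((Xw - y)^2) via w <- w - lr * 2 (Xw - y)^T X
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = 2 * (X.dot(w) - y).dot(X)
        w -= lr * grad
    return w

# Sample from the notes, with a column of 1s appended so w = [w1, w2, w0]
X = np.array([[3., 0.], [4., 1.], [5., 2.], [7., 3.]])
y = np.array([9., 12., 15., 20.])
Xa = np.column_stack([X, np.ones(len(X))])

# Hand-tuned for this data set, not a general recommendation
w = gradient_descent(Xa, y, lr=0.008, n_iters=60000)
print(w)  # close to [2, 1, 3]
```

For this quadratic cost the iteration only converges when lr is below 2 divided by the largest eigenvalue of 2X^T X; larger steps make it diverge. Scaling the features or averaging the gradient over the m samples are common ways to make the step size less sensitive.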