We set the output dimension of the FC1 layer to 2. For example, the output after FC1 is:
feat = (256, 2), where 256 is the batch size: 256 images, with two features extracted from each image.
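The next step maps these 2-D features from Cartesian to polar coordinates. The minimal sketch below shows that mapping. Real FC1 outputs would come from the network, so random values stand in for them here. The function name `to_polar` is hypothetical. Each feature becomes a radius (its norm) and an angle (from `math.atan2`).

```python
import math
import random

def to_polar(feat):
    """Map each 2-D feature (x, y) to (radius, angle), with the angle in radians from atan2."""
    return [(math.hypot(x, y), math.atan2(y, x)) for x, y in feat]

# Hypothetical stand-in for the FC1 output: 256 samples, 2 features each.
random.seed(0)
feat = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(256)]
polar = to_polar(feat)
print(len(polar), len(polar[0]))  # 256 2
```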
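The horizontal and vertical axes are the two features, and the points fall into two classes, giving 256 points in total. Mapping from Cartesian to polar coordinates shows that the vectors $W_1$ and $W_2$ have different lengths. Features learned by the original softmax loss cannot be classified simply via angles, which motivates adding an angular margin. The decision boundary in softmax loss is: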
$$(W_1 - W_2)x + b_1 - b_2 = 0$$
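The two-class rule behind this boundary can be sketched as follows. The predicted class is whichever logit $W_i x + b_i$ is larger, so the sign of $(W_1 - W_2)x + b_1 - b_2$ decides. The weights and biases below are hypothetical values chosen for illustration. They show that a nonzero bias moves the boundary off the origin, so two features pointing in the same direction can land in different classes.

```python
def softmax_decision(x, W1, W2, b1, b2):
    """Two-class softmax prediction: class 1 if W1.x + b1 > W2.x + b2, else class 2.

    Equivalently, the sign of (W1 - W2).x + b1 - b2 decides the class,
    and the decision boundary is where that expression equals zero.
    """
    score = sum((w1 - w2) * xi for w1, w2, xi in zip(W1, W2, x)) + b1 - b2
    return 1 if score > 0 else 2

# Hypothetical weights with unequal norms and a nonzero bias.
W1, W2, b1, b2 = (2.0, 0.0), (0.0, 1.0), -3.0, 0.0
# (1, 1) and (5, 5) have the same angle but different norms.
print(softmax_decision((1.0, 1.0), W1, W2, b1, b2))  # 2
print(softmax_decision((5.0, 5.0), W1, W2, b1, b2))  # 1
```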
If we define $x$ as a feature vector and constrain $\|W_1\| = \|W_2\| = 1$ and $b_1 = b_2 = 0$, the boundary becomes:
$$\|x\|\left(\cos\theta_1 - \cos\theta_2\right) = 0,$$
where $\theta_i$ is the angle between $W_i$ and $x$.
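The sketch below checks that, under these constraints, the decision depends only on angles. With unit-norm $W_i$ and zero bias, each logit $W_i x$ equals $\|x\|\cos\theta_i$. Scaling $x$ therefore rescales both logits equally and leaves the predicted class unchanged. The helper names (`unit`, `angle_logits`, `predict`) and the weight values are hypothetical.

```python
import math

def unit(v):
    """Scale a 2-D vector to unit length."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def angle_logits(x, weights):
    """Logits W_i . x for unit-norm W_i and zero bias, i.e. ||x|| * cos(theta_i)."""
    return [w[0] * x[0] + w[1] * x[1] for w in weights]

def predict(x, weights):
    """Index of the largest logit (0 for W_1, 1 for W_2)."""
    logits = angle_logits(x, weights)
    return logits.index(max(logits))

# Hypothetical weights, normalized so that ||W_1|| = ||W_2|| = 1.
weights = [unit((2.0, 0.0)), unit((0.0, 3.0))]
# Scaling x changes ||x|| but not the angles, so the prediction is unchanged.
print(predict((1.0, 2.0), weights), predict((10.0, 20.0), weights))  # 1 1
print(predict((3.0, 1.0), weights))  # 0
```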
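At this point the boundary depends only on the angle $\theta$. The softmax loss can therefore be modified to optimize angles directly, which makes the CNN learn features with better angular separability. Now let us look at the plot after adding the constraints on $W$ and $b$, which shows the features obtained with the modified softmax loss. Compared to the original softmax loss, the features learned by the modified softmax loss are angularly distributed. The authors felt the two classes were still not separated far enough, so they introduced an integer $m$ ($m \ge 1$) as a penalty factor that controls the angular margin between the classes. The boundaries become: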
$$\|x\|\left(\cos(m\theta_1) - \cos\theta_2\right) = 0 \quad \text{for class 1}, \qquad \|x\|\left(\cos\theta_1 - \cos(m\theta_2)\right) = 0 \quad \text{for class 2}.$$
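A short derivation shows how wide the resulting gap is. It assumes that $x$ lies in the plane between $W_1$ and $W_2$, and that $\theta_{12}$ denotes the angle between them, so $\theta_2 = \theta_{12} - \theta_1$. It also assumes all angles are small enough (e.g. $m\theta_{12} \le \pi$) that cosine is monotonic. The class 1 boundary is $\cos(m\theta_1) = \cos\theta_2$, and the class 2 boundary is $\cos\theta_1 = \cos(m\theta_2)$.

$$
\begin{aligned}
\text{class 1 boundary: } & \cos(m\theta_1) = \cos(\theta_{12} - \theta_1) \;\Rightarrow\; m\theta_1 = \theta_{12} - \theta_1 \;\Rightarrow\; \theta_1 = \frac{\theta_{12}}{m+1} \\
\text{class 2 boundary: } & \cos\theta_1 = \cos(m\theta_2) \;\Rightarrow\; \theta_2 = \frac{\theta_{12}}{m+1} \;\Rightarrow\; \theta_1 = \frac{m\,\theta_{12}}{m+1} \\
\text{gap: } & \frac{m\,\theta_{12}}{m+1} - \frac{\theta_{12}}{m+1} = \frac{m-1}{m+1}\,\theta_{12}
\end{aligned}
$$

For $m = 1$ the gap is zero, which is the modified softmax case. For $m = 4$ it is $\tfrac{3}{5}\theta_{12}$.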
The larger $m$ is, the larger $m\theta_1$ becomes, giving a larger angle and pushing the two classes further apart, as shown in the figure below.
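A minimal numeric check of this effect follows. The function names `classify` and `margin_width`, and the choice $\theta_{12} = \pi/4$, are hypothetical. The sketch assumes $x$ lies between $W_1$ and $W_2$ (so $\theta_2 = \theta_{12} - \theta_1$) and requires $m\theta_{12} \le \pi$. Outside that range $\cos(m\theta)$ is no longer monotonic and this simple rule does not apply. Each angle is assigned to class 1, class 2, or the margin between the boundaries $\cos(m\theta_1) = \cos\theta_2$ and $\cos\theta_1 = \cos(m\theta_2)$.

```python
import math

def classify(theta1, theta12, m):
    """Return 1 or 2 for the class, or 0 if the feature falls inside the angular margin.

    theta1 is the angle between x and W_1; x is assumed to lie between W_1 and W_2,
    so theta2 = theta12 - theta1. Only valid when m * theta12 <= pi.
    """
    if m * theta12 > math.pi:
        raise ValueError("sketch only valid when m * theta12 <= pi")
    theta2 = theta12 - theta1
    if math.cos(m * theta1) > math.cos(theta2):
        return 1
    if math.cos(m * theta2) > math.cos(theta1):
        return 2
    return 0  # between the two boundaries: assigned to neither class

def margin_width(theta12, m):
    """Angular gap between the two boundaries: (m - 1) / (m + 1) * theta12."""
    return (m - 1) / (m + 1) * theta12

theta12 = math.pi / 4
grid = [theta12 * (k + 0.5) / 1000 for k in range(1000)]  # angles between W_1 and W_2
for m in (1, 2, 3, 4):
    # Fraction of sampled angles that land in the margin vs. the closed-form fraction.
    in_margin = sum(classify(t, theta12, m) == 0 for t in grid) / len(grid)
    print(m, round(margin_width(theta12, m) / theta12, 3), round(in_margin, 3))
```

Both printed columns grow with $m$: the wider the margin, the further apart the two classes are forced.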
