Faster R-CNN is a member of the two-stage object detection family, following the lineage R-CNN -> SPP-Net -> Fast R-CNN -> Faster R-CNN. Faster R-CNN was the first to use a deep network to extract region proposals, achieving truly end-to-end object detection. It is the pivotal node of the two-stage series: later detectors such as Mask R-CNN and Cascade R-CNN are built on top of it. Here I implement a simplified Faster R-CNN to deepen my understanding of it. In an earlier Tianchi competition I used Faster R-CNN with FPN, made some improvements, and achieved decent results, but that work was built on the mmdetection framework, so some details inevitably went unexamined. Implementing Faster R-CNN and FPN from scratch this time gave me a much better grasp of those details, and having finished it, I believe I can now reproduce simplified versions of both two-stage and one-stage detectors. (Figure: the overall architecture of Faster R-CNN.)

The implementation of Faster R-CNN is divided into five stages:
Stage 1: from the input image and the annotated boxes (hereafter called ground truth), compute the true labels and offset coordinates of the anchors. These targets will later be paired with the anchor labels and offsets predicted by the RPN to compute the RPN loss and update the RPN weights. Suppose the input image is (800, 800) and VGG16 is used as the feature extractor with 16x downsampling, giving a (50, 50) feature map. Each point on the feature map is mapped back to the original image to generate anchors. With anchor_scale set to (8, 16, 32) and anchor_ratio set to (0.5, 1, 2), each position produces 9 anchors, where anchor_scale controls the anchor size and anchor_ratio its aspect ratio. Note that anchor_scale here is defined relative to the feature map, so it must be multiplied by the downsampling factor when mapping back to the original image. With 9 anchors per position, 50*50*9 = 22500 anchors are generated in total. These anchors are then assigned and sampled: each anchor is assigned to the ground-truth box with which it has the maximum IoU (256 anchors are sampled and the rest are ignored with label -1; the positive:negative ratio is 1:1, with positives and negatives decided by IoU). The encoding formulas are given in Eqs. 1-4:

$$dx=(gt_x-anchor_x)/anchor_w \tag{1}$$

$$dy=(gt_y-anchor_y)/anchor_h \tag{2}$$

$$dw=\log(gt_w/anchor_w) \tag{3}$$

$$dh=\log(gt_h/anchor_h) \tag{4}$$

where dx, dy, dw, dh are the offsets of the anchor relative to its ground-truth box; gt_x, gt_y, gt_w, gt_h are the center coordinates, width, and height of the ground-truth box; and anchor_x, anchor_y, anchor_w, anchor_h are the center coordinates, width, and height of the anchor. Each anchor's true label (0 or 1) is also generated from its IoU with the ground truth; the RPN only distinguishes foreground from background. The goal of this stage is to produce the true offsets and labels of all anchors, i.e. gt_anchor_locations and gt_anchor_labels, which will be combined with the RPN's predictions pred_anchor_locations and pred_anchor_labels to compute the loss.
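As a quick numerical check of Eqs. 1-4, here is a minimal sketch that encodes one anchor against one ground-truth box (the anchor and ground-truth values are made up for illustration):

```python
import numpy as np

# Hypothetical anchor and ground-truth box, both as (center_x, center_y, w, h)
anchor_x, anchor_y, anchor_w, anchor_h = 400.0, 400.0, 128.0, 128.0
gt_x, gt_y, gt_w, gt_h = 410.0, 390.0, 150.0, 100.0

dx = (gt_x - anchor_x) / anchor_w  # (410-400)/128 ≈ 0.078
dy = (gt_y - anchor_y) / anchor_h  # (390-400)/128 ≈ -0.078
dw = np.log(gt_w / anchor_w)       # log(150/128) ≈ 0.159
dh = np.log(gt_h / anchor_h)       # log(100/128) ≈ -0.247
print(dx, dy, dw, dh)
```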
Stage 2: use the RPN to predict the offsets and labels of all anchors, i.e. pred_anchor_locations and pred_anchor_labels. (Figure 3: implementation details of the RPN network.) In the actual implementation, the feature map produced in stage 1 is 50*50 with 512 channels. The RPN consists of one 3*3 convolution followed by two 1*1 convolution branches. The 3*3 convolution uses padding=1, so it does not change the spatial size of the feature map; the two 1*1 branches respectively predict the class and the offsets of the 9 anchors at each position. The input is therefore the extracted (50, 50, 512) feature map (512 being the channel count), and the outputs are a (50, 50, 18) class prediction and a (50, 50, 36) offset prediction. The pred_anchor_labels and pred_anchor_locations produced in this stage are combined with the gt_anchor_labels and gt_anchor_locations from stage 1 to compute the RPN loss.
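The following minimal sketch (the layer names are my own, not those of the code further below) just verifies the tensor shapes this paragraph describes:

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 512, 50, 50)         # backbone feature map
conv3x3 = nn.Conv2d(512, 512, 3, 1, 1)  # padding=1 keeps the 50x50 size
cls_branch = nn.Conv2d(512, 9 * 2, 1)   # 2 classes per anchor, 9 anchors per position
reg_branch = nn.Conv2d(512, 9 * 4, 1)   # 4 offsets per anchor

h = conv3x3(x)
print(cls_branch(h).shape)  # torch.Size([1, 18, 50, 50])
print(reg_branch(h).shape)  # torch.Size([1, 36, 50, 50])
```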
Stage 3: process the anchors predicted in stage 2. Using the dx, dy, dw, dh in pred_anchor_locations together with the original anchor coordinates, invert the encoding to recover the top-left and bottom-right corners (x1, y1, x2, y2) of the boxes the RPN predicts. Sort these boxes by score, take the top 12000 for NMS, and keep the top 2000 of the boxes that survive NMS. Note that at this point the decoded boxes live in original-image coordinates. These 2000 boxes are then sampled and assigned against the ground truth: their true labels and offsets relative to the ground truth are computed, with assignment again based on IoU. Boxes whose IoU with a ground-truth box exceeds 0.5 become positive samples, and their labels are the class labels of the matched ground truth (class indices, not the foreground/background 0/1 labels of the RPN); the encoding formulas are the same as in stage 1. From the assigned boxes, 128 samples are drawn with a positive ratio of 0.25. The final outputs of this stage are the 128 gt_roi_labels and gt_roi_locations computed from the RPN predictions pred_anchor_locations, pred_anchor_labels and the ground truth.
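The decoding used here is the inverse of Eqs. 1-4; a minimal sketch (the function name decode_boxes is mine):

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """Recover (x1, y1, x2, y2) boxes from anchors and predicted (dx, dy, dw, dh).
    anchors: (N, 4) as (x1, y1, x2, y2); deltas: (N, 4)."""
    w = anchors[:, 2] - anchors[:, 0]
    h = anchors[:, 3] - anchors[:, 1]
    cx = anchors[:, 0] + w / 2
    cy = anchors[:, 1] + h / 2
    # Invert Eqs. 1-4
    pred_cx = deltas[:, 0] * w + cx
    pred_cy = deltas[:, 1] * h + cy
    pred_w = np.exp(deltas[:, 2]) * w
    pred_h = np.exp(deltas[:, 3]) * h
    # Back to corner coordinates
    return np.stack([pred_cx - pred_w / 2, pred_cy - pred_h / 2,
                     pred_cx + pred_w / 2, pred_cy + pred_h / 2], axis=1)
```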
Stage 4: stage 2 produced pred_anchor_labels and pred_anchor_locations via the RPN, and stage 3 sampled 128 sample_rois from them and computed their true labels and offsets relative to the ground truth, i.e. gt_roi_labels and gt_roi_locations. In stage 4, the sample_rois are first passed through an RoI pooling layer to obtain fixed-size 7*7*512 feature maps, which are flattened into a (1, 25088) feature vector, fed through two fully connected layers to get a (1, 4096) vector, and finally passed through two fully connected branches that respectively predict the class (num_class+1) and the offsets ((num_class+1)*4), i.e. pred_roi_labels and pred_roi_locations.
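As an aside, instead of the manual per-RoI loop used in the code below, the same pooling can be done with torchvision's built-in operator; a minimal sketch (the shapes assume this article's setup):

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 512, 50, 50)  # backbone output
# RoIs in original-image coordinates, each row: (batch_index, x1, y1, x2, y2)
rois = torch.tensor([[0, 20.0, 30.0, 400.0, 500.0],
                     [0, 300.0, 400.0, 500.0, 600.0]])
# spatial_scale=1/16 maps image coordinates onto the 16x-downsampled feature map
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)                            # torch.Size([2, 512, 7, 7])
print(pooled.view(pooled.shape[0], -1).shape)  # torch.Size([2, 25088])
```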
Stage 5: compute the losses from the results of the first four stages. The RPN loss is computed from gt_anchor_labels, gt_anchor_locations, pred_anchor_labels, and pred_anchor_locations; the RoI loss is computed from gt_roi_labels, gt_roi_locations, pred_roi_labels, and pred_roi_locations. Classification losses use cross entropy and regression losses use smooth L1, giving rpn_cls_loss, rpn_loc_loss, roi_cls_loss, and roi_loc_loss. Note that the classification loss is computed over all (sampled) boxes, while the regression loss is only computed over boxes with meaningful (positive) labels, so when forming the total loss the regression term is multiplied by 10 (equivalently, the classification term is divided by 10): rpn_loss = rpn_cls_loss/10 + rpn_loc_loss, roi_loss = roi_cls_loss/10 + roi_loc_loss, total_loss = rpn_loss + roi_loss. The weights are then updated from the total loss. The cross-entropy loss is given in Eq. 5 and the smooth L1 loss in Eq. 6:

$$L=-\sum_{c=1}^{M} y_{c} \log \left(p_{c}\right)\tag{5}$$

$$L=\left\{\begin{array}{cc}{0.5 x^{2},} & {|x|<1} \\ {|x|-0.5,} & {|x| \geq 1}\end{array}\right.\tag{6}$$
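A minimal sketch of Eq. 6 as it is used later in the code (elementwise smooth L1 over the offset differences, summed over the positive samples):

```python
import torch

def smooth_l1(pred, target):
    """Elementwise smooth L1 (Eq. 6), summed over all offset components."""
    x = torch.abs(pred - target)
    loss = torch.where(x < 1, 0.5 * x ** 2, x - 0.5)
    return loss.sum()

pred = torch.tensor([[0.1, -0.2, 0.3, 1.5]])
target = torch.zeros(1, 4)
print(smooth_l1(pred, target))  # 0.005 + 0.02 + 0.045 + 1.0 = 1.07
```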
Helper module util.py:
```python
import numpy as np

def iou(valid_anchors, gt_box):
    # Both inputs are boxes given as (x1, y1) top-left and (x2, y2) bottom-right corners, shape n*4.
    # Returns ious of shape (len(valid_anchors), len(gt_box)):
    # every valid_anchor has an IoU with every gt_box.
    valid_anchors_num = valid_anchors.shape[0]
    gt_box_num = gt_box.shape[0]
    ious = np.zeros((valid_anchors_num, gt_box_num))
    for i, anchor in enumerate(valid_anchors):
        xa1, ya1, xa2, ya2 = anchor
        area1 = (xa2 - xa1) * (ya2 - ya1)
        for j, bbox in enumerate(gt_box):
            xb1, yb1, xb2, yb2 = bbox
            area2 = (xb2 - xb1) * (yb2 - yb1)
            xx1 = np.max([xa1, xb1])
            yy1 = np.max([ya1, yb1])
            xx2 = np.min([xa2, xb2])
            yy2 = np.min([ya2, yb2])
            if xx1 < xx2 and yy1 < yy2:
                inter_area = (yy2 - yy1) * (xx2 - xx1)
                ious[i, j] = inter_area / (area1 + area2 - inter_area)
    return ious

def nms(bboxes, thre, scores):
    # bboxes: n*4 boxes; thre: IoU threshold; scores: the score of each box.
    # All inputs are numpy arrays.
    # Returns the indices of the boxes kept after NMS.
    x1 = bboxes[:, 0]
    y1 = bboxes[:, 1]
    x2 = bboxes[:, 2]
    y2 = bboxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]
    keep = []  # indices of the boxes that survive NMS
    while order.size > 0:
        i = order[0]  # index of the highest-scoring remaining box
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0, xx2 - xx1)
        h = np.maximum(0, yy2 - yy1)
        inter = w * h
        ious = inter / (areas[i] + areas[order[1:]] - inter)
        indexes = np.where(ious < thre)[0]
        order = order[indexes + 1]
    return keep
```
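A quick usage check of these two helpers (the boxes and scores are made up):

```python
import numpy as np
import util

boxes = np.array([[0, 0, 100, 100],
                  [10, 10, 110, 110],
                  [200, 200, 300, 300]], dtype=np.float32)
gt = np.array([[0, 0, 100, 100]], dtype=np.float32)
print(util.iou(boxes, gt))  # first box overlaps fully (1.0), last not at all (0.0)

scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
keep = util.nms(boxes, 0.5, scores)
print(keep)  # the middle box (IoU ≈ 0.68 with the first) is suppressed: [0, 2]
```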
Main module faster_rcnn.py:

```python
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import util

'''
Stage 1: generate the anchors' ground truth from the image's gt boxes; these anchor
targets are used against the RPN's predictions to compute the loss.
Note: at this stage anchors have only two classes, 0 or 1; -1 means ignored.
The RPN predicts, for the 9 anchors at each feature-map position, a class (0/1) and
the position relative to the gt (dx, dy, dw, dh).
Here we first compute each anchor's assigned true class (0/1) and true offsets
(dx, dy, dw, dh) relative to the gt, which are used to compute the loss.
For an 800*800 image downsampled 16x, the feature map is 50*50; with 9 anchors per
position there are 50*50*9 = 22500 anchors in total.
For these 22500 anchors we first compute the true classes and offsets, then compare
them with the classes and offsets the RPN predicts to compute the loss.
256 anchors are sampled here: only 256 of the true anchor labels are 1 or 0,
the rest are -1 (ignored).
'''
# Build an input image and set its ground-truth boxes and labels
image = torch.zeros((1, 3, 800, 800))
bboxes = torch.Tensor([[20, 30, 400, 500], [300, 400, 500, 600]])
labels = torch.Tensor([6, 8])
sub_sample = 16  # downsampling factor

# Use VGG16 to extract features, downsampling by 16
model = torchvision.models.vgg16(pretrained=True)
fe = list(model.features)

backbone = []
img_bak = image.clone()
for i in fe:
    img_bak = i(img_bak)
    if img_bak.shape[2] < 50:
        break
    backbone.append(i)
out_channels = img_bak.shape[1]
backbone = nn.Sequential(*backbone)
feature_map = backbone(image)
print(backbone)
print(feature_map.shape)  # 50*50

# Generate all anchors: map every point of the 50*50 feature map back to the image
size = 800 // 16
centerX = np.arange(16, (size + 1) * 16, 16)
centerY = np.arange(16, (size + 1) * 16, 16)
center_x = centerX - 8
center_y = centerY - 8
print(center_x)
# Anchor parameters; note the scales are relative to the feature map
anchor_scales = [8, 16, 32]
anchor_ratios = [0.5, 1.0, 2]
anchor_center = np.zeros((size * size, 2))  # 2500*2
# Initialize the 2500 anchor centers
index = 0
for i in range(len(center_x)):
    for j in range(len(center_y)):
        anchor_center[index, 0] = center_x[i]
        anchor_center[index, 1] = center_y[j]
        index += 1
print(anchor_center.shape)

# Generate all anchors
anchors = torch.zeros((size * size * 9, 4), dtype=torch.float32)  # 50*50 positions, 9 anchors each, 4 coords (x1, y1, x2, y2)
index = 0
for c in anchor_center:
    center_x, center_y = c
    for i in range(len(anchor_scales)):
        for j in range(len(anchor_ratios)):
            h = sub_sample * anchor_scales[i] * np.sqrt(anchor_ratios[j])
            w = sub_sample * anchor_scales[i] * np.sqrt(1. / anchor_ratios[j])
            anchors[index, 0] = center_x - w / 2
            anchors[index, 1] = center_y - h / 2
            anchors[index, 2] = center_x + w / 2
            anchors[index, 3] = center_y + h / 2
            index += 1
print(anchors.shape)
print(anchors)

# Get the indices of the valid anchors, i.e. anchors that do not cross the image boundary
valid_anchors_index = np.where(
    (anchors[:, 0] >= 0) &
    (anchors[:, 1] >= 0) &
    (anchors[:, 2] <= 800) &
    (anchors[:, 3] <= 800)
)[0]
print(valid_anchors_index)
valid_anchors = anchors[valid_anchors_index]  # the valid anchors
print(valid_anchors_index.shape)
print(valid_anchors.shape)
# Compute the IoU between every valid anchor and every gt box
ious = util.iou(valid_anchors, bboxes)  # (valid_anchors.shape[0], bboxes.shape[0])
print(ious.shape)
'''
Classify the anchors: the anchor with the maximum IoU with a gt is foreground,
anchors with max IoU > 0.7 are foreground, otherwise background.
'''
gt_maxiou_index = ious.argmax(axis=0)  # per column (one column per gt), the index of the best anchor
print(gt_maxiou_index)
anchor_maxiou_index = ious.argmax(axis=1)  # per row, the gt with which each anchor has its max IoU
print(anchor_maxiou_index)
# The best IoU per gt and the best IoU per anchor
gt_maxiou = ious[gt_maxiou_index, np.arange(bboxes.shape[0])]
anchor_maxiou = ious[np.arange(valid_anchors.shape[0]), anchor_maxiou_index]
print(gt_maxiou.shape)
print(anchor_maxiou.shape)
gt_maxiou_index = np.where(ious == gt_maxiou)[0]  # indices of the anchors having the max IoU with a gt

# Positive/negative thresholds: IoU >= 0.7 is foreground, < 0.3 background;
# sample 256 anchors with a positive ratio of 0.5
pos_iou_thre = 0.7
neg_iou_thre = 0.3
pos_ratio = 0.5
n_sample = 256
valid_anchor_labels = np.empty((valid_anchors.shape[0]))
valid_anchor_labels.fill(-1)  # initialize to -1 (ignored)
valid_anchor_labels[gt_maxiou_index] = 1
valid_anchor_labels[anchor_maxiou >= pos_iou_thre] = 1
valid_anchor_labels[anchor_maxiou < neg_iou_thre] = 0
print(valid_anchor_labels.shape)
# Sample positives and negatives
n_pos = int(n_sample * pos_ratio)
pos_index = np.where(valid_anchor_labels == 1)[0]
if len(pos_index) > n_pos:
    disable_index = np.random.choice(pos_index, size=(len(pos_index) - n_pos), replace=False)
    valid_anchor_labels[disable_index] = -1

n_neg = int(n_sample * (1 - pos_ratio))
if len(pos_index) < n_pos:
    n_neg = n_sample - len(pos_index)  # top up with negatives if there are too few positives
neg_index = np.where(valid_anchor_labels == 0)[0]
if len(neg_index) > n_neg:
    disable_index = np.random.choice(neg_index, size=(len(neg_index) - n_neg), replace=False)
    valid_anchor_labels[disable_index] = -1
# Both positives and negatives are now sampled, 256 in total
print(np.sum(valid_anchor_labels == 1))
print(np.sum(valid_anchor_labels == 0))

# Assign offsets (dx, dy, dw, dh) to each anchor: each anchor is matched to the gt
# with which it has its maximum IoU, i.e. the anchor's position relative to that gt
'''
t_x = (x - x_a) / w_a
t_y = (y - y_a) / h_a
t_w = log(w / w_a)
t_h = log(h / h_a)
x, y, w, h are the center coordinates, width, and height of the ground-truth box;
x_a, y_a, w_a, h_a are the center coordinates, width, and height of the anchor.
'''
anchor_maxiou_gtbox = bboxes[anchor_maxiou_index]
print(anchor_maxiou_gtbox.shape)
w = anchor_maxiou_gtbox[:, 2] - anchor_maxiou_gtbox[:, 0]
h = anchor_maxiou_gtbox[:, 3] - anchor_maxiou_gtbox[:, 1]
x = anchor_maxiou_gtbox[:, 0] + w / 2
y = anchor_maxiou_gtbox[:, 1] + h / 2
anchor_w = valid_anchors[:, 2] - valid_anchors[:, 0]
anchor_h = valid_anchors[:, 3] - valid_anchors[:, 1]
anchor_x = valid_anchors[:, 0] + anchor_w / 2
anchor_y = valid_anchors[:, 1] + anchor_h / 2
eps = torch.tensor(1e-10)
anchor_h = np.maximum(anchor_h, eps)
anchor_w = np.maximum(anchor_w, eps)
dx = (x - anchor_x) / anchor_w
dy = (y - anchor_y) / anchor_h
dw = np.log(w / anchor_w)
dh = np.log(h / anchor_h)
anchor_location = np.vstack((dx, dy, dw, dh)).transpose()
print(anchor_location.shape)
anchor_labels = np.zeros((anchors.shape[0]), dtype=np.int32)
anchor_labels.fill(-1)
anchor_locations = np.zeros_like(anchors, dtype=np.float32)
anchor_locations.fill(-1)
anchor_labels[valid_anchors_index] = valid_anchor_labels
anchor_locations[valid_anchors_index] = anchor_location
print(anchor_labels.shape)
print(anchor_locations.shape)
# End of stage 1: the true anchor classes and the offsets relative to the gt

'''
Stage 2: use the RPN to predict the anchors' classes and offsets
'''
class RPN(nn.Module):
    def __init__(self):
        super(RPN, self).__init__()
        mid_channels = 512
        in_channels = 512
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1)
        self.reg_layer = nn.Conv2d(mid_channels, len(anchor_scales) * len(anchor_ratios) * 4, 1, 1, 0)
        self.cls_layer = nn.Conv2d(mid_channels, len(anchor_scales) * len(anchor_ratios) * 2, 1, 1, 0)
        self.conv1.weight.data.normal_(0, 0.01)
        self.conv1.bias.data.zero_()
        self.reg_layer.weight.data.normal_(0, 0.01)
        self.reg_layer.bias.data.zero_()
        self.cls_layer.weight.data.normal_(0, 0.01)
        self.cls_layer.bias.data.zero_()

    def forward(self, x):
        x = self.conv1(x)
        pred_anchor_location = self.reg_layer(x)
        pred_anchor_cls = self.cls_layer(x)
        return pred_anchor_location, pred_anchor_cls

rpn = RPN()
print(feature_map.shape)
pred_anchor_location, pred_anchor_cls = rpn(feature_map)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
pred_anchor_location = pred_anchor_location.permute(0, 2, 3, 1).contiguous().view(1, -1, 4)
pred_anchor_cls = pred_anchor_cls.permute(0, 2, 3, 1).contiguous().view(1, -1, 2)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
print(anchor_locations.shape)
print(anchor_labels.shape)
# pred_anchor_location pairs with anchor_locations, pred_anchor_cls with anchor_labels;
# they are used to compute the RPN loss
# objectness_score holds each anchor's predicted foreground score
objectness_score = pred_anchor_cls.view(1, 50, 50, 9, 2)[:, :, :, :, 1].contiguous().view(1, -1)
# End of stage 2: the RPN's predicted classes and offsets for all anchors, to be
# compared with the true classes and offsets from stage 1 to compute the RPN loss

'''
Stage 3: generate RoIs from the classes and offsets the RPN predicts, and feed them to the RoI head.
For the 22500 predicted anchors, first decode the predicted offsets back to box
coordinates, take the top n1 boxes for NMS, then keep the top n2 survivors for the RoI head.
The RPN predicts offsets of the original anchors relative to the gt;
stage 1 already computed the true offsets from the actual gt (256 valid ones).
The purpose of this stage is to produce the boxes to feed into the RoI head.
'''
nms_thre = 0.7
n_train_pre_nms = 12000
n_train_post_nms = 2000
n_test_pre_nms = 6000
n_test_post_nms = 300
min_size = 16
# First convert the offsets predicted by the RPN into (x1, y1, x2, y2) coordinates
'''
x = w_a * d_x + ctr_x_a
y = h_a * d_y + ctr_y_a
w = np.exp(d_w) * w_a
h = np.exp(d_h) * h_a
Infer the predicted gt position from the original anchor coordinates and the
dx, dy, dw, dh produced by the RPN.
'''
pred_anchor_location_numpy = pred_anchor_location[0].data.numpy()
objectness_score_numpy = objectness_score[0].data.numpy()
anchor_w = anchors[:, 2] - anchors[:, 0]
anchor_h = anchors[:, 3] - anchors[:, 1]
anchor_x = anchors[:, 0] + anchor_w / 2
anchor_y = anchors[:, 1] + anchor_h / 2
dx = torch.from_numpy(pred_anchor_location_numpy[:, 0])
dy = torch.from_numpy(pred_anchor_location_numpy[:, 1])
dw = torch.from_numpy(pred_anchor_location_numpy[:, 2])
dh = torch.from_numpy(pred_anchor_location_numpy[:, 3])
# Decode the predicted boxes' center_x, center_y, w, h in original-image coordinates
pred_gt_center_x = dx * anchor_w + anchor_x
pred_gt_center_y = dy * anchor_h + anchor_y
pred_gt_w = np.exp(dw) * anchor_w
pred_gt_h = np.exp(dh) * anchor_h
print(pred_gt_center_x.shape)
print(pred_gt_center_y.shape)
print(pred_gt_w.shape)
print(pred_gt_h.shape)
# Convert center_x, center_y, w, h to corner coordinates (x1, y1), (x2, y2)
rois = torch.zeros_like(pred_anchor_location[0])  # (22500, 4)
rois[:, 0] = pred_gt_center_x - pred_gt_w / 2
rois[:, 1] = pred_gt_center_y - pred_gt_h / 2
rois[:, 2] = pred_gt_center_x + pred_gt_w / 2
rois[:, 3] = pred_gt_center_y + pred_gt_h / 2
print(rois.shape)
# Clip the boxes to the image, i.e. clamp out-of-boundary coordinates
img_size = (800, 800)
rois[:, 0] = torch.clamp(rois[:, 0], 0, img_size[0])
rois[:, 1] = torch.clamp(rois[:, 1], 0, img_size[1])
rois[:, 2] = torch.clamp(rois[:, 2], 0, img_size[0])
rois[:, 3] = torch.clamp(rois[:, 3], 0, img_size[1])
print(rois)
# Remove predicted boxes whose height or width is smaller than min_size
w = rois[:, 2] - rois[:, 0]
h = rois[:, 3] - rois[:, 1]
keep = np.where((h.numpy() >= min_size) & (w.numpy() >= min_size))[0]
rois = rois[keep, :]
before_scores = objectness_score[0][keep]
before_scores_numpy = before_scores.data.numpy()
print(rois.shape)
print(before_scores.shape)
# Sort by score in descending order, keep the top n1 for NMS,
# then the top n2 survivors go to the RoI head
order = np.argsort(before_scores_numpy)[::-1]
order = order[:n_train_pre_nms]  # 12000
order = torch.from_numpy(order.copy())
rois = rois[order, :]  # 12000*4
scores = before_scores[order]  # 12000
rois_numpy = rois.data.numpy()
scores_numpy = scores.data.numpy()
keep = util.nms(rois_numpy, nms_thre, scores_numpy)
print(len(keep))
keep = keep[:n_train_post_nms]
rois = rois[keep, :]
print(rois.shape)
# rois now holds the proposals (the RPN's predicted boxes) to feed into the RoI head

'''
Stage 4: further sample the RoIs produced by stage 3. First assign the proposals
coming from the RPN: compute each box's IoU with each gt, sample by IoU, and compute
the offset targets.
'''
n_sample = 128
pos_ratio = 0.25
pos_iou_thre = 0.5
neg_iou_thre_hi = 0.5
neg_iou_thre_lo = 0.0
'''
Sampling first: for the RoIs coming from the RPN, compute their true labels and
offsets relative to the gt, to be compared with the RoI head's outputs when computing the loss.
'''
# Compute IoU
ious = util.iou(rois, bboxes)  # 2000*2
print(ious)
print(ious.shape)
# For each RoI, its maximum IoU and the corresponding gt
gt_argroi = ious.argmax(axis=1)
roi_max_ious = ious.max(axis=1)
gt_roi_label = labels[gt_argroi]  # assign each RoI its true class label
# Sample positives
n_pos = int(n_sample * pos_ratio)
pos_index = np.where(roi_max_ious > pos_iou_thre)[0]
pos_roi_this_image = int(min(n_pos, len(pos_index)))
if len(pos_index) > 0:
    pos_index = np.random.choice(pos_index, size=pos_roi_this_image, replace=False)
print(pos_index)
print(len(pos_index))

neg_roi_this_image = n_sample - pos_roi_this_image
neg_index = np.where((roi_max_ious < neg_iou_thre_hi) & (roi_max_ious > neg_iou_thre_lo))[0]
neg_roi_this_image = int(min(neg_roi_this_image, len(neg_index)))
if len(neg_index) > 0:
    neg_index = np.random.choice(neg_index, size=neg_roi_this_image, replace=False)
print(neg_index)
print(len(neg_index))
# Positive and negative indices are now sampled; compute these RoIs' true labels
# and offsets as the gt for the RoI stage
keep_index = np.append(pos_index, neg_index)
print(keep_index)
sample_rois = rois[keep_index, :]
print(sample_rois.shape)
# Compute the sampled RoIs' true offsets and classes
gt_for_sample_rois = bboxes[gt_argroi[keep_index]]  # the gt box matched to each sample_roi
w = sample_rois[:, 2] - sample_rois[:, 0]
h = sample_rois[:, 3] - sample_rois[:, 1]
center_x = sample_rois[:, 0] + w / 2
center_y = sample_rois[:, 1] + h / 2
gt_w = gt_for_sample_rois[:, 2] - gt_for_sample_rois[:, 0]
gt_h = gt_for_sample_rois[:, 3] - gt_for_sample_rois[:, 1]
gt_center_x = gt_for_sample_rois[:, 0] + gt_w / 2  # note: the gt center uses gt_w, not the RoI's w
gt_center_y = gt_for_sample_rois[:, 1] + gt_h / 2
eps = torch.tensor(1e-10)
h = np.maximum(h, eps)
w = np.maximum(w, eps)
dx = (gt_center_x - center_x) / w
dy = (gt_center_y - center_y) / h
dw = np.log(gt_w / w)
dh = np.log(gt_h / h)
gt_sample_roi_locations = np.vstack((dx, dy, dw, dh)).transpose()
gt_sample_roi_labels = gt_roi_label[keep_index]
gt_sample_roi_labels[pos_roi_this_image:] = 0  # set the negative samples' labels to 0 (background)
'''
gt_sample_roi_locations and gt_sample_roi_labels are the ground truth of the RoI stage
'''
print(gt_sample_roi_locations)
print(gt_sample_roi_locations.shape)
print(gt_sample_roi_labels.shape)
print(sample_rois)
# gt_sample_roi_locations and gt_sample_roi_labels are the true labels and offsets
# of each sample_roi; sample_rois will be fed into the RoI head to predict labels and offsets
print(sample_rois.shape)
roi_indexes = torch.zeros((sample_rois.shape[0]), dtype=torch.int32)
print(roi_indexes.shape)
# rois is the input to the RoI head: sample_rois plus an image index column
# (in this example there is only one image)
rois = torch.zeros((sample_rois.shape[0], sample_rois.shape[1] + 1))
rois[:, 0] = roi_indexes
rois[:, 1:] = sample_rois
print(rois.shape)
print(rois)
'''
The logic here: prepend one column to sample_rois indicating which image each RoI
comes from, because in practice a batch may contain several images. This code passes
in only one image, so the column is all zeros. Then the sample_rois are downscaled by
16 to map onto the feature map, passed through RoI pooling, and the pooled result is
fed into the RoI head for prediction.
'''
size = 7
roi_pooling = nn.AdaptiveMaxPool2d((size, size))
out_put = []  # holds the RoI-pooled features
# Downscale by sub_sample to map from the original image onto the feature map
rois[:, 1:].mul_(1.0 / 16.0)
print(feature_map.shape)
for i in range(rois.shape[0]):
    roi = rois[i]
    img_index = int(roi[0])
    # Slice this RoI's region of the feature map; rows are indexed by y, columns by x
    feature_im = feature_map[img_index, :, int(roi[2]):int(roi[4]), int(roi[1]):int(roi[3])]
    roi_pooling_im = roi_pooling(feature_im)
    out_put.append(roi_pooling_im)
out_put = torch.stack(out_put)
print(out_put.shape)
# out_put holds the sample_rois' feature maps after RoI pooling
out_put_linear = out_put.view(out_put.shape[0], -1)  # everything after this is fully connected
print(out_put_linear.shape)

class ROIHead(nn.Module):
    def __init__(self, num_class):
        super(ROIHead, self).__init__()
        self.num_class = num_class
        self.linear1 = nn.Linear(25088, 4096)
        self.linear2 = nn.Linear(4096, 4096)
        # Input: each RoI's pooled feature; predict each RoI's class and offsets
        self.location = nn.Linear(4096, (num_class + 1) * 4)  # per-class offsets
        self.score = nn.Linear(4096, num_class + 1)  # per-class scores
        self._init_weight()

    def _init_weight(self):
        self.linear1.weight.data.normal_(0, 0.01)
        self.linear1.bias.data.zero_()
        self.linear2.weight.data.normal_(0, 0.01)
        self.linear2.bias.data.zero_()
        self.location.weight.data.normal_(0, 0.01)
        self.location.bias.data.zero_()
        self.score.weight.data.normal_(0, 0.01)
        self.score.bias.data.zero_()

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        pred_roi_locations = self.location(x)  # (num_class+1)*4
        pred_roi_labels = self.score(x)  # num_class+1
        return pred_roi_locations, pred_roi_labels

roihead = ROIHead(num_class=20)
print(out_put_linear.shape)
pred_roi_locations, pred_roi_labels = roihead(out_put_linear)
print(pred_roi_locations.shape)  # (n_sample, (num_class+1)*4)
print(pred_roi_labels.shape)  # (n_sample, num_class+1)

'''
Stage 5: compute the losses, in two parts: first the RPN loss, then the RoI loss
'''
# Used when computing the RPN loss
loss_lambda = 10
print("RPN Loss")
print(anchor_locations.shape)
print(anchor_labels.shape)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
anchor_locations = torch.from_numpy(anchor_locations)
anchor_labels = torch.from_numpy(anchor_labels)
pred_anchor_location = pred_anchor_location[0]
pred_anchor_cls = pred_anchor_cls[0]
print(anchor_locations.shape, anchor_labels.shape, pred_anchor_location.shape, pred_anchor_cls.shape)
# Classification loss: cross entropy
anchor_labels = anchor_labels.long()
rpn_cls_loss = F.cross_entropy(pred_anchor_cls, anchor_labels, ignore_index=-1)
print(rpn_cls_loss)
# Regression loss: smooth L1, computed only over anchors whose gt label is 1
pos_index = anchor_labels > 0
print(pos_index.shape)
print(pos_index)
mask = pos_index.unsqueeze(1).expand_as(anchor_locations)
print(mask.shape)
print(mask)
# Take the anchor locations with positive labels and compute the loss
anchor_locations = anchor_locations[mask].view(-1, 4)  # 18*4
pred_anchor_location = pred_anchor_location[mask].view(-1, 4)  # 18*4
x = torch.abs(anchor_locations - pred_anchor_location)
print(x.shape)
rpn_loc_loss = (x < 1).float() * 0.5 * x ** 2 + (x >= 1).float() * (x - 0.5)
rpn_loc_loss = rpn_loc_loss.sum()  # sum of the regression loss; average it next
print(rpn_loc_loss)
n_reg = (anchor_labels > 0).float().sum()  # number of positive anchors
print(n_reg)
rpn_loc_loss = rpn_loc_loss / n_reg  # average
print(rpn_loc_loss)
rpn_loss = rpn_cls_loss + loss_lambda * rpn_loc_loss
print("rpn loss:{}".format(rpn_loss))

print("RPN Loss Finished")
print("-----------------------------------")
# Used when computing the RoI loss
print("-----------------------------------")
print("ROI Loss")
print(gt_sample_roi_locations.shape)
print(gt_sample_roi_labels.shape)
print(pred_roi_locations.shape)
print(pred_roi_labels.shape)
gt_sample_roi_locations = torch.from_numpy(gt_sample_roi_locations).float()
gt_sample_roi_labels = gt_sample_roi_labels.long()
# Classification loss
roi_cls_loss = F.cross_entropy(pred_roi_labels, gt_sample_roi_labels, ignore_index=-1)
print(roi_cls_loss)
# Regression loss
pred_roi_locations = pred_roi_locations.view(pred_roi_locations.shape[0], -1, 4)
print(pred_roi_locations.shape)  # 128*21*4
# For each RoI, take the predicted offsets of its gt class out of pred_roi_locations
pred_roi_locations = pred_roi_locations[np.arange(0, pred_roi_locations.shape[0]), gt_sample_roi_labels]  # 128*4
print(pred_roi_locations.shape)

# Take the positive labels and compute their loss
pos_index = gt_sample_roi_labels > 0  # positive labels
mask = pos_index.unsqueeze(1).expand_as(pred_roi_locations)  # mask
print(mask.shape)
pred_roi_locations = pred_roi_locations[mask].view(-1, 4)  # predicted offsets of the positive RoIs
gt_sample_roi_locations = gt_sample_roi_locations[mask].view(-1, 4)  # the corresponding gt offsets
print(pred_roi_locations.shape, gt_sample_roi_locations.shape)
x = torch.abs(pred_roi_locations - gt_sample_roi_locations)
roi_loc_loss = (x < 1).float() * 0.5 * x ** 2 + (x >= 1).float() * (x - 0.5)
roi_loc_loss = roi_loc_loss.sum()
print(roi_loc_loss)
n_reg = (gt_sample_roi_labels > 0).sum()
roi_loc_loss = roi_loc_loss / n_reg
roi_loss = roi_cls_loss + loss_lambda * roi_loc_loss
print(roi_loc_loss)
print("roi_loss: {}".format(roi_loss))
print("ROI Loss Finished")
total_loss = rpn_loss + roi_loss
print("total loss: {}".format(total_loss))
total_loss.backward()
```

Next, I will start looking into deep-learning-based edge detection methods (applied to defect detection).