[Object Detection] A Survey of Object Detection Papers

Top computer vision conferences (ICCV, CVPR, ECCV)

Paper | Venue/Year | Author/Affiliation | Notes
R-CNN | CVPR 2014 | Ross Girshick/UC Berkeley | two-stage
Fast R-CNN | ICCV 2015 | Ross Girshick/Microsoft Research | two-stage
Faster R-CNN | NIPS 2015 | Shaoqing Ren/Microsoft Research | two-stage
YOLO | CVPR 2016 | Joseph Redmon/University of Washington | one-stage
SSD | ECCV 2016 | Wei Liu/UNC Chapel Hill | one-stage
FPN | CVPR 2017 | Tsung-Yi Lin/Facebook AI Research |
YOLO v2 | CVPR 2017 | Joseph Redmon/University of Washington | one-stage
RetinaNet | ICCV 2017 | Tsung-Yi Lin/Facebook AI Research | one-stage
Mask R-CNN | ICCV 2017 | Kaiming He/Facebook AI Research | two-stage
YOLO v3 | arXiv 2018 | Joseph Redmon/University of Washington | one-stage
RefineDet | CVPR 2018 | |

Diagram comparison

R-CNN family

YOLO

SSD

SSD

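Core design ideas

  1. Use multi-scale feature maps for detection (better accuracy)
  2. Set prior (default) boxes (faster)
  3. Use convolutions for detection (the feature maps are not pooled)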
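The prior-box idea in item 2 can be sketched as follows. This is a minimal sketch, assuming the scale rule given in the SSD paper, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) with s_min = 0.2 and s_max = 0.9 over m feature maps. The names `box_scales` and `default_boxes_at` are hypothetical, the extra box for aspect ratio 1 and the clipping to the image are omitted, and released configurations may use different values.

```python
import math

def box_scales(m, s_min=0.2, s_max=0.9):
    """Scale of the default boxes on each of m feature maps, relative to the image size."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]

def default_boxes_at(i, j, fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h) centred on cell (i, j) of a square feature map."""
    cx = (j + 0.5) / fmap_size
    cy = (i + 0.5) / fmap_size
    # Every aspect ratio keeps the same area: w * h == scale ** 2.
    return [(cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar))
            for ar in aspect_ratios]

scales = box_scales(6)
boxes = default_boxes_at(0, 0, fmap_size=19, scale=scales[1])
```

Shallow, high-resolution maps get the small scales and deep, low-resolution maps get the large ones, which is how one forward pass covers objects of different sizes.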

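Difficulties

  1. Obtaining positive and negative samples (hard negative mining)
  2. Predicting bbox and cls from the feature maps (no pooling; the per-layer predictions are combined with concat)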
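The sample selection in item 1 can be sketched as follows. This is a simplified sketch, assuming the rule described in the SSD paper: a default box counts as positive when its IoU with some ground-truth box exceeds 0.5, and the negatives are ranked by confidence loss and kept at no more than 3 per positive. The paper's additional step of first matching each ground-truth box to its best default box is left out, and `iou`, `select_samples`, and the toy inputs are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def select_samples(priors, gt_boxes, conf_loss, iou_thresh=0.5, neg_pos_ratio=3):
    """Return (positive indices, kept negative indices) for one image."""
    pos = [i for i, p in enumerate(priors)
           if any(iou(p, g) > iou_thresh for g in gt_boxes)]
    pos_set = set(pos)
    neg = [i for i in range(len(priors)) if i not in pos_set]
    # Hard negative mining: keep only the negatives with the largest confidence loss.
    neg.sort(key=lambda i: conf_loss[i], reverse=True)
    return pos, neg[:neg_pos_ratio * len(pos)]

priors = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0), (0.0, 0.5, 0.5, 1.0),
          (0.5, 0.0, 1.0, 0.5), (0.25, 0.25, 0.75, 0.75), (0.1, 0.1, 0.2, 0.2)]
gt_boxes = [(0.0, 0.0, 0.45, 0.5)]
conf_loss = [0.1, 0.9, 0.3, 0.7, 0.5, 0.2]
pos, neg = select_samples(priors, gt_boxes, conf_loss)
```

Capping the negatives matters because almost all default boxes are background; without the cap the confidence loss would be dominated by easy negatives.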

Data

PASCAL VOC2007
PASCAL VOC2012
COCO

Model

Code to infer the output size of each layer

from mxnet.gluon import nn
from mxnet import nd

# data_Conv6 is the data at the Conv6 layer; the convolution applied to it is op_Conv6
data_Conv6 = nd.random.randn(1,1024,19,19) # input/output data layout is batch x channel x height x width
print("data_Conv6.shape =", data_Conv6.shape)

op_Conv6 = nn.Conv2D(1024, kernel_size=1) # Conv: 1*1*1024
op_Conv6.initialize(force_reinit=True)
data_Conv7 = op_Conv6(data_Conv6) # 19*19*1024
print("data_Conv7.shape =", data_Conv7.shape)

op_Conv7 = nn.Sequential()
op_Conv7.add(nn.Conv2D(256, kernel_size=1)) # Conv: 1*1*256
op_Conv7.add(nn.Conv2D(512, kernel_size=3, padding=1, strides=2)) # Conv: 3*3*512-s2
op_Conv7.initialize(force_reinit=True)
data_Conv8_2 = op_Conv7(data_Conv7) # 10*10*512
print("data_Conv8_2.shape =", data_Conv8_2.shape)

op_Conv8_2 = nn.Sequential()
op_Conv8_2.add(nn.Conv2D(128, kernel_size=1)) # Conv: 1*1*128
op_Conv8_2.add(nn.Conv2D(256, kernel_size=3, padding=1, strides=2)) # Conv: 3*3*256-s2
op_Conv8_2.initialize(force_reinit=True)
data_Conv9_2 = op_Conv8_2(data_Conv8_2) # 5*5*256
print("data_Conv9_2.shape =", data_Conv9_2.shape)

op_Conv9_2 = nn.Sequential()
op_Conv9_2.add(nn.Conv2D(128, kernel_size=1)) # Conv: 1*1*128
op_Conv9_2.add(nn.Conv2D(256, kernel_size=3, strides=1)) # Conv: 3*3*256-s1
op_Conv9_2.initialize(force_reinit=True)
data_Conv10_2 = op_Conv9_2(data_Conv9_2) # 3*3*256
print("data_Conv10_2.shape =", data_Conv10_2.shape)

op_Conv10_2 = nn.Sequential()
op_Conv10_2.add(nn.Conv2D(128, kernel_size=1)) # Conv: 1*1*128
op_Conv10_2.add(nn.Conv2D(256, kernel_size=3, strides=1)) # Conv: 3*3*256-s1
op_Conv10_2.initialize(force_reinit=True)
data_Conv11_2 = op_Conv10_2(data_Conv10_2) # 1*1*256
print("data_Conv11_2.shape =", data_Conv11_2.shape)
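The shapes printed by the code above can be cross-checked without MXNet. This is a minimal sketch, assuming the standard convolution size rule out = floor((in + 2*padding - kernel) / stride) + 1; `conv_out` is a hypothetical helper. The 38x38 Conv4_3 map and the per-cell box counts (4, 6, 6, 6, 4, 4) are taken from the SSD300 configuration in the SSD paper, not from the code above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution on a square input."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) of the 3x3 convolution in each extra block;
# the preceding 1x1 convolutions keep the spatial size unchanged.
extra_blocks = [(3, 2, 1), (3, 2, 1), (3, 1, 0), (3, 1, 0)]

sizes = [19]  # spatial size of the Conv7 output
for kernel, stride, padding in extra_blocks:
    sizes.append(conv_out(sizes[-1], kernel, stride, padding))
print(sizes)  # [19, 10, 5, 3, 1]

# Total number of default boxes for SSD300 (assumed configuration).
fmap_sizes = [38] + sizes
boxes_per_cell = [4, 6, 6, 6, 4, 4]
total = sum(s * s * n for s, n in zip(fmap_sizes, boxes_per_cell))
print(total)  # 8732
```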

Loss function
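For reference, the SSD paper defines the training objective as a weighted sum of a confidence loss (softmax cross-entropy) and a localization loss (Smooth L1), normalised by the number of matched default boxes N: L = (L_conf + alpha * L_loc) / N, with the loss set to 0 when N = 0. A minimal plain-Python sketch of the two terms follows; the function names and input layout are hypothetical, and alpha = 1 is the value the paper reports using.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def ssd_loss(cls_logits, cls_labels, loc_preds, loc_targets, num_pos, alpha=1.0):
    """(L_conf + alpha * L_loc) / N, defined as 0 when there are no positives."""
    if num_pos == 0:
        return 0.0
    l_conf = sum(softmax_cross_entropy(z, y)
                 for z, y in zip(cls_logits, cls_labels))
    l_loc = sum(smooth_l1(p - t)
                for pred, tgt in zip(loc_preds, loc_targets)
                for p, t in zip(pred, tgt))
    return (l_conf + alpha * l_loc) / num_pos
```

In this sketch `cls_logits` is expected to cover the positives plus the mined negatives, while `loc_preds` and `loc_targets` cover the positives only.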

Optimization algorithm

SGD

  • initial learning rate 10^-3
  • 0.9 momentum
  • 0.0005 weight decay
  • batch size 32
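How these hyperparameters enter a parameter update can be sketched as follows. This is a minimal sketch, assuming the common formulation in which weight decay is added to the gradient as an L2 term and momentum accumulates a velocity; frameworks differ in the exact form, and `sgd_step` is a hypothetical helper, not a library API. The batch size only affects how the gradient is computed, so it does not appear here.

```python
def sgd_step(weights, grads, velocity, lr=1e-3, momentum=0.9, weight_decay=5e-4):
    """One SGD update with momentum and L2 weight decay, on plain Python lists."""
    new_w, new_v = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w   # weight decay as an L2 gradient term
        v = momentum * v - lr * g  # accumulate velocity
        new_w.append(w + v)
        new_v.append(v)
    return new_w, new_v

w, v = [1.0, -2.0], [0.0, 0.0]
w, v = sgd_step(w, grads=[0.5, 0.0], velocity=v)
```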

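References

Link | Description
SSD Paper, SSD Slide, SSD Code-Caffe | Official
Object Detection: SSD Principles and Implementation (目标检测 SSD原理与实现) | Detailed explanation; includes a TensorFlow implementation
Deep Learning Notes (7): SSD Paper Reading Notes, Simplified (深度学习笔记(七)SSD 论文阅读笔记简化) | Detailed explanation, especially of how positive and negative samples are obtained; the same author's Deep Learning Notes (7): SSD Paper Reading Notes (深度学习笔记(七)SSD 论文阅读笔记) is a good follow-up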

YOLO

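References

Link | Description
YOLOv1 Paper, YOLOv2 Paper, YOLOv3 Paper | Official
Object Detection: YOLO Principles and Implementation (目标检测 YOLO原理与实现) | Detailed explanation; includes a TensorFlow implementation
Object Detection: YOLOv2 Principles and Implementation, with YOLOv3 (目标检测 YOLOv2原理与实现(附YOLOv3)) | Detailed explanation; includes a TensorFlow implementation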

Faster R-CNN

Faster R-CNN Paper Explained in Detail (Faster R-CNN论文详解)

References

GluonCV-Detection
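(Synced, 机器之心) From R-CNN to RFBNet: A Full Review of Five Years of Object Detection Architectures (从R-CNN到RFBNet,目标检测架构5年演进全盘点)
A Survey of Deep-Learning-Based Object Detection Algorithms (【重磅】基于深度学习的目标检测算法综述)
mAP in Plain Language (白话mAP)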

NMS (Non-Maximum Suppression) Principles + Python Implementation (NMS原理(非极大值抑制)+python实现)
Soft-NMS
Summary of R-CNN, SPP-Net, Fast R-CNN, and Faster R-CNN (R-CNN、SPP-Net、Fast R-CNN、Faster R-CNN总结)
Spatial Transformer Networks
Understanding Spatial Transformer Networks (理解Spatial Transformer Networks)