Head detection algorithm based on improved FaceBoxes
Abstract: In surveillance video shot from a vertical viewing angle, human bodies are often occluded and the background is often blurred, which lowers the accuracy of body detection. This paper therefore proposes a people-counting method for surveillance video that detects human heads instead of whole bodies. The method improves FaceBoxes, a lightweight detection network, and adds multi-scale feature fusion to raise the accuracy of dense head detection. Because heads in surveillance video are relatively small and very large targets rarely appear, the scales of the anchor (candidate) boxes are clustered with the k-means method before training, and an overlapping-region detection method is used to improve the accuracy of the predicted boxes. Experimental results show that the improved FaceBoxes is about 4% more accurate than the original network, at a slightly lower speed.
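The abstract says anchor scales are clustered with k-means before training, and Table 1 reports K-means++ results together with an IoU score. The sketch below shows one common way to do this; it is an assumption, not the paper's code. Each ground-truth box is reduced to a (width, height) pair, the distance is 1 - IoU as in YOLO9000-style anchor clustering, centers are seeded with k-means++, and the IoU score is taken to be the mean best-anchor IoU. The function names (`iou_wh`, `kmeanspp_init`, `kmeans_anchors`, `mean_iou`) and the demo data are hypothetical.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given only as (width, height), assuming they share a center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeanspp_init(boxes, k, rng):
    """k-means++ seeding using the distance d = 1 - IoU."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        # Weight each box by its squared distance to the nearest chosen center.
        d2 = [min(1.0 - iou_wh(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box already coincides with a center
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors; returns the anchors sorted by area."""
    rng = random.Random(seed)
    centers = kmeanspp_init(boxes, k, rng)
    assign = None
    for _ in range(iters):
        # Assign each box to the anchor with the highest IoU (smallest 1 - IoU).
        new_assign = [max(range(k), key=lambda j: iou_wh(b, centers[j])) for b in boxes]
        if new_assign == assign:
            break  # converged: no box changed cluster
        assign = new_assign
        # Move each anchor to the mean width and height of its members.
        for j in range(k):
            members = [b for b, a in zip(boxes, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = (sum(w for w, _ in members) / len(members),
                              sum(h for _, h in members) / len(members))
    return sorted(centers, key=lambda c: c[0] * c[1])


def mean_iou(boxes, anchors):
    """Average, over all boxes, of the IoU with the best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)


if __name__ == "__main__":
    # Synthetic (w, h) boxes for illustration only; NOT the paper's head dataset.
    rng = random.Random(1)
    boxes = []
    for _ in range(500):
        s = rng.uniform(40, 160)  # hypothetical size range in pixels
        boxes.append((s, s * rng.uniform(1.0, 1.25)))  # aspect ratio chosen arbitrarily
    for k in (4, 5):
        anchors = kmeans_anchors(boxes, k)
        print(k, [(round(w), round(h)) for w, h in anchors], round(mean_iou(boxes, anchors), 2))
```

Here each anchor is updated with its cluster mean; a median update is a frequent alternative that is less sensitive to outliers. Using 1 - IoU rather than Euclidean distance on (w, h) keeps large boxes from dominating the clustering error.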
Key words:
- FaceBoxes
- head detection
- neural network
- feature fusion
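
Table 1 K-means++ clustering results for anchor scales on the head dataset

| Number of clusters | Anchor scales | IoU |
| --- | --- | --- |
| 4 | (43, 45), (60, 64), (80, 94), (144, 174) | 0.73 |
| 5 | (114, 132), (88, 104), (160, 194), (63, 69), (42, 44) | 0.85 |

Table 2 Model performance before and after the improvement (CPU 4110 @ 2.1 GHz); * denotes use of the overlapping-region detection method

| Model | AP (average precision) / % | Speed / fps |
| --- | --- | --- |
| YOLOv3 | 89.62 | not reported |
| FaceBoxes | 92.57 | 7 |
| Improved FaceBoxes | 95.18 | 8 |
| Improved FaceBoxes* | 96.28 | 4 |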
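The abstract's "about 4%" can be checked against Table 2. The source does not say whether the figure is an absolute or a relative gain, so both readings are worked out below, comparing each improved variant with the original FaceBoxes:

```latex
\begin{aligned}
\Delta\mathrm{AP}_{*} &= 96.28 - 92.57 = 3.71\ \text{points}, &\quad \frac{3.71}{92.57} &\approx 4.0\%,\\
\Delta\mathrm{AP} &= 95.18 - 92.57 = 2.61\ \text{points}, &\quad \frac{2.61}{92.57} &\approx 2.8\%.
\end{aligned}
```

Either reading gives roughly 4 for the variant with overlapping-region detection (3.71 points absolute, 4.0% relative). On speed, Table 2 lists 7 fps for FaceBoxes, 8 fps for the improved model, and 4 fps once overlapping-region detection is added.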