Object detection algorithm in foggy weather based on dimensional interaction and cross-layer scale cascade

苏佳; 梁奔; 冯康康; 孟俊彤; 贾欣雨; 侯卫民

doi:10.19304/J.ISSN1000-7180.2023.0009

Abstract

To address the low detection accuracy and poor robustness caused by blurred images in foggy scenes, this paper optimizes the YOLOv5 algorithm in combination with data augmentation and proposes an object detection method based on dimensional interaction and cross-layer scale cascading. First, triplet attention is embedded in the feature extraction structure to capture dependencies across dimensions, strengthen the fusion and interaction of spatial and channel information, and improve the model's focus on important features. Second, a Multi-scale Receptive-field Enhancement Module (MREM) is proposed, which combines repeated pooling with residual connections to enlarge the receptive field, obtain multi-scale features, and enhance the extraction of detail information. Third, a Cross-Layer Cascading Path Aggregation Network (CLC-PAN) structure is proposed. Cross-layer connections promote the fusion of features at different scales and improve the interaction between shallow detail information and deep semantic information, while additional sampling layers in the feature pyramid capture richer semantic features, space the anchor boxes more reasonably, and improve detection capability. Finally, the SIoU loss function is used as the bounding box regression loss to improve localization accuracy and training speed. Experimental results show that the improved model is 15.8 MB in size and reaches an mAP of 71.3%, which is 7% higher than YOLOv5s and meets the requirements of fast, accurate, real-time object detection in foggy scenes.
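The abstract names SIoU as the bounding box regression loss but does not give its formula. As a rough illustration only, the sketch below follows the commonly cited SIoU formulation (IoU term plus angle-aware distance cost and shape cost) for a single pair of axis-aligned boxes in pure Python. The function name `siou_loss`, the `(x1, y1, x2, y2)` box layout, the exponent `theta=4.0`, and the `eps` guard are assumptions made for this example and are not taken from the paper.

```python
import math


def siou_loss(box_pred, box_gt, theta=4.0, eps=1e-9):
    """Illustrative SIoU loss for two boxes given as (x1, y1, x2, y2).

    Assumed formulation: loss = 1 - IoU + (distance_cost + shape_cost) / 2.
    """
    px1, py1, px2, py2 = box_pred
    gx1, gy1, gx2, gy2 = box_gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU of the two boxes.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Offset between box centres, and size of the smallest enclosing box.
    dx = (gx1 + gx2) / 2 - (px1 + px2) / 2
    dy = (gy1 + gy2) / 2 - (py1 + py2) / 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)

    # Angle cost sin(2 * alpha), where alpha is the angle between the
    # centre-to-centre line and the x axis; it peaks at 45 degrees.
    sigma = math.hypot(dx, dy)
    sin_alpha = min(abs(dy) / sigma, 1.0) if sigma > 0 else 0.0
    angle_cost = math.sin(2 * math.asin(sin_alpha))

    # Distance cost, modulated by the angle cost through gamma.
    gamma = 2.0 - angle_cost
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost penalises relative width and height mismatch.
    omega_w = abs(pw - gw) / (max(pw, gw) + eps)
    omega_h = abs(ph - gh) / (max(ph, gh) + eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance_cost + shape_cost) / 2


if __name__ == "__main__":
    reference = (0.0, 0.0, 10.0, 10.0)
    print(siou_loss(reference, reference))               # identical boxes
    print(siou_loss(reference, (1.0, 1.0, 11.0, 11.0)))  # small offset
    print(siou_loss(reference, (5.0, 5.0, 15.0, 15.0)))  # larger offset
```

In this sketch the loss is close to zero for identical boxes and grows as the predicted box drifts away from the target or its aspect ratio diverges. A training implementation would normally operate on batched tensors inside the detection framework rather than on Python tuples.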
