ZHOU Chuanhua, ZHOU Dongdong, XIA Xudong, ZHOU Zihan. Cross-modality person re-identification based on convolutional attention mechanism and multi-loss combination[J]. Microelectronics & Computer, 2022, 39(6): 22-30. DOI: 10.19304/J.ISSN1000-7180.2022.0002

Cross-modality person re-identification based on convolutional attention mechanism and multi-loss combination

    Abstract: Cross-modality person re-identification (Re-ID) between infrared and visible-light (RGB-IR) images is of great significance for modern video surveillance, especially at night. Research on single-modality person re-identification has already reached a high level. In the cross-modality setting, however, besides common problems such as illumination conditions, human pose, and camera viewpoint, the main difficulty is that a large discrepancy between modalities and intra-class variation within each modality occur at the same time. To address this, this paper proposes a cross-modality person re-identification method based on a convolutional attention mechanism and a combination of multiple losses. The method builds on a two-stream network. First, in each of the two branches, the first three convolutional layers of ResNet50 extract shallow features from the pedestrian images. A convolutional attention module is then embedded to suppress the extraction of irrelevant information such as color, and the mid-level features are fused with the final features produced by the branch backbone to make the extracted features more discriminative. Finally, a bidirectional cross-modality triplet loss and an identity loss jointly constrain the two-stream network, which speeds up model convergence and effectively handles both the inter-modality discrepancy and the intra-class variation. Experimental results show that the proposed method effectively improves the accuracy of cross-modality person re-identification.
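The abstract does not give the loss formulas. Below is a minimal sketch of one common way to realize a bidirectional cross-modality triplet loss combined with an identity (ID) loss. It assumes batch-hard mining, Euclidean distances, a margin of 0.3, and a weighting factor `lam`; these choices and all function names are illustrative assumptions, not the paper's settings. Each anchor from one modality is compared only with samples from the other modality, once in the RGB-to-IR direction and once in the IR-to-RGB direction, and the ID loss is ordinary softmax cross-entropy over identity logits.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_direction_triplet(anchors: List[Vector], anchor_ids: List[int],
                          others: List[Vector], other_ids: List[int],
                          margin: float) -> float:
    """Batch-hard triplet loss: anchors come from one modality, while the
    positive and negative for each anchor come only from the other modality."""
    losses = []
    for feat, pid in zip(anchors, anchor_ids):
        pos = [euclidean(feat, o) for o, oid in zip(others, other_ids) if oid == pid]
        neg = [euclidean(feat, o) for o, oid in zip(others, other_ids) if oid != pid]
        if not pos or not neg:
            continue  # no valid cross-modality triplet for this anchor
        hardest_pos = max(pos)  # farthest same-identity sample in the other modality
        hardest_neg = min(neg)  # closest other-identity sample in the other modality
        losses.append(max(0.0, margin + hardest_pos - hardest_neg))
    return sum(losses) / len(losses) if losses else 0.0


def bidirectional_triplet(rgb, rgb_ids, ir, ir_ids, margin: float = 0.3) -> float:
    """Sum of the RGB-to-IR and IR-to-RGB directional losses."""
    return (one_direction_triplet(rgb, rgb_ids, ir, ir_ids, margin)
            + one_direction_triplet(ir, ir_ids, rgb, rgb_ids, margin))


def identity_loss(logits: List[List[float]], labels: List[int]) -> float:
    """Mean softmax cross-entropy over identity classes (the ID loss)."""
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[label]
    return total / len(labels)


def joint_loss(rgb, rgb_ids, ir, ir_ids, logits, labels,
               lam: float = 1.0, margin: float = 0.3) -> float:
    """Hypothetical combined objective: ID loss + lam * bidirectional triplet loss."""
    return identity_loss(logits, labels) + lam * bidirectional_triplet(
        rgb, rgb_ids, ir, ir_ids, margin)


if __name__ == "__main__":
    rgb = [[0.0, 0.0], [1.0, 0.0]]     # toy visible-light features
    ir = [[0.0, 0.1], [1.0, 0.1]]      # toy infrared features
    ids = [0, 1]
    logits = [[2.0, 0.0], [0.0, 2.0]]  # toy identity-classifier outputs
    print(joint_loss(rgb, ids, ir, ids, logits, ids))
```

This version only computes loss values, with no gradients. In actual training the same formulas would be written with tensor operations so that gradients flow back through both branches of the two-stream network.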
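The abstract refers to a "convolutional attention module" without describing its internals. Assuming it follows the widely used CBAM (Convolutional Block Attention Module) design, the module applies channel attention followed by spatial attention. Channel attention passes average- and max-pooled channel descriptors through a shared MLP. Spatial attention convolves the channel-wise average and max maps. The plain-Python sketch below stores a feature map as nested lists indexed `[channel][row][column]`. The weight shapes, the kernel size (commonly 7×7 in CBAM), and all function names are hypothetical.

```python
import math
from typing import List

Map2D = List[List[float]]
FeatureMap = List[Map2D]  # indexed as [channel][row][column]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(v: List[float], w1: Map2D, w2: Map2D) -> List[float]:
    """C -> C/r -> C MLP with ReLU; the same weights serve both pooled vectors."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat: FeatureMap, w1: Map2D, w2: Map2D) -> FeatureMap:
    """Reweight each channel by sigmoid(MLP(avg-pool) + MLP(max-pool))."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(max(r) for r in ch) for ch in feat]
    weights = [sigmoid(a + m)
               for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    return [[[weights[c] * v for v in row] for row in ch] for c, ch in enumerate(feat)]


def spatial_attention(feat: FeatureMap, kernel: List[Map2D]) -> FeatureMap:
    """Reweight each location by sigmoid(conv([mean over channels, max over channels])).
    `kernel` has shape [2][k][k] with k odd; zero padding keeps the map size."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    pooled = [
        [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)],
        [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)],
    ]
    k = len(kernel[0])
    pad = k // 2
    mask = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            s += kernel[p][di][dj] * pooled[p][y][x]
            mask[i][j] = sigmoid(s)
    return [[[mask[i][j] * feat[c][i][j] for j in range(W)] for i in range(H)]
            for c in range(C)]


def cbam(feat: FeatureMap, w1: Map2D, w2: Map2D, kernel: List[Map2D]) -> FeatureMap:
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(feat, w1, w2), kernel)
```

Every attention weight is a sigmoid output strictly between 0 and 1, so the module can only rescale features. This lets it attenuate less useful channels and locations rather than add new information.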

     
