Citation: SHI Hao, XING Yuhang, CHEN Lian. Facial expression recognition based on multi-scale feature fusion and attention mechanism[J]. Microelectronics & Computer, 2022, 39(3): 34-40. DOI: 10.19304/J.ISSN1000-7180.2021.0799

Facial expression recognition based on multi-scale feature fusion and attention mechanism

Abstract: Traditional convolutional neural networks tend to lose useful information during expression feature extraction and fail to extract highly discriminative expression features, which leads to low recognition rates. To address this problem, a facial expression recognition method based on multi-scale feature fusion and an attention mechanism is proposed. First, VGGNet16 is used to extract convolutional features. To avoid losing expression feature information, the output feature maps of convolutional layers at different depths are fused across scales, which introduces context information while extracting richer expression features. To focus on key expression features, an attention mechanism is introduced into the network: the channel attention module is improved with grouped convolution to learn the weights of different channels and obtain attention feature maps, which enhances the representational power of the features and suppresses redundant information. To further improve the discriminability of the extracted features, the island loss function is introduced and combined with the Softmax classification loss to form a new loss function. Finally, because the fully connected layers are trimmed, the DropBlock strategy is applied in the convolutional layers to prevent overfitting. Experimental results show that the model achieves average accuracies of 73.32% and 97.40% on the Fer2013 and CK+ datasets, respectively.
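The joint loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used form of the island loss (a center loss term plus a cosine-similarity penalty between every pair of distinct class centers), added to the Softmax cross-entropy with a weighting factor. The function names and the weights `lam` and `lambda1` are hypothetical choices for illustration; the abstract does not state the exact formulation or hyperparameters used in the paper.

```python
import math


def island_loss(features, labels, centers, lambda1=10.0):
    """Center loss plus a pairwise cosine penalty between class centers.

    features: list of feature vectors, one per sample
    labels:   list of integer class labels, same length as features
    centers:  list of non-zero class-center vectors, indexed by label
    lambda1:  weight of the center-separation term (assumed value)
    """
    # Center-loss term: pull each feature toward its own class center.
    center_term = 0.5 * sum(
        sum((x - c) ** 2 for x, c in zip(f, centers[y]))
        for f, y in zip(features, labels)
    )

    # Separation term: (cosine similarity + 1) over all ordered pairs of
    # distinct centers. It is zero only when the centers point in
    # opposite directions, so minimizing it pushes the centers apart.
    separation = 0.0
    for j, cj in enumerate(centers):
        for k, ck in enumerate(centers):
            if j == k:
                continue
            dot = sum(a * b for a, b in zip(cj, ck))
            norm = math.sqrt(sum(a * a for a in cj)) * math.sqrt(sum(b * b for b in ck))
            separation += dot / norm + 1.0

    return center_term + lambda1 * separation


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def joint_loss(logits_batch, features, labels, centers, lam=0.01, lambda1=10.0):
    """Mean Softmax loss plus lam times the island loss (lam is an assumed weight)."""
    ce = sum(softmax_cross_entropy(z, y) for z, y in zip(logits_batch, labels))
    ce /= len(labels)
    return ce + lam * island_loss(features, labels, centers, lambda1)
```

In a real training loop the class centers would be updated alongside the network weights; this sketch only evaluates the loss for fixed centers.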

     
