CAI Zhi, WEI Weimin, LI Fengyong, LIU Chang. Face tampering detection method based on spatiotemporal attention residual network[J]. Microelectronics & Computer, 2022, 39(10): 46-53. DOI: 10.19304/J.ISSN1000-7180.2022.0119

Face tampering detection method based on spatiotemporal attention residual network

Abstract: Traditional residual-network detectors for deepfake videos cannot capture long-range dependencies between video frames and overlook locally critical information. To address this, we propose a face tampering detection method that combines a residual network with a spatiotemporal attention mechanism. First, video frames are extracted with OpenCV, facial landmarks are located in each extracted frame with Dlib, and the faces are cropped, aligned, and resized according to those landmarks to produce a face frame sequence. Next, a residual network (ResNeXt) with its last two layers removed (the global average pooling layer and the fully connected layer) extracts spatial-domain features from the face data, and a self-attention mechanism is fused in to learn the locally critical information in those features. A long short-term memory (LSTM) layer then captures the long-range dependencies between video frames to obtain temporal-domain features. Finally, a Dropout layer randomly discards some neurons to improve generalization, and a fully connected layer classifies each face as real or fake. In experiments on the FaceForensics++ dataset, the method achieves higher detection accuracy than several baseline algorithms, indicating that it can effectively detect whether the face region in a video has been tampered with.
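
The abstract does not specify the exact form of the self-attention layer. As a minimal sketch, assuming standard scaled dot-product self-attention with identity query/key/value projections (a trained layer would presumably use learned projections), the following pure-Python example shows how each feature vector is re-weighted by its similarity to the others. The function names, the choice of one vector per position, and the toy values are illustrative assumptions and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `features` is a list of equal-length vectors, for example one vector
    per position of a feature map (an assumption, not from the paper).
    Returns (attended_vectors, attention_weights).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in features:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        row = softmax(scores)
        # Weighted sum of all value vectors, one output component at a time.
        attended = [sum(w * value[i] for w, value in zip(row, features))
                    for i in range(dim)]
        weights.append(row)
        outputs.append(attended)
    return outputs, weights


# Toy input: three positions with 4-dimensional features (made-up numbers).
toy_features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
attended, attn = self_attention(toy_features)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` sums to 1, and positions with similar features attend to each other more strongly than to dissimilar ones. In the pipeline the abstract describes, a step like this would sit between the truncated ResNeXt feature extractor and the LSTM layer.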

     
