MAO Zhi-qiang, MA Cui-hong, CUI Jin-long, WANG Yi. Research on action recognition based on two-stream convolution and double center loss[J]. Microelectronics & Computer, 2019, 36(3): 96-100.


Research on action recognition based on two-stream convolution and double center loss

Abstract: To address the low recognition accuracy caused by large intra-class variation and small inter-class variation among similar actions in video, an action recognition method based on a two-stream convolutional network and a double-center loss is proposed. The method first builds a two-stream convolutional structure with the C3Dnet model as the base network of each stream, extracting apparent short-term motion information from multi-scale RGB video frames and long-term motion information from stacked optical flow maps. The deep features from the two streams are then parsed by a long short-term memory (LSTM) network and fused. Finally, a 2C-softmax objective function based on the double-center loss is used to maximize inter-class distance and minimize intra-class distance, enabling the classification and recognition of similar actions. Experimental results on the KTH dataset show that the method identifies similar actions accurately, reaching a recognition accuracy of 98.2%.
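The abstract does not give the exact form of the 2C-softmax objective. Below is a minimal sketch, assuming it combines softmax cross-entropy with an intra-class term that pulls each fused feature toward its class center and an inter-class term that pushes class centers apart; the class name DoubleCenterLoss, the weights lambda_intra and lambda_inter, and the hinge margin are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DoubleCenterLoss(nn.Module):
    """Sketch of a double-center style objective (assumed formulation).

    Combines softmax cross-entropy with:
      - an intra-class term: squared distance of each feature to its own
        class center (minimized), and
      - an inter-class term: a hinge that keeps class centers at least
        `margin` apart (maximizes inter-class separation).
    """

    def __init__(self, num_classes, feat_dim,
                 lambda_intra=0.1, lambda_inter=0.01, margin=1.0):
        super().__init__()
        # Learnable class centers in the fused-feature space.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.lambda_intra = lambda_intra
        self.lambda_inter = lambda_inter
        self.margin = margin

    def forward(self, features, logits, labels):
        # Standard softmax cross-entropy on the classifier logits.
        ce = F.cross_entropy(logits, labels)

        # Intra-class term: distance of each sample to its class center.
        own_centers = self.centers[labels]                    # (B, D)
        intra = ((features - own_centers) ** 2).sum(dim=1).mean()

        # Inter-class term: penalize center pairs closer than `margin`.
        dists = torch.cdist(self.centers, self.centers, p=2)  # (C, C)
        mask = ~torch.eye(self.centers.size(0), dtype=torch.bool,
                          device=dists.device)
        inter = F.relu(self.margin - dists[mask]).mean()

        return ce + self.lambda_intra * intra + self.lambda_inter * inter


# Usage with LSTM-fused two-stream features (hypothetical tensors):
# KTH has 6 action classes; feat_dim here is an assumed fusion width.
# criterion = DoubleCenterLoss(num_classes=6, feat_dim=256)
# loss = criterion(fused_features, classifier(fused_features), labels)
```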

     

