Citation: XUE Yan-fei, ZHANG Jian-ming. Cross-corpus speech emotion recognition based on adversarial training[J]. Microelectronics & Computer, 2021, 38(3): 77-83.


Cross-corpus speech emotion recognition based on adversarial training

Abstract: In cross-corpus speech emotion recognition, the difference between the training and testing data distributions becomes pronounced, causing a large gap between validation and testing performance. To address this problem, a cross-corpus speech emotion recognition method based on adversarial training is proposed. Adversarial training between corpora effectively reduces the differences between them and improves the model's ability to extract domain-invariant emotion features. At the same time, a multi-head self-attention mechanism is introduced to model the dependencies between elements at different positions in the speech sequence, strengthening the extraction of emotion-salient features from the sequence. Experiments with IEMOCAP as the source domain and MSP-IMPROV as the target domain, and with MSP-IMPROV as the source domain and IEMOCAP as the target domain, show that the proposed method improves relative UAR over the benchmark methods by 0.91%~12.22% and 2.27%~6.90%, respectively. Therefore, when emotion labels are unavailable in the target domain, the proposed cross-corpus speech emotion recognition method extracts domain-invariant, emotion-salient features more effectively.
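The abstract gives no implementation details, so the following is only an illustrative sketch. A common way to realize corpus-level adversarial training is a DANN-style gradient reversal layer (GRL) placed between a shared encoder and a corpus (domain) discriminator; all module names, layer sizes, and the four-class setup below are assumptions, not the authors' architecture.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; reverses and scales gradients on backward.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class AdversarialSER(nn.Module):
    # Hypothetical model: shared encoder, emotion head, corpus discriminator.
    def __init__(self, feat_dim=128, n_emotions=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.emotion_head = nn.Linear(256, n_emotions)  # trained on labeled source data only
        self.domain_head = nn.Linear(256, 2)            # source corpus vs. target corpus

    def forward(self, x, lambd=1.0):
        h = self.encoder(x)
        emo_logits = self.emotion_head(h)
        dom_logits = self.domain_head(grad_reverse(h, lambd))  # adversarial branch
        return emo_logits, dom_logits

During training, cross-entropy on source emotion labels is combined with cross-entropy on the corpus label for utterances from both corpora; the reversed gradient drives the encoder to confuse the discriminator, i.e. toward corpus-invariant (domain-invariant) features.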

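The multi-head self-attention step can likewise be sketched with PyTorch's built-in nn.MultiheadAttention. This is a minimal, hypothetical illustration of attention over frame-level acoustic features, not the paper's configuration (dimensions, head count, and the mean-pooling readout are assumptions).

import torch
import torch.nn as nn

class SelfAttentiveEncoder(nn.Module):
    def __init__(self, d_model=128, n_heads=8):
        super().__init__()
        # batch_first=True: inputs are (batch, time, feature)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, key_padding_mask=None):
        # Every frame attends to every other frame, so dependencies between
        # distant positions in the utterance are modeled directly.
        ctx, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        ctx = self.norm(x + ctx)      # residual connection
        return ctx.mean(dim=1)        # pool over time -> utterance-level embedding

feats = torch.randn(4, 300, 128)      # 4 utterances, 300 frames, 128-dim features
emb = SelfAttentiveEncoder()(feats)   # -> shape (4, 128)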

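UAR (unweighted average recall), the metric behind the 0.91%~12.22% and 2.27%~6.90% figures, is per-class recall averaged without class-frequency weighting, which matches macro-averaged recall. A minimal sketch follows; the labels and baseline numbers below are made up for illustration.

from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 2, 2, 3, 3]   # toy ground-truth labels for 4 emotion classes
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]   # toy predictions
uar = recall_score(y_true, y_pred, average="macro")  # macro recall == UAR
print(f"UAR = {uar:.3f}")

# Relative UAR improvement over a baseline, as the abstract reports:
baseline, proposed = 0.450, 0.481    # hypothetical UAR values
rel_gain = (proposed - baseline) / baseline * 100
print(f"relative gain = {rel_gain:.2f}%")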
