ZHANG P J, SUN Y, ZHANG X Y, et al. Speech emotion recognition model based on interactive cognitive network[J]. Microelectronics & Computer, 2023, 40(8): 1-9. doi: 10.19304/J.ISSN1000-7180.2022.0638

Speech emotion recognition model based on interactive cognitive network


     

    Abstract: Expressing emotion through speech is a continuously evolving process. To exploit the temporal continuity of speech signals for recognizing specific emotions, this paper builds a GA-GRUS-ICN model based on an interactive cognitive network (ICN). First, a GRUS network extracts deep temporal features from the input speech features; next, a self-attention mechanism assigns higher weights to the more informative feature segments; finally, the ICN models the correlations among emotions, producing an emotion correlation matrix and the final recognition result. A genetic algorithm (GA) is used to select the hyperparameters. Three basic emotions, "sadness", "anger", and "happiness", drawn from the TYUT2.0, EMO-DB, and CASIA speech databases, serve as the experimental data, and five experimental schemes covering two ablation studies are designed. The unweighted accuracy (UA) on the three databases reaches 80.83%, 98.61%, and 88.13%, respectively, indicating that the GA-GRUS-ICN model generalizes well across emotional speech corpora and that the self-attention mechanism is a good fit for the GRUS-ICN model, enabling strong speech emotion recognition performance.
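    The paper itself provides no source code. The block below is a minimal, hypothetical PyTorch sketch of the GRU-plus-self-attention front end the abstract describes; the class name GRUSelfAttention, the 39-dimensional input features, and the hidden size are illustrative assumptions, and the ICN correlation module and GA hyperparameter search are omitted.

    # Hypothetical sketch of the GRU + self-attention pipeline from the abstract.
    # Names and dimensions are assumptions; ICN and GA are not reproduced here.
    import torch
    import torch.nn as nn

    class GRUSelfAttention(nn.Module):
        """Bidirectional GRU encoder, self-attention over frames, 3-way emotion classifier."""
        def __init__(self, feat_dim=39, hidden=128, n_emotions=3):
            super().__init__()
            self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
            # Single-head self-attention over frame-level GRU outputs; the paper's
            # exact attention formulation may differ.
            self.attn = nn.MultiheadAttention(2 * hidden, num_heads=1, batch_first=True)
            self.classifier = nn.Linear(2 * hidden, n_emotions)

        def forward(self, x):               # x: (batch, frames, feat_dim)
            h, _ = self.gru(x)              # (batch, frames, 2 * hidden)
            a, _ = self.attn(h, h, h)       # re-weight the more informative frames
            pooled = a.mean(dim=1)          # average over time
            return self.classifier(pooled)  # (batch, n_emotions) logits

    # Toy usage: a batch of 4 utterances, 200 frames of 39-dim features each.
    model = GRUSelfAttention()
    print(model(torch.randn(4, 200, 39)).shape)  # torch.Size([4, 3])

    In the full model, the ICN would take the place of the plain linear classifier, mapping pooled features through the learned emotion correlation matrix; that module and the GA search over hyperparameters are described only at a high level in the abstract.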

     
