• Peking University Core Journal (A Guide to the Core Journals of China, 2017 edition)
  • China Science and Technology Core Journal (statistical source journal for Chinese scientific papers)
  • Indexed by the JST (Japan Science and Technology Agency) database


Siamese network text similarity calculation with multi-head self-attention mechanism

CAO Xiaopeng, ZHOU Kaiqiang

Citation: CAO Xiaopeng, ZHOU Kaiqiang. Siamese network text similarity calculation with multi-head self-attention mechanism[J]. Microelectronics & Computer, 2021, 38(10): 15-20. doi: 10.19304/J.ISSN1000-7180.2021.0141


doi: 10.19304/J.ISSN1000-7180.2021.0141
Funding:

National Natural Science Foundation of China 61136002

Shaanxi Province Key R&D Program project 2021GY-181

Shaanxi Provincial Department of Education science and technology program project 2013jk1128

Details
    About the authors:

    CAO Xiaopeng, male (b. 1976), Ph.D., professor. Research interests: natural language processing, software testing.

    Corresponding author:

    ZHOU Kaiqiang (corresponding author), male (b. 1995), master's student. Research interest: natural language processing. E-mail: 840622349@qq.com

  • CLC number: TP391.1

Siamese network text similarity calculation with multi-head self-attention mechanism

  • Abstract: Text similarity calculation is a core problem in natural language processing. Existing methods extract deep semantic information insufficiently and have limited ability to compute the similarity of long texts. To address these shortcomings, this paper proposes a Siamese network based on a multi-head self-attention mechanism: a Siamese model built on bidirectional GRUs accurately extracts the contextual semantic information of the text samples, while the added multi-head self-attention mechanism learns the deep semantic information of long texts. Experiments on the public SICK dataset show that the Bi-GRU Siamese network with multi-head self-attention can learn the deep semantics of long texts; compared with other text similarity calculation methods, its correlation coefficient improves significantly and it handles long texts well.
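The pipeline the abstract describes can be sketched in outline. The following NumPy fragment is an illustrative sketch only, not the authors' implementation: it omits the trained Bi-GRU encoder and uses randomly initialized projection weights in place of learned parameters. Each sentence's token embeddings pass through multi-head self-attention, the attended sequence is mean-pooled into a sentence vector, and the Siamese pair is scored with exp(-L1 distance), the similarity function used in Siamese recurrent architectures [8]. All dimensions and weights here are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, heads=3, seed=0):
    """Scaled dot-product attention run in `heads` subspaces, concatenated.

    X: (seq_len, d_model) token embeddings; d_model must be divisible
    by `heads`. The projection matrices are random placeholders for the
    parameters a trained model would learn.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    dk = d // heads
    out = []
    for _ in range(heads):
        Wq = rng.standard_normal((d, dk)) / np.sqrt(d)
        Wk = rng.standard_normal((d, dk)) / np.sqrt(d)
        Wv = rng.standard_normal((d, dk)) / np.sqrt(d)
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(dk))  # (n, n) attention weights
        out.append(A @ V)                   # (n, dk) per-head output
    return np.concatenate(out, axis=-1)     # (n, d) back to model width

def siamese_similarity(Xa, Xb, heads=3):
    """Mean-pool each attended sequence, score with exp(-||va - vb||_1)."""
    va = multi_head_self_attention(Xa, heads).mean(axis=0)
    vb = multi_head_self_attention(Xb, heads).mean(axis=0)
    return float(np.exp(-np.abs(va - vb).sum()))
```

Identical inputs score exactly 1.0 and the score decays toward 0 as the pooled representations diverge; in the paper the inputs are 200-dimensional word vectors encoded by a Bi-GRU before the attention layer.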
  • Figure 1  Basic structure of the Siamese model

    Figure 2  Multi-Head Self-Attention Siamese-Bi-GRU structure

    Figure 3  Structure of the multi-head self-attention mechanism

    Figure 4  Pearson coefficients of the two models at different sentence lengths

    Table 1  Experimental parameter settings

    Parameter                              Value
    GRU hidden layer size                  128
    Multi-Head Self-Attention head count   3
    Learning rate                          0.05
    Training epochs                        20
    Word-vector dimension                  200

    Table 2  Model performance with different numbers of heads

    Attention heads   r        ρ        MSE
    1                 0.8212   0.7923   0.2548
    2                 0.8304   0.8241   0.2428
    3                 0.8527   0.8329   0.2401
    4                 0.8479   0.8301   0.2512

    Table 3  Comparison of other Siamese networks and MSA-Siamese

    Model             r        ρ        MSE
    LSTM+Attention    0.8636   0.8135   0.2462
    GRU               0.8732   0.8254   0.2382
    Bi-GRU            0.8721   0.8210   0.2365
    BiGRU+Attention   0.8873   0.8312   0.2263
    MSA-Siamese       0.8907   0.8388   0.2216
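The three columns reported in Tables 2 and 3 are the Pearson correlation r, Spearman correlation ρ, and mean squared error between predicted and gold similarity scores. A generic sketch of these metrics (not the paper's evaluation code) is below; the rank function assumes no tied predictions, since handling ties would require average ranks.

```python
import numpy as np

def pearson_r(x, y):
    """Linear correlation between predicted and gold similarity scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def spearman_rho(x, y):
    """Pearson correlation of the ranks (assumes no tied values)."""
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    return pearson_r(rank(np.asarray(x)), rank(np.asarray(y)))

def mse(x, y):
    """Mean squared error between predictions and gold scores."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(np.mean(d * d))
```

Pearson r measures how linearly the predicted scores track the gold scores, while Spearman ρ only checks that the ranking of sentence pairs is preserved, which is why the two columns can move independently across models.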

    Table 4  Performance comparison of the two models on the SICK dataset

    Sentence pair                                                    Gold   B     M
    A group of kids is playing in a yard and an old man is standing in the background.
    A group of boys in a yard is playing and a man is standing in the background.
                                                                     4.7    3.9   4.2
    Kids in red shirts are playing in the leaves.
    Children in red shirts are playing in the leaves.
                                                                     3.8    3.5   3.2
    People wearing costumes are gathering in a forest and are looking in the same direction.
    People with costumes are gathered in a wooded area looking the same direction.
                                                                     4.8    4.4   3.8
    Various people are eating at red tables in a crowded restaurant with purple lights.
    Various customers are eating in a crowded restaurant with purple lights.
                                                                     4.5    3.8   4.2
  • [1] YU T T, XU P N, JIANG Y E, et al. Text similarity method based on the improved Jaccard coefficient[J]. Computer Systems & Applications, 2017, 26(12): 137-142. DOI: 10.15888/j.cnki.csa.006123. (in Chinese)
    [2] JARO M A. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida[J]. Journal of the American Statistical Association, 1989, 84(406): 414-420. DOI: 10.1080/01621459.1989.10478785.
    [3] SALTON G, WONG A, YANG C S. A vector space model for automatic indexing[J]. Communications of the ACM, 1975, 18(11): 613-620. DOI: 10.1145/361219.361220.
    [4] BLEI D M, NG A Y, JORDAN M I. Latent dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
    [5] HUANG P S, HE X D, GAO J F, et al. Learning deep structured semantic models for web search using clickthrough data[C]//Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. San Francisco, CA, USA: ACM, 2013. DOI: 10.1145/2505515.2505665.
    [6] TAI K S, SOCHER R, MANNING C D. Improved semantic representations from tree-structured long short-term memory networks[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing, China: Association for Computational Linguistics, 2015. DOI: 10.3115/v1/P15-1150.
    [7] KENTER T, BORISOV A, DE RIJKE M. Siamese CBOW: optimizing word embeddings for sentence representations[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany: Association for Computational Linguistics, 2016. DOI: 10.18653/v1/P16-1089.
    [8] MUELLER J, THYAGARAJAN A. Siamese recurrent architectures for learning sentence similarity[C]//Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, Arizona, USA: AAAI Press, 2016. DOI: 10.5555/3016100.3016291.
    [9] LIN Z H, FENG M W, DOS SANTOS C N, et al. A structured self-attentive sentence embedding[C]//Proceedings of the 5th International Conference on Learning Representations. Toulon, France: OpenReview.net, 2017.
    [10] LIU W, MA H F, TUO T, et al. Short text similarity measure based on co-occurrence distance and discrimination[J]. Computer Engineering and Science, 2018, 40(7): 1281-1286. DOI: 10.3969/j.issn.1007-130X.2018.07.019. (in Chinese)
    [11] XIAO H, FU L N, JI D H. Neural language model and semantic compositionality model in semantic similarity[J]. Computer Engineering and Applications, 2016, 52(7): 139-142. DOI: 10.3778/j.issn.1002-8331.1405-0187. (in Chinese)
    [12] GOKUL P P, AKHIL B K, SHIVA K K M. Sentence similarity detection in Malayalam language using cosine similarity[C]//Proceedings of the 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology. Bangalore, India: IEEE, 2017. DOI: 10.1109/RTEICT.2017.8256590.
    [13] CHEN E J, JIANG E B. Review of studies on text similarity measures[J]. Data Analysis and Knowledge Discovery, 2017, 1(6): 1-11. DOI: 10.11925/infotech.2096-3467.2017.06.01. (in Chinese)
    [14] ZHANG X C, YU L F, ZHANG Y H. Multi-feature fusion for short text similarity calculation based on LDA[J]. Computer Science, 2018, 45(9): 266-270. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJA201809046.htm (in Chinese)
    [15] NECULOIU P, VERSTEEGH M, ROTARU M. Learning text similarity with Siamese recurrent networks[C]//Proceedings of the 1st Workshop on Representation Learning for NLP. Berlin, Germany: Association for Computational Linguistics, 2016. DOI: 10.18653/v1/W16-1617.
    [16] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, CA, USA: NIPS, 2017. DOI: 10.5555/3295222.3295349.
    [17] RANASINGHE T, ORÂSAN C, MITKOV R. Semantic textual similarity with Siamese neural networks[C]//Proceedings of the International Conference on Recent Advances in Natural Language Processing. Varna, Bulgaria: INCOMA Ltd., 2019. DOI: 10.26615/978-954-452-056-4_116.
Figures (4) / Tables (4)
Publication history
  • Received: 2021-01-27
  • Revised: 2021-03-01
  • Published: 2021-10-05
