ZHONG Laimin, LU Weizhong, FU Qiming, MA Jieming, CUI Zhiming, WU Hongjie. DNA binding protein identification method based on Transformer-BiLSTM feature fusion[J]. Microelectronics & Computer, 2023, 40(12): 1-9. DOI: 10.19304/J.ISSN1000-7180.2022.0871

DNA binding protein identification method based on Transformer-BiLSTM feature fusion

Abstract: Proteins are central to life activities, and deoxyribonucleic acid (DNA) binding proteins, a special class of proteins, play an irreplaceable role in them. Studying DNA-binding proteins therefore has considerable practical significance and broad research prospects. Traditional experimental techniques are accurate, but they are expensive, relatively inefficient, and demanding in terms of equipment, which makes them ill-suited to the large-scale protein studies needed today. Machine learning methods compensate for some of these shortcomings, but they handle data far less efficiently and conveniently than deep learning. This study proposes a deep learning framework based on a parallel bidirectional long short-term memory network (BiLSTM) and a Transformer to predict DNA-binding proteins. The model extracts features from the protein sequence and, in addition, features from evolutionary information; the two sets of features are then fused for training and testing. The model broadens the range of approaches to protein feature extraction and provides a reference for using Transformer encoder blocks to extract global protein features. On the PDB2272 dataset, accuracy (ACC) and the Matthews correlation coefficient (MCC) improve by 2.64% and 5.51%, respectively, compared with the PDBP_Fusion model. The experimental results show that the model has certain advantages.
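For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how ACC and MCC are conventionally computed for a binary classifier from confusion-matrix counts. The function names and the example counts are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, chosen only to illustrate the calculation.
acc = accuracy(tp=50, tn=40, fp=5, fn=5)
mcc = matthews_corrcoef(tp=50, tn=40, fp=5, fn=5)
print(f"ACC = {acc:.4f}, MCC = {mcc:.4f}")
```

Unlike ACC, MCC uses all four confusion-matrix cells, so it remains informative when the two classes are imbalanced.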

     
