Citation: ZHANG Z, WANG Y. Automatic vulnerability assessment method based on char-word feature fusion and BO-LightGBM[J]. Microelectronics & Computer, 2023, 40(7): 27-35. doi: 10.19304/J.ISSN1000-7180.2021.1351

Automatic vulnerability assessment method based on char-word feature fusion and BO-LightGBM

  • Abstract: To address the lack of timely, accurate analysis and automatic assessment of previously unseen software vulnerabilities, this paper proposes an automatic vulnerability assessment method that combines char-word feature fusion with Bayesian-optimized LightGBM (BO-LightGBM). First, to reduce the impact of new terms appearing in descriptions of unseen vulnerabilities, character-level and word-level features are extracted from the vulnerability descriptions and fused. To prevent temporal information leakage, the data are ordered by year, and time-based cross-validation is used to select a suitable dataset split. Second, taking advantage of LightGBM's ability to identify the best features through feature statistics, the algorithm is used to classify seven vulnerability characteristics, such as confidentiality and integrity. To further improve accuracy, a Bayesian optimizer tunes eight LightGBM hyperparameters. Finally, experiments on the U.S. National Vulnerability Database (NVD) show that char-word feature fusion combines the word and character features in vulnerability descriptions and achieves higher accuracy when classifying unseen vulnerabilities. Compared with other ensemble learning algorithms, LightGBM with Bayesian-optimized hyperparameters further exploits LightGBM's strengths and improves the accuracy of vulnerability characteristic assessment.
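
The two data-handling ideas in the abstract, char-word feature fusion and year-ordered cross-validation, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the abstract does not give the n-gram range, the tokenizer, or the fold layout, so the 3-to-5 character n-grams, the `c:`/`w:` prefixes, the expanding-window folds, and all function names below are assumptions made for illustration.

```python
import re
from collections import Counter


def word_features(text):
    """Count lower-cased word tokens, prefixed with 'w:' (prefix is an assumption)."""
    return Counter("w:" + w for w in re.findall(r"\w+", text.lower()))


def char_features(text, n_min=3, n_max=5):
    """Count character n-grams inside each space-padded word, prefixed with 'c:'.

    The 3..5 range is an assumed setting, not taken from the paper.
    """
    counts = Counter()
    for word in re.findall(r"\w+", text.lower()):
        padded = f" {word} "
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts["c:" + padded[i:i + n]] += 1
    return counts


def fused_features(text):
    """Fuse both views by concatenating their (disjoint) feature spaces."""
    feats = word_features(text)
    feats.update(char_features(text))
    return feats


def time_based_folds(years, min_train_years=1):
    """Yield (train_idx, val_idx): validate on one year, train only on earlier years."""
    ordered = sorted(set(years))
    for k in range(min_train_years, len(ordered)):
        earlier = set(ordered[:k])
        train = [i for i, y in enumerate(years) if y in earlier]
        val = [i for i, y in enumerate(years) if y == ordered[k]]
        yield train, val


# A term unseen at training time ("overflows") shares no word feature with
# "overflow", but it still overlaps through character n-grams.
known = fused_features("buffer overflow in parser")
unseen = fused_features("overflows")
shared = set(known) & set(unseen)

# Hypothetical publication years for five vulnerability records.
folds = list(time_based_folds([2016, 2016, 2017, 2018, 2018]))
```

In this sketch the overlap between a known description and a new term comes entirely from `c:` features, which is the motivation the abstract gives for adding character-level information. Each fold trains only on years earlier than its validation year, so no future data reaches the model. The fused counts could then be vectorized and passed to a classifier such as LightGBM, one model per assessed characteristic.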

     
