Web Threat Identification Scheme Based on Cost-Sensitive Char-CNN

张光华; 张凯迪; 齐林

doi:10.19304/J.ISSN1000-7180.2022.0842

Abstract: With the rapid development of Internet technology, threats to network security have grown increasingly severe, and the volume of Web attacks has doubled year after year. Current Web threat identification methods rely on manually extracted features, which yields low accuracy, and they must cope with an imbalanced distribution of normal and malicious samples. To address these problems, this paper proposes a Web threat identification scheme based on a cost-sensitive character-level convolutional neural network (Char-CNN). First, the features of Web requests are analyzed, the raw data are converted to a uniform format, and the data are read and concatenated into character sequences, which are then encoded according to a predefined index dictionary. Second, a character-level CNN extracts information from the requests, performing feature extraction and feature selection on the character encodings for model training. Finally, cost-sensitive learning is embedded by modifying the cross-entropy loss function of the neural network to increase the cost of misclassifying malicious samples; the model parameters and weights are adjusted through backpropagation, and a Softmax layer performs the threat identification. Experiments show that the scheme reaches an accuracy of 98.99% and improves precision, recall, and F1 score over existing threat identification schemes, verifying its effectiveness on imbalanced datasets.
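The first and last steps of the abstract (index-dictionary character encoding and a cost-weighted cross-entropy loss) can be illustrated with a minimal sketch. This is not the authors' implementation. The alphabet, the reserved index 0 for padding and unknown characters, the sequence length, the function names, and the class costs are all assumptions chosen for illustration; the convolutional layers themselves are omitted.

```python
import math
import string

# Hypothetical alphabet: the paper's actual index dictionary is not given in the abstract.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
# Index 0 is reserved here (an assumption) for both padding and unknown characters.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_request(request, max_len=48):
    """Map a request string to a fixed-length list of character indices."""
    ids = [CHAR_TO_INDEX.get(ch, 0) for ch in request.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))  # right-pad with zeros


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_cross_entropy(logits, label, class_costs):
    """Cross-entropy for one sample, scaled by the cost of its true class."""
    probs = softmax(logits)
    return -class_costs[label] * math.log(probs[label])


# Assumed labels: 0 = normal, 1 = malicious. The costs are illustrative values only.
CLASS_COSTS = [1.0, 5.0]

ids = encode_request("GET /index.php?id=1' OR '1'='1")
loss_malicious_missed = cost_sensitive_cross_entropy([2.0, 0.0], 1, CLASS_COSTS)
loss_normal_missed = cost_sensitive_cross_entropy([0.0, 2.0], 0, CLASS_COSTS)
```

With these assumed costs, a malicious request that the model scores as normal contributes five times the loss of the mirror-image error on a normal request, so backpropagation pushes harder against missed attacks. In a real training setup the same effect is usually obtained through the class-weight option of the framework's cross-entropy loss rather than hand-written code.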
