ZHAO Hong, CHANG Zhaobin, WANG Weijie. Malicious domain name detection based on deep auto-encoder and decision tree[J]. Microelectronics & Computer, 2020, 37(5): 13-17.

Malicious domain name detection based on deep auto-encoder and decision tree

    Abstract: Existing malicious domain name detection methods suffer from complex feature extraction and limited detection accuracy. To address these problems, a detection algorithm based on a deep auto-encoder and a decision tree (DAE-DT) is proposed. First, each domain name is mapped into a feature space according to attributes such as its lexical composition and structure, and the resulting features are normalized. Next, a deep auto-encoder network is constructed: the normalized, unlabeled domain name data, with entries randomly set to 0, serve as the model input, and the character-level statistical features of the domain names serve as the output. The reconstruction error between the model output and the uncorrupted data is used to optimize the parameters and weights of each layer, which makes the model more robust. Finally, a decision tree for identifying malicious domain names is built on the extracted statistical features. In experiments on standard datasets including Alexa and Malware Domain List, the proposed algorithm achieves an accuracy of 95.21%, a precision of 94.17%, a false negative rate of 2.41%, and a false positive rate of 3.63%.
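The preprocessing steps and the reported metrics can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. The four lexical features (length, digit ratio, character entropy, label count), the 0.3 masking rate, the sample domains, and all function names are assumptions, since the abstract does not list the actual feature set or hyperparameters. The metrics use the standard confusion-matrix definitions, which may differ from those used in the paper. The auto-encoder and the decision tree themselves are omitted.

```python
import math
import random
from collections import Counter


def lexical_features(domain):
    """Map a domain name to a small vector of character statistics.

    The four features are illustrative assumptions, not the paper's set.
    """
    name = domain.lower().rstrip(".")
    chars = name.replace(".", "")
    n = len(chars) or 1  # avoid division by zero for degenerate input
    counts = Counter(chars)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in chars) / n
    return [float(len(name)), digit_ratio, entropy, float(name.count(".") + 1)]


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def mask_inputs(rows, rate, rng):
    """Randomly set entries to 0 (masking noise for a denoising auto-encoder)."""
    return [[0.0 if rng.random() < rate else v for v in row] for row in rows]


def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, false negative rate, and false positive rate."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "fnr": fn / (fn + tp),
        "fpr": fp / (fp + tn),
    }


# Hypothetical sample domains, chosen only to exercise the functions.
domains = ["google.com", "wikipedia.org", "x9k2q7zt1v.info", "a1b2c3d4e5f6.biz"]
clean = min_max_normalize([lexical_features(d) for d in domains])
noisy = mask_inputs(clean, rate=0.3, rng=random.Random(0))
# A denoising auto-encoder would be trained to reconstruct `clean` from `noisy`.
```

In a fuller pipeline, the encoder's hidden representation of each domain would presumably be passed to a decision tree classifier, and `detection_metrics` would be applied to the resulting confusion matrix on a held-out test set.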

     
