Citation: ZHANG Dan-dan, YOU Zi-yi, ZHENG Jian, CHEN Shi-guo. Optimal clustering algorithm based on modified local outlier factor detection[J]. Microelectronics & Computer, 2019, 36(11): 43-48.

Optimal clustering algorithm based on modified local outlier factor detection

Abstract: Cluster analysis has long attracted attention from researchers in unsupervised learning. To address the shortcomings of the K-means clustering algorithm, namely its sensitivity to the initial cluster centers, poor correlation among data within clusters, and convergence to local optima, this paper proposes an optimized clustering algorithm based on outlier factors. The algorithm first uses the information-entropy-weighted Euclidean distance as the similarity measure, so that differences between data objects are more pronounced. It then computes the outlier factor of each data point with a Local Outlier Factor (LOF) detection algorithm whose k-distance parameter is self-adjusting, and selects a candidate set of initial cluster centers. Finally, it optimizes the cluster centers using an outlier-factor-weighted distance method. Experiments on UCI datasets show that the optimized algorithm achieves higher accuracy than the K-means++, OFMMK-means, and FCM algorithms, and runs faster than FCM. The algorithm is well suited to applications such as intrusion detection, credit risk assessment, and multi-fault diagnosis.
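The abstract does not give the formulas behind the information-entropy-weighted Euclidean distance. As a rough illustration only, the sketch below assumes the textbook entropy weight method: each feature is min-max scaled, its Shannon entropy is computed, and features with lower entropy receive larger weights. The function names `entropy_weights` and `weighted_euclidean` are hypothetical, and the paper's actual weighting may differ.

```python
import math


def entropy_weights(data):
    """Return one weight per feature via the textbook entropy weight method.

    Assumption: this is a generic formulation, not necessarily the exact
    weighting scheme used in the paper.
    """
    n, d = len(data), len(data[0])
    if n < 2:
        raise ValueError("need at least two samples to estimate entropy")
    diversities = []
    for j in range(d):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        span = hi - lo
        # Min-max scale the feature so every value is non-negative.
        scaled = [(v - lo) / span if span > 0 else 0.0 for v in col]
        total = sum(scaled)
        if total == 0:
            # A constant feature cannot separate points: zero diversity.
            diversities.append(0.0)
            continue
        probs = [v / total for v in scaled]
        # Shannon entropy normalized to [0, 1] by dividing by ln(n).
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversities.append(1.0 - entropy)
    norm = sum(diversities)
    if norm == 0:
        # No feature is informative: fall back to uniform weights.
        return [1.0 / d] * d
    return [g / norm for g in diversities]


def weighted_euclidean(x, y, weights):
    """Euclidean distance with a per-feature weight on each squared difference."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


if __name__ == "__main__":
    # Toy data: the second feature is constant, so it should get weight 0.
    samples = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [10.0, 5.0]]
    w = entropy_weights(samples)
    print(w)
    print(weighted_euclidean(samples[0], samples[3], w))
```

In this toy data the constant second feature receives zero weight, so the distance depends only on the first feature. A distance of this kind could then stand in for the plain Euclidean distance when computing k-distances in an LOF step, though how the paper combines the two is not stated in the abstract.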

     
