K-means Clustering Algorithm Based on Distance Threshold and Weighted Sample

Abstract: This paper proposes a K-means clustering algorithm based on a distance threshold and sample weighting. First, the mean of the sample set is taken as the first initial cluster center. Second, the remaining initial cluster centers, and hence the number of clusters, are determined dynamically using a distance threshold. Finally, sample weighting is used to reduce the influence of outliers on the clustering result, with the weighted sample points taking part in the entire clustering process. The silhouette coefficient is used to measure the clustering quality of the different algorithms. Experimental results show that the proposed algorithm achieves better clustering quality than both the original K-means clustering algorithm and the algorithm in reference [1].
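The three steps summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact threshold rule or the weighting formula, so both are assumptions here: a sample is taken as a new initial center when it lies farther than `threshold` from every center chosen so far, and each sample's weight is the normalized inverse of its mean distance to all samples, so that outliers receive small weights. The function names `threshold_init`, `sample_weights`, and `weighted_kmeans` are hypothetical, and the code is one possible reading, not the paper's implementation.

```python
import numpy as np


def threshold_init(X, threshold):
    """Initial centers: start from the sample mean, then add every sample
    farther than `threshold` from all centers so far (assumed rule)."""
    centers = [X.mean(axis=0)]
    for x in X:
        dists = np.linalg.norm(np.asarray(centers) - x, axis=1)
        if dists.min() > threshold:
            centers.append(x)
    return np.asarray(centers)


def sample_weights(X):
    """Assumed weighting: inverse mean distance to all samples,
    normalized to sum to 1, so isolated points get small weights."""
    pairwise = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    w = 1.0 / (pairwise.mean(axis=1) + 1e-12)
    return w / w.sum()


def weighted_kmeans(X, threshold, n_iter=100):
    """K-means whose centers are weighted means of their members."""
    X = np.asarray(X, dtype=float)
    centers = threshold_init(X, threshold)
    w = sample_weights(X)
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update each non-empty cluster with the weighted mean of its members.
        new_centers = centers.copy()
        for k in range(len(centers)):
            mask = labels == k
            if mask.any():
                new_centers[k] = np.average(X[mask], axis=0, weights=w[mask])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Drop centers that attracted no samples and relabel compactly.
    used, labels = np.unique(labels, return_inverse=True)
    return centers[used], labels
```

In this sketch the number of clusters follows from `threshold` rather than being fixed in advance. The resulting labels could then be scored with a silhouette implementation such as `sklearn.metrics.silhouette_score`, which is an external dependency and not part of the sketch.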