An Improved Big Data Clustering Method Based on Sampling Fusion

LIU Yan, WANG Cun-rui. An Improved Big Data Clustering Method Based on Sampling Fusion[J]. Microelectronics & Computer, 2017, 34(4): 17-21, 27.

Abstract

Effective mining of large campus network data sets has a far-reaching impact on campus network optimization. This paper therefore presents an improved big data clustering algorithm named Leaders-k-means. The method first uses the Leaders algorithm to obtain initial cluster centers. It then forms a number of small sample sets, on the basis of those centers, by randomly sampling the campus network data, and runs K-means on each small sample set with the initial cluster centers as starting values. This ensures reasonable initial values for K-means and improves efficiency, because the algorithm runs only on small sample sets. Finally, the small sample sets clustered by K-means are combined into a larger sample set, and bottom-up hierarchical clustering is applied to obtain the final cluster centers of the original big data set. The proposed algorithm combines the advantages of hierarchical, partitioning, and density-based methods. Simulation results further show that the proposed algorithm achieves good clustering results.
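
The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameters, so the distance threshold, the number and size of the samples, the Euclidean metric, the centroid-based merging rule, and all function names here are assumptions made for illustration. As a further simplification, the last stage merges the per-sample K-means centers rather than the pooled sample points.

```python
import math
import random


def dist(a, b):
    """Euclidean distance (assumed metric; the abstract does not name one)."""
    return math.dist(a, b)


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def leaders(points, threshold):
    """One-pass Leaders: a point farther than `threshold` from every
    existing leader becomes a new leader."""
    found = []
    for p in points:
        if all(dist(p, q) > threshold for q in found):
            found.append(p)
    return found


def kmeans(points, centers, iters=20):
    """Lloyd's K-means started from the given centers.
    A cluster that receives no points keeps its previous center."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def agglomerate(centers, k):
    """Bottom-up merging: fuse the two closest centroids until k remain.
    Each merged centroid is the weighted mean of the two it replaces."""
    clusters = [(c, 1) for c in centers]  # (centroid, weight)
    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        i, j = min(pairs, key=lambda ab: dist(clusters[ab[0]][0], clusters[ab[1]][0]))
        (ci, wi), (cj, wj) = clusters[i], clusters[j]
        merged = tuple((x * wi + y * wj) / (wi + wj) for x, y in zip(ci, cj))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((merged, wi + wj))
    return [c for c, _ in clusters]


def leaders_kmeans(points, threshold, k, n_samples=5, sample_size=200, seed=0):
    """Leaders for initial centers, K-means on random samples,
    then hierarchical merging of the pooled per-sample centers."""
    rng = random.Random(seed)
    init = leaders(points, threshold)
    pooled = []
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        pooled.extend(kmeans(sample, init))
    return agglomerate(pooled, k)
```

Calling `leaders_kmeans(points, threshold, k)` with `points` as a list of coordinate tuples returns `k` centers. In this sketch the threshold decides how many leaders, and therefore how many K-means clusters per sample, are produced, so it would have to be tuned to the data at hand.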
