LI Huan, LIU Feng, ZHU Er-zhou. Research of an Improved K-means Algorithm for Analyzing Mass Data[J]. Microelectronics & Computer, 2016, 33(5): 52-57.

Research of an Improved K-means Algorithm for Analyzing Mass Data

  • To address the problem of processing massive data, this paper proposes an improved K-means algorithm built on the MapReduce model of the Hadoop platform. Traditional K-means is sensitive to the choice of initial cluster centers and to the number of clusters. To overcome this, the improved algorithm first samples the massive data multiple times and determines the number of clusters from the samples. Second, it finds the cluster centers of each sample using a density-based method. Finally, it merges the center points of the samples to obtain the global initial cluster centers for the original data. Experiments on a Hadoop cluster show that the improved algorithm is more efficient, accurate, and scalable than the traditional algorithms, and that it achieves a better speedup ratio.
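
The abstract gives only the outline of the initialization: sample repeatedly, find density-based centers per sample, then merge them. The following is a minimal single-machine sketch of that outline, not the paper's implementation. Several details are assumptions made for illustration: the function names (`density_centers`, `merge_centers`, `initial_centers`) are hypothetical, the number of clusters `k` is passed in rather than estimated from the samples, density is taken to be the count of neighbors within a fixed `radius`, and the merge step is a simple closest-pair averaging. On Hadoop, the per-sample step could plausibly run in mappers and the merge in a reducer, but the abstract does not spell out that mapping.

```python
import math
import random


def density_centers(points, k, radius):
    """Pick k centers from one sample: dense points that lie far apart.

    Assumption: density = number of sample points within `radius`.
    """
    density = [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]
    # Start from the densest point in the sample.
    first = max(range(len(points)), key=density.__getitem__)
    centers = [points[first]]
    while len(centers) < k:
        # Favor points that are both dense and far from the centers chosen so far.
        best = max(
            range(len(points)),
            key=lambda i: density[i] * min(math.dist(points[i], c) for c in centers),
        )
        centers.append(points[best])
    return centers


def merge_centers(candidates, k):
    """Merge per-sample centers into k global centers.

    Assumption: repeatedly replace the two closest centers by their
    weighted mean until only k remain.
    """
    clusters = [(tuple(c), 1) for c in candidates]
    while len(clusters) > k:
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: math.dist(clusters[ab[0]][0], clusters[ab[1]][0]),
        )
        (ci, wi), (cj, wj) = clusters[i], clusters[j]
        merged = tuple((x * wi + y * wj) / (wi + wj) for x, y in zip(ci, cj))
        del clusters[j]  # j > i, so index i is still valid
        clusters[i] = (merged, wi + wj)
    return [c for c, _ in clusters]


def initial_centers(data, k, n_samples=5, sample_size=60, radius=1.0, seed=0):
    """Sample the data several times and merge the per-sample centers."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_samples):
        sample = rng.sample(data, min(sample_size, len(data)))
        candidates.extend(density_centers(sample, k, radius))
    return merge_centers(candidates, k)
```

The centers returned by `initial_centers` would then seed an ordinary K-means run over the full data set. Working on small samples keeps the quadratic density computation cheap, which is presumably the motivation for sampling before touching the full data.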