CHAI Bian-fang, LI You-yi. Active overlapping K-means clustering algorithm based on spark[J]. Microelectronics & Computer, 2021, 38(1): 70-76.

Active overlapping K-means clustering algorithm based on Spark

  • The Parallel Overlapping K-means clustering algorithm (POKM), built on the Spark framework, can effectively identify potential patterns in large-scale data. However, the repeated rounds of data exchange between the Master and Worker nodes make the algorithm inefficient, and its sensitivity to the initial cluster centers leads to unstable clustering results and slow convergence. To improve the performance and stability of the algorithm, an active overlapping K-means clustering algorithm is proposed. It runs the overlapping K-means algorithm on each Worker node to obtain local cluster centers, then collects these centers on the Master node and runs the overlapping K-means algorithm on them. In addition, a parallel active selection strategy is adopted to obtain better initial cluster centers, which improves accuracy and convergence speed. Experimental results show that the improved active overlapping clustering algorithm achieves higher accuracy and a shorter running time.

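As a rough illustration of the two-level scheme in the abstract, the minimal sketch below runs overlapping K-means (OKM) on each data partition and then again on the pooled local centers. It assumes Cleuziou-style OKM: a point joins clusters nearest-first while the squared error between the point and the mean of its assigned centers keeps decreasing, and each center is updated in closed form with weights 1/|A_i|^2. The paper's exact OKM variant may differ. The abstract does not specify the parallel active selection strategy, so the hypothetical `seed_centers` helper substitutes k-means++ (D^2 sampling) seeding as a stand-in. All function names are hypothetical, plain Python lists stand in for Spark partitions, and no Spark API is used.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def assign(x: Sequence[float], centers: Sequence[Vector]) -> Tuple[List[int], float]:
    """Assumed OKM assignment: add centers nearest-first while the error drops."""
    order = sorted(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))
    chosen = [order[0]]
    best = sq_dist(x, centers[order[0]])
    for c in order[1:]:
        trial = chosen + [c]
        # Error between x and the mean ("image") of its assigned centers.
        err = sq_dist(x, mean([centers[j] for j in trial]))
        if err < best:
            chosen, best = trial, err
        else:
            break
    return chosen, best


def update(points, assignments, centers) -> List[Vector]:
    """Closed-form OKM center update (weights 1/|A_i|^2), using the old centers."""
    dim = len(centers[0])
    new_centers = []
    for c in range(len(centers)):
        num, den = [0.0] * dim, 0.0
        for x, A in zip(points, assignments):
            if c not in A:
                continue
            alpha = 1.0 / len(A) ** 2
            others = [centers[j] for j in A if j != c]
            # Target for center c: |A| * x minus the other centers shared by x.
            for d in range(dim):
                num[d] += alpha * (len(A) * x[d] - sum(o[d] for o in others))
            den += alpha
        # A center that received no points keeps its previous position.
        new_centers.append([v / den for v in num] if den > 0 else list(centers[c]))
    return new_centers


def seed_centers(points, k, rng) -> List[Vector]:
    """Stand-in k-means++ seeding; NOT the paper's active selection strategy."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with a chosen center
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centers


def okm(points, k, iters=20, seed=0):
    """OKM on one node; returns (centers, overlapping assignments)."""
    rng = random.Random(seed)
    centers = seed_centers(points, k, rng)
    for _ in range(iters):
        assignments = [assign(x, centers)[0] for x in points]
        new_centers = update(points, assignments, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [assign(x, centers)[0] for x in points]


def two_level_okm(partitions, k, seed=0) -> List[Vector]:
    """Hypothetical two-level scheme: local OKM per partition, then OKM on centers."""
    # Stage 1 ("Workers"): each partition is clustered independently,
    # so no data is exchanged between iterations.
    local_centers: List[Vector] = []
    for i, part in enumerate(partitions):
        centers, _ = okm(part, k, seed=seed + i)
        local_centers.extend(centers)
    # Stage 2 ("Master"): cluster the pooled local centers into k global centers.
    global_centers, _ = okm(local_centers, k, seed=seed)
    return global_centers
```

After `two_level_okm` returns, each original point can be given its overlapping memberships with `assign(x, global_centers)[0]`. In an actual Spark job, stage 1 would plausibly map to a per-partition task such as PySpark's `RDD.mapPartitions`, so that only k local centers per partition travel to the driver. That would be the source of the reduced Master/Worker exchange that the abstract describes.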