Improved CTK Weighting Clustering Algorithm Based on MapReduce
-
Abstract
This paper presents an improved distributed clustering algorithm based on MapReduce. Clustering is divided into two stages. In the first stage, the Canopy algorithm is introduced and a suitable number of clusters K is determined from the change in the gradient value, which reduces the number of iterations and avoids the uncertainty caused by the choice of initial center points. In the second stage, the region radius is adjusted dynamically to address the similarity problem in high-dimensional data sets, and information entropy weighting is used to determine the feature weights in the similarity calculation. Finally, a parallel strategy and scheme for the algorithm are designed according to the MapReduce distributed computing model. Experimental results show that the proposed algorithm performs well in terms of accuracy, speedup, and scalability.
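As a rough illustration of the information entropy weighting mentioned above, the following sketch uses the standard entropy weight method, which may differ from the exact formulation in this paper. The function names `entropy_weights` and `weighted_distance` are hypothetical, and the sketch assumes non-negative feature values, at least two samples, and a weighted Euclidean distance as the similarity measure.

```python
import math


def entropy_weights(rows):
    """Feature weights from the standard entropy weight method.

    Assumption: rows holds at least two samples with non-negative values.
    This is a generic sketch, not necessarily the paper's formula.
    """
    n = len(rows)
    dims = len(rows[0])
    diversities = []
    for j in range(dims):
        column = [row[j] for row in rows]
        total = sum(column)
        # Share of each sample within this feature column.
        if total > 0:
            probs = [value / total for value in column]
        else:
            probs = [1.0 / n] * n
        # Normalised Shannon entropy in [0, 1]; 0 * log(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Low entropy means the feature discriminates more between samples.
        diversities.append(max(0.0, 1.0 - entropy))
    scale = sum(diversities)
    if scale == 0:
        # Every feature is uniform, so fall back to equal weights.
        return [1.0 / dims] * dims
    return [d / scale for d in diversities]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two points."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


# Hypothetical data: the first feature is constant, the second varies.
samples = [[1.0, 1.0], [1.0, 9.0]]
weights = entropy_weights(samples)
distance = weighted_distance(samples[0], samples[1], weights)
```

In this sketch a feature whose values are spread evenly across samples receives a weight near zero, so it contributes little to the distance used when assigning points to clusters.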