An Improved Subtractive Clustering Algorithm Based on Hadoop
-
Abstract
The traditional subtractive clustering algorithm has high time complexity and does not support distributed processing, so it is poorly suited to big data workloads. This paper proposes an improved subtractive clustering algorithm based on Hadoop. It uses multiple MapReduce jobs to parallelize four stages of subtractive clustering: solving for the neighborhood radius, initializing the density index, updating the density index, and partitioning the data records. Experiments demonstrate that, compared with the traditional serial algorithm, the proposed algorithm clusters large data sets quickly and has good stability and scalability.
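For orientation, the serial stages named above can be sketched as follows. This is a minimal single-machine illustration of the commonly cited subtractive clustering formulation, not the Hadoop implementation proposed in this paper. The function names, the `rb_factor` and `stop_ratio` values, and the simplified stopping rule are assumptions made for the example, and the neighborhood radius `ra` is taken as an input rather than solved for.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def initial_density(points, ra):
    """Density index of each point: sum of Gaussian-weighted distances to all points."""
    alpha = 4.0 / (ra * ra)  # equivalent to 1 / (ra / 2)^2
    return [sum(math.exp(-alpha * sq_dist(x, y)) for y in points) for x in points]


def subtractive_clustering(points, ra, rb_factor=1.5, stop_ratio=0.15):
    """Pick cluster centers by repeatedly taking the densest point.

    rb_factor and stop_ratio are illustrative assumptions, not values from the paper.
    """
    if not points:
        return []
    beta = 4.0 / ((rb_factor * ra) ** 2)
    density = initial_density(points, ra)
    centers = []
    first_peak = None
    while True:
        k = max(range(len(points)), key=density.__getitem__)
        peak = density[k]
        if first_peak is None:
            first_peak = peak
        elif peak < stop_ratio * first_peak:
            break  # remaining peaks are too weak to be centers
        centers.append(points[k])
        # Update step: subtract the influence of the new center from every point.
        density = [
            d - peak * math.exp(-beta * sq_dist(x, points[k]))
            for x, d in zip(points, density)
        ]
    return centers


def assign(points, centers):
    """Partition step: label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))
        for x in points
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    found = subtractive_clustering(data, ra=1.0)
    print(found)
    print(assign(data, found))
```

The density initialization compares every point with every other point, which is the quadratic cost the abstract refers to. Each point's sum is independent of the others, which is what makes this stage a natural candidate for parallel evaluation.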
-
-