Abstract:
Network public opinion information is characterized by large data volume, dispersed content, and complex structure. To address these characteristics, a parallel K-nearest neighbors (KNN) classification algorithm based on the Hadoop platform is studied for classifying network public opinion information. Drawing on Hadoop's distributed storage and parallel data processing features, a parallel KNN classification algorithm for network public opinion is designed on MapReduce. The classification ability and efficiency of the improved KNN algorithm are verified experimentally, and the algorithm is applied to classification tests on network public opinion data. The results show that the Hadoop-based parallel KNN classification algorithm can effectively improve both the classification performance and the efficiency for network public opinion documents, achieving fast and accurate classification of network public opinion.
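
The abstract does not spell out how the KNN computation is split across map and reduce tasks. One common decomposition, shown here only as a minimal single-process sketch and not as the authors' implementation, has each map task return the k nearest training documents within its own data partition, and a reduce task merge those candidates into a global top k before a majority vote. The function names (`map_partition`, `reduce_candidates`, `classify`), the Euclidean distance, and the use of numeric feature vectors for documents are all assumptions made for illustration; no Hadoop API is used.

```python
import heapq
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_partition(partition, query, k):
    """Map step (hypothetical): return the k nearest (distance, label) pairs
    found inside a single partition of the training data."""
    scored = ((distance(vector, query), label) for vector, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_candidates(candidate_lists, k):
    """Reduce step (hypothetical): merge the local candidates from every
    partition, keep the global k nearest, and take a majority vote."""
    merged = [pair for candidates in candidate_lists for pair in candidates]
    nearest = heapq.nsmallest(k, merged)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(partitions, query, k=3):
    """Run the map step over each partition in turn, then the reduce step.
    On a real cluster the map calls would run in parallel on separate nodes."""
    local_results = [map_partition(p, query, k) for p in partitions]
    return reduce_candidates(local_results, k)
```

Because each partition returns at most k candidates, the reduce step handles at most k times the number of partitions, regardless of how large the training set is. This is the property that makes such a split attractive for large-scale data.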