A Clustering-Based Method for Reducing the Amount of Sample in KNN Text Categorization on the Category Deflection

LIU Hai-feng; YAO Ze-qing; SU Zhan; ZHANG Xue-ren

LIU Hai-feng, YAO Ze-qing, SU Zhan, ZHANG Xue-ren. A Clustering-Based Method for Reducing the Amount of Sample in KNN Text Categorization on the Category Deflection[J]. Microelectronics & Computer, 2012, 29(5): 24-28.


Abstract


KNN is one of the classical algorithms in text categorization. The number and density of training samples are the primary bottleneck of the algorithm, so a reasonable method for reducing the amount of training data can improve classification efficiency. This paper proposes an improved KNN model based on clustering. First, the samples are grouped into clusters, and some samples are removed from the training set based on distance in order to save computing cost. Second, taking the category distribution into account, a better weighting method is introduced to overcome the defect that larger classes of training samples dominate in KNN. Test results show that the improved KNN classification algorithm improves classification efficiency.
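The abstract does not give the clustering procedure, the distance criterion, or the weighting formula, so the following is only a minimal sketch of the two-step idea under stated assumptions, not the authors' method. It assumes that each class is treated as a single cluster represented by its centroid, that the samples farthest from the centroid are the ones removed, and that each neighbor's vote is divided by the size of its class. The names `reduce_training_set`, `classify`, and `keep_ratio` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduce_training_set(samples, labels, keep_ratio=0.5):
    """Step 1 (assumed form): per class, keep only the fraction of
    samples nearest to the class centroid and drop the rest."""
    by_class = defaultdict(list)
    for vec, lab in zip(samples, labels):
        by_class[lab].append(vec)

    kept = []
    for lab, vecs in by_class.items():
        dim = len(vecs[0])
        centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        ranked = sorted(vecs, key=lambda v: euclidean(v, centroid))
        n_keep = max(1, math.ceil(keep_ratio * len(vecs)))
        kept.extend((v, lab) for v in ranked[:n_keep])
    return kept


def classify(query, training, k=3):
    """Step 2 (assumed form): KNN vote in which each neighbor's vote
    is divided by its class size, so large classes do not dominate."""
    class_sizes = Counter(lab for _, lab in training)
    neighbors = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    scores = defaultdict(float)
    for _, lab in neighbors:
        scores[lab] += 1.0 / class_sizes[lab]
    return max(scores, key=scores.get)


# Toy data: class "a" is large and contains one outlier, class "b" is small.
samples = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (3, 3), (3.2, 3.0)]
labels = ["a", "a", "a", "a", "a", "a", "b", "b"]

kept = reduce_training_set(samples, labels, keep_ratio=0.5)
prediction = classify((2.4, 2.4), kept, k=3)
print(len(kept), prediction)
```

In this toy data, two of the three nearest neighbors of the query belong to the larger class, so an unweighted majority vote would pick that class; dividing each vote by class size is one simple way to offset that bias. Real text data would normally use TF-IDF vectors and cosine similarity rather than the raw 2-D points used here.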
