LIU Hai-feng, CHEN Qi, LIU Shou-sheng, SU Zhan. An Improved KNN Text Categorization Method Based on Data Uneven[J]. Microelectronics & Computer, 2010, 27(3): 51-53,58.

An Improved KNN Text Categorization Method Based on Data Uneven

  • Abstract: KNN is a simple, effective, non-parametric classification algorithm. For classification settings in which the sample distribution is skewed, we first propose an improved feature selection method for feature dimensionality reduction. Building on it, we then propose a distribution-based improved KNN method for text categorization, which reduces the influence of the skewed distribution on the decision function. Experiments show that the proposed method achieves good classification performance, with better precision than the common KNN.
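The abstract does not give the paper's feature selection formula or its distribution-based decision function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one simple correction for skew: each neighbour's similarity vote is divided by the size of its class in the training set. The function names, the `balance` flag, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query, k=3, balance=True):
    """Predict a label for `query` from `train`, a list of (vector, label).

    With balance=False this is plain similarity-weighted KNN voting.
    With balance=True each vote is divided by the training size of the
    neighbour's class (an assumed correction, not the paper's formula).
    """
    class_sizes = Counter(label for _, label in train)
    scored = [(cosine(vec, query), label) for vec, label in train]
    neighbours = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, label in neighbours:
        scores[label] += sim / class_sizes[label] if balance else sim
    return max(scores, key=scores.get)


# Toy skewed training set: four "sports" documents, one "finance" document.
train = [
    ({"game": 1.0, "team": 1.0}, "sports"),
    ({"game": 1.0, "score": 1.0}, "sports"),
    ({"game": 1.0, "coach": 1.0}, "sports"),
    ({"game": 1.0, "league": 1.0}, "sports"),
    ({"market": 1.0, "stock": 1.0}, "finance"),
]
query = {"market": 1.0, "stock": 1.0, "game": 1.0}

plain = knn_predict(train, query, k=4, balance=False)
balanced = knn_predict(train, query, k=4, balance=True)
```

On this toy data the query is closest to the single "finance" document, yet three weaker "sports" neighbours outvote it under plain voting. The size-normalised vote returns "finance" instead, which illustrates the kind of skew effect on the decision function that the abstract refers to.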

     
