Citation: SONG Cheng-xiang, CHEN Xiu-hong, NIU Qiang. Improved Feature Selection Method Based on CHI for Text Categorization[J]. Microelectronics & Computer, 2018, 35(9): 74-78.

Improved Feature Selection Method Based on CHI for Text Categorization

Abstract: The traditional chi-square statistic (CHI) method selects features over the global scope and ignores information such as word frequency and word distribution. To address these problems, this paper proposes an improved text feature selection method. The method defines a correlation coefficient for the frequency distribution of feature terms and uses it to select strongly correlated features that occur locally. It also takes the differences in the distribution of feature terms across classes into account, which improves classification metrics on unbalanced datasets. The experimental results show that the improved method not only yields a clear improvement in classification performance but is also more stable.
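For context, the baseline chi-square score that the abstract refers to is commonly computed from a 2x2 table of document counts for a term and a class. The sketch below shows that textbook formula only; it is not the improved method proposed in the paper, whose coefficient is not given here. The function names and the toy corpus are illustrative assumptions.

```python
from typing import List


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Textbook chi-square score of a term for a class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_from_corpus(docs: List[List[str]], labels: List[str],
                    term: str, target: str) -> float:
    """Build the 2x2 table from tokenized documents, then score the term."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens      # presence only; repeats are not counted
        in_class = label == target
        if present and in_class:
            a += 1
        elif present:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return chi_square(a, b, c, d)


# Hypothetical toy corpus: two "sports" and two "finance" documents.
docs = [
    ["goal", "goal", "goal", "match"],
    ["goal", "match"],
    ["stock", "match"],
    ["stock", "price"],
]
labels = ["sports", "sports", "finance", "finance"]

# Same corpus, but "goal" appears only once in the first document.
docs_single = [["goal", "match"]] + docs[1:]

score_repeated = chi_from_corpus(docs, labels, "goal", "sports")
score_single = chi_from_corpus(docs_single, labels, "goal", "sports")
print(score_repeated, score_single)
```

Both scores are identical because the table counts documents rather than occurrences, which illustrates the limitation the abstract attributes to the traditional method: how often a term occurs within a document does not affect its score.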

     
