XU Kai, CHEN Ping-hua, LIU Shuang-yin. A Chinese Text Classification System Based on AdaBoost-Bayes Algorithm[J]. Microelectronics & Computer, 2016, 33(6): 63-67.

A Chinese Text Classification System Based on AdaBoost-Bayes Algorithm

Abstract: Chinese text classification often suffers from low accuracy, and existing classification algorithms are inefficient and unstable. To address these problems, this paper proposes an adaptive boosting naive Bayes algorithm (AdaBoost-Bayes). The algorithm combines Naive Bayes and AdaBoost and, by optimizing how the two are combined, fuses the strengths of both. First, the SMEL sequence-combination word-formation algorithm segments the Chinese corpus into words and extracts the feature words of each text. Next, an enhanced Bayes classifier is trained on a relatively small sample to extract text features and generate the training classification matrix. The adaptive boosting (AdaBoost) algorithm then assigns weights to these simple classifiers so that classification remains stable and accurate. Experiments show that, compared with other algorithms, the proposed algorithm has a lower error rate, reaches a classification accuracy above 98%, and achieves a higher F1 score than the other classification algorithms tested.
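The abstract does not spell out the paper's "optimized combination structure", the SMEL segmenter, or the exact form of the training classification matrix, so the following is only a minimal sketch of the general idea of boosting naive Bayes weak learners. It assumes the texts are already segmented into token lists, uses a multinomial naive Bayes whose counts are weighted by per-document sample weights (with Laplace smoothing), and swaps in the SAMME multi-class variant of AdaBoost for the weighting step. The class names `WeightedNaiveBayes` and `AdaBoostNB`, the rescaling of weights, and the toy corpus are all hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import defaultdict


class WeightedNaiveBayes:
    """Multinomial naive Bayes in which each document contributes its sample
    weight (instead of a count of 1) to the class and token statistics."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, docs, labels, weights):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_w = defaultdict(float)
        token_w = {c: defaultdict(float) for c in self.classes}
        for doc, y, w in zip(docs, labels, weights):
            class_w[y] += w
            for tok in doc:
                token_w[y][tok] += w
        total = sum(class_w.values())
        self.log_prior = {c: math.log(class_w[c] / total) for c in self.classes}
        # Per-class token log-likelihoods: a "class x word" table of the trained model.
        self.log_lik = {}
        for c in self.classes:
            denom = sum(token_w[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((token_w[c][t] + self.alpha) / denom)
                               for t in self.vocab}
        return self

    def predict_one(self, doc):
        def score(c):
            # Tokens never seen in training are ignored.
            return self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=score)


class AdaBoostNB:
    """SAMME-style AdaBoost that uses WeightedNaiveBayes as the weak learner."""

    def __init__(self, n_rounds=10):
        self.n_rounds = n_rounds

    def fit(self, docs, labels):
        n = len(docs)
        k = len(set(labels))
        w = [1.0 / n] * n  # sample weights, kept normalized to sum to 1
        self.learners = []  # list of (vote weight, classifier)
        for _ in range(self.n_rounds):
            # Rescale weights to sum to n so round 1 is an ordinary count-based NB.
            nb = WeightedNaiveBayes().fit(docs, labels, [wi * n for wi in w])
            miss = [nb.predict_one(d) != y for d, y in zip(docs, labels)]
            err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
            if err >= 1.0 - 1.0 / k:
                if not self.learners:
                    self.learners.append((1.0, nb))  # keep at least one learner
                break  # no better than chance: stop boosting
            if err == 0.0:
                self.learners.append((1.0, nb))  # perfect on training data: stop
                break
            vote = math.log((1.0 - err) / err) + math.log(k - 1)
            self.learners.append((vote, nb))
            # Up-weight misclassified documents, then renormalize.
            w = [wi * math.exp(vote) if m else wi for wi, m in zip(w, miss)]
            s = sum(w)
            w = [wi / s for wi in w]
        return self

    def predict(self, doc):
        votes = defaultdict(float)
        for vote, nb in self.learners:
            votes[nb.predict_one(doc)] += vote
        return max(votes, key=votes.get)


# Hypothetical toy corpus, already word-segmented (the paper uses SMEL for this step).
docs = [
    ["股票", "上涨", "市场"], ["基金", "股票", "收益"], ["银行", "利率", "市场"],
    ["比赛", "球队", "进球"], ["球员", "比赛", "冠军"], ["球队", "冠军", "联赛"],
]
labels = ["finance"] * 3 + ["sports"] * 3

clf = AdaBoostNB(n_rounds=5).fit(docs, labels)
print(clf.predict(["股票", "上涨"]))  # finance
print(clf.predict(["冠军", "比赛"]))  # sports
```

On this cleanly separable toy corpus the first naive Bayes round already has zero weighted training error, so the loop stops with a single learner; additional rounds, each trained on reweighted documents, are only added when naive Bayes misclassifies part of the training set.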

     
