GU Qiong, YUAN Lei, XIONG Qi-jun, NING Bin, LI Wen-xin. A Comparative Study of Cost-Sensitive Learning Algorithm Based on Imbalanced Data Sets[J]. Microelectronics & Computer, 2011, 28(8): 146-149,153.

A Comparative Study of Cost-Sensitive Learning Algorithm Based on Imbalanced Data Sets

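  • Abstract (translated from the Chinese): Most research on imbalanced data sets concentrates either on restructuring the data set or on cost-sensitive learning. Because an imbalanced class distribution and unequal misclassification costs often occur together, this paper briefly reviews cost-sensitive learning theory and existing learning algorithms, then compares the proposed adaptive hybrid re-sampling algorithm experimentally with MetaCost, an algorithm based on minimum misclassification cost. The experiments indicate that the proposed algorithm has certain advantages in cost-sensitive learning. They also show that class imbalance strongly affects the performance of cost-sensitive learning algorithms, and that when the class sizes differ widely, reconstructing the sample class space yields better classification results.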

     

    Abstract: Most studies on the classification of imbalanced data sets focus on re-sampling or on cost-sensitive learning systems themselves, and neglect the fact that an imbalanced class distribution and unequal misclassification costs often occur together. After analyzing the theory and algorithms of cost-sensitive learning, we propose a novel hybrid re-sampling technique, based on automated adaptive selection of the number of nearest neighbors, to address the misclassification problem on imbalanced data sets. We compare the hybrid re-sampling algorithm with the MetaCost algorithm. The experimental results show that the proposed method improves classification accuracy and reduces misclassification cost, and that it outperforms the traditional algorithms on imbalanced problems.
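
    The abstract refers to MetaCost as an algorithm based on minimum misclassification cost. As a rough illustration only, the sketch below shows the minimum-expected-cost decision rule that such methods build on: pick the class i that minimizes the sum over j of P(j|x) * C(i, j). The function name `min_cost_class`, the cost matrix, and the probabilities are hypothetical values chosen for this example. They are not taken from the paper, and the sketch does not reproduce the authors' hybrid re-sampling algorithm.

```python
# Sketch of the minimum-expected-cost decision rule used by
# cost-sensitive methods such as MetaCost. All numbers below are
# illustrative assumptions, not figures from the paper.

def min_cost_class(probs, cost):
    """Return the class index i minimizing sum_j probs[j] * cost[i][j].

    probs[j]   -- estimated P(class j | x)
    cost[i][j] -- cost of predicting class i when the true class is j
    """
    risks = [sum(p * c for p, c in zip(probs, row)) for row in cost]
    return min(range(len(risks)), key=risks.__getitem__)


# Hypothetical two-class setup: class 0 = majority, class 1 = minority.
# Missing a minority example (predict 0, truth 1) is assumed to cost
# ten times as much as a false alarm (predict 1, truth 0).
COST = [
    [0.0, 10.0],  # costs when predicting class 0
    [1.0, 0.0],   # costs when predicting class 1
]

# Class 0 is more probable here, yet predicting class 1 has the lower
# expected cost, so the rule selects the minority class.
label = min_cost_class([0.8, 0.2], COST)
```

    With these assumed costs, an example that is 80% likely to belong to the majority class is still assigned to the minority class, which is how unequal misclassification costs shift decisions on imbalanced data.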

     
