Citation: GAO Jian-guo, CUI Ye-qin. A New Discretization Method for Continuous Attributes Based on Information Entropy[J]. Microelectronics & Computer, 2011, 28(7): 187-189, 194.

A New Discretization Method for Continuous Attributes Based on Information Entropy

  • Abstract: Many data mining and machine learning methods can handle only discrete-valued attributes, so continuous attributes must be discretized first. This paper presents an information-entropy-based discretization method for continuous attributes, named IED. It uses information entropy to measure how similar intervals are, and it accounts for the effect of interval size on the discretization result. The method also takes into account the independence between the merged intervals and the target class. Experimental results show that IED significantly improves the classification accuracy of Naïve Bayes.
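
The abstract does not state IED's actual similarity measure, so the following Python sketch only illustrates the general idea of entropy-based, bottom-up interval merging. The functions `class_entropy`, `merge_cost`, and `discretize`, and the `max_cost` threshold, are hypothetical names and choices introduced here for illustration. The merge criterion is a generic size-weighted entropy increase and should not be read as the criterion used in the paper.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels inside one interval."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def merge_cost(left, right):
    """Size-weighted entropy increase caused by merging two adjacent intervals.

    Assumed criterion for illustration only (not the IED measure).
    A value near 0 means the two intervals have similar class distributions.
    """
    merged = left + right
    n = len(merged)
    return (class_entropy(merged)
            - (len(left) / n) * class_entropy(left)
            - (len(right) / n) * class_entropy(right))


def discretize(values, labels, max_cost=0.1):
    """Start with one interval per distinct value, then repeatedly merge the
    adjacent pair with the lowest cost until that cost exceeds max_cost."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Each interval is (low, high, class labels it contains).
    intervals = [(v, v, groups[v]) for v in sorted(groups)]
    while len(intervals) > 1:
        costs = [merge_cost(intervals[i][2], intervals[i + 1][2])
                 for i in range(len(intervals) - 1)]
        i = min(range(len(costs)), key=costs.__getitem__)
        if costs[i] > max_cost:
            break
        low, _, a = intervals[i]
        _, high, b = intervals[i + 1]
        intervals[i:i + 2] = [(low, high, a + b)]
    return [(low, high) for low, high, _ in intervals]


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(discretize(values, labels))
```

Weighting each interval's entropy by its share of the samples is one simple way to let interval size influence the merge decision: a small, noisy interval changes the cost less than a large one with the same class mix.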

     
