QIN Jie, YAN Fu-liang, ZHU Hai-feng, SI Qun, XIE Hui. A Webpage Classification Algorithm Based on Link Information[J]. Microelectronics & Computer, 2012, 29(6): 108-112.

A Webpage Classification Algorithm Based on Link Information


     

    Abstract: Traditional text classification algorithms are easily misled by the large amount of false and erroneous information found in webpages. To overcome this and improve the accuracy of webpage classification, this paper presents a webpage classification algorithm based on link information. The algorithm builds on an improved K-nearest-neighbor (KNN) method and classifies a webpage using the link information between the page and its parent pages. The parent-link information of the page to be classified is represented as a vector in the vector space model. The algorithm finds the K pages in the training set whose link-information vectors are most similar to it and assigns the page to the appropriate category. Experiments comparing the method with traditional text classification algorithms show that it is more effective.
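The abstract does not say how the link vector is weighted or how the K neighbors' categories are combined. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' implementation. Each distinct parent link is one dimension, taken here to be the URL of a page that links to the page. Similarity is measured with cosine similarity, and each of the K nearest training pages casts a vote weighted by its similarity. The names `link_vector`, `cosine`, and `knn_classify`, as well as the example URLs, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def link_vector(parent_links):
    """Turn a page's parent links into a sparse vector.

    Assumption: each distinct parent link (here, the URL of a page that
    links to this page) is one dimension, weighted by its occurrence count.
    """
    return Counter(parent_links)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(parent_links, training_set, k=3):
    """Label a page by its k most similar training pages.

    training_set is a non-empty list of (parent_links, label) pairs.
    Each of the k nearest neighbours votes for its label with weight equal
    to its cosine similarity (an assumed voting rule); the largest total wins.
    """
    query = link_vector(parent_links)
    neighbours = sorted(
        ((cosine(query, link_vector(links)), label) for links, label in training_set),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, label in neighbours:
        votes[label] += similarity
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Hypothetical toy training set: parent-link lists with known categories.
    training = [
        (["sports.example.com/index", "news.example.com/sports"], "sports"),
        (["sports.example.com/index", "sports.example.com/football"], "sports"),
        (["tech.example.com/index", "news.example.com/tech"], "tech"),
        (["tech.example.com/index", "tech.example.com/ai"], "tech"),
    ]
    page = ["sports.example.com/index", "news.example.com/sports"]
    print(knn_classify(page, training, k=3))  # prints sports
```

Similarity-weighted voting is one common KNN variant. Under it, a single very close neighbor can outweigh several distant ones, whereas a plain majority vote would treat all K neighbors equally. Which rule the paper uses is not stated in the abstract.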

     
