JIN Fan, GU Jin-guang. An Improved T-Spider Distributed Crawler[J]. Microelectronics & Computer, 2011, 28(8): 102-104.

An Improved T-Spider Distributed Crawler

Abstract: To increase the speed at which web pages are crawled, this paper proposes an improved distributed crawler model based on T-Spider. In the URL-parsing stage, the crawler cuts each page into segments so that links can be extracted in parallel. In the page-scheduling stage, it uses an improved method for computing link priority. Together, these changes improve the crawler's speed and stability. Analysis of the experimental results confirms the effectiveness of the method.
