A Study of Automatic Wrapper Generation for Data Extraction from Complex Web Pages
Abstract: Building on the basic characteristics of template-generated Web pages and combining them with ontology knowledge, this paper proposes a method for automatically generating wrappers for complex Web pages in the setting of Deep Web vertical search. Experiments on a number of real-world complex Web page collections indicate that the method achieves high extraction precision.