A Comparison of Two Web Mining Algorithms for Commodity Price Data

Abstract: The real-time prices of commodities in other online stores are important data for the owners of Web shops, and Web data mining makes it feasible to collect them. Based on a comparative study of a regular-expression algorithm and a word-segmentation algorithm, this paper presents a commodity price extraction algorithm based on regular expressions, together with three algorithms based on word segmentation: one for extracting the directory tree of a website, one for extracting commodities from HTML pages, and one for extracting commodity prices. Practice with the application system shows that the regular-expression algorithm yields comparatively low completeness and accuracy rates, whereas the word-segmentation algorithm achieves more than 99% on both measures. The latter fully meets the needs of the application and can also provide source data for commodity market forecasting and analysis.
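To make the two evaluation measures concrete, the following minimal sketch extracts prices with a regular expression and scores the result. It is an illustration only, not the paper's algorithm: the pattern `PRICE_RE`, the sample HTML, and the helper names are hypothetical, and the completeness and accuracy rates are assumed here to be defined like recall and precision, which the abstract does not state.

```python
import re

# Hypothetical pattern (not from the paper): a currency marker followed by
# a number with at most two decimal places.
PRICE_RE = re.compile(r"(?:¥|￥|\$|RMB)\s*(\d+(?:\.\d{1,2})?)")


def extract_prices(html: str) -> list[float]:
    """Return every price that the pattern finds in the page text."""
    return [float(number) for number in PRICE_RE.findall(html)]


def completeness_and_accuracy(extracted, expected):
    """Score an extraction against hand-labelled prices.

    Assumed definitions: completeness = correct / expected (recall),
    accuracy = correct / extracted (precision).
    """
    extracted_set, expected_set = set(extracted), set(expected)
    correct = len(extracted_set & expected_set)
    completeness = correct / len(expected_set) if expected_set else 0.0
    accuracy = correct / len(extracted_set) if extracted_set else 0.0
    return completeness, accuracy


# Made-up sample page: the third price has no currency marker in front of it,
# so this particular pattern does not match it.
sample = (
    '<li>Tea <span class="p">¥12.50</span></li>'
    "<li>Rice <b>￥38</b></li>"
    "<li>Oil: 45.00 yuan</li>"
)

prices = extract_prices(sample)
completeness, accuracy = completeness_and_accuracy(prices, [12.5, 38.0, 45.0])
print(prices, completeness, accuracy)
```

On this made-up page the pattern finds two of the three labelled prices, so completeness is 2/3 while accuracy is 1.0. This only shows how a fixed pattern can miss prices written in an unanticipated form; it says nothing about the figures reported in the paper.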