陈奇, 张曦煌. 基于N-list的并行频繁项集挖掘算法[J]. 微电子学与计算机, 2017, 34(5): 40-44.
CHEN Qi, ZHANG Xi-huang. An N-list based Parallel Algorithm for Mining Frequent Itemsets[J]. Microelectronics & Computer, 2017, 34(5): 40-44.

An N-list based Parallel Algorithm for Mining Frequent Itemsets

  • Abstract: N-list is a data structure proposed in recent years that has proven highly efficient for mining frequent itemsets. This paper presents PPF, a new parallel frequent-itemset mining algorithm based on N-list. The algorithm scans the database to build a PPC-tree, uses the PPC-tree to generate a series of N-lists, assigns the N-list entries to different nodes for in-depth mining, and finally merges the results from all nodes to obtain the complete set of frequent itemsets. PPF was tested against the PrePost algorithm on four real datasets, and the experimental results show that PPF was the fastest on every one of them.
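
    The pipeline outlined in the abstract (scan, build a PPC-tree, derive N-lists, hand them to workers) can be illustrated with a minimal Python sketch. It is not the authors' PPF implementation. The pre-order/post-order node codes follow the usual PrePost-style definition of an N-list, while the names `build_ppc_tree`, `build_nlists`, `intersect`, and `partition`, the quadratic intersection, and the round-robin assignment of items to workers are all assumptions made for illustration; the abstract does not specify how N-lists are distributed.

```python
from collections import Counter


class Node:
    """One PPC-tree node: an item, a count, and pre/post-order ranks."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}
        self.pre = -1
        self.post = -1


def build_ppc_tree(transactions, min_support):
    """Build a prefix tree over frequent items and number its nodes."""
    freq = Counter(i for t in transactions for i in set(t))
    # Frequent items, most frequent first (ties broken by item name).
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    order = [item for item, c in ranked if c >= min_support]
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    counters = {"pre": 0, "post": 0}

    def visit(n):
        n.pre = counters["pre"]
        counters["pre"] += 1
        for child in n.children.values():
            visit(child)
        n.post = counters["post"]
        counters["post"] += 1

    visit(root)
    return root, order


def build_nlists(root, order):
    """N-list of an item: (pre, post, count) of its nodes, sorted by pre."""
    nlists = {item: [] for item in order}
    stack = [root]
    while stack:
        n = stack.pop()
        if n.item is not None:
            nlists[n.item].append((n.pre, n.post, n.count))
        stack.extend(n.children.values())
    for codes in nlists.values():
        codes.sort()
    return nlists


def intersect(ancestor_list, descendant_list):
    """N-list of a 2-itemset; simple quadratic version for clarity."""
    result = []
    for a_pre, a_post, _ in ancestor_list:
        # A node is a descendant when its pre rank is larger and post rank smaller.
        total = sum(c for pre, post, c in descendant_list
                    if pre > a_pre and post < a_post)
        if total:
            result.append((a_pre, a_post, total))
    return result


def support(nlist):
    """Support of an itemset is the sum of the counts in its N-list."""
    return sum(c for _, _, c in nlist)


def partition(order, n_workers):
    """Hypothetical round-robin assignment of items to worker nodes."""
    return [order[w::n_workers] for w in range(n_workers)]
```

    In this sketch each worker would extend only the items in its own partition, intersecting their N-lists with those of more frequent items, and the per-worker results would then be merged. A production version would use the linear-time merge described for PrePost rather than the quadratic loop shown here.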

     
