Sequential Pattern Mining Based on Pattern Space Division in Map/Reduce Cluster

LIU Qian; CHEN Ming

Abstract

Dividing the pattern space converts the Map/Reduce problem of handling the many-to-many correspondence between the data set and the set of candidate sequential patterns into the problem of handling the many-to-many correspondence between the data set and the pattern subspaces, each rooted at one frequent 1-sequence. This greatly reduces the size of the intermediate key/value pair set and avoids the single-Map-node bottleneck caused by the combinatorial explosion of candidate patterns. Three rounds of Map/Reduce jobs build the pattern space and the filtering rules and, on that basis, mine sequential patterns in each pattern subspace independently. By exploiting both the global characteristics of the whole pattern space and the individual characteristics of each pattern subspace, an optimized non-recursive mining algorithm is designed that reduces both the number of prefix-projected databases constructed and the number of scans over them, thereby improving the efficiency of the mining phase.
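The abstract does not spell out the algorithm, so the following is only a minimal single-process Python sketch of the general idea of partitioning the pattern space by frequent 1-sequences. It rests on several assumptions that are not from the paper: sequences are lists of single items (no itemset elements), the three Map/Reduce rounds are simulated with plain functions, the "filtering rule" is taken to be simply dropping infrequent items, and the per-subspace miner is a generic stack-based (non-recursive) PrefixSpan-style prefix projection rather than the authors' optimized algorithm. All names (`round1_frequent_items`, `round2_partition`, `round3_mine_subspace`, `mine`, `DB`, `MIN_SUP`) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy database: each sequence is a list of single items
# (itemset elements are left out to keep the sketch short).
DB = [
    ["a", "b", "c", "b"],
    ["a", "c", "b"],
    ["b", "a", "c"],
    ["a", "b", "b"],
]
MIN_SUP = 2  # assumed absolute support threshold


def round1_frequent_items(db, min_sup):
    """Map: emit (item, 1) once per sequence. Reduce: sum, keep frequent."""
    counts = Counter()
    for seq in db:
        for item in set(seq):  # an item counts at most once per sequence
            counts[item] += 1
    return {item for item, c in counts.items() if c >= min_sup}


def round2_partition(db, frequent):
    """Map: drop infrequent items (assumed filtering rule), then for each
    frequent 1-sequence <x> emit (x, suffix after the first x).
    Shuffle: grouping by key yields one projected database per subspace."""
    subspaces = defaultdict(list)
    for seq in db:
        filtered = [item for item in seq if item in frequent]
        seen = set()
        for pos, item in enumerate(filtered):
            if item not in seen:  # only the first occurrence defines the projection
                seen.add(item)
                subspaces[item].append(filtered[pos + 1:])
    return subspaces


def round3_mine_subspace(root, projected, min_sup):
    """Reduce: mine every pattern that starts with `root`, growing prefixes
    with an explicit stack instead of recursion."""
    # Each sequence containing `root` contributed exactly one suffix.
    results = {(root,): len(projected)}
    stack = [((root,), projected)]
    while stack:
        prefix, pdb = stack.pop()
        counts = Counter()
        for suffix in pdb:
            for item in set(suffix):
                counts[item] += 1
        for item, sup in counts.items():
            if sup < min_sup:
                continue
            grown = prefix + (item,)
            results[grown] = sup
            # Project again: keep what follows the first occurrence of `item`.
            new_pdb = [s[s.index(item) + 1:] for s in pdb if item in s]
            stack.append((grown, new_pdb))
    return results


def mine(db, min_sup):
    frequent = round1_frequent_items(db, min_sup)
    subspaces = round2_partition(db, frequent)
    patterns = {}
    for root, projected in subspaces.items():  # subspaces are independent
        patterns.update(round3_mine_subspace(root, projected, min_sup))
    return patterns


if __name__ == "__main__":
    for pattern, sup in sorted(mine(DB, MIN_SUP).items()):
        print("".join(pattern), sup)
```

Because every pattern begins with exactly one frequent 1-sequence, the subspaces yield disjoint pattern sets and could be handed to separate reducers without coordination. On the toy database above, `mine(DB, MIN_SUP)` returns ten patterns, for example `<a c b>` with support 2.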
