Abstract:
Using pattern space division on top of Map/Reduce, the many-to-many correspondence between the data set and the pattern set is converted into a many-to-many correspondence between data subsets and the pattern subspaces associated with the length-1 sequential patterns. This reduces the set of intermediate key/value pairs so sharply that the single-Map-node bottleneck caused by the combinatorial explosion of the candidate pattern space is avoided. Over three rounds of Map/Reduce tasks, the pattern space is constructed and divided, the filtering rules are established and applied, and sequential pattern mining is then carried out independently in each pattern subspace. By exploiting both the properties common to the whole pattern space and those specific to each pattern subspace, an optimized non-recursive algorithm is designed and implemented; it improves the efficiency of the mining phase by avoiding unnecessary construction of prefix-projected databases and unnecessary scans of those already constructed.
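The three-round scheme summarized above can be illustrated with a minimal single-process sketch. It is not the paper's implementation. It assumes sequences of single items, a plain PrefixSpan-style projection, and an explicit stack in place of recursion, and it does not reproduce the optimizations that skip unnecessary projected databases. All function names (`frequent_items`, `partition`, `mine_subspace`, `mine`) are hypothetical, and the assignment of steps to rounds is a guess based on the abstract.

```python
from collections import Counter, defaultdict


def frequent_items(db, min_sup):
    """Round 1 (assumed): find the frequent length-1 patterns."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))  # a sequence supports each item at most once
    return {item for item, c in counts.items() if c >= min_sup}


def partition(db, items):
    """Round 2 (assumed): send each sequence's suffix to the subspace
    of every frequent item it contains."""
    subsets = defaultdict(list)
    for seq in db:
        # Assumed filtering rule: drop items that are not frequent.
        filtered = [x for x in seq if x in items]
        seen = set()
        for i, x in enumerate(filtered):
            if x not in seen:  # project on the first occurrence only
                seen.add(x)
                subsets[x].append(filtered[i + 1:])
    return subsets


def mine_subspace(prefix_item, suffixes, min_sup):
    """Round 3 (assumed): mine one subspace without recursion."""
    results = {}
    stack = [((prefix_item,), suffixes)]
    while stack:
        prefix, projected = stack.pop()
        results[prefix] = len(projected)  # support of this prefix
        counts = Counter()
        for s in projected:
            counts.update(set(s))
        for item, c in counts.items():
            if c >= min_sup:
                new_proj = [s[s.index(item) + 1:] for s in projected if item in s]
                stack.append((prefix + (item,), new_proj))
    return results


def mine(db, min_sup):
    """Run the three steps; each subspace is mined independently."""
    items = frequent_items(db, min_sup)
    subsets = partition(db, items)
    patterns = {}
    for item in items:
        patterns.update(mine_subspace(item, subsets[item], min_sup))
    return patterns


db = [list("abc"), list("acb"), list("ab"), list("bc")]
patterns = mine(db, min_sup=2)
```

Because each call to `mine_subspace` reads only its own data subset, the loop in `mine` could in principle be distributed across Reduce tasks keyed by the length-1 pattern, which is the independence property the abstract relies on.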