Abstract:
Graph query is more and more widely used, and approximate subgraph matching is one of the core technologies. However, the existence of noise in large-scale graph data has a great impact on the accuracy of approximate subgraph matching. In order to further improve the robustness and real-time performance of approximate subgraph matching algorithm, an improved algorithm of approximate subgraph matching based on Chi-square statistics is proposed. In the preprocessing stage, the target graph is partitioned by using the Unified Neighborhood Random Walk Distance and Gaussian Influence Function, so that the partitioned subgraph can achieve the best balance between the topological structure and label attributes.In the algorithm matching stage, the statistical significance captured by Chi-square statistics is used to characterize the subgraph similarity of approximate subgraph matching, and then the proportion of structural similarity and label similarity is adjusted by combining the weight coefficient α. The statistical significance model can takes into account the background structure and label distribution in the neighborhood of vertices, and effectively handles partial label and structural mismatches, so as to get the best matching subgraph. The experimental results in real dataset show that the algorithm is effective and efficient, and can be applied to Top-k approximate subgraph matching.