基于卡方统计的近似子图匹配

徐周波; 黄文文; 刘华东; 杨健

基于卡方统计的近似子图匹配

Approximate subgraph matching based on Chi-squares statistics

摘要

摘要: 图查询的应用越来越广泛，其中近似子图匹配是核心技术之一.但是大规模图数据中噪音的存在对近似子图匹配精确度影响较大，为进一步提高近似子图匹配算法的鲁棒性和实时性，提出一种基于卡方统计的近似子图匹配改进算法.在算法预处理阶段，利用统一邻居随机游走距离和高斯影响函数将目标图划分，使得划分后的子图在拓扑结构和标签属性之间达到最佳平衡.在算法匹配阶段，使用卡方统计量捕获的统计显著性来表征近似子图匹配结构相似度，再结合权重系数α调整结构相似度和标签相似度所占比重，其中统计显著性模型能够充分考虑背景结构和顶点邻域中的标签分布，有效处理部分标签和结构失配，从而得到最佳匹配子图.真实数据集中的实验结果表明，该算法效果较好，运算效率较高，可以应用于Top-k近似子图匹配.

Abstract: Graph query is more and more widely used, and approximate subgraph matching is one of the core technologies. However, the existence of noise in large-scale graph data has a great impact on the accuracy of approximate subgraph matching. In order to further improve the robustness and real-time performance of approximate subgraph matching algorithm, an improved algorithm of approximate subgraph matching based on Chi-square statistics is proposed. In the preprocessing stage, the target graph is partitioned by using the Unified Neighborhood Random Walk Distance and Gaussian Influence Function, so that the partitioned subgraph can achieve the best balance between the topological structure and label attributes.In the algorithm matching stage, the statistical significance captured by Chi-square statistics is used to characterize the subgraph similarity of approximate subgraph matching, and then the proportion of structural similarity and label similarity is adjusted by combining the weight coefficient α. The statistical significance model can takes into account the background structure and label distribution in the neighborhood of vertices, and effectively handles partial label and structural mismatches, so as to get the best matching subgraph. The experimental results in real dataset show that the algorithm is effective and efficient, and can be applied to Top-k approximate subgraph matching.

HTML全文

参考文献(12)

施引文献

资源附件(0)