SONG Guo-xing, ZHOU Xi, MA Bo, ZHAO Fan. Research on High Dimensional Similarity Duplicate Record Detection Algorithm Based on R-tree Index[J]. Microelectronics & Computer, 2017, 34(9): 97-102.

Research on High Dimensional Similarity Duplicate Record Detection Algorithm Based on R-tree Index

  • In the classic similar duplicate record detection algorithm SNM, as the record dimension increases, the projection step not only loses data but also noticeably raises the algorithm's error rate. To address this deficiency of SNM, an index is built with an R-tree so that the high-dimensional spatial characteristics of the records are preserved. Clustering reduces the number of record comparisons and thereby improves efficiency. To avoid the influence of high-dimensional data sparsity, an improved distance measure for record similarity is proposed. Finally, the validity of the algorithm is verified on real data by comparison with the SNM algorithm in different dimensions.
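
The projection problem the abstract attributes to SNM can be illustrated with a minimal sketch of the textbook sorted neighborhood method. This is not the paper's R-tree algorithm. The function names, the Euclidean distance, the sort key, and the sample records are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Record = Sequence[float]


def euclidean(a: Record, b: Record) -> float:
    """Plain Euclidean distance (an assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def snm_pairs(
    records: List[Record],
    key: Callable[[Record], float],
    window: int,
    threshold: float,
) -> List[Tuple[int, int]]:
    """Sort records by a one-dimensional key, then compare each record
    only with the next (window - 1) records in sorted order."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if euclidean(records[i], records[j]) <= threshold:
                pairs.append((min(i, j), max(i, j)))
    return pairs


# Hypothetical records: 0 and 2 are near-duplicates, but record 1 sits
# between them once everything is projected onto the first attribute.
records = [(0.0, 0.0), (0.1, 9.0), (0.2, 0.0)]
first_attribute = lambda r: r[0]

missed = snm_pairs(records, first_attribute, window=2, threshold=0.5)
found = snm_pairs(records, first_attribute, window=3, threshold=0.5)
```

With a window of 2 the near-duplicate pair is never compared, because sorting on a single projected attribute places an unrelated record between the two; only a wider window recovers it. An index that keeps records in their full-dimensional space, such as the R-tree the abstract describes, avoids depending on a one-dimensional ordering in this way.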
