LIN Jing-huai, LIU Zhi-yu, LI Jun-liang, GAO Xin, LI Ze-ke, TANG Zhi-jun, YU Si-hang, XU Jian-hang. High-dimensional hypersphere oversampling method for imbalance classification[J]. Microelectronics & Computer, 2021, 38(5): 65-72.

High-dimensional hypersphere oversampling method for imbalance classification

  • In imbalanced classification, classifiers tend to have low classification accuracy because the majority class greatly outnumbers the minority class. Oversampling methods, represented by SMOTE, are effective against this problem. These methods rebalance the data set by randomly generating new minority points on a selected line segment, but they ignore the diversity of minority samples in high-dimensional space. A high-dimensional Hypersphere-SMOTE (HS-SMOTE) method is proposed for imbalanced data classification. The number of samples needed to balance the classes is drawn from the minority sample set by random sampling. For each sampled point, its nearest neighbor in the minority distribution space is selected by Euclidean distance, and the midpoint of the two points is used as the center of a sampling hypersphere in high-dimensional space. Within this region, the required new minority points are generated randomly through dimensional space distance iteration, which increases the spatial diversity of the minority samples while rebalancing the classes. Extensive experiments were carried out on 15 KEEL imbalanced data sets with Random Forest (RF) classifiers. Compared with 6 typical oversampling methods, the proposed method performs well on the G-mean and F1-score indicators and passes validity verification by two statistical hypothesis tests.
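The sampling procedure described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the function name `hs_smote_sketch` is hypothetical, the hypersphere radius is assumed to be half the distance between a sample and its nearest neighbor (the abstract gives only the center), and the "dimensional space distance iteration" is replaced here by plain uniform sampling inside the ball.

```python
import numpy as np

def hs_smote_sketch(X_min, n_new, rng=None):
    """Generate n_new synthetic minority points (illustrative sketch only).

    X_min : array of shape (n_samples, n_features), minority class samples.
    n_new : number of synthetic points needed to rebalance the classes.
    rng   : seed or numpy Generator (assumed parameter, for reproducibility).
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    if n < 2:
        raise ValueError("need at least two minority samples")

    # Pairwise Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)   # a point is not its own neighbor
    nearest = dist.argmin(axis=1)    # index of each sample's nearest neighbor

    # Randomly pick (with replacement) the seed samples to expand.
    idx = rng.integers(0, n, size=n_new)
    a = X_min[idx]
    b = X_min[nearest[idx]]

    # Hypersphere centered at the midpoint of each (sample, neighbor) pair.
    center = (a + b) / 2.0
    # Assumption: radius is half the pair distance, so both points lie on the sphere.
    radius = np.linalg.norm(a - b, axis=1) / 2.0

    # Uniform sampling inside a d-dimensional ball: random direction from a
    # normalized Gaussian, radial scale u**(1/d). This stands in for the
    # paper's distance iteration, which the abstract does not specify.
    direction = rng.normal(size=(n_new, d))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    scale = radius * rng.random(n_new) ** (1.0 / d)

    return center + direction * scale[:, None]
```

Unlike SMOTE, which places each new point on the segment between a sample and its neighbor, this sketch lets new points fall anywhere inside the ball around the segment's midpoint, which is the source of the added spatial diversity the abstract describes.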
