A Method for Unbalanced Big Data Classification Based on Optimization Random Forest

MA Hai-rong; CHENG Xin-wen

MA Hai-rong, CHENG Xin-wen. A Method for Unbalanced Big Data Classification Based on Optimization Random Forest[J]. Microelectronics & Computer, 2018, 35(11): 28-32.

Abstract

Traditional random forest (RF) models face two problems in classification: accuracy degrades when the sample set is unbalanced, and a tie in the votes across classes can stall the algorithm. We improve the traditional RF model as follows. First, we randomly select the same number of samples from the minority class and the majority class to build the training sample set for RF modeling. Then, guided by the voting entropy and a generalized Euclidean distance computed on the sample characteristic parameters, we gradually add the samples with maximum voting entropy to the training sample set. This addresses the problem that, in the traditional RF model, randomly selected training samples contain too few minority-class samples. During classification, when a voting tie occurs, we use the generalized Euclidean distance between the test sample and its adjacent training samples to determine the classification result, which eliminates the stagnation caused by equal votes for each class. Experimental results show that the optimized RF model achieves better classification results on unbalanced data sets.
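
The three ideas in the abstract (balanced sampling, voting entropy, and distance-based tie-breaking) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `balanced_sample`, `voting_entropy`, and `classify_with_tie_break` are hypothetical, the plain Euclidean distance stands in for the paper's generalized Euclidean distance (whose exact definition is not given here), and the tie is resolved by the single nearest training sample as an assumption.

```python
import math
import random
from collections import Counter


def balanced_sample(minority, majority, seed=0):
    """Draw the same number of samples from each class (hypothetical helper)."""
    rng = random.Random(seed)
    n = min(len(minority), len(majority))
    return rng.sample(minority, n), rng.sample(majority, n)


def voting_entropy(votes):
    """Shannon entropy (base 2) of the class labels voted by the trees.

    High entropy means the trees disagree, so the sample is a good
    candidate to add to the training set.
    """
    total = len(votes)
    counts = Counter(votes).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def euclidean(a, b):
    """Plain Euclidean distance (assumed stand-in for the generalized one)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_with_tie_break(votes, x, train_X, train_y):
    """Majority vote; on a draw, take the label of the nearest training
    sample whose class is among the tied classes."""
    counts = Counter(votes)
    top = max(counts.values())
    tied = [label for label, c in counts.items() if c == top]
    if len(tied) == 1:
        return tied[0]
    candidates = [
        (euclidean(x, xi), yi)
        for xi, yi in zip(train_X, train_y)
        if yi in tied
    ]
    if not candidates:
        # No training sample from a tied class: fall back deterministically.
        return sorted(tied)[0]
    return min(candidates, key=lambda pair: pair[0])[1]
```

In this sketch, a sample whose tree votes split evenly between two classes has the maximum entropy for two classes, which is the kind of sample the abstract describes adding to the training set, and the same even split at prediction time is resolved by distance rather than left undecided.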
