QIN Jing, QIAN Xue-zhong, WANG Wei-tao, XIE Guo-wei, SONG Wei. An Algorithm for Unbalanced Big Data Using Paralleled Random Forest[J]. Microelectronics & Computer, 2017, 34(4): 22-27.

An Algorithm for Unbalanced Big Data Using Paralleled Random Forest

  • The parallel random forest algorithm based on MapReduce (MR_RF), which constructs trees on data partitions, is a classic ensemble algorithm for big data classification. However, on imbalanced big data its performance degrades and positive samples tend to be misclassified, because positive samples are sparse and the algorithm chooses split points by a globally optimal criterion. This paper proposes an improved parallel random forest called SBWMR_RF. It adopts stratified bootstrap sampling to increase the share of the minority class during sampling. At the same time, a cost-sensitive approach is applied in the key steps of tree construction to modify the distribution of the minority class. Experiments show that SBWMR_RF classifies unbalanced big data effectively, especially in extremely unbalanced scenarios, without overfitting and with high speedup.
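
The two ideas named in the abstract, stratified bootstrap sampling and cost-sensitive split evaluation, can be illustrated with a minimal single-machine sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names `stratified_bootstrap` and `weighted_gini`, the `minority_fraction` parameter, the binary labels (1 for the minority class, 0 for the majority), and the use of class weights inside a Gini impurity. It is not the SBWMR_RF implementation and omits the MapReduce partitioning entirely.

```python
import random
from collections import defaultdict


def stratified_bootstrap(labels, minority_fraction, n_samples, rng):
    """Draw indices with replacement, separately per class.

    Assumption: label 1 is the minority class and `minority_fraction`
    is the share of the bootstrap sample reserved for it.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    n_pos = round(n_samples * minority_fraction)
    # Sampling each stratum on its own guarantees the minority share,
    # which a plain bootstrap over all rows does not.
    sample = rng.choices(pos, k=n_pos) + rng.choices(neg, k=n_samples - n_pos)
    rng.shuffle(sample)
    return sample


def weighted_gini(labels, class_weight):
    """Gini impurity where each sample counts with its class weight.

    Assumption: a higher weight on the minority class stands in for the
    cost-sensitive adjustment mentioned in the abstract.
    """
    mass = defaultdict(float)
    for y in labels:
        mass[y] += class_weight[y]
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


if __name__ == "__main__":
    labels = [1] * 5 + [0] * 95          # 5% minority class
    idx = stratified_bootstrap(labels, 0.5, 100, random.Random(0))
    print(sum(labels[i] for i in idx))   # positives in the bootstrap sample
    node = [1] + [0] * 9
    print(weighted_gini(node, {0: 1.0, 1: 1.0}))  # unweighted impurity
    print(weighted_gini(node, {0: 1.0, 1: 9.0}))  # minority up-weighted
```

In this sketch, up-weighting the minority class raises the impurity of a node that contains a single positive sample among many negatives, so a split criterion built on it would have more incentive to isolate that sample rather than treat the node as nearly pure.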