LI Yueying. Big data classification method of neighborhood search for online feature selection[J]. Microelectronics & Computer, 2021, 38(9): 61-66.

Big data classification method of neighborhood search for online feature selection

  • Abstract: To address the low efficiency of existing algorithms on massive data sets, a parallel big data classification method based on neighborhood search for online feature selection (NSOFS) is proposed. In the Map phase, the big data set is divided into blocks. For the dynamic, unknown feature space, a neighborhood search optimized by the firefly algorithm and the simulated annealing algorithm is run over the online features to select the best feature set. The selected feature set is then passed to the Reduce phase as its input features, where a kernel support vector machine (KSVM) classifies the data. Experimental results show that the proposed method outperforms other existing methods in precision, recall, F-measure, and running time.
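The neighborhood search run on one data block in the Map phase might look roughly like the sketch below. It is only an illustration under stated assumptions: it covers the simulated annealing part alone (the firefly optimization and the KSVM stage are not reproduced), it assumes binary labels 0/1 with both classes present in the block, and the names `fitness` and `anneal_select`, the scoring rule, and all parameter values are hypothetical rather than taken from the paper.

```python
import math
import random


def fitness(mask, rows, labels, penalty=0.01):
    """Hypothetical score: mean class separation of kept features minus a size penalty."""
    kept = [j for j, keep in enumerate(mask) if keep]
    if not kept:
        return float("-inf")  # an empty feature set is never acceptable
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    separation = 0.0
    for j in kept:
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        separation += abs(mean_pos - mean_neg)
    return separation / len(kept) - penalty * len(kept)


def anneal_select(rows, labels, steps=200, temp=1.0, cooling=0.95, seed=0):
    """Simulated-annealing neighborhood search over feature masks for one data block."""
    rng = random.Random(seed)
    current = [True] * len(rows[0])  # start from the full feature set
    current_score = fitness(current, rows, labels)
    best, best_score = current[:], current_score
    for _ in range(steps):
        # Neighbor: flip one randomly chosen feature in or out of the subset.
        candidate = current[:]
        j = rng.randrange(len(candidate))
        candidate[j] = not candidate[j]
        candidate_score = fitness(candidate, rows, labels)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta > 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current[:], current_score
        temp *= cooling  # geometric cooling schedule
    return best, best_score
```

Calling `anneal_select(rows, labels)` on a block returns a boolean mask over the features together with its score. In a MapReduce setting, each mapper could run this on its own block and emit the selected feature indices for the Reduce phase to use.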

     
