Multi-Label Online Stream Feature Selection Based on Adaptive Neighborhood Rough Set

Zhang Haixiang (张海翔); Li Peipei (李培培); Hu Xuegang (胡学钢)

doi:10.19304/J.ISSN1000-7180.2022.0004

Multi-label online stream feature selection based on Adaptive Neighborhood Rough Set

Abstract

Abstract: Multi-label feature selection aims to select representative attributes in multi-label scenarios. Most existing multi-label feature selection methods assume that the entire feature space is available in advance and do not consider streaming features, which flow into the model one by one over time. In addition, some streaming feature methods require parameters to be specified before learning, so choosing a single optimal parameter setting before training on different types of data sets is difficult. Motivated by this, this paper defines an adaptive neighborhood rough set relation, Gap, and proposes a multi-label online stream feature selection method based on it (Multi-Label Online stream Feature Selection based on Adaptive Neighborhood Rough Set, ML-OFS-ANRS). Data mining with neighborhood rough sets does not require any prior knowledge of the feature space structure, and it does not break the neighborhood and order structure of the data when handling mixed data. In the first stage, relevant and important features are added to the selected subset based on dynamic maximal dependency. In the second stage, to filter redundant features, the importance of each feature is calculated and parallel reduction is performed on the selected subset. Thus, with the "dynamic maximal dependency, online redundancy reduction" evaluation criterion, ML-OFS-ANRS can select features with high relevance and low redundancy. Experiments on 10 data sets of different types show that, for the same number of selected features, ML-OFS-ANRS outperforms traditional feature selection methods and state-of-the-art online stream feature selection algorithms.
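The two-stage idea described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not give the definition of the Gap relation, the dependency measure, or the parallel reduction step, so everything below is an assumption made for illustration. The hypothetical `gap_neighbors` takes a sample's neighbors to be all samples that come before the largest jump in its sorted distances (so no radius parameter is needed), `dependency` counts the fraction of samples whose neighbors all share the same label vector, and `stream_select` keeps an arriving feature only if it raises that dependency, then drops earlier features whose removal does not lower it.

```python
import numpy as np

def gap_neighbors(dist_row, i):
    """Assumed stand-in for a gap-based neighborhood: neighbors of sample i
    are the samples that precede the largest jump in its sorted distances."""
    order = np.argsort(dist_row, kind="stable")
    order = order[order != i]              # exclude the sample itself
    d = dist_row[order]
    if len(d) < 2:
        return order
    cut = int(np.argmax(np.diff(d))) + 1   # position just after the largest gap
    return order[:cut]

def dependency(X, Y, features):
    """Fraction of samples whose neighbors all carry the same label vector
    (an assumed multi-label analogue of the rough set positive region)."""
    if not features:
        return 0.0
    sub = X[:, features]
    dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
    consistent = 0
    for i in range(len(X)):
        nbrs = gap_neighbors(dist[i], i)
        if np.all(Y[nbrs] == Y[i]):
            consistent += 1
    return consistent / len(X)

def stream_select(X, Y):
    """Process feature columns one at a time, as if they arrived in a stream."""
    selected, best = [], 0.0
    for f in range(X.shape[1]):
        score = dependency(X, Y, selected + [f])
        if score > best:                   # stage 1: keep only if dependency rises
            selected.append(f)
            best = score
            for g in list(selected[:-1]):  # stage 2: drop features made redundant
                rest = [h for h in selected if h != g]
                if dependency(X, Y, rest) >= best:
                    selected = rest
    return selected

# Toy data: column 0 is partly informative, column 1 separates the two
# label vectors cleanly, column 2 is noise.
X = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 0.1, 1.0],
    [0.0, 0.2, 0.0],
    [0.3, 1.0, 1.0],
    [0.3, 1.1, 0.0],
    [0.0, 1.2, 1.0],
])
Y = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])

print(stream_select(X, Y))
```

On this toy data, column 0 is accepted first, column 1 is accepted next because it raises the dependency, and column 0 is then discarded as redundant. Recomputing the full distance matrix at every step keeps the sketch short but would be too slow for real streams; an incremental update would be the natural refinement.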
