Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (17): 93-100.

• Big Data and Cloud Computing •

Multi-label feature selection via fusing feature ranking

WANG Chenxi1, LIN Menglei2, LIU Jinghua2, WANG Juan2, LIN Yaojin2   

  1. Department of Computer Engineering, Zhangzhou Institute of Technology, Zhangzhou, Fujian 363000, China
  2. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China
  • Online: 2016-09-01 Published: 2016-09-14

Abstract: In multi-label learning, feature selection is an effective way to mitigate the curse of dimensionality and improve the performance of multi-label classifiers. This paper proposes a multi-label feature selection algorithm that fuses feature rankings. First, the algorithm adaptively granulates the samples under each label and uses the resulting sample neighborhoods to compute the neighborhood mutual information between each feature and the label, which measures the importance of the feature. Then, under each label, all features are sorted in descending order of their neighborhood mutual information, yielding one feature ranking per label. Finally, the individual feature rankings are fused through clustering ensemble into a single new feature ranking. Experiments on four multi-label data sets with four evaluation criteria show that the proposed algorithm outperforms several state-of-the-art multi-label feature selection algorithms.

Key words: feature selection, multi-label classification, clustering ensemble, mutual information
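
The three steps summarized in the abstract (score each feature against each label, rank the features per label, fuse the rankings) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the neighborhood radius `delta` is fixed rather than adaptively chosen per label, the neighborhood mutual information follows the commonly used definition NMI(B; C) = -(1/n) Σ log(|δ_B(x_i)| · |δ_C(x_i)| / (n · |δ_{B∪C}(x_i)|)), and the fusion step uses a simple mean-position (Borda-style) aggregation in place of the clustering ensemble named in the abstract. All function names are hypothetical.

```python
import numpy as np


def neighborhood_mutual_information(feature, label, delta=0.15):
    """NMI between one numeric feature and one discrete label.

    Assumption: a fixed radius `delta`; the paper granulates samples adaptively.
    """
    x = np.asarray(feature, dtype=float)
    y = np.asarray(label)
    n = x.shape[0]
    # near[i, j]: sample j lies in the delta-neighborhood of sample i on this feature.
    near = np.abs(x[:, None] - x[None, :]) <= delta
    # same[i, j]: samples i and j share the same value of this label.
    same = y[:, None] == y[None, :]
    n_feat = near.sum(axis=1)
    n_lab = same.sum(axis=1)
    n_joint = (near & same).sum(axis=1)  # always >= 1, since i is its own neighbor
    return float(-np.mean(np.log(n_feat * n_lab / (n * n_joint))))


def rank_features_per_label(X, Y, delta=0.15):
    """Return an array whose row l lists feature indices by descending NMI with label l."""
    n_features, n_labels = X.shape[1], Y.shape[1]
    scores = np.array([
        [neighborhood_mutual_information(X[:, f], Y[:, l], delta)
         for f in range(n_features)]
        for l in range(n_labels)
    ])
    return np.argsort(-scores, axis=1, kind="stable")


def fuse_rankings(rankings):
    """Fuse per-label rankings by mean position.

    Assumption: a Borda-style stand-in, not the paper's clustering ensemble.
    """
    n_labels, n_features = rankings.shape
    positions = np.empty_like(rankings)
    rows = np.arange(n_labels)[:, None]
    # positions[l, f] = place of feature f in the ranking for label l (0 is best).
    positions[rows, rankings] = np.arange(n_features)
    return np.argsort(positions.mean(axis=0), kind="stable")


if __name__ == "__main__":
    # Toy data: feature 0 separates label 0, feature 2 separates label 1,
    # and feature 1 is constant and therefore uninformative.
    X = np.array([
        [0.00, 0.5, 0.0], [0.05, 0.5, 1.0], [0.10, 0.5, 0.0], [0.05, 0.5, 1.0],
        [0.90, 0.5, 0.0], [0.95, 0.5, 1.0], [1.00, 0.5, 0.0], [0.95, 0.5, 1.0],
    ])
    Y = np.array([[0, 0], [0, 1], [0, 0], [0, 1], [1, 0], [1, 1], [1, 0], [1, 1]])
    print(fuse_rankings(rank_features_per_label(X, Y)))
```

In this sketch the constant feature scores zero against both labels, so it ends up last in the fused ranking. Mean-position fusion is used only because it is short and deterministic; a clustering-based fusion would replace `fuse_rankings` without changing the other two functions.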