Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (3): 166-171. DOI: 10.3778/j.issn.1002-8331.1608-0122

• Pattern Recognition and Artificial Intelligence •

Divide-and-conquer algorithm for cost-sensitive attribute selection under dynamic misclassification costs

HUANG Weiting1, ZHAO Hong2

  1. School of Computing, Minnan Normal University, Zhangzhou, Fujian 363000, China
    2. Lab of Granular Computing, Minnan Normal University, Zhangzhou, Fujian 363000, China
  • Online: 2018-02-01  Published: 2018-02-07

Divide-and-conquer algorithm for cost-sensitive feature selection based on dynamic misclassification costs

HUANG Weiting1, ZHAO Hong2   

  1. School of Computing, Minnan Normal University, Zhangzhou, Fujian 363000, China
    2. Lab of Granular Computing, Minnan Normal University, Zhangzhou, Fujian 363000, China
  • Online: 2018-02-01  Published: 2018-02-07

Abstract: The cost-sensitive attribute selection problem aims to obtain an attribute subset with the minimal total cost by trading off test costs against misclassification costs. Most existing cost-sensitive attribute selection methods consider only fixed misclassification costs and therefore handle problems such as imbalanced class distributions poorly. On large-scale datasets, unsatisfactory efficiency is another major issue in cost-sensitive attribute selection. To address these problems, a new dynamic misclassification cost mechanism is designed with the goal of minimizing the total cost. Following the divide-and-conquer idea, each dataset is adaptively split column-wise according to its scale. The minimal-cost attribute selection problem is then redefined on the basis of dynamic misclassification costs, and a divide-and-conquer algorithm for cost-sensitive attribute selection under dynamic misclassification costs is proposed. Experiments show that the algorithm improves efficiency while obtaining the optimal misclassification cost, thereby guaranteeing that the resulting attribute subset has the minimal total cost.
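The abstract states the objective only in words. As an illustration (the notation below is introduced here and is not quoted from the paper), minimal-total-cost attribute selection in cost-sensitive rough set studies is commonly formalized as

\[
B^{*} = \arg\min_{B \subseteq C} \Big( \sum_{a \in B} tc(a) + mc(B) \Big),
\]

where C is the set of condition attributes, tc(a) is the test cost of attribute a, and mc(B) is the misclassification cost incurred when classifying with the subset B; in this work mc(B) would be evaluated with the proposed dynamic misclassification cost mechanism rather than with fixed misclassification costs.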

Key words: rough sets, cost-sensitive, attribute selection, dynamic misclassification cost, adaptive divide and conquer

Abstract: Cost-sensitive feature selection problem aims at getting an attribute subset with the minimal total cost, through considering the trade-off between test costs and misclassification costs. There are two main challenges in cost-sensitive feature selection problem. On the one hand, most of the cost-sensitive attribute selection methods only take fixed misclassification costs into account, thus these methods can’t solve imbalance class problems. On the other hand, the efficiency is not ideal when dealing with cost-sensitive feature selection on large scale datasets. In this paper, the contributions for the two challenges are summarized as follows. Firstly, it designs a new dynamic mechanism of misclassification costs to minimize total cost. Secondly, each of datasets is adaptively divided according to the scale of the dataset based on divide and conquer method. Finally, cost-sensitive feature selection problem is redefined based on dynamic misclassification costs, and a divide and conquer algorithm is proposed for cost-sensitive feature selection problem. The proposed algorithm is compared with two other algorithms on seven UCI datasets. Some experiments demonstrate that the proposed algorithm can improve the efficiency and obtain the optimal misclassification costs as well, so as to ensure to minimize total cost.
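To make the pipeline above concrete, the sketch below shows one possible reading of it in Python: per-class misclassification costs derived from class frequencies (an assumption about how the dynamic mechanism could behave), a column-wise division of the attribute set, and a greedy per-block selection that trades test costs against an estimated misclassification cost. All function names and both cost estimates are illustrative assumptions; this is not the paper's rough-set algorithm.

```python
import numpy as np

def dynamic_misclassification_costs(y, base_cost=100.0):
    """Illustrative dynamic cost scheme (assumption, not the paper's formula):
    scale a base cost inversely with class frequency so that minority
    classes are more expensive to misclassify."""
    classes, counts = np.unique(y, return_counts=True)
    return dict(zip(classes, base_cost * counts.sum() / counts))

def estimated_misclassification_cost(X, y, subset, cost_of):
    """Rough estimate: group objects by their values on the selected
    attributes and charge the dynamic cost of every object that disagrees
    with the majority class of its group."""
    keys = [tuple(row) for row in X[:, list(subset)]]
    groups = {}
    for key, label in zip(keys, y):
        groups.setdefault(key, []).append(label)
    total = 0.0
    for labels in groups.values():
        values, counts = np.unique(labels, return_counts=True)
        majority = values[np.argmax(counts)]
        total += sum(cost_of[l] for l in labels if l != majority)
    return total

def select_block(X, y, block, test_costs, cost_of):
    """Conquer step: greedily add the cheapest attribute of the block that
    still lowers the total cost (test costs plus estimated misclassification cost)."""
    selected = []
    best = estimated_misclassification_cost(X, y, selected, cost_of)
    remaining = sorted(block, key=lambda j: test_costs[j])
    improved = True
    while improved and remaining:
        improved = False
        for j in list(remaining):
            trial = selected + [j]
            total = (sum(test_costs[k] for k in trial)
                     + estimated_misclassification_cost(X, y, trial, cost_of))
            if total < best:
                selected, best, improved = trial, total, True
                remaining.remove(j)
                break
    return selected

def divide_and_conquer_selection(X, y, test_costs, block_size=20):
    """Divide step: split the attribute indices column-wise into blocks
    (the paper chooses the split adaptively from the dataset scale; a fixed
    block size is used here for simplicity), select per block, and merge."""
    cost_of = dynamic_misclassification_costs(y)
    n = X.shape[1]
    blocks = [range(i, min(i + block_size, n)) for i in range(0, n, block_size)]
    selected = []
    for block in blocks:
        selected.extend(select_block(X, y, block, test_costs, cost_of))
    return sorted(selected)
```

With X of shape (n, m), a length-m sequence of test costs, and class labels y, divide_and_conquer_selection(X, y, test_costs) returns the indices of the retained attributes; in the paper's setting the block size would be derived adaptively from the dataset scale rather than fixed.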

Key words: rough sets, cost-sensitive, feature selection, dynamic misclassification cost, adaptive divide and conquer