Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (16): 135-137.

• Database, Signal and Information Processing •

New feature selection method combining mutual information (MI) with rough sets (RS)

SHI Yuepeng1, ZHANG Minghui2, ZHU Haodong3

  1. Department of Information Engineering, Zhengzhou College of Animal Husbandry Engineering, Zhengzhou 450011, China
  2. Department of Information Technology, Zhengzhou Normal University, Zhengzhou 450044, China
  3. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-06-01 Published: 2011-06-01

Abstract: Feature selection is an important step in text categorization. This paper analyzes mutual information (MI) and, to address its shortcomings, introduces rough sets (RS) and gives an attribute reduction algorithm based on the relation product. On this basis, it proposes a new feature selection method suited to massive text data sets. The method first uses MI for a preliminary selection of features and then applies the proposed attribute reduction algorithm to eliminate redundant terms. Experimental results show that the method achieves relatively high micro-averaged F1 and macro-averaged F1.

Key words: feature selection, text categorization, mutual information (MI), rough set (RS), attribute reduction
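
The abstract does not give the MI formula, so the following is only a minimal sketch of the preliminary selection stage under stated assumptions. It assumes the pointwise form commonly used in text categorization, MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document frequencies, and it assumes each term is scored by its maximum over classes. The function names `mi_scores` and `select_top_k` and the toy corpus are hypothetical, and the rough-set reduction stage is not shown.

```python
import math
from collections import Counter


def mi_scores(docs, labels):
    """Score each term by its maximum pointwise MI over all classes.

    docs: list of token lists; labels: list of class labels, same length.
    Assumption: probabilities are estimated from document frequencies.
    """
    n = len(docs)
    class_count = Counter(labels)   # number of documents in each class
    term_count = Counter()          # number of documents containing each term
    joint_count = Counter()         # documents of class c that contain term t
    for tokens, label in zip(docs, labels):
        for term in set(tokens):    # count each term once per document
            term_count[term] += 1
            joint_count[term, label] += 1

    scores = {}
    for (term, label), a in joint_count.items():
        # MI(t, c) = log(P(t, c) / (P(t) * P(c))) = log(a * n / (df(t) * n_c))
        mi = math.log(a * n / (term_count[term] * class_count[label]))
        scores[term] = max(scores.get(term, -math.inf), mi)
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: two classes, two documents each.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "team"],
    ["market", "bank", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]

scores = mi_scores(docs, labels)
selected = select_top_k(scores, 6)
```

In this toy corpus, "team" appears in both classes and receives the lowest score, so it is dropped when six terms are kept. The sketch also exposes a commonly cited weakness of this score: "match", which occurs in a single document, ties with "goal", which occurs in every sports document. Whether this is the deficiency the authors have in mind is not stated in the abstract.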