Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (10): 135-137.
• Database, Signal and Information Processing •
刘 健,张维明
LIU Jian, ZHANG Wei-ming
Abstract: This paper experimentally compares the mutual information (MI) text feature selection method with the information gain (IG) and chi-square statistic (CHI) methods. The comparison shows that the main factor limiting the performance of the MI method is randomness in the feature selection process. The MI method is improved by adding a perturbation factor, which eliminates the effect of this randomness. Experiments show that the improved MI method has a fairly clear advantage over the IG and CHI methods.
Key words: mutual information, information gain, CHI, text classification, feature selection
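For readers unfamiliar with the scores being compared, the following is a minimal sketch of the widely used document-frequency forms of the MI and CHI scores for a term and a category. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy corpus, and the handling of zero counts are all hypothetical, and the perturbation factor of the improved method is not reproduced because this record does not specify it.

```python
import math


def contingency(docs, labels, term, category):
    """Count documents by (contains term?, belongs to category?).

    Returns (a, b, c, d): a = term and category, b = term and other category,
    c = no term and category, d = no term and other category.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    return a, b, c, d


def mutual_information(a, b, c, d):
    """Pointwise MI estimate: log(A*N / ((A+C)*(A+B)))."""
    n = a + b + c + d
    if a == 0:
        # Assumption: a term never seen in the category gets the lowest score.
        return float("-inf")
    return math.log(a * n / ((a + c) * (a + b)))


def chi_square(a, b, c, d):
    """CHI statistic: N*(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Assumption: a degenerate table carries no evidence of association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Hypothetical toy corpus: each document is a set of tokens.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"market", "team"}]
labels = ["sport", "sport", "finance", "finance"]

for term in ("goal", "team"):
    counts = contingency(docs, labels, term, "sport")
    print(term, mutual_information(*counts), chi_square(*counts))
```

In this toy corpus, "goal" appears only in the "sport" documents and receives positive scores under both measures, while "team" is spread evenly across the two categories and scores zero under both.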
LIU Jian, ZHANG Wei-ming. Study and improvement of mutual information based text feature selection method[J]. Computer Engineering and Applications, 2008, 44(10): 135-137.
Link to this article: http://cea.ceaj.org/CN/
http://cea.ceaj.org/CN/Y2008/V44/I10/135