Computer Engineering and Applications (计算机工程与应用), 2010, Vol. 46, Issue (4): 8-11. DOI: 10.3778/j.issn.1002-8331.2010.04.003

• Doctoral Forum •

Text feature selection based on improved simulated annealing algorithm

ZHU Hao-dong (1,2), ZHONG Yong (1,2)

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China
  2. The Graduate School of the Chinese Academy of Sciences, Beijing 100039, China
  • Received: 2009-09-21; Revised: 2009-11-26; Online: 2010-02-01; Published: 2010-02-01
  • Contact: ZHU Hao-dong

Abstract: In text categorization, feature spaces commonly contain 10,000 dimensions or more, often far exceeding the number of available training samples. To speed up text mining algorithms and reduce their memory footprint, a feature selection method based on an improved simulated annealing algorithm is proposed. To avoid losing the current optimal solution, the method adds a memory function that records the best state found so far, making simulated annealing an intelligent algorithm. An adaptive temperature update function is designed, and dual thresholds are set to reduce the amount of computation while preserving optimality as far as possible, so that a more representative feature subset is obtained quickly. Experimental results show that the proposed method is effective.
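
The abstract does not give the temperature update function or the two thresholds, so the following is only a minimal Python sketch of the general idea, not the authors' method. It runs simulated annealing over feature masks and keeps a memory of the best mask seen. The function name `anneal_select`, the `score` callback, the cooling factors 0.90 and 0.99, and the reading of the dual thresholds as a minimum temperature (`t_min`) plus a stall limit (`max_stall`) are all assumptions made for illustration.

```python
import math
import random


def anneal_select(n_features, score, t0=1.0, t_min=1e-3, max_stall=50, seed=0):
    """Simulated annealing over boolean feature masks with best-state memory.

    `score` is a hypothetical callback: it maps a mask to a number, higher is better.
    """
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    # Memory: the best mask seen so far is kept even if the walk leaves it.
    best, best_score = current[:], current_score
    t, stall = t0, 0
    # Assumed dual thresholds: stop when cold, or when stalled too long.
    while t > t_min and stall < max_stall:
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]  # flip one feature in or out
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        accepted = delta >= 0 or rng.random() < math.exp(delta / t)
        if accepted:
            current, current_score = candidate, cand_score
        if current_score > best_score:
            best, best_score = current[:], current_score
            stall = 0
        else:
            stall += 1
        # Assumed adaptive schedule: cool faster after an accepted move.
        t *= 0.90 if accepted else 0.99
    return best, best_score
```

In a text classification setting, `score` might combine classifier accuracy on a validation set with a penalty on the number of selected terms; that choice is left open here because the abstract does not specify the evaluation function.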
