Text feature selection based on improved simulated annealing algorithm

doi:10.3778/j.issn.1002-8331.2010.04.003

Computer Engineering and Applications, 2010, Vol. 46, Issue (4): 8-11. DOI: 10.3778/j.issn.1002-8331.2010.04.003

ZHU Hao-dong^1,2, ZHONG Yong^1,2

1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China
2. The Graduate School of the Chinese Academy of Sciences, Beijing 100039, China

Received: 2009-09-21 Revised: 2009-11-26 Online: 2010-02-01 Published: 2010-02-01
Contact: ZHU Hao-dong

Abstract

In text categorization, feature spaces commonly reach tens of thousands of dimensions, often far exceeding the number of available training samples. To speed up text mining algorithms and reduce their memory footprint, this paper proposes a feature selection method based on an improved simulated annealing algorithm. To avoid losing the current optimal solution, the method adds a memory function that records the best state found so far, making simulated annealing a more intelligent algorithm. An adaptive temperature update function is designed, and dual thresholds are set to reduce the amount of computation while preserving optimality as far as possible, so that a more representative feature subset is obtained quickly. Experimental results show that the proposed method is effective.
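The abstract names three ideas (a memory of the best state, an adaptive temperature update, and dual thresholds) but gives no formulas. The Python sketch below is therefore only one possible reading, not the authors' method. The function name `anneal_select`, the callable `score`, the two limits `inner_limit` and `outer_limit`, the cooling factors 0.95 and 0.8, and the toy objective are all assumptions made for illustration.

```python
import math
import random


def anneal_select(n_features, score, t0=1.0, t_min=1e-3,
                  inner_limit=20, outer_limit=5, seed=0):
    """Simulated annealing over 0/1 feature masks with best-state memory.

    `score` is a hypothetical callable (mask -> number, higher is better).
    `inner_limit`, `outer_limit` and the cooling factors are assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    current = tuple(rng.randint(0, 1) for _ in range(n_features))
    current_score = score(current)
    best, best_score = current, current_score  # memory: best state seen so far
    t = t0
    stale_temps = 0  # consecutive temperatures that produced no new best

    while t > t_min and stale_temps < outer_limit:
        improved = False
        stale_moves = 0  # consecutive candidates that did not beat the best
        while stale_moves < inner_limit:
            # Neighbour state: flip one randomly chosen feature flag.
            i = rng.randrange(n_features)
            cand = current[:i] + (1 - current[i],) + current[i + 1:]
            cand_score = score(cand)
            delta = cand_score - current_score
            # Metropolis rule: accept improvements, and worse moves with
            # probability exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
                improved = True
                stale_moves = 0
            else:
                stale_moves += 1
        stale_temps = 0 if improved else stale_temps + 1
        # Assumed adaptive update: cool slowly after progress, faster otherwise.
        t *= 0.95 if improved else 0.8
    return best, best_score


# Toy objective: agreement with a made-up target mask (illustration only).
TARGET = (1, 0, 1, 1, 0, 0, 1, 0)


def toy_score(mask):
    return sum(1 for m, g in zip(mask, TARGET) if m == g)


best_mask, best_value = anneal_select(len(TARGET), toy_score)
print(best_mask, best_value)
```

In this sketch the first threshold ends sampling at a temperature once `inner_limit` consecutive candidates fail to beat the remembered best, and the second ends the whole search once `outer_limit` consecutive temperatures bring no improvement. Because the best state is stored separately from the current state, accepting a worse move never loses the best solution found so far. In a real text classification setting, `score` would be replaced by a measure of subset quality, such as classifier accuracy on held-out data.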

CLC number: TP301

ZHU Hao-dong, ZHONG Yong. Text feature selection based on improved simulated annealing algorithm[J]. Computer Engineering and Applications, 2010, 46(4): 8-11.

