Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (2): 129-131.DOI: 10.3778/j.issn.1002-8331.2010.02.039

• Database, Signal and Information Processing •

Feature selection for illegitimate contents recognition

ZHANG Yong-kui 1,2, GAO Feng 1

  1. Faculty of Computer & Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Ministry of Education for Computation Intelligence and Chinese Information Processing, Taiyuan 030006, China
  • Received: 2008-07-29 Revised: 2008-10-20 Online: 2010-01-11 Published: 2010-01-11
  • Contact: ZHANG Yong-kui

Chinese title (translated): A feature selection method for objectionable text recognition

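张永奎 (ZHANG Yong-kui) 1,2, 高峰 (GAO Feng) 1

  1. Faculty of Computer & Information Technology, Shanxi University, Taiyuan 030006
  2. Key Laboratory of Ministry of Education for Computation Intelligence and Chinese Information Processing, Taiyuan 030006
  • Corresponding author: 张永奎 (ZHANG Yong-kui)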

Abstract: This paper describes a two-step feature selection method. First, a finite automaton (acceptor) recognizes all the special words in the training texts; these words are added to the final feature set, and each text is restored to a form that contains no special words. Second, features are selected from the restored texts with a 'combination feature selection method' and added to the feature set. The experimental results show that the method improves the precision of illegitimate content recognition.

Key words: special words, finite acceptor, feature selection, illegitimate content recognition

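Abstract (translated from the Chinese): Given the particular characteristics of objectionable text, a two-step feature selection method is proposed. First, a finite automaton identifies the special words in the training texts and adds them to the feature set as features, while each original text is restored to a text that contains no special words. Next, the "combination feature selection method" selects features from the restored texts and adds them to the feature set. Experimental results show that the two-step feature selection method effectively improves the precision of illegal text recognition.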

Key words (translated from the Chinese): special words, finite automaton, feature selection, objectionable text recognition
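
The two-step procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not define "special words" or the "combination feature selection method", so everything concrete here is an assumption made for illustration: special words are taken to be lexicon entries disguised with inserted separator characters, the acceptor is a trie with separator self-loops, removing the matched span stands in for "restoring" the text, and the combination step is approximated by a document-frequency filter followed by chi-square ranking. All function names, the `SEPARATORS` set, and the parameters `min_df` and `top_k` are hypothetical, and whitespace tokenization stands in for the word segmentation that Chinese text would need.

```python
from collections import Counter

SEPARATORS = set(".*-_")  # assumed noise characters used to disguise a word


def build_trie(lexicon):
    """Build a trie; its nodes act as the states of a finite acceptor."""
    root = {}
    for word in lexicon:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word  # accepting state stores the canonical word
    return root


def match_at(text, start, trie):
    """Run the acceptor from `start`; return (end, word) for the longest match."""
    node, i, best = trie, start, None
    while i < len(text):
        ch = text[i].lower()
        if ch in node:
            node = node[ch]
            i += 1
            if None in node:
                best = (i, node[None])
        elif ch in SEPARATORS and node is not trie:
            i += 1  # self-loop: skip a separator inside a candidate word
        else:
            break
    return best


def extract_special_words(text, trie):
    """Step 1: return (special words found, text with those spans removed)."""
    found, kept, i = [], [], 0
    while i < len(text):
        hit = match_at(text, i, trie)
        if hit:
            i, word = hit
            found.append(word)
        else:
            kept.append(text[i])
            i += 1
    return found, "".join(kept)


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 term/class contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def select_features(docs, labels, lexicon, min_df=2, top_k=4):
    """Two-step selection: special words first, then scored ordinary terms."""
    trie = build_trie(lexicon)
    features, token_sets = set(), []
    for text in docs:
        special, restored = extract_special_words(text, trie)
        features.update(special)  # step 1 features
        token_sets.append(set(restored.lower().split()))

    # Step 2 (assumed combination): document-frequency filter + chi-square rank.
    df = Counter(term for tokens in token_sets for term in tokens)
    n_docs, n_pos = len(docs), sum(labels)
    scores = {}
    for term, d in df.items():
        if d < min_df:
            continue
        n11 = sum(1 for s, y in zip(token_sets, labels) if y and term in s)
        n10 = d - n11            # term present, negative class
        n01 = n_pos - n11        # term absent, positive class
        n00 = n_docs - n_pos - n10
        scores[term] = chi_square(n11, n10, n01, n00)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    features.update(ranked[:top_k])
    return features
```

Calling `select_features(docs, labels, lexicon)` with `labels` as a list of 0/1 class marks returns a single feature set containing both the recognized special words and the top-ranked ordinary terms. Running the acceptor first keeps disguised words from being split into meaningless fragments before the statistical step scores the remaining tokens.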
