Computer Engineering and Applications (计算机工程与应用), 2008, Vol. 44, Issue 33: 105-107. DOI: 10.3778/j.issn.1002-8331.2008.33.032

• Network, Communication and Security •

Feature selection approach based on Bayes reasoning in anti-spam classifier

YAN Peng¹,², ZHENG Xue-feng¹, LI Ming-xiang¹, CHEN Song-hua²

  1. Information Engineering School, University of Science and Technology Beijing, Beijing 100083, China
  2. The State Information Center, Beijing 100045, China
  • Received: 2007-12-19 Revised: 2008-03-21 Online: 2008-11-21 Published: 2008-11-21
  • Contact: YAN Peng

Abstract: Feature selection (FS) is a basic but crucial step in anti-spam classifiers built on machine learning (ML) algorithms, and it directly affects the performance and efficiency of the whole filtering system. FS based on mutual information (MI) is currently in wide use. By analyzing the characteristics of spam emails, this paper presents a new FS evaluation function based on Bayes reasoning. The new method has low computational overhead and distinguishes how strongly different feature words indicate spam, so it is more targeted than other commonly used FS methods and helps improve the accuracy and running efficiency of the filter. Experiments show that it achieves much higher performance and efficiency than the MI approach.
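
The abstract does not state the evaluation function itself, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a hypothetical score equal to the smoothed posterior P(spam | term present) obtained from Bayes' rule, and compares it with a standard document-level mutual information score. All function names, the smoothing constant `alpha`, and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def bayes_spam_score(term, spam_df, ham_df, n_spam, n_ham, alpha=1.0):
    """Hypothetical score: P(spam | term present) via Bayes' rule.

    Laplace smoothing (alpha) is an assumption of this sketch.
    """
    p_term_spam = (spam_df[term] + alpha) / (n_spam + 2 * alpha)
    p_term_ham = (ham_df[term] + alpha) / (n_ham + 2 * alpha)
    p_spam = n_spam / (n_spam + n_ham)
    numerator = p_term_spam * p_spam
    return numerator / (numerator + p_term_ham * (1 - p_spam))


def mutual_information(term, spam_df, ham_df, n_spam, n_ham):
    """Mutual information (bits) between term presence and class label."""
    n = n_spam + n_ham
    # Joint document counts keyed by (term present?, is spam?).
    counts = {
        (1, 1): spam_df[term],
        (1, 0): ham_df[term],
        (0, 1): n_spam - spam_df[term],
        (0, 0): n_ham - ham_df[term],
    }
    mi = 0.0
    for (t, c), n_tc in counts.items():
        if n_tc == 0:
            continue  # a zero cell contributes nothing to the sum
        n_t = counts[(t, 1)] + counts[(t, 0)]
        n_c = n_spam if c == 1 else n_ham
        mi += (n_tc / n) * math.log2(n_tc * n / (n_t * n_c))
    return mi


def select_features(spam_docs, ham_docs, k):
    """Return the k terms with the highest hypothetical Bayes score."""
    spam_df, ham_df = doc_freq(spam_docs), doc_freq(ham_docs)
    n_spam, n_ham = len(spam_docs), len(ham_docs)
    vocab = set(spam_df) | set(ham_df)
    scores = {
        t: bayes_spam_score(t, spam_df, ham_df, n_spam, n_ham) for t in vocab
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (invented for illustration only).
spam_docs = [["free", "money", "now"], ["free", "offer"], ["money", "offer", "now"]]
ham_docs = [["meeting", "now"], ["project", "report"], ["meeting", "report"]]

print(select_features(spam_docs, ham_docs, 3))
```

On this toy corpus, mutual information assigns the same value to "free" (seen only in spam) and "meeting" (seen only in legitimate mail), because it measures association with either class. A posterior-style score separates the two, which is one plausible reading of the abstract's claim that the new function is more targeted at spam-indicative words.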