Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (7): 98-101.

Improved probability-based Bayesian anti-spam mechanism

XUE Zhengyuan   

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  • Online: 2013-04-01 Published: 2013-04-15

基于改进贝叶斯决策的邮件过滤

薛正元   

  1. 郑州大学 信息工程学院,郑州 450001

Abstract: This paper examines the limitations of the traditional Bayesian anti-spam mechanism based on a probability threshold: because the suitability of the chosen threshold is seldom considered, recall is reduced. To address this, the paper proposes a lower-error classification decision method based on random variables and, considering the particular requirements of email classification, a lower-risk classification decision method based on random variables. The experimental results show that the former may be the better choice for classifying ordinary text, while the latter achieves better recall and F value when dealing with emails and at the same time keeps the risk of misjudgment low.

Key words: spam email, email filtering, probability, threshold, classification decision

摘要: 探讨了基于概率阈值的贝叶斯邮件过滤模型的局限性:由于很少考虑所设定阈值的适用性和实用性,损失了一定的召回率。改进贝叶斯决策,提出了基于随机变量的较小错误分类决策方法;针对邮件处理的特殊性,进一步提出了基于随机变量的较小风险分类决策方法。实验结果表明,处理普通文本分类问题时,前者的分类决策效果更好;而后者在处理邮件问题时性能更优,能够在保持较小误判风险的同时,提高贝叶斯邮件过滤器的召回率以及F值。

关键词: 垃圾邮件, 邮件过滤, 概率, 阈值, 分类决策