Computer Engineering and Applications, 2007, Vol. 43, Issue (25): 178-181.

• Database and Information Processing

Spam filtering method based on improved KNN

TIAN Ze, YAN Song-yuan, XU Jing-dong

  1. School of Information Science and Technology, Nankai University, Tianjin 300071, China
  • Online: 2007-09-01 Published: 2007-09-01
  • Contact: TIAN Ze

Abstract: This paper presents a fast text classification algorithm based on the K-nearest neighbor (KNN) principle. While retaining the good classification performance of the original KNN algorithm, it improves classification efficiency by compressing the training samples and eliminating comparisons between similarity values. Experiments show that, in an e-mail filtering system, the new algorithm classifies better than both the binary independence (Bernoulli) model and the multinomial model, which are based on the naive Bayes classifier, while its classification time complexity is comparable to theirs. It is therefore suitable for real-time e-mail filtering.

Key words: fast KNN algorithm, text classification, spam filtering
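The abstract does not spell out how the training samples are compressed, so the following is only one possible reading, offered as an assumption rather than the authors' method. Suppose KNN votes are weighted by cosine similarity and every training message counts as a neighbour (k equal to the training-set size). Because the vectors are L2-normalized, the sum of a class's similarities to a message then equals the message's dot product with that class's summed training vector. That removes both the per-sample comparisons and the sort over similarity values. The Python sketch below contrasts standard weighted KNN with this compressed form. The helpers `tf_vector`, `knn_classify`, `compress` and `compressed_classify` are hypothetical, as are the whitespace tokenization and raw term-frequency weighting.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Hypothetical featurizer: whitespace tokens, term counts, L2-normalized."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}


def dot(a, b):
    """Sparse dot product; equals cosine similarity for L2-normalized vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_classify(train, doc, k):
    """Standard similarity-weighted KNN: score every training vector,
    sort the similarities, and let the top k vote with their similarity."""
    d = tf_vector(doc)
    sims = sorted(((dot(d, x), label) for x, label in train), reverse=True)
    scores = defaultdict(float)
    for sim, label in sims[:k]:
        scores[label] += sim
    return max(scores, key=scores.get), dict(scores)


def compress(train):
    """Assumed compression: collapse each class into one summed vector."""
    summed = defaultdict(lambda: defaultdict(float))
    for x, label in train:
        for t, v in x.items():
            summed[label][t] += v
    return {label: dict(vec) for label, vec in summed.items()}


def compressed_classify(class_vectors, doc):
    """One dot product per class; no per-sample scoring and no sort."""
    d = tf_vector(doc)
    scores = {label: dot(d, vec) for label, vec in class_vectors.items()}
    return max(scores, key=scores.get), scores


train_texts = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
train = [(tf_vector(t), y) for t, y in train_texts]
print(knn_classify(train, "buy cheap pills now", k=len(train))[0])
print(compressed_classify(compress(train), "buy cheap pills now")[0])
```

With `k = len(train)` the two functions produce the same per-class scores up to floating-point rounding. The compressed form costs one dot product per class instead of one per training message plus a sort. That cost profile is close to a naive Bayes classifier's, which fits the abstract's claim of comparable time complexity. Whether the paper also keeps a smaller k, or compresses the samples differently, cannot be determined from the abstract.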