Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (2): 92-96.

• Network, Communication, and Security •

System design against spam message disguised with homonym

HU Demin, HU Jinlong

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Online: 2013-01-15 Published: 2013-01-16

Abstract: In recent years, as spam message filtering technology has advanced, the characteristics of spam messages have changed as well. In particular, spam messages disguised with homophones can easily evade many filtering systems. To address this problem, the paper exploits the fact that replacing a word with a homophone leaves its pinyin unchanged, and proposes using pinyin strings as the keywords for extracting spam features. A common vector and a disguised vector are extracted from each message and fed separately into two mutually independent Bayesian filters. The results of the two filtering passes are then combined to decide whether the message is spam. Experimental results show that the method effectively identifies spam messages disguised with homophones.

Key words: spam message, Bayesian classification, word segmentation, probability, extraction
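
The two-filter idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tiny toneless pinyin table `PINYIN`, the use of single characters as tokens instead of real word segmentation, the Laplace-smoothed naive Bayes model, the toy training messages, and the combination rule (flag a message when either filter's spam probability reaches a threshold).

```python
import math
from collections import Counter

# Hypothetical toy table of toneless pinyin; a real system needs a full dictionary.
PINYIN = {"代": "dai", "带": "dai", "开": "kai", "发": "fa", "法": "fa",
          "票": "piao", "漂": "piao", "明": "ming", "天": "tian", "会": "hui"}

def common_features(text):
    """Common vector: the characters as written (assumed tokenization)."""
    return [ch for ch in text if not ch.isspace()]

def pinyin_features(text):
    """Disguised vector: each character replaced by its pinyin string."""
    return [PINYIN.get(ch, ch) for ch in text if not ch.isspace()]

class NaiveBayes:
    """Two-class naive Bayes with add-one smoothing (an assumed model)."""
    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, features, is_spam):
        self.counts[is_spam].update(features)
        self.docs[is_spam] += 1

    def spam_probability(self, features):
        vocab = set(self.counts[True]) | set(self.counts[False])
        total_docs = self.docs[True] + self.docs[False]
        log_score = {}
        for label in (True, False):
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / total_docs)
            for f in features:
                # +1 in the denominator reserves mass for unseen features.
                score += math.log((self.counts[label][f] + 1)
                                  / (total + len(vocab) + 1))
            log_score[label] = score
        top = max(log_score.values())
        weight = {k: math.exp(v - top) for k, v in log_score.items()}
        return weight[True] / (weight[True] + weight[False])

common_filter, pinyin_filter = NaiveBayes(), NaiveBayes()
TRAINING = [("代开发票", True), ("发票代开", True),
            ("明天开会", False), ("开会明天", False)]
for text, label in TRAINING:
    common_filter.train(common_features(text), label)
    pinyin_filter.train(pinyin_features(text), label)

def is_spam(text, threshold=0.9):
    """Assumed combination rule: flag if either filter is confident."""
    p_common = common_filter.spam_probability(common_features(text))
    p_pinyin = pinyin_filter.spam_probability(pinyin_features(text))
    return max(p_common, p_pinyin) >= threshold
```

In this toy setup, the disguised message "带开法漂" shares only one character with the training spam, so the character-based filter cannot separate it from normal text, while the pinyin-based filter sees the same strings as "代开发票" and flags it. How the paper actually combines the two outputs is not stated in the abstract, so the `max` rule above is only one plausible choice.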