Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (25): 109-112. DOI: 10.3778/j.issn.1002-8331.2008.25.033

• Network, Communication and Security •

Web text trustworthiness classification method based on VSM

MAO Xue-yun1,2, ZENG Guo-sun1,2, WANG Wei1,2

  1. Department of Computer Science and Engineering, Tongji University, Shanghai 201804, China
  2. Tongji Branch, National Engineering & Technology Center of High Performance Computer, Shanghai 201804, China
  • Received: 2008-01-24 Revised: 2008-07-14 Online: 2008-09-01 Published: 2008-09-01
  • Contact: MAO Xue-yun

Abstract: The open Internet hosts a vast number of information documents, and judging the trustworthiness and security of their content remains a problem that merits in-depth research. This paper studies methods for classifying text by trustworthiness: it collects material from documents that reflects their trustworthiness, builds a trust feature vector for each text, and, combining existing feature selection techniques with trust feature selection, implements a web text trustworthiness classification algorithm based on the vector space model (VSM). Experiments show that the method achieves good classification results.

Key words: content trust, text trustworthiness classification, trust feature vector, classification
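
The abstract does not specify the trust features, the feature selection method, or the classifier, so the following is only a minimal sketch of how a VSM-based text classifier can work in general, not the authors' algorithm. It assumes plain TF-IDF term weights, a whitespace tokenizer, and a Rocchio-style nearest-centroid rule with cosine similarity; the function names, the two labels, and the four training sentences are all hypothetical toy values invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real word segmentation."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (term -> weight); terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {term: freq * idf[term] for term, freq in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def train_centroids(labeled_texts):
    """Average the TF-IDF vectors of each class into one centroid per label."""
    docs = [(tokenize(text), label) for text, label in labeled_texts]
    idf = build_idf([tokens for tokens, _ in docs])
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for tokens, label in docs:
        counts[label] += 1
        for term, weight in tfidf_vector(tokens, idf).items():
            sums[label][term] += weight
    centroids = {
        label: {term: total / counts[label] for term, total in vec.items()}
        for label, vec in sums.items()
    }
    return idf, centroids


def classify(text, idf, centroids):
    """Return the label whose centroid is most similar to the text's vector."""
    vec = tfidf_vector(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy training set; labels and sentences are made up for this sketch.
TRAINING = [
    ("official report cites peer reviewed source and named author", "trusted"),
    ("university study published with references and named author", "trusted"),
    ("miracle cure secret trick click now guaranteed", "untrusted"),
    ("anonymous rumor shocking secret click now", "untrusted"),
]

IDF, CENTROIDS = train_centroids(TRAINING)
```

Calling `classify(some_text, IDF, CENTROIDS)` returns one of the two toy labels. In a setting like the one the abstract describes, the term vocabulary would presumably be replaced or extended by the selected trust features, but the abstract gives no detail on how that is done.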