Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (22): 115-119.

• Network, Communication and Security •

Web community detection with latent semantics

BAN Lei, FANG Qi-ming, WU Yong-wei, YANG Guang-wen

  1. Tsinghua National Laboratory for Information Science and Technology (preparatory), Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-08-01 Published: 2007-08-01
  • Corresponding author: BAN Lei

Abstract: Using a method similar to latent semantic indexing (LSI), we explore the latent semantics carried by the hyperlinks between blog pages and use them to detect web communities. The experimental results largely confirm our initial hypothesis that hyperlinks carry latent semantic information to some extent. Links in the Semantic Web work essentially the same way as links in today's HTML pages, differing only in their added semantic markup. The method should therefore apply equally to community detection and information retrieval on the Semantic Web, which was the original motivation for this work. A second contribution is an optimization of the GMC clustering algorithm through power iteration, which greatly speeds up processing of massive data sets.

Key words: semantic search, web community, latent semantics, GMC clustering, power iteration
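
The abstract names power iteration as the source of the speed-up, but this page does not give the algorithm itself. The following is only a minimal sketch of the general technique, not the authors' GMC optimization. It assumes a hypothetical toy 0/1 matrix `LINKS` (rows are pages, columns are link targets) and finds the dominant latent direction of the page-by-page co-link matrix `LINKS · LINKSᵀ` without ever forming that matrix, which is the property that makes the approach attractive for large link data. All function names and the data are illustrative assumptions.

```python
import math

# Hypothetical toy data: LINKS[i][j] == 1 means page i links to target j.
LINKS = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]


def mat_vec(matrix, vec):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*matrix)]


def colink_operator(links):
    """Return v -> links * (links^T * v), the co-link matrix applied implicitly."""
    links_t = transpose(links)
    return lambda vec: mat_vec(links, mat_vec(links_t, vec))


def power_iteration(apply_op, dim, iters=200, tol=1e-10):
    """Estimate the dominant eigenpair of a symmetric operator."""
    vec = [1.0 / math.sqrt(dim)] * dim  # uniform unit-length start vector
    eigenvalue = 0.0
    for _ in range(iters):
        nxt = apply_op(vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # start vector lies in the null space
        nxt = [x / norm for x in nxt]
        # Rayleigh quotient of the normalized iterate.
        estimate = sum(a * b for a, b in zip(nxt, apply_op(nxt)))
        converged = abs(estimate - eigenvalue) < tol
        vec, eigenvalue = nxt, estimate
        if converged:
            break
    return eigenvalue, vec


eigenvalue, direction = power_iteration(colink_operator(LINKS), len(LINKS))
print("dominant eigenvalue:", eigenvalue)
print("page loadings:", direction)
```

Each iteration costs two matrix-vector products, so the work scales with the number of links and not with the square of the number of pages. In an LSI-style analysis, further latent directions would be obtained by deflating the directions already found; that step is omitted here.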