Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (5): 113-116.DOI: 10.3778/j.issn.1002-8331.2009.05.033

• Network, Communication, and Security •

Mining Web communities with PH-MaxFlow algorithm

GUO Xi-juan1,LIU Jing2   

  1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
  • Received: 2008-01-11 Revised: 2008-04-21 Online: 2009-02-11 Published: 2009-02-11
  • Contact: GUO Xi-juan

Chinese title (translated): Discovering Web communities with the PH-MaxFlow algorithm

郭希娟1,刘 静2   

  1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004
  • Corresponding author: 郭希娟 (GUO Xi-juan)

Abstract: HITS is a classical link-analysis algorithm that computes an authority value and a hub value for each Web page. It can quickly mine Web communities related to a given topic, but it sometimes suffers from "topic drift". This paper presents the PHITS algorithm, which controls topic drift to a certain extent. The PH-MaxFlow algorithm then uses the pages with high authority values as seeds and mines more precise communities. An effective method for appraising the identified Web communities is also presented. Experimental results show that the PH-MaxFlow algorithm mines more reasonable Web communities.

Key words: Web communities, Hyperlink-Induced Topic Search (HITS) algorithm, maximum flow algorithm
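
For readers unfamiliar with the authority and hub values mentioned in the abstract, the following is a minimal sketch of the standard HITS iteration. It is not the paper's PHITS variant, whose details are not given here. The function names `hits` and `normalize`, the dictionary-based graph format, and the fixed iteration count are assumptions made for illustration.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Standard HITS power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Returns (authority, hub) dicts keyed by page.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of q: sum of the hub values of pages that link to q.
        new_auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]
        # Hub of p: sum of the authority values of the pages p links to.
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ())) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


# Toy graph (hypothetical): three pages point at "x", one of them also at "y".
example_links = {"a": ["x", "y"], "b": ["x"], "c": ["x"]}
authority, hub_scores = hits(example_links)
```

In this toy graph, "x" receives the most in-links and ends up with the highest authority value, while "a" links to the most authoritative pages and ends up with the highest hub value.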

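Abstract (translated from the Chinese): HITS is a classical algorithm that uses link analysis to compute the authority and hub values of Web pages. It can quickly discover pages related to a topic, but it is prone to "topic drift". This paper first proposes the PHITS algorithm, which suppresses this phenomenon to a certain extent. Pages with high authority values extracted by this method then serve as the seed nodes of the PH-MaxFlow algorithm, making the discovered Web communities more precise. An effective criterion for evaluating Web communities is also proposed and used to compare the original maximum flow algorithm with the proposed PH-MaxFlow algorithm. The comparison shows that the Web communities found by PH-MaxFlow are more relevant to the topic.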

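Key words (translated from the Chinese): Web communities, hyperlink-analysis-based topic search (HITS) algorithm, maximum flow algorithm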
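
The abstract does not spell out how PH-MaxFlow builds its flow network, so the following is only a generic sketch of seed-based community identification with maximum flow. It assumes one common construction: a virtual source joined to every seed with unbounded capacity, every hyperlink treated as an undirected edge with capacity equal to the number of seeds, and a virtual sink reached from every non-seed page with capacity 1. The community is taken to be the set of pages still reachable from the source after the flow is maximized. The function name, the node labels `_source_` and `_sink_`, and the toy graph are all hypothetical.

```python
from collections import defaultdict, deque


def max_flow_community(edges, seeds):
    """Return the pages on the source side of a minimum cut (sketch only).

    edges: iterable of (u, v) hyperlinks, treated here as undirected.
    seeds: pages assumed to belong to the community.
    """
    seeds = set(seeds)
    k = len(seeds)
    source, sink = "_source_", "_sink_"  # hypothetical virtual node labels
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities
    nodes = set(seeds)
    for u, v in edges:
        nodes.update((u, v))
        cap[u][v] = k
        cap[v][u] = k
    for s in seeds:
        cap[source][s] = float("inf")
    for v in nodes - seeds:
        cap[v][sink] = 1

    def find_path():
        """Breadth-first search for an augmenting path (Edmonds-Karp)."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v == sink:
                        return parent
                    queue.append(v)
        return None

    while True:
        parent = find_path()
        if parent is None:
            break
        # Bottleneck capacity along the path found.
        flow, v = float("inf"), sink
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        # Push the flow and update the residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Pages still reachable from the source form the community.
    reachable, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > 0 and v not in reachable:
                reachable.add(v)
                queue.append(v)
    return reachable - {source}


# Toy graph (hypothetical): two 4-page cliques joined by one bridge link.
clique_a = ["a1", "a2", "a3", "a4"]
clique_b = ["b1", "b2", "b3", "b4"]
toy_edges = [(u, v) for c in (clique_a, clique_b)
             for i, u in enumerate(c) for v in c[i + 1:]]
toy_edges.append(("a4", "b1"))
community = max_flow_community(toy_edges, {"a1", "a2"})
```

In this toy graph the cheapest cut severs the single bridge link, so seeding with "a1" and "a2" recovers the first clique. In the pipeline described in the abstract, the seeds would instead be the pages to which PHITS assigns high authority values.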