Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (10): 170-173.

• Database and Information Processing •

Research on Web Page Ranking Based on PageRank and Anchor Text

Liu Jingjing, Lin Hongfei, Zhao Jing

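  1. Department of Computer Science, Dalian University of Technology; Mold Research Institute, Dalian University of Technology
  • Received: 2006-07-26 Revised: 1900-01-01 Publication date: 2007-04-01 Release date: 2007-04-01
  • Corresponding author: Lin Hongfei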

Study on Ranking Web Pages Based on PageRank and Anchor Text

Jingjing Liu, Hongfei Lin, Jing Zhao

  • Received:2006-07-26 Revised:1900-01-01 Online:2007-04-01 Published:2007-04-01
  • Contact: Hongfei Lin

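Abstract: The structural differences between Web pages and plain text mean that traditional IR ranking techniques cannot keep up with the development of the Web. To rank retrieval results sensibly, a link analysis method based on the principle of bibliographic citation analysis is introduced. The method gives a higher rating to pages that are linked to by many other pages, and it also considers the similarity between anchor text and the query terms. Source pages vary in quality, and the anchor texts pointing to the same page also differ in quality, but the anchor text of a high-quality source page is not necessarily more accurate than that of a low-quality one. This paper corrects for anchor texts with high similarity: it computes the similarity between the query terms and each anchor text, compensates anchor texts that have high similarity but originate from source pages with low PageRank values, and re-ranks the query results.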

Keywords: link analysis, anchor text, PageRank, Web page ranking

Abstract: Structural differences between Web pages and plain text mean that traditional IR methods cannot meet the demands of the growing Web. To rank results properly, link analysis, which is based on academic citation analysis, is introduced into the analysis of results. It assigns a high score to pages that have many back links. At the same time, the similarity between anchor text and the query is taken into account. Source pages vary widely in quality, and the anchor texts from different source pages vary as well, but anchor text from a low-quality source page may be more precise than anchor text from a high-quality one. This paper adjusts anchor texts that have high similarity: it calculates the similarity between each anchor text and the query, compensates anchor texts of high similarity that come from source pages with low PageRank, and then uses the new values to rank the Web pages again.

Key words: Link Analysis, Anchor Text, PageRank, Web Page Ranking
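
The abstract describes the method only in outline, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. PageRank is computed by the standard power iteration. The cosine similarity over whitespace tokens, the threshold `sim_threshold`, the rule of lifting a low-PageRank source up to the average PageRank, and the additive final score are all hypothetical choices made here for the example; the abstract does not specify any of them.

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Standard power-iteration PageRank. links maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def cosine_similarity(query, anchor_text):
    """Cosine similarity of term-frequency vectors (assumed similarity measure)."""
    q = Counter(query.lower().split())
    a = Counter(anchor_text.lower().split())
    dot = sum(q[term] * a[term] for term in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in a.values()))
    return dot / norm if norm else 0.0


def rerank(query, links, anchors, sim_threshold=0.5):
    """Rank pages by PageRank plus anchor-text evidence.

    anchors is a list of (source_page, target_page, anchor_text) triples.
    Hypothetical compensation rule: an anchor whose similarity to the query
    reaches sim_threshold, but whose source page has below-average PageRank,
    is weighted as if its source had average PageRank.
    """
    rank = pagerank(links)
    mean_rank = 1.0 / len(rank)  # ranks sum to 1, so this is the average
    scores = dict(rank)
    for source, target, text in anchors:
        sim = cosine_similarity(query, text)
        weight = rank.get(source, 0.0)
        if sim >= sim_threshold and weight < mean_rank:
            weight = mean_rank  # compensate a relevant anchor from a weak source
        scores[target] = scores.get(target, 0.0) + sim * weight
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy link graph and anchors, invented purely for illustration.
    links = {"hub": ["a", "b"], "a": ["hub"], "b": ["hub"], "tiny": ["b"]}
    anchors = [("hub", "a", "home page"),
               ("tiny", "b", "pagerank anchor text ranking")]
    print(rerank("anchor text ranking", links, anchors))
```

In this toy graph, page `tiny` has no in-links and therefore a low PageRank, yet its anchor text matches the query closely. Setting `sim_threshold` above 1 disables the compensation, which makes it easy to compare the ordering with and without the adjustment.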