Computer Engineering and Applications, 2017, Vol. 53, Issue (7): 97-103. DOI: 10.3778/j.issn.1002-8331.1509-0266

• Big Data and Cloud Computing •

Research on a PageRank Algorithm Based on Multi-Feature Factor Fusion

QI Xiangming, SUN Wenxin

  1. College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Online: 2017-04-01 Published: 2017-04-01

Abstract: The PageRank algorithm ranks pages purely by link structure and does not analyze page content, which leads to evenly distributed PR values, topic drift, and a bias toward old pages. Existing improved algorithms, moreover, tend to optimize only a single aspect. To address these problems, this paper proposes a PageRank algorithm based on multi-feature factor fusion. To bring search results closer to the user's query needs while taking both the relevance and the precision of the retrieved content into account, the algorithm adds an inlink and outlink weight factor, a user feedback factor, a topic relevance factor, and a time factor, which jointly remedy the shortcomings of PageRank. Experimental results show that the proposed algorithm clearly improves content relevance and precision compared with other Web page ranking algorithms, achieving the goal of optimizing PageRank.

Key words: PageRank algorithm, link structure, Web content, inlink and outlink weight factor, user feedback factor, topic relevance factor, time factor
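
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It shows standard power-iteration PageRank, where a page splits its rank evenly among its outlinks, and a variant in which each link's share is scaled by a fused weight. The function names, the product fusion rule, and all scores (`feedback`, `topic`, `freshness`) are assumptions made for illustration. The link weight factor is left out for brevity and could be folded into the same weight function.

```python
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]  # page -> pages it links to


def pagerank(
    graph: Graph,
    edge_weight: Callable[[str, str], float] = lambda src, dst: 1.0,
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank with an optional per-link weight.

    With the default edge_weight, every outlink receives an equal share of
    the source page's rank (the even distribution the abstract criticizes).
    """
    pages = sorted(set(graph) | {d for dsts in graph.values() for d in dsts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            dsts = graph.get(src, [])
            weights = [edge_weight(src, d) for d in dsts]
            total = sum(weights)
            if total <= 0.0:
                # Dangling page (or all-zero weights): spread its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
                continue
            for d, w in zip(dsts, weights):
                new_rank[d] += damping * rank[src] * w / total
        rank = new_rank
    return rank


# Hypothetical per-page scores in [0, 1]; illustrative values, not paper data.
feedback = {"A": 0.5, "B": 0.9, "C": 0.5}   # assumed user feedback score
topic = {"A": 0.5, "B": 0.8, "C": 0.5}      # assumed topic relevance score
freshness = {"A": 0.5, "B": 0.9, "C": 0.5}  # assumed time (recency) score


def fused_weight(src: str, dst: str) -> float:
    # Assumed fusion rule: the product of the target page's factor scores.
    return feedback[dst] * topic[dst] * freshness[dst]


graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
baseline = pagerank(graph)                         # B and C tie
fused = pagerank(graph, edge_weight=fused_weight)  # B outranks C
```

In this toy graph, page A links to both B and C. The baseline gives B and C the same rank, while the fused weights shift more of A's rank to B, the page with the higher assumed feedback, relevance, and freshness scores.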