Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (8): 142-145.

• Database, Data Mining, Machine Learning •


PageRank-based document similarity search algorithm

ZHU Ge

  1. Center of Information and Network, Heilongjiang University, Harbin 150080, China
  • Online: 2013-04-15 Published: 2013-04-15

Abstract: Based on an analysis of the original PageRank algorithm, this paper argues that PageRank can feasibly be applied to similarity search over scientific documents and proposes an improved algorithm that addresses PageRank's shortcomings. The improved algorithm combines analysis of document content with analysis of the citation relationships between documents, so that content analysis addresses relevance and citation analysis addresses authority, and it computes an integrated similarity between documents to improve the accuracy of search results. Experiments verify the effectiveness and feasibility of the algorithm.

Key words: scientific document, similarity search, PageRank algorithm
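
The abstract does not give the improved algorithm's formulas, so the following is only a minimal sketch of the general idea it describes: a content score for relevance combined with a citation-based PageRank score for authority. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the bag-of-words cosine similarity, the damping factor of 0.85, and the linear weighting by the hypothetical parameter `alpha`.

```python
import math
from collections import Counter


def pagerank(citations, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a citation graph.

    `citations` maps a document id to the list of document ids it cites.
    """
    docs = set(citations)
    for cited in citations.values():
        docs.update(cited)
    docs = sorted(docs)
    n = len(docs)
    if n == 0:
        return {}
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        # Rank held by documents that cite nothing is spread evenly.
        dangling = sum(rank[d] for d in docs if not citations.get(d))
        new_rank = {d: (1.0 - damping) / n + damping * dangling / n for d in docs}
        for d in docs:
            cited = citations.get(d, [])
            for target in cited:
                new_rank[target] += damping * rank[d] / len(cited)
        rank = new_rank
    return rank


def cosine_similarity(text_a, text_b):
    """Cosine similarity of whitespace-tokenized term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similar_documents(query_id, texts, citations, alpha=0.7):
    """Rank documents by an assumed blend of content and citation scores.

    `alpha` (hypothetical) weights content similarity against the
    PageRank score normalized by the highest rank in the graph.
    """
    rank = pagerank(citations)
    top = max(rank.values(), default=0.0)
    scores = {}
    for doc_id, text in texts.items():
        if doc_id == query_id:
            continue
        content = cosine_similarity(texts[query_id], text)
        authority = rank.get(doc_id, 0.0) / top if top else 0.0
        scores[doc_id] = alpha * content + (1.0 - alpha) * authority
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `similar_documents("a", texts, citations)` with a dictionary of document texts and a dictionary of citation lists returns the other document ids ordered from most to least similar under these assumptions. A linear blend is used only because it is the simplest way to let both signals contribute; the paper's actual combination may differ.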