Computer Engineering and Applications (计算机工程与应用) ›› 2007, Vol. 43 ›› Issue (21): 177-179.

• Database and Information Processing •

Algorithms based on concept for theses similarity retrieval (基于概念的论文相似性检索)

LI Xin-li, LV Yue-e

  1. School of Information, Linyi Normal University, Linyi, Shandong 276003, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-07-21 Published: 2007-07-21
  • Contact: LI Xin-li

Abstract: The growing number of theses accessible on the Web raises a new search problem: how to retrieve the theses that meet a user's needs. Traditional retrieval based on query-term matching often fails to return the theses the user actually requires. We present a concept-based method for thesis similarity retrieval that improves on traditional thesis retrieval. We describe a hierarchical clustering algorithm for thesis keywords: the keywords are first clustered into concepts, which yields a concept tree; each thesis is then represented by a concept vector and corresponds to an induced subtree of the concept tree. For similarity retrieval, an improved cosine similarity measure computes the similarity between theses from their concept vectors, and the theses most similar to a given thesis are returned to the user. The algorithm overcomes the shortcomings of query-term matching, and an experimental study shows that it achieves high recall and precision.

Key words: thesis retrieval, hierarchical clustering, concept tree, similarity search
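
The retrieval step summarized in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the concept tree `PARENT` and the keyword-to-concept map `KEYWORD_TO_CONCEPT` are written by hand rather than produced by the paper's hierarchical clustering, the `decay` weighting of ancestor concepts is hypothetical, and `cosine` is the plain cosine measure, since the abstract does not say how the authors' improved variant differs.

```python
import math
from collections import Counter

# Hypothetical concept tree, stored as concept -> parent concept.
# None marks a top-level concept. In the paper this tree would come
# from hierarchical clustering of keywords; here it is hand-written.
PARENT = {
    "retrieval": None,
    "clustering": None,
    "similarity search": "retrieval",
    "query matching": "retrieval",
    "hierarchical clustering": "clustering",
}

# Hypothetical assignment of thesis keywords to leaf concepts.
KEYWORD_TO_CONCEPT = {
    "cosine similarity": "similarity search",
    "nearest neighbor": "similarity search",
    "keyword query": "query matching",
    "dendrogram": "hierarchical clustering",
}


def concept_vector(keywords, decay=0.5):
    """Map a thesis's keywords to a sparse concept vector.

    Each keyword adds weight 1.0 to its concept and a decayed weight
    to every ancestor, so the non-zero entries form the subtree of
    the concept tree induced by the thesis. Unknown keywords are skipped.
    """
    vec = Counter()
    for kw in keywords:
        node = KEYWORD_TO_CONCEPT.get(kw)
        weight = 1.0
        while node is not None:
            vec[node] += weight
            node = PARENT[node]
            weight *= decay
    return vec


def cosine(u, v):
    """Plain cosine similarity between two sparse concept vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_keywords, corpus, top_k=3):
    """Rank theses in `corpus` (title -> keyword list) against a query thesis."""
    q = concept_vector(query_keywords)
    scored = [(cosine(q, concept_vector(kws)), title)
              for title, kws in corpus.items()]
    return sorted(scored, reverse=True)[:top_k]
```

In this sketch, two theses tagged "cosine similarity" and "nearest neighbor" share no keyword, so query-term matching would miss the pair, yet both map to the same concept and their concept vectors coincide. That is the kind of match the concept-based approach is meant to recover.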