Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (33): 126-128.

• Database, Signal and Information Processing •

Study on query-focused summary algorithm for Web pages relevance judgment

JIANG Xiaoyu (蒋效宇)

  1. Business School, Beijing Institute of Fashion Technology, Beijing 100029, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-11-21 Published: 2011-11-21

Abstract: To further improve the speed and accuracy of Web page relevance judgment, a new sentence-scoring method for query-focused summaries is proposed. Starting from the result set returned for a query, the mutual information (MI) between keywords is computed to identify phrases in the input query. The HTML tags in the page text are used to analyze the page structure, and the keyword phrases and the page structure information are both incorporated into the sentence weight calculation. Preliminary experimental results show that query summaries generated with this algorithm outperform existing methods in both the speed and the accuracy of relevance judgment.

Key words: information retrieval, query-focused summary, relevance judgment, phrase recognition
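
The abstract names three ingredients: MI-based phrase identification over the result set, HTML-tag structure, and a sentence score that combines them. The following is a minimal sketch of how these could fit together, not the paper's actual formulas, which the abstract does not give. Everything concrete here is an assumption made for illustration: the function and class names, the weights in `TAG_WEIGHTS`, `PHRASE_WEIGHT` and `TERM_WEIGHT`, the adjacency-based MI estimate, and the multiplicative way tag weight and query match are combined.

```python
import math
import re
from html.parser import HTMLParser

# Hypothetical weights; the abstract does not state concrete values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "b": 1.5, "strong": 1.5}
PHRASE_WEIGHT = 2.0  # assumed score for a full phrase match
TERM_WEIGHT = 1.0    # assumed score for a single keyword match


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def pmi(a, b, docs):
    """Pointwise MI of the adjacent pair (a, b) over tokenized documents."""
    n = len(docs)
    has_a = sum(1 for d in docs if a in d)
    has_b = sum(1 for d in docs if b in d)
    has_ab = sum(1 for d in docs if (a, b) in zip(d, d[1:]))
    if has_ab == 0:
        return float("-inf")
    return math.log2(has_ab * n / (has_a * has_b))


def identify_phrases(query, docs, threshold=0.0):
    """Greedily merge adjacent query keywords whose PMI exceeds the threshold."""
    terms = tokenize(query)
    units, i = [], 0
    while i < len(terms):
        if i + 1 < len(terms) and pmi(terms[i], terms[i + 1], docs) > threshold:
            units.append((terms[i], terms[i + 1]))
            i += 2
        else:
            units.append((terms[i],))
            i += 1
    return units


class BlockExtractor(HTMLParser):
    """Collects (innermost enclosing tag, text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the most recent matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            tag = self.stack[-1] if self.stack else ""
            self.blocks.append((tag, data.strip()))


def score_sentences(html, units):
    """Score each sentence as tag weight times query match, highest first."""
    parser = BlockExtractor()
    parser.feed(html)
    scored = []
    for tag, text in parser.blocks:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            tokens = tokenize(sentence)
            pairs = set(zip(tokens, tokens[1:]))
            match = 0.0
            for unit in units:
                if len(unit) == 2 and unit in pairs:
                    match += PHRASE_WEIGHT
                else:
                    # Partial credit for the share of the unit's words present.
                    match += TERM_WEIGHT * sum(t in tokens for t in unit) / len(unit)
            scored.append((TAG_WEIGHTS.get(tag, 1.0) * match, sentence))
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

In use, `docs` would be the tokenized texts of the pages or snippets returned for the query, `identify_phrases` would turn the query into keyword and phrase units, and the top entries from `score_sentences` would form the summary. The sketch does not filter `script` or `style` content and only merges keyword pairs, so a real implementation would likely need more careful HTML handling and longer phrases.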