Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (11): 181-184.

• Database and Information Processing •

Survey on Structured Ranking Function and Term Weighting Technology of Web Information Retrieval

Zhao Zhengwen, Kang Yaohong

  1. College of Information Science and Technology, Hainan University
  • Received: 2006-05-10 Revised: 1900-01-01 Online: 2007-04-11 Published: 2007-04-11
  • Corresponding author: Zhao Zhengwen

Abstract: This paper analyzes the current state of Web information retrieval (IR) technology and argues that the root cause of its low retrieval efficiency lies in the ranking functions and term weighting techniques that search engines adopt. It first introduces the classic IR ranking functions and term weighting techniques. It then examines the characteristics of Web documents: their main form, the HTML document, is a structured document whose structure is defined explicitly by tags, and different structural elements contribute differently to retrieval performance. Work by researchers in China and abroad that exploits this property of Web documents to extend classic ranking functions and term weighting techniques into structured ones is compared. Finally, the paper discusses the development directions of ranking functions and term weighting techniques for Web information retrieval.