Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (34): 174-176.

• Database and Information Processing •

Research on crawling Hidden Web based on heuristic query selection algorithm

YAO Quan-zhu,YANG Zeng-hui,ZHANG Nan,TIAN Yuan   

  1. School of Computer Science,Xi’an University of Technology,Xi’an 710048,China
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-12-01 Published:2007-12-01
  • Contact: YAO Quan-zhu

Abstract: Because of its hidden nature, the Hidden Web is difficult to crawl directly, and it has therefore become a new research direction in information retrieval. This paper proposes a method for retrieving Hidden Web information, presents a generic operational model for that retrieval, and describes the key techniques needed to implement it. It also introduces a new heuristic query selection algorithm, designed in this work, that makes crawling more efficient. Experiments demonstrate the effectiveness of both the model and the algorithm.

Key words: information retrieval, Hidden Web, crawler, heuristic algorithm
