Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (33): 85-88.

• Research and Development, Design, Testing •

面向地学信息领域垂直搜索引擎设计与实现

张思发,马永格   

  1. 中国地质大学 计算机学院,武汉 430074
  • 出版日期:2012-11-21 发布日期:2012-11-20

Design and implementation of vertical search engine for field of geosciences

ZHANG Sifa, MA Yongge   

  1. School of Computer Science, China University of Geosciences, Wuhan 430074, China
  • Online:2012-11-21 Published:2012-11-20

摘要: 垂直搜索引擎是搜索引擎领域的行业化分工,根据地学信息领域的行业特征、整体需求及其工作流程,在Nutch开源搜索引擎平台上添加了“庖丁解牛”中文分词算法、主题相关度评分算法、“主题词管理”选项等技术,建立了基于网络蜘蛛模型的面向地学信息领域的垂直搜索引擎。经过测试及结果比较,该系统相对于通用搜索引擎有明显的优势,使地学信息的定位和查找更加准确。该系统具有良好的扩展性和通用性,对垂直搜索引擎的研究和开发具有一定的借鉴作用。

关键词: 地学信息领域, 垂直搜索引擎, Nutch, 中文分词, 页面排序, 主题词管理

Abstract: Vertical search engines represent the industry-specific specialization of the search engine field. Based on the industry characteristics, overall requirements, and workflow of the geoscience information domain, this paper adds the “Paodingjieniu” Chinese word segmentation algorithm, a topic-relevance scoring algorithm, and a “Subject Management” option to the open-source Nutch search engine platform, thereby establishing a vertical search engine for geoscience information based on the web spider model. Testing and comparison of the results show that the system has clear advantages over general-purpose search engines, making it more accurate to locate and retrieve geoscience information. The system also has good extensibility and generality, and offers a useful reference for the research and development of vertical search engines.

Key words: geo-information, vertical search engines, Nutch, Chinese word segmentation, page ranking, subject management
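
The abstract names a topic-relevance scoring algorithm but does not give its formula. As a rough illustration only, the sketch below scores a page by the weighted share of its tokens that appear in a topic-term list, which is one common way a focused crawler decides whether a page is on topic. The term list, the weights, the threshold, and the function names are all hypothetical and are not taken from the paper; the tokens are assumed to have already been produced by a Chinese word segmenter.

```python
from collections import Counter

# Hypothetical topic-term weights (each at most 1.0); the paper's actual
# subject vocabulary and weighting scheme are not given in the abstract.
TOPIC_TERMS = {"geology": 1.0, "mineral": 0.8, "earthquake": 0.8, "stratum": 0.6}


def topic_relevance(tokens, topic_terms=TOPIC_TERMS):
    """Return the weighted share of tokens that are topic terms (0.0 if empty)."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Counter returns 0 for terms that never occur, so absent terms add nothing.
    weighted_hits = sum(weight * counts[term] for term, weight in topic_terms.items())
    return weighted_hits / len(tokens)


def is_on_topic(tokens, threshold=0.1):
    """Keep a page only if its relevance score reaches the (assumed) threshold."""
    return topic_relevance(tokens) >= threshold
```

In a crawler built along these lines, pages scoring below the threshold would be dropped before indexing, and editing the term list would play the role that the abstract assigns to the “Subject Management” option.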