Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (4): 172-175.

• Database and Information Processing •

Enhanced full text retrieval kit based on Lucene

SONG Jia1,2, ZHU Yun-qiang1, LIU Run-da1,2

  1. Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  2. Graduate University of Chinese Academy of Sciences, Beijing 100039, China
  • Received: 2007-05-25 Revised: 2007-05-25 Online: 2008-02-01 Published: 2008-02-01
  • Contact: SONG Jia

Abstract: This paper presents ELucene, an enhanced full-text retrieval toolkit built on Lucene. ELucene introduces an index configuration file so that the details of indexing can be customized for different applications. It provides scheduled automatic index updates and supports multiple kinds of index data sources through dynamic polymorphism. Internally, an engine foundation object class is designed to run as a static object, which avoids the performance loss caused by frequently reading index files from disk. For searching, a search request class and a search response class encapsulate the user's query requirements and the query result set respectively, and several practical methods for processing query input and output are implemented. An application example of ELucene is given at the end of the paper. A metadata search system based on ELucene has been successfully applied in the "National Scientific Data Sharing Project: Earth System Science Data Sharing Network".

Key words: Lucene, ELucene, search engine, retrieval, index, data sharing
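
Two of the design ideas named in the abstract, polymorphic index data sources and an engine object kept as a static instance, can be illustrated with a small sketch. The abstract does not show ELucene's actual classes, so every name below (`IndexDataSource`, `TableDataSource`, `FileDataSource`, `EngineBase`) is hypothetical, the records are made-up sample data, and an in-memory list stands in for a real Lucene index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// All names here are hypothetical illustrations, not ELucene's or Lucene's API.

/** A source of records to index; each record maps field names to values. */
interface IndexDataSource {
    List<Map<String, String>> fetchRecords();
}

/** Stand-in for a database-backed source (sample data only). */
class TableDataSource implements IndexDataSource {
    public List<Map<String, String>> fetchRecords() {
        return List.of(Map.of("title", "Soil survey table"));
    }
}

/** Stand-in for a file-backed source (sample data only). */
class FileDataSource implements IndexDataSource {
    public List<Map<String, String>> fetchRecords() {
        return List.of(Map.of("title", "Climate report file"));
    }
}

/** Engine object held statically so the "index" is loaded once and reused. */
final class EngineBase {
    private static EngineBase instance;
    /** Counts how many times the index was (re)loaded, for demonstration. */
    static final AtomicInteger loadCount = new AtomicInteger();

    private final List<Map<String, String>> records = new ArrayList<>();

    private EngineBase(List<IndexDataSource> sources) {
        loadCount.incrementAndGet();
        // Polymorphic call: the engine does not care which source type it reads.
        for (IndexDataSource source : sources) {
            records.addAll(source.fetchRecords());
        }
    }

    /** Returns the shared instance, loading it only on first use. */
    static synchronized EngineBase get(List<IndexDataSource> sources) {
        if (instance == null) {
            instance = new EngineBase(sources);
        }
        return instance;
    }

    /** Rebuilds the shared instance; a scheduler could call this periodically. */
    static synchronized void refresh(List<IndexDataSource> sources) {
        instance = new EngineBase(sources);
    }

    /** Case-insensitive substring match on one field. */
    List<String> search(String field, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map<String, String> record : records) {
            String value = record.get(field);
            if (value != null && value.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(value);
            }
        }
        return hits;
    }
}
```

Repeated calls to `EngineBase.get(...)` return the same object, so the sources are read only once until `refresh(...)` is called. In a real deployment the refresh could plausibly be driven by a timer such as `java.util.concurrent.ScheduledExecutorService`, though the abstract does not say how ELucene schedules its updates.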