Computer Engineering and Applications (计算机工程与应用) ›› 2008, Vol. 44 ›› Issue (29): 8-10. DOI: 10.3778/j.issn.1002-8331.2008.29.002

• Doctoral Forum •

Research and application of a VoD search engine based on Lucene

KUANG Zhen-guo (1,2), NI Hong (2), JI Zhi-hui (1,2), LIU Lei (1,2)

  1. Graduate University of Chinese Academy of Sciences, Beijing 100039, China
  2. National Network New Media Engineering Research Center, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2008-05-26 Revised: 2008-06-23 Online: 2008-10-11 Published: 2008-10-11
  • Contact: KUANG Zhen-guo

Abstract: Lucene is a well-regarded open-source search engine framework that is widely used in information retrieval. After analyzing the shortcomings of the search engines currently used in video-on-demand (VoD) portals, this paper designs a word segmenter that supports Chinese and is based on a double-character hash index algorithm. Using this segmenter together with the Lucene toolkit, a fast film search engine for VoD is designed and implemented. It supports Chinese queries, searches quickly, and is easy to extend. Simulation experiments show that the proposed Lucene-based film search engine performs well.

Key words: Lucene, search engine, double-character hash index, Chinese word segmentation, inverted index
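
The abstract names a double-character hash index as the basis of the Chinese word segmenter but does not give its details. Purely as an illustration, the following Python sketch shows one common reading of the idea: dictionary words are indexed by their first character, then by their second character, and the remaining suffixes are stored in a set, so that a forward maximum-matching segmenter can find candidates with two hash lookups. The class name `DoubleCharHashDict`, the function `segment`, the forward maximum-matching strategy, and the sample word list are all assumptions made for this example and are not taken from the paper. The sketch does not use Lucene itself.

```python
from collections import defaultdict


class DoubleCharHashDict:
    """Hypothetical two-level hash dictionary keyed on a word's first two characters."""

    def __init__(self, words):
        # first char -> second char -> set of remaining suffixes
        # (an empty suffix "" means the two-character word itself is in the dictionary)
        self.index = defaultdict(lambda: defaultdict(set))
        self.max_len = 2
        for word in words:
            if len(word) < 2:
                continue  # single characters are emitted as fallback tokens instead
            self.index[word[0]][word[1]].add(word[2:])
            self.max_len = max(self.max_len, len(word))

    def longest_match(self, text, start):
        """Return the length of the longest dictionary word at text[start], or 0."""
        if start + 1 >= len(text):
            return 0
        second_level = self.index.get(text[start])
        if second_level is None:
            return 0
        suffixes = second_level.get(text[start + 1])
        if not suffixes:
            return 0
        limit = min(self.max_len, len(text) - start)
        # Try the longest candidate first (forward maximum matching).
        for length in range(limit, 1, -1):
            if text[start + 2:start + length] in suffixes:
                return length
        return 0


def segment(text, dictionary):
    """Split text into dictionary words, falling back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        n = dictionary.longest_match(text, i) or 1
        tokens.append(text[i:i + n])
        i += n
    return tokens


# Assumed sample vocabulary of film-related words, for demonstration only.
demo_dict = DoubleCharHashDict(["功夫", "功夫熊猫", "熊猫", "电影"])
print(segment("功夫熊猫电影", demo_dict))  # ['功夫熊猫', '电影']
```

In a Lucene-based system, tokens produced this way would typically be fed into the inverted index through a custom analyzer. How the paper wires its segmenter into Lucene is not stated on this page.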