Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (19): 257-264. DOI: 10.3778/j.issn.1002-8331.1611-0192

• Engineering and Applications •

Design and implementation of a vertical search system based on a concise website ontology

杨和平1,陈  瑜2,3,张志强1   

  1. 国家气象信息中心 资料服务室,北京 100081
  2. 中国农业科学院 植物保护研究所,北京 100193
  3. 列日大学 生物技术学院,比利时 让布鲁 5030
  • Publication date: 2017-10-01 Release date: 2017-10-13

Design and implementation of a Web concise ontology-based vertical search engine

YANG Heping1, CHEN Yu2, 3, ZHANG Zhiqiang1   

  1. Division of Data Services, National Meteorological Information Center, Beijing 100081, China
  2. Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
  3. Gembloux Agro-Bio Technology, University of Liège, Gembloux 5030, Belgium
  • Online:2017-10-01 Published:2017-10-13

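Abstract: When an ontology-based vertical search engine is built for a single website, collecting and organizing the descriptors and the logical relations among them incurs a high labor cost. As a result, although the technical framework is mature, the search functions of most websites still rely mainly on character matching. They lack word segmentation, query expansion, and relevance ranking of results, and rarely hit the relevant content accurately. To address these problems, a vertical search system based on a concise website ontology library was designed and developed. Taking the China Meteorological Data Service Center website (http://data.cma.cn) as an example, the system uses protégé to build the site's ontology library from its navigation directory and builds its technical framework on the Lucene engine. The objects in the ontology library and the content of the web pages are segmented separately, and an ontology-object index and a web-page index are built. At the front end, the query is segmented and first expanded against the ontology-object index. The relevance of each expansion result is computed with the TF-IDF relevance algorithm and ranked, and this value serves as the weight of the corresponding expanded ontology object. Each weight is then dynamically assigned to the objects obtained through a second round of semantic expansion with Jena. Finally, all weighted keywords are searched in the web-page index, and the relevance of the results is computed and ranked. Experimental results show that the system is simple to build, can expand and recommend related query content for users, and improves both the precision and the recall of website retrieval.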

Keywords: ontology library, vertical search engine, semantic expansion, China Meteorological Data Service Center

Abstract: Building an ontology-based vertical search engine for a single website requires collating the descriptors and the relations among them, which costs considerable time and effort. As a result, most website search systems, unlike general search engines, do not adopt this approach. They still rely on character matching and lack word segmentation, semantic query expansion, and ranking of results by semantic relatedness. To solve these problems, a vertical search engine based on a concise ontology is designed and implemented. Taking the China Meteorological Data Service Center (CMDC) as an example, a concise ontology library is first built with protégé from the website's navigation list, and the vertical search engine is built on the Lucene framework. The IKAnalyzer segmentation algorithm is used during both indexing and searching. The query is then expanded semantically with Jena. Notably, the relevance of each expansion result is calculated and used as the weight of the corresponding segmented word, and these weights are used when ranking the search results with the TF-IDF algorithm. The results show that the system can expand and recommend related search content and that these improvements raise both the precision and the recall of the results.

Keywords: ontology, vertical search engine, semantic expansion, China Meteorological Data Service Center (CMDC)
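
The retrieval flow summarized in the abstract (segment the query, expand it against the ontology, carry the expansion weights over to related objects, then rank pages with the weighted terms) can be illustrated with a small sketch. This is not the authors' implementation: a whitespace tokenizer stands in for IKAnalyzer, a plain dictionary stands in for the protégé/Jena ontology, a hand-written TF-IDF function stands in for Lucene scoring, and every object, label, page, and function name below is hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy ontology: each object has a label and related objects.
# In the paper this role is played by an ontology built with protégé and queried with Jena.
ONTOLOGY = {
    "surface": {"label": "surface observation data", "related": ["station", "precipitation"]},
    "station": {"label": "weather station data", "related": []},
    "precipitation": {"label": "precipitation data", "related": []},
    "radar": {"label": "radar data product", "related": []},
    "satellite": {"label": "satellite cloud image", "related": []},
}

# Hypothetical web pages standing in for the web-page index.
PAGES = {
    "p1": "hourly surface observation from national weather station",
    "p2": "daily precipitation grid product",
    "p3": "satellite cloud image archive",
}


def tokenize(text):
    """Whitespace tokenizer; a stand-in for a real segmenter such as IKAnalyzer."""
    return text.lower().split()


def tfidf_scores(weighted_terms, docs):
    """Score each document against {term: weight} with a smoothed TF-IDF sum."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, doc_counts in counts.items():
        total = sum(doc_counts.values())
        score = 0.0
        for term, weight in weighted_terms.items():
            if doc_counts[term] == 0:
                continue
            tf = doc_counts[term] / total
            df = sum(1 for c in counts.values() if term in c)
            idf = math.log((1 + n_docs) / (1 + df)) + 1
            score += weight * tf * idf
        if score > 0:
            scores[doc_id] = score
    return scores


def expand_query(query):
    """Match the query against ontology labels, then pass each weight to related objects."""
    terms = {t: 1.0 for t in tokenize(query)}
    labels = {oid: obj["label"] for oid, obj in ONTOLOGY.items()}
    matched = tfidf_scores(terms, labels)
    weights = dict(matched)
    for oid, weight in matched.items():
        for related in ONTOLOGY[oid]["related"]:
            # A related object inherits the weight of the object that led to it.
            weights[related] = max(weights.get(related, 0.0), weight)
    return weights


def search(query):
    """Rank pages using the label terms of all expanded objects as weighted keywords."""
    weighted_terms = {}
    for oid, weight in expand_query(query).items():
        for term in tokenize(ONTOLOGY[oid]["label"]):
            weighted_terms[term] = max(weighted_terms.get(term, 0.0), weight)
    scores = tfidf_scores(weighted_terms, PAGES)
    return sorted(scores, key=scores.get, reverse=True)
```

With this toy data, `search("surface observation")` ranks `p1` ahead of `p2`. Page `p2` shares no word with the query and is retrieved only because the matched "surface" object is linked to the "precipitation" object, which is the effect the semantic expansion step is meant to achieve.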