Computer Engineering and Applications, 2018, Vol. 54, Issue (13): 252-257. DOI: 10.3778/j.issn.1002-8331.1702-0183

Research on domain ontology concept acquisition method based on Latent Dirichlet Allocation

WANG Hong, ZHANG Hao, SHI Jinchuan   

  1. School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  • Online: 2018-07-01 Published: 2018-07-17

基于LDA的领域本体概念获取方法研究

王  红,张  昊,史金钏   

  1. 中国民航大学 计算机学院,天津 300300

Abstract: To address the automatic updating of the domain ontology for civil aviation emergency management, a domain ontology concept acquisition method based on Latent Dirichlet Allocation (LDA) is proposed. With text as the data source, NLPIR adaptive word segmentation and filtering are used to obtain a candidate term set. An LDA topic model for the domain ontology is designed, and the model is trained and its topics are inferred by Gibbs sampling, which extracts the terms related to the core concepts of the domain ontology. A method for constructing semantic relation recognition rules is studied based on the LDA topic probability distributions, and the process for recognizing the semantic relations between a concept and its related terms is given. Experimental results show that the method can effectively address the automatic updating of concepts in large-scale domain ontologies, and that it provides good data support for sharing and reasoning over cross-media information on civil aviation emergencies in a big data environment.

Key words: civil aviation emergencies, text information, domain ontology, concept acquisition, Latent Dirichlet Allocation (LDA)
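
The LDA training step named in the abstract can be illustrated with a minimal collapsed Gibbs sampler. This is only a sketch under stated assumptions, not the authors' implementation: the toy vocabulary, the three short documents, the function names `lda_gibbs` and `top_terms`, and the hyperparameters `alpha`, `beta`, and `iters` are all hypothetical, and the NLPIR segmentation and filtering stage is assumed to have already produced the term lists.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs is a list of lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # count of topic k in doc d
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                               # total words assigned to topic k
    assign = []                                                  # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimates of the topic-word (phi) and document-topic (theta) distributions.
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta

def top_terms(phi, vocab, n=3):
    """Return the n most probable terms of each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in phi
    ]

# Hypothetical candidate terms, standing in for the output of segmentation and filtering.
vocab = ["runway", "fire", "rescue", "delay", "weather", "fog"]
word_id = {term: i for i, term in enumerate(vocab)}
raw_docs = [
    ["runway", "fire", "rescue", "fire"],
    ["delay", "weather", "fog", "weather"],
    ["rescue", "runway", "fire"],
]
docs = [[word_id[t] for t in doc] for doc in raw_docs]

phi, theta = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
terms_per_topic = top_terms(phi, vocab)
```

In this sketch, the highest-probability terms of each row of `phi` play the role of the related terms extracted for a concept, and `theta` gives the per-document topic proportions on which rules over topic probability distributions could be built.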

摘要: 针对民航突发事件应急管理领域本体的自动更新问题,提出了基于LDA的领域本体概念获取方法。以文本信息作为数据源,采用NLPIR自适应分词与过滤方法获取候选术语集,设计了领域本体的LDA主题模型,通过吉布斯采样进行LDA模型训练与主题推断,实现了领域本体核心概念的相关术语提取;基于LDA主题概率分布研究了语义关系识别规则的构建方法,给出了概念及其相关术语语义关系的识别与实现过程。实验效果表明,该方法可以有效解决大规模领域本体概念的自动更新问题,为大数据环境下民航突发事件跨媒体信息的共享与推理提供了良好的数据支持。

关键词: 民航突发事件, 文本信息, 领域本体, 概念获取, LDA模型