Computer Engineering and Applications, 2019, Vol. 55, Issue (7): 23-29. DOI: 10.3778/j.issn.1002-8331.1811-0210


Named Entity Recognition for New Energy Vehicles Based on Active MCNN-SCRF

MA Jianhong, ZHANG Bingfei, ZHANG Shaoguang, LIU Shuangyao   

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
  • Online: 2019-04-01 Published: 2019-04-15

基于主动MCNN-SCRF的新能源汽车命名实体识别

马建红,张炳斐,张少光,刘双耀   

  1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China

Abstract: Named Entity Recognition (NER) for new energy vehicles is challenged by ambiguous entity boundaries, abundant out-of-vocabulary words, and a lack of labeled data, resulting in low precision and recall. This paper presents an NER model based on a Multiple Channel Neural Network (MCNN) that incorporates character, word, and segment features. Rather than treating NER as a sequence labeling problem, the model uses a Semi-Markov CRF (SCRF) to learn effective segment-level representations and contextual information, and then assigns tags to whole segments. To address the lack of labeled data, the model uses an active query strategy based on uncertainty and density to select unlabeled data for further training. The proposed strategy gives samples that are both representative and uncertain a much higher chance of being selected, which effectively improves the learning ability of the system. Experiments show that the model improves precision and recall while greatly reducing manual annotation effort.

Key words: new energy vehicle NER, deep learning, Semi-Markov CRF (SCRF), segment feature, active learning
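
The abstract describes the active query strategy only as combining uncertainty with density, without giving a formula. The following is a minimal sketch of one common way such a score can be built, and it should not be read as the authors' exact method. It assumes uncertainty is the entropy of a predicted label distribution, density is the mean cosine similarity of a sample to the rest of the unlabeled pool, and the two are multiplied. The names `select_queries`, `probs`, `feats`, and `beta` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_queries(probs, feats, k, beta=1.0):
    """Return indices of the k pool samples with the highest
    uncertainty * density**beta score (an assumed scoring rule).

    probs[i] is the model's label distribution for sample i;
    feats[i] is a feature vector for sample i (both hypothetical inputs).
    """
    n = len(probs)
    scores = []
    for i in range(n):
        uncertainty = entropy(probs[i])
        # Density: mean similarity to every other sample in the pool.
        sims = [cosine(feats[i], feats[j]) for j in range(n) if j != i]
        density = sum(sims) / len(sims) if sims else 0.0
        # Negative similarities are clipped so the power is well defined.
        scores.append(uncertainty * max(density, 0.0) ** beta)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Under this scoring rule, a sample the model is unsure about but that resembles few other samples ranks below an equally uncertain sample that is typical of the pool, which matches the abstract's stated goal of favoring data that is both representative and uncertain.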

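Abstract (translated from Chinese): Named entities in the new energy vehicle domain have ambiguous boundaries, are mostly out-of-vocabulary words, and have few labeled samples available, so recognition precision and recall are low. To address this, a new energy vehicle entity recognition model based on a Multiple Channel Neural Network (MCNN) is proposed. The model fuses character and word features with segment features and no longer treats entity recognition as a traditional sequence labeling task. It uses a Semi-Markov CRF (SCRF) to model segment features, splitting the input sentence into segments and assigning a label to each segment as a whole. Entity boundary detection and entity classification are thus completed at the same time, which compensates for the weakness of traditional character- and word-level sequence labeling models that rely on local tags to delimit entity boundaries. To address the shortage of labeled samples, an Active Learning (AL) method that combines uncertainty and similarity is introduced during model training. Multiple sets of comparative experiments show that the model improves recognition precision and recall while greatly reducing the amount of manual annotation.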

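Key words (translated from Chinese): new energy vehicle named entity recognition, deep learning, semi-Markov conditional random field, segment feature, active learning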
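
The segment-level labeling described in the abstract can be illustrated with a semi-Markov Viterbi decoder, which searches over segmentations and labels jointly instead of tagging one token at a time. This is a generic sketch of semi-Markov decoding and not the paper's MCNN-SCRF implementation. The `score` callable stands in for whatever segment scores a trained model would produce, and `semi_markov_decode`, `labels`, and `max_len` are hypothetical names.

```python
def semi_markov_decode(n, labels, score, max_len):
    """Return the highest-scoring list of (start, end, label) segments
    that exactly covers tokens 0..n-1, with each segment at most
    max_len tokens long.

    score(start, end, label, prev_label) is a hypothetical function giving
    a finite score for labeling tokens [start, end) as `label` when the
    previous segment has `prev_label` (None for the first segment).
    """
    # best[end][label] = (total score, (start, prev_label)) for the best
    # segmentation of tokens[:end] whose last segment carries `label`.
    best = [dict() for _ in range(n + 1)]
    best[0][None] = (0.0, None)
    for end in range(1, n + 1):
        for label in labels:
            cand = None
            for start in range(max(0, end - max_len), end):
                for prev_label, (prev_score, _) in best[start].items():
                    s = prev_score + score(start, end, label, prev_label)
                    if cand is None or s > cand[0]:
                        cand = (s, (start, prev_label))
            best[end][label] = cand

    # Trace back from the best final label to recover the segments.
    end = n
    label = max(best[n], key=lambda lab: best[n][lab][0])
    segments = []
    while end > 0:
        start, prev_label = best[end][label][1]
        segments.append((start, end, label))
        end, label = start, prev_label
    return segments[::-1]
```

Because each candidate is scored as a whole segment, the boundary and the entity type are decided in a single step, which is the property the abstract contrasts with local per-token tags.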