Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (2): 121-128. DOI: 10.3778/j.issn.1002-8331.2208-0345

• Pattern Recognition and Artificial Intelligence •

Enhanced Cascading Recognition with Positional Labels for Chinese Medicine Named Entity

WANG Xuyang, ZHAO Lijie, ZHANG Jiyuan   

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Online: 2024-01-15 Published: 2024-01-15

位置标签增强的中文医学命名实体级联识别

王旭阳,赵丽婕,张继远   

  1. 兰州理工大学 计算机与通信学院,兰州 730050

Abstract: General-domain named entity recognition methods cannot be applied directly to Chinese medical entities, and existing research focuses mainly on English text and on flat medical entities. To address these problems, this paper studies entity recognition methods for specialized domains and, drawing on the characteristics of Chinese medical entities, proposes a cascade recognition method for Chinese medical entities. The position label of each character relative to the entity is embedded into the model, and a fused representation of the entity is built by taking into account the importance of the different elements within the span of a Chinese medical entity. The method first detects the position labels of characters by sequence labeling, then uses this position information to guide the generation of candidate entities, and finally classifies the candidate entities semantically. The model is evaluated in recognition experiments on flat entities, nested entities and discontinuous long entities, using the CMeEE and CCKS2018 datasets and a Chinese diabetes research literature dataset, respectively. The experimental results show that the method can effectively identify entities with different structures in Chinese medical texts.

Key words: Chinese medical named entity, positional label embedding, entity fusion representation combining element importance, cascade recognition, linear structure
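
The three-stage cascade outlined in the abstract (position labels, then candidate generation, then semantic classification) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the tag names "B" (entity start) and "E" (entity end), the per-character tag sets, the `max_len` limit, the example sentence, the hand-written labels standing in for a sequence labeler's output, and the toy lexicon and type names standing in for a trained classifier.

```python
from typing import List, Optional, Set, Tuple

def generate_candidates(
    labels: List[Set[str]], max_len: int = 10
) -> List[Tuple[int, int]]:
    """Pair each predicted start with each predicted end at or after it.

    Spans are inclusive (start, end) character indices and are limited
    to max_len characters. The "B"/"E" tag scheme is an assumption.
    """
    starts = [i for i, tags in enumerate(labels) if "B" in tags]
    ends = [j for j, tags in enumerate(labels) if "E" in tags]
    return [(i, j) for i in starts for j in ends if i <= j < i + max_len]

# Hypothetical stand-in for the semantic classifier: a lookup table that
# returns an entity type, or None when the candidate is rejected.
TOY_LEXICON = {"右肺上叶": "body", "肺癌": "disease", "右肺上叶肺癌": "disease"}

def classify(span_text: str) -> Optional[str]:
    return TOY_LEXICON.get(span_text)

text = "右肺上叶肺癌"
# Stage 1 (assumed output of a sequence labeler, written by hand here).
labels = [{"B"}, {"B"}, set(), {"E"}, {"B"}, {"E"}]

# Stage 2: position labels guide candidate generation.
candidates = generate_candidates(labels)

# Stage 3: keep only the candidates that receive an entity type.
entities = []
for i, j in candidates:
    entity_type = classify(text[i : j + 1])
    if entity_type is not None:
        entities.append((i, j, entity_type))

print(entities)
```

Because every start is paired with every later end, overlapping candidates such as (0, 3) and (0, 5) can both survive classification, which is how a span-based cascade of this kind can return nested entities that a single flat tag sequence cannot express.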

摘要: 针对一般领域的命名实体识别方法不能直接用于中文医学专业实体的识别,现有的相关研究只专注于英文文本和扁平结构的医学实体识别等问题,通过对专业领域实体识别方法的研究,结合中文医学实体的特点提出了一种面向中文医学实体的级联识别方法。将每个字符元素相对于实体的位置标签嵌入模型,并结合中文医学实体跨度内不同元素的重要程度进行实体的融合表示。通过序列标注方法检测字符的位置标签,利用字符的位置信息指导候选实体生成,并进行实体语义分类。模型在CMeEE和CCKS2018数据集以及中文糖尿病科研文献数据集上分别进行扁平实体、嵌套实体和不连续性长实体的识别实验。实验结果表明,该方法能够有效地识别中文医学文本中不同结构的实体。

关键词: 中文医学命名实体, 位置标签嵌入, 结合元素重要程度的实体融合表示, 级联识别, 线性结构