Computer Engineering and Applications (计算机工程与应用), 2023, Vol. 59, Issue (5): 289-296. DOI: 10.3778/j.issn.1002-8331.2109-0492

• Engineering and Applications (工程与应用) •

汉语V+V序列关系识别研究

李胜男,曲维光,魏庭新,周俊生,顾彦慧,李斌   

  1. 南京师范大学 计算机与电子信息学院/人工智能学院,南京 210023
  2. 南京师范大学 文学院,南京 210097
  3. 南京师范大学 国际文化教育学院,南京 210097
  • 出版日期:2023-03-01 发布日期:2023-03-01

Research on Chinese V+V Sequence Relation Recognition

LI Shengnan, QU Weiguang, WEI Tingxin, ZHOU Junsheng, GU Yanhui, LI Bin   

  1. School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing 210023, China
  2. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China
  3. International College for Chinese Studies, Nanjing Normal University, Nanjing 210097, China
  • Online: 2023-03-01 Published: 2023-03-01

摘要: “V+V”是现代汉语中的常见结构,能够形成兼语、连动等多种完全不同的句法结构,给句法和语义解析造成困难。针对“V+V”形成的句法结构类型和序列关系识别问题,设计并制定了一套语料库标注规范,以解决语料库中存在的“V+V”结构的嵌套标注问题,并据此构建起一个包含5 381个兼语句子、7 987个连动句子,以及1 212个兼语连动嵌套句子的“V+V”语料库。提出一个基于BiLSTM-CRF和多头注意力机制的模型,能够同时识别结构中的多个动词和名词的句法、语义角色。相比于以往只研究单项识别兼语或者连动结构,该模型不仅可以同时识别兼语结构、连动结构,还可以解决兼语连动嵌套结构的识别问题。实验结果表明:该方法能够很好地解决“V+V”序列关系的识别问题,在测试集语料上达到92.12%的F1值。

关键词: V+V序列关系, 连动结构, 兼语结构, 中文抽象语义表示

Abstract: “V+V” is one of the most common structures in modern Chinese. Because the nouns and verbs involved can bear various semantic roles, “V+V” can form many different types of grammatical structures, such as serial verb structures and concurrent structures, which makes syntactic and semantic parsing difficult. To identify the syntactic types and sequence relations of these structures, this paper first designs an annotation specification for nested structures and uses it to construct a “V+V” corpus containing 5,381 concurrent sentences, 7,987 serial verb sentences, and 1,212 sentences in which concurrent and serial verb structures are nested. It then proposes a model based on BiLSTM-CRF and multi-head attention that identifies the grammatical type of a structure and the semantic types of its components. The model identifies concurrent structures and serial verb structures within a unified framework, and it also identifies nested structures, which previous work has not addressed. Experimental results on the constructed corpus show that the proposed model handles “V+V” sequence relation recognition well, reaching an F1 score of 92.12% on the test set.

Key words: V+V sequential relations, serial verb structures, concurrent structures, Chinese abstract meaning representation
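
The abstract names a BiLSTM-CRF model but does not describe its decoding step. As a rough illustration only, the sketch below shows Viterbi decoding, the inference procedure a CRF layer typically uses to pick the best label sequence. The label names (`V1`, `PIVOT`, `V2`), the example sentence, and all scores are hypothetical and are not taken from the paper's annotation specification; in a BiLSTM-CRF tagger the emission scores would normally come from the neural encoder rather than being written by hand.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence.

    emissions[t][j]   -- score of label j at token t
    transitions[i][j] -- score of moving from label i to label j
    """
    n_labels = len(labels)
    # Best score of any path that ends in each label at the first token.
    score = list(emissions[0])
    backpointers = []  # backpointers[t-1][j] = best previous label for j at t

    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            # Pick the previous label that maximises the path score into j.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path back from the best final label.
    best = max(range(n_labels), key=lambda i: score[i])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [labels[i] for i in path]


# Hypothetical label set and toy scores (assumptions, not from the paper).
LABELS = ["V1", "PIVOT", "V2"]
TOKENS = ["请", "他", "来"]  # illustrative sentence: "ask him to come"
EMISSIONS = [
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 1.2],  # taken alone, this token would be labelled V2
    [0.5, 0.0, 1.0],
]
TRANSITIONS = [
    [-1.0, 1.0, 0.0],    # from V1
    [-1.0, -1.0, 1.0],   # from PIVOT
    [-1.0, -1.0, -1.0],  # from V2
]

decoded = viterbi_decode(EMISSIONS, TRANSITIONS, LABELS)
print(list(zip(TOKENS, decoded)))
```

With these toy numbers, choosing the best label for each token independently would label the second token `V2`, while the transition scores steer the joint decoding towards `V1`, `PIVOT`, `V2`. This is the general motivation for placing a CRF layer on top of per-token scores in a sequence labelling model.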