Research on Chinese V+V Sequence Relation Recognition

doi:10.3778/j.issn.1002-8331.2109-0492

Abstract

Abstract: “V+V” is one of the most common structures in modern Chinese. Because the nouns and verbs involved can bear various semantic roles, “V+V” sequences can form quite different grammatical structures, such as serial verb structures and concurrent structures, which makes syntactic and semantic parsing difficult. To identify the syntactic types and sequential relations of these structures, this paper first constructs a “V+V” corpus according to a purpose-designed annotation specification for nested structures; the corpus contains 5,381 concurrent sentences, 7,987 serial verb sentences, and 1,212 nested concurrent-serial verb sentences. It then proposes a model based on BiLSTM-CRF and multi-head attention to identify the grammatical type of each structure and the semantic types of its components. The model handles concurrent structures and serial verb structures in a unified framework, and it can also identify nested structures, which previous work has not addressed. Experimental results on the constructed corpus show that the proposed model performs well, reaching an F1 value of 92.12% on the test set.
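As a rough illustration of how an F1 value over nested structures might be computed, the sketch below scores predicted spans against gold spans by exact match. The abstract does not describe the evaluation protocol, so the span representation `(start, end, label)`, the labels `"serial"` and `"concurrent"`, and the function name `span_f1` are all assumptions made for this example, not the authors' method.

```python
from typing import Set, Tuple

# Hypothetical span format: (start token, end token inclusive, structure label).
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Return exact-match precision, recall, and F1 over labeled spans."""
    correct = len(gold & pred)  # spans with identical boundaries and label
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Assumed nested example: a serial verb span that contains a concurrent span.
gold = {(0, 6, "serial"), (2, 5, "concurrent")}
# The prediction gets the outer span right but misplaces the inner boundary.
pred = {(0, 6, "serial"), (2, 4, "concurrent")}

precision, recall, f1 = span_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Because spans are kept in a set rather than a flat tag sequence, an inner structure and the outer structure that contains it are scored independently, which is one simple way to evaluate nested annotations.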

Key words: V+V sequential relations, serial verb structures, concurrent structures, Chinese abstract meaning representation

摘要： “V+V”是现代汉语中的常见结构，能够形成兼语、连动等多种完全不同的句法结构，给句法和语义解析造成困难。针对“V+V”形成的句法结构类型和序列关系识别问题，设计并制定了一套语料库标注规范，以解决语料库中存在的“V+V”结构的嵌套标注问题，并据此构建起一个包含5 381个兼语句子、7 987个连动句子，以及1 212个兼语连动嵌套句子的“V+V”语料库。提出一个基于BiLSTM-CRF和多头注意力机制的模型，能够同时识别结构中的多个动词和名词的句法、语义角色。相比于以往只研究单项识别兼语或者连动结构，该模型不仅可以同时识别兼语结构、连动结构，还可以解决兼语连动嵌套结构的识别问题。实验结果表明：该方法能够很好地解决“V+V”序列关系的识别问题，在测试集语料上达到92.12%的F1值。

关键词: V+V序列关系, 连动结构, 兼语结构, 中文抽象语义表示

LI Shengnan, QU Weiguang, WEI Tingxin, ZHOU Junsheng, GU Yanhui, LI Bin. Research on Chinese V+V Sequence Relation Recognition[J]. Computer Engineering and Applications, 2023, 59(5): 289-296.

李胜男, 曲维光, 魏庭新, 周俊生, 顾彦慧, 李斌. 汉语V+V序列关系识别研究[J]. 计算机工程与应用, 2023, 59(5): 289-296.
