Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (22): 144-150. DOI: 10.3778/j.issn.1002-8331.2208-0298

• Pattern Recognition and Artificial Intelligence •

Predicate Head Identification Based on Boundary Regression

GUO Xiao, CHEN Yanping, TANG Ruixue, HUANG Ruizhang, QIN Yongbin   

  1. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
  2. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  3. College of Information, Guizhou University of Finance and Economics, Guiyang 550025, China
  • Online: 2023-11-15 Published: 2023-11-15

Abstract: Identifying the predicate head is key to understanding a sentence and plays an important role in analyzing Chinese sentence structure. Because Chinese sentence structure is loose, predicate head identification is difficult and remains a hard problem in Chinese information processing. Since a sentence contains only one predicate head, enumerating spans generates a large number of negative samples, which leads to an imbalance between positive and negative samples. In addition, the predicate head and the negative samples that highly overlap with it share the same context and are semantically similar, which easily causes false positives. To address these problems, this paper proposes a predicate head identification method based on boundary regression. First, the boundaries of the predicate head are identified, and spans are then generated by combining these boundaries, which reduces both the number of negative span samples and the computational complexity. Second, a boundary regression module updates the position of each span in the sentence relative to the predicate head, improving the accuracy of the span boundaries. Finally, a constraint strategy is added so that a single, unique predicate head is output. Experimental results show that the model achieves an F-score of 84.41%, which verifies its effectiveness in identifying predicate heads.
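
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the maximum span length, the hard-coded boundary candidates, the regression offsets, and the scores are all hypothetical stand-ins for what a trained model would predict. The sketch only shows why combining a few predicted boundaries yields far fewer candidate spans than full enumeration, how a predicted offset can shift a span boundary, and how a "keep the best span" constraint returns exactly one predicate head.

```python
from itertools import product


def enumerate_spans(n, max_len):
    """All (start, end) spans (end inclusive) of at most max_len tokens."""
    return [(s, e) for s in range(n) for e in range(s, min(n, s + max_len))]


def spans_from_boundaries(starts, ends, max_len):
    """Combine candidate start and end positions into valid spans."""
    return [(s, e) for s, e in product(starts, ends) if s <= e < s + max_len]


def regress(span, offsets, n):
    """Shift both boundaries by (assumed) predicted offsets, clamped to the sentence."""
    start = min(max(span[0] + offsets[0], 0), n - 1)
    end = min(max(span[1] + offsets[1], start), n - 1)
    return (start, end)


def select_unique(scored_spans):
    """Constraint step: keep only the single highest-scoring span."""
    return max(scored_spans, key=lambda item: item[1])[0]


n_tokens = 10
all_spans = enumerate_spans(n_tokens, max_len=4)
# Hypothetical boundary predictions: two candidate starts, two candidate ends.
candidates = spans_from_boundaries([2, 3], [3, 4], max_len=4)
print(len(all_spans), len(candidates))

# Hypothetical regression offsets and classifier scores.
refined = regress((2, 4), (1, 0), n_tokens)
best = select_unique([((2, 3), 0.4), (refined, 0.9), ((3, 3), 0.7)])
print(refined, best)
```

In this toy setting, boundary combination leaves only a handful of candidate spans to score where enumeration would produce dozens, which is the reduction in negative samples that the abstract refers to.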

Key words: predicate, head word, span, boundary regression
