Predicate Head Identification Based on Boundary Regression

doi:10.3778/j.issn.1002-8331.2208-0298

Abstract

Abstract: Identifying the predicate head is key to understanding a sentence and is important for analyzing Chinese sentence structure. Because Chinese sentence structure is loose, predicate head identification is difficult and remains a hard problem in Chinese information processing. Since each sentence contains only one predicate head, enumerating spans generates a large number of negative samples, which causes an imbalance between positive and negative samples. In addition, the predicate head and highly overlapping negative samples share the same context and have similar semantics, which easily leads to false positives. To address these problems, this paper proposes a predicate head identification method based on boundary regression. First, the boundaries of the predicate head are identified, and spans are then generated by combining these boundaries, which reduces the number of negative span samples and lowers computational complexity. Next, a boundary regression module updates the position of each span relative to the predicate head in the sentence, improving the accuracy of span boundaries. Finally, a constraint strategy ensures that a single, unique predicate head is output. Experimental results show that the model achieves an F-score of 84.41%, demonstrating its effectiveness in identifying predicate heads.

Key words: predicate, head word, span, boundary regression
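The pipeline outlined in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the boundary probabilities, regression offsets, and span scores stand in for outputs of a trained network, and the function names, the 0.5 threshold, and the `max_len` cap are all hypothetical. The sketch contrasts exhaustive span enumeration with generating candidates only from predicted boundaries, treats boundary regression as adding real-valued offsets to each span's start and end indices (in the spirit of bounding-box regression in object detection), and enforces the single-output constraint by keeping only the top-scoring span.

```python
from itertools import product


def enumerate_all_spans(n, max_len):
    # Exhaustive enumeration: every span (i, j), i <= j, of at most max_len tokens.
    return [(i, j) for i in range(n) for j in range(i, min(n, i + max_len))]


def spans_from_boundaries(start_probs, end_probs, threshold=0.5, max_len=4):
    # Hypothetical boundary step: keep only tokens predicted as a start or an end,
    # then pair them up, which yields far fewer (mostly negative) candidates.
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    return [(i, j) for i, j in product(starts, ends) if 0 <= j - i < max_len]


def regress(span, offsets, n):
    # Assumed regression step: shift start/end by real-valued offsets,
    # round to token indices, and clip to the sentence.
    i, j = span
    di, dj = offsets
    new_i = min(max(round(i + di), 0), n - 1)
    new_j = min(max(round(j + dj), 0), n - 1)
    # Fall back to the original span if the update would invert it.
    return (new_i, new_j) if new_i <= new_j else (i, j)


def predict_head(spans, scores):
    # Single-output constraint (assumed as arg-max): return exactly one span.
    if not spans:
        return None
    return max(zip(spans, scores), key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # Toy word-segmented sentence: "They are discussing this problem".
    tokens = ["他们", "正在", "讨论", "这个", "问题"]
    # Placeholder model outputs, invented purely for illustration.
    start_probs = [0.1, 0.6, 0.9, 0.1, 0.0]
    end_probs = [0.0, 0.2, 0.8, 0.7, 0.1]
    candidates = spans_from_boundaries(start_probs, end_probs)
    offsets = [(0.8, 0.1), (0.2, -0.1), (0.1, -0.2), (0.0, -0.2)]
    refined = [regress(s, o, len(tokens)) for s, o in zip(candidates, offsets)]
    scores = [0.6, 0.2, 0.9, 0.4]
    i, j = predict_head(refined, scores)
    print(len(enumerate_all_spans(len(tokens), 4)), len(candidates))
    print(refined, "".join(tokens[i : j + 1]))
```

With these toy inputs, exhaustive enumeration over 5 tokens produces 14 candidate spans while boundary combination produces 4, and the constraint returns the single span (2, 2), i.e. 讨论. Taking the arg-max is simply the most direct way to guarantee one output per sentence; the constraint strategy used in the paper may differ.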

GUO Xiao, CHEN Yanping, TANG Ruixue, HUANG Ruizhang, QIN Yongbin. Predicate Head Identification Based on Boundary Regression[J]. Computer Engineering and Applications, 2023, 59(22): 144-150.
