Optimal proportion of training data for Chinese coreference resolution

Abstract

Most Chinese coreference resolution systems are based on supervised machine learning, and the proportion of positive to negative examples in the training set directly affects the classifier model and, in turn, the resolution results. To address how this proportion should be chosen, a Chinese coreference resolution system is implemented, a mathematical model relating the proportion of positive and negative training examples to the system's evaluation results is proposed, and an improved genetic algorithm is applied to find the proportion that optimizes those results. Experiments on the ACE 2005 Chinese corpus show that the improved genetic algorithm is better suited to the coreference resolution task, and that appropriately increasing the proportion of negative examples improves system performance.

Key words: coreference resolution, training data, genetic algorithm (GA)
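
The idea summarized in the abstract, treating the negative-to-positive ratio of the training set as a variable and searching for the value that maximizes an evaluation score, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `subsample_negatives`, `genetic_search`, and `toy_score` are hypothetical helpers, the genetic algorithm is a plain textbook variant (truncation selection, arithmetic crossover, Gaussian mutation, elitism) rather than the improved algorithm of the article, and the fitness function is a stand-in with an arbitrary peak, not a result from the ACE 2005 experiments.

```python
import random


def subsample_negatives(positives, negatives, ratio, rng):
    """Keep all positives and draw about ratio * len(positives) negatives."""
    k = min(len(negatives), round(ratio * len(positives)))
    return positives + rng.sample(negatives, k)


def genetic_search(fitness, low, high, pop_size=20, generations=30,
                   mutation_scale=0.1, seed=0):
    """Plain real-valued GA over a single ratio in [low, high] (illustrative)."""
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0]]                 # elitism: carry over the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = w * a + (1 - w) * b        # arithmetic crossover
            child += rng.gauss(0, mutation_scale * (high - low))  # mutation
            children.append(min(high, max(low, child)))  # clamp to bounds
        population = children
    return max(population, key=fitness)


def toy_score(ratio):
    """Stand-in for 'train with this ratio, return a dev-set score'.

    The peak at 2.5 is arbitrary and purely hypothetical.
    """
    return 1.0 - (ratio - 2.5) ** 2 / 10.0


best_ratio = genetic_search(toy_score, low=0.5, high=6.0)

# Build a training set at the selected ratio from placeholder examples.
rng = random.Random(1)
train = subsample_negatives(["pos"] * 4, ["neg"] * 20, ratio=2.5, rng=rng)
```

In a real system, `toy_score` would be replaced by a function that resamples the training pairs at the given ratio, trains the classifier, and returns the coreference evaluation score on held-out data, which makes each fitness call expensive and is one reason a small population is used here.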


YAN Han, LIU Juan, ZHOU Xuanyu. Optimal proportion of training data for Chinese coreference resolution[J]. Computer Engineering and Applications, 2016, 52(17): 140-145.


