Optimal proportion of training data for Chinese coreference resolution

Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (17): 140-145.

YAN Han, LIU Juan, ZHOU Xuanyu

School of Computer, Wuhan University, Wuhan 430072, China

Online: 2016-09-01 Published: 2016-09-14

Abstract

Abstract: Most existing Chinese coreference resolution systems are based on supervised machine learning, where the ratio of positive to negative examples in the training set directly affects the classifier model and, in turn, the resolution results. To address the question of how to choose this ratio, a Chinese coreference resolution system is implemented, a mathematical model relating the positive-to-negative ratio of the training data to the system's evaluation results is proposed, and an improved genetic algorithm is introduced to compute the ratio that optimizes the evaluation results. Experiments on the ACE 2005 Chinese corpus show that the improved genetic algorithm is better suited to the coreference resolution task, and that appropriately increasing the proportion of negative examples improves system performance.

Key words: coreference resolution, training data, genetic algorithm (GA)
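
The idea summarized in the abstract, searching for a negative-to-positive training ratio that maximizes an evaluation score, can be illustrated with a minimal sketch. This is not the authors' improved genetic algorithm or their mathematical model: it is a plain textbook genetic algorithm, and `toy_score` is a hypothetical stand-in for the real pipeline (train the classifier at a given ratio, then score the resolver). All function names and parameter values here are assumptions made for illustration.

```python
import random


def subsample_negatives(positives, negatives, neg_per_pos, rng):
    """Keep every positive example and randomly sample negatives
    so that there are about `neg_per_pos` negatives per positive."""
    target = min(len(negatives), round(neg_per_pos * len(positives)))
    return positives, rng.sample(negatives, target)


def toy_score(ratio):
    """Hypothetical objective standing in for the system's evaluation
    score. It peaks at ratio 3.0 purely for demonstration; a real
    objective would train and evaluate the resolver at this ratio."""
    return 1.0 / (1.0 + (ratio - 3.0) ** 2)


def genetic_search(score, low, high, pop_size=20, generations=40,
                   mutation_sd=0.3, seed=0):
    """Plain genetic algorithm over a single real-valued gene (the ratio)."""
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        elite = ranked[: pop_size // 4]            # elitism: keep the best quarter
        parents = ranked[: pop_size // 2]          # breed from the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = w * a + (1 - w) * b            # arithmetic crossover
            child += rng.gauss(0.0, mutation_sd)   # Gaussian mutation
            children.append(min(high, max(low, child)))  # clamp to bounds
        population = elite + children
    return max(population, key=score)


best_ratio = genetic_search(toy_score, low=0.5, high=10.0)
print(f"best negatives-per-positive ratio found: {best_ratio:.2f}")
```

In practice each call to the objective would be expensive, since it involves training a classifier, so the population size and number of generations would need to be chosen with that cost in mind.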

YAN Han, LIU Juan, ZHOU Xuanyu. Optimal proportion of training data for Chinese coreference resolution[J]. Computer Engineering and Applications, 2016, 52(17): 140-145.
