Computer Engineering and Applications, 2009, Vol. 45, Issue (16): 40-42. DOI: 10.3778/j.issn.1002-8331.2009.16.010

• Doctoral Forum •

Identifying cross-clause arguments based on statistics and rules

CHEN Li-jiang, CHEN Xiao-he

  1. School of Chinese Language and Culture, Nanjing Normal University, Nanjing 210097, China
  • Received: 2009-02-27 Revised: 2009-03-31 Online: 2009-06-01 Published: 2009-06-01
  • Contact: CHEN Li-jiang

统计和规则结合识别动词的跨分句论元

陈丽江,陈小荷   

  1. 南京师范大学 文学院,南京 210097
  • 通讯作者: 陈丽江

Abstract: Unlike sentences in Indo-European languages, Chinese sentences are often complex sentences made up of several clauses. However, current corpora and systems for Chinese semantic role labeling pay little attention to this characteristic of modern Chinese. Because of data sparseness, there is still no effective method for identifying arguments that lie in a different clause from their verb, which directly hampers research on semantic role labeling of real Chinese text. This paper combines statistical and rule-based methods to identify such cross-clause arguments. A basic rule is first used to identify the majority of the arguments; the weak points of the rule are then located, and a statistical decision tree that integrates multiple features is used to build a model that further improves identification accuracy. Experimental results show that the basic rule alone achieves an F-score of 65.3% on cross-clause arguments, and that adding the decision tree raises the F-score to 67.2%.

Key words: semantic role labeling, cross-clause, argument, statistical decision tree

摘要: 与印欧语言不同,汉语的句子往往是由多个分句组成的复句。但目前的中文语义角色的标注语料和标注系统并没有对现代汉语的这个特点给予充分的重视。由于数据稀疏的问题,对于与动词跨分句的论元还没有一个有效的识别方法,直接影响了汉语真实文本语义角色标注的研究。运用统计和规则结合的方法,对与动词跨分句的论元进行识别。先用一条基本的规则识别出大部分的动词的论元,再找到规则识别的薄弱点,运用统计决策树融合多种特征构造模型,以进一步提高识别的准确率。实验结果表明,对于与动词的跨分句的论元,仅仅规则识别的F值就达到了65.3%,使用决策树后,F值提高到67.2%。

关键词: 语义角色标注, 跨分句, 论元, 统计决策树
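
The two-stage idea described in the abstract (a basic rule first, a statistical model for the cases where the rule is weak) and the F-score used to report results can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not state the basic rule, the features, or how the rule and the decision tree are combined, so `Candidate`, `basic_rule`, `stub_classifier`, and `two_stage` are hypothetical names, and the stub stands in for a trained decision tree rather than reproducing the authors' model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass(frozen=True)
class Candidate:
    """A (verb, phrase) pair whose phrase lies in a different clause from the verb."""
    verb: str
    phrase: str
    clause_distance: int     # hypothetical feature: how many clauses apart they are
    phrase_is_subject: bool  # hypothetical feature: phrase is the subject of its clause


def basic_rule(c: Candidate) -> Optional[bool]:
    """Hypothetical stand-in for the basic rule (the real rule is not given in the abstract).

    Returns True or False when the rule is confident, and None for a
    "weak spot" that should be handed to the statistical model.
    """
    if c.clause_distance == 1 and c.phrase_is_subject:
        return True
    if c.clause_distance > 3:
        return False
    return None


def stub_classifier(c: Candidate) -> bool:
    """Placeholder for a trained decision tree; an assumption, not the authors' model."""
    return c.phrase_is_subject


def two_stage(
    c: Candidate,
    rule: Callable[[Candidate], Optional[bool]],
    fallback: Callable[[Candidate], bool],
) -> bool:
    """Use the rule's decision when it has one, otherwise ask the fallback model."""
    decision = rule(c)
    return fallback(c) if decision is None else decision


def f_score(predicted: Set[int], gold: Set[int]) -> float:
    """Balanced F-score: harmonic mean of precision and recall over item ids."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In this sketch the statistical model is consulted only when the rule abstains, which is one plausible reading of "find the weak spot of the rule"; other combinations, such as feeding the rule's output to the tree as a feature, would fit the abstract equally well.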