Computer Engineering and Applications (计算机工程与应用), 2010, Vol. 46, Issue (30): 8-10. DOI: 10.3778/j.issn.1002-8331.2010.30.003

• Doctoral Forum •

Parallel processing of contemporary Chinese “V+N” sequence relations (现代汉语“V+N”序列关系的平行处理)

FENG Min-xuan (冯敏萱)

  1. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China
  • Received: 2010-05-10 Revised: 2010-09-06 Online: 2010-10-21 Published: 2010-10-21
  • Contact: FENG Min-xuan

Abstract: At present, deep processing of the Chinese text in English-Chinese parallel corpora is mostly confined to the results of monolingual analysis and does not make full use of bilingual resources. Taking the structural relations of contemporary Chinese v+n sequences as the research object, this paper designs a parallel processing algorithm for recognizing v+n structural relations in English-Chinese parallel corpora. First, various monolingual resources are used to extract the mutual restriction rules between the verbs and nouns that form different structural relations. Then the structural relation type of each v+n sequence is judged according to the concrete forms that the meanings of the Chinese noun and verb take in the English translation, together with context templates. Experiments show that on the PCCE1000 texts, which were automatically word-segmented and POS-tagged, monolingual processing of v+n achieves an F value of 72.14%; further using a Chinese-English dictionary and the English translation information raises the F value to 88.81%, an improvement of 16.67 percentage points.

Key words: parallel corpus, collocation, phrase analysis, automatic recognition, Chinese information processing
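
The reported gain can be checked with a few lines of arithmetic. The sketch below assumes that the "F value" is the usual harmonic mean of precision and recall (F1); the abstract does not state the weighting, and it gives no precision or recall figures, so the function `f_measure` and its `beta` parameter are illustrative names, not the author's implementation. Only the two F values come from the abstract.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta == 1 gives the balanced F1 score. Treating the paper's
    "F value" as F1 is an assumption, not something the abstract states.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# F values reported in the abstract, in percent.
F_MONOLINGUAL = 72.14   # monolingual processing only
F_BILINGUAL = 88.81     # plus Chinese-English dictionary and English translation

# Absolute gain in percentage points, as quoted in the abstract.
gain_points = round(F_BILINGUAL - F_MONOLINGUAL, 2)

# Relative gain over the monolingual baseline, in percent (derived here,
# not reported in the abstract).
relative_gain = round(100 * gain_points / F_MONOLINGUAL, 1)

print(gain_points, relative_gain)
```

The difference of the two reported values reproduces the 16.67 percentage points quoted above; the relative gain is a derived figure shown only to distinguish percentage points from percent.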
