Computer Engineering and Applications, 2010, Vol. 46, Issue (30): 8-10. DOI: 10.3778/j.issn.1002-8331.2010.30.003
• Doctoral Forum •
FENG Min-xuan
Contact: FENG Min-xuan (冯敏萱)
Abstract: Chinese text processing in English-Chinese parallel corpora has so far relied mostly on monolingual analysis and has made little use of bilingual resources. Taking the structural relation of contemporary Chinese v+n sequences as the research object, this paper designs a parallel processing algorithm for recognizing v+n structural relations in an English-Chinese parallel corpus. First, it uses various monolingual resources to extract the mutual restriction rules between the verbs and nouns that form different structural relations. It then judges the structural relation type of each v+n sequence according to how the Chinese noun and verb are rendered in the English translation and according to context templates in the parallel English text. Experiments on PCCE1000, which has been automatically word-segmented and POS-tagged, show that monolingual processing of v+n achieves an F value of 72.14%; further using a Chinese-English dictionary and the English translation information raises the F value to 88.81%, an improvement of 16.67 percentage points.
Key words: parallel corpus, collocation, phrase analysis, automatic recognition, Chinese information processing
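The abstract reports results as F values and a gain in percentage points. The sketch below shows how such figures are commonly computed. It assumes the F value is the usual weighted harmonic mean of precision and recall (balanced when beta = 1); the abstract does not state which variant was used and gives no precision or recall figures, so the function names `f_measure` and `point_gain` are illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 gives the balanced F1 score. That the paper uses this
    variant is an assumption; the abstract only says "F value".
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # no correct predictions at all
    return (1 + beta ** 2) * precision * recall / denominator


def point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(after_pct - before_pct, 2)


# F values reported in the abstract (percent).
MONOLINGUAL_F = 72.14
BILINGUAL_F = 88.81

gain = point_gain(MONOLINGUAL_F, BILINGUAL_F)
print(f"Gain from bilingual information: {gain} percentage points")
```

The gain is an absolute difference between two percentages, which is why the abstract states it in percentage points rather than as a relative increase.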
CLC Number: TP391.1
FENG Min-xuan. Parallel processing of contemporary Chinese “V+N” sequence relations[J]. Computer Engineering and Applications, 2010, 46(30): 8-10.
冯敏萱. 现代汉语“V+N”序列关系的平行处理[J]. 计算机工程与应用, 2010, 46(30): 8-10.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2010.30.003
http://cea.ceaj.org/EN/Y2010/V46/I30/8