Computer Engineering and Applications, 2012, Vol. 48, Issue (31): 134-139.


Research on Parallel Corpus Retrieval Technology

CHENG Nanchang (1, 2), HOU Min (3)

  1. College of Liberal Arts, Communication University of China, Beijing 100024, China
  2. Department of Chinese, Baise University, Baise, Guangxi 533000, China
  3. Broadcast Media Language Branch, Communication University of China, Beijing 100024, China
  Online: 2012-11-01; Published: 2012-10-30

Abstract: Taking CUC_ParaConc, the parallel corpus retrieval software developed at Communication University of China, as an example, this paper discusses parallel corpus retrieval technology, using aligned corpora stored as plain text for illustration. It covers how parallel data are stored and read, and how bilingual and multilingual keyword retrieval works. Parallel corpus retrieval takes two forms: "one-to-one" and "one-to-many". For one-to-one retrieval, Chinese-English parallel data are used to describe retrieval over a non-alphabetic script (Chinese) and over an alphabetic script (English), and the similarities and differences between the two are compared. For one-to-many retrieval, the paper focuses on multilingual keyword search.
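The abstract names these steps without giving their implementation. The Python sketch that follows is an illustration under stated assumptions, not CUC_ParaConc's actual code. It assumes a hypothetical plain-text layout with one text per language and one aligned segment per line, so that line i in every language is a translation of line i in the others; the names `load_parallel`, `matches`, `search` and `NON_ALPHABETIC` are hypothetical. It contrasts plain substring matching for Chinese, which has no spaces between words, with case-insensitive whole-word matching for English. It also assumes that a multilingual ("one-to-many") query succeeds only when every language's keyword matches within the same aligned record.

```python
import re
from typing import Dict, List

# ASSUMPTION: a hypothetical storage layout, not CUC_ParaConc's documented one.
# Each language is one plain-text "column" with one aligned segment per line;
# line i of every column translates line i of every other column.
Record = Dict[str, str]

# Hypothetical set of languages written without spaces between words.
NON_ALPHABETIC = {"zh", "ja"}


def load_parallel(columns: Dict[str, str]) -> List[Record]:
    """Zip per-language texts line by line into aligned records."""
    if not columns:
        return []
    split = {lang: text.splitlines() for lang, text in columns.items()}
    lengths = {len(lines) for lines in split.values()}
    if len(lengths) != 1:
        raise ValueError(f"columns are not aligned: line counts {sorted(lengths)}")
    n = lengths.pop()
    return [{lang: split[lang][i] for lang in split} for i in range(n)]


def matches(lang: str, segment: str, keyword: str) -> bool:
    """Match one keyword in one segment, choosing the strategy by script."""
    if lang in NON_ALPHABETIC:
        # No word delimiters to rely on, so use a plain substring test.
        return keyword in segment
    # Alphabetic script: match whole words only, ignoring case,
    # so that "corp" does not hit "corpus".
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, segment, flags=re.IGNORECASE) is not None


def search(records: List[Record], query: Dict[str, str]) -> List[int]:
    """Return indices of records where every (language, keyword) pair matches.

    One entry: monolingual search that still yields the aligned translations.
    Two entries: bilingual ("one-to-one") search.
    More entries: multilingual ("one-to-many") search, ANDed across languages.
    """
    return [
        i
        for i, rec in enumerate(records)
        if all(matches(lang, rec[lang], kw) for lang, kw in query.items())
    ]


zh = "平行语料检索\n语料的存储\n关键词检索技术"
en = "Parallel corpus retrieval\nStorage of the corpus\nKeyword retrieval technology"
records = load_parallel({"zh": zh, "en": en})
print(search(records, {"zh": "检索"}))                  # [0, 2]
print(search(records, {"zh": "语料", "en": "corpus"}))  # [0, 1]
print(search(records, {"en": "corp"}))                  # []
```

Plain substring matching on Chinese can also find a keyword inside a longer word, while the English word-boundary test avoids that for alphabetic text; the abstract does not say whether CUC_ParaConc refines Chinese matching further.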

Key words: parallel corpus, retrieval, bilingual, multilingual

摘要: 以中国传媒大学平行语料检索软件(CUC_ParaConc)为例论述平行语料检索技术,主要以纯文本形式的对齐语料为例进行阐述,包括平行语料的存储、读取技术以及双语、多语关键词检索技术。平行语料检索可分为“一对一”与“一对多”两种形式。在一对一平行语料检索中,以汉英平行语料为例分别论述了以汉语为对象的非拼音文字语料的检索技术,以英语为对象的拼音文字语料检索技术,对两者的异同进行了对比;在一对多平行语料检索中,重点论述了多语关键词检索技术。

关键词: 平行语料, 检索, 双语, 多语