Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (16): 129-132.

• Database, Signal and Information Processing •

Research and application of automatic profiling technology for 3-tuple comparable corpora

袁  琦,肖  健,宋金平,朱  姝,万  缨,许  亮   

  1. China Center for Information Industry Development, Beijing 100044
  • Published: 2012-06-01 Online: 2012-06-01

Research and application of automatic profiling technologies based on 3-tuple comparable corpora

YUAN Qi, XIAO Jian, SONG Jinping, ZHU Shu, WAN Ying, XU Liang   

  1. China Center for Information Industry Development, Beijing 100044, China
  • Online: 2012-06-01 Published: 2012-06-01

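Abstract: Corpus-based translation studies in China and abroad have concentrated on translation universals, translation norms, translator style, and translator training, all of which concern translation theory and translation practice. The automatic language profiling technology proposed here, built on 3-tuple comparable corpora, broadens the scope of this field to include application-oriented research for natural language processing. With engineering feasibility in mind, the paper proposes, as a new approach, constructing 3-tuple comparable corpora and applying automatic extraction techniques for n-grams, key clusters, and semantic multi-word expressions. By contrasting the results with Chinglish expressions, it uncovers native English language models, with the goal of improving and advancing natural language processing applications such as machine translation and cross-language information retrieval.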

Keywords: corpus-based translation studies, 3-tuple comparable corpora, automatic language profiling, n-grams

Abstract: Corpus-Based Translation Studies (CBTS) in China and abroad currently focus on translation universals, translation norms, translator’s style, and translation training, all of which bear on translation theory and practice. The automatic language profiling technologies proposed in this paper extend the scope of CBTS to include NLP-oriented research and application. With engineering feasibility in mind, the paper proposes building 3-tuple comparable corpora and applying automatic extraction technologies for n-grams, key clusters, and semantic multi-word expressions to uncover native English language models, so as to improve and further develop natural language processing applications such as machine translation and cross-language information retrieval.

Key words: Corpus-Based Translation Studies (CBTS), 3-tuple comparable corpora, automatic language profiling, n-grams
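
The n-gram extraction step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the composition of the 3-tuple corpora or the extraction algorithm, so the sketch assumes only two tiny invented corpora (`native` and `chinglish`), whitespace tokenization, and a simple rule that flags n-grams recurring in one corpus and absent from the other. The function names `ngrams`, `ngram_counts`, and `overused` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(sentences, n):
    """Count n-grams over lowercased, whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


def overused(target, reference, min_count=2):
    """N-grams seen at least min_count times in target and never in reference."""
    return sorted(g for g, c in target.items()
                  if c >= min_count and g not in reference)


# Toy sentences invented for illustration; they are not data from the paper.
native = [
    "we will improve our english",
    "they want to improve their english",
]
chinglish = [
    "we will raise our english level",
    "they want to raise their english level",
]

native_bigrams = ngram_counts(native, 2)
chinglish_bigrams = ngram_counts(chinglish, 2)

# Bigrams that recur in the Chinglish sample but never occur in the native one.
result = overused(chinglish_bigrams, native_bigrams)
print(result)
```

A realistic pipeline would replace the presence/absence rule with a statistical keyness measure and use a proper tokenizer; both choices are left open here because the abstract does not state them.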