Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (23): 163-169. DOI: 10.3778/j.issn.1002-8331.1809-0240

Chinese-Thai Bilingual Name Alignment with Merging Name Knowledge Distribution Characteristics

ZHANG Jinpeng, SU Jiao, YANG Bei, ZHANG Zhan   

  1. Center of Information Management, Yunnan University of Finance and Economics, Kunming 650221, China
  2. School of International Languages and Cultures, Yunnan University of Finance and Economics, Kunming 650221, China
  3. School of Information Engineering, Wuchang University of Technology, Wuhan 430223, China
  4. School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China
  • Online: 2019-12-01 Published: 2019-12-11

融合人名知识分布特征的汉泰双语人名对齐

张金鹏,苏姣,杨蓓,张占   

  1.云南财经大学 信息管理中心,昆明 650221
    2.云南财经大学 国际语言文化学院,昆明 650221
    3.武昌理工学院 信息工程学院,武汉 430223
    4.中山大学 数据科学与计算机学院,广州 510006

Abstract: Bilingual name alignment directly affects the quality of cross-language information processing. Because Chinese and Thai pronunciation differ greatly and Chinese-Thai parallel corpus resources are limited, statistical transliteration-based name alignment models struggle to align Chinese-Thai bilingual names. This paper proposes a Chinese-Thai bilingual name alignment method that builds on transliteration features and merges in the similarity of name knowledge distribution characteristics. First, the transliteration similarity feature of the bilingual names is calculated. Then, the knowledge distribution similarity feature between Chinese and Thai names is calculated using the chi-square test and other methods. A support vector machine learns these two features of Chinese-Thai name translation pairs to produce a name translation pair classifier, and the alignment results are generated by tuning the classifier's classification results. Experimental results show that the method achieves good results even when Chinese and Thai name pronunciations differ greatly and bilingual corpus resources are lacking.

Key words: Chinese, Thai, bilingual name alignment, name knowledge distribution, adjusting and optimizing classification results

摘要: 双语人名对齐方法研究直接影响到跨语言信息处理的效果,由于泰语与汉语的发音差异大,汉泰双语平行语料库资源有限,基于统计的音译人名对齐模型难以解决汉泰双语人名对齐问题,提出一种在音译特征基础上融合人名知识分布特征相似性的汉泰双语人名对齐方法。计算双语人名音译相似度特征,通过卡方检验等计算汉语人名与泰语人名的知识分布相似度特征,借助支持向量机学习汉泰人名翻译对的两种特征生成人名翻译对分类器,对分类器分类结果调优生成对齐结果。实验结果表明该方法在汉泰人名发音差异大和缺少双语语料资源支持的情况下取得了较好效果。

关键词: 汉语, 泰语, 双语人名对齐, 人名知识分布, 分类结果调优
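
The knowledge distribution feature mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that the chi-square test "and other methods" are used, so everything below is an assumption made for illustration, not the authors' implementation: each name is scored against a set of knowledge terms with the Pearson chi-square statistic on a 2x2 co-occurrence table, the terms of both languages are assumed to be already mapped to shared concept keys (for example through a bilingual dictionary), and the two resulting vectors are compared with cosine similarity. The function names and the toy data are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: documents containing both the name and the term
    b: documents containing the name but not the term
    c: documents containing the term but not the name
    d: documents containing neither
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def distribution_vector(name_docs, term_docs, total_docs):
    """Chi-square score of one name against every knowledge term.

    name_docs: set of document ids that mention the name
    term_docs: dict mapping a concept key to the set of document ids
               that mention the corresponding term (hypothetical layout)
    Note: the statistic is unsigned, so it measures strength of
    association, not its direction.
    """
    vec = {}
    for term, docs in term_docs.items():
        a = len(name_docs & docs)
        b = len(name_docs) - a
        c = len(docs) - a
        d = total_docs - a - b - c
        vec[term] = chi_square_2x2(a, b, c, d)
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy example (fabricated document ids, for illustration only).
zh_vec = distribution_vector({1, 2, 3}, {"actor": {1, 2, 3, 4}, "minister": {5}}, 6)
th_vec = distribution_vector({1, 2}, {"actor": {1, 2, 3}, "minister": {6}}, 6)
similarity = cosine(zh_vec, th_vec)
```

Under these assumptions, `similarity` would serve as the second feature of a candidate name pair, alongside the transliteration similarity, when training the support vector machine classifier described in the abstract.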