Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (29): 134-136. DOI: 10.3778/j.issn.1002-8331.2009.29.040

• Database, Signal and Information Processing •

Rules and implementation of code conversion between Tibetan Tongyuan codes and component sets

WU Guang-li1, YU Hong-zhi1, LIU Chun1,2

  1. Key Laboratory of China's National Languages Information Technology, Northwest University for Nationalities, Lanzhou 730030, China
  2. Department of Public Courses, Gansu College of Traditional Chinese Medicine, Lanzhou 730000, China
  • Received: 2008-06-02 Revised: 2008-07-21 Online: 2009-10-11 Published: 2009-10-11
  • Contact: WU Guang-li

Abstract: In computer information processing, the same character is often encoded differently on different text-processing platforms. This incompatibility is an important problem that urgently needs to be solved. In Tibetan information processing, conversion between Tibetan encodings is likewise an active research topic. Most Tibetan texts and websites use the Tongyuan encoding, whereas Microsoft's Vista operating system uses the Basic Set (component set) encoding, so conversion between the two is very important for Tibetan information processing. This paper presents a technique for converting between the Tibetan Tongyuan encoding and the Basic Set in both directions. Tibetan characters are split according to their Latin transliteration, the number of tiers serves as the bridge between the character structure of the Tongyuan encoding and that of the Basic Set encoding, and a set of rules then carries out the conversion in each direction.

Key words: Tibetan, Latin transliteration, Tongyuan code, Basic Set (component set), code conversion

CLC number:
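
The abstract does not list the conversion rules or the code tables, so the following is only a minimal Python sketch of the general idea of using a tier count as the bridge between a precomposed encoding and a component encoding. It is not the authors' implementation. It assumes the Basic Set can be treated as the Unicode Tibetan block, it defines a tier simply as one component of a vertical stack, and it skips the Latin-transliteration splitting step. All function names and the table `HYPOTHETICAL_TONGYUAN` are illustrative, and the code values `0xE001` and `0xE002` are invented placeholders, not real Tongyuan code points.

```python
# Assumption: the Basic Set is treated as the Unicode Tibetan block.
BASE = range(0x0F40, 0x0F6D)        # base consonants
SUBJOINED = range(0x0F90, 0x0FBD)   # subjoined consonants
VOWELS = range(0x0F71, 0x0F7E)      # dependent vowel signs


def split_stacks(text):
    """Group a component-encoded string into vertical stacks."""
    stacks = []
    for ch in text:
        cp = ord(ch)
        attaches = cp in SUBJOINED or cp in VOWELS
        if attaches and stacks and ord(stacks[-1][0]) in BASE:
            stacks[-1] += ch      # component joins the current stack
        else:
            stacks.append(ch)     # starts a new stack (or stands alone)
    return stacks


def tier_count(stack):
    """Number of tiers, assumed here to be the number of components."""
    return len(stack)


# HYPOTHETICAL table: invented precomposed code values -> component sequences.
# Real Tongyuan code points are not given in the abstract.
HYPOTHETICAL_TONGYUAN = {
    0xE001: "\u0f66\u0f92",              # sa + subjoined ga (2 tiers)
    0xE002: "\u0f66\u0f92\u0fb2\u0f74",  # sa + ga + ra + vowel u (4 tiers)
}
COMPONENTS_TO_CODE = {seq: code for code, seq in HYPOTHETICAL_TONGYUAN.items()}


def to_precomposed(text):
    """Component string -> list of code values, one per stack."""
    codes = []
    for stack in split_stacks(text):
        if tier_count(stack) == 1:
            # Assumption: single-tier characters keep their code value.
            codes.append(ord(stack))
        else:
            # Raises KeyError for stacks missing from the toy table.
            codes.append(COMPONENTS_TO_CODE[stack])
    return codes


def to_basic_set(codes):
    """List of code values -> component string."""
    return "".join(HYPOTHETICAL_TONGYUAN.get(c, chr(c)) for c in codes)
```

As a usage example, the string `"\u0f56\u0f66\u0f92\u0fb2\u0f74\u0f56\u0f66"` splits into four stacks with tier counts 1, 4, 1 and 1. `to_precomposed` maps the four-tier stack to the placeholder value `0xE002`, and `to_basic_set` restores the original string. A real converter would need the full code tables and the rule set described in the paper.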