Computer Engineering and Applications, 2018, Vol. 54, Issue (19): 114-121. DOI: 10.3778/j.issn.1002-8331.1706-0190

• Pattern Recognition and Artificial Intelligence •


Research and implementation of converting mechanism of multiple characters Uyghur on the Internet

Yibulayin·WUSIMAN¹, ZHANG Shaowu², YU Kai¹

  1. School of Computer Science and Engineering, Xinjiang University of Finance and Economics, Urumqi 830012, China
  2. Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning 116000, China
  • Online: 2018-10-01  Published: 2018-10-19

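Abstract (Chinese version): In recent years, as Internet technology has developed and spread in Xinjiang, online communication through WeChat, QQ, forums, Weibo and similar platforms has gradually become the main way people in Xinjiang communicate day to day. For historical and geographical reasons, Uyghur on these platforms shows a “one language, multiple scripts” pattern in which several alphabets coexist: the traditional (Arabic-based) Uyghur script, Latin-script Uyghur and Cyrillic-script Uyghur. Because these scripts lack a sound correspondence standard and tools for converting between them, many problems arise in practice, causing inconvenience both for Uyghur Internet users in their daily online activity and for communication between countries and residents along the “Belt and Road”. This paper therefore first examines the historical relationship among traditional, Latin and Cyrillic Uyghur, the problems in the current correspondence standards for the three alphabets, and the rules for converting between them. On this basis it proposes Unicode character-encoding conversion algorithms among the three alphabets, aiming to ease online written communication among Uyghurs in China and abroad and to enable information retrieval in the latter two scripts within a Uyghur search engine. Experiments show that the proposed LUTC and CUTC conversion algorithms clearly improve character-encoding conversion efficiency, and that retrieval effectiveness for Latin and Cyrillic Uyghur matches that for traditional Uyghur.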

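Key words (Chinese version): one language with multiple scripts, network communication, multi-script conversion, Latin-Uyghur, Cyrillic-Uyghur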
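The abstract does not spell out how the LUTC and CUTC algorithms work. As a rough illustration of what a Unicode-level conversion between two of the scripts involves, the Python sketch below maps Latin-Uyghur to Cyrillic-Uyghur with a lookup table and greedy longest match, so that two-letter units such as ch, gh, ng, sh and zh take precedence over single letters. The table is an assumed, simplified correspondence (lowercase only, no Arabic-script handling), and the name `latin_to_cyrillic` is hypothetical; this is not the authors' algorithm.

```python
import unicodedata

# Hypothetical, simplified Latin-Uyghur (ULY) -> Cyrillic-Uyghur table (lowercase only).
# This is an assumed correspondence for illustration, not the paper's LUTC table.
LATIN_TO_CYRILLIC = {
    # Two-letter units; greedy longest match tries these before single letters.
    "ch": "ч", "gh": "ғ", "ng": "ң", "sh": "ш", "zh": "ж",
    "ya": "я", "yu": "ю",
    # Single letters.
    "a": "а", "b": "б", "d": "д", "e": "ә", "é": "е", "f": "ф",
    "g": "г", "h": "һ", "i": "и", "j": "җ", "k": "к", "l": "л",
    "m": "м", "n": "н", "o": "о", "ö": "ө", "p": "п", "q": "қ",
    "r": "р", "s": "с", "t": "т", "u": "у", "ü": "ү", "w": "в",
    "x": "х", "y": "й", "z": "з",
}
MAX_KEY_LEN = max(len(key) for key in LATIN_TO_CYRILLIC)


def latin_to_cyrillic(text: str) -> str:
    """Convert ULY text to Cyrillic by greedy longest match (output is lowercase).

    Characters not in the table (digits, punctuation, spaces) pass through unchanged.
    """
    # NFC so that é, ö, ü are single precomposed code points matching the table keys.
    text = unicodedata.normalize("NFC", text.lower())
    out = []
    i = 0
    while i < len(text):
        if text[i] == "'":
            # ULY apostrophe separates letters that would otherwise form a digraph
            # (n'g is н + г, not ң); after blocking the digraph it is dropped.
            i += 1
            continue
        for size in range(MAX_KEY_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in LATIN_TO_CYRILLIC:
                out.append(LATIN_TO_CYRILLIC[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # unknown character: copy as-is
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(latin_to_cyrillic("uyghur"))  # уйғур
    print(latin_to_cyrillic("yaxshi"))  # яхши
```

Longest match matters because ng must become ң in a word like "ming" (миң), while the separated form n'g must stay н + г. A real converter would also need to restore letter case and handle the traditional script, which this sketch leaves out.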

Abstract: With the development and growing popularity of Internet technology in Xinjiang in recent years, online communication through WeChat, QQ and similar platforms has become increasingly important. For historical and geographical reasons, however, Uyghur as used on the Internet displays a “one language, multiple scripts” characteristic: the traditional (Arabic-based) Uyghur alphabet, the Latin-Uyghur alphabet and the Cyrillic-Uyghur alphabet coexist. There is no sound correspondence standard among these writing systems and no effective conversion tool between them. This causes many problems in practice and hinders communication among the countries and residents along the “Belt and Road”. This paper investigates the origin and current state of the problem, as well as the existing correspondence standards and the problems in their use. Based on this investigation, it discusses the shortcomings of the correspondence standard between traditional Uyghur and Latin-Uyghur and offers guidance for improving it. The paper also proposes a way to support Latin-Uyghur and Cyrillic-Uyghur information retrieval in a Uyghur search engine, and a method for mutual conversion among Latin-Uyghur, Cyrillic-Uyghur and traditional Uyghur.

Key words: one language with multiple scripts, network communication, multi-script conversion, Latin-Uyghur, Cyrillic-Uyghur
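For retrieval, one simple way to let Latin- and Cyrillic-script queries reach the same documents is to normalize every token to a single pivot script before both indexing and lookup. The Python sketch below assumes Latin-Uyghur as the pivot purely for brevity (the search engine described here presumably matches against traditional Uyghur instead), an assumed simplified Cyrillic-to-Latin table, and a toy in-memory inverted index; `to_pivot`, `build_index` and `search` are hypothetical names, not the paper's components.

```python
import unicodedata
from collections import defaultdict

# Hypothetical, simplified Cyrillic-Uyghur -> Latin-Uyghur (ULY) table (lowercase only).
# Latin is used as the pivot script here only for illustration.
CYRILLIC_TO_LATIN = {
    "а": "a", "ә": "e", "б": "b", "в": "w", "г": "g", "ғ": "gh",
    "д": "d", "е": "é", "ж": "zh", "җ": "j", "з": "z", "и": "i",
    "й": "y", "к": "k", "қ": "q", "л": "l", "м": "m", "н": "n",
    "ң": "ng", "о": "o", "ө": "ö", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ү": "ü", "ф": "f", "х": "x", "һ": "h",
    "ч": "ch", "ш": "sh", "ю": "yu", "я": "ya",
}


def to_pivot(token: str) -> str:
    """Map a token written in Cyrillic or Latin Uyghur to its Latin (pivot) form."""
    token = unicodedata.normalize("NFC", token.lower())
    out = []
    for i, ch in enumerate(token):
        if ch == "г" and i > 0 and token[i - 1] == "н":
            out.append("'")  # keep н + г distinct from ң, as ULY writes n'g
        out.append(CYRILLIC_TO_LATIN.get(ch, ch))  # Latin letters pass through
    return "".join(out)


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Toy inverted index: pivot-form token -> ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[to_pivot(token)].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query term (boolean AND)."""
    terms = [to_pivot(t) for t in query.split()]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {1: "уйғур тили", 2: "uyghur yéziqi", 3: "миң"}
    index = build_index(docs)
    print(search(index, "уйғур"))   # {1, 2}
    print(search(index, "Uyghur"))  # {1, 2}
```

Because the Cyrillic token уйғур and the Latin token uyghur collapse to the same key, a query in either script returns documents written in the other. Choosing a single pivot keeps the index one-script; the alternative of indexing every document in all three scripts would triple the vocabulary without improving recall.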