Tibetan corpus processing method

Computer Engineering and Applications, 2011, Vol. 47, Issue 6: 138-139.

Section: Database, Signal and Information Processing

CAI Rangjia

Research Center of Tibetan Information, Qinghai Normal University, Xining 810008, China

Online: 2011-02-21  Published: 2011-02-21

Abstract

To make Tibetan corpora standardized, consistent, and practical, and to raise the overall quality of their processing, the heterogeneous Tibetan source corpora are first collated and unified to obtain a high-quality raw corpus. Next, the segmentation unit for processing the raw corpus is determined. Based on the grammatical features of Tibetan, a set of Tibetan word classes and a part-of-speech tag set are proposed for the corpus. Tibetan words are then classified and counted, and this forms the basis for a dictionary for word segmentation and tagging. Software for automatic Tibetan word segmentation and tagging is designed and implemented. This software is then used to segment and tag a large-scale Tibetan corpus, achieving multilevel processing of the Tibetan corpus.

Key words: Tibetan corpus, standardization, word classes, tag set, dictionary, word segmentation and tagging
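The abstract does not say which segmentation algorithm the software uses. Below is a minimal sketch of the pipeline it describes: unify the raw text, split it into syllables, then look up words and their tags in a segmentation-and-tagging dictionary. It assumes dictionary-based forward maximum matching, a common baseline for dictionary-driven segmentation and not necessarily the author's method. It relies only on two Tibetan script facts: the tsheg (U+0F0B) separates syllables and the shad (U+0F0D) ends a clause. The lexicon entries, the tag names (`n`, `ns`, `x`) and all function names are hypothetical.

```python
import re
import unicodedata

TSHEG = "\u0f0b"   # Tibetan intersyllabic mark (tsheg)
SHAD = "\u0f0d"    # Tibetan clause delimiter (shad)
UNKNOWN_TAG = "x"  # hypothetical tag for syllables not found in the lexicon


def normalize(text: str) -> str:
    """Unify raw text: Unicode NFC normalization plus collapsed whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def split_syllables(clause: str) -> list[str]:
    """Split one clause into syllables on the tsheg, ignoring spaces and empties."""
    return [s for s in clause.replace(" ", "").split(TSHEG) if s]


def build_lexicon(entries: dict[tuple[str, ...], str]) -> dict[tuple[str, ...], str]:
    """Normalize every syllable of every dictionary key the same way as the text."""
    return {
        tuple(unicodedata.normalize("NFC", s) for s in word): tag
        for word, tag in entries.items()
    }


def segment_and_tag(text: str, lexicon: dict[tuple[str, ...], str]) -> list[tuple[str, str]]:
    """Forward maximum matching over syllables, tagging each word from the lexicon."""
    max_len = max((len(word) for word in lexicon), default=1)
    tagged: list[tuple[str, str]] = []
    for clause in normalize(text).split(SHAD):
        sylls = split_syllables(clause)
        i = 0
        while i < len(sylls):
            # Try the longest candidate first; a single syllable always matches.
            for n in range(min(max_len, len(sylls) - i), 0, -1):
                candidate = tuple(sylls[i:i + n])
                if candidate in lexicon or n == 1:
                    tag = lexicon.get(candidate, UNKNOWN_TAG)
                    tagged.append((TSHEG.join(candidate), tag))
                    i += n
                    break
    return tagged


if __name__ == "__main__":
    # Hypothetical toy lexicon: syllable tuples mapped to made-up tags.
    lexicon = build_lexicon({
        ("བོད",): "ns",        # bod, "Tibet"
        ("སྐད",): "n",         # skad, "language"
        ("བོད", "སྐད"): "n",   # bod skad, "Tibetan language"
        ("ཡིག", "ཆ"): "n",     # yig cha, "document"
    })
    text = TSHEG.join(["བོད", "སྐད", "ཡིག", "ཆ"]) + SHAD
    print(segment_and_tag(text, lexicon))
```

Maximum matching prefers the longest dictionary entry, so the two syllables of "Tibetan language" are kept together as one word. Syllables missing from the lexicon fall back to single-syllable words tagged `x`.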

CAI Rangjia. Tibetan corpus processing method[J]. Computer Engineering and Applications, 2011, 47(6): 138-139.
