Machine learning based Uyghur language text categorization

Computer Engineering and Applications, 2012, Vol. 48, Issue 5: 110-112.

• Database, Signal and Information Processing •

Alimjan AYSA¹,², Turgun IBRAHIM², Hasan OMAR², Marhaba ALI²

1. Modern Education Technology Center, Xinjiang University, Urumqi 830046, China
2. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Online: 2012-02-11 Published: 2012-02-11

Chinese title: 基于机器学习的维吾尔文文本分类研究 (Research on Machine Learning Based Uyghur Text Categorization)

阿力木江·艾沙¹,²，吐尔根·依布拉音²，艾山·吾买尔²，马尔哈巴·艾力²

1. 新疆大学现代教育技术中心，乌鲁木齐 830046
2. 新疆大学信息科学与工程学院，乌鲁木齐 830046

Abstract

With the rapid growth of Uyghur-language text on the Internet, Uyghur text categorization has become a key technique for processing and organizing this data. To address the high dimensionality of Uyghur texts under the vector space model (VSM) representation, stemming is combined with information gain (IG) to reduce the dimensionality. Categorization experiments are performed on a Uyghur text corpus using machine learning based text categorization algorithms, namely Naïve Bayes and kNN, and the experimental results are analyzed.

Key words: text categorization, Naïve Bayes, k-Nearest Neighbor (kNN), Uyghur language, feature selection

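Abstract (translated from the Chinese): With the rapid growth of Uyghur-language information on the Internet, Uyghur text categorization has become a key technology for processing and organizing this large volume of text data. This work studies techniques and methods for Uyghur text categorization. To address the high dimensionality of Uyghur texts under the vector space model (VSM) representation, stemming is combined with IG to reduce the dimensionality of the representation space. Categorization experiments are carried out on a Uyghur text corpus using machine learning based classification algorithms (kNN and Naïve Bayes), and the experimental results are analyzed.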

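Key words (translated from the Chinese): text categorization, Naïve Bayes method, k-nearest neighbor method (kNN), Uyghur language, feature selection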
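The pipeline outlined in the abstract (stemmed tokens, IG-based feature selection, then kNN) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes documents have already been stemmed and tokenized, uses document-level term presence to compute information gain, and uses cosine similarity over term counts for kNN, which is a common choice but is not stated in the abstract. The function names and the toy English tokens are hypothetical placeholders.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a Counter of class frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), by term presence."""
    n = len(docs)
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n_t = sum(with_t.values())
    return (entropy(Counter(labels))
            - (n_t / n) * entropy(with_t)
            - ((n - n_t) / n) * entropy(without_t))


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_docs, train_labels, query, features, k=3):
    """Majority vote among the k training documents most similar to query."""
    keep = set(features)

    def vectorize(doc):
        # Project a token list onto the selected feature set.
        return Counter(t for t in doc if t in keep)

    q = vectorize(query)
    scored = sorted(((cosine(q, vectorize(d)), l)
                     for d, l in zip(train_docs, train_labels)), reverse=True)
    votes = Counter(l for _, l in scored[:k])
    return votes.most_common(1)[0][0]


# Toy corpus of already-stemmed tokens (placeholder data, not Uyghur text).
docs = [["goal", "match", "team"], ["team", "score", "match"],
        ["market", "stock", "price"], ["price", "bank", "market"]]
labels = ["sport", "sport", "economy", "economy"]

features = select_features(docs, labels, k=4)
prediction = knn_predict(docs, labels, ["team", "match", "goal"], features)
print(features, prediction)
```

In this sketch, selecting features before vectorizing is what shrinks the representation space: only terms that separate the classes well survive, so the similarity computation runs over far fewer dimensions than the full vocabulary.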

Alimjan AYSA, Turgun IBRAHIM, Hasan OMAR, Marhaba ALI. Machine learning based Uyghur language text categorization[J]. Computer Engineering and Applications, 2012, 48(5): 110-112.

阿力木江·艾沙，吐尔根·依布拉音，艾山·吾买尔，马尔哈巴·艾力. 基于机器学习的维吾尔文文本分类研究[J]. 计算机工程与应用, 2012, 48(5): 110-112.

