Weighted fusion of texture features based Central Asian multi-scripts identification

doi:10.3778/j.issn.1002-8331.1701-0305

Abstract

Many scripts with similar-shaped characters are in use around the world today, and identifying such scripts is a difficult and pressing problem in pattern recognition. However, few studies address the identification of the scripts of Central Asian countries and of Chinese minority scripts, especially scripts with similar-shaped characters. In this paper, two multi-script document image databases are first established, containing 1,600 and 2,200 plain-text document images, respectively, in 11 scripts: English, Chinese, Russian, Mongolian, Arabic, Tibetan, Uyghur, Turkish, Uzbek, Tajik, and Kazakh. Then six texture features (mean, standard deviation, entropy, consistency, third-order moment, and smoothness) are extracted from each whole-page image and classified with seven different classifiers. After the sensitivity of each feature to the document images is determined, a weighted fusion method is used to extract fused features, and the optimal weights for identifying Central Asian multi-script documents are determined. Finally, the fused features are classified with different classifiers, yielding average identification rates of 99.38% and 95.42% on the two databases, respectively. The experimental results indicate that both the individual texture features and the weighted-fusion texture features describe multi-script document images well and can effectively classify the 11 scripts listed above.
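The six texture features named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard gray-level histogram definitions of these descriptors, with "consistency" read as histogram uniformity and the variance and third moment normalized by (L-1)^2; the abstract does not state the exact formulas, so this is an assumption rather than the authors' implementation. The function names `texture_features` and `weighted_fusion`, the feature order, and any weight values are hypothetical, and in practice the features would likely need to be normalized to comparable ranges before weighting.

```python
import math

# Fixed feature order used when building a fused vector (an assumed convention).
FEATURE_NAMES = ("mean", "std", "entropy", "consistency", "third_moment", "smoothness")


def texture_features(pixels, levels=256):
    """Compute six histogram-based texture descriptors of a gray-level image.

    `pixels` is a flat iterable of integer gray levels in [0, levels - 1].
    """
    counts = [0] * levels
    n = 0
    for z in pixels:
        counts[z] += 1
        n += 1
    if n == 0:
        raise ValueError("empty image")
    p = [c / n for c in counts]  # normalized histogram p(z)

    mean = sum(z * pz for z, pz in enumerate(p))
    var = sum((z - mean) ** 2 * pz for z, pz in enumerate(p))
    third = sum((z - mean) ** 3 * pz for z, pz in enumerate(p))
    scale = (levels - 1) ** 2  # assumed normalization for variance and third moment

    return {
        "mean": mean,
        "std": math.sqrt(var),
        "entropy": -sum(pz * math.log2(pz) for pz in p if pz > 0),
        "consistency": sum(pz * pz for pz in p),  # uniformity of the histogram
        "third_moment": third / scale,
        "smoothness": 1 - 1 / (1 + var / scale),
    }


def weighted_fusion(features, weights):
    """Scale each feature by its weight and return the fused vector.

    Both arguments are dicts keyed by FEATURE_NAMES; the weights are
    supplied by the caller (the abstract does not list the optimal values).
    """
    return [weights[name] * features[name] for name in FEATURE_NAMES]
```

For a page image loaded as a 2-D array, the pixels would be flattened into a single sequence before calling `texture_features`, and the resulting dictionary passed to `weighted_fusion` together with one weight per feature.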

Key words: script identification, texture feature, discriminant analysis, Mahalanobis distance, weighted fusion
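The key words mention the Mahalanobis distance. A minimum-distance classifier built on it can be sketched as follows, assuming NumPy, per-class mean vectors, and a single pooled within-class covariance matrix; the abstract does not describe how the distance is used, so the functions `fit_mahalanobis` and `predict_mahalanobis` are hypothetical illustrations, not the authors' classifier.

```python
import numpy as np


def fit_mahalanobis(X, y):
    """Estimate class means and the inverse pooled within-class covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # sorted unique labels
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Subtract each sample's own class mean, then pool the scatter.
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    # The pseudo-inverse tolerates a singular covariance estimate.
    return classes, means, np.linalg.pinv(cov)


def predict_mahalanobis(model, X):
    """Assign each row of X to the class with the smallest squared distance."""
    classes, means, inv_cov = model
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]  # shape (samples, classes, dims)
    d2 = np.einsum("nkd,de,nke->nk", diff, inv_cov, diff)
    return classes[np.argmin(d2, axis=1)]
```

One design point worth noting: when the covariance is estimated from the same rescaled data, the Mahalanobis distance is unchanged by rescaling individual features, so per-feature weights only affect classifiers such as Euclidean nearest-mean that are sensitive to feature scale.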

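Abstract (Chinese version): Many similar scripts are in use around the world today, and identifying similar scripts is one of the difficult and urgent problems in pattern recognition. However, there are almost no published reports on classifying and identifying Central Asian scripts and minority scripts, that is, similar scripts. First, two multi-script document image databases are established, containing 1,600 and 2,200 plain-text whole-page document images, respectively, covering 11 scripts: English, Chinese, Russian, Mongolian, Arabic, Tibetan, Uyghur, Turkish, Uzbek, Tajik, and Kazakh. Next, six texture features (mean, standard deviation, entropy, consistency, third-order moment, and smoothness) are extracted from the document images and classified with seven different classifiers. Based on the sensitivity of each feature to multi-script text document images, a weighted feature fusion method is used to extract fused features, and the optimal weights for Central Asian multi-script document image identification are determined. Finally, the fused features are classified with different classifiers; after coefficient-weighted fusion of the multiple features, average identification rates of 99.38% and 95.69% are obtained on the two databases, respectively. The experimental results show that the extracted texture features and the weighted-fusion texture features describe document images well and can effectively classify the 11 scripts listed above.

Key words (Chinese version): script identification, texture feature, discriminant analysis, Mahalanobis distance, weighted fusion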

Buvajar Mijit, Kurban Ubul, Nurbiya Yadikar, Tuergen Yibulayin, Alimjan Aysa. Weighted fusion of texture features based Central Asian multi-scripts identification[J]. Computer Engineering and Applications, 2017, 53(20): 187-194.

