Sentence boundary detection of Uyghur based on rules and statistics

doi:10.3778/j.issn.1002-8331.2010.14.047

Computer Engineering and Applications, 2010, Vol. 46, Issue 14: 162-165.

• Database, Signal and Information Processing •

AISHAN Wumaier, TUERGEN Yibulayin

School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Received: 2009-04-22 Revised: 2009-06-22 Online: 2010-05-11 Published: 2010-05-11
Contact: AISHAN Wumaier

Abstract

Abstract: Sentence boundary detection is an important initial task for many natural language processing applications, such as part-of-speech tagging and parsing. This paper proposes an automatic sentence boundary detection method for Uyghur that combines rules and statistics. First, a paragraph classification algorithm separates ambiguous paragraphs from unambiguous ones. Second, a rule-based sentence boundary detector processes the unambiguous paragraphs. Finally, a maximum entropy (ME) model identifies the sentence boundaries in the ambiguous paragraphs. The rules compensate for the ME model's tendency, caused by sparse training data, to misjudge paragraphs that contain no ambiguity, while the ME model resolves the cases that are ambiguous; together they make the method more robust. The recall of the method reaches 98.77%.
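
The three-stage flow described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the actual rules, features, or ambiguity criteria, so the terminator set, the ambiguity test (a period immediately followed by a non-space character), and the function names are all hypothetical. In particular, `toy_boundary_classifier` is a simple heuristic standing in for a trained maximum entropy model.

```python
import re
from typing import Callable, List

# Assumed terminator set (period, exclamation mark, Latin and Arabic question marks).
TERMINATORS = ".!?\u061F"
_CLS = "[" + re.escape(TERMINATORS) + "]"
_NOT = "[^" + re.escape(TERMINATORS) + "]"

# Assumed ambiguity cue: a period directly followed by a non-space character,
# as in "3.14" or "a.b". The paper's real criteria are not stated in the abstract.
_AMBIGUOUS = re.compile(r"\.(?=\S)")
# A sentence is a run of non-terminators plus its terminators, or a trailing run.
_RULE_SENTENCE = re.compile(f"{_NOT}+{_CLS}+|{_NOT}+$")


def is_ambiguous(paragraph: str) -> bool:
    """Stage 1: classify a paragraph as ambiguous or unambiguous."""
    return bool(_AMBIGUOUS.search(paragraph))


def split_by_rules(paragraph: str) -> List[str]:
    """Stage 2: in unambiguous paragraphs every terminator ends a sentence."""
    parts = (m.strip() for m in _RULE_SENTENCE.findall(paragraph))
    return [p for p in parts if p]


def toy_boundary_classifier(text: str, i: int) -> bool:
    """Hypothetical stand-in for the ME model: accept a terminator as a
    boundary only at the end of the text or before whitespace."""
    return i + 1 == len(text) or text[i + 1].isspace()


def split_with_model(paragraph: str,
                     is_boundary: Callable[[str, int], bool]) -> List[str]:
    """Stage 3: ask a classifier about each candidate terminator."""
    sentences, start = [], 0
    for i, ch in enumerate(paragraph):
        if ch in TERMINATORS and is_boundary(paragraph, i):
            sentences.append(paragraph[start:i + 1].strip())
            start = i + 1
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return [s for s in sentences if s]


def detect_sentences(paragraph: str,
                     is_boundary: Callable[[str, int], bool] = toy_boundary_classifier
                     ) -> List[str]:
    """Route each paragraph to the rule-based or the model-based splitter."""
    if is_ambiguous(paragraph):
        return split_with_model(paragraph, is_boundary)
    return split_by_rules(paragraph)


print(detect_sentences("It rained. We stayed home!"))
print(detect_sentences("Pi is 3.14 here. Next one."))
```

The first call takes the rule-based path; the second contains "3.14" and is routed to the classifier, which keeps the decimal point inside its sentence. Routing only ambiguous paragraphs to the statistical model mirrors the abstract's motivation: a model trained on sparse data never gets the chance to mis-split text that simple rules already handle.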

Key words: Uyghur, sentence boundary detection, rule, feature extraction, maximum entropy

CLC Number: TP391

AISHAN Wumaier, TUERGEN Yibulayin. Sentence boundary detection of Uyghur based on rules and statistics[J]. Computer Engineering and Applications, 2010, 46(14): 162-165.
