Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (18): 120-125.

### Chinese comma classification based on segmentation and part-of-speech tagging

GU Jingjing, ZHOU Guodong

1. School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu 215006, China
• Online: 2015-09-15 Published: 2015-10-13

Abstract: In recent years, punctuation, as an important part of discourse, has attracted increasing attention from researchers. However, most existing methods rely on syntactic analysis, and no prior work classifies Chinese commas using only the surface information of sentences. This paper proposes a method for Chinese comma classification based on word segmentation and part-of-speech tagging, and adopts two supervised machine learning classifiers, a maximum entropy classifier and a CRF classifier, to classify commas automatically. Experimental results on the CTB 6.0 corpus show that the CRF model outperforms the maximum entropy model, and that the accuracy of both classifiers is very close to that of the method based on syntactic analysis. This demonstrates that Chinese comma classification based on segmentation and part-of-speech tagging is feasible.
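As a rough illustration of what "surface information" from segmentation and part-of-speech tagging might look like as classifier input, the sketch below builds a feature dictionary for each comma in a tagged sentence. The abstract does not list the paper's actual feature set, so the window size, the feature names (`w[k]`, `pos[k]`, `n_before`, `n_after`), the function name `comma_features`, and the example sentence with its tags are all assumptions made for illustration only. Dictionaries of this kind are a common input format for maximum entropy and CRF toolkits.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag), as produced by a segmenter and tagger

COMMA = "，"  # full-width Chinese comma


def comma_features(tokens: List[Token], window: int = 2) -> List[Tuple[int, Dict[str, str]]]:
    """Return (token index, feature dict) for every comma in a tagged sentence.

    Hypothetical feature template: words and POS tags in a +/- `window`
    context around the comma, plus token counts on each side.
    """
    results = []
    for i, (word, _) in enumerate(tokens):
        if word != COMMA:
            continue
        feats: Dict[str, str] = {}
        for offset in range(-window, window + 1):
            if offset == 0:
                continue  # skip the comma itself
            j = i + offset
            if 0 <= j < len(tokens):
                w, p = tokens[j]
            else:
                w, p = "<PAD>", "<PAD>"  # context falls outside the sentence
            feats[f"w[{offset}]"] = w
            feats[f"pos[{offset}]"] = p
        feats["n_before"] = str(i)                   # tokens before the comma
        feats["n_after"] = str(len(tokens) - i - 1)  # tokens after the comma
        results.append((i, feats))
    return results


# Illustrative sentence; the segmentation and CTB-style tags are hand-written.
sentence = [
    ("他", "PN"), ("来", "VV"), ("了", "AS"), ("，", "PU"),
    ("我", "PN"), ("很", "AD"), ("高兴", "VA"), ("。", "PU"),
]

for index, feats in comma_features(sentence):
    print(index, feats)
```

Each feature dictionary would then be paired with a gold comma label from the corpus and passed to a supervised classifier. Because only segmented words and tags are used, no syntactic parse is needed at prediction time.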