Computer Engineering and Applications, 2013, Vol. 49, Issue 3: 202-206.

Analysis and extraction of structural features of off-line handwritten characters based on principal curves algorithm

MA Chi1,2, YU Miao3   

  1. School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
  2. School of Software Engineering, University of Science and Technology Liaoning, Anshan, Liaoning 114051, China
  3. Anshan Meteorological Bureau of Liaoning Province, Anshan, Liaoning 114051, China
  Online: 2013-02-01; Published: 2013-02-18

马  驰1,2,于  淼3   

Abstract: Feature extraction and selection are critical to improving the recognition rate of off-line handwritten characters. Principal curves are a nonlinear generalization of principal component analysis: smooth, self-consistent curves that pass through the “middle” of a data distribution and therefore describe its structural features well. The soft k-segments principal curve algorithm is used to extract structural features from the training data. By analyzing the structural features of the resulting principal curves in detail, features are selected for coarse classification and for precise classification, and the two classification stages are carried out separately during recognition. Experiments on the CEDAR handwritten digit and letter database show that the selected features discriminate well between similar characters and that the algorithm effectively improves the recognition rate of off-line handwritten characters. The proposed method offers a new approach to off-line handwritten character recognition.

Key words: principal curve, off-line handwritten character recognition, structural features, feature extraction

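Abstract: The extraction and selection of pattern features are key factors in improving the recognition rate of handwritten characters. A principal curve is a nonlinear generalization of principal component analysis: a smooth curve that passes through the “middle” of the data distribution and satisfies “self-consistency”, so it describes the structural features of the data distribution well. The soft K-segments principal curve algorithm is used to extract features from the training data. Based on an analysis of the structural characteristics of handwritten characters, coarse-classification and fine-classification features for handwritten character recognition are selected, and these features are then used to recognize handwritten characters. Experiments on the CEDAR handwritten digit and character database show that the selected classification features effectively distinguish similar handwritten characters and improve the recognition rate, providing a new method for research on off-line handwritten character recognition.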

Key words: principal curve, handwritten character recognition, structural features, feature selection
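
The abstract describes a principal curve as a curve through the "middle" of the data, approximated by k line segments. The sketch below illustrates that idea only. It is a simplified hard-assignment variant, not the soft k-segments algorithm used in the paper, and it does not join the segments into a single curve. The function names (`fit_segment`, `dist_to_segment`, `k_segments`), the sort-by-x initialisation, and the segment half-length of 1.5 standard deviations are all assumptions made for illustration.

```python
import math


def fit_segment(points):
    """Fit one segment along the first principal component of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # Variance along that direction; the half-length of 1.5 standard
    # deviations is an assumption chosen for this sketch.
    var = sxx * dx * dx + 2.0 * sxy * dx * dy + syy * dy * dy
    half = 1.5 * math.sqrt(max(var, 0.0))
    return (mx - half * dx, my - half * dy), (mx + half * dx, my + half * dy)


def dist_to_segment(p, seg):
    """Euclidean distance from point p to the closest point on segment seg."""
    (ax, ay), (bx, by) = seg
    vx, vy = bx - ax, by - ay
    denom = vx * vx + vy * vy
    t = 0.0 if denom == 0 else ((p[0] - ax) * vx + (p[1] - ay) * vy) / denom
    t = min(1.0, max(0.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (ax + t * vx), p[1] - (ay + t * vy))


def k_segments(points, k, iters=20):
    """Alternate between assigning points to segments and refitting them."""
    pts = sorted(points)
    # Assumed initialisation: split the x-sorted points into at most k chunks.
    size = math.ceil(len(pts) / k)
    segs = [fit_segment(pts[i:i + size]) for i in range(0, len(pts), size)]
    for _ in range(iters):
        groups = [[] for _ in segs]
        for p in pts:
            j = min(range(len(segs)), key=lambda i: dist_to_segment(p, segs[i]))
            groups[j].append(p)
        # Keep the previous segment when a group has too few points to refit.
        new = [fit_segment(g) if len(g) >= 2 else s for g, s in zip(groups, segs)]
        if new == segs:
            break
        segs = new
    return segs


# Toy "L"-shaped stroke: one horizontal and one vertical run of points.
horizontal = [(0.5 * i, 0.0) for i in range(20)]
vertical = [(10.0, 0.5 * j) for j in range(20)]
segs = k_segments(horizontal + vertical, k=2)
print(segs)
```

On this toy stroke the two fitted segments follow the horizontal and vertical runs. Structural features of the kind the abstract mentions (for example, segment orientations or endpoints) could then be read from `segs`, although the specific features used in the paper are not given in this abstract.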