Computer Engineering and Applications, 2010, Vol. 46, Issue (15): 146-149. DOI: 10.3778/j.issn.1002-8331.2010.15.043

• Graphics, Images, Pattern Recognition •

Printed mathematical expressions extraction method based on ICSA-SVM and K-L transform

ZHANG Can-long1,3, TANG Yan-ping2, WANG Qiang1, WEI Chun-rong1

  1. College of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China
  2. School of Material Science and Engineering, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  3. School of Aeronautics & Astronautics, Shanghai Jiaotong University, Shanghai 200240, China
  • Received: 2009-12-08 Revised: 2010-03-17 Online: 2010-05-21 Published: 2010-05-21
  • Contact: ZHANG Can-long

An optimized extraction strategy for printed mathematical expressions

张灿龙1,3,唐艳平2,王 强1,韦春荣1   

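  1. College of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004
  2. School of Material Science and Engineering, Guilin University of Electronic Technology, Guilin, Guangxi 541004
  3. School of Aeronautics & Astronautics, Shanghai Jiaotong University, Shanghai 200240
  • Corresponding author: 张灿龙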

Abstract: A new approach is presented for extracting both isolated and embedded expressions from printed Chinese technical documents. It proceeds in two steps: line classification, then symbol recognition. The K-L transform is applied to remove correlation among line features and to extract symbol features, and an immune clone selection algorithm (ICSA) is used to optimize the parameters of the SVM-based line classifier and symbol classifier. Tests on about 300 printed Chinese technical documents indicate an expression extraction accuracy above 94%.
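As a rough illustration of the decorrelation step mentioned in the abstract, the K-L transform can be sketched as a projection onto the leading eigenvectors of the feature covariance matrix. This is a minimal sketch, not the authors' implementation: the function name `kl_transform`, the synthetic four-dimensional "line features", and the choice of two retained components are all assumptions made for the example.

```python
import numpy as np

def kl_transform(features, n_components):
    """Project samples (one per row) onto the top eigenvectors of their covariance."""
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    basis = eigvecs[:, order]
    return centered @ basis, basis, mean

# Hypothetical data: 4 correlated features built from 2 independent sources.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
raw = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + base[:, 1],
    2 * base[:, 0] - base[:, 1],
])

projected, basis, mean = kl_transform(raw, n_components=2)
```

In this toy setup the projected features are uncorrelated with each other, and because the synthetic data has only two independent sources, two components are enough to reconstruct the original four features.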

Key words: printed mathematical expression, support vector machine, K-L transform, immune clone selection

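Abstract (Chinese version): A strategy for extracting printed mathematical expressions is proposed that first classifies layout lines and then recognizes symbols. The strategy applies the K-L transform twice, once to reduce the dimensionality of the layout-line features and once to extract features of the expression symbols, and uses an immune clone selection algorithm to optimize the training parameters of the support vector machines, so as to build an optimal layout-line classifier and expression-symbol recognizer. In scan-and-recognize tests on more than 300 printed Chinese scientific and technical documents, the expression extraction rate exceeds 94%.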
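The parameter optimization described above can be pictured with a generic clonal selection loop: rank candidate parameter vectors ("antibodies") by fitness, clone them, mutate better-ranked clones less, and refresh the weakest with random newcomers. The sketch below is an assumption-laden illustration rather than the authors' ICSA: the abstract does not say which SVM parameters are tuned, so the two search dimensions (taken here as log10 of a penalty and a kernel width) are hypothetical, and `stand_in_fitness` is a smooth placeholder for what would in practice be a cross-validated classifier accuracy.

```python
import random

def clonal_selection(fitness, bounds, pop_size=10, n_clones=5,
                     generations=30, n_fresh=2, seed=0):
    """Maximize `fitness` over a box given as a list of (low, high) pairs."""
    rng = random.Random(seed)

    def random_antibody():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    population = [random_antibody() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_pop = []
        for rank, antibody in enumerate(ranked[: pop_size - n_fresh]):
            # Better-ranked antibodies receive smaller mutations.
            scale = 0.3 * (rank + 1) / pop_size
            clones = [
                [min(hi, max(lo, g + rng.gauss(0, scale * (hi - lo))))
                 for g, (lo, hi) in zip(antibody, bounds)]
                for _ in range(n_clones)
            ]
            # Keep the parent if no clone improves on it.
            next_pop.append(max(clones + [antibody], key=fitness))
        # Replace the weakest antibodies with fresh random ones.
        next_pop.extend(random_antibody() for _ in range(n_fresh))
        population = next_pop
    return max(population, key=fitness)

def stand_in_fitness(params):
    # Placeholder objective with a single peak at (1, -2); a real run would
    # train an SVM with these parameters and return validation accuracy.
    x, y = params
    return -((x - 1.0) ** 2 + (y + 2.0) ** 2)

# Hypothetical search box: log10(penalty) in [-2, 4], log10(kernel width) in [-5, 1].
bounds = [(-2.0, 4.0), (-5.0, 1.0)]
best = clonal_selection(stand_in_fitness, bounds)
```

Because each parent competes with its own clones, the best candidate found so far is never lost between generations, which is one common design choice for this family of algorithms.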

Key words (Chinese version): printed mathematical expression, support vector machine, K-L transform, immune clone selection
