Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (7): 1-3. DOI: 10.3778/j.issn.1002-8331.2009.07.001

• Doctoral Forum •

An evolutionary segmentation method for continuous handwritten Chinese

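FU Yong-gang1, ZHANG Xi-wen1, DAI Guo-zhong2

  1. Digital Media Laboratory, College of Information Sciences, Beijing Language and Culture University, Beijing 100083, China
  2. Laboratory of Human-Computer Interaction and Intelligent Information Processing, Institute of Software, Chinese Academy of Sciences, Beijing 100080, China
  • Received: 2008-12-04 Revised: 2009-01-13 Published: 2009-03-01 Online: 2009-03-01
  • Corresponding author: FU Yong-gang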

Character extraction from continuous handwritten Chinese using a genetic algorithm

FU Yong-gang1, ZHANG Xi-wen1, DAI Guo-zhong2

  1. Digital Media Laboratory, College of Information Sciences, Beijing Language and Culture University, Beijing 100083, China
  2. Laboratory of Human-Computer Interaction & Intelligent Information Processing, Institute of Software, CAS, Beijing 100080, China
  • Received: 2008-12-04 Revised: 2009-01-13 Online: 2009-03-01 Published: 2009-03-01
  • Contact: FU Yong-gang

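Abstract: In continuous handwritten Chinese, some characters have radicals written far apart, and adjacent characters may touch or overlap. To handle this, an evolutionary method is presented that extracts characters based on recognition scores. The stroke sequence of each text line is binary-encoded, and an improved genetic algorithm carries out the evolution. The strokes corresponding to each run of consecutive 0s or 1s in a chromosome form a candidate character. The Hanwang handwritten character recognizer supplies the recognition score of each candidate, and the optimization objectives are a small number of characters and a large total recognition score. The mutation and crossover probabilities of the genetic algorithm are generated adaptively. Test results show that the method segments continuous handwritten Chinese well.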

Keywords: continuous handwritten Chinese, character extraction, genetic algorithm, recognition score
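The encoding and objective described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: strokes are indexed in writing order with one gene per stroke, each maximal run of equal genes is one candidate character, the Hanwang recognizer is replaced by a hypothetical `toy_score` lookup, and the two objectives (fewer characters, larger total score) are folded into one hypothetical weighted fitness with penalty weight `char_penalty`. The abstract does not say how the paper combines them.

```python
from itertools import groupby
from typing import Callable, Dict, List, Sequence, Tuple

StrokeGroup = Tuple[int, ...]  # indices of the strokes forming one candidate character


def decode(chromosome: Sequence[int]) -> List[StrokeGroup]:
    """Map a binary chromosome (one gene per stroke) to candidate characters.

    Each maximal run of identical genes (all 0s or all 1s) becomes one
    candidate character containing the strokes at those positions.
    """
    groups: List[StrokeGroup] = []
    start = 0
    for _, run in groupby(chromosome):
        length = sum(1 for _ in run)
        groups.append(tuple(range(start, start + length)))
        start += length
    return groups


def fitness(chromosome: Sequence[int],
            score_fn: Callable[[StrokeGroup], float],
            char_penalty: float = 0.5) -> float:
    """Hypothetical scalar fitness: total recognition score minus a
    per-character penalty, so fewer characters and higher scores both help."""
    groups = decode(chromosome)
    total_score = sum(score_fn(g) for g in groups)
    return total_score - char_penalty * len(groups)


# Hypothetical stand-in for a real recognizer: strokes 0-1 and 2-4 are
# assumed to be two genuine characters; any other grouping scores poorly.
TOY_SCORES: Dict[StrokeGroup, float] = {(0, 1): 1.0, (2, 3, 4): 1.0}


def toy_score(group: StrokeGroup) -> float:
    return TOY_SCORES.get(group, 0.1)


if __name__ == "__main__":
    for chrom in ([0, 0, 1, 1, 1], [0, 1, 0, 1, 0], [0, 0, 0, 0, 0]):
        print(chrom, decode(chrom), round(fitness(chrom, toy_score), 2))
```

With the toy scores, the correct split `[0, 0, 1, 1, 1]` gets the highest fitness of the three. Because only run boundaries matter, a chromosome and its bitwise complement decode to the same segmentation, and the run-based encoding implies that each character's strokes are contiguous in writing order.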

Abstract: Continuous handwritten Chinese contains characters whose radicals are written far apart, as well as touching and overlapping characters. To address this problem, the paper proposes a novel approach that extracts characters from handwritten Chinese based on character recognition scores, using an improved genetic algorithm. Each chromosome is a randomly initialized binary string with one gene per stroke, and adjacent genes with the same value (a run of 1s or of 0s) form a candidate character. Candidate characters are recognized with the Hanwang handwritten character recognizer. The approach seeks a recognized string with fewer characters and a larger total recognition score. The mutation and crossover probabilities are calculated adaptively. Test results show that the approach is effective and robust for character extraction from continuous handwritten Chinese.

Key words: continuous handwritten Chinese, character extraction, genetic algorithm, character recognition score
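The abstract states only that the crossover and mutation probabilities are computed adaptively; it does not give the rule. As one well-known possibility, the sketch below swaps in the adaptive scheme of Srinivas and Patnaik, in which above-average individuals receive lower probabilities (protecting good segmentations) and below-average ones receive fixed, higher probabilities. This is not necessarily the paper's method. The constants `k1` to `k4` use values commonly cited for that scheme, and the converged-population guard and the `mutate` helper are assumptions added for illustration.

```python
import random
from typing import List, Sequence


def crossover_prob(f_prime: float, f_max: float, f_avg: float,
                   k1: float = 1.0, k3: float = 1.0) -> float:
    """Adaptive crossover probability (Srinivas-Patnaik style).

    f_prime is the larger fitness of the two parents to be crossed.
    """
    if f_max <= f_avg or f_prime < f_avg:
        # Below-average parents, or a fully converged population (which
        # would make the ratio 0/0), get the fixed rate k3.
        return k3
    return k1 * (f_max - f_prime) / (f_max - f_avg)


def mutation_prob(f: float, f_max: float, f_avg: float,
                  k2: float = 0.5, k4: float = 0.5) -> float:
    """Adaptive mutation probability for an individual with fitness f."""
    if f_max <= f_avg or f < f_avg:
        # Same guard as above: below-average or converged -> fixed rate k4.
        return k4
    return k2 * (f_max - f) / (f_max - f_avg)


def mutate(chromosome: Sequence[int], pm: float, rng: random.Random) -> List[int]:
    """Flip each binary gene independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in chromosome]


if __name__ == "__main__":
    fits = [1.0, -0.4, -2.0]  # example fitness values of three chromosomes
    f_max, f_avg = max(fits), sum(fits) / len(fits)
    for f in fits:
        print(f, round(mutation_prob(f, f_max, f_avg), 3))
    rng = random.Random(0)
    print(mutate([0, 0, 1, 1, 1], mutation_prob(-2.0, f_max, f_avg), rng))
```

Under this rule the best individual gets crossover and mutation probabilities of zero and survives unchanged, while poor segmentations keep being perturbed; how the paper's own adaptive rule balances the two is not stated in the abstract.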