Computer Engineering and Applications, 2010, Vol. 46, Issue (20): 116-118. DOI: 10.3778/j.issn.1002-8331.2010.20.033

• Graphics, Images, and Pattern Recognition •

Lexicon driven approach for on-line handwritten Japanese disease name recognition

LIANG Jian-juan1, ZHU Bi-lan2, LIU Ben-yong1, NAKAGAWA Masaki2

  1. College of Computer Science and Information, Guizhou University, Guiyang 550025, China
  2. Tokyo University of Agriculture and Technology, Tokyo 184-8588, Japan
  • Received: 2010-01-13 Revised: 2010-04-01 Online: 2010-07-11 Published: 2010-07-11
  • Contact: LIANG Jian-juan

Abstract: This paper studies an effective lexicon-driven method for recognizing on-line handwritten Japanese disease names. The lexicon contains 21,713 disease name phrases, which are stored in a trie. In segmentation, an input on-line handwritten disease name string is over-segmented into primitive segments according to features such as the spatial information between adjacent strokes. One or more consecutive primitive segments then form a candidate character pattern. All combinations of candidate patterns are represented by a segmentation candidate lattice, in which each node denotes a segmentation point and each arc denotes a candidate character pattern. In recognition, a beam search finds the optimal segmentation and recognition result, while the trie-structured disease lexicon restricts the candidate character classes of each candidate character pattern. Tested on 500 actual handwritten disease name samples, the algorithm takes 0.87 seconds on average to process a disease name and achieves a recognition rate of 83.16%.

Key words: disease name recognition, lexicon driven recognition, handwritten character string recognition, beam search
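
The pipeline summarized in the abstract (a trie lexicon, a segmentation candidate lattice, and a lexicon-restricted beam search) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `build_trie`, `beam_search`, and `classify`, the parameters `max_span` and `beam_width`, the three-phrase toy lexicon, and the scores are all assumptions made for illustration. Segmentation points are numbered 0 to `num_segments`, and an arc from point `i` to point `j` stands for the candidate character pattern formed by primitive segments `i` to `j-1`.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    is_end: bool = False  # True if a lexicon phrase ends at this node


def build_trie(phrases):
    """Store every phrase of the lexicon in a character trie."""
    root = TrieNode()
    for phrase in phrases:
        node = root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def beam_search(num_segments, max_span, classify, root, beam_width=3):
    """Search the segmentation candidate lattice under a lexicon constraint.

    classify(i, j) is a hypothetical character recognizer: it returns a dict
    {character: score} for the pattern made of primitive segments i..j-1,
    where a higher score is better.
    """
    # beams[k] holds hypotheses (score, text, trie_node) ending at point k.
    beams = [[] for _ in range(num_segments + 1)]
    beams[0] = [(0.0, "", root)]
    for start in range(num_segments):
        # All arcs point forward, so beams[start] is complete here: prune it.
        beams[start] = sorted(beams[start], key=lambda h: h[0],
                              reverse=True)[:beam_width]
        for score, text, node in beams[start]:
            last = min(start + max_span, num_segments)
            for end in range(start + 1, last + 1):
                for ch, s in classify(start, end).items():
                    child = node.children.get(ch)
                    if child is None:
                        continue  # character class not allowed by the lexicon
                    beams[end].append((score + s, text + ch, child))
    # Only hypotheses that spell a complete lexicon phrase are accepted.
    finals = [h for h in beams[num_segments] if h[2].is_end]
    if not finals:
        return None
    best = max(finals, key=lambda h: h[0])
    return best[1], best[0]


# Toy data (assumed): three primitive segments and made-up recognizer scores.
lexicon = build_trie(["胃炎", "胃癌", "肺炎"])
toy_scores = {
    (0, 1): {"胃": -1.0, "肺": -2.0},
    (0, 2): {"月": -0.5},            # scores well but is not in the lexicon
    (1, 3): {"炎": -1.0, "癌": -3.0},
    (1, 2): {"火": -1.0},
    (2, 3): {"火": -1.0},
}
result = beam_search(3, 2, lambda i, j: toy_scores.get((i, j), {}), lexicon)
print(result)  # ('胃炎', -2.0)
```

In this toy run the pattern read as "月" has the best individual score, but it is discarded because no lexicon phrase begins with that character. That is the kind of restriction the trie is meant to provide during the search.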
