Computer Engineering and Applications, 2014, Vol. 50, Issue (1): 163-166.

Segmentation method for merged characters in CAPTCHA based on drop fall algorithm

LI Xingguo (1, 2), GAO Wei (1)

1. School of Management, Hefei University of Technology, Hefei 230009, China
2. Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei 230009, China
Online: 2014-01-01; Published: 2013-12-30

Abstract: Many studies have shown that, once the individual characters of a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) have been segmented, existing machine learning algorithms generally recognize them well. This paper presents a method for segmenting merged (touching) characters when recognizing CAPTCHAs in which characters touch. The method locates division points by combining statistics of character width with the minima of the vertical projection histogram, and then uses these points as the starting points of the drop fall algorithm to separate the merged characters. Experimental results show that the method applies generally to touching characters in CAPTCHAs and improves the recognition rate.
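The abstract says division points come from combining character-width statistics with minima of the vertical projection histogram, but it does not give the exact combination rule. The Python sketch below shows one plausible reading under stated assumptions: the input is a binary image stored as a list of rows (1 = ink, 0 = background), `avg_width` is an assumed average character width measured beforehand, and the hypothetical `radius` parameter bounds how far from each width-based guess the projection minimum is searched. The function names are illustrative, not taken from the paper.

```python
def vertical_projection(img):
    """Number of ink pixels (value 1) in each column of a binary image."""
    if not img:
        return []
    return [sum(row[c] for row in img) for c in range(len(img[0]))]


def find_split_points(img, avg_width, radius=2):
    """Return candidate split columns for a block of merged characters.

    avg_width: assumed average single-character width in pixels.
    radius: hypothetical half-width of the search window around each guess.
    """
    proj = vertical_projection(img)
    width = len(proj)
    n_chars = max(1, round(width / avg_width))  # characters expected in this block
    splits = []
    for k in range(1, n_chars):
        expected = round(k * width / n_chars)  # guess from width statistics alone
        lo = max(1, expected - radius)
        hi = min(width - 2, expected + radius)
        if lo > hi:  # block too narrow for a search window; keep the width guess
            splits.append(expected)
            continue
        # Projection minimum inside the window; ties go to the column nearest the guess.
        splits.append(min(range(lo, hi + 1), key=lambda c: (proj[c], abs(c - expected))))
    return splits


# Two hollow 4-pixel squares joined by a one-pixel bridge in row 2.
SAMPLE = [
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
]
print(vertical_projection(SAMPLE))             # [5, 2, 2, 5, 1, 1, 5, 2, 2, 5]
print(find_split_points(SAMPLE, avg_width=5))  # [5]
```

Restricting the search to a window around each width-based guess keeps a deep projection minimum inside a single character (for example, the hollow of an "O") from being chosen when it lies far from an expected character boundary.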

Key words: drop fall algorithm, CAPTCHA, merged characters, segmentation, vertical histogram projection

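The drop fall step can be illustrated with a simplified top-down variant. This is a sketch, not the paper's rule set: the neighbour priority (down, down-right, down-left, right, left, and otherwise cut straight down through the stroke) and the helper names `drop_fall` and `split_by_cuts` are assumptions chosen for illustration. The drop starts in the top row at a given column, which in the described method would be one of the division points, and the function returns the column at which the path leaves each row.

```python
def drop_fall(img, start_col):
    """Trace a simplified top-down drop fall path; return one cut column per row.

    img: binary image as a list of rows (1 = ink, 0 = background).
    start_col: starting column in the top row, e.g. a candidate division point.
    """
    h, w = len(img), len(img[0])

    def free(r, c):
        # Columns outside the image count as blocked.
        return 0 <= c < w and img[r][c] == 0

    cuts = [0] * h
    r, c = 0, start_col
    last_dx = 0  # last sideways step (-1 left, +1 right); forbids immediate reversal
    while r < h - 1:
        cuts[r] = c  # overwritten by sideways moves, so it ends as the exit column
        if free(r + 1, c):                      # 1) fall straight down
            r, last_dx = r + 1, 0
        elif free(r + 1, c + 1):                # 2) slide down-right
            r, c, last_dx = r + 1, c + 1, 0
        elif free(r + 1, c - 1):                # 3) slide down-left
            r, c, last_dx = r + 1, c - 1, 0
        elif free(r, c + 1) and last_dx != -1:  # 4) roll right along the stroke
            c, last_dx = c + 1, 1
        elif free(r, c - 1) and last_dx != 1:   # 5) roll left along the stroke
            c, last_dx = c - 1, -1
        else:                                   # 6) trapped: cut down through the ink
            r, last_dx = r + 1, 0
    cuts[h - 1] = c
    return cuts


def split_by_cuts(img, cuts):
    """Split an image into left/right parts: column < cut goes left, else right."""
    left = [[px if col < cut else 0 for col, px in enumerate(row)]
            for row, cut in zip(img, cuts)]
    right = [[px if col >= cut else 0 for col, px in enumerate(row)]
             for row, cut in zip(img, cuts)]
    return left, right


# Two hollow 4-pixel squares joined by a one-pixel bridge in row 2.
SAMPLE = [
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
]
cuts = drop_fall(SAMPLE, start_col=5)
print(cuts)  # [5, 4, 4, 4, 4]
left, right = split_by_cuts(SAMPLE, cuts)
```

Forbidding an immediate left/right reversal guarantees that the loop terminates: each iteration either moves down one row or continues sideways in a single direction, which the image width bounds.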