Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (3): 160-165. DOI: 10.3778/j.issn.1002-8331.1706-0304

Chinese character CAPTCHA recognition based on convolution neural network

FAN Wang, HAN Jungang, GOU Fan, LI Shuai   

  1. School of Postgraduate, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  • Online: 2018-02-01 Published: 2018-02-07

卷积神经网络识别汉字验证码

范  望,韩俊刚,苟  凡,李  帅   

  1. 西安邮电大学 研究生学院,西安 710121

Abstract: CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are now widely used in many fields. Automatic recognition of CAPTCHAs composed of English letters and Arabic numerals has already reached a high level of accuracy, whereas Chinese characters, because of their structural complexity, remain very difficult to recognize automatically with traditional methods. This paper proposes a CAPTCHA recognition method based on a convolutional neural network to improve character recognition accuracy. The model is built with the Keras convolutional neural network framework and uses multiple convolutional layers to extract deep image features. To improve its generalization, it is applied separately to Chinese-character CAPTCHAs and alphanumeric CAPTCHAs. Experimental results show that the single-character recognition rate for Chinese-character CAPTCHAs reaches 99.4%, and the recognition rate for conventional four-character alphanumeric CAPTCHAs reaches up to 99.3%. These results indicate that deep neural networks are highly effective at perceiving the complex structure of CAPTCHAs. Comparative experiments further show that the Keras framework performs better than other frameworks in CAPTCHA recognition.
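
As a rough illustration of what stacking convolutional layers means, the sketch below implements one convolution step, a ReLU and 2x2 max pooling in plain Python and applies the block twice. It is not the paper's Keras model: the 10x10 input, the two hand-picked kernels and all function names are assumptions made for this example, and a real network would learn its kernels from data.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation (the operation CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; an odd trailing row or column is dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


def conv_block(fmap: Matrix, kernel: Matrix) -> Matrix:
    """One convolution -> ReLU -> pooling stage."""
    return max_pool_2x2(relu(conv2d_valid(fmap, kernel)))


# Hypothetical 10x10 "image": a vertical stroke occupying columns 4 and 5.
image = [[1.0 if j in (4, 5) else 0.0 for j in range(10)] for _ in range(10)]

# Hypothetical fixed kernels (a trained network would learn these).
edge_kernel = [[-1.0, 0.0, 1.0]] * 3   # responds to a dark-to-bright edge, left to right
sum_kernel = [[1.0, 1.0, 1.0]] * 3     # adds up the responses in a 3x3 window

layer1 = conv_block(image, edge_kernel)   # 10x10 -> 8x8 -> 4x4
layer2 = conv_block(layer1, sum_kernel)   # 4x4 -> 2x2 -> 1x1

print(layer1)  # four rows of [0.0, 3.0, 0.0, 0.0]: the stroke's left edge
print(layer2)  # [[9.0]]: one value summarising the whole 10x10 input
```

The first block reacts only to a local pattern (an edge), while the single value from the second block depends on the entire input. This growth of the region each output "sees" is the usual sense in which deeper convolutional layers capture higher-level structure.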

Key words: CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart), Chinese character CAPTCHAs, CNN, Keras framework

摘要: 验证码今已广泛应用在各个领域,常见的英文字母与数字组合的验证码自动识别准确率已达到较高的水准,而汉字因其字符复杂,用传统方法进行自动识别难度很大。提出一种基于卷积神经网络的验证码自动识别方法来提高字符的识别准确率。采用Keras卷积神经网络框架,设计多层卷积来提取深层次图像信息,分别对汉字验证码和字母数字验证码进行识别,以提高模型的泛化性。实验结果表明用该方法汉字验证码的单字识别率已达到99.4%;传统四字符字母数字验证码的识别率最高达到99.3%。这一结果表明深度神经网络对验证码复杂结构的感知能力很强大,通过对比实验发现Keras框架在验证码识别领域有较好效果。

关键词: 验证码, 汉字验证码, CNN, Keras框架