Multi-Scale Deformable Transformer for Banknote Serial Number Recognition

doi:10.3778/j.issn.1002-8331.2211-0245

Abstract

Banknote serial number recognition plays an important role in supervising banknote circulation. In actual use, banknotes become deformed or contaminated, so their serial numbers take on the characteristics of irregular text. To meet the need to recognize such irregular serial numbers, a recognition method based on a multi-scale deformable Transformer is proposed. Serial number images are captured on a self-built banknote serial number detection platform and transmitted to a computer. A backbone network extracts feature maps from the serial number image and passes them to the encoder, where a multi-scale deformable attention mechanism further extracts multi-scale feature information of the text. A polygon bounding box detection mechanism is then applied: a candidate box generator extracts coarse bounding boxes of the text, which guide the regression training of the polygon bounding box coordinates in the position decoder. While the position decoder predicts the text bounding boxes, the character decoder predicts the characters, and the recognized serial number text is output. Experimental results show that the method meets the requirements of online detection and recognition of banknote serial numbers.

Key words: banknote serial number, deep learning, Transformer, character recognition
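
The multi-scale deformable attention step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it follows the general idea of deformable attention (a query samples a few points around a reference point on each feature level and combines them with softmax weights), and it assumes single-channel feature maps, a single attention head, and sampling offsets and attention scores that are passed in directly rather than predicted from the query. The function names `bilinear_sample`, `softmax`, and `multi_scale_deformable_attention` are hypothetical.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample a 2D map (list of rows) at normalised (x, y) in [0, 1]
    using bilinear interpolation; out-of-range points are clamped."""
    height, width = len(feature_map), len(feature_map[0])
    # Convert normalised coordinates to pixel coordinates and clamp.
    px = min(max(x * (width - 1), 0.0), width - 1.0)
    py = min(max(y * (height - 1), 0.0), height - 1.0)
    x0, y0 = int(math.floor(px)), int(math.floor(py))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = px - x0, py - y0
    top = (1 - wx) * feature_map[y0][x0] + wx * feature_map[y0][x1]
    bottom = (1 - wx) * feature_map[y1][x0] + wx * feature_map[y1][x1]
    return (1 - wy) * top + wy * bottom


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_scale_deformable_attention(feature_maps, reference_point, offsets, scores):
    """Combine sampled values from several feature levels for one query.

    feature_maps    : list of L single-channel maps (assumption: one channel)
    reference_point : normalised (x, y) shared by all levels
    offsets         : offsets[l][k] = (dx, dy), normalised sampling offsets
                      (assumption: given as input, not predicted from a query)
    scores          : scores[l][k], unnormalised attention logits
    """
    # Attention weights are normalised jointly over all levels and points.
    weights = softmax([s for level in scores for s in level])
    ref_x, ref_y = reference_point
    output = 0.0
    index = 0
    for level_map, level_offsets in zip(feature_maps, offsets):
        for dx, dy in level_offsets:
            value = bilinear_sample(level_map, ref_x + dx, ref_y + dy)
            output += weights[index] * value
            index += 1
    return output


if __name__ == "__main__":
    # Two levels of different resolution, two sampling points per level.
    fine = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    coarse = [[0.0, 1.0], [2.0, 3.0]]
    result = multi_scale_deformable_attention(
        [fine, coarse],
        reference_point=(0.5, 0.5),
        offsets=[[(0.1, 0.0), (-0.1, 0.0)], [(0.0, 0.1), (0.0, -0.1)]],
        scores=[[0.2, 0.1], [0.0, -0.3]],
    )
    print(result)
```

Because each query reads only a handful of sampled points per level instead of every position, the cost of this step grows with the number of sampling points rather than with the feature map size, which is the usual motivation for deformable attention on multi-scale features.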

ZHANG Kaisheng, LI Xuyang. Multi-Scale Deformable Transformer for Banknote Serial Number Recognition[J]. Computer Engineering and Applications, 2023, 59(18): 105-118.

张开生, 李旭洋. 多尺度可变形Transformer纸币序列号识别[J]. 计算机工程与应用, 2023, 59(18): 105-118.
