Live Barrage Text Recognition Based on Improved CRNN Network

doi:10.3778/j.issn.1002-8331.2304-0417

Abstract

In live-streaming e-commerce, analyzing the bullet screen comments sent by consumers can reveal, to a certain extent, whether the actual evaluation of a product matches the anchor's description, which offers useful guidance for regulating counterfeit and shoddy products in the live-streaming industry. To address the particular characteristics of bullet screen text recognition, this paper proposes a real-time bullet screen recognition network based on an improved CRNN (convolutional recurrent neural network), targeting the incomplete extraction of text features by the CRNN algorithm against complex backgrounds. The designed network adopts an encoder-decoder structure to strengthen the feature extraction module, which mitigates the feature loss caused by the small pixel area that bullet screen text occupies. A Transformer model then builds long-distance global feature relationships over the input frames to strengthen the network's ability to capture and extract bullet screen information. Finally, the extracted features are modeled as a sequence and transcribed to obtain the specific semantic content of the bullet screen. Experimental results show that the designed network reaches a detection accuracy of 0.926 on the test set, improving accuracy by 0.101 on average.

Key words: bullet screen text, deep learning, recurrent convolutional network, Transformer model
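
As a rough illustration of two steps named in the abstract, the sketch below shows (1) how self-attention lets every position in a feature sequence attend to every other position, which is the mechanism behind a Transformer's long-distance global feature relationships, and (2) how per-frame class scores can be transcribed into a text string. It is not the authors' implementation. The identity query/key/value projections, the toy character set, the blank index 0, and the use of CTC-style greedy decoding (the transcription scheme commonly paired with CRNN) are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the input features themselves
    (identity projections); a trained Transformer would learn these projections.
    """
    d = len(features[0])
    out = []
    for q in features:
        # Score this position against every position in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights = softmax(scores)
        # Each output vector is a weighted mix of all positions.
        out.append([sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)])
    return out


def ctc_greedy_decode(frame_probs, charset, blank=0):
    """Take the best class per frame, merge consecutive repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = blank
    for idx in best:
        if idx != blank and idx != prev:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


# Hypothetical toy inputs: two feature vectors and five frames of class scores.
toy_features = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(toy_features)

toy_charset = ["<blank>", "o", "k"]  # index 0 is reserved for the blank symbol
toy_frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.1, 0.7],
    [0.1, 0.1, 0.8],
]
decoded = ctc_greedy_decode(toy_frames, toy_charset)
```

In this toy run, `decoded` is the string "ok": the repeated "o" and "k" frames collapse to one character each, and the blank frame between them is dropped. The blank symbol is also what allows genuinely doubled characters to survive decoding, since a blank frame between two identical predictions prevents them from being merged.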

摘要： 在直播带货场景中，通过分析消费者发送的弹幕信息，能够在一定程度上反映出当前商品的实际评价是否与主播的描述一致，对直播行业中假冒伪劣产品的监管具有重要指导意义。针对弹幕文本识别的特殊性，提出了一种基于改进CRNN（convolutional recurrent neural network）的实时弹幕识别网络，以解决CRNN算法对于复杂背景环境下的文本特征信息提取不全等问题。为此所设计的网络采用了编解码结构对特征提取模块进行强化设计，以解决弹幕像素区域小造成的特征提取过程中的特征丢失问题。使用Transformer模型对输入的帧画面构建长距离全局特征关系，以强化网络模型对弹幕信息的捕捉能力，并对提取的特征信息进行序列建模及转录得到具体的弹幕语义信息。实验结果表明，所设计的网络在测试集上检测精度高达0.926，平均精度值提高了0.101。

关键词: 文本识别, 深度学习, 循环卷积网络, Transformer模型

ZHANG Rongze, WANG Xiuhui. Live Barrage Text Recognition Based on Improved CRNN Network[J]. Computer Engineering and Applications, 2024, 60(15): 143-149.

张荣泽, 王修晖. 改进CRNN网络的直播弹幕文本识别[J]. 计算机工程与应用, 2024, 60(15): 143-149.
