计算机工程与应用 (Computer Engineering and Applications) 2024, Vol. 60, Issue (15): 143-149. DOI: 10.3778/j.issn.1002-8331.2304-0417

• Pattern Recognition and Artificial Intelligence •


Live Barrage Text Recognition Based on Improved CRNN Network

ZHANG Rongze, WANG Xiuhui   

  1. College of Information Engineering, China Jiliang University, Hangzhou 310018, China
  • Online:2024-08-01 Published:2024-07-30


Abstract: In live-streaming e-commerce, analyzing the bullet-screen comments sent by consumers can, to a certain extent, reveal whether the actual evaluation of a commodity matches the anchor's description, which provides important guidance for regulating counterfeit and shoddy products in the live-streaming industry. To address the special characteristics of bullet-screen text recognition, this paper proposes a real-time bullet-screen recognition network based on an improved CRNN (convolutional recurrent neural network), aiming to solve the problem that the original CRNN extracts incomplete text features in complex background environments. The designed network strengthens the feature extraction module with an encoder-decoder structure, alleviating the feature loss caused by the small pixel area of bullet-screen text. A Transformer model is then used to build long-range global feature relationships over the input frames, enhancing the network's ability to capture bullet-screen information; the extracted features are sequence-modeled and transcribed to obtain the concrete bullet-screen semantics. Experimental results show that the designed network achieves a detection accuracy of 0.926 on the test set, an improvement of 0.101 in mean average precision.
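The abstract states that the extracted features are sequence-modeled and then transcribed into concrete text. In CRNN-style recognizers this transcription step is commonly implemented as greedy CTC decoding, though the paper's abstract does not specify the exact method. A minimal pure-Python sketch of greedy CTC decoding (the blank index, charset, and probability values below are illustrative assumptions, not taken from the paper):

```python
# Greedy CTC transcription sketch: take the argmax label at each
# time step, collapse consecutive repeats, then drop blanks.
# BLANK index 0 and the toy charset are assumptions for illustration.

BLANK = 0  # CTC blank label index (assumed convention)

def ctc_greedy_decode(frame_probs, charset):
    """frame_probs: T rows of per-class probabilities (T x C),
    where class 0 is the blank and class i maps to charset[i-1]."""
    # 1. best label per frame (argmax over each row)
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_probs]
    # 2. collapse consecutive repeats, then remove blanks
    out = []
    prev = None
    for idx in best:
        if idx != prev and idx != BLANK:
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)

# Toy example: charset "ab", 5 frames of class probabilities
probs = [
    [0.10, 0.80, 0.10],  # 'a'
    [0.10, 0.70, 0.20],  # 'a' (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank (separates repeated characters)
    [0.10, 0.60, 0.30],  # 'a' again (new character after blank)
    [0.10, 0.20, 0.70],  # 'b'
]
print(ctc_greedy_decode(probs, "ab"))  # prints "aab"
```

The blank label is what lets the decoder distinguish a genuinely repeated character (separated by a blank, as in the third frame above) from one character held across several frames.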

Key words: bullet screen text, deep learning, recurrent convolutional network, Transformer model