Trademark Detection and Classification Based on YOLO-FGE

doi:10.3778/j.issn.1002-8331.2305-0513

Abstract

Trademarks come in numerous styles, appear against complex backgrounds, and vary greatly in scale. To address these problems, a YOLO-FGE network model based on the YOLOv5 framework is proposed to distinguish trademark categories more accurately. First, a feature enhancement module is proposed to improve the adaptability of the feature layers to different kinds of trademarks, so that the network pays more attention to the useful information of the trademarks to be detected. Second, a global attention module is embedded in the C3 module of YOLOv5 to optimize the backbone and neck networks. Finally, an enhanced spatial attention module is proposed, which uses dilated convolution to expand the receptive field and combines channel attention with a Transformer module to improve detection accuracy. Experimental results on a graphic trademark dataset show that the model raises mAP to 92.3%, a higher detection accuracy than most existing methods.

Keywords: trademark detection, feature enhancement, global attention, spatial attention
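
The abstract states that the enhanced spatial attention module uses dilated convolution to expand the receptive field. The sketch below illustrates only that general effect, using the standard formula for the span of a dilated kernel, k + (k - 1)(d - 1). The layer stacks, the dilation rates 1, 2, 4, and the function names are assumptions chosen for illustration; they are not taken from the paper and do not reproduce the YOLO-FGE module.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel along a single axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field along one axis of a stack of convolution layers.

    `layers` is a list of (kernel_size, dilation, stride) tuples.
    """
    rf, jump = 1, 1  # jump = spacing of adjacent outputs measured in input pixels
    for kernel_size, dilation, stride in layers:
        rf += (effective_kernel_size(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks: three 3x3 convolutions, without and with dilation.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of 3x3 kernels, and therefore the same number of weights, the dilated stack in this example covers 15 input pixels per axis instead of 7, which is the sense in which dilation enlarges the receptive field without adding parameters.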

MIAO Chunyuan, WANG Xiuhui. Trademark Detection and Classification Based on YOLO-FGE[J]. Computer Engineering and Applications, 2024, 60(20): 233-243.
