Computer Engineering and Applications, 2024, Vol. 60, Issue (18): 248-255. DOI: 10.3778/j.issn.1002-8331.2306-0219

• Graphics and Image Processing •

Object Detection Algorithm for Thangka and Mural with Improved YOLOX

LI Hongyun, ZHANG Xiaojuan, ZHAO Yang, PENG Chunyan

  1. School of Computer Science, Qinghai Normal University, Xining 810016, China
  2. State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810016, China
  3. School of Computer and Information, Hefei University of Technology, Hefei 230002, China
  • Publication date: 2024-09-15  Release date: 2024-09-13

Abstract: Regong Thangka and murals are a distinctive art form in Tibetan culture, recognized as intangible cultural heritage both of humanity and at the national level. They depict Buddhist Jataka tales and also reflect the history, geography, culture, and science and technology of the Tibetan region. However, people without specialized knowledge of Regong arts find them difficult to understand. To help disseminate Thangka and murals, this paper proposes an automatic detection algorithm for Thangka and mural elements. The YOLOX algorithm is improved to obtain the ECAMH-YOLOX model, which detects objects in Thangka and mural images. ECAMH-YOLOX adds an efficient channel attention (ECA) module to YOLOX, which captures better global image information while keeping the model lightweight. To better detect objects at different scales, a new detection head is added to the detection head module, so that detection is performed by four detection heads, improving results for objects of various sizes. The SIoU loss function is used to compute the regression loss, which accelerates model convergence and improves model performance. Experimental results show that ECAMH-YOLOX produces no missed or false detections on the constructed Thangka and mural dataset, whereas the YOLOX algorithm misses small objects. In addition, ECAMH-YOLOX achieves an mAP0.5:0.95 of 55.9%, which is 4.9 percentage points (0.049) higher than the YOLOX algorithm. The proposed model improves detection performance while remaining lightweight, and it gives people another way to learn about Regong arts.

Key words: object detection, YOLOX, Thangka, mural, channel attention
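
The abstract names efficient channel attention as one of the additions to YOLOX but does not describe where the module is placed or how it is configured. As a rough illustration only, the following dependency-free Python sketch shows the general ECA computation as it is commonly formulated: global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling. The function names, the defaults `gamma=2` and `b=1`, and the example weights are assumptions made for this sketch, not details taken from the paper.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: (log2(C) + b) / gamma, truncated, then forced odd.

    gamma and b are assumed defaults for this sketch.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca(feature_map, weights):
    """Apply channel attention to a feature map.

    feature_map: list of C channels, each an H x W list of lists of floats.
    weights: 1D convolution kernel of odd length k (hypothetical values here;
             in a trained network these would be learned).
    Returns (rescaled_feature_map, gates).
    """
    c = len(feature_map)
    k = len(weights)
    pad = k // 2

    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

    # 2) 1D convolution across the channel axis (zero padding, no bias),
    #    so each channel interacts only with its k nearest neighbours.
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(weights[j] * padded[i + j] for j in range(k))
        for i in range(c)
    ]

    # 3) Sigmoid turns each mixed descriptor into a gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4) Rescale every value in a channel by that channel's gate.
    out = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return out, gates
```

Because the only parameters are the `k` kernel weights, the module adds very little to the model size, which is consistent with the abstract's emphasis on keeping the detector lightweight.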