Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (24): 222-234. DOI: 10.3778/j.issn.1002-8331.2407-0132

• Graphics and Image Processing •

Improved YOLOv8-Based Algorithm for Classroom Behavior Recognition of Students: DMS-YOLOv8

CHEN Chen, BAO Wenxing, CHEN Xu, JING Yongjun, LI Weijun

  1. School of Computer Science and Engineering, Northern University for Nationalities, Yinchuan 750021, China
  • Online: 2024-12-15 Published: 2024-12-12

Abstract: In smart classrooms, students in the front and back rows appear at very different image sizes, and the small targets in the back rows are difficult to detect. To address these problems, an improved YOLOv8 method for recognizing student classroom behavior, DMS-YOLOv8, is proposed. First, dynamic channel attention convolution (DCAConv) is proposed, which combines the CA attention mechanism with depthwise convolution to adjust channel weights dynamically and capture key features more sensitively. Second, multi-scale convolutional attention (MSCA) is introduced, which uses element-wise multiplication to make full use of multi-scale convolutional features and strengthen attention to spatial detail. In addition, a multi-scale context fusion (LCD) module is constructed, which uses convolution and self-attention mechanisms to enhance multi-scale feature fusion. Finally, a small target detection layer is added; by extracting local features from larger feature maps, it markedly improves the model's ability to recognize the behavior of back-row students. Compared with the baseline YOLOv8n model, the method improves mAP50 by 4.6 percentage points on a self-built student behavior dataset and by 18.7 percentage points on the VOC dataset. The method performs well in student classroom behavior recognition and can significantly improve recognition accuracy in smart classrooms.

Key words: student behavior recognition, YOLOv8, object detection, dynamic channel attention convolution, multi-scale context fusion
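
The abstract attributes part of the gain for back-row students to a small target detection layer that works on a larger feature map. The short Python sketch below illustrates the arithmetic behind that idea. It is not the authors' code: the 640-pixel input, the 16-pixel object, and the extra stride-4 head are assumptions chosen for illustration, and only the strides 8, 16, and 32 are the standard YOLOv8 detection-head strides.

```python
# Illustrative arithmetic only. The stride-4 head, the 640-pixel input, and
# the 16-pixel object are assumptions, not values stated in the abstract.
DEFAULT_STRIDES = (8, 16, 32)  # standard YOLOv8 detection-head strides
EXTRA_STRIDE = 4               # hypothetical small-target detection head


def grid_size(input_size: int, stride: int) -> int:
    """Side length of the feature map produced at the given stride."""
    return input_size // stride


def cells_covered(object_px: int, stride: int) -> float:
    """Approximate number of feature-map cells spanned by a square object."""
    return (object_px / stride) ** 2


def summarize(input_size: int, object_px: int, strides) -> list:
    """Return (stride, grid side, cells covered) for each detection head."""
    return [
        (s, grid_size(input_size, s), cells_covered(object_px, s))
        for s in strides
    ]


if __name__ == "__main__":
    # A hypothetical back-row student occupying about 16x16 pixels.
    heads = (EXTRA_STRIDE,) + DEFAULT_STRIDES
    for stride, grid, cells in summarize(640, 16, heads):
        print(f"stride {stride:>2}: {grid}x{grid} grid, "
              f"object spans about {cells:.2f} cells")
```

Under these assumptions the object spans 16 cells on the stride-4 map but only a quarter of a cell on the stride-32 map, which is one way to see why a head on a larger feature map can help with small, distant targets.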