Computer Engineering and Applications ›› 2026, Vol. 62 ›› Issue (8): 130-141. DOI: 10.3778/j.issn.1002-8331.2509-0032

• Special Topic on Object Detection •

MCT-YOLO: Small Object Detection Algorithm for Aerial Imagery

ZHOU Zhongkun+, ZHANG Yang, ZHANG Yu, MENG Ke, YUAN Zhaoyang

  1. School of Railway Intelligent Engineering, Dalian Jiaotong University, Dalian, Liaoning 116028, China
    + Corresponding author E-mail: djtu_zzkun@163.com
  • Received: 2025-09-03 Revised: 2026-01-15 Online: 2026-04-15 Published: 2026-04-15
  • Funding:
    General Program of the Natural Science Foundation of Liaoning Province (2021-MS-298); Scientific Research Project of the Education Department of Liaoning Province (JDL2020006).

Abstract: In intelligent transportation scenarios assisted by unmanned aerial vehicles (UAVs), targets in the captured images are small, low in resolution, and densely distributed, which leads to low detection accuracy. To address this problem, MCT-YOLO, a small object detection algorithm for aerial views based on an improved YOLOv11n, is proposed. First, the detection layer structure is optimized: a detection head for very small objects is added and the large-object detection head is removed, so that more local information is captured and feature interference is reduced, adapting the network to small object detection. Second, a multi-frequency interactive downsampling (MFID) module is proposed to reduce feature loss during downsampling and retain more image information. Third, the MobiVari structure is introduced to build the C3k2_MV module, which performs deep feature extraction and strengthens the representation of fine details. Finally, a multi-scale information fusion method, triple complementary fusion (TCF), is designed; it fuses contextual information at different semantic levels so that semantic and detail information complement each other, improving small object detection capability. On the VisDrone2019-DET dataset, the proposed algorithm achieves an mAP50 of 40.9% and an mAP50-95 of 25.0%, which are 7.7 and 5.8 percentage points higher, respectively, than the baseline network YOLOv11n. The algorithm's parameter count is also reduced by 7.6%, making it suitable for small object detection in UAV aerial imagery.

Key words: unmanned aerial vehicle (UAV), small object detection, YOLOv11, feature enhancement, multi-scale information fusion
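
The abstract reports gains in percentage points rather than in relative terms, so the baseline scores are implied but not stated. The short sketch below works through that arithmetic using only the figures quoted in the abstract. The function names are hypothetical, and the derived baseline and relative-gain values are computed here, not reported by the authors; they also assume the quoted figures are rounded to one decimal place.

```python
# Illustrative arithmetic only. Inputs are the figures quoted in the abstract;
# the outputs are derived here and are not values reported by the authors.

def implied_baseline(improved_pct: float, gain_pp: float) -> float:
    """Baseline score (in %) implied by an improved score and a gain in percentage points."""
    return round(improved_pct - gain_pp, 1)


def relative_gain(improved_pct: float, gain_pp: float) -> float:
    """Relative improvement (in %) of the improved score over the implied baseline."""
    return round(100.0 * gain_pp / (improved_pct - gain_pp), 1)


# metric name -> (reported MCT-YOLO score in %, reported gain in percentage points)
reported = {"mAP50": (40.9, 7.7), "mAP50-95": (25.0, 5.8)}

# metric name -> (implied YOLOv11n baseline in %, relative gain in %)
summary = {
    name: (implied_baseline(score, gain), relative_gain(score, gain))
    for name, (score, gain) in reported.items()
}

for name, (baseline, rel) in summary.items():
    print(f"{name}: implied baseline {baseline}%, relative gain {rel}%")
```

Keeping the two notions apart matters when comparing against other papers: a gain of 7.7 percentage points on mAP50 is a considerably larger relative improvement than the raw number suggests, because the implied baseline is itself low.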