Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (19): 214-222. DOI: 10.3778/j.issn.1002-8331.1812-0352


Multimodal Pedestrian Detection Algorithm Based on Fusion Feature Pyramids

TONG Jingran, MAO Li, SUN Jun   

  1. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2019-10-01    Published: 2019-09-30

Abstract: To address the poor performance of single-modality pedestrian detection under poor lighting, partial target occlusion, and large scale variation, this paper proposes a multimodal pedestrian detection algorithm based on the fusion of visible and infrared feature pyramids. Deep convolutional neural networks replace traditional hand-crafted features and automatically extract features from the visible and infrared images. A feature pyramid network is built on the stage-wise feature maps of ResNet (Residual Network) to generate a feature pyramid for each modality, and the two pyramids are fused layer by layer into a fused feature pyramid. Faster R-CNN is adopted as the subsequent target localization and classification algorithm to solve the multispectral pedestrian detection task. In addition, because concatenation fusion and max fusion tend to ignore weak features and cannot effectively integrate complementary features, a feature-sharpening pyramid fusion method is proposed: it highlights strong features and superimposes complementary weak features according to a threshold, making effective use of the features of each modality and further improving detection performance. Experiments show that the proposed algorithm effectively solves the multimodal pedestrian detection problem and outperforms state-of-the-art multimodal pedestrian detectors on the KAIST dataset benchmark.
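The dual-stream design described above can be illustrated with a minimal sketch. The PyTorch code below is not the authors' released implementation: the module names, the threshold value, the exact fusion rule, and the use of torchvision's resnet_fpn_backbone helper (torchvision >= 0.13 assumed) are all assumptions for illustration. It builds two ResNet-50 + FPN streams for the visible and infrared inputs and fuses the two pyramids level by level, keeping strong responses and summing weak, complementary ones.

    import torch
    import torch.nn as nn
    # Assumes torchvision >= 0.13, where resnet_fpn_backbone takes a "weights" keyword.
    from torchvision.models.detection.backbone_utils import resnet_fpn_backbone


    class ThresholdFusion(nn.Module):
        """Fuse two same-shaped feature maps: where the element-wise maximum exceeds
        the threshold, keep the stronger (sharpened) response; elsewhere, sum the two
        maps so weak but complementary features are not discarded."""

        def __init__(self, threshold: float = 0.5):
            super().__init__()
            self.threshold = threshold  # illustrative value, not taken from the paper

        def forward(self, feat_vis: torch.Tensor, feat_ir: torch.Tensor) -> torch.Tensor:
            strong = torch.maximum(feat_vis, feat_ir)   # emphasised strong features
            complementary = feat_vis + feat_ir          # superposed weak features
            mask = (strong > self.threshold).float()
            return mask * strong + (1.0 - mask) * complementary


    class DualStreamFPN(nn.Module):
        """Two ResNet-50 + FPN streams whose feature pyramids are fused level by level."""

        def __init__(self):
            super().__init__()
            self.backbone_vis = resnet_fpn_backbone(backbone_name="resnet50", weights=None)
            self.backbone_ir = resnet_fpn_backbone(backbone_name="resnet50", weights=None)
            self.fusion = ThresholdFusion(threshold=0.5)

        def forward(self, img_vis: torch.Tensor, img_ir: torch.Tensor):
            pyr_vis = self.backbone_vis(img_vis)  # dict of pyramid levels ('0'..'3', 'pool')
            pyr_ir = self.backbone_ir(img_ir)
            # Fuse the two pyramids level by level into a single fused pyramid.
            return {level: self.fusion(pyr_vis[level], pyr_ir[level]) for level in pyr_vis}


    if __name__ == "__main__":
        model = DualStreamFPN()
        vis = torch.randn(1, 3, 512, 640)  # visible image
        ir = torch.randn(1, 3, 512, 640)   # infrared image replicated to 3 channels
        fused = model(vis, ir)
        print({k: tuple(v.shape) for k, v in fused.items()})

The fused pyramid produced this way could then be passed to a standard Faster R-CNN head for localization and classification, as the abstract describes; that stage is omitted here for brevity.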

Key words: pedestrian detection, multimodal, feature pyramid, feature fusion
