Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (4): 214-218. DOI: 10.3778/j.issn.1002-8331.1810-0363


Crowd Counting Combined with Neural Networks and Multi-Column Feature Map Aggregation

WU Qingke, WU Xiao, YUAN Yuyang, GUAN Xinqiang   

  1. School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China
  • Online: 2020-02-15  Published: 2020-03-06

结合神经网络与多列特征图聚合的人群计数

吴青科,吴晓,袁雨阳,官锌强   

  1. 西南交通大学 机械工程学院,成都 610031

Abstract:

In public security, image-based crowd counting has significant social value and broad application prospects. The main difficulties are crowd occlusion, uneven density distribution, background noise, and the wide variation in the scale and appearance of people in a scene. A deep convolutional neural network structure is proposed. On the one hand, a VGG16-like network learns the deep semantic information in the image; on the other hand, a multi-column network learns features for heads of various sizes. Fusing the feature maps produced by branch networks with different receptive field sizes and depths effectively captures both the low-level detail features and the high-level semantic information in the image. The crowd count is computed by combining these two parts. On the ShanghaiTech dataset, the mean absolute errors on Part_A and Part_B are 72.0 and 10.1, and the mean square errors are 107.9 and 16.0, respectively.
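The reported metrics can be illustrated with a minimal sketch. It assumes that the network outputs a density map whose pixel values sum to the estimated head count, which is the usual density-map formulation suggested by the key words. The abstract does not define its error measures; crowd counting papers commonly report the root of the mean squared count error under the name MSE, so the sketch takes that convention as an assumption and exposes it as a flag. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math


def count_from_density(density_map):
    """Estimated count = sum of all pixel values in a 2-D density map."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    errors = [abs(p - t) for p, t in zip(pred_counts, true_counts)]
    return sum(errors) / len(errors)


def mse(pred_counts, true_counts, root=True):
    """Mean squared count error.

    Assumption: root=True follows the common crowd counting convention
    of reporting the square root of the mean squared error as "MSE".
    """
    squared = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    mean_squared = sum(squared) / len(squared)
    return math.sqrt(mean_squared) if root else mean_squared


# Toy data (hypothetical): two tiny "density maps" and their true counts.
density_maps = [
    [[0.5, 0.5], [1.0, 0.0]],      # pixel values sum to 2.0
    [[0.25, 0.25], [0.25, 0.25]],  # pixel values sum to 1.0
]
true_counts = [3, 1]

pred_counts = [count_from_density(d) for d in density_maps]
print("predicted counts:", pred_counts)
print("MAE:", mae(pred_counts, true_counts))
print("MSE (root form):", mse(pred_counts, true_counts))
```

Both metrics are averaged over test images, so they measure per-image count error rather than pixel-level density error.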

Key words: crowd counting, convolutional neural network, deep learning, density map

摘要:

在公共安全领域,基于图像的人群计数具有重要的社会意义和应用前景,难题在于人群遮挡、密度分布不均、背景噪声和人在场景中的尺度和外观变化范围大。提出一种深度卷积神经网络结构,一方面使用类似于VGG16的网络结构来学习图片中的深层语义信息,另一方面使用多列神经网络来学习各种头部尺寸的特征信息。将拥有不同大小感受野和深度的分支网络得到的特征图融合在一起,可有效地收集到图片中的底层细节特征和高层语义信息。通过将这两部分结合在一起计算人群数量。在ShanghaiTech数据集上测试,Part_A和Part_B的平均绝对误差分别为72.0和10.1;Part_A和Part_B的均方误差分别为107.9和16.0。

关键词: 人群计数, 卷积神经网络, 深度学习, 密度图