Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (11): 33-46.DOI: 10.3778/j.issn.1002-8331.2111-0281

• Research Hotspots and Reviews •

Survey on Research of Crowd Counting

LU Zhenkun, LIU Sheng, ZHONG Le, LIU Shaohang, ZHANG Tian   

  1. School of Electronic Information, Guangxi University for Nationalities, Nanning 530000, China
  • Online: 2022-06-01 Published: 2022-06-01

人群计数研究综述

卢振坤,刘胜,钟乐,刘绍航,张甜   

  1. 广西民族大学 电子信息学院,南宁 530000

Abstract: Crowd counting is widely used in public security, video surveillance, smart city construction, and other fields. It plays an important role in controlling the number of people in specific places, directing public transportation, preventing the spread of epidemics, and ensuring social stability. Traditional counting methods suffer from low accuracy and are limited in the scenes they can handle; with the development of deep learning, they are gradually being replaced by convolutional neural network (CNN) methods. This paper introduces the research background, current state, and development trends of crowd counting, and describes two traditional methods. It then analyzes CNN methods in terms of counting accuracy, network structure, evaluation metrics, and datasets, and finds that CNN techniques can effectively address multi-scale and cross-scene problems. Weakly supervised counting methods based on Vision Transformer (ViT) sequences are described, and the various methods are compared. Finally, future research directions for crowd counting are discussed.

Key words: crowd counting, convolutional neural network (CNN), Vision Transformer (ViT) sequence, density estimation

摘要: 人群计数广泛应用在公共安防、视频监控和智慧城市建设等领域,对控制特定场所人数、指挥公共交通、防止疫情蔓延、保障社会稳定具有重要积极意义。传统的计数方法精度不高、场景受限,随着深度学习的发展,传统方法逐渐被卷积神经网络(convolutional neural network,CNN)方法代替。介绍了人群计数的研究背景、现状和发展趋势,叙述了两种传统方法;从计数精度、网络结构、评价指标和数据集等方面重点分析了CNN方法,发现CNN技术可以有效解决多尺度和跨场景等问题;阐述了基于Vision Transformer(ViT)序列的弱监督计数方法并且对比各类方法。对未来人群计数的研究前景做出展望。

关键词: 人群计数, 卷积神经网络, Vision Transformer(ViT)序列, 密度估计