Survey on Research of Crowd Counting

doi:10.3778/j.issn.1002-8331.2111-0281

Abstract

Abstract: Crowd counting is widely used in public security, video surveillance, smart city construction, and other fields. It plays an important role in controlling the number of people in specific places, directing public transportation, preventing the spread of epidemics, and maintaining social stability. Traditional counting methods have low accuracy and are limited to particular scenes; with the development of deep learning, they are gradually being replaced by convolutional neural network (CNN) methods. This paper introduces the research background, current status, and development trends of crowd counting, and describes two traditional methods. It then analyzes CNN methods in terms of counting accuracy, network structure, evaluation metrics, and datasets, and finds that CNN techniques can effectively address multi-scale and cross-scene problems. The weakly supervised counting method based on Vision Transformer (ViT) sequences is described, and the various methods are compared. Finally, future research prospects for crowd counting are discussed.

Key words: crowd counting, convolutional neural network (CNN), Vision Transformer (ViT) sequence, density estimation

LU Zhenkun, LIU Sheng, ZHONG Le, LIU Shaohang, ZHANG Tian. Survey on Research of Crowd Counting[J]. Computer Engineering and Applications, 2022, 58(11): 33-46.
