MEI Siyi, LIU Yanlong. Fusion of Sparse Attention and Time Query for Video Object Detection[J]. Computer Engineering and Applications, 2023, 59(20): 192-199.
[1] WANG D C,BAI C S,WU K J.Survey of video object detection based on deep learning[J].Journal of Frontiers of Computer Science and Technology,2021,15(9):1563-1577.
[2] JIA T H,PENG L.SSD object detection algorithm with residual learning and cyclic attention[J].Computer Science,2023,50(5):170-176.
[3] XIAO Y Q,YANG H M.Research on application of object detection algorithm in traffic scene[J].Computer Engineering and Applications,2021,57(6):30-41.
[4] ZHU X,WANG Y,DAI J,et al.Flow-guided feature aggregation for video object detection[C]//Proceedings of the IEEE International Conference on Computer Vision,2017:408-417.
[5] YU W Q,YU J,SHI X Q,et al.Dual optical flow network-guided video object detection[J].Journal of Image and Graphics,2021,26(10):2473-2484.
[6] KANG K,OUYANG W L,LI H S,et al.Object detection from video tubelets with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2016:817-825.
[7] DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[8] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[J].arXiv:1706.03762,2017.
[9] CARION N,MASSA F,SYNNAEVE G,et al.End-to-end object detection with transformers[C]//European Conference on Computer Vision,2020:213-229.
[10] ZHU X,SU W,LU L,et al.Deformable DETR:deformable transformers for end-to-end object detection[J].arXiv:2010.04159,2020.
[11] ZHU X,XIONG Y,DAI J,et al.Deep feature flow for video recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2017:2349-2358.
[12] YAO C,FANG C,SHEN X,et al.Video object detection via object-level temporal aggregation[C]//European Conference on Computer Vision,2020:160-177.
[13] JIANG Z,LIU Y,YANG C,et al.Learning where to focus for efficient video object detection[C]//European Conference on Computer Vision,2020:18-34.
[14] FUJITAKE M,SUGIMOTO A.Video sparse transformer with attention-guided memory for video object detection[J].IEEE Access,2022,10:65886-65900.
[15] ZHOU Q,LI X,HE L,et al.TransVOD:end-to-end video object detection with spatial-temporal transformers[J].arXiv:2201.05047,2022.
[16] WANG H,TANG J,LIU X,et al.PTSEFormer:progressive temporal-spatial enhanced transformer towards video object detection[C]//European Conference on Computer Vision,2022:732-747.
[17] ZHAO G,LIN J,ZHANG Z,et al.Explicit sparse transformer:concentrated attention through explicit selection[J].arXiv:1912.11637,2019.
[18] RUSSAKOVSKY O,DENG J,SU H,et al.ImageNet large scale visual recognition challenge[J].International Journal of Computer Vision,2015,115(3):211-252.
[19] WEN L,DU D,CAI Z,et al.UA-DETRAC:a new benchmark and protocol for multi-object detection and tracking[J].Computer Vision and Image Understanding,2020,193:102907.
[20] CHEN Y,CAO Y,HU H,et al.Memory enhanced global-local aggregation for video object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2020:10337-10346.
[21] HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2016:770-778.
[22] LIU Z,LIN Y,CAO Y,et al.Swin transformer:hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision,2021:10012-10022.
[23] DENG J,PAN Y,YAO T,et al.Relation distillation networks for video object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision,2019:7023-7032.
[24] SHVETS M,LIU W,BERG A.Leveraging long-range temporal relationships between proposals for video object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision,2019:9756-9764.
[25] WU H,CHEN Y,WANG N,et al.Sequence level semantics aggregation for video object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision,2019:9217-9225.
[26] XU Z,HRUSTIC E,VIVET D.CenterNet heatmap propagation for real-time video object detection[C]//European Conference on Computer Vision,2020:220-234.
[27] KIM K,KIM P,CHUNG Y,et al.Performance enhancement of YOLOv3 by adding prediction layers with spatial pyramid pooling for vehicle detection[C]//Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance(AVSS),2018:1-6.
[28] KIM K,KIM P,CHUNG Y,et al.Multi-scale detector for accurate vehicle detection in traffic surveillance data[J].IEEE Access,2019,7:78311-78319.
[29] PERREAULT H,BILODEAU G,SAUNIER N,et al.SpotNet:self-attention multi-task network for object detection[C]//Proceedings of the IEEE International Conference on Computer and Robot Vision(CRV),2020:230-237.
[30] PERREAULT H,BILODEAU G,SAUNIER N,et al.FFAVOD:feature fusion architecture for video object detection[J].Pattern Recognition Letters,2021,151:294-301.
[31] GUO C,FAN B,GU J,et al.Progressive sparse local attention for video object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision,2019:3909-3918.
[32] CHIN T,DING R,MARCULESCU D.AdaScale:towards real-time video object detection using adaptive scaling[J].Proceedings of Machine Learning and Systems,2019,1:431-441.
[33] WANG S,ZHOU Y,YAN J,et al.Fully motion-aware network for video object detection[C]//Proceedings of the European Conference on Computer Vision(ECCV),2018:542-557.
[34] CHEN K,WANG J,YANG S,et al.Optimizing video object detection via a scale-time lattice[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2018:7814-7823.
[35] HAN M,WANG Y,CHANG X,et al.Mining inter-video proposal relations for video object detection[C]//European Conference on Computer Vision,2020:431-446.
[36] BERTASIUS G,WANG H,TORRESANI L.Is space-time attention all you need for video understanding?[C]//Proceedings of the International Conference on Machine Learning,2021:813-824.