
Computer Engineering and Applications, 2025, Vol. 61, Issue (11): 1-21. DOI: 10.3778/j.issn.1002-8331.2411-0196
LIU Hongda, SUN Xuhui, LI Yibin, HAN Lin, ZHANG Yu
Online: 2025-06-01
Published: 2025-05-30
Abstract: Image classification with neural network models has long been an important research direction, and as deep learning techniques advance, the demands placed on these models keep rising: beyond high recognition accuracy, models are also expected to have few parameters and short training times. Convolutional neural networks (CNNs) have remained the mainstream deep learning approach to image processing. This review traces the development of CNN-based classification models and analyzes the design ideas behind the representative models of each stage; it then surveys models that combine Transformers with CNNs, as well as the applications of these models in other domains. Finally, it discusses the future development of convolutional neural networks.
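As a minimal illustration of the kind of model the review surveys, the sketch below builds a small convolutional classifier in PyTorch and appends a lightweight multi-head self-attention block over the convolutional feature map, in the spirit of the Transformer-CNN hybrids mentioned in the abstract. All layer sizes, the 32×32 RGB input, and the class name are illustrative assumptions, not taken from any specific model discussed in the article.

```python
# Minimal sketch (illustrative only): a small CNN classifier with a
# self-attention block over its feature map, assuming PyTorch is available.
import torch
import torch.nn as nn


class TinyHybridClassifier(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Convolutional stem: two conv-BN-ReLU-pool stages, 32x32 -> 8x8 feature map.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # Lightweight self-attention over the 8x8 = 64 spatial tokens,
        # loosely mirroring the Transformer-CNN hybrid designs surveyed here.
        self.attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(64)
        # Global average pooling over tokens followed by a linear head.
        self.head = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.features(x)                      # (B, 64, 8, 8)
        tokens = feat.flatten(2).transpose(1, 2)     # (B, 64 tokens, 64 channels)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attn_out)        # residual + layer norm
        pooled = tokens.mean(dim=1)                  # average over spatial tokens
        return self.head(pooled)


if __name__ == "__main__":
    model = TinyHybridClassifier(num_classes=10)
    dummy = torch.randn(2, 3, 32, 32)                # two fake 32x32 RGB images
    logits = model(dummy)
    print(logits.shape)                              # torch.Size([2, 10])
```

The models covered in the review differ substantially from this toy (deeper stems, windowed or pyramidal attention, different fusion strategies); the sketch only shows where a Transformer-style block can sit inside a convolutional classification pipeline.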
LIU Hongda, SUN Xuhui, LI Yibin, HAN Lin, ZHANG Yu. Review of Deep Learning Models for Image Classification Based on Convolutional Neural Networks[J]. Computer Engineering and Applications, 2025, 61(11): 1-21.
[1] 李淑慧, 蔡伟, 王鑫, 高蔚洁, 狄星雨. Review of infrared and visible image fusion methods under deep learning frameworks[J]. Computer Engineering and Applications, 2025, 61(9): 25-40.
[2] 陈浞, 刘东青, 唐平华, 黄燕, 张文霞, 贾岩, 程海峰. Research progress on physical adversarial attacks for object detection[J]. Computer Engineering and Applications, 2025, 61(9): 80-101.
[3] 庞俊, 马志芬, 林晓丽, 王蒙湘. Knowledge hypergraph link prediction combining GAT and convolutional neural networks[J]. Computer Engineering and Applications, 2025, 61(9): 194-201.
[4] 罗宇轩, 吴高昌, 高明. Remote sensing image super-resolution network with adaptive convolution and a lightweight Transformer[J]. Computer Engineering and Applications, 2025, 61(9): 263-276.
[5] 陈虹, 由雨竹, 金海波, 武聪, 邹佳澎. Intrusion detection method combining improved sampling techniques with SRFCNN-BiLSTM[J]. Computer Engineering and Applications, 2025, 61(9): 315-324.
[6] 王婧, 李云霞. Stock return prediction with the NS-FEDformer model[J]. Computer Engineering and Applications, 2025, 61(9): 334-342.
[7] 甄彤, 张威振, 李智慧. Review of crop planting structure classification methods in remote sensing imagery[J]. Computer Engineering and Applications, 2025, 61(8): 35-48.
[8] 李仝伟, 仇大伟, 刘静, 逯英航. Review of human action recognition based on RGB and skeleton data[J]. Computer Engineering and Applications, 2025, 61(8): 62-82.
[9] 温浩, 杨洋. Clinical short text classification integrating ERNIE and knowledge enhancement[J]. Computer Engineering and Applications, 2025, 61(8): 108-116.
[10] 孟维超, 卞春江, 聂宏宾. Detection method for low-SNR infrared dim and small targets against complex backgrounds[J]. Computer Engineering and Applications, 2025, 61(8): 183-193.
[11] 王燕, 卢鹏屹, 他雪. Normalized convolution image dehazing network with feature fusion attention[J]. Computer Engineering and Applications, 2025, 61(8): 226-238.
[12] 吕光宏, 王坤. Dynamic traffic prediction for SDN networks with a spatio-temporal graph attention mechanism[J]. Computer Engineering and Applications, 2025, 61(8): 267-273.
[13] 周佳妮, 刘春雨, 刘家鹏. Stock price trend prediction model integrating channel and multi-head attention[J]. Computer Engineering and Applications, 2025, 61(8): 324-338.
[14] 邢素霞, 李珂娴, 方俊泽, 郭正, 赵士杭. Review of medical image segmentation under deep learning[J]. Computer Engineering and Applications, 2025, 61(7): 25-41.
[15] 陈宇, 权冀川. Camouflaged object detection: development and challenges[J]. Computer Engineering and Applications, 2025, 61(7): 42-60.