Image Data Augmentation Method for Normal Random Affine Transformation

doi:10.3778/j.issn.1002-8331.2307-0327

Abstract

Abstract: Existing image data augmentation methods generate a large amount of invalid and redundant data, which lowers the quality of the training data and weakens network generalization. To address this problem, an image data augmentation method based on random affine transformation with a normal distribution (NRAff) is proposed. The core of NRAff is a normal random affine transformation module, which introduces a normal distribution into the random affine transformation so that the transformation magnitudes are normally distributed around the original image. By limiting the distribution range of the transformed outputs, invalid data are removed and more effective image data with normal distribution characteristics are obtained. NRAff imitates the normal-distribution sampling mechanism of the biological visual perception system, so that the distribution of the generated images is close to the subjective perception of biological vision, highlights the normal distribution characteristics of target perception, and lets the network learn invariant features from transformed features. The method improves the consistency of the image data distribution, enables the network to learn more effective, latent affine-transformation-invariant features, and improves its resistance to overfitting. Experiments and comparative analysis against current state-of-the-art data augmentation methods are carried out on the image classification datasets CIFAR10, CIFAR100, SVHN, Fashion-MNIST and Imagenette. The results show that the proposed method improves classification accuracy to varying degrees on all of them, which verifies the effectiveness and generality of NRAff.

Key words: normal distribution, affine transformation, data augmentation, image classification
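The sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the standard deviations, the `k`-sigma cutoff used to limit the range, and the order in which rotation, shear, scale and translation are composed are all assumptions made for illustration. Each parameter is drawn from a normal distribution centered on the identity transformation and redrawn if it falls outside the cutoff.

```python
import numpy as np


def truncated_normal(rng, mean, sigma, bound):
    """Draw from N(mean, sigma^2), redrawing until |x - mean| <= bound."""
    while True:
        x = rng.normal(mean, sigma)
        if abs(x - mean) <= bound:
            return x


def sample_affine_matrix(rng, height, width, sigma_deg=10.0, sigma_shift=0.05,
                         sigma_scale=0.1, sigma_shear_deg=5.0, k=2.0):
    """Return a 3x3 affine matrix with parameters normally distributed around identity.

    All sigma defaults and the k-sigma cutoff are hypothetical values,
    not taken from the paper.
    """
    angle = np.deg2rad(truncated_normal(rng, 0.0, sigma_deg, k * sigma_deg))
    shear = np.deg2rad(truncated_normal(rng, 0.0, sigma_shear_deg, k * sigma_shear_deg))
    scale = truncated_normal(rng, 1.0, sigma_scale, k * sigma_scale)
    tx = truncated_normal(rng, 0.0, sigma_shift, k * sigma_shift) * width
    ty = truncated_normal(rng, 0.0, sigma_shift, k * sigma_shift) * height

    # Transform about the image center, then shift by (tx, ty).
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    from_origin = np.array([[1, 0, cx + tx], [0, 1, cy + ty], [0, 0, 1]], dtype=float)
    rotate = np.array([[np.cos(angle), -np.sin(angle), 0],
                       [np.sin(angle), np.cos(angle), 0],
                       [0, 0, 1]])
    shear_m = np.array([[1, np.tan(shear), 0], [0, 1, 0], [0, 0, 1]], dtype=float)
    scale_m = np.diag([scale, scale, 1.0])
    return from_origin @ rotate @ shear_m @ scale_m @ to_origin


def warp_affine(image, matrix, fill=0):
    """Apply a 3x3 affine matrix using inverse mapping and nearest-neighbour sampling."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(matrix) @ dst  # source coordinate for each output pixel
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full(image.shape, fill, dtype=image.dtype)
    flat_out = out.reshape(h * w, -1)  # view into out
    flat_in = image.reshape(h * w, -1)
    flat_out[valid] = flat_in[sy[valid] * w + sx[valid]]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
    augmented = warp_affine(image, sample_affine_matrix(rng, 32, 32))
    print(augmented.shape)
```

In this sketch most samples stay close to the original image and large distortions become progressively rarer, whereas a uniform sampler would produce every magnitude in its range equally often. In a training pipeline a new matrix would be drawn for every image in every epoch.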

JIANG Wentao, CHEN Linlin, ZHANG Shengchong. Image Data Augmentation Method for Normal Random Affine Transformation[J]. Computer Engineering and Applications, 2024, 60(23): 176-186.
