Computer Engineering and Applications, 2025, Vol. 61, Issue (10): 214-227. DOI: 10.3778/j.issn.1002-8331.2401-0188

Graphics and Image Processing

Ore Image Classification Algorithm Based on Cross-Channel Fine-Grained Feature Fusion

GAO Yunfei, LYU Fu, FENG Yong’an

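1. School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
2. Department of Basic Teaching, Liaoning Technical University, Huludao, Liaoning 125105, China
3. Information and Network Management Center, Liaoning Technical University, Huludao, Liaoning 125105, China

Online: 2025-05-15  Published: 2025-05-15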

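Abstract: Deep learning algorithms that process ore images with fine-grained texture features suffer from low accuracy and high computational resource demands, and they are difficult to deploy on mobile devices. To address these problems, a lightweight ore image classification algorithm based on cross-channel fine-grained feature fusion is proposed. A hybrid network is built by alternating CNN and Transformer blocks, so that both the local and the global information of an image are extracted effectively. A cross-channel fine-grained feature fusion module is introduced as the feature fuser. It adopts a fusion strategy of channel grouping and random channel shuffle, which strengthens the capture of ore texture information and preserves the diversity of fine-grained features. A multi-scale lightweight self-attention module reduces the number of model parameters, enhances perception of different scales and spatial positions, keeps training stable, and avoids overfitting to low-level features. An efficient coordinate attention module is constructed as the fine-grained feature extractor, providing lightweight and efficient feature extraction. The proposed algorithm achieves classification accuracies of 95.78% and 94.77% on two public ore image datasets on the Kaggle platform, Mineral Photos and Petrology Thin Section Data, respectively. Compared with nine other lightweight classification networks (ShuffleNetV2, MobileNetV3, RegNet, ConvNeXtV2, LeViT, EdgeViTs, AFFNet, EdgeNeXt and MViTV2), the proposed algorithm has fewer parameters (1.27 MB), lower computational cost (269 MFLOPs) and faster classification speed (219 FPS).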

Key words: ore image classification, convolutional neural network (CNN), Transformer, cross-channel feature fusion, attention mechanism
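As a rough illustration of the channel grouping and random channel shuffle strategy named in the abstract, the Python sketch below treats a feature map's channels as plain list items. The function names (`split_groups`, `grouped_shuffle`, `random_shuffle`, `group_then_shuffle`) and the per-group callback are hypothetical. The sketch also assumes that "random channel shuffle" means applying a seeded random permutation to the channel order. It is not the authors' implementation, and the abstract does not specify their exact grouping or mixing rules. The deterministic ShuffleNet-style shuffle is included only for comparison.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # stand-in for one channel (e.g. a 2-D feature map)


def split_groups(channels: Sequence[T], groups: int) -> List[List[T]]:
    """Split channels into `groups` contiguous groups of equal size."""
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    size = len(channels) // groups
    return [list(channels[g * size:(g + 1) * size]) for g in range(groups)]


def grouped_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Deterministic ShuffleNet-style shuffle, shown for comparison:
    view channels as a (groups, size) grid, transpose it, and flatten."""
    grid = split_groups(channels, groups)
    size = len(grid[0])
    return [grid[g][i] for i in range(size) for g in range(groups)]


def random_shuffle(channels: Sequence[T], seed: int) -> List[T]:
    """Random channel shuffle (assumed form): a seeded random permutation."""
    order = list(range(len(channels)))
    random.Random(seed).shuffle(order)
    return [channels[i] for i in order]


def group_then_shuffle(
    channels: Sequence[T],
    groups: int,
    per_group: Callable[[List[T]], List[T]],
    seed: int,
) -> List[T]:
    """Hypothetical fusion step: transform each group independently,
    concatenate the results, then randomly shuffle the channel order."""
    processed = [c for grp in split_groups(channels, groups) for c in per_group(grp)]
    return random_shuffle(processed, seed)


if __name__ == "__main__":
    chans = list("abcdef")
    print(split_groups(chans, 3))     # [['a', 'b'], ['c', 'd'], ['e', 'f']]
    print(grouped_shuffle(chans, 3))  # ['a', 'c', 'e', 'b', 'd', 'f']
    print(random_shuffle(chans, seed=0))
```

With six channels `a` to `f` and three groups, `grouped_shuffle` interleaves the groups so that adjacent outputs always come from different groups. `random_shuffle` instead mixes channels in an arbitrary order. Fixing the seed keeps that permutation reproducible, which this sketch assumes a real layer would need so that training and inference see the same channel order.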
