Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (5): 186-192. DOI: 10.3778/j.issn.1002-8331.2009-0349

• Pattern Recognition and Artificial Intelligence •

Fish Recognition Based on Hierarchical Compact Bilinear Attention Network

DONG Shaojiang, LIU Wei, CAI Weiwei, RAO Zhirong

  1. School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074, China
  2. Continental Automotive Research and Development (Chongqing) Co., Ltd., Chongqing 400074, China
  • Online: 2022-03-01 Published: 2022-03-01

Abstract: Because underwater fish images are difficult to collect, existing datasets are mostly extracted from video, and the resulting images suffer from complex backgrounds and low resolution, which makes fine-grained fish image recognition difficult. To address these problems, a network based on a spatial-domain attention mechanism and hierarchical compact bilinear feature fusion is proposed. The recognition network can be trained end to end and consists of two parts. The first part is a background-filtering network that uses a spatial transformer network (STN) as its attention mechanism. The second part uses a VGG16 network as the feature extractor: because the high-level convolutional layers respond differently to the fine-grained features of fish images, three groups of features are selected and fused compactly through a dimensionality-reducing approximation, and the three fused features are then concatenated and fed to a softmax classifier. The feature extraction network is initialized with parameters trained on the ImageNet dataset and further fine-tuned on the fish dataset. Comparative experiments on the F4K fish dataset show that the proposed hierarchical compact bilinear attention network (STN-H-CBP) reduces the feature dimension and the amount of computation while performing on par with the best existing methods on that dataset.

Key words: underwater fish recognition, spatial transformer network, hierarchical compact bilinear network
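
The compact fusion step described in the abstract can be illustrated with a small sketch. The abstract does not specify the approximation used, so the code below assumes the Tensor Sketch form of compact bilinear pooling (a count sketch of each feature vector followed by circular convolution). It also assumes that the three selected feature groups are fused pairwise. The function names, the pairing, and the sketch dimension `d` are hypothetical choices for illustration, not the authors' implementation; a practical version would sum over spatial locations and use an FFT for the convolution.

```python
import random


def make_hash(n, d, rng):
    """Draw a random bucket h[i] in [0, d) and a random sign s[i] in {-1, +1} for each of n inputs."""
    h = [rng.randrange(d) for _ in range(n)]
    s = [rng.choice((-1, 1)) for _ in range(n)]
    return h, s


def count_sketch(x, h, s, d):
    """Project x to d dimensions: each entry is added, with its sign, to its bucket."""
    out = [0.0] * d
    for i, v in enumerate(x):
        out[h[i]] += s[i] * v
    return out


def circular_conv(a, b):
    """Circular convolution of two equal-length vectors (direct O(d^2) form)."""
    d = len(a)
    out = [0.0] * d
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += a[i] * b[j]
    return out


def compact_bilinear(x, y, hx, sx, hy, sy, d):
    """d-dimensional sketch of the outer product of x and y, without forming the outer product."""
    return circular_conv(count_sketch(x, hx, sx, d), count_sketch(y, hy, sy, d))


def hierarchical_cbp(feats, d, seed=0):
    """Fuse three feature vectors pairwise and concatenate the results.

    The pairing (0,1), (0,2), (1,2) is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    fused = []
    for a, b in ((0, 1), (0, 2), (1, 2)):
        ha, sa = make_hash(len(feats[a]), d, rng)
        hb, sb = make_hash(len(feats[b]), d, rng)
        fused.extend(compact_bilinear(feats[a], feats[b], ha, sa, hb, sb, d))
    return fused
```

For scale: the last convolutional block of VGG16 has 512 channels, so a full bilinear (outer-product) feature for one pair of layers has 512 × 512 = 262,144 dimensions, whereas the sketched feature has only `d` dimensions per pair.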