Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (12): 153-159. DOI: 10.3778/j.issn.1002-8331.2303-0249

• Pattern Recognition and Artificial Intelligence •

Method for Recognition of Food Images Based on Improved Attention Model

JIANG Feng, ZHOU Lili   

  1. Taizhou Institute of Science and Technology, Nanjing University of Science and Technology, Taizhou, Jiangsu 225300, China
  • Online: 2024-06-15 Published: 2024-06-14

改进注意力模型的食品图像识别方法

姜枫,周莉莉   

  1. 南京理工大学 泰州科技学院,江苏 泰州 225300

Abstract: With growing public demand for a healthy diet, a variety of dietary assessment applications have emerged, and food image recognition is receiving increasing attention. Food image recognition is a fine-grained recognition problem, which makes it more difficult than general image recognition. Moreover, popular food image datasets, such as ISIA Food-500, ETH Food-101, and Vireo Food-172, contain relatively few images, which makes it difficult to train a recognition system well and further increases the recognition difficulty. This paper proposes an image recognition method based on the attention mechanism. The method introduces local attention on top of self-attention to describe fine-grained image features and improve recognition accuracy. In addition, a self-supervised image pre-training algorithm is proposed to alleviate the shortage of training samples for food images. Experimental results show that the proposed method achieves Top-1 and Top-5 accuracies of 65.58% and 90.03%, respectively, on the ISIA Food-500 dataset, outperforming the state-of-the-art algorithms.

Key words: food images, fine-grained image recognition, local attention, self-supervised pre-training, ISIA Food-500 dataset

摘要: 随着人们对健康饮食需求的日益增加,各种饮食评估辅助软件应运而生,食品图像识别问题受到越来越多的关注。食品图像识别属于细粒度图像识别问题,较其他图像识别难度更大。目前主流的食品图像数据集,如ISIA Food-500、ETH Food-101、Vireo Food-172等所包含的图像数量偏少,难以很好地训练图像识别系统,进一步增大了图像识别难度。提出一种基于注意力机制的图像识别方法,该方法在自注意力的基础上引入局部注意力的概念,用于描绘图像细粒度特征,提高图像识别的准确率。此外,还提出一种图像自监督预训练算法,缓解食品图像训练样本不足的问题。实验结果表明,所提方法在ISIA Food-500数据集的Top-1和Top-5准确率分别达到65.58%和90.03%,性能优于现有的其他算法。

关键词: 食品图像, 细粒度图像识别, 局部注意力, 自监督预训练, ISIA Food-500数据集
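
The abstract reports results as Top-1 and Top-5 accuracy. As a reminder of what these metrics measure, the following is a minimal sketch of the standard Top-k accuracy computation. It is not taken from the paper: the function name `top_k_accuracy` and the toy scores are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy data: 4 samples, 3 classes (made-up scores, not results from the paper).
scores = [
    [0.10, 0.70, 0.20],  # highest score: class 1
    [0.50, 0.30, 0.20],  # highest score: class 0
    [0.20, 0.30, 0.50],  # highest score: class 2
    [0.40, 0.35, 0.25],  # highest score: class 0
]
labels = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"Top-1: {top1:.2f}, Top-2: {top2:.2f}")
```

On a 500-class dataset such as ISIA Food-500, Top-5 counts a prediction as correct when the true class appears anywhere among the five highest-scoring classes, which is why it is always at least as high as Top-1.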