[1] BOSSARD L, GUILLAUMIN M, VAN GOOL L. Food-101 – mining discriminative components with random forests[C]//European Conference on Computer Vision (ECCV), 2014: 446-461.
[2] CHEN J, NGO C W. Deep-based ingredient recognition for cooking recipe retrieval[C]//Proceedings of the ACM International Conference on Multimedia, 2016: 32-41.
[3] MIN W Q, LIU L H, WANG Z L, et al. ISIA Food-500: a dataset for large-scale food recognition via stacked global-local attention network[C]//Proceedings of the ACM International Conference on Multimedia, 2020: 393-401.
[4] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017: 5999-6009.
[5] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[C]//International Conference on Learning Representations, 2021.
[6] HAN K, XIAO A, WU E H, et al. Transformer in transformer[C]//Advances in Neural Information Processing Systems, 2021: 15908-15919.
[7] TOUVRON H, CORD M, SABLAYROLLES A, et al. Going deeper with image transformers[C]//IEEE/CVF International Conference on Computer Vision (ICCV), 2021: 32-42.
[8] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//IEEE/CVF International Conference on Computer Vision (ICCV), 2021: 9992-10002.
[9] YUAN L, HOU Q, JIANG Z, et al. VOLO: vision outlooker for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(5): 6575-6586.
[10] CHEN C F, FAN Q F, PANDA R. CrossViT: cross-attention multi-scale vision transformer for image classification[C]//IEEE/CVF International Conference on Computer Vision (ICCV), 2021: 347-356.
[11] YANG S, CHEN M, POMERLEAU D, et al. Food recognition using statistics of pairwise local features[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010: 2249-2256.
[12] KONG F, TAN J. DietCam: automatic dietary assessment with mobile camera phones[J]. Pervasive and Mobile Computing, 2012, 8(1): 147-163.
[13] KAWANO Y, YANAI K. Real-time mobile food recognition system[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2013: 1-7.
[14] ANTHIMOPOULOS M H, GIANOLA L, SCARNATO L, et al. A food recognition system for diabetic patients based on an optimized bag-of-features model[J]. IEEE Journal of Biomedical and Health Informatics, 2014, 18(4): 1261-1271.
[15] MATSUDA Y, YANAI K. Multiple-food recognition considering co-occurrence employing manifold ranking[C]//International Conference on Pattern Recognition (ICPR), 2012: 2017-2020.
[16] FARINELLA G M, MOLTISANTI M, BATTIATO S. Classifying food images represented as bag of textons[C]//International Conference on Image Processing (ICIP), 2014: 5212-5216.
[17] KAWANO Y, YANAI K. FoodCam: a real-time mobile food recognition system employing Fisher vector[C]//International Conference on Multimedia Modeling, 2014: 369-373.
[18] LIU C, CAO Y, LUO Y, et al. DeepFood: deep learning-based food image recognition for computer-aided dietary assessment[C]//International Conference on Smart Homes and Health Telematics, 2016: 37-48.
[19] HASSANNEJAD H, MATRELLA G, CIAMPOLINI P, et al. Food image recognition using very deep convolutional networks[C]//Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, 2016: 41-49.
[20] MARTINEL N, FORESTI G L, MICHELONI C. Wide-slice residual networks for food recognition[C]//IEEE Winter Conference on Applications of Computer Vision, 2018: 567-576.
[21] JIANG S Q, MIN W Q, LIU L H, et al. Multi-scale multi-view deep feature aggregation for food recognition[J]. IEEE Transactions on Image Processing, 2020, 29: 265-276.
[22] LIANG H, WEN G, HU Y, et al. MVANet: multi-tasks guided multi-view attention network for Chinese food recognition[J]. IEEE Transactions on Multimedia, 2020, 23: 3551-3561.
[23] MIN W Q, LIU L H, LUO Z D, et al. Ingredient-guided cascaded multi-attention network for food recognition[C]//Proceedings of the 27th ACM International Conference on Multimedia, 2019: 1331-1339.
[24] CUI Y Q, XU Y F, PENG R M. Layer normalization for TSK fuzzy system optimization in regression problems[J]. IEEE Transactions on Fuzzy Systems, 2023, 31(1): 254-264.
[25] BAO H B, LI D, PIAO S H, et al. BEiT: BERT pre-training of image transformers[C]//International Conference on Learning Representations, 2022.
[26] KINGMA D P, WELLING M. Auto-encoding variational Bayes[C]//International Conference on Learning Representations, 2014.
[27] PATHAK D, KRAHENBUHL P, DONAHUE J, et al. Context encoders: feature learning by inpainting[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 2536-2544.
[28] XIE Z D, ZHANG Z, CAO Y, et al. SimMIM: a simple framework for masked image modeling[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 9643-9653.
[29] SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 1-9.
[30] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.
[31] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 7132-7141.
[32] HOU S H, FENG Y S, WANG Z L. VegFru: a domain-specific dataset for fine-grained visual categorization[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017: 541-549.