Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (9): 9-18.DOI: 10.3778/j.issn.1002-8331.2112-0382

• Research Hotspots and Reviews •

Survey on Attention Mechanisms in Deep Learning Recommendation Models

GAO Guangshang   

  1. School of Management, Guilin University of Technology, Guilin, Guangxi 541004, China
  • Online: 2022-05-01  Published: 2022-05-01

Abstract: This paper explores how the attention mechanism helps a recommendation model dynamically focus on the parts of its input that are relevant to the current recommendation task. It analyzes the network framework of the attention mechanism and the methods used to compute weights over its input data, and then surveys the field from five perspectives: the vanilla attention mechanism, the co-attention mechanism, the self-attention mechanism, the hierarchical attention mechanism, and the multi-head attention mechanism. For each, it examines the key strategies, algorithms, or techniques used to compute the weights of the current input data, and how those weights allow the recommendation model to attend to the necessary parts of the input at each step of the recommendation task, thereby producing more effective user or item feature representations and improving the model's efficiency and generalization ability. The attention mechanism helps the recommendation model assign different weights to each part of the input and extract the more critical and important information, enabling the model to make more accurate judgments without adding significant computation or storage overhead. Although existing deep learning recommendation models that incorporate attention mechanisms can, to a certain extent, meet the needs of most recommendation tasks, the uncertainty of human needs and the explosive growth of information mean that such models will still face challenges in recommendation diversity, recommendation interpretability, and the fusion of multiple kinds of auxiliary information.
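The weight-computation step the abstract describes can be illustrated with a minimal sketch. The scoring function below uses scaled dot-product attention, one common choice; the function names, the toy embeddings, and the choice of scoring function are illustrative assumptions, not taken from the survey itself. The attention variants the survey covers (vanilla, co-, self-, hierarchical, multi-head) differ mainly in where the query, keys, and values come from and how many scoring heads are used, while the normalize-then-weight pattern shown here is shared.

```python
import numpy as np

def attention_weights(query, keys):
    """Score each key against the query, then normalize with softmax.

    Returns one weight per key; the weights sum to 1, so the model can
    "focus" on the input parts most relevant to the current task.
    """
    # Scaled dot-product scores (scaling stabilizes gradients in practice).
    scores = keys @ query / np.sqrt(query.shape[0])
    # Numerically stable softmax.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def attend(query, keys, values):
    """Weighted sum of values: the attended feature representation."""
    w = attention_weights(query, keys)
    return w @ values

# Toy example: one query attending over three (hypothetical) item embeddings.
keys = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])
values = np.array([[10.0], [20.0], [30.0]])
query = np.array([1.0, 0.0])

w = attention_weights(query, keys)
out = attend(query, keys, values)
```

Here the first and third keys align with the query and receive equal, larger weights than the second, so the output representation is pulled toward their values.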

Key words: attention mechanism, neural network, deep learning, deep learning recommendation model