[1] KOREN Y, BELL R, VOLINSKY C. Matrix factorization techniques for recommender systems[J]. Computer, 2009, 42(8): 30-37.
[2] HE X N, LIAO L Z, ZHANG H W, et al. Neural collaborative filtering[C]//Proceedings of the 26th International Conference on World Wide Web. New York: ACM, 2017: 173-182.
[3] COVINGTON P, ADAMS J, SARGIN E, et al. Deep neural networks for YouTube recommendations[C]//Proceedings of the 10th ACM Conference on Recommender Systems. New York: ACM, 2016: 191-198.
[4] FORTUNATO M, AZAR M G, PIOT B, et al. Noisy networks for exploration[J]. arXiv:1706.10295, 2017.
[5] SHANI G, HECKERMAN D, BRAFMAN R I. An MDP-based recommender system[J]. Journal of Machine Learning Research, 2005, 6: 1265-1295.
[6] ZHENG G J, ZHANG F Z, ZHENG Z H, et al. DRN: a deep reinforcement learning framework for news recommendation[C]//Proceedings of the 2018 World Wide Web Conference. New York: ACM, 2018: 167-176.
[7] CHEN M M, BEUTEL A, COVINGTON P, et al. Top-K off-policy correction for a REINFORCE recommender system[C]//Proceedings of the 12th ACM International Conference on Web Search and Data Mining. New York: ACM, 2019: 456-464.
[8] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3/4): 229-256.
[9] GAO C M, WANG S Q, LI S J, et al. CIRS: bursting filter bubbles by counterfactual interactive recommender system[J]. ACM Transactions on Information Systems, 2023, 42(1): 1-27.
[10] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[J]. arXiv:1707.06347, 2017.
[11] DU C, GAO Z F, YUAN S, et al. Exploration in online advertising systems with deep uncertainty-aware learning[C]//Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. New York: ACM, 2021: 2792-2801.
[12] WU K, BIAN W, CHAN Z, et al. Adversarial gradient driven exploration for deep click-through rate prediction[C]//Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York: ACM, 2022: 2050-2058.
[13] AUER P, CESA-BIANCHI N, FISCHER P. Finite-time analysis of the multiarmed bandit problem[J]. Machine Learning, 2002, 47(2/3): 235-256.
[14] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 2018.
[15] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning - Volume 48. New York: ACM, 2016: 1928-1937.
[16] PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven exploration by self-supervised prediction[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2017: 488-489.
[17] KIM H, KIM J, JEONG Y, et al. EMI: exploration with mutual information[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 3360-3369.
[18] CHEN M M, WANG Y Y, XU C, et al. Values of user exploration in recommender systems[C]//Proceedings of the 15th ACM Conference on Recommender Systems. New York: ACM, 2021: 85-95.
[19] WANG S Q, GAO C M, GAO M, et al. Who are the best adopters? User selection model for free trial item promotion[J]. IEEE Transactions on Big Data, 2023, 9(2): 746-757.
[20] OUDEYER P Y, KAPLAN F. How can we define intrinsic motivation?[C]//Proceedings of the 8th International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, 2008.
[21] HUANG J, OOSTERHUIS H, CETINKAYA B, et al. State encoders in reinforcement learning for recommendation: a reproducibility study[C]//Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2022: 2738-2748.
[22] CHEN M M, XU C, GATTO V, et al. Off-policy actor-critic for recommender systems[C]//Proceedings of the 16th ACM Conference on Recommender Systems. New York: ACM, 2022: 338-349.
[23] TANG J X, WANG K. Personalized top-N sequential recommendation via convolutional sequence embedding[C]//Proceedings of the 11th ACM International Conference on Web Search and Data Mining. New York: ACM, 2018: 565-573.
[24] YUAN F J, KARATZOGLOU A, ARAPAKIS I, et al. A simple convolutional generative network for next item recommendation[C]//Proceedings of the 12th ACM International Conference on Web Search and Data Mining. New York: ACM, 2019: 582-590.
[25] KANG W C, MCAULEY J. Self-attentive sequential recommendation[C]//Proceedings of the 2018 IEEE International Conference on Data Mining. Piscataway: IEEE, 2018: 197-206.
[26] XIN X, PIMENTEL T, KARATZOGLOU A, et al. Rethinking reinforcement learning for recommendation: a prompt perspective[C]//Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2022: 1347-1357.
[27] KONDA V, TSITSIKLIS J. Actor-critic algorithms[C]//Advances in Neural Information Processing Systems, 1999.
[28] SWAMINATHAN A, JOACHIMS T. Counterfactual risk minimization: learning from logged bandit feedback[C]//Proceedings of the 32nd International Conference on Machine Learning, 2015: 814-823.
[29] YU T, THOMAS G, YU L, et al. MOPO: model-based offline policy optimization[C]//Advances in Neural Information Processing Systems, 2020: 14129-14142.
[30] SUTTON R S, MCALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in Neural Information Processing Systems, 1999.
[31] SILVEIRA T, ZHANG M, LIN X, et al. How good your recommender system is? A survey on evaluations in recommendation[J]. International Journal of Machine Learning and Cybernetics, 2019, 10(5): 813-831.
[32] JANNER M, FU J, ZHANG M, et al. When to trust your model: model-based policy optimization[C]//Advances in Neural Information Processing Systems, 2019.
[33] GUO H F, TANG R M, YE Y M, et al. DeepFM: a factorization-machine based neural network for CTR prediction[C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne, Australia: AAAI Press, 2017: 1725-1731.
[34] XU S Y, TAN J T, FU Z H, et al. Dynamic causal collaborative filtering[C]//Proceedings of the 31st ACM International Conference on Information & Knowledge Management. New York: ACM, 2022: 2301-2310.