Computer Engineering and Applications, 2025, Vol. 61, Issue (7): 188-195. DOI: 10.3778/j.issn.1002-8331.2311-0037

• Pattern Recognition and Artificial Intelligence •

Exploration Strategy in Reinforcement Learning Based on Intrinsic Reward for Recommendation

YU Yuanqing, MA Weizhi, ZHANG Min

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  2. Institute for AI Industry Research, Tsinghua University, Beijing 100084, China
  • Online: 2025-04-01   Published: 2025-04-01

Abstract: In recent years, reinforcement learning algorithms have been introduced into recommender systems to address the exploration-exploitation dilemma, with the aim of improving user experience on the platform and increasing the system's long-term benefit. Existing studies design exploration strategies mainly at the model level, and most of them give little consideration to how user experience should influence exploration. This study instead designs an exploration strategy by modifying the reward. It takes into account a particular feature of reinforcement learning for recommendation, namely that the user is modeled as the environment, and uses the diversity and novelty of items as intrinsic rewards so that user experience guides the direction of the model's exploration. Experiments on two real-world datasets of different types show clear improvements in recommendation performance, diversity of recommended items, and other metrics, validating the effectiveness of the proposed exploration strategy.

Keywords: recommender systems, reinforcement learning, exploration strategy
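
The abstract does not give the reward formulas, so the following is only a minimal sketch of the general idea of adding diversity and novelty terms to the environment's reward. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `novelty`, `diversity`, and `shaped_reward`, the popularity-based novelty score, the category-share diversity score, and the weights `w_div` and `w_nov`.

```python
import math


def novelty(item, interaction_counts):
    """Assumed novelty score: self-information of the item's popularity.

    Rarely interacted items get a higher score. Add-one smoothing keeps the
    score finite for items that never appear in `interaction_counts`.
    """
    total = sum(interaction_counts.values())
    p = (interaction_counts.get(item, 0) + 1) / (total + 1)
    return -math.log2(p)


def diversity(item_category, recent_categories):
    """Assumed diversity score: 1 minus the share of the item's category
    among the categories recently recommended to this user."""
    if not recent_categories:
        return 1.0
    share = recent_categories.count(item_category) / len(recent_categories)
    return 1.0 - share


def shaped_reward(extrinsic, item, item_category, recent_categories,
                  interaction_counts, w_div=0.1, w_nov=0.1):
    """Extrinsic user feedback plus weighted intrinsic terms (hypothetical weights)."""
    intrinsic = (w_div * diversity(item_category, recent_categories)
                 + w_nov * novelty(item, interaction_counts))
    return extrinsic + intrinsic


if __name__ == "__main__":
    counts = {"hit": 98, "niche": 2}  # toy interaction log, illustrative only
    history = ["comedy", "comedy", "drama", "action"]
    print(shaped_reward(1.0, "hit", "comedy", history, counts))
    print(shaped_reward(1.0, "niche", "drama", history, counts))
```

In this sketch the agent's learning algorithm is unchanged; only the scalar reward it receives is altered, which is what distinguishes reward-level exploration from the model-level strategies the abstract contrasts it with.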