Dynamic Pricing Strategy Based on Gaussian Process and Parallel Thompson Sampling

doi:10.3778/j.issn.1002-8331.2012-0493

Abstract

This paper studies short-term pricing for products of the same type when demand is uncertain. A Gaussian process is used to learn the demand function, and a parallel (batch) Thompson sampling algorithm is used to build a two-stage learning and decision-making pricing model based on the exploration-exploitation trade-off. The proposed GP-PTS (Gaussian process-parallel Thompson sampling) algorithm is evaluated in numerical experiments and on real trip data from a travel platform. The results show that the accuracy of the algorithm depends on whether the product features are complete. When a prior is given and the product features are complete, prices simulated by GP-PTS yield higher returns than the platform's current pricing strategy, which gives enterprises a useful reference for short-term pricing decisions.

Key words: dynamic pricing, Gaussian process, Thompson sampling, parallel Bayesian optimization
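
The abstract describes the approach only at a high level. The following is a minimal sketch of how Gaussian process demand learning and parallel Thompson sampling can fit together. It is not the authors' implementation. The one-dimensional price grid, the RBF kernel, the hyperparameter values, and the linear `true_demand` curve are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of prices."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise_var=0.05, prior_mean=0.0, **kern):
    """Posterior mean and covariance of demand on x_grid given noisy observations."""
    if len(x_obs) == 0:
        # No data yet: fall back to the prior.
        return np.full(len(x_grid), prior_mean), rbf_kernel(x_grid, x_grid, **kern)
    k_oo = rbf_kernel(x_obs, x_obs, **kern) + noise_var * np.eye(len(x_obs))
    k_og = rbf_kernel(x_obs, x_grid, **kern)
    solved = np.linalg.solve(k_oo, k_og)  # K_oo^{-1} K_og
    mean = prior_mean + solved.T @ (y_obs - prior_mean)
    cov = rbf_kernel(x_grid, x_grid, **kern) - k_og.T @ solved
    return mean, cov

def parallel_thompson_prices(x_obs, y_obs, price_grid, batch_size, rng, **gp_args):
    """Draw batch_size demand curves from the posterior; each one proposes
    the price that maximises its own sampled revenue (price * demand)."""
    mean, cov = gp_posterior(x_obs, y_obs, price_grid, **gp_args)
    # Symmetrise and add jitter so the covariance is numerically valid.
    cov = 0.5 * (cov + cov.T) + 1e-6 * np.eye(len(price_grid))
    samples = rng.multivariate_normal(mean, cov, size=batch_size)
    revenue = samples * price_grid[None, :]
    return price_grid[np.argmax(revenue, axis=1)]

def true_demand(price):
    """Hypothetical ground-truth demand, used only to simulate observations."""
    return 10.0 - 0.8 * price

# Assumed hyperparameters; in practice these would be fitted to data.
gp_args = dict(noise_var=0.25, prior_mean=5.0, length_scale=3.0, variance=9.0)

rng = np.random.default_rng(0)
price_grid = np.linspace(1.0, 12.0, 60)
x_obs, y_obs = np.array([]), np.array([])

for _ in range(15):
    # Decision stage: propose a batch of prices in parallel.
    batch = parallel_thompson_prices(x_obs, y_obs, price_grid,
                                     batch_size=4, rng=rng, **gp_args)
    # Learning stage: observe noisy demand at those prices and extend the data.
    demand = true_demand(batch) + rng.normal(0.0, 0.5, size=batch.shape)
    x_obs = np.concatenate([x_obs, batch])
    y_obs = np.concatenate([y_obs, demand])

mean, _ = gp_posterior(x_obs, y_obs, price_grid, **gp_args)
best_price = price_grid[np.argmax(mean * price_grid)]
print("estimated revenue-maximising price:", best_price)
```

Each round draws several independent posterior samples rather than one, so the batch mixes exploratory prices with prices near the current estimate of the optimum. A feature-based version would replace the scalar price input with a vector of price and product features.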

BI Wenjie, WANG Rong. Dynamic Pricing Strategy Based on Gaussian Process and Parallel Thompson Sampling[J]. Computer Engineering and Applications, 2022, 58(16): 303-311.

毕文杰, 王荣. 基于高斯过程与批量汤普森抽样的动态定价策略[J]. 计算机工程与应用, 2022, 58(16): 303-311.
