Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (24): 254-259.

• Engineering and Applications •

Prediction of probability of hitting vacant taxi and waiting time based on empirical distribution

WANG Zhaoyuan1,2, LI Tianrui1,2, CHENG Yao3, WANG Yue1,2, YI Xiuwen1,2

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
  2. Key Laboratory of Cloud Computing and Intelligent Technology, Sichuan Province, Chengdu 610031, China
  3. School of Mathematics, Southwest Jiaotong University, Chengdu 610031, China
  • Online: 2015-12-15 Published: 2015-12-30

Abstract: This paper presents an approach for predicting the probability that a passenger can hail a vacant taxi, and the corresponding waiting time, at a specified location and time. The map is discretized and feature points are used to repair GPS trajectories, a scheme that is suitable for big data problems. On the repaired GPS data, a model based on the empirical distribution computes the probability of hailing a vacant taxi and the waiting time at each feature point and time point. The model is then used to predict the probability and waiting time for a user-specified location and time. In addition, an incremental learning method based on the model is given. Large-scale GPS trajectory data are managed and analyzed on the Hadoop platform, which demonstrates the feasibility of the proposed solution. The predictions perform well in simulation experiments, which indicates that the model is accurate, and the accuracy can be expected to improve as the volume of data grows. The reference table of probabilities and waiting times for the feature points and feature times does not grow as the GPS trajectory data grows, which shows that the model scales well.

Key words: taxi trajectory, prediction of probability of hitting vacant taxi, prediction of waiting time, Hadoop
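
The abstract does not give the model's equations, so the following is only a minimal sketch of what an empirical-distribution estimate with incremental updates could look like. Everything in it is an assumption made for illustration, not the authors' method: the square grid cells stand in for the paper's feature points, the 30-minute slots stand in for its time points, and the names `EmpiricalTaxiModel`, `CellStats`, `cell_deg`, and `slot_min` are hypothetical. Each observation records whether a vacant taxi passed during one waiting window and, if so, how long the wait was.

```python
import math
from dataclasses import dataclass


@dataclass
class CellStats:
    trials: int = 0        # waiting windows observed for this (cell, slot)
    hits: int = 0          # windows in which a vacant taxi passed
    wait_sum: float = 0.0  # total waiting time in minutes over the hits


class EmpiricalTaxiModel:
    """Hypothetical reference table keyed by (grid cell, time slot)."""

    def __init__(self, cell_deg=0.01, slot_min=30):
        self.cell_deg = cell_deg  # assumed grid cell size in degrees
        self.slot_min = slot_min  # assumed time slot length in minutes
        self.table = {}

    def key(self, lat, lon, minute_of_day):
        # Discretize the position to a grid cell and the time to a slot.
        return (
            math.floor(lat / self.cell_deg),
            math.floor(lon / self.cell_deg),
            int(minute_of_day // self.slot_min),
        )

    def update(self, lat, lon, minute_of_day, wait_minutes):
        # Incremental learning: only running counts and sums are stored,
        # so new data never requires reprocessing old trajectories.
        stats = self.table.setdefault(
            self.key(lat, lon, minute_of_day), CellStats()
        )
        stats.trials += 1
        if wait_minutes is not None:  # None means no vacant taxi passed
            stats.hits += 1
            stats.wait_sum += wait_minutes

    def predict(self, lat, lon, minute_of_day):
        # Returns (empirical probability, mean wait) or None without data.
        stats = self.table.get(self.key(lat, lon, minute_of_day))
        if stats is None or stats.trials == 0:
            return None
        mean_wait = stats.wait_sum / stats.hits if stats.hits else None
        return stats.hits / stats.trials, mean_wait


# Usage with made-up observations around 08:10 in one grid cell.
model = EmpiricalTaxiModel()
for minute, wait in [(490, 4.0), (495, 8.0), (500, None), (505, None)]:
    model.update(30.675, 104.065, minute, wait)
print(model.predict(30.675, 104.065, 492))  # (0.5, 6.0)
```

In this sketch the table holds at most one entry per (cell, slot) pair, so its size is bounded by the discretization rather than by the number of GPS records, which mirrors the scalability property stated in the abstract.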