Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (10): 233-239. DOI: 10.3778/j.issn.1002-8331.1810-0018

• Engineering and Applications •

Multi-Parameter Query of Urban Spatio-Temporal Hot-Spots

康家兴,牛保宁,郝晋瑶   

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030000
  • Publication date: 2019-05-15 Release date: 2019-05-13

KANG Jiaxing, NIU Baoning, HAO Jinyao   

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030000, China
  • Online: 2019-05-15 Published: 2019-05-13

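Abstract: Urban spatio-temporal hot-spots are spatio-temporal regions that city residents visit frequently and where traffic flow is heavy. Identifying them has many applications in public-service fields such as urban infrastructure construction, traffic planning, shop site selection, and crime fighting. Current hot-spot detection usually works on all collected taxi trajectories: it applies the Getis-Ord statistical method, partitions the trajectories into space-time cube cells, and computes the hot cells covered by all the trajectory data as the urban spatio-temporal hot-spots. Because the accumulated trajectories are numerous and the computation is complex, existing detection algorithms focus on how to cope with massive data. As practical applications expand, however, hot-spot detection under many requirements does not need all the data, and appropriate data organization can make it efficient. To meet practical needs, a spatio-temporal hot-spot query computes the heat of spatio-temporal regions according to user-specified parameters (geographic range, date range, hot-spot size, and time organization mode) and returns the TOP-K hottest cells as spatio-temporal hot-spots. Different query parameters require different data to be processed; a trajectory data organization method based on a fine-grained three-dimensional grid index can quickly extract the trajectory data that need processing. Query experiments with the New York City taxi trajectory dataset on a Spark cluster show that this indexing method and storage strategy support the specified parameters and substantially reduce query response time.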

Keywords: urban spatio-temporal hot-spots, big data, data organization, Spark

Abstract: City hot-spots are spatial-temporal areas with frequent trips and heavy traffic flow. Identifying them has many applications in public services, such as urban infrastructure construction, transportation planning, shop location selection, and crime prevention. Current hot-spot detection methods usually divide all collected taxi trajectories into spatial-temporal cube cells, apply the Getis-Ord statistic, and report the hot cells computed over the whole dataset as city hot-spots. Because the data volume is large and the calculation is complex, current methods focus on how to deal with massive amounts of data. As applications expand, however, many hot-spot detection tasks do not need all the data, and proper data organization can make detection efficient. This paper proposes a hot-spots query to better meet the flexible needs of users: it calculates the hot value of areas according to user-specified parameters (geographical range, date range, size of hot-spots, and way of time organization) and returns the TOP-K hot areas as city hot-spots. Because different query parameters require different data to be processed, this paper also proposes a trajectory data organization method with a small-scale three-dimensional grid index to quickly extract the trajectory data that need to be processed. Query experiments on the New York City taxi trajectory dataset in a Spark cluster show that the proposed indexing method and storage strategy support the specified parameters and greatly reduce query response time.

Key words: city hot-spots, big data, data management, Spark
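
The query described in the abstract can be illustrated with a minimal single-machine sketch in plain Python. It is not the authors' Spark implementation or their index structure. The function name `top_k_hotspots`, the `(lon, lat, t)` trip layout, the degree/hour cell sizes, and the 3x3x3 binary-weight neighbourhood are all assumptions made for illustration; the score is the standard Getis-Ord Gi* statistic computed over the cube cells inside the query range.

```python
from collections import Counter
from itertools import product
from math import ceil, sqrt


def top_k_hotspots(trips, bbox, t_range, cell_deg, slot_hours, k):
    """Return up to k (cell, score) pairs ranked by Getis-Ord Gi* score.

    trips      : iterable of (lon, lat, t) pickups, t in hours (assumed layout)
    bbox       : (lon_min, lat_min, lon_max, lat_max) query rectangle
    t_range    : (t_min, t_max) query time window, in hours
    cell_deg   : spatial cell size in degrees (the hot-spot size)
    slot_hours : temporal cell size in hours
    """
    lon0, lat0, lon1, lat1 = bbox
    t0, t1 = t_range
    dims = (
        max(1, ceil((lon1 - lon0) / cell_deg)),
        max(1, ceil((lat1 - lat0) / cell_deg)),
        max(1, ceil((t1 - t0) / slot_hours)),
    )
    n = dims[0] * dims[1] * dims[2]  # total number of cube cells
    if n < 2:
        return []

    # Keep only trips inside the query range and count them per cube cell.
    counts = Counter()
    for lon, lat, t in trips:
        if lon0 <= lon < lon1 and lat0 <= lat < lat1 and t0 <= t < t1:
            raw = ((lon - lon0) / cell_deg,
                   (lat - lat0) / cell_deg,
                   (t - t0) / slot_hours)
            cell = tuple(min(int(v), d - 1) for v, d in zip(raw, dims))
            counts[cell] += 1

    # Mean and standard deviation over all n cells (empty cells count as 0).
    mean = sum(counts.values()) / n
    var = sum(c * c for c in counts.values()) / n - mean * mean
    if var <= 0:
        return []
    std = sqrt(var)

    # Add each count to every in-grid cell of its 3x3x3 neighbourhood
    # (binary weights, the cell itself included).
    neighbour_sum = Counter()
    for cell, c in counts.items():
        for off in product((-1, 0, 1), repeat=3):
            nb = tuple(a + b for a, b in zip(cell, off))
            if all(0 <= v < d for v, d in zip(nb, dims)):
                neighbour_sum[nb] += c

    scores = {}
    for cell, total in neighbour_sum.items():
        # w = number of in-grid neighbours; with binary weights
        # sum(w_ij) == sum(w_ij^2) == w.
        w = 1
        for v, d in zip(cell, dims):
            w *= 1 + (v > 0) + (v < d - 1)
        spread = n * w - w * w
        if spread > 0:
            scores[cell] = (total - mean * w) / (std * sqrt(spread / (n - 1)))

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Calling `top_k_hotspots(trips, (0, 0, 5, 5), (0, 5), 1.0, 1.0, 3)` would rank the cells of a 5x5x5 cube and return the three highest-scoring ones as `((x, y, t), score)` pairs. Only trips inside the requested range are touched, which is the property a grid index is meant to exploit. Cells on the grid boundary have fewer neighbours, so their Gi* scores are not directly comparable with interior cells in very small grids.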