Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (14): 99-103.

• Database, Data Mining, Machine Learning •

海量活动轨迹相似查询

刘  勇,覃  飙,余  萝   

  1. 中国人民大学 信息学院,北京 100872
  • 出版日期:2015-07-15 发布日期:2015-08-03

Towards similarity search for massive activity trajectories

LIU Yong, QIN Biao, YU Luo   

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Online: 2015-07-15 Published: 2015-08-03

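Abstract (translated from the Chinese): Similarity search over activity trajectories is the process of retrieving, from a set of trajectories annotated with keyword information, the activity trajectories that lie closest to a set of query points and satisfy the keyword requirements of those query points. Because GAT (Grid index for Activity Trajectories) cannot query massive activity trajectories, GAT is extended into GATH (GAT on Hadoop), a similarity search technique suited to massive activity trajectories. Compared with GAT, GATH uses two new index structures for pruning: its grid index performs spatial pruning starting from the bottom-level cells, in keeping with the characteristics of massive data, and its inverted index performs keyword-based pruning. Experimental results confirm that, compared with GAT, GATH effectively shortens index construction time and improves pruning efficiency.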

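Keywords (translated from the Chinese): massive data, activity trajectories, grid index for massive activity trajectories (GATH), similarity search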

Abstract: Given a sequence of query locations, each associated with a set of key activities, an activity trajectory similarity query returns the k trajectories that cover the query activities and yield the shortest minimum match distance. Because GAT (Grid index for Activity Trajectories) is not designed for big data, this paper introduces a new structure, GATH (GAT on Hadoop), for similarity search over massive activity trajectories. GATH uses a grid index for spatial pruning and an inverted index for keyword pruning. Experimental results demonstrate that GATH is more efficient than GAT at both index building and data pruning.

Key words: massive data, activity trajectories, Grid index for Activity Trajectories on Hadoop (GATH), similarity search
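
The two-stage pruning described in the abstract can be illustrated with a small single-process sketch. It is not the authors' GATH implementation and says nothing about how the work is distributed on Hadoop. All names (`TRAJECTORIES`, `QUERY`, `similarity_search`, `cell_size`, `radius_cells`) and the toy data are hypothetical. Two simplifications are assumed: spatial pruning looks only at a fixed neighbourhood of grid cells around each query location, so it is a heuristic filter, and the match distance is taken as the sum, over query locations, of the distance to the nearest single trajectory point whose activities cover that location's activities. The paper's actual distance definition may differ.

```python
from collections import defaultdict
from math import floor, hypot

# Hypothetical toy data: trajectory id -> list of (x, y, activities).
TRAJECTORIES = {
    "t1": [(1.0, 1.0, {"coffee"}), (2.0, 2.0, {"cinema"})],
    "t2": [(8.0, 8.0, {"coffee"}), (9.0, 9.0, {"cinema"})],
    "t3": [(1.5, 1.0, {"coffee"}), (2.5, 2.0, {"gym"})],
}
# Hypothetical query: a sequence of locations, each with required activities.
QUERY = [(1.0, 1.0, {"coffee"}), (2.0, 2.0, {"cinema"})]


def build_inverted_index(trajectories):
    """Map each activity keyword to the ids of trajectories containing it."""
    index = defaultdict(set)
    for tid, points in trajectories.items():
        for _, _, activities in points:
            for activity in activities:
                index[activity].add(tid)
    return index


def build_grid_index(trajectories, cell_size):
    """Map each grid cell (col, row) to the ids of trajectories with a point in it."""
    grid = defaultdict(set)
    for tid, points in trajectories.items():
        for x, y, _ in points:
            grid[(floor(x / cell_size), floor(y / cell_size))].add(tid)
    return grid


def keyword_candidates(inverted, query):
    """Keyword pruning: keep trajectories that contain every query activity."""
    required = set().union(*(activities for _, _, activities in query))
    postings = [inverted.get(activity, set()) for activity in required]
    return set.intersection(*postings) if postings else set()


def spatial_candidates(grid, query, cell_size, radius_cells=1):
    """Spatial pruning (heuristic): keep trajectories with a point in a cell
    within radius_cells of the cell of any query location."""
    found = set()
    for x, y, _ in query:
        cx, cy = floor(x / cell_size), floor(y / cell_size)
        for dx in range(-radius_cells, radius_cells + 1):
            for dy in range(-radius_cells, radius_cells + 1):
                found |= grid.get((cx + dx, cy + dy), set())
    return found


def match_distance(points, query):
    """Simplified match distance (an assumption, see lead-in).
    Returns None when some query location cannot be covered."""
    total = 0.0
    for qx, qy, q_activities in query:
        dists = [hypot(px - qx, py - qy)
                 for px, py, activities in points if q_activities <= activities]
        if not dists:
            return None
        total += min(dists)
    return total


def similarity_search(trajectories, query, k, cell_size=2.0):
    """Return up to k trajectory ids, nearest first, after both pruning steps."""
    inverted = build_inverted_index(trajectories)
    grid = build_grid_index(trajectories, cell_size)
    candidates = (keyword_candidates(inverted, query)
                  & spatial_candidates(grid, query, cell_size))
    scored = []
    for tid in candidates:
        distance = match_distance(trajectories[tid], query)
        if distance is not None:
            scored.append((distance, tid))
    return [tid for _, tid in sorted(scored)[:k]]


if __name__ == "__main__":
    print(similarity_search(TRAJECTORIES, QUERY, k=1))
```

On this toy data the inverted index discards `t3` (it has no "cinema" point) and the grid index discards `t2` (its points lie in a distant cell), so only `t1` reaches the distance computation. Only trajectories that survive both filters are scored, which is why pruning efficiency matters at scale.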