Computer Engineering and Applications, 2019, Vol. 55, Issue (10): 83-89. DOI: 10.3778/j.issn.1002-8331.1801-0471

• Big Data and Cloud Computing •

Frequent Trajectory of Pattern Mining with Spatio-Temporal Attribute and Relationship Label

PAN Xiaoying (潘晓英)1,2, ZHAO Qian (赵倩)1, ZHAO Pu (赵普)1

  1. School of Computer Science & Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  2. Shaanxi Key Laboratory of Network Data Intelligent Processing, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  • Online: 2019-05-15 Published: 2019-05-13

Abstract: The widespread use of campus card technology is an important indicator of a university’s level of informatization. The student consumption data recorded by these cards hold considerable latent value, and mining them is of practical significance. This paper therefore proposes the DP-DBSCAN algorithm, which converts campus consumption transaction records into consumption trajectory trees with spatio-temporal attributes, and FP-TRtree, a frequent trajectory mining model with relationship labels. DP-DBSCAN uses time blocking, sequential querying, and distance measurement to convert the data efficiently into the ordered frequent 1-itemsets that FP-TRtree takes as input; it requires no parameter selection and avoids the heavy cost of querying the nearest neighbors of every data point. FP-TRtree adds relationship values in order, sorts items by descending support, and iteratively refines the relationship labels between identical trajectory nodes. Visual analysis shows that the conversion algorithm and mining model not only reveal the relationship trajectory network of students who frequently consume together, as well as isolated students, but also quantify the consumption closeness between students at connected nodes, while reducing both the number of database scans and the number of tree branches built. The experimental results are consistent with students’ actual consumption behavior and uncover hidden information in a complex consumption network, providing a reference for institutional management and decision-making.

Key words: DP-DBSCAN algorithm, campus card data, relationship label, FP-TRtree model, visualization
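
The abstract does not give the details of DP-DBSCAN or FP-TRtree, so the following is only a minimal Python sketch of the general idea it describes: scan card swipes sequentially within time blocks, group swipes that are close in time at the same terminal, and count how often two students appear in the same group as a stand-in for a relationship label. The record layout `(student_id, timestamp_seconds, terminal_id)`, the function names, and the values of `block_seconds`, `max_gap`, and `min_support` are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def co_consumption_groups(records, block_seconds=1800, max_gap=60):
    """Group swipes made at the same terminal, in the same time block,
    with at most `max_gap` seconds between consecutive swipes.

    `records` is an iterable of (student_id, timestamp_seconds, terminal_id).
    This is an illustrative simplification, not the paper's DP-DBSCAN.
    """
    groups = []
    current = []
    prev_key = None
    prev_ts = None
    # Sequential scan: order by terminal, then by time.
    for student, ts, terminal in sorted(records, key=lambda r: (r[2], r[1])):
        key = (terminal, ts // block_seconds)
        if current and key == prev_key and ts - prev_ts <= max_gap:
            current.append(student)
        else:
            if current:
                groups.append(current)
            current = [student]
        prev_key, prev_ts = key, ts
    if current:
        groups.append(current)
    return groups


def relationship_labels(groups, min_support=2):
    """Count how often each pair of students shares a group and keep
    pairs seen at least `min_support` times (hypothetical threshold)."""
    counts = Counter()
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical swipe records: (student, seconds since start, terminal).
    sample = [
        ("s1", 0, "w1"), ("s2", 20, "w1"), ("s3", 500, "w1"),
        ("s1", 86400, "w1"), ("s2", 86430, "w1"), ("s3", 90000, "w2"),
    ]
    groups = co_consumption_groups(sample)
    print(groups)
    print(relationship_labels(groups))
```

In this toy input, students who never share a group with anyone end up with no label at all, which loosely corresponds to the isolated students mentioned in the abstract. A fixed time block also splits swipes that straddle a block boundary, which is one of several simplifications in this sketch.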