计算机工程与应用 ›› 2015, Vol. 51 ›› Issue (2): 117-124.

• 数据库、数据挖掘、机器学习 • 上一篇    下一篇

基于压缩全文索引的演变图查询

肖  洋1,2,朱  青1,2,吴粤皖1   

  1. 1.中国人民大学 信息学院 计算机系,北京 100872
    2.中国人民大学 信息学院 数据工程与知识工程教育部重点实验室,北京 100872
  • 出版日期:2015-01-15 发布日期:2015-01-12

Querying on evolving graphs based on compressed full-text index

XIAO Yang1,2, ZHU Qing1,2, WU Yuewan1   

  1. 1.Department of Computer Science, School of Information, Renmin University of China, Beijing 100872, China
    2.Key Laboratory of Data Engineering and Knowledge Engineering, School of Information, Renmin University of China, Beijing 100872, China
  • Online:2015-01-15 Published:2015-01-12

摘要: 演变图中含有大量的时间和空间信息,其中某些空间信息随着时间的推移表现出相似的演变规律。给出了一种演变图查询模型,可以挖掘出在相同时间范围内具有相同变化规律的演变子图。但是演变图的规模往往是巨大的,当需要对其进行多次查询时,每次遍历整个演变图将带来非常高的查询代价,而现有的基于枚举的哈希索引算法又使得预处理过程拥有相当大的时间和空间开销,为了减少对大规模演变图的预处理代价,将压缩的全文索引技术应用于演变图,它基于涡轮转换和后缀数组。在构建后缀数组时,给出了两种不同的线性算法,确保了预处理过程的稳定性。通过在Facebook、Enron邮件系统以及模拟数据集上的实验,评估了该算法的可行性、效率以及可扩展性。

关键词: 演变图, 查询, 演变子图, 后缀数组, 压缩全文索引

Abstract: Evolving graph contains large amount of temporal and spatial information, some of which always perform in similar evolving rules. This paper gives a query model, mining for the evolving subgraphs whose edges changing in the same way at the same time range. However, the size of evolving graphs in real world is huge. Querying on it repeatedly will cost a lot. Even though the existing index method based on Hash has solved query problem, it is also faced in challenge of preprocessing. In order to reduce the price of preprocessing in mass evolving graph, it proposes a compressed full-text indexing technique. It is based on Burrows-Wheeler transform and suffix array. In constructing a suffix array, it also gives two different linear algorithms, ensuring the stability of preprocessing. It evaluates the feasibility, efficiency and scalability of the algorithm on Facebook, Enron email system and simulated datasets.

Key words: evolving graph, query, evolving subgraph, suffix array, compressed full-text index