计算机工程与应用 ›› 2016, Vol. 52 ›› Issue (8): 62-69.

• 大数据与云计算 • 上一篇    下一篇

一种基于可排序视图的RDF模式匹配算法

熊  政1,2,王金明2,郑海雁1,2,李昆明1,徐立臻2,崇志宏2   

  1. 1.江苏方天电力技术有限公司智能电网产品中心,南京 211189
    2.东南大学 计算机科学与工程学院,南京 211189
  • 出版日期:2016-04-15 发布日期:2016-04-19

RDF pattern matching algorithm with sorted view

XIONG Zheng1,2, WANG Jinming2, ZHENG Haiyan1,2, LI Kunming1, XU Lizhen2, CHONG Zhihong2   

  1. 1.Jiangsu Frontier Electric Technology Co. LTD Smart Grid Center, Nanjing 211189, China
    2.School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
  • Online:2016-04-15 Published:2016-04-19

摘要: 随着语义网络中数据量的激增,在RDF数据集中高效查询数据已成为一个亟待解决的问题。传统的基于物化视图的RDF模式匹配方法虽然能降低表的自连接操作次数,加快查询模式重写过程,但在视图集中检索模式匹配的视图等价于子图同构这一NP-hard问题。为了减小查询模式重写代价,提高RDF模式匹配过程效率,引入可排序视图概念,设计包含映射发现算法contain及其扩展算法contain+,简化等长度模式间包含映射发现过程,同时保证模式间的匹配代价与输入数据的规模线性相关。此外,提出基于倒排表/MapReduce检索候选可排序视图的方法,实现RDF模式重写算法rewrite,用以处理不同规模数据集上的模式匹配问题。理论分析及实验证明,基于可排序视图的RDF模式匹配算法能有效地兼顾算法效率及算法可扩展性。

关键词: 可排序视图, 倒排表, Mapreduce, 模式重写

Abstract: Along with the rocketing amount of data in semantic web, querying in RDF data set has become a urgent problem to be solved. Traditional RDF pattern matching methods based on materialized views can reduce the times of table joining, speed up the rewriting process of query pattern, but the problem of retrieving matching views in view sets is equivalent to subgraph isomorphism, which is NP-hard. In order to reduce the cost of query pattern rewriting, and improve the efficiency of RDF pattern matching, the concept of sorted view is introduced, contain mapping discovery algorithm is designed and it simplifies the process of contain mapping discovery between patterns with the same length and ensures linear correlation between the matching price and the scale of input data. Furthermore methods of searching candidate sorted views through inverted list or MapReduce are proposed and RDF pattern rewriting algorithm is implemented to solve pattern matching problem in different data sets. Theoretical analysis and experimental results demonstrate that the RDF pattern matching algorithms based on sorted view can reach the trade-offs between efficiency and scalability.

Key words: sorted view, inverted list, MapReduce, pattern rewriting