Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (3): 83-90. DOI: 10.3778/j.issn.1002-8331.2101-0109

• Theory, Research and Development •

基于去中心化索引的IPFS数据获取方法研究

石秋娥,周喜,王轶   

  1. 中国科学院 新疆理化技术研究所,乌鲁木齐 830011
  2. 中国科学院大学,北京 100049
  3. 中国科学院 新疆理化技术研究所 新疆民族语音语言信息处理实验室,乌鲁木齐 830011
  • Publication date: 2022-02-01 Release date: 2022-01-28

Research of IPFS Data Acquisition Method Based on Decentralized Index

SHI Qiu’e, ZHOU Xi, WANG Yi   

  1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  • Online:2022-02-01 Published:2022-01-28

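Abstract: The interplanetary file system (IPFS) provides decentralized storage and can meet the growing demand for data storage. However, IPFS offers only an exact way to retrieve data, so data cannot be located without its unique identifier. Existing IPFS data acquisition methods weaken the decentralization of IPFS and support only keyword search, and running keyword search on long query statements increases the network burden. To address this, IPFS-DDAM, an IPFS data acquisition method based on a decentralized hybrid index, is proposed. Keywords and central sentences are extracted from the data to build a keyword index and a sentence index. The indexes are stored in a distributed Hash table (DHT), with sentence indexes of similar content stored adjacently, which enables neighborhood range search over the sentence index and exact search over the keyword index. The cache storage mechanism is improved to reduce redundant storage. Simulation experiments on public data sets demonstrate the effectiveness of the method and show that it reduces the network burden.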

Keywords: interplanetary file system, distributed Hash table, decentralization, keyword index, sentence index, cache

Abstract: The interplanetary file system (IPFS) is a decentralized file system that can meet the growing demand for data storage. However, IPFS provides only an exact retrieval method, so data cannot be found without its unique identifier. Existing IPFS data acquisition methods weaken the decentralization of IPFS and support only keyword search. To address these problems, a data acquisition method based on a decentralized hybrid index mechanism is proposed. Keywords and central sentences are extracted from the data and used to build keyword indexes and sentence indexes, respectively. All indexes are stored in a distributed Hash table (DHT). Keyword indexes support exact search based on the hash values of keywords. Sentence indexes with similar content are stored near one another, so a search can be carried out in the neighborhood of the nodes that store a matching sentence index. Redundant data is reduced by improving the cache storage mechanism. Experiments on public data sets demonstrate the effectiveness of the proposed method.

Key words: interplanetary file system(IPFS), distributed Hash table(DHT), decentralized, keyword index, sentence index, cache
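
The hybrid index described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how similar sentences are mapped to neighboring positions, so SimHash is swapped in here as one possible locality-sensitive fingerprint, and plain in-memory dictionaries stand in for the DHT. The names `HybridIndex`, `keyword_key`, `simhash`, the `radius` parameter, and the CID strings are all hypothetical.

```python
import hashlib
from collections import defaultdict

BITS = 64  # fingerprint width; an assumption, not taken from the paper


def keyword_key(keyword: str) -> str:
    """Exact-match key: SHA-256 of the normalized keyword."""
    return hashlib.sha256(keyword.strip().lower().encode("utf-8")).hexdigest()


def simhash(sentence: str) -> int:
    """SimHash fingerprint: sentences sharing many tokens tend to differ in few bits."""
    weights = [0] * BITS
    for token in sentence.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the token hash
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


class HybridIndex:
    """In-memory stand-in for a DHT holding a keyword index and a sentence index."""

    def __init__(self) -> None:
        self.keyword_index = defaultdict(set)  # keyword hash -> set of CIDs
        self.sentence_index = {}               # fingerprint -> set of CIDs

    def publish(self, cid: str, keywords: list[str], central_sentence: str) -> None:
        """Register one data item under its keywords and its central sentence."""
        for kw in keywords:
            self.keyword_index[keyword_key(kw)].add(cid)
        self.sentence_index.setdefault(simhash(central_sentence), set()).add(cid)

    def search_keyword(self, keyword: str) -> set[str]:
        """Exact search: look up the hash of the keyword."""
        return set(self.keyword_index.get(keyword_key(keyword), set()))

    def search_sentence(self, query: str, radius: int = 16) -> set[str]:
        """Neighborhood search: entries within `radius` differing bits of the query.

        A linear scan replaces the routing a real DHT would perform.
        """
        q = simhash(query)
        hits: set[str] = set()
        for fingerprint, cids in self.sentence_index.items():
            if bin(fingerprint ^ q).count("1") <= radius:
                hits |= cids
        return hits


index = HybridIndex()
index.publish(
    "QmExampleA",  # hypothetical CID
    ["IPFS", "DHT"],
    "decentralized index for ipfs data retrieval",
)
keyword_hits = index.search_keyword("ipfs")
sentence_hits = index.search_sentence(
    "decentralized index for ipfs data retrieval", radius=0
)
```

Here `search_keyword` only succeeds on an exact (case-normalized) keyword, while `search_sentence` tolerates differences up to `radius` bits. In a real deployment the fingerprint would have to be mapped onto the DHT's own distance metric so that a range query touches only nearby nodes; that mapping is left open in this sketch.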