Netinfo Security (信息网络安全), 2016, Vol. 16, Issue 3: 59-63. doi: 10.3969/j.issn.1671-1122.2016.03.010


Research and Design on the Storage Model for RDF Data Based on HBase
(Original Chinese title: 基于HBase的RDF数据存储方案研究与设计)

Yuanyuan WANG1, Xiaodan LV1, Qi HU1, Hongchuan WU2

  1. College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China
  2. College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China
  • Received: 2016-01-18 Online: 2016-03-25 Published: 2020-05-13
  • About the authors:

    Yuanyuan WANG (born 1990), female, from Guizhou, master's student; main research interests: big data technologies and algorithms. Xiaodan LV (born 1970), male, from Guizhou, associate professor, master's degree; main research interests: algorithm design and data analysis. Qi HU (born 1991), female, from Guizhou, master's student; main research interest: information security. Hongchuan WU (born 1995), male, from Sichuan, undergraduate; main research interests: communication networks and information science.

  • Supported by:
    Grant no. 黔科合JZ字[2014]2001

Abstract:

To address the storage of RDF data, this paper proposes a storage scheme built on the distributed database HBase and a purpose-designed Rowkey. Exploiting the characteristics of both HBase and RDF data, the scheme hashes each predicate with the classic BKDRHash algorithm and uses the hash value together with the predicate as the primary key under which the data is stored. A well-designed Rowkey avoids the accumulation of data on individual nodes, and the use of BKDRHash also ensures data integrity. To demonstrate the effectiveness of this storage model, the experiments use MapReduce to generate files in HFile, HBase's internal storage format, and load them in parallel. The experiments show that, under this storage model, data loading performs well when the data volume is large. Simulation experiments on the LUBM test set show that the scheme is effective.

Key words: Semantic Web, RDF data, HBase, MapReduce, HFile
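
The Rowkey construction described in the abstract can be illustrated with a minimal sketch. The abstract says only that the BKDRHash value and the predicate together form the key, so everything else here is an assumption made for illustration: the seed value 131, the 32-bit arithmetic, the 8-digit hexadecimal prefix, the `|` separator, and the function names `bkdr_hash` and `make_rowkey`. The sample predicates use the LUBM namespace purely as example input.

```python
def bkdr_hash(text: str, seed: int = 131) -> int:
    """Classic BKDRHash over the UTF-8 bytes of `text`.

    The 32-bit mask emulates unsigned-integer overflow as in the usual
    C formulation; the seed 131 is a common choice, assumed here.
    """
    h = 0
    for byte in text.encode("utf-8"):
        h = (h * seed + byte) & 0xFFFFFFFF
    return h & 0x7FFFFFFF


def make_rowkey(predicate: str) -> bytes:
    """Build a hash-prefixed rowkey (hypothetical layout: hex hash, '|', predicate).

    Keeping the predicate after the hash means two predicates that
    happen to collide on the hash still produce different keys.
    """
    return f"{bkdr_hash(predicate):08x}|{predicate}".encode("utf-8")


if __name__ == "__main__":
    # Example predicates sharing one long URI prefix (illustrative input only).
    ns = "http://swat.cse.lehigh.edu/onto/univ-bench.owl#"
    for name in ("takesCourse", "advisor", "memberOf", "name"):
        print(make_rowkey(ns + name).decode("utf-8"))
```

Because HBase stores rows in sorted key order, predicates that share a long URI prefix would otherwise sort next to one another. Prefixing the key with a hash is one common way to spread such keys across the key space, which is the kind of node accumulation the abstract says the Rowkey design avoids.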

CLC number: