A Secondary Index Scheme of Big Data in HBase Based on Solr

Netinfo Security, 2017, Vol. 17, Issue (8): 39-44. doi: 10.3969/j.issn.1671-1122.2017.08.006

Wenxian WANG¹,², Xingshu CHEN¹,², Haizhou WANG¹,², Xiaosong WU²

1. Cybersecurity Research Institute, Sichuan University, Chengdu, Sichuan 610065, China
2. Network and Trusted Computing Institute, College of Computer, Sichuan University, Chengdu, Sichuan 610065, China

Received: 2017-06-26 Online: 2017-08-20 Published: 2020-05-12

About the authors: Wenxian WANG (born 1978), male, from Fujian, lecturer, Ph.D.; main research interests: cyberspace security, public opinion analysis and mining. Xingshu CHEN (born 1968), female, from Guizhou, professor, Ph.D.; main research interests: cloud computing and big data security, network intelligence analysis. Haizhou WANG (born 1986), male, from Sichuan, lecturer, Ph.D.; main research interests: cyberspace security, public opinion analysis and mining. Xiaosong WU (born 1989), male, from Sichuan, master's student; main research interests: cyberspace security, text mining.

Funding: National Key Technology R&D Program of China [2012BAH18B05]; National Natural Science Foundation of China [61272447]; Science and Technology Department of Sichuan Province Program [16ZHSF0483]

Abstract:

HBase does not provide secondary indexes, and Huawei's hindex scheme struggles to meet the retrieval-speed requirements of massive data sets. To address both problems, this paper designs SIHBase (Solr Indexing HBase), a Solr-based secondary index scheme for HBase. The scheme uses the HBase Coprocessor mechanism to implement callback functions for table creation, modification, and deletion, and for data insertion, update, deletion, and recovery. These callbacks send the corresponding requests to Solr, so that secondary indexes for HBase are built and maintained in Solr automatically and the data and its index stay consistent. The scheme is general: it can index multiple columns of multiple tables at the same time. It also extends the HBase client with an interface that queries Solr directly, so that Solr's efficient, flexible, and varied retrieval functions provide fast retrieval over massive HBase data. Finally, a comparative experiment on secondary-index query performance shows that SIHBase is much faster than hindex in query speed.

Key words: HBase, secondary index, Solr, quick retrieval
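The callback-driven index maintenance described in the abstract can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: it uses neither the HBase Coprocessor API nor the Solr API, and `ToyIndex`, `IndexingHooks`, `ToyTable`, and `query_by_column` are hypothetical names invented for illustration. The second step of the lookup (fetching the row by its key after the index returns it) is an assumption about how such an index is typically used; the abstract only states that the client queries Solr directly.

```python
"""Toy, in-memory model of write callbacks keeping a secondary index in sync.

Nothing here is the HBase or Solr API; every class is a hypothetical stand-in.
"""
from collections import defaultdict


class ToyIndex:
    """Stand-in for Solr: maps (table, column, value) to a set of row keys."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, table, row_key, column, value):
        self.postings[(table, column, value)].add(row_key)

    def remove(self, table, row_key, column, value):
        self.postings[(table, column, value)].discard(row_key)

    def search(self, table, column, value):
        return sorted(self.postings.get((table, column, value), set()))


class IndexingHooks:
    """Stand-in for the callbacks: mirrors each write into the index."""

    def __init__(self, index, indexed_columns):
        self.index = index
        # Assumed config shape: table name -> set of indexed column names.
        self.indexed_columns = indexed_columns

    def post_put(self, table, row_key, old_row, new_cells):
        for column, value in new_cells.items():
            if column not in self.indexed_columns.get(table, set()):
                continue
            if column in old_row:
                # Update: drop the stale entry before adding the new one.
                self.index.remove(table, row_key, column, old_row[column])
            self.index.add(table, row_key, column, value)

    def post_delete(self, table, row_key, old_row):
        for column, value in old_row.items():
            if column in self.indexed_columns.get(table, set()):
                self.index.remove(table, row_key, column, value)


class ToyTable:
    """Stand-in for an HBase table that fires the hooks after each write."""

    def __init__(self, name, hooks):
        self.name = name
        self.rows = {}
        self.hooks = hooks

    def put(self, row_key, cells):
        old_row = dict(self.rows.get(row_key, {}))
        self.rows.setdefault(row_key, {}).update(cells)
        self.hooks.post_put(self.name, row_key, old_row, cells)

    def delete(self, row_key):
        old_row = self.rows.pop(row_key, {})
        self.hooks.post_delete(self.name, row_key, old_row)

    def get(self, row_key):
        return self.rows.get(row_key)


def query_by_column(table, index, column, value):
    """Two-step lookup: ask the index for row keys, then fetch rows by key."""
    return [(key, table.get(key)) for key in index.search(table.name, column, value)]


index = ToyIndex()
hooks = IndexingHooks(index, {"posts": {"author"}})
posts = ToyTable("posts", hooks)
posts.put("r1", {"author": "alice", "text": "hello"})
posts.put("r2", {"author": "bob", "text": "hi"})
posts.put("r2", {"author": "alice"})  # update replaces the "bob" entry
posts.delete("r1")                    # delete removes r1 from the index
result = query_by_column(posts, index, "author", "alice")
```

Because the application only ever calls `put` and `delete` on the table, it never touches the index itself; in this model that is what keeps the data and the index consistent. Only the `author` column is configured for indexing here, so a search on `text` finds nothing.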

CLC number: TP309

Wenxian WANG, Xingshu CHEN, Haizhou WANG, Xiaosong WU. A Secondary Index Scheme of Big Data in HBase Based on Solr[J]. Netinfo Security, 2017, 17(8): 39-44.

Figures/Tables (7): Figure 1, Figure 2, Figure 3, Figure 4, Figure 5, Table 1, Table 2
