Netinfo Security ›› 2016, Vol. 16 ›› Issue (9): 267-271. doi: 10.3969/j.issn.1671-1122.2016.09.051

• Original Article •

Design of Storage Structure in HBase for Microblog Information Analysis

Xilin CHEN, Ding MA

  1. People's Public Security University of China, Beijing 102623, China
  • Received: 2016-07-25 Online: 2016-09-20 Published: 2020-05-13

Abstract:

With the development of the Internet, microblogs have an ever deeper impact on people's lives. The surge in microblog users has produced a very large volume of data that continues to grow rapidly. Traditional databases struggle to process data at this scale, which led to the emergence of NoSQL databases. HBase, the subject of this paper, is currently one of the most popular open-source NoSQL databases. Built on the Hadoop Distributed File System, HBase can store structured data efficiently and process it efficiently through MapReduce, and it can also store unstructured data, providing relatively flexible storage and management for massive data. Most importantly, an HBase cluster is easy to expand: it only requires adding slave nodes, which is simpler than scaling a traditional database through measures such as read/write splitting and splitting data across separate tables. In this paper, we study the design of the row key for microblog information in HBase, discussing it in terms of both the depth and the breadth of the information. A two-level index improves the query efficiency of HBase. Without changing the HBase source code, we address the problem that information queries are constrained to a large extent by the row-key design, and we give full consideration to suitable storage modes for microblog content such as photos and links, so that microblog information can be managed efficiently.
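
The abstract does not give the concrete row-key layout or index schema, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes a composite row key (zero-padded user id plus reversed timestamp) and a separate index table keyed by topic. HBase tables are imitated by a small in-memory class that keeps rows in lexicographic key order. All names here (`SortedTable`, `make_row_key`, `store_post`, `posts_by_user`, `posts_by_topic`, `MAX_TS`) are hypothetical.

```python
import bisect

MAX_TS = 10**13  # assumed upper bound for millisecond timestamps


class SortedTable:
    """In-memory stand-in for an HBase table: rows sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in lexicographic order
        self._rows = {}   # row key -> stored value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) for every row whose key starts with prefix."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


def make_row_key(user_id, ts_ms):
    """Assumed layout: padded user id, then reversed timestamp (newest first)."""
    return f"{user_id.rjust(10, '0')}_{MAX_TS - ts_ms:013d}"


posts = SortedTable()        # data table: row key -> post
topic_index = SortedTable()  # index table: "topic|row key" -> row key


def store_post(user_id, ts_ms, text, topics):
    key = make_row_key(user_id, ts_ms)
    posts.put(key, {"text": text})
    for topic in topics:
        # Second-level index entry pointing back at the data row.
        topic_index.put(f"{topic}|{key}", key)
    return key


def posts_by_user(user_id):
    """Served directly by the row key: one prefix scan on the data table."""
    prefix = user_id.rjust(10, "0") + "_"
    return [row["text"] for _, row in posts.scan_prefix(prefix)]


def posts_by_topic(topic):
    """Two steps: scan the index table, then fetch each data row by key."""
    return [posts.get(k)["text"] for _, k in topic_index.scan_prefix(f"{topic}|")]
```

In this sketch, a query by user needs only the row key, while a query by topic, which the row key alone cannot serve, goes through the index table first. That illustrates how a second index level can relax the dependence of queries on the row-key design without touching the storage engine itself.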

Key words: microblog, Hadoop, NoSQL, HBase, two-layer based index
