Netinfo Security, 2017, Vol. 17, Issue (10): 81-85. doi: 10.3969/j.issn.1671-1122.2017.10.013


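Research on HDFS Small File Problem Based on Real-time Data of Cybersecurity

Shaojie WANG, Chun LONG, Wei WAN, Jing ZHAO

  Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2017-08-01  Published: 2017-10-10  Posted online: 2020-05-12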
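  • About the authors:

    Shaojie WANG (born 1989), male, from Hebei, engineer, M.S., research interest: computer technology; Chun LONG (born 1979), male, from Hubei, senior engineer, Ph.D., research interests: network architecture and network security; Wei WAN (born 1982), male, from Hubei, senior engineer, Ph.D., research interest: cyberspace security; Jing ZHAO (born 1987), female, from Gansu, engineer, Ph.D. candidate, research interest: network security.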

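  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China [61601443]; National Key Research and Development Program of China [2017YFB0801900]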


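Abstract:

Cyberspace security situational awareness requires security-risk information to be available in real time: data must be written as it arrives and be retrievable as soon as possible. If the same storage space is read and written at the same time, the operations conflict and cause errors. Some real-time data sources can transfer their files periodically, which avoids this conflict. When the interval is short, however, periodic transfer produces a large number of small files and wastes storage resources. To address this small file problem, this paper studies and proposes a transfer-and-append strategy based on a file size threshold. On top of periodic file transfer, it adds a file append function and merges files according to a defined policy, ensuring that each file is no smaller than a preset threshold. Experimental results show that the strategy effectively reduces the number of files generated and the waste of storage space.

Key words: big data, distributed file system, cybersecurity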
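The abstract does not spell out the merge policy in detail. The Python sketch below simulates one assumed reading of it, not the authors' implementation: each periodic transfer appends its bytes to a single open file instead of creating a new one, and that file is sealed once it reaches the threshold, so every sealed file is at least `threshold` bytes. The class `ThresholdAppendMerger`, the function `naive_file_count`, the 128 MiB threshold, and the workload of 1 MiB per one-minute interval are all hypothetical. Only file sizes are tracked, and no HDFS calls are made.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThresholdAppendMerger:
    """Hypothetical simulation of a threshold-based transfer-and-append policy.

    Each periodic transfer appends its bytes to one open file instead of
    creating a new file; the open file is sealed once it reaches `threshold`.
    Only sizes are tracked, so no real file system is touched.
    """
    threshold: int                                    # minimum size of a sealed file, in bytes
    sealed: List[int] = field(default_factory=list)  # sizes of sealed (finished) files
    open_size: int = 0                                # bytes in the file still receiving appends

    def __post_init__(self) -> None:
        if self.threshold <= 0:
            raise ValueError("threshold must be positive")

    def on_transfer(self, chunk_size: int) -> None:
        """Handle one periodic transfer carrying `chunk_size` bytes."""
        if chunk_size <= 0:
            return  # nothing was written during this interval
        self.open_size += chunk_size  # append rather than create a small file
        if self.open_size >= self.threshold:
            self.sealed.append(self.open_size)  # file now meets the threshold
            self.open_size = 0                  # next transfer starts a new file

    def file_count(self) -> int:
        """Files currently stored: all sealed files plus the open one, if non-empty."""
        return len(self.sealed) + (1 if self.open_size > 0 else 0)


def naive_file_count(chunks: List[int]) -> int:
    """Baseline: plain periodic transfer stores one file per non-empty interval."""
    return sum(1 for c in chunks if c > 0)


# Hypothetical workload: 1 MiB arrives in every one-minute interval for a day.
MiB = 1024 * 1024
chunks = [1 * MiB] * (24 * 60)

merger = ThresholdAppendMerger(threshold=128 * MiB)  # assumed threshold
for c in chunks:
    merger.on_transfer(c)

print(naive_file_count(chunks), merger.file_count())  # prints: 1440 12
```

Under this assumed workload, plain periodic transfer leaves 1440 files per day, while the append policy leaves 11 sealed 128 MiB files plus one open 32 MiB file. A real deployment would rely on HDFS append support (for example `FileSystem.append` in the Hadoop Java API) and would have to keep readers away from the file still being appended to; the simulation does not model either.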

