Research on Browser Fuzz Sample Generation Technology Based on Deep Learning

Netinfo Security, 2019, Vol. 19, Issue 3: 26-33. doi: 10.3969/j.issn.1671-1122.2019.03.004

Yong FANG¹, Guangxiatian ZHU², Luping LIU², Peng JIA²

1. College of Cybersecurity, Sichuan University, Chengdu, Sichuan 610207, China
2. College of Electronics and Information, Sichuan University, Chengdu, Sichuan 610065, China

Received: 2019-01-10 Online: 2019-03-19 Published: 2020-05-11

About the authors: Yong FANG (born 1966), male, from Sichuan, professor, Ph.D.; main research interests: information security theory and applications, network attack and defense, and network behavior supervision technology. Guangxiatian ZHU (born 1993), male, from Hubei, master's student; main research interests: Windows security, vulnerability discovery and exploitation. Luping LIU (born 1988), male, from Sichuan, doctoral student; main research interests: binary security and vulnerability discovery. Peng JIA (born 1988), male, from Henan, doctoral student; main research interests: virus propagation dynamics, binary security, and malicious code analysis.

Funding: National Key Research and Development Program of China [2017YFB0802900]

Abstract

Among the many approaches to software vulnerability discovery, Fuzz testing is one of the most mature and effective. However, traditional Fuzz testing generally suffers from limited depth of code exploration and a lack of directivity in the samples it generates. To address these problems, this paper proposes a framework that uses a long short-term memory (LSTM) network to guide the generation of the sample sets needed for browser Fuzz testing. The framework consists of two components: sample generation and Fuzz testing. First, the samples are preprocessed: each sample is parsed into vectors that are fed into the neural network for learning. Second, once training is complete, the trained network generates samples, which are then mutated with traditional mutation strategies to form the test set. Finally, the test set is fed into the browser for Fuzz testing. To verify the effectiveness of the framework, the learning results of the LSTM network, the generated samples, and the Fuzz results are analyzed statistically. The experiments show that the framework meets the needs of browser Fuzz sample generation and overcomes the insufficient mining depth and weak directivity of traditional browser Fuzz testing, which makes it suitable for mining one or several specific classes of browser vulnerabilities.

Key words: browser Fuzz, deep learning, sample generation, LSTM neural network, file vectorization
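
The generate-then-mutate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate_sample` is a hypothetical stand-in for sampling from the trained LSTM (it only picks from fixed templates), the two mutation operators are assumed examples of "traditional mutation strategies" (the abstract does not enumerate them), and keeping each unmutated sample next to its mutated variant is likewise an assumption.

```python
import random
import re
from typing import List

# Hypothetical stand-in for the trained LSTM generator: the real framework
# samples HTML from the learned model; here we only pick a fixed template.
TEMPLATES = [
    '<div id="a" width="10"><span>text</span></div>',
    '<table border="1"><tr><td>cell</td></tr></table>',
]

# Assumed boundary values for attribute mutation (not taken from the paper).
BOUNDARY_VALUES = ["0", "-1", "2147483647", "A" * 64]


def generate_sample(rng: random.Random) -> str:
    """Return one 'generated' sample (placeholder for model sampling)."""
    return rng.choice(TEMPLATES)


def mutate_attribute(sample: str, rng: random.Random) -> str:
    """Replace the value of one randomly chosen attribute with a boundary value."""
    matches = list(re.finditer(r'="([^"]*)"', sample))
    if not matches:
        return sample
    m = rng.choice(matches)
    return sample[:m.start(1)] + rng.choice(BOUNDARY_VALUES) + sample[m.end(1):]


def duplicate_fragment(sample: str, rng: random.Random) -> str:
    """Repeat the whole sample 2 to 4 times to enlarge the page."""
    return sample * rng.randint(2, 4)


def build_test_set(n: int, seed: int = 0) -> List[str]:
    """Generate n samples and keep each original plus one mutated variant."""
    rng = random.Random(seed)
    test_set = []
    for _ in range(n):
        sample = generate_sample(rng)
        mutator = rng.choice([mutate_attribute, duplicate_fragment])
        test_set.append(sample)
        test_set.append(mutator(sample, rng))
    return test_set
```

Calling `build_test_set(5)` would yield ten strings that a browser harness could then load one by one. Fixing the seed keeps a crashing input reproducible.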

CLC number: TP309

Yong FANG, Guangxiatian ZHU, Luping LIU, Peng JIA. Research on Browser Fuzz Sample Generation Technology Based on Deep Learning[J]. Netinfo Security, 2019, 19(3): 26-33.

Figures/Tables (9): Figure 1, Figure 2, Table 1, Figure 3, Figure 4, Table 2, Figure 5, Table 3, Table 4

Vector definition	Expression
Tag word vector	N_twi = (N_ti, L_ti, P_ti)
Tag attribute word vector	N_ewi = (N_ei, L_ei, P_ei)
Text word vector	N_cwi = (N_ci, L_ci, P_ci)
Tag vector	N_tsi = (N_twi, N_ewi)
Text vector	N_csi = (N_twi, N_cwi)
Tag vector set	∑N_t = (N_tsi, N_tsj, N_tsk)
Text vector set	∑N_c = (N_csi, N_csj, N_csk)
Input vector set	∑N = (N_t, N_c)
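
The nesting of these definitions (word vectors, paired into tag and text vectors, collected into two sets) can be illustrated with a short sketch. The table does not say what the components N, L and P stand for, so the code below assumes, purely for illustration, that N is a vocabulary id, L is the nesting level, and P is a running token position. The class and function names are hypothetical, and the parsing relies on Python's standard `html.parser` module rather than on whatever parser the authors used.

```python
from html.parser import HTMLParser


class Vectorizer(HTMLParser):
    """Collect tag vectors and text vectors while parsing an HTML sample."""

    def __init__(self):
        super().__init__()
        self.vocab = {}         # token string -> integer id (assumed meaning of N)
        self.stack = []         # word vectors of the currently open tags
        self.position = 0       # running token counter (assumed meaning of P)
        self.tag_vectors = []   # entries of the form (tag word, attribute word)
        self.text_vectors = []  # entries of the form (tag word, text word)

    def _word(self, token, level):
        """Build one word vector (N, L, P) for a token."""
        n = self.vocab.setdefault(token, len(self.vocab))
        vec = (n, level, self.position)
        self.position += 1
        return vec

    def handle_starttag(self, tag, attrs):
        level = len(self.stack)
        tag_word = self._word(tag, level)
        for name, _value in attrs:
            self.tag_vectors.append((tag_word, self._word(name, level)))
        self.stack.append(tag_word)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            level = len(self.stack)
            self.text_vectors.append((self.stack[-1], self._word(text, level)))


def vectorize(html):
    """Return the input vector set as (tag vector set, text vector set)."""
    parser = Vectorizer()
    parser.feed(html)
    parser.close()
    return parser.tag_vectors, parser.text_vectors
```

For `<div id="a"><p class="x">hi</p></div>` the sketch yields two tag vectors (one per attribute) and one text vector that pairs the `p` tag word with the word for `hi`. It is deliberately simple: tags without attributes produce no tag vector, and unclosed void elements such as `<br>` are not popped from the stack.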

Hyperparameter	Setting
Number of LSTM cells	256
Activation function	sigmoid
Dropout rate	0.3
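
To make these settings concrete, the following pure-Python sketch runs one time step of a standard LSTM cell and applies dropout to its output. Several points are assumptions: the table does not say which activation the "sigmoid" entry refers to, so it is read here as the gate activation of a standard LSTM (the candidate and output squashing use tanh, as is conventional). The input size, the weight initialization, and the placement of dropout on the hidden output are illustrative choices, not details from the paper.

```python
import math
import random

HIDDEN_UNITS = 256   # "Number of LSTM cells" from the table
DROPOUT_RATE = 0.3   # "Dropout rate" from the table


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(input_size, hidden_size, rng):
    """Random weights for the four gates; each row acts on [x, h_prev, 1]."""
    width = input_size + hidden_size + 1
    return {
        gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
               for _ in range(hidden_size)]
        for gate in ("input", "forget", "output", "candidate")
    }


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step: sigmoid gates, tanh candidate and output squashing."""
    z = list(x) + list(h_prev) + [1.0]  # trailing 1.0 multiplies the bias weight

    def affine(gate, j):
        return sum(w * v for w, v in zip(params[gate][j], z))

    h, c = [], []
    for j in range(len(h_prev)):
        i_gate = sigmoid(affine("input", j))
        f_gate = sigmoid(affine("forget", j))
        o_gate = sigmoid(affine("output", j))
        candidate = math.tanh(affine("candidate", j))
        c_j = f_gate * c_prev[j] + i_gate * candidate
        c.append(c_j)
        h.append(o_gate * math.tanh(c_j))
    return h, c


def dropout(values, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

A full-size step would use `init_params(input_size, HIDDEN_UNITS, rng)` with zero-initialized `h_prev` and `c_prev` of length `HIDDEN_UNITS`, followed by `dropout(h, DROPOUT_RATE, rng)` during training only. A practical implementation would rely on a deep learning framework rather than nested Python lists.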

Sample batch	Number of samples	Number of qualified samples	Qualification rate/%
Batch 1	10000	8312	83.12
Batch 2	10000	8001	80.01
Batch 3	10000	8362	83.62
Batch 4	10000	8196	81.96
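
The qualification rates in the table follow directly from the counts, and the same arithmetic gives a pooled rate across the four batches of about 82.18% (32871 of 40000). The short sketch below reproduces the calculation. The dictionary keys and function names are illustrative, and the criterion that makes a sample "qualified" is not stated on this page, so it is not modeled.

```python
# (number of samples, number of qualified samples) per batch, copied from the table
BATCHES = {
    "batch_1": (10000, 8312),
    "batch_2": (10000, 8001),
    "batch_3": (10000, 8362),
    "batch_4": (10000, 8196),
}


def qualification_rate(total, qualified):
    """Qualified samples as a percentage of all samples."""
    return 100.0 * qualified / total


def summarize(batches):
    """Return the per-batch rates and the pooled rate over all batches."""
    rates = {name: qualification_rate(t, q) for name, (t, q) in batches.items()}
    total = sum(t for t, _ in batches.values())
    qualified = sum(q for _, q in batches.values())
    return rates, qualification_rate(total, qualified)
```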

Crash type	Times triggered	Number of derived samples	Number of non-derived samples
Memory corruption	31	25	6
Page too large	16	12	4
Other	36	23	13
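
The crash counts can be cross-checked and totaled with a few lines of code. The sketch assumes that the two right-hand columns split each row's trigger count between derived and non-derived samples, which is consistent with the figures (they sum to the trigger count in every row). Under that reading there are 83 crashes in total, 60 of them (about 72%) from derived samples. The names used are illustrative.

```python
# crash type -> (times triggered, derived samples, non-derived samples), from the table
CRASHES = {
    "memory corruption": (31, 25, 6),
    "page too large": (16, 12, 4),
    "other": (36, 23, 13),
}


def tally(crashes):
    """Check each row for consistency and return the overall totals."""
    for name, (triggered, derived, non_derived) in crashes.items():
        if derived + non_derived != triggered:
            raise ValueError(f"row '{name}' does not add up")
    total = sum(row[0] for row in crashes.values())
    derived_total = sum(row[1] for row in crashes.values())
    return {
        "total": total,
        "derived": derived_total,
        "non_derived": total - derived_total,
        "derived_share": derived_total / total,
    }
```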
