Design and Implementation of an Abnormal IP Identification System Based on Traffic Feature Classification

Netinfo Security (信息网络安全), 2021, Vol. 21, Issue 8: 1-9. doi: 10.3969/j.issn.1671-1122.2021.08.001

WEN Weiping (文伟平), HU Yezhou (胡叶舟), ZHAO Guoliang (赵国梁), CHEN Xiarun (陈夏润)

School of Software and Microelectronics, Peking University, Beijing 100080, China

Received: 2021-04-12  Online: 2021-08-10  Published: 2021-09-01
Corresponding author: WEN Weiping  E-mail: weipingwen@ss.pku.edu.cn
About the authors:
WEN Weiping (born 1976), male, from Hunan, professor, Ph.D. Main research interests: network attack and defense, software security vulnerability analysis, malicious code research, reverse engineering of information systems, and trusted computing.
HU Yezhou (born 1995), male, from Henan, master's student. Main research interests: abnormal network traffic identification and blockchain security.
ZHAO Guoliang (born 1991), male, from Shandong, master's student. Main research interests: malicious code research and abnormal network traffic identification.
CHEN Xiarun (born 1997), male, from Jiangxi, master's student. Main research interests: software security vulnerability analysis and malicious code research.
Funding: National Natural Science Foundation of China (61872011)

Abstract

Identifying abnormal IPs is an important means of tracking malicious hosts and an active topic in network security research. Existing machine-learning approaches to abnormal IP identification mostly depend on network-wide traffic; they fail when only the traffic seen by a single server is available, and they incur a high cost for labeled data. To address these problems, the paper applies a clustering algorithm and a genetic algorithm to the identification and classification of abnormal peer (remote-end) IP hosts. It uses multidimensional network traffic features together with IP address features that can be observed on a single host, and combines unsupervised and semi-supervised learning to identify and detect abnormal peer IPs. The method is implemented as an abnormal IP identification system. In experiments, the system identifies 9 different types of malicious IP in the UNSW-NB15 dataset, with identification accuracy reaching up to 98.84%. The method is highly effective for malicious IP classification, can identify previously unknown types of malicious IP, and is broadly applicable and robust; it has been deployed in the traffic identification system of a national network security center.

Key words: malicious hosts, classification algorithm, host identification, weight vector
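
The abstract does not say how the clustering algorithm and the genetic algorithm are combined. The sketch below shows one plausible arrangement, suggested only by the keyword "weight vector": a genetic search over per-feature weights, where each candidate weight vector is scored by running k-means on the weighted features and measuring how well the clusters agree with a small labeled subset. The choice of k-means, the purity-based fitness, and the names `weighted_kmeans`, `purity`, and `evolve_weights` are all assumptions made for illustration, not the authors' implementation.

```python
import random


def weighted_kmeans(points, weights, k, rng, iters=10):
    """Lloyd's k-means on features scaled by `weights`; returns one cluster id per point."""
    scaled = [[w * x for w, x in zip(weights, p)] for p in points]
    centers = rng.sample(scaled, k)
    assign = [0] * len(scaled)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(scaled):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [scaled[i] for i in range(len(scaled)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def purity(assign, labeled):
    """Share of labeled points whose label matches the majority label of their cluster."""
    by_cluster = {}
    for idx, label in labeled.items():
        by_cluster.setdefault(assign[idx], []).append(label)
    agree = sum(max(ls.count(l) for l in set(ls)) for ls in by_cluster.values())
    return agree / len(labeled)


def evolve_weights(points, labeled, k, generations=15, pop_size=12, seed=0):
    """Genetic search for a feature-weight vector in [0, 1]^dim (assumes dim >= 2)."""
    rng = random.Random(seed)
    dim = len(points[0])

    def fitness(w):
        # A fresh, identically seeded generator makes the score depend on w only.
        return purity(weighted_kmeans(points, w, k, random.Random(seed)), labeled)

    # Start from uniform weights plus random candidates.
    population = [[1.0] * dim]
    population += [[rng.random() for _ in range(dim)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: the best candidate always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)  # one-point crossover
            child = a[:cut] + b[cut:]
            j = rng.randrange(dim)  # Gaussian mutation of one weight, clamped to [0, 1]
            child[j] = min(1.0, max(0.0, child[j] + rng.gauss(0, 0.2)))
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)
```

Here `points` is a list of per-IP feature vectors and `labeled` maps a few point indices to known labels; only those few labels drive the search, which is what makes the arrangement semi-supervised. Because the uniform weight vector is in the initial population and the best candidate is always carried over, the returned score is never worse than unweighted k-means under the same initialization.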

CLC number: TP309

WEN Weiping, HU Yezhou, ZHAO Guoliang, CHEN Xiarun. Design and Implementation of an Abnormal IP Identification System Based on Traffic Feature Classification[J]. Netinfo Security, 2021, 21(8): 1-9.

Figures/Tables (9): Figure 1, Figure 2, Figure 3, Table 1, Figure 4, Figure 5, Figure 6, Table 2, Figure 7

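Table 1
Feature	Information entropy	Ratio	Mean	Variance	Speed	Proportion vector
Source port	√	√	-	-	-	-
Destination port	√	√	-	-	-	-
Packets sent	-	-	√	√	√	-
Packets received	-	-	√	√	√	-
Protocol	-	-	-	-	-	√
Duration	-	-	√	√	-	-
State	-	-	-	-	-	√
Packet loss rate	-	-	√	√	-	-
TTL	-	-	√	√	-	-
Traffic speed	-	-	√	√	-	-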
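
As a minimal sketch of how the statistics in the feature table above could be computed for one remote IP, the following code derives the information entropy of source ports, the mean and variance of flow duration, and the proportion vector of protocols from a list of flow records. The record keys (`sport`, `duration`, `proto`), the protocol list, and the function names are hypothetical; the "ratio" and "speed" columns are left out because the page does not define them.

```python
import math
from collections import Counter
from statistics import fmean, pvariance


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def proportion_vector(values, categories):
    """Share of each category in `values`, in the fixed order of `categories`."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def ip_features(flows, protocols=("tcp", "udp", "icmp")):
    """Aggregate all flows of one remote IP into a flat feature dictionary.

    Each flow is a dict with the hypothetical keys `sport`, `duration`, `proto`.
    """
    sports = [f["sport"] for f in flows]
    durations = [f["duration"] for f in flows]
    features = {
        "sport_entropy": shannon_entropy(sports),
        "duration_mean": fmean(durations),
        "duration_var": pvariance(durations),  # population variance
    }
    shares = proportion_vector([f["proto"] for f in flows], protocols)
    for proto, share in zip(protocols, shares):
        features[f"proto_{proto}"] = share
    return features
```

A host that contacts the server from many different source ports yields a high port entropy, while a host reusing one port yields zero; the remaining rows of the table would follow the same pattern with their own fields.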

Table 2
Class	N	F	R	S	B	D	E	G	W	A
N	409.3	85.4	33.8	6.5	6.5	4.8	5.1	1.1	6	6.5
F	0	118.7	26.1	0.9	2.9	0.3	0	0.1	2.5	5.5
R	0	0.9	153.1	0	2	0	0	0	0	4
S	0	0	0	154.6	2.8	0	0	0	0	1.6
B	0	0.2	0.6	8.8	102	4.8	0	0.6	7	19
D	0	0.2	0	0	5.9	126.6	5.2	17.2	0.6	4.3
E	0	0	0	0	2.6	5.7	149.4	0.1	0	2.2
G	0	1	0	0.5	2.9	11.7	1.3	132	4.8	5.8
W	0	2.1	0.9	0	2.3	0.3	0	1.1	90.6	7.7
A	0	1.1	2.1	0	5.4	2.7	0.4	0	4.7	33.6
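
The class-by-class matrix above can be summarized with standard per-class metrics. The sketch below assumes that rows are true classes and columns are predicted classes; the page does not state the orientation, but every row sums to a whole number, which is consistent with fixed per-class sample counts averaged over several runs. The names `per_class_metrics` and `overall_accuracy` are illustrative.

```python
LABELS = ["N", "F", "R", "S", "B", "D", "E", "G", "W", "A"]

# Values copied from the matrix above; assumed layout: row = true class, column = predicted.
MATRIX = [
    [409.3, 85.4, 33.8, 6.5, 6.5, 4.8, 5.1, 1.1, 6, 6.5],
    [0, 118.7, 26.1, 0.9, 2.9, 0.3, 0, 0.1, 2.5, 5.5],
    [0, 0.9, 153.1, 0, 2, 0, 0, 0, 0, 4],
    [0, 0, 0, 154.6, 2.8, 0, 0, 0, 0, 1.6],
    [0, 0.2, 0.6, 8.8, 102, 4.8, 0, 0.6, 7, 19],
    [0, 0.2, 0, 0, 5.9, 126.6, 5.2, 17.2, 0.6, 4.3],
    [0, 0, 0, 0, 2.6, 5.7, 149.4, 0.1, 0, 2.2],
    [0, 1, 0, 0.5, 2.9, 11.7, 1.3, 132, 4.8, 5.8],
    [0, 2.1, 0.9, 0, 2.3, 0.3, 0, 1.1, 90.6, 7.7],
    [0, 1.1, 2.1, 0, 5.4, 2.7, 0.4, 0, 4.7, 33.6],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)} for a square confusion matrix."""
    col_sums = [sum(col) for col in zip(*matrix)]
    metrics = {}
    for i, label in enumerate(labels):
        row_sum = sum(matrix[i])
        hit = matrix[i][i]
        precision = hit / col_sums[i] if col_sums[i] else 0.0
        recall = hit / row_sum if row_sum else 0.0
        metrics[label] = (precision, recall)
    return metrics


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total
```

Calling `per_class_metrics(MATRIX, LABELS)` gives one precision and recall pair per class, which makes it easy to see which classes are most often confused with one another under the assumed orientation.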
