信息网络安全 (Netinfo Security) ›› 2017, Vol. 17 ›› Issue (10): 42-49. doi: 10.3969/j.issn.1671-1122.2017.10.007


Research on Imbalanced Abnormal Data Classification Algorithm Based on Active Learning

Bo WANG (王波), Huaibin WANG (王怀彬)

  1. School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin 300384, China
  • Received: 2017-06-28 Online: 2017-10-10 Published: 2020-05-12
  • About the authors:

    Bo WANG (born 1991), male, from Henan, master's student; main research interests: network security and data mining. Huaibin WANG (born 1960), male, from Inner Mongolia, research fellow, master's degree; main research interests: computer network technology and computer simulation technology.

  • Funding:
    Tianjin Science and Technology Development Project [15ZXHLX00200]

Abstract:

Network security faces increasingly complex challenges. As attack methods and types diversify, the damage they cause keeps growing, and network protection requirements have shifted from single, passive measures to active network situation awareness built on data fusion. Research on abnormal data classification therefore remains important. However, when faced with imbalanced data, traditional classification algorithms focus only on improving overall accuracy and neglect classification performance on the minority class. This easily leads to misjudgment of attack and vulnerability information, and recognition of new abnormal types is not efficient enough. To address these problems, this paper first uses an active learning sampling method to improve learning efficiency on large numbers of samples. It then improves the classification algorithm based on the idea of combined classifiers, using a misclassification cost function to increase classification accuracy on the minority class. Finally, simulation experiments compare the proposed method with traditional methods to verify its feasibility and effectiveness.

Key words: network security, imbalanced classification, active learning, cost function, combination classification
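
The abstract names two ingredients, active learning sampling and a misclassification cost function applied to a combined classifier, but does not give the algorithm itself. The following is only a minimal illustrative sketch of those two general ideas, not the authors' method. The function names, the use of uncertainty sampling as the query strategy, the averaging rule, and the cost values (`cost_fn`, `cost_fp`) are all assumptions made for illustration.

```python
from typing import List, Sequence


def uncertainty_sample(probs: Sequence[float], k: int) -> List[int]:
    """Pick the k unlabeled samples the current model is least sure about.

    probs[i] is the predicted probability that sample i belongs to the
    minority (abnormal) class. Samples closest to 0.5 are queried first.
    This is generic uncertainty sampling, assumed here for illustration.
    """
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


def cost_sensitive_vote(member_probs: Sequence[float],
                        cost_fn: float = 5.0,
                        cost_fp: float = 1.0) -> int:
    """Combine ensemble members and choose the label with lower expected cost.

    member_probs holds each member's minority-class probability for one
    sample. cost_fn is the (hypothetical) cost of missing an abnormal
    sample; cost_fp is the (hypothetical) cost of a false alarm.
    Returns 1 for minority/abnormal, 0 for majority/normal.
    """
    p = sum(member_probs) / len(member_probs)  # simple averaging of members
    expected_cost_predict_normal = p * cost_fn          # risk of a missed attack
    expected_cost_predict_abnormal = (1.0 - p) * cost_fp  # risk of a false alarm
    return 1 if expected_cost_predict_normal > expected_cost_predict_abnormal else 0


if __name__ == "__main__":
    # Query the two most uncertain samples out of four.
    print(uncertainty_sample([0.9, 0.48, 0.1, 0.6], k=2))
    # With a higher cost for missed attacks, a 0.25 average still flags abnormal.
    print(cost_sensitive_vote([0.2, 0.3, 0.25]))
```

Raising `cost_fn` relative to `cost_fp` lowers the probability threshold at which a sample is flagged as abnormal, which is one common way to trade overall accuracy for better minority-class recall.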

CLC number: