Netinfo Security ›› 2017, Vol. 17 ›› Issue (3): 66-71. doi: 10.3969/j.issn.1671-1122.2017.03.011

URL Classification Method Based on AdaBoost and Bayes Algorithm

Tengfei ZHANG, Qian ZHANG, Jiayong LIU

  1. College of Electronic and Information Engineering, Sichuan University, Chengdu, Sichuan 610065, China
  • Received: 2016-11-01 Online: 2017-03-20 Published: 2020-05-12
  • About the authors:

    Tengfei ZHANG (born 1991), male, from Henan, master's student; main research interests: data mining and machine learning. Qian ZHANG (born 1987), male, from Guizhou, doctoral student; main research interests: network information security and data mining. Jiayong LIU (born 1962), male, from Sichuan, professor, Ph.D.; main research interests: network data analysis and information security.

  • Funding:
    National Natural Science Foundation of China [61377018]

Abstract:

Analyzing user behavior from HTTP traffic requires quickly picking out the URLs of the resources that users actually visit. To this end, this paper proposes a method that combines rule-based filtering with a machine learning algorithm to identify user-visited URLs quickly. First, among the parsed packets, those whose URL suffix marks them as resource-loading requests are filtered out. Second, browser User-Agent strings are identified from fields specific to browser User-Agents and from the characteristics of web page access in a browser. Finally, a classifier trained with the AdaBoost and Bayes algorithms identifies the URLs visited by the user. Experimental results show that the method identifies user-visited URLs in local area network traffic efficiently and accurately.

Key words: rule filtering, machine learning algorithm, URL classification
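
The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the suffix list, the User-Agent tokens, and the function names are all assumptions made for the example. The second step here checks only User-Agent tokens and leaves out the page-access characteristics the abstract also mentions. The trained AdaBoost-Bayes classifier is represented by a plain callable passed in by the caller.

```python
from urllib.parse import urlsplit

# Assumed rule lists for illustration only; the paper's actual rules are not given here.
RESOURCE_SUFFIXES = (".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".woff")
BROWSER_UA_TOKENS = ("Mozilla/", "Chrome/", "Safari/", "Firefox/")


def is_resource_request(url):
    """Step 1: True if the URL path ends with a static-resource suffix."""
    path = urlsplit(url).path.lower()  # ignores the query string and fragment
    return path.endswith(RESOURCE_SUFFIXES)


def is_browser_user_agent(user_agent):
    """Step 2: True if the User-Agent contains a browser-specific token."""
    return any(token in user_agent for token in BROWSER_UA_TOKENS)


def user_visited_urls(requests, classifier):
    """Apply both rule filters, then let the classifier decide (step 3).

    `requests` is an iterable of (url, user_agent) pairs and `classifier`
    is any callable url -> bool standing in for the trained model.
    """
    kept = []
    for url, user_agent in requests:
        if is_resource_request(url):
            continue  # resource-loading request, dropped by the suffix rule
        if not is_browser_user_agent(user_agent):
            continue  # request not issued by a browser
        if classifier(url):
            kept.append(url)
    return kept
```

Running the cheap rule filters before the classifier means the model only sees requests that survive both checks, which fits the abstract's emphasis on speed.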

CLC number: