Netinfo Security ›› 2023, Vol. 23 ›› Issue (1): 18-27. doi: 10.3969/j.issn.1671-1122.2023.01.003

Vulnerability Similarity Algorithm Evaluation Based on NLP and Feature Fusion

JIA Fan1, KANG Shuya1, JIANG Weiqiang2, WANG Guangtao2

  1. School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
  2. Information Security Center, China Mobile Group Co., Ltd., Beijing 100053, China
  • Received: 2022-03-24 Online: 2023-01-10 Published: 2023-01-19
  • Contact: JIA Fan E-mail: fjia@bjtu.edu.cn

Abstract:

Studying vulnerability similarity helps security researchers find solutions to new vulnerabilities in historical vulnerability information. Existing work on vulnerability similarity is limited, and its model choices lack the support of objective experimental data. To address this, this paper combines various word embedding techniques with deep learning auto-encoders to calculate semantic similarity from vulnerability description text. In parallel, multi-dimensional feature data are extracted from public databases such as NVD to calculate feature similarity between vulnerabilities. Together, these two measures form a dual-perspective vulnerability similarity measurement algorithm and evaluation scheme based on NLP and feature fusion. Objective experiments compare the model combinations in terms of numerical distribution, similarity discrimination, accuracy, and other aspects. The final optimized model combination achieves the highest F1 score, 0.927, in determining vulnerability similarity.
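
The dual-perspective idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the fusion rule, so a weighted average with a hypothetical weight `alpha` is assumed here; a simple bag-of-words vector stands in for the word embeddings and auto-encoders; and the record fields (`description`, `features`, `cwe`, `vector`) are illustrative names only.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_similarity(desc_a, desc_b):
    """Text-side similarity of two descriptions.

    Assumption: bag-of-words counts replace the learned embeddings
    and auto-encoder described in the abstract.
    """
    words_a = desc_a.lower().split()
    words_b = desc_b.lower().split()
    vocab = sorted(set(words_a) | set(words_b))
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    vec_a = [counts_a[w] for w in vocab]
    vec_b = [counts_b[w] for w in vocab]
    return cosine(vec_a, vec_b)


def feature_similarity(feat_a, feat_b):
    """Fraction of feature keys on which both records hold the same value.

    Assumption: features are categorical key/value pairs.
    """
    keys = set(feat_a) | set(feat_b)
    if not keys:
        return 0.0
    matches = sum(
        1 for k in keys
        if k in feat_a and k in feat_b and feat_a[k] == feat_b[k]
    )
    return matches / len(keys)


def fused_similarity(vuln_a, vuln_b, alpha=0.5):
    """Weighted average of the two views; alpha is a hypothetical weight."""
    sem = semantic_similarity(vuln_a["description"], vuln_b["description"])
    feat = feature_similarity(vuln_a["features"], vuln_b["features"])
    return alpha * sem + (1 - alpha) * feat


# Hypothetical records, not taken from NVD.
vuln_a = {
    "description": "buffer overflow in parser",
    "features": {"cwe": "CWE-119", "vector": "NETWORK"},
}
vuln_b = {
    "description": "sql injection in login",
    "features": {"cwe": "CWE-89", "vector": "NETWORK"},
}
score = fused_similarity(vuln_a, vuln_b)
```

In a sketch like this, a threshold on `score` would turn the fused value into a similar/not-similar decision, which is the kind of determination an F1 score can evaluate; the threshold and the weight would both need to be tuned on labeled pairs.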

Key words: natural language processing, deep learning, vulnerability similarity, word embedding

CLC Number: