Netinfo Security, 2024, Vol. 24, Issue (10): 1477-1483. doi: 10.3969/j.issn.1671-1122.2024.10.001

Data Augmentation Method via Large Language Model for Relation Extraction in Cybersecurity

LI Jiao1,2, ZHANG Yuqing2, WU Yabiao1

  1. Topsec Technologies Inc., Beijing 100193, China
  2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408, China
  • Received: 2024-06-10 Online: 2024-10-10 Published: 2024-09-27

Abstract:

Relation extraction can be used for threat intelligence mining and analysis, providing crucial information support for network security defense. However, relation extraction in cybersecurity suffers from a shortage of datasets. In recent years, large language models have shown strong text generation ability, which makes them well suited to data augmentation. To compensate for the limited accuracy and diversity of traditional data augmentation methods, this paper proposes MGDA, a data augmentation method based on large language models for relation extraction in cybersecurity. MGDA uses a large language model to augment the original data at four granularities (words, phrases, grammar, and semantics), so as to improve diversity while preserving accuracy. Experimental results show that the proposed method improves both the performance of relation extraction in cybersecurity and the diversity of the generated data.
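
The abstract does not give the prompts or filtering rules that MGDA uses, so the following is only a minimal sketch of what augmentation at four granularities could look like. The prompt wording, the `Sample` type, the `augment` function, and the entity-preservation check are all hypothetical and are not taken from the paper; `generate` stands for any caller-supplied function that sends a prompt to a large language model and returns its text.

```python
from typing import Callable, Dict, List, NamedTuple


class Sample(NamedTuple):
    """One relation-extraction example (hypothetical layout)."""
    text: str
    head: str
    tail: str
    relation: str


# Hypothetical instructions, one per granularity named in the abstract.
PROMPTS: Dict[str, str] = {
    "word": "Replace some non-entity words with synonyms: {text}",
    "phrase": "Paraphrase the phrases around the entities: {text}",
    "grammar": "Rewrite with a different syntactic structure: {text}",
    "semantic": "Write a new sentence expressing the same relation: {text}",
}


def build_prompt(sample: Sample, granularity: str) -> str:
    """Combine the granularity instruction with constraints on the entities."""
    instruction = PROMPTS[granularity].format(text=sample.text)
    return (
        f"{instruction}\n"
        f"Keep the entities '{sample.head}' and '{sample.tail}' unchanged "
        f"and preserve the relation '{sample.relation}'."
    )


def augment(sample: Sample, generate: Callable[[str], str]) -> List[Sample]:
    """Request one rewrite per granularity and keep only usable candidates.

    Assumed accuracy check (not from the paper): a candidate is kept only if
    it differs from the original and still mentions both entities verbatim.
    """
    augmented: List[Sample] = []
    for granularity in PROMPTS:
        candidate = generate(build_prompt(sample, granularity)).strip()
        keeps_entities = sample.head in candidate and sample.tail in candidate
        if candidate and candidate != sample.text and keeps_entities:
            augmented.append(
                Sample(candidate, sample.head, sample.tail, sample.relation)
            )
    return augmented
```

Passing the model call in as `generate` keeps the sketch independent of any particular LLM API; the verbatim entity check is a deliberately simple stand-in for whatever accuracy control the method actually applies.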

Key words: cyber security, relation extraction, data augmentation, large language model

CLC Number: