Netinfo Security, 2024, Vol. 24, Issue (7): 1076-1087. doi: 10.3969/j.issn.1671-1122.2024.07.009

Research on TTP Extraction Method Based on Pre-Trained Language Model and Chinese-English Threat Intelligence

REN Changyu1, ZHANG Ling2, JI Hangyuan1, YANG Liqun3

    1. State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing 100083, China
    2. School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China
    3. School of Cyber Science and Technology, Beihang University, Beijing 100083, China
  • Received: 2024-04-03 Online: 2024-07-10 Published: 2024-08-02

Abstract:

Tactics, techniques, and procedures (TTP) intelligence resides primarily in unstructured threat reports and is a valuable source of cyber threat intelligence. However, existing open-source TTP classification datasets focus predominantly on English text and cover a limited range of source materials and TTP types; Chinese-language data is particularly scarce. To address this issue, this paper constructs a bilingual TTP intelligence dataset, the Bilingual Threat Intelligence Classifying Dataset (BTICD), which contains 17,700 samples covering 236 TTPs. BTICD is the first dataset to use publicly available Chinese threat reports as a corpus for TTP annotation, and it also annotates a portion of white-box samples that cannot be mapped to any TTP. Pre-trained models are fine-tuned on the bilingual dataset to obtain SecBiBERT, a bilingual TTP identification model. Experimental results show that SecBiBERT achieves a Micro F1 score of 86.49% on the classification task over the 50 common TTPs and 73.09% on the classification task over the full set of 236 TTPs, outperforming existing comparable models.
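
For readers unfamiliar with the Micro F1 metric reported above, the following is a minimal sketch of how it is commonly computed: true positives, false positives, and false negatives are pooled across all labels before precision and recall are combined. The abstract does not state whether each sample carries one TTP or several, so the set-based (multi-label) formulation here is an assumption, and the function name `micro_f1` and the TTP identifiers are illustrative placeholders, not taken from the paper.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], pred_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 over a list of samples.

    Each sample is a set of label IDs; an empty set stands for a sample
    that maps to no TTP (an assumption about how such samples are encoded).
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have the same length")

    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed by the prediction

    # F1 = 2PR / (P + R) simplifies to 2*TP / (2*TP + FP + FN).
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical example: three sentences with placeholder TTP IDs.
example_truth = [{"T1059"}, {"T1566", "T1204"}, set()]
example_pred = [{"T1059"}, {"T1566"}, {"T1027"}]
print(f"Micro F1: {micro_f1(example_truth, example_pred):.4f}")
```

Because counts are pooled before averaging, frequent TTPs weigh more heavily in Micro F1 than rare ones, which is one plausible reason the score differs between the 50-TTP and 236-TTP settings.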

Key words: TTP, threat intelligence, pre-trained language model
