Netinfo Security (信息网络安全) ›› 2026, Vol. 26 ›› Issue (4): 615-625. doi: 10.3969/j.issn.1671-1122.2026.04.009

• Academic Research •

Research on Multi-Strategy Enhanced Chinese Network Threat Intelligence Entity Extraction Based on Large Language Model

HU Mianning1, LI Xin1,2,3, LI Mingfeng1, YUAN Deyu1,2,3

  1 School of Information and Network Security, People’s Public Security University of China, Beijing 100038, China
  2 Key Laboratory of Security Technology and Risk Assessment, Ministry of Public Security, Beijing 100038, China
  3 Public Security Big Data Strategy Research Center of the People’s Public Security University of China, Beijing 100038, China
  • Received: 2024-12-21 Online: 2026-04-10 Published: 2026-04-29
  • Corresponding author: LI Xin E-mail: lixin@ppsuc.edu.cn
  • About the authors: HU Mianning (b. 2000), male, from Sichuan, master's student; research interests: cyber threat intelligence and natural language processing | LI Xin (b. 1977), male, from Jiangxi, professor, Ph.D., CCF member; research interest: information security | LI Mingfeng (b. 2003), male, from Sichuan, master's student; research interest: network security | YUAN Deyu (b. 1986), male, from Hebei, associate professor, Ph.D.; research interest: artificial intelligence security
  • Funding:
    National Key Research and Development Program of China (2022YFC3301101); Key Project of the Fundamental Research Funds of the People’s Public Security University of China (2022JKF02007)

Abstract:

As the cyberspace environment grows more complex, defense driven by cyber threat intelligence (CTI) is becoming increasingly important. To address the shortage of data and the inefficiency of Chinese word segmentation and extraction in the Chinese CTI domain, this article studies multi-strategy enhanced entity extraction for Chinese CTI based on large language models, with the aim of supporting CTI knowledge graph construction and intelligence-driven defense. The article builds its own entity-annotated dataset of Chinese CTI and applies a multi-strategy data augmentation technique to improve extraction accuracy. MECT is applied to several augmented datasets and compared, in both horizontal and vertical experiments, with models including LGN, LR_CNN, and Lattice_LSTM. The results show that named entity recognition performance improves by up to nearly 10%. The experiments validate the effectiveness of multi-strategy data augmentation based on large language models for Chinese CTI entity extraction and demonstrate its reliability and practicality in this field.

Key words: entity extraction, data augmentation, Chinese cyber threat intelligence, large language model

CLC Number: