Netinfo Security ›› 2021, Vol. 21 ›› Issue (10): 76-82. doi: 10.3969/j.issn.1671-1122.2021.10.011

• Selected Papers •

Research on Threat Intelligence Entity Recognition Method Based on MRC

CHENG Shunhang, LI Zhihua

  1. School of Artificial Intelligence and Computer, Jiangnan University, Wuxi 214122, China
  • Received: 2021-04-13 Online: 2021-10-10 Published: 2021-10-14
  • Contact: LI Zhihua E-mail: jswxzhli@aliyun.com
  • About the authors: CHENG Shunhang (born 1998), male, from Hubei, master's student; main research interests: natural language processing and information security | LI Zhihua (born 1969), male, from Hunan, associate professor, Ph.D.; main research interests: cloud computing and information security
  • Supported by:
    National Natural Science Foundation of China (60704047); Intelligent Manufacturing Project of the Ministry of Industry and Information Technology (ZH-XZ-180004); Fundamental Research Funds for the Central Universities (JUSRP211A41); Fundamental Research Funds for the Central Universities (JUSRP42003); 111 Base Construction Project (B2018)

Abstract:

In threat intelligence entity extraction, network data sources are structurally complex and contain a large amount of irrelevant information, while threat intelligence entities are highly specialized and fuzzily categorized. As a result, traditional entity recognition methods mine threat intelligence inefficiently. To address this problem, the paper recasts entity recognition as machine reading comprehension (MRC) and proposes an MRC pointer-labeling model that incorporates domain knowledge, Threat Intelligence Machine Reading Comprehension (TIMRC). The model finds the corresponding start and end indices for each entity question. On this basis, the paper further constructs a Threat Intelligence Entity Identification (TIEI) method. Experiments on 978 security articles demonstrate that the TIEI method is effective and mines entities efficiently.

Key words: threat intelligence, entity recognition, machine reading comprehension
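
The pointer-labeling idea in the abstract, finding a start and an end index for each entity question, can be illustrated with a minimal decoding sketch. Everything in it is an assumption made for illustration: the abstract does not describe how TIMRC decodes spans, so the `decode_spans` helper, the 0.5 threshold, the nearest-end pairing rule, the example question, and the per-token probabilities are all hypothetical stand-ins rather than the authors' method or data.

```python
from typing import List, Tuple


def decode_spans(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Pair each predicted start index with the nearest end index at or after it.

    Nearest-end pairing and the threshold are illustrative assumptions,
    not the decoding rule used in the paper.
    """
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


# Hypothetical entity question and tokenized context for one entity type.
question = "Which malware names are mentioned in the text?"
tokens = ["The", "group", "deployed", "Agent", "Tesla", "and", "Emotet", "last", "week"]

# Made-up per-token probabilities standing in for the output of an MRC model
# that was given (question, context) as input.
start_probs = [0.01, 0.02, 0.05, 0.93, 0.10, 0.01, 0.88, 0.02, 0.01]
end_probs = [0.01, 0.01, 0.02, 0.08, 0.91, 0.01, 0.85, 0.03, 0.01]

spans = decode_spans(start_probs, end_probs)
entities = [" ".join(tokens[s : e + 1]) for s, e in spans]
print(entities)  # ['Agent Tesla', 'Emotet']
```

Asking one question per entity type means the same context can be decoded several times, once per question, which is how an MRC formulation lets a single span-pointer model cover multiple entity categories.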

CLC Number: