Netinfo Security, 2021, Vol. 21, Issue (6): 63-69. doi: 10.3969/j.issn.1671-1122.2021.06.008

• Technical Research •

Named Entity Recognition Model of Telecommunication Network Fraud Crime Based on ELECTRA-CRF

DING Jiawei¹, LIU Xiaodong²

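  1. College of Investigation, People’s Public Security University of China, Beijing 100038, China
  2. College of Public Security and Traffic Management, People’s Public Security University of China, Beijing 100038, China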
  • Received: 2021-04-29  Online: 2021-06-10  Published: 2021-07-01
  • Corresponding author: LIU Xiaodong, E-mail: liuxiaodong@ppsuc.edu.cn
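  • About the authors: DING Jiawei (born 1997), male, from Shandong, is a master's student whose main research interest is criminal investigation. LIU Xiaodong (born 1988), male, from Shandong, is a lecturer with a Ph.D. whose main research interests are public security big data and emergency management.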
  • Funding: National Key Research and Development Program of China (2020YFC1522600)

Abstract:

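This paper proposes an ELECTRA-CRF-based named entity recognition (NER) model for the case texts of telecommunication network fraud. The annotated corpus is first fed into an ELECTRA model to obtain character-level features. A CRF model then computes transition scores to determine the combination of entity labels for the character at each position and its neighboring characters. Finally, the model is compared with BERT-CRF and RoBERTa-CRF models. Experimental results show that the proposed model is clearly more computationally efficient than the other two deep learning models, with little loss in precision, recall, or F1 score (the harmonic mean of precision and recall), so it is well suited to named entity recognition in telecommunication network fraud cases.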
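The two-stage design described above (an encoder scores each character, then a CRF combines those scores with tag-to-tag transition scores to pick a label sequence) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: a hypothetical BIO tag set with a single entity type, hand-picked emission scores standing in for ELECTRA outputs, and a hand-picked transition matrix. Start/end transitions and training are omitted, so this is not the authors' implementation. It also scores decoded spans with entity-level precision, recall, and F1.

```python
"""Sketch: CRF Viterbi decoding over per-character scores, plus entity-level P/R/F1.

All tags and numbers are hypothetical illustrations, not values from the paper.
"""
from typing import List, Sequence, Tuple

TAGS = ["O", "B", "I"]  # hypothetical BIO tag set with one entity type

# Hypothetical transition scores TRANSITIONS[prev][cur]. O -> I is heavily
# penalized because in BIO an "inside" tag must follow B or I.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, -1.0, 1.0],   # from B
    [0.0, -1.0, 0.5],   # from I
]

# Hypothetical per-character emission scores standing in for encoder output
# (one row per character, one column per tag).
EMISSIONS = [
    [0.5, 2.0, 0.0],
    [0.2, 0.0, 1.5],
    [2.0, 0.0, 0.0],
    [0.9, 0.0, 1.0],  # per-position argmax would pick I here, right after an O
    [1.0, 0.0, 0.0],
]


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return tag indices maximizing the sum of emission and transition scores."""
    n_tags = len(transitions)
    best = list(emissions[0])  # best path score ending in each tag so far
    backptrs: List[List[int]] = []
    for row in emissions[1:]:
        new_best, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for current tag j.
            i_star = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + row[j])
            ptrs.append(i_star)
        best = new_best
        backptrs.append(ptrs)
    # Start from the best final tag and follow back-pointers.
    path = [max(range(n_tags), key=lambda j: best[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]


def spans(tags: Sequence[str]) -> List[Tuple[int, int]]:
    """Convert BIO tags to [start, end) spans; a stray I opens a new span."""
    out: List[Tuple[int, int]] = []
    start = None
    for k, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                out.append((start, k))
            start = k
        elif tag == "O":
            if start is not None:
                out.append((start, k))
            start = None
    if start is not None:
        out.append((start, len(tags)))
    return out


def prf1(pred: Sequence[Tuple[int, int]],
         gold: Sequence[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 using exact span matches."""
    tp = len(set(pred) & set(gold))
    p = tp / len(set(pred)) if pred else 0.0
    r = tp / len(set(gold)) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    greedy = [TAGS[max(range(len(TAGS)), key=lambda j: r[j])] for r in EMISSIONS]
    decoded = [TAGS[i] for i in viterbi(EMISSIONS, TRANSITIONS)]
    print("greedy: ", greedy)   # B I O I O, which contains an invalid O -> I
    print("viterbi:", decoded)  # B I O O O
    pred = spans(decoded)
    print(pred, prf1(pred, gold=[(0, 2), (3, 4)]))
```

Taking the per-position argmax tags the fourth character I directly after an O, which the BIO scheme forbids; Viterbi decoding over the transition scores returns B I O O O instead. Enforcing this kind of label-sequence consistency is the usual motivation for placing a CRF layer on top of the encoder. Real NER evaluation would also require the entity type to match, which this single-type sketch ignores.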

Key words: named entity recognition, ELECTRA model, telecommunication network fraud
