Netinfo Security (信息网络安全), 2024, Vol. 24, Issue (11): 1783-1792. doi: 10.3969/j.issn.1671-1122.2024.11.017

• Selected Papers •

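A Named Entity Recognition Model for Legal Documents

LU Rui1,2, LI Linying3

  1. Police Information Department, Liaoning Police College, Dalian 116036, China
  2. Liaoning Provincial Key Laboratory of Public Security Big Data Intelligent Application, Dalian 116036, China
  3. School of Software Engineering, Dalian University of Foreign Languages, Dalian 116044, China
  • Received: 2024-07-04 Online: 2024-11-10 Published: 2024-11-21
  • Corresponding author: LU Rui, luruilly@sina.com
  • About the authors: LU Rui (b. 1978), female, from Liaoning, professor, Ph.D., CCF member; main research interests: public security intelligence analysis, text mining, and optimization theory and methods. LI Linying (b. 1975), male, from Jilin, professor, Ph.D.; main research interests: natural language processing, text mining, and optimization theory and methods.
  • Supported by:
    Applied Basic Research Program of the Liaoning Provincial Department of Science and Technology (2023JH2/101300134); Basic Scientific Research Project for Higher Education Institutions of the Liaoning Provincial Department of Education (LJKMZ20221549); Liaoning Provincial Graduate Education and Teaching Reform Research Project (LNYJG2022423); Key Research Project of the Liaoning Provincial Department of Education (JYTZD2023088)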

Abstract:

Accurately identifying entities in legal documents is the foundation of an intelligent judicial system. However, general-purpose named entity recognition (NER) models do not delineate entity boundaries in legal documents well, and their output does not align closely with legal practice. To improve recognition of the entities in legal documents, this paper proposes BBAG-NER, an NER model for legal documents. The model first encodes the character sequence with BERT, then applies a bidirectional long short-term memory (BiLSTM) network and an attention mechanism to assign different weights and improve its ability to delineate entity boundaries. Finally, it uses a global pointer to identify candidate judicial entity spans and obtains the final entity categories through an entity classifier. Experiments on a legal document corpus show that BBAG-NER achieves an F1 score of 89.18%, an improvement of 2.1% over the BERT-CRF model, which demonstrates the overall effectiveness of the model.

Key words: legal documents, named entity recognition, global pointer network, bidirectional long short-term memory (BiLSTM)
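
To illustrate the final decoding stage mentioned in the abstract, the sketch below shows how a global-pointer-style scorer can propose candidate entity spans. It is not the authors' implementation. The function names, the example sentence, and the hand-picked 3-dimensional vectors are all hypothetical; in a real model the start-side and end-side vectors would be projections of the BERT, BiLSTM, and attention outputs. Scoring a span by an inner product and keeping spans with a score above 0 follows the commonly described global pointer formulation and is assumed here. The entity classifier that assigns the final category is omitted.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def candidate_spans(
    queries: List[Vector],
    keys: List[Vector],
    threshold: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for every span scoring above threshold.

    The score of span (i, j) is the inner product of the start-side vector
    of token i and the end-side vector of token j. Only spans with
    start <= end are scored, which plays the role of the upper-triangular
    mask in global-pointer-style decoders.
    """
    spans = []
    for i, q in enumerate(queries):
        for j in range(i, len(keys)):
            score = dot(q, keys[j])
            if score > threshold:
                spans.append((i, j, score))
    return spans


# Hypothetical sentence: "defendant Zhang San committed theft".
chars = list("被告人张三盗窃")

# Assumed toy vectors: [name feature, charge feature, bias].
# The bias terms (1 and -1) push every span score to -1 unless a
# start feature and a matching end feature line up.
plain_q, plain_k = [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]
queries = [plain_q, plain_q, plain_q,
           [2.0, 0.0, 1.0],   # 张: looks like the start of a name
           plain_q,
           [0.0, 2.0, 1.0],   # 盗: looks like the start of a charge
           plain_q]
keys = [plain_k, plain_k, plain_k, plain_k,
        [2.0, 0.0, -1.0],     # 三: looks like the end of a name
        plain_k,
        [0.0, 2.0, -1.0]]     # 窃: looks like the end of a charge

spans = candidate_spans(queries, keys)
texts = ["".join(chars[i:j + 1]) for i, j, _ in spans]
print(spans)
print(texts)
```

With these toy vectors only two spans score above the threshold, covering the name and the charge. Because every (start, end) pair is scored independently, this style of decoding can in principle return nested or overlapping spans, which a tag-sequence decoder such as a CRF cannot represent directly.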

CLC Number: