Key Element Identification of Low-Resource Cases with Label Semantic Enhancement

doi:10.3969/j.issn.1671-1122.2026.03.013

Abstract

The identification of key case elements is a core task in the intelligent analysis of judicial texts and is valuable in scenarios such as similar-case retrieval and judicial decision support. However, labeled data in the judicial domain is scarce, and this low-resource setting limits the performance of traditional named entity recognition methods that rely on large-scale labeled data. This paper proposes a recognition model that integrates label semantic information: entity type labels are embedded as prompt information into the text encoding process, and an interaction mechanism between label anchor vectors and contextual text vectors explicitly models the semantic association between labels and text. This strengthens the model's understanding of element type semantics and its ability to locate element boundaries in low-resource scenarios. Experimental results show that the proposed method outperforms the compared baseline models on a low-resource case dataset, verifying that label semantics enhance key element identification and providing a new solution for low-resource information extraction in the judicial domain.

Key words: identification of key case elements, low resource, label semantics, named entity recognition, judicial text analysis
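The abstract describes an interaction between label anchor vectors and contextual text vectors but gives no formula for it. The following is a minimal sketch of one way such an interaction could look, assuming scaled dot-product attention from tokens to label anchors followed by a residual addition. The function names, the vector dimension, and the random vectors standing in for encoder outputs are all illustrative assumptions and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_aware_tokens(token_vecs, label_vecs):
    """Fuse each token vector with an attention-weighted mix of label anchors.

    Assumed interaction (not the paper's exact design): tokens act as
    queries, label anchors act as keys and values, and the attended label
    context is added back to the token vector.
    """
    dim = len(token_vecs[0])
    fused, weights = [], []
    for tok in token_vecs:
        scores = [dot(tok, lab) / math.sqrt(dim) for lab in label_vecs]
        attn = softmax(scores)
        context = [
            sum(w * lab[i] for w, lab in zip(attn, label_vecs))
            for i in range(dim)
        ]
        fused.append([t + c for t, c in zip(tok, context)])
        weights.append(attn)
    return fused, weights


# Random vectors stand in for encoder outputs (hypothetical 8-dim space).
rng = random.Random(42)
labels = ["NHCS", "NHVI", "NASI"]
label_vecs = [[rng.gauss(0, 1) for _ in range(8)] for _ in labels]
token_vecs = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
fused, weights = label_aware_tokens(token_vecs, label_vecs)
```

Each row of `weights` is a distribution over the label anchors for one token, which is one way the association between a token and each element type could be made explicit.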

CLC number: TP309

XIAO Wen, TU Min. Key Element Identification of Low-Resource Cases with Label Semantic Enhancement[J]. Netinfo Security, 2026, 26(3): 471-481.

Label	Meaning	Count	Proportion
NHCS	Criminal suspect	1657	19.65%
NHVI	Victim	984	11.67%
NCSM	Stolen money	343	4.07%
NCGV	Item value	808	9.58%
NCSP	Theft proceeds	159	1.89%
NASI	Stolen items	1564	18.55%
NATS	Crime tool	214	2.54%
NT	Time	1167	13.84%
NS	Location	1216	14.42%
NO	Organization	320	3.80%
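As a worked check of the label distribution above, the proportions can be recomputed from the raw counts. The sketch below copies the counts from the table; the variable names are illustrative.

```python
# Entity counts per label, copied from the label distribution table.
counts = {
    "NHCS": 1657, "NHVI": 984, "NCSM": 343, "NCGV": 808, "NCSP": 159,
    "NASI": 1564, "NATS": 214, "NT": 1167, "NS": 1216, "NO": 320,
}

total = sum(counts.values())

# Share of each label as a percentage, rounded to two decimals.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}\t{counts[label]}\t{share:.2f}%")
```

The counts sum to 8432 annotated elements, and the recomputed shares agree with the listed proportions. The distribution is imbalanced: NHCS and NASI together account for over 38% of the elements, while NCSP accounts for under 2%.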

Parameter	Value
max_seq_length	512
epoch	50
train_batch_size	8
learning_rate	2e-5
hidden_size	768
warm_ratio	0.1
seed	42
early_stopping	8
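The parameter table can be captured as a small configuration object. The sketch below also shows how `warm_ratio` might translate into a learning-rate schedule, assuming it denotes the fraction of total optimizer steps used for linear warmup, followed by linear decay. That interpretation, the `TrainConfig` and `lr_at_step` names, and the 100-sample example are assumptions, since the table does not define the schedule.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from the parameter table."""
    max_seq_length: int = 512
    epoch: int = 50
    train_batch_size: int = 8
    learning_rate: float = 2e-5
    hidden_size: int = 768
    warm_ratio: float = 0.1
    seed: int = 42
    early_stopping: int = 8


def lr_at_step(step, total_steps, cfg):
    """Assumed schedule: linear warmup to the peak rate, then linear decay."""
    warmup_steps = int(cfg.warm_ratio * total_steps)
    if step < warmup_steps:
        return cfg.learning_rate * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return cfg.learning_rate * remaining


cfg = TrainConfig()
# Example: 100 training samples give ceil(100 / 8) = 13 steps per epoch.
steps_per_epoch = math.ceil(100 / cfg.train_batch_size)
total_steps = steps_per_epoch * cfg.epoch
peak_step = int(cfg.warm_ratio * total_steps)
```

Under these assumptions the 100-sample setting runs for at most 650 optimizer steps, with the peak rate of 2e-5 reached at step 65. Early stopping would usually end training sooner.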

Training samples	30			50			100
Model	P	R	F1	P	R	F1	P	R	F1
BERT-CE	67.79%	70.94%	69.33%	73.42%	76.40%	74.88%	77.09%	81.64%	79.30%
BERT-CRF	66.37%	70.87%	68.54%	74.25%	78.69%	76.41%	79.61%	81.49%	80.54%
OVA-AUC	64.21%	72.29%	68.01%	69.86%	79.12%	74.21%	76.31%	81.72%	78.92%
SMART-SPAN	75.30%	69.03%	72.03%	84.32%	77.53%	80.78%	86.31%	80.31%	83.20%
PMRC	77.53%	66.99%	71.88%	82.80%	75.97%	79.24%	88.10%	80.83%	84.30%
Proposed model	73.02%	71.60%	72.30%	84.79%	79.85%	82.25%	85.96%	84.71%	85.33%
Training samples	200			300			500
Model	P	R	F1	P	R	F1	P	R	F1
BERT-CE	80.66%	83.04%	81.83%	84.32%	86.43%	85.36%	85.48%	87.68%	86.57%
BERT-CRF	82.12%	84.66%	83.37%	85.52%	86.21%	85.86%	87.04%	88.64%	87.83%
OVA-AUC	79.38%	82.51%	80.91%	83.44%	86.35%	84.87%	83.60%	88.34%	85.91%
SMART-SPAN	88.94%	82.50%	85.59%	88.61%	82.58%	85.49%	89.65%	86.41%	88.00%
PMRC	87.53%	83.50%	85.47%	87.76%	83.50%	85.57%	88.92%	85.68%	87.27%
Proposed model	86.95%	85.68%	86.31%	88.75%	86.17%	87.44%	88.11%	88.10%	88.11%

Training samples	30	50	100	200	300	500
Full model	72.30%	82.25%	85.33%	86.31%	87.44%	88.11%
w/o entity loss	71.52%	79.84%	84.49%	85.29%	86.67%	87.96%
w/o label interaction	71.47%	79.20%	84.68%	84.72%	86.60%	85.93%
w/o entity label	68.55%	76.33%	81.04%	84.10%	85.88%	86.46%
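The contribution of each component in the ablation table can be read off as the drop relative to the full model. The full-model row matches the F1 values of the proposed model in the main results table, so the sketch below treats all entries as F1 scores. The variant names are translated labels, and the averaging over sample sizes is an illustrative summary rather than a statistic reported in the paper.

```python
sample_sizes = [30, 50, 100, 200, 300, 500]

# Scores in percent, copied from the ablation table.
full = [72.30, 82.25, 85.33, 86.31, 87.44, 88.11]
ablations = {
    "w/o entity loss": [71.52, 79.84, 84.49, 85.29, 86.67, 87.96],
    "w/o label interaction": [71.47, 79.20, 84.68, 84.72, 86.60, 85.93],
    "w/o entity label": [68.55, 76.33, 81.04, 84.10, 85.88, 86.46],
}

# Per-setting drop (in points) when a component is removed.
drops = {
    name: [round(f - a, 2) for f, a in zip(full, scores)]
    for name, scores in ablations.items()
}

# Average drop across the six training-set sizes.
mean_drop = {name: sum(d) / len(d) for name, d in drops.items()}
largest = max(mean_drop, key=mean_drop.get)

for name, d in drops.items():
    print(name, dict(zip(sample_sizes, d)), f"mean={mean_drop[name]:.2f}")
```

On these numbers, removing the entity label produces the largest average drop (about 3.2 points), and every ablated variant scores below the full model at every training-set size.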

Model	Type error	Boundary error	False positive	False negative
SMART-SPAN	21.9%	31.9%	10.1%	36.1%
PMRC	18.4%	31.3%	18.1%	32.2%
Proposed model	14.7%	30.4%	27.5%	27.5%
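The error table groups mistakes into four categories, but this section does not state how each category is defined. The sketch below shows one plausible way to assign them, assuming spans are `(start, end, label)` tuples with an exclusive end: a type error has correct boundaries but the wrong label, a boundary error overlaps a gold span without matching its boundaries, a false positive overlaps no gold span, and a false negative is a gold span that no prediction touches. These definitions and the function names are assumptions for illustration.

```python
from collections import Counter


def overlaps(a, b):
    """True if two (start, end, label) spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def categorize_errors(gold, pred):
    """Count errors by category under the assumed definitions above."""
    gold_set, pred_set = set(gold), set(pred)
    errors = Counter()
    touched_gold = set()
    for p in pred_set - gold_set:
        same_span = [g for g in gold_set if g[:2] == p[:2]]
        overlapping = [g for g in gold_set if overlaps(g, p)]
        if same_span:
            errors["type"] += 1
            touched_gold.update(same_span)
        elif overlapping:
            errors["boundary"] += 1
            touched_gold.update(overlapping)
        else:
            errors["false_positive"] += 1
    # Gold spans that were missed and not already explained by another error.
    for g in gold_set - pred_set:
        if g not in touched_gold:
            errors["false_negative"] += 1
    return errors


gold = [(0, 2, "NHCS"), (5, 8, "NASI"), (10, 12, "NT"), (15, 17, "NS")]
pred = [(0, 2, "NHVI"), (5, 7, "NASI"), (20, 22, "NO"), (15, 17, "NS")]
errors = categorize_errors(gold, pred)
```

In this toy example each category receives exactly one error, and the correctly predicted NS span contributes to none of them.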