Netinfo Security (信息网络安全) ›› 2024, Vol. 24 ›› Issue (10): 1537-1543. doi: 10.3969/j.issn.1671-1122.2024.10.007

• Selected Papers •

Fingerprint Feature Extraction of Electronic Medical Records Based on Few-Shot Named Entity Recognition Technology

WANG Yaxin1,2, ZHANG Jian1,2

  1. College of Cyber Science, Nankai University, Tianjin 300350, China
  2. Tianjin Key Laboratory of Network and Data Security Technology, Tianjin 300350, China
  • Received: 2024-06-05 Online: 2024-10-10 Published: 2024-09-27
  • Corresponding author: ZHANG Jian, zhang.jian@nankai.edu.cn
  • About the authors: WANG Yaxin (born 2002), male, from Shanxi, master's student; main research interest: data security | ZHANG Jian (born 1968), male, from Tianjin, professor, Ph.D., CCF member; main research interests: network security, data security, cloud security, and system security
  • Funding:
    National Key Research and Development Program of China (2022YFB3103202); Tianjin Key Research and Development Program (20YFZCGX00680)

Abstract:

With the promulgation and implementation of the Personal Information Protection Law of the People's Republic of China, the Data Security Law of the People's Republic of China, and other relevant laws and regulations, the protection of electronic medical record data has attracted wide attention. Identifying electronic medical records quickly and efficiently is the first step in data protection and an important research topic in the field of data security. This paper proposes a fingerprint feature extraction method for electronic medical records based on few-shot named entity recognition. First, an encoder is trained on a public dataset to obtain a broad text feature space. Then, the encoder is fine-tuned on an electronic medical record dataset, and a prototype network is used to represent the entity type labels. Finally, features are extracted from the electronic medical records to obtain fingerprint features of the form "entity type + entity set". Experimental results show that the method outperforms the comparison models on the I2B2 dataset and effectively improves the privacy protection of electronic medical record data.

Key words: data security, electronic medical records, contrastive learning, named entity recognition, few-shot learning
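
The abstract names two building blocks without giving detail: a prototype network that represents entity type labels, and a fingerprint of the form "entity type + entity set". The sketch below illustrates one common reading of these ideas and is not the authors' implementation. It assumes that each prototype is the mean of a few labeled support embeddings, that a token is assigned to the nearest prototype by Euclidean distance, and that the fingerprint is a mapping from entity type to the set of recognized entity strings. The function names, the label names (`DATE`, `DOCTOR`, `O`), and the 2-D toy vectors are all hypothetical; in practice the vectors would come from the fine-tuned encoder.

```python
from collections import defaultdict
from math import dist


def build_prototypes(support):
    """support: list of (embedding, label); prototype = mean embedding per label."""
    grouped = defaultdict(list)
    for vec, label in support:
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(vec, prototypes):
    """Assign the label whose prototype is nearest in Euclidean distance (assumed metric)."""
    return min(prototypes, key=lambda label: dist(vec, prototypes[label]))


def fingerprint(tokens, vectors, prototypes, outside="O"):
    """Collect an 'entity type + entity set' fingerprint: type -> set of tokens."""
    fp = defaultdict(set)
    for tok, vec in zip(tokens, vectors):
        label = classify(vec, prototypes)
        if label != outside:  # skip tokens that are not part of any entity
            fp[label].add(tok)
    return dict(fp)


# Toy 2-D "embeddings" (hypothetical stand-ins for encoder outputs).
support = [
    ([0.0, 0.0], "O"), ([0.2, 0.0], "O"),
    ([5.0, 5.0], "DATE"), ([5.0, 6.0], "DATE"),
    ([-5.0, 5.0], "DOCTOR"), ([-6.0, 5.0], "DOCTOR"),
]
prototypes = build_prototypes(support)

tokens = ["seen", "2019-03-02", "Smith"]
vectors = [[0.1, 0.1], [5.2, 5.4], [-5.4, 4.8]]
record_fp = fingerprint(tokens, vectors, prototypes)
```

In this toy run, `record_fp` groups "2019-03-02" under `DATE` and "Smith" under `DOCTOR`, while "seen" is discarded as a non-entity token. Two records could then be compared type by type, for example by set overlap, although the abstract does not say how fingerprints are matched.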

CLC number: