Netinfo Security, 2024, Vol. 24, Issue (10): 1537-1543. doi: 10.3969/j.issn.1671-1122.2024.10.007
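Received: 2024-06-05
Published: 2024-10-10
Online: 2024-09-27
Corresponding author: ZHANG Jian
About the authors: WANG Yaxin (born 2002), male, from Shanxi, master's student; research interest: data security. ZHANG Jian (born 1968), male, from Tianjin, professor, Ph.D., CCF member; research interests: network security, data security, cloud security and system security.
Authors: WANG Yaxin, ZHANG Jian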
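Abstract: With the promulgation and implementation of laws and regulations such as the Personal Information Protection Law of the People's Republic of China and the Data Security Law of the People's Republic of China, the protection of electronic medical record (EMR) data has received wide attention. Identifying EMRs quickly and efficiently is the first step of data protection and an important research topic in the field of data security. This paper proposes an EMR fingerprint feature extraction method based on few-shot named entity recognition (NER). First, an encoder is trained on public datasets to obtain a broad text feature space. Next, the encoder is fine-tuned on an EMR dataset, and a prototypical network is used to represent entity type labels. Finally, features are extracted from the EMRs to obtain fingerprint features of the form "entity type + entity set". Experimental results show that the method outperforms the comparison models on the I2B2 dataset and effectively improves the privacy protection of EMR data.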
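The last two steps in the abstract (representing entity types with a prototypical network, then collecting an "entity type + entity set" fingerprint) can be illustrated with a minimal sketch. It assumes token embeddings have already been produced by the fine-tuned encoder; here they are replaced by hypothetical two-dimensional toy vectors. Following the standard prototypical-network recipe of Snell et al., each prototype is the mean of its support embeddings, and each query token takes the label of the nearest prototype under squared Euclidean distance. The helper names (`build_prototypes`, `nearest_label`, `extract_fingerprint`), the `O` label for non-entity tokens and the merging of consecutive same-label tokens are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def build_prototypes(support: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the support-set embeddings of each label into one prototype vector."""
    sums: Dict[str, List[float]] = {}
    counts: DefaultDict[str, int] = defaultdict(int)
    for vec, label in support:
        total = sums.setdefault(label, [0.0] * len(vec))
        sums[label] = [s + v for s, v in zip(total, vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def nearest_label(vec: Vector, prototypes: Dict[str, List[float]]) -> str:
    """Pick the label whose prototype is closest in squared Euclidean distance."""
    return min(
        prototypes,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, prototypes[label])),
    )


def extract_fingerprint(
    tokens: List[str], embeddings: List[Vector], prototypes: Dict[str, List[float]]
) -> Dict[str, Set[str]]:
    """Label each token, merge runs of the same label, and collect type -> entity set."""
    fingerprint: DefaultDict[str, Set[str]] = defaultdict(set)
    current_label = "O"  # hypothetical label for non-entity tokens
    current_tokens: List[str] = []
    for token, vec in zip(tokens, embeddings):
        label = nearest_label(vec, prototypes)
        if label != current_label and current_tokens:
            # A run of entity tokens just ended: store it under its entity type.
            fingerprint[current_label].add(" ".join(current_tokens))
            current_tokens = []
        current_label = label
        if label != "O":
            current_tokens.append(token)
    if current_tokens:
        fingerprint[current_label].add(" ".join(current_tokens))
    return dict(fingerprint)


# Hypothetical toy embeddings standing in for the encoder's output.
support = [
    ([1.0, 0.0], "NAME"), ([0.9, 0.1], "NAME"),
    ([0.0, 1.0], "DATE"), ([0.1, 0.9], "DATE"),
    ([0.0, 0.0], "O"),
]
prototypes = build_prototypes(support)
tokens = ["John", "Smith", "admitted", "2014-03-02"]
embeddings = [[0.95, 0.05], [1.0, 0.1], [0.05, 0.0], [0.0, 0.95]]
fingerprint = extract_fingerprint(tokens, embeddings, prototypes)
print(fingerprint)  # {'NAME': {'John Smith'}, 'DATE': {'2014-03-02'}}
```

Because consecutive tokens with the same label are merged, two adjacent entities of the same type would collapse into one entity in this sketch; a B/I-prefixed tag set avoids that.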
Cite this article: WANG Yaxin, ZHANG Jian. Fingerprint Feature Extraction of Electronic Medical Records Based on Few-Shot Named Entity Recognition Technology[J]. Netinfo Security, 2024, 24(10): 1537-1543.
Table 2 Comparison of recall and F1 score

Contrastive loss | Tagging scheme | 1-shot recall (%) | 1-shot F1 (%) | 5-shot recall (%) | 5-shot F1 (%) |
---|---|---|---|---|---|
NCE | IO | 12.03 | 5.45 | 12.51 | 6.89 |
NCE | BIO | 14.22 | 6.12 | 18.86 | 9.37 |
NCE | BIOES | 13.35 | 6.31 | 13.68 | 7.37 |
InfoNCE | IO | 18.90 | 8.69 | 28.17 | 15.38 |
InfoNCE | BIO | 23.60 | 10.65 | 34.93 | 17.14 |
InfoNCE | BIOES | 23.74 | 11.38 | 31.20 | 16.82 |
CWCL | IO | 41.04 | 19.20 | 49.52 | 26.86 |
CWCL | BIO | 43.15 | 19.84 | 51.60 | 25.38 |
CWCL | BIOES | 36.72 | 17.87 | 44.17 | 23.74 |
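Table 2 compares three tagging schemes. To show how they differ, the following sketch encodes the same hypothetical entity spans under IO, BIO and BIOES. The sentence, the span format `(start, end, type)` with an exclusive end index, and the entity-type names are assumptions chosen for illustration.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def spans_to_tags(n_tokens: int, spans: List[Span], scheme: str) -> List[str]:
    """Encode non-overlapping entity spans as per-token tags in the IO, BIO or BIOES scheme."""
    if scheme not in ("IO", "BIO", "BIOES"):
        raise ValueError(f"unknown tagging scheme: {scheme}")
    tags = ["O"] * n_tokens
    for start, end, entity_type in spans:
        for i in range(start, end):
            if scheme == "IO":
                prefix = "I"  # every entity token is simply "inside"
            elif scheme == "BIO":
                prefix = "B" if i == start else "I"  # mark the first token of each entity
            elif end - start == 1:
                prefix = "S"  # BIOES: single-token entity
            elif i == start:
                prefix = "B"  # BIOES: first token of a multi-token entity
            elif i == end - 1:
                prefix = "E"  # BIOES: last token of a multi-token entity
            else:
                prefix = "I"
            tags[i] = f"{prefix}-{entity_type}"
    return tags


tokens = ["Dr", "John", "Smith", "saw", "Mary", "on", "Monday"]
spans = [(1, 3, "DOCTOR"), (4, 5, "PATIENT"), (6, 7, "DATE")]
for scheme in ("IO", "BIO", "BIOES"):
    print(scheme, spans_to_tags(len(tokens), spans, scheme))
```

IO uses one label per entity type, BIO two and BIOES four (plus `O` in each case). Only the prefixed schemes can mark the boundary between two adjacent entities of the same type; the larger label set is the price for that.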
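Table 2 also compares three contrastive losses. As a hedged illustration of the middle one, the sketch below computes the InfoNCE loss of van den Oord et al. for a single anchor: the negative log of the softmax probability assigned to the positive among the positive and all negatives. Cosine similarity, the temperature of 0.1 and the toy vectors are assumptions; how the paper pairs tokens into anchors, positives and negatives is not specified in this section. NCE, and CWCL (which replaces the binary positive/negative assignment with continuous, similarity-based weights over all pairs), are not implemented here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor: cross-entropy of picking the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


anchor = [1.0, 0.0]
# Positive close to the anchor: small loss.
close_loss = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Positive orthogonal to the anchor while a negative is close: large loss.
far_loss = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(f"{close_loss:.4f} {far_loss:.4f}")
```

The loss is always non-negative and approaches zero only when the positive's similarity dominates every negative's, which is what pulls same-type tokens together in the embedding space.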
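In every row of Table 2 the F1 score is below the recall, which, for a harmonic mean, means precision is lower than recall. Assuming the reported F1 is the usual harmonic mean of precision P and recall R, F1 = 2PR / (P + R), precision can be recovered as P = F1 * R / (2R - F1). The sketch below applies this to three 1-shot rows of the table; `implied_precision` is a hypothetical helper, and if the scores are averaged over sampled episodes the identity holds only approximately, so the output is an estimate rather than a reported result.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P (all values in percent)."""
    if not 0 < f1 < 2 * recall:
        raise ValueError("F1 must be positive and below 2 * recall")
    return f1 * recall / (2 * recall - f1)


# (loss, scheme, recall %, F1 %) for the 1-shot setting, copied from Table 2.
rows = [
    ("NCE", "IO", 12.03, 5.45),
    ("InfoNCE", "BIOES", 23.74, 11.38),
    ("CWCL", "BIO", 43.15, 19.84),
]
for loss, scheme, recall, f1 in rows:
    print(f"{loss:8s} {scheme:6s} implied precision = {implied_precision(recall, f1):.2f} %")
```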