Netinfo Security (信息网络安全), 2026, Vol. 26, Issue (3): 432-441. doi: 10.3969/j.issn.1671-1122.2026.03.009

• Selected Paper

FEViT: A Frequency Domain Enhanced ViT for Deepfake Detection

CHEN Yuqi1, QIAN Hanwei1,2, XIA Lingling1, WANG Qun1

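  1. Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing 210031, China
  2. State Key Laboratory for Novel Software Technology at Nanjing University, Nanjing 210093, China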
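  • Received: 2025-08-10 Online: 2026-03-10 Published: 2026-03-30
  • Corresponding author: WANG Qun, E-mail: wangqun@jspi.cn
  • About the authors: CHEN Yuqi (born 1991), female, from Jiangsu, lecturer, M.S., main research interest: cyberspace security; QIAN Hanwei (born 1984), male, from Jiangsu, associate professor, M.S., CCF member, main research interest: artificial intelligence security; XIA Lingling (born 1988), female, from Jiangsu, associate professor, Ph.D., CCF member, main research interest: network attack and defense; WANG Qun (born 1971), male, from Gansu, professor, Ph.D., CCF Distinguished Member, main research interest: cyberspace security
  • Funding:
    National Natural Science Foundation of China (72401110)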

Abstract:

The rapid advancement of deepfake technology has intensified societal security concerns, including AI-based face swapping, identity forgery, infringement of portrait rights, and the spread of false information. Current deepfake detection methods often rely on specific datasets. This introduces data bias and makes it difficult to capture forgery features that generalize across algorithms and scenarios. Consequently, these methods typically show low detection accuracy and limited generalization when confronted with novel forgery techniques. This paper proposes FEViT, a deepfake detection model based on frequency-domain enhancement that integrates high-frequency artifact information with a vision Transformer (ViT) to improve generalization to forged images from diverse sources. FEViT adopts a multi-dimensional optimization strategy in two steps. First, it combines the Fourier transform with high-pass filtering to accurately extract high-frequency artifact features and amplify frequency-domain differences. Second, it applies three optimizations to the ViT architecture that increase sensitivity to local anomalies and improve the classification of complex features. Experimental results show that FEViT outperforms existing detection methods on multiple public datasets, with significant advantages in accuracy, AUC, and F1 score and an average accuracy improvement of 8.0% to 16.4%, demonstrating good detection performance and generalization ability.

Key words: deepfake detection, vision Transformer, high-frequency artifacts, Fourier transform
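The abstract states only that FEViT combines the Fourier transform with a high-pass filter to isolate high-frequency artifacts. It does not give the filter shape, the cutoff, or how the result is passed to the ViT. The following is a minimal sketch of that general idea, not the authors' implementation. It rests on three assumptions:

- The input is a single-channel image given as a list of rows.
- The filter is an ideal circular high-pass mask.
- The hypothetical parameter `radius` is the cutoff, measured in frequency-index units from the DC term.

The helper names `dft_1d`, `dft_2d`, and `high_pass_residual` are likewise illustrative. A naive separable DFT keeps the example free of third-party packages. A real pipeline would use an FFT library instead.

```python
import cmath
import math
from typing import List

Grid = List[List[complex]]


def dft_1d(xs: List[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform of one row or column."""
    n = len(xs)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(xs[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        # The inverse transform is normalized by 1/n in each 1D pass.
        out.append(acc / n if inverse else acc)
    return out


def dft_2d(grid: Grid, inverse: bool = False) -> Grid:
    """Separable 2D DFT: transform every row, then every column."""
    h, w = len(grid), len(grid[0])
    rows = [dft_1d(row, inverse) for row in grid]
    cols = [dft_1d([rows[i][j] for i in range(h)], inverse) for j in range(w)]
    # cols[j][i] holds the coefficient at (i, j); transpose back to row-major.
    return [[cols[j][i] for j in range(w)] for i in range(h)]


def high_pass_residual(image: List[List[float]], radius: float) -> List[List[float]]:
    """Zero all frequencies within `radius` of DC and return the spatial residual.

    `radius` is a hypothetical cutoff; the abstract does not specify FEViT's filter.
    """
    h, w = len(image), len(image[0])
    spectrum = dft_2d([[complex(v) for v in row] for row in image])
    for u in range(h):
        for v in range(w):
            # Signed frequency indices, i.e. distance from DC as after an fftshift.
            fu = u if u <= h // 2 else u - h
            fv = v if v <= w // 2 else v - w
            if math.hypot(fu, fv) <= radius:
                spectrum[u][v] = 0j
    spatial = dft_2d(spectrum, inverse=True)
    # The mask is conjugate-symmetric, so the imaginary part is only rounding noise.
    return [[c.real for c in row] for row in spatial]
```

Calling `high_pass_residual(board, 1.0)` on an 8×8 checkerboard `board` of zeros and ones removes its mean of 0.5 and leaves the alternating pattern at ±0.5. A constant image yields an all-zero residual, because all of its energy lies in the DC term. An ideal mask is the simplest choice, but its sharp cutoff causes ringing. Smoother masks, such as Gaussian or Butterworth high-pass filters, avoid this and may be closer to what a practical detector would use.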

CLC number: