FEViT: A Frequency Domain Enhanced ViT for Deepfake Detection

Netinfo Security, 2026, Vol. 26, Issue 3: 432-441. doi: 10.3969/j.issn.1671-1122.2026.03.009

CHEN Yuqi¹, QIAN Hanwei¹,², XIA Lingling¹, WANG Qun¹

1. Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing 210031, China
2. State Key Laboratory for Novel Software Technology at Nanjing University, Nanjing 210093, China

Received: 2025-08-10 Online: 2026-03-10 Published: 2026-03-30
Corresponding author: WANG Qun, E-mail: wangqun@jspi.cn
About the authors: CHEN Yuqi (born 1991), female, from Jiangsu, lecturer, M.S.; main research interest: cyberspace security | QIAN Hanwei (born 1984), male, from Jiangsu, associate professor, M.S., CCF member; main research interest: artificial intelligence security | XIA Lingling (born 1988), female, from Jiangsu, associate professor, Ph.D., CCF member; main research interest: network attack and defense | WANG Qun (born 1971), male, from Gansu, professor, Ph.D., CCF distinguished member; main research interest: cyberspace security
Supported by: National Natural Science Foundation of China (72401110)

Abstract:

The rapid advancement of deepfake technology has intensified social security problems such as AI face swapping, identity forgery, portrait rights infringement, and the spread of false information. Existing deepfake detection methods often depend on specific datasets, which introduces data bias and makes it hard to capture forgery features that generalize across algorithms and scenarios. As a result, their detection accuracy usually drops, and their generalization is limited, when they face new forgery techniques. This paper proposes FEViT, a frequency domain enhanced model that combines high-frequency artifact information with a Vision Transformer (ViT) to improve generalization to forged images from different sources. FEViT adopts a multi-dimensional optimization strategy. First, it combines the Fourier transform with a high-pass filter to extract high-frequency artifact features and amplify frequency domain differences. Second, it applies three optimizations to the ViT architecture to increase sensitivity to local anomalies and improve the classification of complex features. Experimental results show that FEViT outperforms existing detection methods on multiple public datasets in accuracy, AUC, and F1 score, raising average accuracy by 8.0% to 16.4% and demonstrating good detection performance and generalization ability.

Key words: deepfake detection, visual transformer, high-frequency artifacts, Fourier transform
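
The first stage described in the abstract, a Fourier transform followed by a high-pass filter, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ideal (hard-edged) circular mask, the function name `high_pass_filter`, and the `cutoff_ratio` parameter are assumptions chosen for illustration, since this page does not give the filter shape or cutoff used in FEViT.

```python
import numpy as np


def high_pass_filter(image: np.ndarray, cutoff_ratio: float = 0.1) -> np.ndarray:
    """Return the high-frequency residual of a 2-D grayscale image.

    cutoff_ratio is a hypothetical parameter: the radius of the blocked
    low-frequency disc, as a fraction of the shorter image side.
    """
    if image.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    height, width = image.shape

    # Forward FFT, then shift the zero-frequency (DC) term to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Ideal high-pass mask: False inside a centred disc, True elsewhere.
    rows, cols = np.ogrid[:height, :width]
    distance = np.hypot(rows - height // 2, cols - width // 2)
    mask = distance > cutoff_ratio * min(height, width)

    # Undo the shift and invert; the imaginary part is numerical noise.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real


# A flat image has no high-frequency content, so the residual is ~0,
# while a pixel-level checkerboard passes through almost unchanged.
flat = np.full((32, 32), 3.0)
checker = np.indices((32, 32)).sum(axis=0) % 2 * 2.0 - 1.0
print(np.abs(high_pass_filter(flat)).max())
print(np.abs(high_pass_filter(checker) - checker).max())
```

In a detector of this kind, the residual (or the masked spectrum itself) would typically be fed to the classifier alongside or instead of the raw pixels; how FEViT fuses it with the ViT input is not stated on this page.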

CLC number: TP309

Cite this article: CHEN Yuqi, QIAN Hanwei, XIA Lingling, WANG Qun. FEViT: A Frequency Domain Enhanced ViT for Deepfake Detection[J]. Netinfo Security, 2026, 26(3): 432-441.

Figures and tables (10): Figures 1 to 5, Tables 1 to 5


Dataset	Real samples (count)	Forged samples (count)
OpenForensics	14000	14000
WildDeepfakes	15000	65000
DF40	20000	10000

Model	Accuracy	AUC	F1
ViT	87.65%	0.912	0.90
FEViT	94.57%	0.989	0.94
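
The three metrics in the table above have standard definitions, which the following pure-Python sketch spells out. It assumes binary labels with forged = 1 as the positive class and a 0.5 decision threshold; both are assumptions for illustration, because the evaluation protocol is not described on this page. The toy labels and scores are made up and are not the paper's data.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    for illustration, not for large test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = forged, 0 = real.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
toy_preds = [int(s >= 0.5) for s in toy_scores]  # assumed 0.5 threshold

print(accuracy(toy_labels, toy_preds))
print(f1_score(toy_labels, toy_preds))
print(auc(toy_labels, toy_scores))
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores, which is one reason papers usually report all three together.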

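Dataset	Perturbation
DF40-A	Brightness increased by 10%	Brightness increased by 20%	Brightness increased by 30%
DF40-B	Noise added, amount 5%	Noise added, amount 10%	Noise added, amount 15%
DF40-C	Image compression 10%	Image compression 20%	Image compression 30%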
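
Perturbed test sets like DF40-A and DF40-B could be generated along the following lines. This is a rough sketch under stated assumptions, not the authors' procedure: "brightness increased by p%" is read as multiplying pixel values by (1 + p/100), and "noise amount p%" is read as zero-mean Gaussian noise with a standard deviation of p% of the 8-bit range. The function names are hypothetical. Compression (DF40-C) is left out because the table does not say which codec is used or what the percentage refers to.

```python
import numpy as np


def increase_brightness(image: np.ndarray, percent: float) -> np.ndarray:
    """Scale 8-bit pixel values up by `percent` and clip to [0, 255].

    Assumption: brightness is modelled as a plain multiplicative gain.
    """
    scaled = image.astype(np.float64) * (1.0 + percent / 100.0)
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


def add_gaussian_noise(image: np.ndarray, percent: float, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian noise with std equal to `percent` of 255.

    Assumption: the table's "noise amount" maps to the noise standard
    deviation; the seed makes the perturbation reproducible.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, 255.0 * percent / 100.0, size=image.shape)
    noisy = image.astype(np.float64) + noise
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


# Build the three levels of each perturbation for one placeholder image.
sample = np.full((4, 4, 3), 100, dtype=np.uint8)
brightness_levels = {p: increase_brightness(sample, p) for p in (10, 20, 30)}
noise_levels = {p: add_gaussian_noise(sample, p) for p in (5, 10, 15)}
```

Clipping after the perturbation keeps the result a valid 8-bit image, at the cost of saturating very bright regions at the higher brightness levels.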

Model	Accuracy on DF40-A	Accuracy on DF40-B	Accuracy on DF40-C
ViT	73.6%	62.8%	70.3%
FreqNet	85.2%	77.4%	84.5%
FEViT	92.2%	87.4%	91.1%

Model	Accuracy on StyleGAN2	Accuracy on DF40
BiHPF	77.0%	76.1%
FrePGAN	72.2%	78.0%
F³Net	82.2%	80.1%
FreqNet	88.0%	87.3%
FEViT	91.1%	94.6%
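
To read the comparison table above more easily, the short script below computes how many percentage points FEViT leads each baseline on each dataset. The accuracy values are copied from the table; the helper name `margins_over_baselines` is hypothetical. The script does not attempt to reproduce the 8.0% to 16.4% average improvement quoted in the abstract, because this page does not say which models and datasets that average covers.

```python
# Accuracy (%) copied from the comparison table above.
ACCURACY = {
    "BiHPF": {"StyleGAN2": 77.0, "DF40": 76.1},
    "FrePGAN": {"StyleGAN2": 72.2, "DF40": 78.0},
    "F3Net": {"StyleGAN2": 82.2, "DF40": 80.1},
    "FreqNet": {"StyleGAN2": 88.0, "DF40": 87.3},
    "FEViT": {"StyleGAN2": 91.1, "DF40": 94.6},
}


def margins_over_baselines(table: dict, target: str = "FEViT") -> dict:
    """Percentage-point lead of `target` over every other model, per dataset."""
    return {
        model: {
            dataset: round(table[target][dataset] - acc, 1)
            for dataset, acc in row.items()
        }
        for model, row in table.items()
        if model != target
    }


MARGINS = margins_over_baselines(ACCURACY)
for model, row in MARGINS.items():
    print(model, row)
```

The margins are differences in percentage points, not relative improvements, so they should not be compared directly with figures computed as ratios.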
