FEViT: A Frequency Domain Enhanced ViT for Deepfake Detection

doi:10.3969/j.issn.1671-1122.2026.03.009

Abstract

The rapid advancement of deepfake technology has raised growing societal security concerns, including AI-based face swapping, identity forgery, portrait rights violations, and the spread of false information. Existing deepfake detection methods often depend heavily on specific datasets, which introduces data bias and makes it difficult to capture forgery features that generalize across algorithms and scenarios. As a result, their detection accuracy drops and their generalization is limited when they face novel forgery techniques. To address this, the study proposes FEViT, a deepfake detection method that combines high-frequency artifact information with a Vision Transformer (ViT) to improve generalization to forgeries from diverse sources. The approach uses a multi-dimensional optimization strategy. First, high-frequency artifact features are extracted by combining the Fourier transform with high-pass filtering, which amplifies frequency-domain differences between real and forged images. Second, three optimizations are applied to the ViT architecture to improve its sensitivity to local anomalies and its classification of complex features. Experimental results on multiple public datasets show that the proposed method outperforms existing detection techniques in accuracy, AUC, and F1 score, with average accuracy gains of 8% to 16.4%, and that it generalizes well.
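
The frequency-domain step described in the abstract (Fourier transform followed by high-pass filtering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ideal disc-shaped mask, the function name `high_pass_fft`, and the `cutoff_ratio` parameter are assumptions made for illustration, since the abstract does not specify the filter shape or cutoff.

```python
import numpy as np


def high_pass_fft(gray, cutoff_ratio=0.1):
    """Return the high-frequency residual of a 2-D grayscale image.

    `cutoff_ratio` is a hypothetical parameter: the radius of the removed
    low-frequency disc, as a fraction of the shorter image side.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape

    # Forward 2-D FFT, with the zero-frequency term shifted to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))

    # Ideal (hard-edged) high-pass mask: False inside a centred disc,
    # True everywhere else. Assumed filter shape, not taken from the paper.
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    mask = dist > cutoff_ratio * min(h, w)

    # Undo the shift and invert. The mask is symmetric about the centre,
    # so the imaginary part of the result is only round-off error.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real
```

Applied to a face crop, the output keeps edges and fine texture while suppressing smooth shading, which is the kind of residual where generator artifacts are often sought. A smoother mask (for example Gaussian) would reduce ringing; the hard cutoff is used here only because it is the simplest to read.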

Key words: deepfake detection, visual transformer, high-frequency artifacts, Fourier transform

CLC Number: TP309

CHEN Yuqi, QIAN Hanwei, XIA Lingling, WANG Qun. FEViT: A Frequency Domain Enhanced ViT for Deepfake Detection[J]. Netinfo Security, 2026, 26(3): 432-441.

References

[1]	GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative Adversarial Networks[J]. Communications of the ACM, 2020, 63(11): 139-144. doi: 10.1145/3422622
[2]	HO J, JAIN A, ABBEEL P. Denoising Diffusion Probabilistic Models[J]. Advances in Neural Information Processing Systems, 2020, 33: 6840-6851.
[3]	PENG Chunlei, MIAO Zimin, LIU Decheng, et al. Where Deepfakes Gaze at? Spatial-Temporal Gaze Inconsistency Analysis for Video Face Forgery Detection[J]. IEEE Transactions on Information Forensics and Security, 2024, 19: 4507-4517. doi: 10.1109/TIFS.2024.3381823
[4]	HEO Y J, YEO W H, KIM B G. Deepfake Detection Algorithm Based on Improved Vision Transformer[J]. Applied Intelligence, 2022, 53(7): 7512-7527. doi: 10.1007/s10489-022-03867-9
[5]	MARRA F, GRAGNANIELLO D, VERDOLIVA L, et al. A Full-Image Full-Resolution End-to-End-Trainable CNN Framework for Image Forgery Detection[J]. IEEE Access, 2020, 8: 133488-133502. doi: 10.1109/Access.6287639
[6]	ROSSLER A, COZZOLINO D, VERDOLIVA L, et al. FaceForensics++: Learning to Detect Manipulated Facial Images[C]// IEEE. The IEEE/CVF International Conference on Computer Vision. New York: IEEE, 2019: 11935-11944.
[7]	THIES J, ZOLLHOFER M, STAMMINGER M, et al. Face2Face: Real-Time Face Capture and Reenactment of RGB Videos[C]// IEEE. The IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 2387-2395.
[8]	KARRAS T, LAINE S, AILA T. A Style-Based Generator Architecture for Generative Adversarial Networks[C]// IEEE. The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 4401-4410.
[9]	LI Yuezun, LYU Siwei. Exposing DeepFake Videos by Detecting Face Warping Artifacts[C]// IEEE. IEEE International Workshop on Information Forensics and Security (WIFS). New York: IEEE, 2018: 1-7.
[10]	AFCHAR D, NOZICK V, YAMAGISHI J, et al. MesoNet: A Compact Facial Video Forgery Detection Network[C]// IEEE. 2018 IEEE International Workshop on Information Forensics and Security. New York: IEEE, 2018: 1-7.
[11]	CUNHA L, ZHANG Li, SOWAN B, et al. Video Deepfake Detection Using Particle Swarm Optimization Improved Deep Neural Networks[J]. Neural Computing and Applications, 2024, 36: 8417-8453. doi: 10.1007/s00521-024-09536-x
[12]	ZHAO Hanqing, ZHOU Wenbo, CHEN Dongdong, et al. Multi-Attentional DeepFake Detection[C]// IEEE. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2021: 2185-2194.
[13]	DURALL R, KEUPER M, KEUPER J. Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions[C]// IEEE. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 7890-7899.
[14]	FRANK J, EISENHOFER T, SCHONHERR L, et al. Leveraging Frequency Analysis for Deep Fake Image Recognition[C]// PMLR. International Conference on Machine Learning. Cambridge: PMLR, 2020: 3247-3258.
[15]	YOUNUS M A, HASAN T M. Effective and Fast DeepFake Detection Method Based on Haar Wavelet Transform[C]// IEEE. 2020 International Conference on Computer Science and Software Engineering. New York: IEEE, 2020: 186-190.
[16]	RICKER J, DAMM S, HOLZ T, et al. Towards the Detection of Diffusion Model Deepfakes[C]// Springer. Proceedings of International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Heidelberg: Springer, 2024: 446-457.
[17]	PONTORNO O, GUARNERA L, BATTIATO S. On the Exploitation of DCT-Traces in the Generative-AI Domain[C]// IEEE. 2024 IEEE International Conference on Image Processing (ICIP). New York: IEEE, 2024: 3806-3812.
[18]	QIAN Yuyang, YIN Guojun, SHENG Lu, et al. Thinking in Frequency: Face Forgery Detection by Mining Frequency-Aware Clues[C]// Springer. Computer Vision-ECCV 2020: The 16th European Conference on Computer Vision. Heidelberg: Springer, 2020: 86-103.
[19]	TAN Chuangchuang, ZHAO Yao, WEI Shikui, et al. Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Domain Learning[C]// AAAI. The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2024: 5052-5060.
[20]	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale[C]// ICLR. International Conference on Learning Representations. New York: ICLR, 2021.
[21]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is All You Need[C]// Curran Associates, Inc. Advances in Neural Information Processing Systems. New York: Curran Associates, Inc., 2017: 5998-6008.
[22]	WEI Gang, HE Qianhua, OUYANG Jingzheng. On Function Approximation Capability of Multilayer Perceptrons[J]. Information and Control, 1996, 25(6): 2-5.
[23]	LE T N, NGUYEN H H, YAMAGISHI J, et al. OpenForensics: Large-Scale Challenging Dataset for Multi-Face Forgery Detection and Segmentation In-the-Wild[C]// IEEE. International Conference on Computer Vision. New York: IEEE, 2021: 10117-10127.
[24]	ZI Bojia, CHANG Minghao, CHEN Jingjing, et al. WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection[C]// ACM. The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 2382-2390.
[25]	YAN Zhiyuan, YAO Taiping, CHEN Shen, et al. DF40: Toward Next-Generation Deepfake Detection[C]// NeurIPS. The 38th Conference on Neural Information Processing Systems Track on Datasets and Benchmarks. Cambridge: MIT Press, 2024: 29387-29434.
[26]	JEONG Y, KIM D, MIN S, et al. BiHPF: Bilateral High-Pass Filters for Robust Deepfake Detection[EB/OL]. (2021-09-02)[2025-07-04]. https://arxiv.org/abs/2109.00911.
[27]	JEONG Y, KIM D, RO Y, et al. FrePGAN: Robust Deepfake Detection Using Frequency-Level Perturbations[EB/OL]. (2022-02-07)[2025-07-04]. https://arxiv.org/abs/2202.03347.
[28]	WEI Jun, WANG Shuhui, HUANG Qingming. F³Net: Fusion, Feedback and Focus for Salient Object Detection[C]// AAAI. The 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020: 12321-12328.

Dataset	Real samples (count)	Forged samples (count)
OpenForensics	14000	14000
WildDeepfake	15000	65000
DF40	20000	10000

Model	Accuracy	AUC	F1
ViT	87.65%	0.912	0.90
FEViT	94.57%	0.989	0.94
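
For readers unfamiliar with the three metrics reported for ViT and FEViT, the following dependency-free Python sketch shows how accuracy, F1, and AUC are conventionally computed for a binary real/forged classifier. It is illustrative only: treating the forged class as the positive label (1) and the function names are assumptions, and the source does not state how its figures were computed.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise ranking (ties count half).

    Equals the probability that a random positive sample receives a
    higher score than a random negative one. Quadratic in sample count,
    so it suits small illustrations rather than full test sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy and F1 depend on the decision threshold applied to the model's scores, whereas AUC is threshold-free, which is why a model can rank samples well (high AUC) yet still lose accuracy at a poorly chosen threshold.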

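Dataset	Perturbation method
DF40-A	Brightness increased by 10%	Brightness increased by 20%	Brightness increased by 30%
DF40-B	Noise added, amount 5%	Noise added, amount 10%	Noise added, amount 15%
DF40-C	Image compression 10%	Image compression 20%	Image compression 30%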
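
The brightness and noise perturbations listed for DF40-A and DF40-B could be reproduced roughly as in the NumPy sketch below. The exact procedure is not given in the source, so everything here is an assumption: brightness is modeled as multiplicative scaling of 8-bit pixel values, the noise "amount" as the standard deviation of zero-mean Gaussian noise expressed as a percentage of 255, and both function names are hypothetical. Compression (DF40-C) is left out because it requires an image codec and the source does not say which one was used.

```python
import numpy as np


def increase_brightness(img, percent):
    """Scale 8-bit pixel values up by `percent` and clip to [0, 255].

    Assumption: brightness is a plain multiplicative gain on all channels.
    """
    out = img.astype(np.float64) * (1.0 + percent / 100.0)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def add_gaussian_noise(img, percent, seed=0):
    """Add zero-mean Gaussian noise with std = `percent`% of 255.

    Assumption: the noise "amount" maps to the standard deviation.
    A fixed seed keeps the perturbation reproducible.
    """
    rng = np.random.default_rng(seed)
    sigma = 255.0 * percent / 100.0
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```

Building one perturbed copy of the test set per level (10%, 20%, 30% for brightness; 5%, 10%, 15% for noise) would yield perturbed sets comparable in spirit to those in the table, though not necessarily identical to the ones the authors evaluated.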

Model	Accuracy (DF40-A)	Accuracy (DF40-B)	Accuracy (DF40-C)
ViT	73.6%	62.8%	70.3%
FreqNet	85.2%	77.4%	84.5%
FEViT	92.2%	87.4%	91.1%

Model	Accuracy (StyleGAN2)	Accuracy (DF40)
BiHPF	77.0%	76.1%
FrePGAN	72.2%	78.0%
F³Net	82.2%	80.1%
FreqNet	88.0%	87.3%
FEViT	91.1%	94.6%