Research on Deepfake Image Detection Based on Multi-Feature Perception and Attention Mechanism

doi:10.3969/j.issn.1671-1122.2026.04.011

Abstract

With the continuous advancement of generative adversarial networks (GANs) and diffusion techniques, generated images have reached a level of visual quality that makes them nearly indistinguishable from real images, which poses a potential threat to personal privacy and social security. To address this challenge, this paper proposes a multi-feature fusion model for deepfake image detection that combines global, local, and color features to capture forgery traces in generated images comprehensively and thereby identify image authenticity accurately. The global branch extracts the global spatial information of the whole image, the local branch uses a fine-grained selection module to attend to local features in key regions, and the color branch improves adaptability to forgery features across different color spaces. These features are fused through an attention mechanism to improve the model's ability to capture forgery traces in deepfake images. Experiments on 14 GAN datasets and 5 diffusion model datasets show that the method achieves high detection accuracy and generalization ability across different generative models, providing an efficient and reliable solution for deepfake image detection.

Keywords: deepfake image detection, generative adversarial network, diffusion model, color disparities, attention mechanism
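
The abstract states that global, local, and color features are fused through an attention mechanism but gives no further detail. The following is a minimal sketch of one common way such a fusion can work, not the authors' fusion module: each branch's feature vector is scored against a query vector, the scores are turned into weights with a softmax, and the fused feature is the weighted sum. The function names, the `query` vector, and all numeric values are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(branch_features, query):
    """Fuse equally sized feature vectors using softmax attention weights.

    branch_features: dict mapping a branch name to its feature vector.
    query: hypothetical vector used to score each branch (learned in practice).
    """
    names = list(branch_features)
    dim = len(query)
    # Scaled dot-product score of each branch against the query.
    scores = [
        sum(f * q for f, q in zip(branch_features[n], query)) / math.sqrt(dim)
        for n in names
    ]
    weights = softmax(scores)
    # Weighted sum of the branch features, one value per feature dimension.
    fused = [
        sum(w * branch_features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Toy feature vectors standing in for the three branch outputs (assumed values).
features = {
    "global": [0.2, 0.1, 0.4, 0.3],
    "local": [0.9, 0.7, 0.1, 0.5],
    "color": [0.3, 0.3, 0.3, 0.3],
}
query = [1.0, 0.5, -0.5, 0.25]  # toy values; normally a trained parameter

fused, weights = attention_fuse(features, query)
print(weights)
print(fused)
```

In this toy run the local branch receives the largest weight because its features align best with the query. A trained model would learn the scoring parameters rather than fix them by hand.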

CLC number: TP309

YUAN Xiaogang, PEI Huan, AN Dezhi, WAN Jianxin. Research on Deepfake Image Detection Based on Multi-Feature Perception and Attention Mechanism[J]. Netinfo Security, 2026, 26(4): 642-653.

References

[1]	HO J, JAIN A, ABBEEL P. Denoising Diffusion Probabilistic Models[J]. Advances in Neural Information Processing Systems, 2020, 33: 6840-6851.
[2]	RAMESH A, PAVLOV M, GOH G, et al. Zero-Shot Text-to-Image Generation[C]// PMLR. International Conference on Machine Learning. New York: PMLR, 2021: 8821-8831.
[3]	THOMSON T, ANGUS D, DOOTSON P, et al. Visual Mis/Disinformation in Journalism and Public Communications: Current Verification Practices, Challenges and Future Opportunities[J]. Journalism Practice, 2022, 16(5): 938-962.
[4]	BRUNDAGE M, AVIN S, CLARK J, et al. The Malicious Use of Artificial Intelligence: Forecasting, Prevention and Mitigation[EB/OL].(2018-02)[2026-01-02]. https://arxiv.org/pdf/1802.07228.
[5]	BAI Weiming, ZHANG Zhipeng, LI Bing, et al. Robust Texture-Aware Computer-Generated Image Forensic: Benchmark and Algorithm[J]. IEEE Transactions on Image Processing, 2021, 30: 8439-8453.
[6]	YANG Ke, LI Yongliang, HE Jindong, et al. Deepfake Face Detection Based on Masked Image Modeling[J]. Computer Applications, 2025, 45: 72-77. (in Chinese)
[7]	LI Jialin, SHEN Zhe. Channel-Adaptive Deepfake Image Detection Enhanced by Spatial Domain Features[J]. Journal of Computer-Aided Design and Graphics, 2025, 37(2): 313-320. (in Chinese)
[8]	WANG S, WANG O, ZHANG R, et al. CNN-Generated Images Are Surprisingly Easy to Spot... for Now[C]// IEEE. The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 8695-8704.
[9]	LI Weichuang, HE Peisong, LI Haoliang, et al. Detection of GAN-Generated Images by Estimating Artifact Similarity[J]. IEEE Signal Processing Letters, 2021, 29: 862-866.
[10]	CHEN Beijing, LIU Xin, ZHENG Yuhui, et al. A Robust GAN-Generated Face Detection Method Based on Dual-Color Spaces and An Improved Xception[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 32(6): 3527-3538.
[11]	LIU Yun, WAN Zuliang, YIN Xiaohua, et al. Detection of GAN Generated Image Using Color Gradient Representation[J]. Journal of Visual Communication and Image Representation, 2023, 95: 1-9.
[12]	LIU Honggu, LI Xiaodan, ZHOU Wenbo, et al. Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain[C]// IEEE. The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2021: 772-781.
[13]	GAO Yuan, ZHANG Yu, ZENG Ping, et al. Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain[J]. Electronics, 2024, 13(9): 1749.
[14]	CHAI L, BAU D, LIM S, et al. What Makes Fake Images Detectable? Understanding Properties That Generalize[C]// Springer. European Conference on Computer Vision. Heidelberg: Springer, 2020: 103-120.
[15]	CHEN Beijing, JU Xingwang, XIAO Bin, et al. Locally GAN-Generated Face Detection Based on An Improved Xception[J]. Information Sciences, 2021, 572: 16-28.
[16]	SONG Jiajun, LIU Guixiong, HUANG Jiaxi, et al. HiFi-Net Deepfake Image Detection Based on U-HRNet and SoftTripleLoss[J]. China Testing, 2023, 49(9): 37-45. (in Chinese)
[17]	RAJ S, MATHEW J, MONDAL A. Generalized and Robust Model for GAN-Generated Image Detection[J]. Pattern Recognition Letters, 2024, 182: 104-110.
[18]	XI Ziyi, LIN Hao, LUO Weiqi. Dual Stream Computer-Generated Image Detection Network Based on Channel Joint and Softpool[EB/OL].(2022-07-07)[2026-01-02]. https://arxiv.org/pdf/2207.03205.
[19]	ZHAO Lei, ZHANG Mingcheng, DING Hongwei, et al. MFF-Net: Deepfake Detection Network Based on Multi-Feature Fusion[J]. Entropy, 2021, 23(12): 1692-1706.
[20]	CORVI R, COZZOLINO D, ZINGARINI G, et al. On the Detection of Synthetic Images Generated by Diffusion Models[C]// IEEE. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). New York: IEEE, 2023: 1-5.
[21]	CAZENAVETTE G, SUD A, LEUNG T, et al. FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion[C]// IEEE. The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2024: 10759-10769.
[22]	LORENZ P, DURALL R L, KEUPER J. Detecting Images Generated by Deep Diffusion Models Using Their Local Intrinsic Dimensionality[C]// IEEE. The IEEE/CVF International Conference on Computer Vision. New York: IEEE, 2023: 448-459.
[23]	SANTOSH L L, AMERINI I, WANG Xin, et al. Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images[EB/OL].(2024-09-08)[2026-01-02]. https://arxiv.org/pdf/2404.12908.
[24]	WANG Zhendong, BAO Jianmin, ZHOU Wengang, et al. DIRE for Diffusion-Generated Image Detection[C]// IEEE. The IEEE/CVF International Conference on Computer Vision. New York: IEEE, 2023: 22445-22455.
[25]	CHEN Baoying, ZENG Jishen, YANG Jianquan, et al. DRCT: Diffusion Reconstruction Contrastive Training towards Universal Detection of Diffusion Generated Images[C]// PMLR. Forty-First International Conference on Machine Learning. New York: PMLR, 2024: 7621-7639.
[26]	HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep Residual Learning for Image Recognition[C]// IEEE. The IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 770-778.
[27]	KARRAS T, AILA T, LAINE S, et al. Progressive Growing of GANs for Improved Quality, Stability and Variation[EB/OL].(2018-02-26)[2026-01-02]. https://arxiv.org/pdf/1710.10196.
[28]	TAN Chuangchuang, TAO Renshuai, LIU Huan, et al. GANGen-Detection: A Dataset Generated by GANs for Generalizable Deepfake Detection[EB/OL].(2024)[2026-01-02]. https://github.com/chuangchuangtan/GANGen-Detection.
[29]	TAN Chuangchuang, ZHAO Yao, WEI Shikui, et al. Rethinking the Up-Sampling Operations in CNN-Based Generative Network for Generalizable Deepfake Detection[C]// IEEE. The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2024: 28130-28139.
[30]	YU F, SEFF A, ZHANG Yinda, et al. LSUN: Construction of a Large-Scale Image Dataset Using Deep Learning with Humans in the Loop[EB/OL].(2016-06-04)[2026-01-02]. https://arxiv.org/pdf/1506.03365.
[31]	PASZKE A, GROSS S, MASSA F, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library[J]. Advances in Neural Information Processing Systems, 2019, 32: 8026-8037.
[32]	KINGMA D P, BA J L. Adam: A Method for Stochastic Optimization[EB/OL].(2017-01-30)[2026-01-02]. https://arxiv.org/pdf/1412.6980.
[33]	FRANK J, EISENHOFER T, SCHÖNHERR L, et al. Leveraging Frequency Analysis for Deep Fake Image Recognition[C]// PMLR. International Conference on Machine Learning. New York: PMLR, 2020: 3247-3258.
[34]	DURALL R, KEUPER M, KEUPER J. Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions[C]// IEEE. The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 7890-7899.
[35]	JEONG Y, KIM D, MIN S, et al. BiHPF: Bilateral High-Pass Filters for Robust Deepfake Detection[C]// IEEE. The IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE, 2022: 48-57.
[36]	TAN Chuangchuang, ZHAO Yao, WEI Shikui, et al. Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection[C]// IEEE. The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2023: 12105-12114.
[37]	QIAN Yuyang, YIN Guojun, SHENG Lu, et al. Thinking in Frequency: Face Forgery Detection by Mining Frequency-Aware Clues[C]// Springer. European Conference on Computer Vision. Heidelberg: Springer, 2020: 86-103.
[38]	JEONG Y, KIM D, RO Y, et al. FrePGAN: Robust Deepfake Detection Using Frequency-Level Perturbations[C]// AAAI. The AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2022: 1060-1068.
[39]	SHIOHARA K, YAMASAKI T. Detecting Deepfakes with Self-Blended Images[C]// IEEE. The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2022: 18720-18729.
[40]	MANDELLI S, BONETTINI N, BESTAGINI P, et al. Detecting GAN-Generated Images by Orthogonal Training of Multiple CNNs[C]// IEEE. 2022 IEEE International Conference on Image Processing. New York: IEEE, 2022: 3091-3095.

Model	Classifier	Same-class ACC	Same-class AP	Cross-class ACC	Cross-class AP	All-class ACC	All-class AP
Pixel-based method	ResNet-50	72.4%	68.7%	61.9%	58.6%	64.5%	61.1%
Ref. [8]	ResNet-50	50.5%	66.6%	50.0%	58.3%	50.2%	60.4%
Ref. [33]	ResNet-50	93.3%	89.7%	73.5%	68.1%	78.4%	73.5%
Ref. [34]	SVM(rbf)	88.3%	83.0%	62.0%	59.2%	68.5%	65.1%
Ref. [34]	SVM(poly)	88.8%	83.9%	62.0%	59.1%	68.7%	65.3%
Ref. [34]	SVM(linear)	81.1%	74.1%	60.2%	57.0%	65.4%	61.3%
Ref. [34]	Linear Reg.	79.9%	73.2%	60.5%	57.0%	65.3%	61.1%
Ref. [35]	ResNet-50	94.8%	93.5%	73.4%	69.0%	78.7%	75.2%
Ref. [17]	Attention	95.0%	94.1%	77.0%	68.3%	80.0%	79.0%
Proposed model	Attention	98.5%	99.1%	81.9%	98.6%	92.6%	98.7%
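
The table above reports accuracy (ACC) and average precision (AP) for each detector. The evaluation protocol is not given in this excerpt, so the sketch below only illustrates the standard definitions of the two metrics for a binary real/fake classifier. The 0.5 decision threshold, the label convention (1 = fake), and the toy scores are assumptions, and the AP routine assumes no tied scores.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label (1 = fake)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def average_precision(labels, scores):
    """Mean of the precision values at each rank where a positive sample appears.

    Samples are ranked by descending score; ties are not handled specially.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy labels and detector scores (assumed values, not taken from the paper).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

acc = accuracy(labels, scores)
ap = average_precision(labels, scores)
print(f"ACC = {acc:.3f}, AP = {ap:.3f}")
```

ACC depends on the chosen threshold while AP depends only on the ranking of the scores, which is why a detector can show a low ACC alongside a much higher AP.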

Model	IDDPM ACC	IDDPM AP	ADM ACC	ADM AP	DDPM ACC	DDPM AP	Midjourney ACC	Midjourney AP	DALLE ACC	DALLE AP	Mean ACC	Mean AP
Ref. [8]	48.3%	52.6%	53.4%	64.4%	50.0%	63.3%	48.6%	38.5%	49.3%	44.7%	49.9%	52.7%
Ref. [33]	70.5%	85.7%	67.3%	72.2%	47.6%	43.1%	39.7%	40.8%	68.7%	65.2%	58.8%	61.4%
Ref. [34]	63.2%	71.7%	39.1%	40.8%	54.1%	53.6%	45.7%	47.2%	53.9%	52.2%	51.2%	53.1%
Ref. [39]	63.5%	62.5%	57.1%	60.1%	55.3%	57.7%	54.3%	56.4%	48.8%	47.4%	55.8%	56.8%
Ref. [40]	47.9%	57.0%	51.0%	56.1%	47.3%	45.5%	50.0%	44.7%	49.8%	49.7%	49.2%	50.6%
Ref. [36]	45.2%	46.9%	72.7%	79.3%	59.8%	88.5%	68.3%	76.0%	75.1%	80.9%	64.2%	74.3%
Proposed model	68.6%	74.6%	70.8%	82.7%	58.2%	81.0%	74.5%	69.0%	68.9%	77.2%	68.2%	76.9%
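
The Mean column in the table above is consistent with an unweighted average over the five diffusion datasets. As a quick check, the sketch below recomputes it for the proposed model from the per-dataset values in the table. The variable and function names are illustrative.

```python
# Per-dataset (ACC, AP) percentages for the proposed model, copied from the table.
proposed = {
    "IDDPM": (68.6, 74.6),
    "ADM": (70.8, 82.7),
    "DDPM": (58.2, 81.0),
    "Midjourney": (74.5, 69.0),
    "DALLE": (68.9, 77.2),
}


def mean_metrics(per_dataset):
    """Unweighted mean of ACC and AP over datasets, rounded to one decimal."""
    n = len(per_dataset)
    mean_acc = sum(acc for acc, _ in per_dataset.values()) / n
    mean_ap = sum(ap for _, ap in per_dataset.values()) / n
    return round(mean_acc, 1), round(mean_ap, 1)


print(mean_metrics(proposed))  # (68.2, 76.9), matching the Mean column
```

Because the average is unweighted, each dataset contributes equally regardless of how many images it contains.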

Channel	Ref. [8] mACC	Ref. [8] mAP	Ref. [28] mACC	Ref. [28] mAP	Ref. [29] mACC	Ref. [29] mAP
H	86.7%	93.4%	84.6%	91.2%	68.2%	76.9%
S	80.1%	90.2%	75.7%	76.8%	60.7%	69.5%
V	79.3%	92.3%	70.8%	91.1%	61.1%	75.8%
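
The table above compares the H, S, and V channels. This excerpt does not describe how the color branch prepares its input, so the sketch below only shows what those three channels are: a per-pixel conversion from RGB to HSV using Python's standard `colorsys` module. The helper name `split_hsv` and the sample pixels are assumptions.

```python
import colorsys


def split_hsv(rgb_pixels):
    """Convert 8-bit RGB pixels into separate H, S, V channel lists in [0, 1]."""
    h_ch, s_ch, v_ch = [], [], []
    for r, g, b in rgb_pixels:
        # colorsys expects floats in [0, 1], so scale the 8-bit values first.
        hue, sat, val = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        h_ch.append(hue)
        s_ch.append(sat)
        v_ch.append(val)
    return h_ch, s_ch, v_ch


# Pure red, pure green, and mid gray as sample pixels.
pixels = [(255, 0, 0), (0, 255, 0), (128, 128, 128)]
h, s, v = split_hsv(pixels)
print(h)
print(s)
print(v)
```

Hue encodes the color tone, saturation its purity, and value its brightness, so each channel exposes a different aspect of the image to a detector.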

Model	Ref. [8] mACC	Ref. [8] mAP	Ref. [28] mACC	Ref. [28] mAP	Ref. [29] mACC	Ref. [29] mAP
Without color branch	84.5%	90.7%	82.5%	90.6%	65.4%	74.1%
Without GLFCM module	77.3%	89.6%	79.2%	90.7%	63.5%	72.3%
Without local branch	72.3%	81.0%	78.0%	83.6%	59.8%	62.6%
Without FFM module	50.0%	50.2%	50.7%	50.5%	49.7%	50.3%
Proposed model	86.7%	93.4%	84.6%	91.2%	68.2%	76.9%
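
One way to read the ablation table above is to express each variant as a drop relative to the full model. The sketch below does this for the mACC values in the first column pair (Ref. [8]). The numbers are copied from the table, and the names are illustrative.

```python
# mACC percentages from the first column pair (Ref. [8]) of the ablation table.
ablated = {
    "without color branch": 84.5,
    "without GLFCM module": 77.3,
    "without local branch": 72.3,
    "without FFM module": 50.0,
}
full_model = 86.7


def drops(full, variants):
    """Percentage-point drop of each ablated variant relative to the full model."""
    return {name: round(full - value, 1) for name, value in variants.items()}


# List the components from largest to smallest impact.
ranked = sorted(drops(full_model, ablated).items(), key=lambda kv: kv[1], reverse=True)
for name, drop in ranked:
    print(f"{name}: -{drop} points")
```

On this column, removing the FFM module costs the most (36.7 points, down to chance level), followed by the local branch (14.4), the GLFCM module (9.4), and the color branch (2.2).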