Netinfo Security, 2024, Vol. 24, Issue (8): 1173-1183. doi: 10.3969/j.issn.1671-1122.2024.08.004

• Theoretical Research •

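A Multi-Scale Feature Fusion Deepfake Detection Algorithm Based on Reconstruction Learning

XU Kaiwen1, ZHOU Yichao1, GU Wenquan2, CHEN Chen3, HU Xiyuan1

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  2. Luyi County Public Security Bureau Video Investigation Brigade, Zhoukou 477299, China
  3. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2024-05-11 Online: 2024-08-10 Published: 2024-08-22
  • Corresponding author: HU Xiyuan, huxy@njust.edu.cn
  • About the authors: XU Kaiwen (born 1996), male, from Jiangsu, Ph.D. candidate; research interest: deepfake detection | ZHOU Yichao (born 1983), male, from Shandong, associate researcher, Ph.D.; research interests: multimodal information fusion and processing | GU Wenquan (born 1982), male, from Henan; research interests: video investigation and forensic examination of image evidence | CHEN Chen (born 1982), female, from Henan, associate researcher, Ph.D.; research interests: machine learning theory and manifold learning | HU Xiyuan (born 1984), male, from Zhejiang, professor, Ph.D.; research interests: artificial intelligence theory and applications, image and video processing
  • Funding:
    National Natural Science Foundation of China (62172227)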

A Multi-Scale Feature Fusion Deepfake Detection Algorithm Based on Reconstruction Learning

XU Kaiwen1, ZHOU Yichao1, GU Wenquan2, CHEN Chen3, HU Xiyuan1

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  2. Luyi County Public Security Bureau Video Investigation Brigade, Zhoukou 477299, China
  3. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2024-05-11 Online: 2024-08-10 Published: 2024-08-22

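Abstract:

With the rapid development of deepfake technology, detecting deepfake faces has become a research hotspot in computer vision. Existing detection methods based on noise, local texture, or frequency features perform well in specific scenarios, but they do not mine fine-grained facial representation features in depth, which limits their generalization ability. To address this problem, the article proposes MSFFR, a new classification network based on multi-scale feature fusion reconstruction. From the perspective of reconstruction learning, the network mines fine-grained facial content features and gradient representation features, and it identifies forged faces by fusing these two kinds of information at multiple scales. The model contains three novel modules: a dual-branch feature extraction module that reveals the distribution differences between real and forged faces; a fine-grained content and gradient feature fusion module that explores the correlation between fine-grained facial content features and gradient features; and a bidirectional attention module based on reconstruction disparity that guides the model in classifying the fused features. Extensive experiments on large-scale benchmark datasets show that the proposed method significantly improves detection performance over existing techniques, particularly in generalization ability.

Key words: deepfake detection, multi-scale feature fusion, reconstruction learning, deep generative model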

Abstract:

With the rapid development of deepfake technology, the detection of deepfake faces has become a research hotspot in the field of computer vision. Although existing detection methods based on noise, local texture, or frequency features can perform well to a certain extent or in specific scenarios, they lack in-depth exploration of fine-grained facial representation features, which limits their generalization ability. To address this issue, this paper proposes MSFFR, a novel classification network model based on multi-scale feature fusion reconstruction. The network explores fine-grained facial content and gradient representation features from the perspective of reconstruction learning and detects deepfake faces through multi-scale feature fusion. The model includes three innovative modules: a dual-branch feature extraction module designed to reveal distribution differences between real and fake faces; a fine-grained content and gradient feature fusion module that explores the correlation between fine-grained content features and gradient features of faces; and a bidirectional attention module based on reconstruction disparity that guides the model in classifying the fused features. Extensive experiments on large-scale benchmark datasets demonstrate that, compared with existing state-of-the-art techniques, the proposed method significantly improves detection performance, especially in terms of generalization ability.

Key words: deepfake detection, multi-scale feature fusion, reconstruction learning, deep generative model
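
The abstract mentions gradient features and attention guided by reconstruction disparity without giving formulas. As a rough intuition only, the toy sketch below computes a forward-difference gradient map of a tiny grayscale image and reweights it by the normalized absolute error between the image and a made-up reconstruction. All function names, the normalization, and the element-wise weighting are assumptions made for illustration; the modules in MSFFR are learned network components and are not reproduced here.

```python
# Illustrative sketch only: every name and formula below is an assumption,
# not the MSFFR implementation described in the paper.

def gradient_magnitude(img):
    """Forward-difference gradient magnitude of a 2D grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Forward differences; the last row/column use a zero difference.
            dx = img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
            dy = img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
            out[y][x] = (dx * dx + dy * dy) ** 0.5
    return out


def disparity_attention(img, recon):
    """Absolute reconstruction error, rescaled so the largest error maps to 1."""
    diff = [[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(img, recon)]
    peak = max(max(row) for row in diff)
    if peak == 0:
        return [[0.0] * len(row) for row in diff]
    return [[v / peak for v in row] for row in diff]


def reweight(features, attention):
    """Element-wise product: emphasize locations that were reconstructed poorly."""
    return [[f * a for f, a in zip(fr, ar)] for fr, ar in zip(features, attention)]


image = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
# Hypothetical reconstruction that fails to reproduce the centre pixel.
reconstruction = [[0.0, 0.0, 0.0],
                  [0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.0]]

attention = disparity_attention(image, reconstruction)
weighted = reweight(gradient_magnitude(image), attention)
```

In this toy case only the centre pixel is reconstructed badly, so the attention map is nonzero only there and the weighted gradient map keeps only the centre response. A trained detector would learn such weighting on multi-scale feature maps rather than apply a fixed rule to raw pixels.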

CLC Number: