Netinfo Security (信息网络安全), 2026, Vol. 26, Issue (5): 684-698. doi: 10.3969/j.issn.1671-1122.2026.05.002

• Academic Research •

A Training-Free Black-Box Attack against DeepFake Detectors via Frequency Distribution Alignment

YU Chuer1, WANG Hanyue1, WU Jian2,3,4, DING Weijie3, CHEN Xianqian2,4, WANG Zonghui1

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  2. Cybersecurity Corps, Zhejiang Provincial Public Security Department, Hangzhou 310009, China
  3. College of Information and Cyber Security, Zhejiang Police College, Hangzhou 310053, China
  4. School of Cyber Science and Technology, Zhejiang University, Hangzhou 310027, China
  • Received: 2026-01-20 Online: 2026-05-10 Published: 2026-06-03
  • Corresponding author: WANG Zonghui zhwang@zju.edu.cn
  • About the authors:
    YU Chuer (虞楚尔, born 1995), female, from Zhejiang, Ph.D. candidate; main research interest: multimedia content security
    WANG Hanyue (王菡悦, born 2003), female, from Henan, master's student; main research interest: artificial intelligence content security
    WU Jian (吴坚, born 1980), male, from Zhejiang, professor-level senior engineer, master's degree; main research interests: artificial intelligence, cybersecurity, and digital forensics
    DING Weijie (丁伟杰, born 1980), male, from Henan, professor, Ph.D.; main research interests: artificial intelligence content security, intelligent data analysis, and cyberspace governance technologies
    CHEN Xianqian (陈先钳, born 1984), male, from Zhejiang, professor-level senior engineer, master's degree; main research interest: electronic data forensics
    WANG Zonghui (王总辉, born 1979), male, from Zhejiang, senior engineer, Ph.D.; main research interests: content security, artificial intelligence security, and computer architecture
  • Supported by:
    National Key Research and Development Program of China (2024YFF0618800)

Abstract:

As the quality of deepfake generation approaches that of real imagery, deepfake detection plays an increasingly important role in multimedia content security. However, most existing detectors rely on statistical discrepancies between real and fake samples, which leaves them potentially vulnerable to adversarial attack under black-box conditions. This paper investigates the security of deepfake detection systems in a training-free, zero-query black-box setting and introduces a new attack perspective: an adversarial attack is modeled as a targeted repair of the statistical fingerprints of a fake image rather than as the addition of exogenous noise. Based on this insight, the paper proposes SpectralFusion, a training-free black-box attack based on frequency-domain distribution alignment. Leveraging the real-fake paired prior that arises naturally in deepfake generation, SpectralFusion uses frequency-domain analysis to locate the statistical discrepancies between a fake image and its real reference and applies controlled corrections only to the anomalous frequency bands, without accessing detector parameters or gradients and without any additional training or queries. Specifically, a difference-aware frequency-band mask is designed to localize the anomalous bands precisely, and an adaptive fusion-strength mechanism dynamically regulates the correction energy. Combined with a spectral processing strategy over local overlapping blocks, the method achieves fine-grained alignment and reconstruction of the forged frequency-domain features. Experimental results show that SpectralFusion consistently deceives multiple deepfake detection models, lowering their discriminative confidence while preserving high visual fidelity, and that it generalizes well across model architectures and forgery types. The findings reveal potential vulnerabilities of deepfake detectors at the level of frequency-domain statistics and offer a new perspective for evaluating the robustness of black-box detection systems in real-world scenarios.

Key words: deepfake detection, black-box attack, frequency distribution alignment
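
The pipeline summarized in the abstract (spectral comparison against a real reference, a difference-aware mask, an adaptive fusion strength, and local overlapping blocks) can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration and should not be read as the authors' SpectralFusion implementation: the log-amplitude gap, the threshold `tau`, the strength cap `max_alpha`, the exponential strength curve, the Hann-weighted overlap-add, and the function names `align_block` and `spectral_align` are all hypothetical. Inputs are assumed to be aligned single-channel images of equal size.

```python
import numpy as np


def align_block(fake_blk, real_blk, tau=0.5, max_alpha=0.8):
    """Pull anomalous frequency bins of one fake block toward its real reference.

    tau (mask threshold) and max_alpha (strength cap) are hypothetical knobs.
    """
    spec_fake = np.fft.fft2(fake_blk)
    amp_fake = np.abs(spec_fake)
    amp_real = np.abs(np.fft.fft2(real_blk))

    # Per-bin gap between the log-amplitude spectra of fake and real.
    gap = np.abs(np.log1p(amp_fake) - np.log1p(amp_real))

    # Assumed difference-aware mask: only bins whose gap exceeds tau are touched.
    mask = gap > tau

    # Assumed adaptive strength: grows with the gap, saturating at max_alpha.
    alpha = np.where(mask, max_alpha * (1.0 - np.exp(-gap)), 0.0)

    # Blend amplitudes on masked bins; keep the fake phase so content is preserved.
    amp_new = (1.0 - alpha) * amp_fake + alpha * amp_real
    aligned = np.fft.ifft2(amp_new * np.exp(1j * np.angle(spec_fake)))
    return aligned.real


def _starts(length, block, stride):
    """Block start offsets that cover [0, length), including the trailing edge."""
    starts = list(range(0, length - block + 1, stride))
    if starts[-1] != length - block:
        starts.append(length - block)
    return starts


def spectral_align(fake, real, block=32, stride=16, **kwargs):
    """Apply align_block over overlapping blocks and blend them by overlap-add."""
    fake = np.asarray(fake, dtype=np.float64)
    real = np.asarray(real, dtype=np.float64)
    if fake.shape != real.shape or fake.ndim != 2:
        raise ValueError("expected two 2-D images of identical shape")
    if min(fake.shape) < block:
        raise ValueError("image is smaller than the block size")

    # Strictly positive Hann-shaped weights, so every pixel gets nonzero weight.
    w1d = np.hanning(block + 2)[1:-1]
    window = np.outer(w1d, w1d)

    out = np.zeros_like(fake)
    weight = np.zeros_like(fake)
    for y in _starts(fake.shape[0], block, stride):
        for x in _starts(fake.shape[1], block, stride):
            sl = (slice(y, y + block), slice(x, x + block))
            out[sl] += window * align_block(fake[sl], real[sl], **kwargs)
            weight[sl] += window
    return out / weight
```

In this sketch, bins whose spectra already agree with the reference are left untouched, so `spectral_align(img, img)` returns `img` up to floating-point rounding. Keeping the phase of the fake image and editing only the amplitude is one plausible way to limit visible change; whether the paper makes the same choice is not stated in the abstract.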

CLC number: