Netinfo Security, 2021, Vol. 21, Issue (10): 1-7. doi: 10.3969/j.issn.1671-1122.2021.10.001

• Outstanding Paper •

Forged Voice Identification Method Based on Feature Fusion and Multi-channel GRU

PAN Xiaoqin, DU Yanhui

  1. College of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
  • Received: 2021-06-05 Online: 2021-10-10 Published: 2021-10-14
  • Corresponding author: DU Yanhui, E-mail: duyanhui@ppsuc.edu.cn
  • About the authors: PAN Xiaoqin (b. 1997), female, from Jiangsu, master's student; research interests: network security and artificial intelligence. DU Yanhui (b. 1969), male, from Beijing, professor, Ph.D.; research interests: network security and big data.
  • Funding:
    National Key R&D Program of China (2017YFB0802804); Major Project of the Fundamental Research Funds of People’s Public Security University of China (2020JKF101)

Abstract:

Existing forgery detection models generalize poorly and have relatively low detection accuracy. To address these problems, this paper proposes a multi-channel (three-channel) GRU model for forged voice identification based on hybrid feature fusion. The model uses multiple channels to extract multi-scale information from different input features. It then applies an attention mechanism to fuse these multi-scale features and make the classification decision. The method was evaluated on the ASVspoof2019 dataset. It achieves a detection accuracy of 96.30% on forged Logical Access samples and 87.33% on forged Physical Access samples, outperforming the other algorithms it was compared with. The experimental results show that fusing time-domain and frequency-domain features lets a forged voice detector learn more effective features for distinguishing genuine from forged speech, which yields higher detection accuracy.

Key words: speech forgery detection, multi-channel GRU, feature fusion, deep learning
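The abstract describes the architecture only at a high level. The general idea can be illustrated with a minimal NumPy forward pass. It rests on several assumptions of our own:

- Each channel is a single-layer GRU that reads one feature sequence.
- Each channel is summarized by its final hidden state.
- The channel summaries are fused with additive attention and passed to a sigmoid output.

The class names (`GRU`, `MultiChannelGRU`), the feature dimensions, and the attention form are hypothetical and are not the authors' configuration. The paper's multi-scale extraction and its actual input features are not reproduced. The weights are random and untrained, so the output probability carries no meaning.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRU:
    """Single-layer GRU, forward pass only (standard update/reset/candidate gates)."""

    def __init__(self, input_dim, hidden_dim, rng):
        s = 1.0 / np.sqrt(hidden_dim)
        # Per gate: input weights W, recurrent weights U, bias b.
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: rng.uniform(-s, s, (hidden_dim, input_dim)) for g in "zrn"}
        self.U = {g: rng.uniform(-s, s, (hidden_dim, hidden_dim)) for g in "zrn"}
        self.b = {g: np.zeros(hidden_dim) for g in "zrn"}
        self.hidden_dim = hidden_dim

    def __call__(self, xs):
        """xs: (T, input_dim) frame sequence -> final hidden state, shape (hidden_dim,)."""
        h = np.zeros(self.hidden_dim)
        for x in xs:
            z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])
            r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])
            n = np.tanh(self.W["n"] @ x + self.U["n"] @ (r * h) + self.b["n"])
            h = (1.0 - z) * n + z * h  # interpolate between candidate and previous state
        return h


class MultiChannelGRU:
    """Hypothetical sketch: one GRU channel per feature type, attention fusion, binary output."""

    def __init__(self, feature_dims, hidden_dim=32, attn_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.channels = [GRU(d, hidden_dim, rng) for d in feature_dims]
        self.W_a = rng.normal(0.0, 0.1, (attn_dim, hidden_dim))  # attention projection
        self.v_a = rng.normal(0.0, 0.1, attn_dim)                # attention scoring vector
        self.w_out = rng.normal(0.0, 0.1, hidden_dim)            # output layer weights
        self.b_out = 0.0

    def forward(self, feature_seqs):
        if len(feature_seqs) != len(self.channels):
            raise ValueError("need exactly one feature sequence per channel")
        # Encode each feature sequence with its own channel -> H has shape (C, hidden_dim).
        H = np.stack([gru(x) for gru, x in zip(self.channels, feature_seqs)])
        # Additive attention: one score per channel, normalized with a stable softmax.
        scores = np.tanh(H @ self.W_a.T) @ self.v_a
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()
        fused = alpha @ H  # attention-weighted sum of the channel vectors
        # Probability of the (assumed) "forged" class; meaningless until trained.
        p_forged = sigmoid(fused @ self.w_out + self.b_out)
        return p_forged, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dims = [20, 60, 257]  # placeholder feature sizes, not taken from the paper
    model = MultiChannelGRU(dims)
    feats = [rng.standard_normal((100, d)) for d in dims]
    p, alpha = model.forward(feats)
    print(f"p_forged={p:.3f}, channel weights={np.round(alpha, 3)}")
```

Each channel can receive a sequence with its own frame count and feature size, because the channels are fused only after each has been reduced to a fixed-length vector. In this sketch the attention weights `alpha` are computed across channels rather than across time. They indicate how much each feature type contributes to the fused representation.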
