Netinfo Security ›› 2021, Vol. 21 ›› Issue (10): 1-7. doi: 10.3969/j.issn.1671-1122.2021.10.001
Received: 2021-06-05
Online: 2021-10-10
Published: 2021-10-14
Corresponding author: DU Yanhui, E-mail: duyanhui@ppsuc.edu.cn
About the authors: PAN Xiaoqin (b. 1997), female, from Jiangsu, master's student; main research interests: network security and artificial intelligence. DU Yanhui (b. 1969), male, from Beijing, professor, Ph.D.; main research interests: network security and big data.
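Abstract:
To address the weak generalization and low detection accuracy of existing forgery-detection models, this paper proposes a multi-channel GRU model for forged speech identification based on hybrid feature fusion. The model uses multiple channels to mine multi-scale information from different input features, and introduces an attention mechanism to fuse the multi-scale features and make the classification decision. Validated on the ASVspoof2019 dataset, the proposed method achieves a detection accuracy of 96.30% on Logical Access forged samples and 87.33% on Physical Access samples, outperforming the other algorithms compared. The experimental results show that a forged speech detection method that fuses time-domain and frequency-domain features can learn more effective features for distinguishing genuine from forged speech and obtain higher detection accuracy.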
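The abstract does not give the details of the attention-based fusion step, so the following is only a minimal sketch of one common way to do it, assuming dot-product attention over one feature vector per channel. The names `attention_fuse` and `query`, and the toy vectors, are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(channel_features, query):
    """Fuse per-channel feature vectors with dot-product attention.

    channel_features: list of equal-length vectors, one per channel
                      (e.g. the final hidden state of each GRU channel).
    query: hypothetical learned scoring vector of the same length.
    Returns (fused_vector, attention_weights).
    """
    # One relevance score per channel: dot product with the query vector.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in channel_features]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the channel vectors, computed dimension by dimension.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, channel_features))
        for i in range(dim)
    ]
    return fused, weights


# Toy input: three channels, two-dimensional features (illustrative values only).
channels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.5, 0.5]
fused, weights = attention_fuse(channels, query)
```

In this toy input the third channel has the largest dot product with `query`, so it receives the largest weight. In a trained model the query (or a small scoring network) would be learned jointly with the GRU channels, and the fused vector would feed the final classifier.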
PAN Xiaoqin, DU Yanhui. Forged Voice Identification Method Based on Feature Fusion and Multi-channel GRU[J]. Netinfo Security, 2021, 21(10): 1-7.
Table 3
Detection performance on Logical Access forged speech (unit: %)
Model | ACC | EER | AUC | F1-score | Precision | Recall |
---|---|---|---|---|---|---|
RNN | 88.87 | 11.98 | 84.64 | 89.84 | 83.64 | 97.04 |
LSTM | 89.27 | 11.91 | 91.09 | 89.65 | 87.73 | 91.66 |
GRU | 88.93 | 7.17 | 97.65 | 89.92 | 83.58 | 97.31 |
Multi-channel RNN | 92.70 | 7.10 | 98.13 | 92.77 | 93.29 | 92.25 |
Multi-channel LSTM | 94.27 | 5.82 | 93.20 | 94.37 | 94.00 | 94.74 |
Multi-channel GRU | 96.30 | 0.34 | 99.99 | 96.48 | 93.20 | 100.00 |
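As a consistency check on the table above, the F1-score column can be recomputed from the Precision and Recall columns using the standard harmonic-mean definition. The sketch below assumes that definition is the one used to produce the table; the tolerance of 0.02 percentage points is an arbitrary allowance for rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the Logical Access table, in percent.
rows = {
    "RNN": (83.64, 97.04, 89.84),
    "LSTM": (87.73, 91.66, 89.65),
    "GRU": (83.58, 97.31, 89.92),
    "Multi-channel RNN": (93.29, 92.25, 92.77),
    "Multi-channel LSTM": (94.00, 94.74, 94.37),
    "Multi-channel GRU": (93.20, 100.00, 96.48),
}

# Absolute difference between the recomputed and the reported F1 for each model.
deviations = {
    model: abs(f1_score(p, r) - reported)
    for model, (p, r, reported) in rows.items()
}
all_consistent = all(d < 0.02 for d in deviations.values())
```

Every row agrees with the reported F1-score to within rounding, so the three columns are internally consistent.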
Table 4
Detection performance on Physical Access forged speech (unit: %)
Model | ACC | EER | AUC | F1-score | Precision | Recall |
---|---|---|---|---|---|---|
RNN | 79.87 | 21.44 | 96.45 | 78.84 | 83.83 | 74.40 |
LSTM | 82.43 | 19.35 | 92.99 | 81.52 | 86.78 | 76.85 |
GRU | 81.70 | 18.41 | 94.11 | 80.98 | 85.02 | 77.31 |
Multi-channel RNN | 83.77 | 15.26 | 88.75 | 82.13 | 92.25 | 74.01 |
Multi-channel LSTM | 83.99 | 14.07 | 93.22 | 84.95 | 80.88 | 89.46 |
Multi-channel GRU | 87.33 | 7.92 | 97.18 | 88.48 | 82.11 | 95.93 |
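The EER column reports the equal error rate, the operating point at which the false acceptance rate equals the false rejection rate. The page does not say how the authors computed it, so the following is only a rough sketch that sweeps thresholds over the observed scores. The function name, the convention that a higher score means "more likely genuine", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumes a higher score means "more likely genuine". A trial is accepted
    when its score is at or above the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (illustrative only, not from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, spoof)
```

For these toy scores, a threshold of 0.6 rejects one of four genuine trials and accepts one of four spoofed trials, so both error rates are 0.25. Production evaluations usually interpolate between thresholds rather than picking the closest observed one.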