Netinfo Security, 2026, Vol. 26, Issue (3): 420-431. doi: 10.3969/j.issn.1671-1122.2026.03.008
QIN Zhenkai (1,2), LUO Qining (1), NONG Xunyi (1), YU Xiaochuan (1,2), CAO Xiaochun (3)
Received: 2025-08-10
Online: 2026-03-10
Published: 2026-03-30
QIN Zhenkai, LUO Qining, NONG Xunyi, YU Xiaochuan, CAO Xiaochun. Multi-Level Speech Emotion Recognition Model Integrating Gender and Emotional Intensity Cue Features[J]. Netinfo Security, 2026, 26(3): 420-431.
URL: http://netinfo-security.org/EN/10.3969/j.issn.1671-1122.2026.03.008
References
[1] NEAL T M S, SLOBOGIN C, SAKS M J, et al. Psychological Assessments in Legal Contexts: Are Courts Keeping “Junk Science” Out of the Courtroom?[J]. Psychological Science in the Public Interest, 2019, 20(3): 135-164. doi: 10.1177/1529100619888860
[2] CHAO Yadong, WANG Huapeng, LIU En, et al. Polygraphing from Speech Stress through Layered Voice Analysis[J]. Forensic Science and Technology, 2020, 45(2): 155-159 (in Chinese). doi: 10.16467/j.1008-3650.2020.02.008
[3] KAPPEN M, VANHOLLEBEKE G, VAN D D J, et al. Acoustic and Prosodic Speech Features Reflect Physiological Stress but Not Isolated Negative Affect: A Multi-Paradigm Study on Psychosocial Stressors[EB/OL]. (2024-03-06)[2025-06-10]. https://doi.org/10.1038/s41598-024-55550-3.
[4] WANG Shanmin, LIU Chengguang, CHEN Shengyu, et al. A Survey of Multimodal Emotion Recognition from Facial Expressions, Audios, and Language[J]. Journal of Image and Graphics, 2025, 30(6): 2120-2138 (in Chinese). doi: 10.11834/jig.250168
[5] SCHEWSKI L, DOSS M M, BELDI G, et al. Measuring Negative Emotions and Stress through Acoustic Correlates in Speech: A Systematic Review[EB/OL]. (2025-07-24)[2025-07-30]. https://doi.org/10.1371/journal.pone.0328833.
[6] FUKUSHIMA K. Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position[J]. Biological Cybernetics, 1980, 36(4): 193-202. doi: 10.1007/BF00344251
[7] SCHUSTER M, PALIWAL K K. Bidirectional Recurrent Neural Networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681. doi: 10.1109/78.650093
[8] LIU Ying, YUAN Li, ZU Shuodi, et al. Emotion Recognition Based on Multimodal Physiological Data: A Survey[J]. Journal of University of Electronic Science and Technology of China, 2024, 53(5): 720-731 (in Chinese).
[9] LI Haifeng, CHEN Jing, MA Lin, et al. Dimensional Speech Emotion Recognition Review[J]. Journal of Software, 2020, 31(8): 2465-2491 (in Chinese).
[10] JORDAN E, TERRISSE R, LUCARINI V, et al. Speech Emotion Recognition in Mental Health: Systematic Review of Voice-Based Applications[EB/OL]. [2025-07-30]. https://pubmed.ncbi.nlm.nih.gov/41027025/.
[11] ZHANG Mengxing. On Application and Legitimacy Review of Demeanor Evidence in the Construction of Intelligent Judicature[J]. Journal of Henan University of Economics and Law, 2023, 38(2): 114-123 (in Chinese).
[12] SHI Pengcheng, WANG Hailong, LIU Lin. Emotion Recognition from Physiological Signals: A Review of Cross-Domain Transfer and Multimodal Fusion[EB/OL]. [2025-08-10]. http://fcst.ceaj.org/CN/10.3778/j.issn.1673-9418.2505043 (in Chinese).
[13] YANG Jie, LIANG Changwei, WU Xiyu, et al. A Study on Speech Emotion Evaluation Scale Based on Physiology and Acoustics Features[J]. Essays on Linguistics, 2023(4): 3-19 (in Chinese).
[14] ZHANG Yusha, JIANG Shengyi. Speech Emotion Data Mining Classification and Recognition Method Based on MFCC Feature Extraction and Improved SVM[J]. Computer Applications and Software, 2020, 37(8): 160-165 (in Chinese).
[15] KWAK I Y, KWAG S, LEE J, et al. ResMax: Detecting Voice Spoofing Attacks with Residual Network and Max Feature Map[C]// IEEE. 2020 25th International Conference on Pattern Recognition (ICPR). New York: IEEE, 2021: 4837-4844.
[16] JIANG Nan, PANG Yongheng, GAO Shuang. Speech Recognition Based on Attention Mechanism and Spectrogram Feature Extraction[J]. Journal of Jilin University (Science Edition), 2024, 62(2): 320-330 (in Chinese).
[17] CHEN Qiaohong, YU Zeyuan, SUN Qi, et al. Speech Emotion Recognition Based on Attention Mechanism and LSTM[J]. Journal of Zhejiang Sci-Tech University (Natural Sciences), 2020, 48(6): 815-822 (in Chinese).
[18] SANG D V, CUONG L T B. Improving CRNN with EfficientNet-Like Feature Extractor and Multi-Head Attention for Text Recognition[C]// ACM. The Tenth International Symposium on Information and Communication Technology-SoICT 2019. New York: ACM, 2019: 285-290.
[19] HAN Yongming, ZHANG Mingxing, GENG Zhiqiang. Heart Rate Variability Features for Emotion Dimensional Prediction by Using a Principal Component Analysis-Support Vector Regression (PCA-SVR) Model[J]. Journal of Beijing University of Chemical Technology (Natural Science Edition), 2021, 48(5): 102-110 (in Chinese). doi: 10.13543/j.bhxbzr.2021.05.013
[20] XIAO Xi, XU Chen. Speech Feature Fusion Algorithm Based on Acoustic State Likelihood and Supervised State Modelling[J]. Journal of Tsinghua University (Science and Technology), 2019, 59(6): 476-481 (in Chinese). doi: 10.16511/j.cnki.qhdxxb.2019.21.011
[21] YUE Liya, HU Pei, ZHU Jiulong. Advanced Differential Evolution for Gender-Aware English Speech Emotion Recognition[EB/OL]. (2024-07-31)[2025-07-10]. https://pmc.ncbi.nlm.nih.gov/articles/PMC11291894/.
[22] JIA Junwei, JIANG Nan. Correlation Analysis of Multimodal Lie Features Based on Speech and Physiological Signals[J]. Electro-Optic Technology Application, 2020, 35(4): 26-30 (in Chinese).
[23] ABDUL Z K, AL-TALABANI A K. Mel Frequency Cepstral Coefficient and Its Applications: A Review[J]. IEEE Access, 2022, 10: 122136-122158. doi: 10.1109/ACCESS.2022.3223444
[24] MATEO C, TALAVERA J A. Bridging the Gap between the Short-Time Fourier Transform (STFT), Wavelets, the Constant-Q Transform and Multi-Resolution STFT[J]. Signal, Image and Video Processing, 2020, 14(8): 1535-1543. doi: 10.1007/s11760-020-01701-8
[25] KUMAR Y S, KUMAR R, KUMAR S. 2D-Discrete Cosine Transform Based Dynamically Controllable Image Compression Technique[C]// IEEE. 2020 IEEE 22nd Electronics Packaging Technology Conference (EPTC). New York: IEEE, 2020: 203-206.
[26] ZHAI Yuting, WANG Xin, BAI Lei. Dynamic Task Scheduling for Wireless Sensor Networks Based on an Improved Bat Algorithm[J]. Chinese Journal of Sensors and Actuators, 2024, 37(4): 704-708 (in Chinese).
[27] CHEN Xi, ITA L S, LEVINE M, et al. Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 199-214. doi: 10.1162/tacl_a_00311
[28] ZHAO Li, LIANG Ruiyu, XIE Yue, et al. Progress and Outlook of Lie Detection Technique in Speech[J]. Journal of Data Acquisition and Processing, 2017, 32(2): 246-257 (in Chinese).
[29] GENG Lili, NIU Baoning. Convolutional Neural Network Pruning Based on Channel Similarity Entropy[J]. Computer Engineering, 2024, 50(7): 133-143 (in Chinese). doi: 10.19678/j.issn.1000-3428.0068284
[30] WANG Di, XU Yong, LI Hongliang, et al. Kernel Normalization[J]. Computer Technology and Development, 2019, 29(12): 27-32 (in Chinese).
[31] SHATRAVIN V, SHASHEV D, SHIDLOVSKIY S. Implementation of the SoftMax Activation for Reconfigurable Neural Network Hardware Accelerators[EB/OL]. (2023-11-28)[2025-07-10]. https://www.mdpi.com/2076-3417/13/23/12784.
[32] MAO Anqi, MOHRI M, ZHONG Yutao. Cross-Entropy Loss Functions: Theoretical Analysis and Applications[C]// ACM. The 40th International Conference on Machine Learning (ICML’23). New York: ACM, 2023: 23803-23828.
[33] GUPTA M V, VAIKOLE S, OZA A D, et al. Audio-Visual Stress Classification Using Cascaded RNN-LSTM Networks[EB/OL]. (2022-09-27)[2025-07-10]. https://pmc.ncbi.nlm.nih.gov/articles/PMC9598122/.
[34] QIN Libo, LI Zhouyang, CHE Wanxiang, et al. Co-GAT: A Co-Interactive Graph Attention Network for Joint Dialog Act Recognition and Sentiment Classification[C]// AAAI. The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2021: 13709-13717.
[35] SUN Chao, ZHANG Min, WU Ruijuan, et al. A Convolutional Recurrent Neural Network with Attention Framework for Speech Separation in Monaural Recordings[EB/OL]. (2021-01-14)[2025-07-30]. https://pmc.ncbi.nlm.nih.gov/articles/PMC7809293/.
[36] HU Jie, SHEN Li, SUN Gang. Squeeze-and-Excitation Networks[C]// IEEE. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 7132-7141.
[37] VASWANI A, SHAZEER N, PARMAR N, et al. Attention Is All You Need[EB/OL]. (2023-08-02)[2025-07-30]. https://doi.org/10.48550/arXiv.1706.03762.