Netinfo Security, 2025, Vol. 25, Issue (2): 281-294. doi: 10.3969/j.issn.1671-1122.2025.02.009

• Theoretical Research •

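Research on Offensive Language Detection in Social Networks Based on Emotion-Assisted Multi-Task Learning

JIN Di1,2,3, REN Hao1,2,3, TANG Rui1,2,3, CHEN Xingshu1,2,3, WANG Haizhou1,2,3

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  2. Key Laboratory of Data Protection and Intelligent Management, Ministry of Education, Chengdu 610065, China
  3. China Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
  • Received: 2024-12-10 Online: 2025-02-10 Published: 2025-03-07
  • Corresponding author: WANG Haizhou E-mail: whzh.nc@scu.edu.cn
  • About the authors: JIN Di (b. 2001), female, from Henan, master's student; main research interest: online public opinion analysis | REN Hao (b. 1991), male, from Anhui, associate research fellow, Ph.D.; main research interests: data security and privacy protection, AI security and governance, applied cryptography | TANG Rui (b. 1990), male, from Sichuan, assistant research fellow, Ph.D.; main research interests: artificial intelligence security, social network analysis | CHEN Xingshu (b. 1968), female, from Guizhou, professor, Ph.D.; main research interests: cloud computing security, data security, threat detection, open-source intelligence, and artificial intelligence security | WANG Haizhou (b. 1986), male, from Sichuan, associate professor, Ph.D., CCF member; main research interests: online public opinion analysis, open-source intelligence analysis
  • Supported by:
    National Key Research and Development Program of China (2022YFC3303101); Key Research and Development Program of the Science and Technology Department of Sichuan Province (2023YFG0145)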

Abstract:

With the rapid development of Internet and mobile Internet technologies, more and more people use social networks to obtain information and express their positions and opinions. In recent years, however, social networks have been flooded with a growing volume of offensive language and other harmful speech, and online violence has proliferated. Most existing research on offensive language detection targets English, and comparatively few studies address Chinese. To address this gap, this paper first collects a large number of posts from the Sina Weibo platform and annotates them according to a set of labeling rules, constructing a Chinese offensive language dataset. It then extracts statistical features in three categories: sentiment features, content features, and propagation features. Finally, it builds an offensive language detection model based on multi-task learning that introduces sentiment analysis as an auxiliary task and exploits the high correlation between the two tasks to improve detection performance. Experimental results show that the proposed model outperforms other commonly used methods for offensive language detection. This work provides methods and ideas for subsequent research on offensive language detection in social networks.

Key words: offensive language, multi-task learning, social networks, deep learning
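
The abstract does not state how the main and auxiliary tasks are combined during training. A common choice in multi-task learning is to add a weighted auxiliary loss to the main loss, and the sketch below illustrates that idea only. The function names, the `aux_weight` parameter, and the label counts (2 offensive classes, 3 sentiment classes) are assumptions for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])


def joint_loss(offensive_logits, offensive_label,
               sentiment_logits, sentiment_label, aux_weight=0.5):
    """Hypothetical joint objective: main loss + aux_weight * auxiliary loss.

    offensive_* belongs to the main task (offensive language detection);
    sentiment_* belongs to the auxiliary task (sentiment analysis).
    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    main = cross_entropy(offensive_logits, offensive_label)
    aux = cross_entropy(sentiment_logits, sentiment_label)
    return main + aux_weight * aux


if __name__ == "__main__":
    # One post: scores for [not offensive, offensive] and
    # [negative, neutral, positive]; both heads would share one encoder.
    loss = joint_loss([0.2, 1.3], 1, [1.1, 0.1, -0.4], 0)
    print(f"joint loss: {loss:.4f}")
```

In a setup like this, both sets of logits would come from task-specific heads on top of a shared text encoder, so gradients from the sentiment task also shape the representation used for offensive language detection. Setting `aux_weight` to 0 recovers single-task training.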

CLC Number: