信息网络安全 (Netinfo Security) ›› 2025, Vol. 25 ›› Issue (10): 1477-1492. DOI: 10.3969/j.issn.1671-1122.2025.10.001

• Review Article •

  • About the authors: HU Bin (1979—), male, from Hubei, professor-level senior engineer, Ph.D., whose research interests include large language model security, cybersecurity risk governance, and software supply chain security | HEI Yiming (1994—), male, from Shandong, engineer, Ph.D., whose research interests include application security, content security, and network security | WU Tiejun (1979—), male, from Hubei, senior engineer, Ph.D., whose research interests include cyberspace security, traffic behavior analysis, and malicious traffic detection | ZHENG Kaifa (1989—), male, from Hubei, senior engineer, Ph.D., whose research interests include cyberspace security, data security, and public opinion analysis | LIU Wenzhong (1984—), male, from Hubei, master's degree candidate, whose research interests include cybersecurity risk assessment, classified cybersecurity protection testing, and industrial automation security assessment
  • Supported by:
    National Key Research and Development Program of China (2022YFB3104900); Beijing Nova Program of Science and Technology (20250484975); Natural Science Foundation of Shandong Province (ZR2024MF084)

A Review of Safety Detection and Evaluation Technologies for Large Models

HU Bin1, HEI Yiming2, WU Tiejun3, ZHENG Kaifa4,5, LIU Wenzhong6

  1. College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
    2. Artificial Intelligence Research Institute, China Academy of Information and Communications Technology, Beijing 100083, China
    3. School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
    4. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
    5. Beijing Shenzhou NSFOCUS Technology Co., Ltd., Beijing 100089, China
    6. School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2025-06-15 Online: 2025-10-10 Published: 2025-11-07
  • Contact: ZHENG Kaifa E-mail: zhengkaifa@zju.edu.cn


Abstract:

With the rapid development of artificial intelligence technology, large language models (LLMs) have risen to prominence in many fields, such as scientific research, education, finance, and healthcare, owing to their powerful natural language processing capabilities. However, the widespread adoption of LLMs has been accompanied by a series of safety issues: risks of bias and discrimination, generation of harmful content, leakage of user privacy, dissemination of misleading information, and vulnerability to malicious adversarial attacks. These risks may harm users and even undermine social stability and the ethical order, making comprehensive safety detection and evaluation of LLMs necessary. This paper surveyed current research on LLM safety detection and evaluation, categorizing common types of safety risks and reviewing the mainstream detection and evaluation techniques that have been proposed. It also introduced the relevant evaluation methods, metrics, commonly used datasets, and tools, and summarized the key reference standards and specifications for large model safety evaluation issued in China and abroad. In addition, the paper discussed the technical concepts, principles, and implementation mechanisms of safety alignment, together with the evaluation framework for safety alignment techniques. Finally, by analyzing the problems facing current LLM safety detection and evaluation, it outlined future technological trends and research directions, aiming to provide a reference for related research and practice in academia and industry.

Key words: large language model (LLM), detection and evaluation, safety risks, evaluation standards, adversarial attacks
