Netinfo Security ›› 2025, Vol. 25 ›› Issue (10): 1477-1492.doi: 10.3969/j.issn.1671-1122.2025.10.001


A Review of Safety Detection and Evaluation Technologies for Large Models

HU Bin1, HEI Yiming2, WU Tiejun3, ZHENG Kaifa4,5, LIU Wenzhong6

  1. College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
    2. Artificial Intelligence Research Institute, China Academy of Information and Communications Technology, Beijing 100083, China
    3. School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
    4. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
    5. Beijing Shenzhou NSFOCUS Technology Co., Ltd., Beijing 100089, China
    6. School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2025-06-15  Online: 2025-10-10  Published: 2025-11-07
  • Contact: ZHENG Kaifa  E-mail: zhengkaifa@zju.edu.cn

Abstract:

With the rapid development of artificial intelligence technology, large language models (LLMs) have been widely adopted in fields such as scientific research, education, finance, and healthcare owing to their powerful natural language processing capabilities. However, their widespread adoption brings a series of security issues: risks of bias and discrimination, potential generation of harmful content, threats of user privacy leakage, risks of spreading misleading information, and vulnerability to malicious adversarial attacks. These risks may harm users and even undermine social stability and ethical order, making comprehensive security testing and evaluation of LLMs essential. This article focuses on current research on LLM security assessment, categorizing common security risks and reviewing mainstream security evaluation techniques. It also introduces relevant assessment methods, metrics, commonly used datasets, and tools, and summarizes key security evaluation standards and guidelines developed worldwide. In addition, the paper discusses the technical concepts, principles, and implementation mechanisms of safety alignment, along with its evaluation framework. Finally, by analyzing the challenges facing current LLM security assessment, it outlines future technological trends and research directions, aiming to provide guidance for academic and industrial research and practice.

Key words: LLM, detection and evaluation, security risks, evaluation standards, adversarial attacks
