Netinfo Security (信息网络安全), 2025, Vol. 25, Issue (7): 1163-1171. doi: 10.3969/j.issn.1671-1122.2025.07.014

• Technical Research •

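Research on Semantic Intelligent Recognition Algorithms for Meteorological Data Based on Large Language Models

FENG Wei, XIAO Wenming, TIAN Zheng, LIANG Zhongjun, JIANG Bin

  1. National Meteorological Information Centre, Beijing 100081, China
  • Received: 2025-03-03 Online: 2025-07-10 Published: 2025-08-07
  • Contact: XIAO Wenming E-mail: xiaowm@cma.gov.cn
  • About the authors:
    FENG Wei (酆薇, born 1970), female, from Hunan, senior engineer, master's degree. Main research interests: network security, data security, and artificial intelligence.
    XIAO Wenming (肖文名, born 1967), male, from Jiangxi, professor-level senior engineer, master's degree. Main research interests: artificial intelligence, meteorological big data, meteorological digital infrastructure, and meteorological information security.
    TIAN Zheng (田征, born 1984), female, from Beijing, senior engineer, master's degree. Main research interests: network security, security operations, and artificial intelligence.
    LIANG Zhongjun (梁中军, born 1983), male, from Xinjiang, professor-level senior engineer, Ph.D. Main research interests: meteorological big data and data security.
    JIANG Bin (姜滨, born 1971), male, from Beijing, senior engineer, bachelor's degree. Main research interests: communication security and meteorological big data.
  • Supported by:
    Innovation and Development Project of the China Meteorological Administration (CXFZ2025J080)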

Abstract:

Meteorological data, a typical form of spatiotemporal big data, empowers economic and social development but also faces severe data security challenges. To address insufficient semantic understanding, low accuracy in data feature recognition, and poor generalization in current meteorological data security monitoring, this paper proposes a semantic intelligent recognition algorithm for meteorological data based on large language models. The algorithm constructs high-quality training datasets and a domain knowledge base, integrates Retrieval-Augmented Generation (RAG) with the lightweight Low-Rank Adaptation (LoRA) technique, applies Chain-of-Thought (CoT) fine-tuning, and adopts Proximal Policy Optimization (PPO) as the reinforcement learning algorithm to continuously optimize the recognition performance of the large meteorological data recognition model. Experimental results show that the proposed algorithm effectively improves the accuracy of meteorological data feature recognition.

Key words: large language models, data security, semantic intelligent recognition, RAG, CoT
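
The abstract names RAG and CoT but gives no implementation detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tiny in-memory knowledge base and a bag-of-words cosine similarity as the retriever. The entries in `KNOWLEDGE_BASE`, the function names, and the prompt wording are all hypothetical, and the LoRA fine-tuning and PPO stages are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base entries for illustration only (not from the paper).
KNOWLEDGE_BASE = [
    "surface observation record: station id, latitude, longitude, air temperature, pressure",
    "radar product: reflectivity, radial velocity, scan elevation angle",
    "user account table: user name, password hash, login time",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Return the k knowledge-base entries most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(sample, docs, k=1):
    """Assemble a retrieval-augmented prompt that asks for step-by-step reasoning."""
    context = "\n".join(retrieve(sample, docs, k))
    return (
        "Reference knowledge:\n" + context + "\n\n"
        "Data sample:\n" + sample + "\n\n"
        "Think step by step, then state which kind of meteorological data this is."
    )


if __name__ == "__main__":
    print(build_prompt("station 54511 temperature 23.5 pressure 1002.1", KNOWLEDGE_BASE))
```

In a real system the retriever would more likely use dense embeddings over the domain knowledge base, and the assembled prompt would be passed to the fine-tuned model; the word-overlap retriever here only keeps the sketch self-contained.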

CLC Number: