Netinfo Security (信息网络安全) ›› 2023, Vol. 23 ›› Issue (3): 96-102. doi: 10.3969/j.issn.1671-1122.2023.03.010

• Theoretical Research •

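Educational Data Classification Method Based on Deep Learning

TAN Liuyan1,2, RUAN Shuhua1,2, YANG Min1,2, CHEN Xingshu1,2

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  2. Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
  • Received: 2022-10-19 Online: 2023-03-10 Published: 2023-03-14
  • Corresponding author: RUAN Shuhua E-mail: ruanshuhua@scu.edu.cn
  • About the authors: TAN Liuyan (b. 1998), female, from Sichuan, master's student; main research interest: data classification and grading | RUAN Shuhua (b. 1966), female, from Zhejiang, associate professor, master's degree; main research interests: cloud computing and big data security, blockchain security | YANG Min (b. 1994), female, from Sichuan, doctoral student; main research interests: data security and data governance | CHEN Xingshu (b. 1968), female, from Guizhou, professor, Ph.D.; main research interests: trusted computing, cloud computing and big data security
  • Funding:
    National Natural Science Foundation of China (U19A2081); Sichuan University Engineering Characteristic Team Project (2020SCUNG129)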

Educational Data Classification Based on Deep Learning

TAN Liuyan1,2, RUAN Shuhua1,2, YANG Min1,2, CHEN Xingshu1,2

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  2. Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
  • Received: 2022-10-19 Online: 2023-03-10 Published: 2023-03-14
  • Contact: RUAN Shuhua E-mail: ruanshuhua@scu.edu.cn

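Abstract:

The continuing development of big data technology and the frequent occurrence of data breaches have created an urgent need to protect data security in the education sector. The precise data on individuals' education and growth held by the sector is extremely valuable, so protecting educational data is imperative. To address this problem, the article proposes a deep-learning-based method for classifying educational data. First, three categories (personal data, organizational data, and business data) are defined according to the entity that manages the data. Second, a Bi-LSTM neural network model that combines character and word vectors is proposed to make educational data classification automated and intelligent. Finally, the proposed classification scheme is validated through experiments on datasets from two universities. The experiments show that, compared with the baseline models, the model trained with the proposed method on the experimental datasets reaches a classification accuracy of 95% and achieves the best result on every metric.

Key words: educational data, data classification, deep learning, character and word vectors, Bi-LSTM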

Abstract:

The continuous development of big data technology and the frequent occurrence of data leakage incidents have created an urgent need to protect data security in the education industry. Educational data contains precise information on personal education and growth, which is of great value, so protecting it has become urgent. To solve this problem, this paper proposes an educational data classification method based on deep learning. First, three categories (personal data, organizational data, and business data) are defined according to the entity that manages the data. Then, a Bi-LSTM neural network model based on combined character and word embeddings is proposed to automate educational data classification. Finally, the proposed classification method is validated through experiments on datasets from two universities. The experimental results show that the accuracy of the model reaches 95% and that it outperforms the baselines on all performance metrics.

Key words: educational data, data classification, deep learning, char-word mixture word representation, Bi-LSTM
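
The architecture named in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the abstract does not say how character and word vectors are combined, so the sketch assumes the word vector is concatenated with the mean of the word's character vectors. All weights are random and untrained, biases are omitted, and the class names, dimensions, and the pre-segmented example field name are hypothetical.

```python
import math
import random

# The three categories named in the abstract.
CLASSES = ["personal", "organizational", "business"]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class Embedding:
    """Assigns a random vector to each new symbol (stand-in for trained vectors)."""
    def __init__(self, dim, rng):
        self.dim, self.rng, self.table = dim, rng, {}

    def __call__(self, key):
        if key not in self.table:
            self.table[key] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return self.table[key]

def char_word_vector(word, word_emb, char_emb):
    """Assumed combination: word vector concatenated with the mean character vector."""
    chars = [char_emb(c) for c in word]
    mean = [sum(col) / len(chars) for col in zip(*chars)]
    return word_emb(word) + mean

class LSTMCell:
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix per gate: input, forget, output, candidate (no biases).
        self.w = [rand_matrix(hid_dim, in_dim + hid_dim, rng) for _ in range(4)]

    def run(self, inputs):
        """Process the sequence in the given order and return the last hidden state."""
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        for x in inputs:
            z = x + h  # concatenate input and previous hidden state
            i, f, o, g = (matvec(w, z) for w in self.w)
            c = [sigmoid(fk) * ck + sigmoid(ik) * math.tanh(gk)
                 for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [sigmoid(ok) * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

class BiLSTMClassifier:
    def __init__(self, word_dim=8, char_dim=4, hid_dim=6, seed=0):
        rng = random.Random(seed)
        self.word_emb = Embedding(word_dim, rng)
        self.char_emb = Embedding(char_dim, rng)
        self.fwd = LSTMCell(word_dim + char_dim, hid_dim, rng)
        self.bwd = LSTMCell(word_dim + char_dim, hid_dim, rng)
        self.out = rand_matrix(len(CLASSES), 2 * hid_dim, rng)

    def predict_proba(self, words):
        xs = [char_word_vector(w, self.word_emb, self.char_emb) for w in words]
        # Read the sequence left-to-right and right-to-left, then join both summaries.
        h = self.fwd.run(xs) + self.bwd.run(xs[::-1])
        logits = matvec(self.out, h)
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

model = BiLSTMClassifier()
# Hypothetical pre-segmented field name ("student" + "name").
probs = model.predict_proba(["学生", "姓名"])
```

Because the weights are untrained, `probs` carries no meaning beyond showing the data flow; a practical version would rely on a deep learning framework and labelled training data.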

CLC Number: