Research on Graph Neural Network Text Matching Model for Derivative Classification

doi:10.3969/j.issn.1671-1122.2026.04.008

Abstract

Derivative classification is a method that determines the secrecy level of a text according to the semantic similarity between texts, and it is generally abstracted as a text matching task. Because the texts to be classified are long, have sparse secret key-point features, and have complex semantic structure, traditional text matching methods struggle to accurately model and capture the secret key-point features that carry the semantics of confidential matters in the text. A graph neural network text matching model targeted at derivative classification is therefore proposed, which transforms text matching into a graph matching problem. First, a secret key-point feature extractor is designed to model the text as a matching graph that represents secret key-point features, addressing the weak representation of these features in the texts to be classified. Second, a hierarchical graph neural network is designed to perform multiple rounds of update and aggregation operations on the encoded matching graph, enhancing the extraction of similarity features between the texts to be classified. Finally, the classification result is predicted from the edges of the matching graph. Experimental results indicate that the model significantly improves performance on the dataset that simulates derivative classification: accuracy increases by more than 4.77% and the F1 value by more than 3.83%.

Key words: derivative classification, graph neural network, secret key-point feature extractor, long text matching, matching graph

CLC Number: TP309

YU Miao, GUO Songhui, SONG Shuaichao, YANG Yeming. Research on Graph Neural Network Text Matching Model for Derivative Classification[J]. Netinfo Security, 2026, 26(4): 605-614.
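
The abstract outlines a three-step pipeline: build a matching graph from secret key-point features, run several rounds of update and aggregation, and predict from the edges of the graph. The following is a minimal Python sketch of that general idea only, not the authors' architecture. The toy feature vectors, the cosine-weighted aggregation, the 50/50 mixing rule, the max-then-mean edge readout, and the names `cosine`, `aggregate`, and `match_score` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def aggregate(nodes, neighbours):
    """One update round: mix each node with a similarity-weighted mean of the other side."""
    updated = []
    for x in nodes:
        # Edge weights of the bipartite matching graph (negative similarities are clipped).
        weights = [max(cosine(x, y), 0.0) for y in neighbours]
        total = sum(weights)
        if total == 0.0:
            updated.append(list(x))  # no similar neighbour: keep the node unchanged
            continue
        message = [sum(w * y[i] for w, y in zip(weights, neighbours)) / total
                   for i in range(len(x))]
        updated.append([0.5 * a + 0.5 * m for a, m in zip(x, message)])
    return updated

def match_score(points_a, points_b, rounds=2):
    """Score a text pair from the edges of its matching graph (both lists non-empty)."""
    a = [list(p) for p in points_a]
    b = [list(p) for p in points_b]
    for _ in range(rounds):
        # Both sides are updated from the state of the previous round.
        a, b = aggregate(a, b), aggregate(b, a)
    # Edge-based readout: strongest edge per node of text A, averaged over A.
    return sum(max(cosine(x, y) for y in b) for x in a) / len(a)

# Hypothetical key-point feature vectors for three texts.
doc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
similar = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]
unrelated = [[0.0, 0.0, 1.0]]

print(match_score(doc, similar) > match_score(doc, unrelated))
```

In a trained model the node vectors would come from a learned extractor and the readout would be a learned classifier over edge features; the fixed rules here only make the data flow concrete.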


Dataset	Total samples	Positive samples	Negative samples	Training set	Validation set	Test set
CNSE	29063	12865	16198	17438	5813	5812
CNSS	33503	16887	16616	20102	6701	6700
SDC	2000	892	1198	1200	400	400
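
The split proportions implied by the dataset table can be derived with a few lines of arithmetic. The sketch below transcribes the rows by hand (the dictionary layout and the helper name `summarize` are assumptions for illustration) and reports the train/validation/test percentages along with two consistency checks. As reproduced here, the SDC positive and negative counts add up to 2090 rather than 2000; the code only flags this and does not guess which figure is off.

```python
# Rows transcribed from the dataset table:
# (total, positive, negative, training, validation, test)
DATASETS = {
    "CNSE": (29063, 12865, 16198, 17438, 5813, 5812),
    "CNSS": (33503, 16887, 16616, 20102, 6701, 6700),
    "SDC": (2000, 892, 1198, 1200, 400, 400),
}

def summarize(row):
    """Return split percentages and whether the counts are internally consistent."""
    total, pos, neg, train, val, test = row
    return {
        "split_percent": tuple(round(100 * n / total, 1) for n in (train, val, test)),
        "splits_sum_ok": train + val + test == total,
        "labels_sum_ok": pos + neg == total,
    }

for name, row in DATASETS.items():
    print(name, summarize(row))
```

All three datasets follow a roughly 60/20/20 split.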

Training parameter	Value
Learning rate	0.00001
Maximum text segment length (characters)	510
Batch size	16
Training epochs	100

Training parameter	Value
Maximum text input length (characters)	500
Stacked model layers	12
Output vector dimension	768

Training parameter	Value
Learning rate	0.0001
Number of neurons	96
Hidden layers	2

Model	CNSE Acc	CNSE F1	CNSS Acc	CNSS F1	SDC Acc	SDC F1
SimNet	71.05%	69.26%	70.78%	74.50%	73.90%	75.16%
C-DSSM	60.17%	48.57%	52.96%	56.75%	58.91%	50.74%
MatchPyramid	66.36%	54.01%	62.52%	62.58%	61.58%	59.06%
BERT	81.30%	79.20%	86.64%	87.08%	86.25%	88.03%
CIG	84.64%	82.75%	89.77%	90.07%	88.98%	91.74%
Match-Ignition	86.32%	84.55%	91.28%	91.39%	91.32%	92.01%
Proposed model	86.04%	84.66%	91.72%	89.18%	96.09%	95.84%
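
The improvement figures quoted in the abstract can be checked against the results table. The sketch below transcribes the table by hand and computes, for each dataset, the margin of the paper's own model (last row) over the best baseline value in each column. The key `"proposed"` and the helper name `margins_over_best_baseline` are assumptions chosen for illustration.

```python
PROPOSED = "proposed"

# (accuracy %, F1 %) per dataset, transcribed from the results table.
RESULTS = {
    "SimNet": {"CNSE": (71.05, 69.26), "CNSS": (70.78, 74.50), "SDC": (73.90, 75.16)},
    "C-DSSM": {"CNSE": (60.17, 48.57), "CNSS": (52.96, 56.75), "SDC": (58.91, 50.74)},
    "MatchPyramid": {"CNSE": (66.36, 54.01), "CNSS": (62.52, 62.58), "SDC": (61.58, 59.06)},
    "BERT": {"CNSE": (81.30, 79.20), "CNSS": (86.64, 87.08), "SDC": (86.25, 88.03)},
    "CIG": {"CNSE": (84.64, 82.75), "CNSS": (89.77, 90.07), "SDC": (88.98, 91.74)},
    "Match-Ignition": {"CNSE": (86.32, 84.55), "CNSS": (91.28, 91.39), "SDC": (91.32, 92.01)},
    PROPOSED: {"CNSE": (86.04, 84.66), "CNSS": (91.72, 89.18), "SDC": (96.09, 95.84)},
}

def margins_over_best_baseline(results, proposed=PROPOSED):
    """Percentage-point margin of the proposed model over the best baseline, per metric."""
    margins = {}
    for dataset, (acc, f1) in results[proposed].items():
        baselines = [scores[dataset] for model, scores in results.items()
                     if model != proposed]
        # The best accuracy and the best F1 may belong to different baselines.
        best_acc = max(a for a, _ in baselines)
        best_f1 = max(f for _, f in baselines)
        margins[dataset] = (round(acc - best_acc, 2), round(f1 - best_f1, 2))
    return margins

for dataset, (d_acc, d_f1) in margins_over_best_baseline(RESULTS).items():
    print(f"{dataset}: accuracy {d_acc:+.2f}, F1 {d_f1:+.2f}")
```

On SDC the margins are 4.77 and 3.83 percentage points, which are the figures reported in the abstract. On CNSE and CNSS the margins range from -2.21 to +0.44 points.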