Netinfo Security (信息网络安全) ›› 2024, Vol. 24 ›› Issue (11): 1763-1772. doi: 10.3969/j.issn.1671-1122.2024.11.015

• Selected Papers •

融合实例和标记相关性增强消歧的偏多标记学习算法

高光亮, 梁广俊, 洪磊, 高谷刚, 王群

  1. 江苏警官学院计算机信息与网络安全系,南京 210031
  • 收稿日期:2024-08-06 出版日期:2024-11-10 发布日期:2024-11-21
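  • Corresponding author: GAO Guangliang, guangliang.gao@njust.edu.cn
  • About the authors: GAO Guangliang (b. 1989), male, from Shandong, lecturer, Ph.D., CCF member; main research interests: social network security and complex network analysis | LIANG Guangjun (b. 1982), male, from Anhui, associate professor, Ph.D., CCF member; main research interests: cyberspace security and data modeling | HONG Lei (b. 1988), male, from Jiangsu, associate professor, Ph.D.; main research interest: data mining | GAO Gugang (b. 1975), male, from Jiangsu, senior experimentalist, Ph.D.; main research interests: smart policing and artificial intelligence | WANG Qun (b. 1971), male, from Gansu, professor, Ph.D., CCF distinguished member; main research interest: cyberspace security
  • Supported by:
    National Natural Science Foundation of China (72401110); General Program of Natural Science Research of Jiangsu Higher Education Institutions (23KJB520009)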

Disambiguation-Based Partial Multi-Label Learning Algorithm Augmented by Fusing Instance and Label Correlations

GAO Guangliang, LIANG Guangjun, HONG Lei, GAO Gugang, WANG Qun

  1. Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing 210031, China
  • Received: 2024-08-06 Online: 2024-11-10 Published: 2024-11-21

摘要:

实例的候选标记集合包含真实标记和噪声标记。基于消歧的偏多标记学习旨在消除噪声标记,识别并预测与实例真正相关的标记。传统的消歧策略通常仅考虑标记间的相关性,忽略了实例间的相关性。为此,文章提出一种融合实例和标记相关性增强消歧的偏多标记学习算法,进而提升基于消歧的偏多标记学习性能。首先,依据真实标记矩阵的低秩性和噪声标记的稀疏性构建基础模型;然后,定义核函数以捕捉实例间的线性和非线性相关性,从而进一步消除噪声标记;最后,通过从特征空间到标记空间的线性映射,实现相关标记的预测。在合成和真实偏多标记数据集上的实验结果表明,与8种对比算法相比,文章所提算法在统计学上具有显著差异并且表现更好。

关键词: 偏多标记学习, 实例相关性, 标记相关性, 噪声标记消除

Abstract:

In partial multi-label learning, each instance is associated with a set of candidate labels that contains both ground-truth and noisy labels. Disambiguation-based partial multi-label learning aims to eliminate the noisy labels, thereby identifying and predicting the labels that are truly relevant to each instance. Traditional disambiguation strategies usually consider only the correlations between labels and ignore the correlations between instances. To address this, a disambiguation-based partial multi-label learning algorithm augmented by fusing instance and label correlations is proposed, improving the performance of disambiguation-based partial multi-label learning. First, a basic model is constructed based on the low-rank nature of the ground-truth label matrix and the sparsity of the noisy labels. Second, the kernel trick is used to map the feature vectors of the instances into a high-dimensional space so as to capture both linear and nonlinear correlations between instances, which in turn helps to eliminate noisy labels further. Finally, the relevant labels of each instance are predicted by a linear mapping from the feature space to the label space. Experimental results on synthetic and real-world partial multi-label datasets show that, compared with 8 baseline algorithms, the proposed algorithm performs better and that the differences are statistically significant.

Key words: partial multi-label learning, instance correlation, label correlation, noisy label elimination
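
The three steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the abstract does not give the actual objective function. The sketch assumes a simple alternating scheme that splits a candidate label matrix `Y` into a low-rank part `L` and a sparse part `S`, an RBF kernel as the instance-similarity measure, row-normalized kernel smoothing as one possible way to let instance correlations refine `L`, and ridge regression as the linear mapping. All function names, parameter values, and the random data are hypothetical.

```python
import numpy as np


def soft_threshold(A, tau):
    """Element-wise shrinkage: the proximal operator of the L1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def singular_value_threshold(A, tau):
    """Shrink singular values: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def low_rank_sparse_split(Y, lam_rank=1.0, lam_sparse=0.3, n_iter=50):
    """Alternately minimize
    0.5*||Y - L - S||_F^2 + lam_rank*||L||_* + lam_sparse*||S||_1
    over L (low-rank part) and S (sparse noise part)."""
    L = np.zeros_like(Y, dtype=float)
    S = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        L = singular_value_threshold(Y - S, lam_rank)
        S = soft_threshold(Y - L, lam_sparse)
    return L, S


def rbf_kernel(X, gamma=0.5):
    """Pairwise similarity K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def fit_linear_map(X, T, alpha=1.0):
    """Ridge regression: W = (X^T X + alpha*I)^{-1} X^T T."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ T)


# Hypothetical toy data: 20 instances, 5 features, 4 candidate labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Y = (rng.random(size=(20, 4)) < 0.5).astype(float)

# Step 1: separate a low-rank label estimate from sparse label noise.
L, S = low_rank_sparse_split(Y)

# Step 2 (assumption): smooth label confidences across similar instances.
K = rbf_kernel(X)
P = K / K.sum(axis=1, keepdims=True)  # each row sums to 1
L_smooth = P @ L

# Step 3: learn a linear map from features to label scores.
W = fit_linear_map(X, L_smooth)
scores = X @ W
```

In this sketch, larger values of `lam_rank` push `L` toward lower rank and larger values of `lam_sparse` leave fewer nonzero entries in `S`; a real implementation would tune both and would couple the kernel term with the decomposition in a single objective rather than applying it as a separate smoothing pass.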

CLC Number: