Netinfo Security, 2017, Vol. 17, Issue (6): 30-34. doi: 10.3969/j.issn.1671-1122.2017.06.005

• Technical Research •

Research on Network Link Prediction Based on Data Mining

XU Yan (1, 2)

  1. School of Information Science, Beijing Language and Culture University, Beijing 100083, China;
    2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2017-05-10 Published: 2017-06-20
  • Corresponding author: XU Yan, xuy@blcu.edu.cn
  • About the author: XU Yan (1968-), female, from Hunan; associate professor, Ph.D.; main research interests: information retrieval, information security, and data mining.
  • Funding:
    National Natural Science Foundation of China [60873166]; Beijing Natural Science Foundation [4122076]

Abstract: In recent years, social networks have grown rapidly in popularity, and data mining on social networks has developed alongside them. Link prediction (LP), an important topic in network data mining, uses known information such as the network structure to estimate how likely it is that a link exists between two nodes that are not yet linked. In social networks, link prediction can be used to recommend friends, filter redundant information, improve user satisfaction and loyalty, and build a healthy social networking environment. Existing link prediction algorithms focus on either network structure information or node attributes, analyzing the global or local properties of the network. Considering the nature of microblog social networks, this paper proposes a link prediction method that combines multiple features: node features, topological features, social features, and voting features. Based on these features, four machine learning algorithms (SVM, naive Bayes, random forest, and logistic regression) are trained on microblog social network data to predict potential social links. The results show that the proposed combined features perform better than traditional features, and that fusing multiple features improves the final prediction accuracy.

Key words: link prediction, data mining, feature extraction, friend recommendation, machine learning algorithm
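The abstract frames link prediction as supervised classification over node pairs, with features fed to standard classifiers. The minimal sketch below illustrates that pipeline under stated assumptions: the "topological features" are taken to be four common local scores (common neighbours, Jaccard, Adamic-Adar, preferential attachment), which is an assumption since the abstract does not define them; the graph `EDGES` is a hypothetical toy, not the paper's microblog data; and only logistic regression (one of the four named classifiers) is shown, hand-written in the standard library so the block is self-contained. The paper's node, social, and voting features are not modelled here; they would be extra columns in each feature row. All function names are illustrative.

```python
import math
from itertools import combinations

# Hypothetical toy follower graph (undirected); NOT the paper's microblog data.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (4, 6), (5, 6), (2, 4)]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def topo_features(adj, u, v):
    """Assumed topological features for the pair (u, v): common neighbours,
    Jaccard, Adamic-Adar, and preferential attachment."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common, union = nu & nv, nu | nv
    jaccard = len(common) / len(union) if union else 0.0
    adamic_adar = sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)
    return [float(len(common)), jaccard, adamic_adar, float(len(nu) * len(nv))]


def make_dataset(edges):
    """Label every node pair 1 if linked, else 0. A linked pair's edge is
    hidden while its features are computed, so the label cannot leak."""
    adj = build_adjacency(edges)
    linked = {frozenset(e) for e in edges}
    pairs, X, y = [], [], []
    for u, v in combinations(sorted(adj), 2):
        label = 1 if frozenset((u, v)) in linked else 0
        if label:
            adj[u].discard(v)
            adj[v].discard(u)
        X.append(topo_features(adj, u, v))
        if label:  # restore the hidden edge
            adj[u].add(v)
            adj[v].add(u)
        pairs.append((u, v))
        y.append(label)
    return pairs, X, y


def standardize(X):
    """Z-score each column; also return (means, stds) for scoring new pairs."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that the pair with features x is linked."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by batch gradient descent on mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


if __name__ == "__main__":
    pairs, X, y = make_dataset(EDGES)
    Xs, _, _ = standardize(X)
    w, b = train_logreg(Xs, y)
    # Rank currently unlinked pairs as friend-recommendation candidates.
    candidates = [(predict_proba(w, b, x), p)
                  for p, x, lab in zip(pairs, Xs, y) if lab == 0]
    for score, (u, v) in sorted(candidates, reverse=True)[:3]:
        print(f"recommend {u} <-> {v}: p = {score:.3f}")
```

This toy scores the same unlinked pairs it trained on as negatives, which is acceptable for illustration only. A real evaluation would train on an earlier snapshot (or with a held-out edge set) and measure accuracy on links that appear later. Swapping in the other three classifiers named in the abstract only changes `train_logreg`; in practice scikit-learn's `SVC`, `GaussianNB`, `RandomForestClassifier`, and `LogisticRegression` accept the same feature matrix.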
