Netinfo Security (信息网络安全), 2017, Vol. 17, Issue (2): 51-58. doi: 10.3969/j.issn.1671-1122.2017.02.008


Research on Entity User Association Technology across Social Networks

Liang Luo1, Wenxian Wang1,2, Jie Zhong1, Haizhou Wang1

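  1. Network and Trusted Computing Institute, College of Computer, Sichuan University, Chengdu, Sichuan 610065, China
    2. Cybersecurity Research Institute, Sichuan University, Chengdu, Sichuan 610065, China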
  • Received: 2016-12-01  Publication date: 2017-02-20  Posted online: 2020-05-12
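  • About the authors:

    Liang Luo (born 1992), male, from Hubei, master's student; research interests: data mining and public opinion security. Wenxian Wang (born 1978), male, from Fujian, lecturer, Ph.D.; research interests: cyberspace security, public opinion analysis and mining. Jie Zhong (born 1989), male, from Sichuan, master's student; research interests: social networks and public opinion analysis. Haizhou Wang (born 1986), male, from Sichuan, lecturer, Ph.D.; research interests: cyberspace security, public opinion analysis and mining.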

  • Funding:
    National Key Technology R&D Program of China [2012BAH18B05]; National Natural Science Foundation of China [61272447]

Research on Users Associated Technology across Social Network

Liang LUO1, Wenxian WANG1,2, Jie ZHONG1, Haizhou WANG1

  1. Network and Trusted Computing Institute, College of Computer, Sichuan University, Chengdu, Sichuan 610065, China
    2. Cybersecurity Research Institute, Sichuan University, Chengdu, Sichuan 610065, China
  • Received:2016-12-01 Online:2017-02-20 Published:2020-05-12

Abstract:

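In recent years, with the large-scale adoption of social networks, these networks have come to play an increasingly important role in people's lives. They have enormous user bases, yet only a small fraction of users have completed real-name verification, which lets malicious users spread rumors and harmful information at will and poses a major challenge to Internet regulation. Associating entity users across social networks and building an identity-recognition information network therefore helps solve the problems of user identification and supervision. This paper designs and implements an information collection system for QZone and Sina Weibo, analyzes the profile and behavior data of 5.44 million Weibo users and 24.59 million QZone users collected from the web, and proposes an overall model for associating users across social networking sites. The model uses a logistic regression model to classify users; in addition, based on the principle of the SimRank algorithm, it introduces the SNC algorithm to remove noisy users and improve the model's precision. Finally, cross-social-network user association experiments are conducted on the dataset filtered in this paper. The experimental results show that the model can identify strongly associated user pairs, that pruning effectively improves its precision, and that it can effectively associate users across different social networks.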

Keywords: cross-social networks, user association, information collection, SNC algorithm, logistic regression model
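The abstract states only that the model classifies users with logistic regression; it does not list the features used. As a minimal sketch, assume each candidate pair (one Weibo account, one QZone account) is described by three hypothetical features: nickname similarity, a same-location flag and a same-birthday flag. The code fits a plain logistic regression by batch gradient descent with only the Python standard library; the feature names, toy data and hyperparameters are illustrative assumptions, not the authors' setup.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 1.0, epochs: int = 3000) -> List[float]:
    """Fit weights (bias first) by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that the account pair described by x belongs to one person."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical toy pairs: [nickname_similarity, same_location, same_birthday]
X = [
    [0.9, 1.0, 1.0],
    [0.8, 1.0, 0.0],
    [0.95, 0.0, 1.0],
    [0.2, 0.0, 0.0],
    [0.1, 1.0, 0.0],
    [0.3, 0.0, 1.0],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = same real-world user, 0 = different users

weights = train_logistic_regression(X, y)
# A probability above 0.5 is read as a predicted match.
print(predict_proba(weights, [0.9, 1.0, 1.0]))
```

A pair would then be treated as associated when its predicted probability exceeds a chosen threshold such as 0.5; at the data scale described in the abstract, a library implementation such as scikit-learn's `LogisticRegression` would normally replace the hand-written loop.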

Abstract:

With the massive popularity of social networks in recent years, social networks have come to play a very important role in people's daily lives. They have huge numbers of users, but few of them have completed real-name authentication, so malicious users can freely spread rumors and harmful information to the public, which challenges Internet regulation. Therefore, associating entity users across different social networks and establishing an identification network can help identify and supervise users. The main research work of this paper is as follows. First, we designed a system to collect user information from QZone and Weibo. Second, we analyzed the data collected from the Internet, which covers 5,440,000 Weibo users and 24,590,000 QZone users. Third, we proposed a model for associating users across social networks. The model uses logistic regression to classify users; in addition, the SNC algorithm, based on the principle of the SimRank algorithm, is proposed to eliminate noise and improve the accuracy of the model. Finally, we applied the model to the collected dataset. The experimental results show that the model can filter out strongly associated user pairs, that pruning improves its accuracy, and that it can associate users across different social networks.

Key words: cross-social networks, user association, information collection, SNC algorithm, logistic regression
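The abstract says the SNC algorithm follows the principle of SimRank but does not define SNC, so SNC itself is not reproduced here. For reference, below is a minimal sketch of plain SimRank, whose recursive rule is that two nodes are similar if their in-neighbors are similar: s(a, a) = 1, and for a ≠ b, s(a, b) = C / (|I(a)|·|I(b)|) · Σ_{i∈I(a)} Σ_{j∈I(b)} s(i, j), with s(a, b) = 0 when either node has no in-neighbors. It uses naive fixed-point iteration with decay factor C = 0.8; the follower graph and node names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def simrank(in_neighbors: Dict[str, List[str]], c: float = 0.8,
            iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Naive iterative SimRank over a graph given as node -> in-neighbor list.

    Every in-neighbor must itself appear as a key of `in_neighbors`.
    """
    nodes = list(in_neighbors)
    # Initial scores: 1 on the diagonal, 0 elsewhere.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim: Dict[Tuple[str, str], float] = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by c.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical follower graph: u1 and u2 are both followed by f1 and f2,
# u3 only by f3. Follower accounts have no in-neighbors in this toy graph.
graph = {
    "u1": ["f1", "f2"],
    "u2": ["f1", "f2"],
    "u3": ["f3"],
    "f1": [], "f2": [], "f3": [],
}
scores = simrank(graph)
print(round(scores[("u1", "u2")], 3))  # 0.4
print(scores[("u1", "u3")])            # 0.0
```

In the toy graph, u1 and u2 share both followers and score C·2/4 = 0.4, while u1 and u3 share none and score 0. The naive loop costs O(k·n²·d²) for k iterations, n nodes and average in-degree d, so graphs of the size reported in the abstract would need an optimized or approximate variant.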
