Netinfo Security (信息网络安全), 2016, Vol. 16, Issue (6): 68-73. doi: 10.3969/j.issn.1671-1122.2016.06.011

Construction and Mining of Interpersonal Relationship Networks Based on Communication Data

曲洋, 王永剑, 彭如香, 姜国庆

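  1. Key Laboratory of Information Network Security of Ministry of Public Security, The Third Research Institute of Ministry of Public Security, Shanghai 201400, China
  • Received: 2016-05-15 Online: 2016-06-20 Published: 2020-05-13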
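  • About the authors:

    Yang QU (曲洋, b. 1988), male, from Jilin, assistant researcher, master's degree; main research interest: network information security. Yongjian WANG (王永剑, b. 1981), male, from Shanxi, associate researcher, Ph.D.; main research interest: information security. Ruxiang PENG (彭如香, b. 1987), female, from Hunan, assistant researcher, master's degree; main research interests: network information security and data mining. Guoqing JIANG (姜国庆, b. 1989), male, from Shanghai, research trainee; main research interests: information network security and machine learning.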

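  • Supported by:
    National Key Basic Research Program of China [2014CB340406]; 2016 Special Fund for Basic Scientific Research Expenses [C163567]; Guangzhou Science and Technology Program [2014Y2-00022]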

Construction and Data Mining of Social Network Based on Communication Log

Yang QU, Yongjian WANG, Ruxiang PENG, Guoqing JIANG

  1. Key Laboratory of Information Network Security of Ministry of Public Security, The Third Research Institute of Ministry of Public Security, Shanghai 201204, China
  • Received: 2016-05-15 Online: 2016-06-20 Published: 2020-05-13

Abstract (translated from the Chinese):

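Network communication has become one of the most representative products of the information age, and the social relationships between users are becoming ever clearer and more important. Using simulated communication data, this paper applies Chinese word segmentation, natural language processing and related techniques to build a contact (friend) network that reflects interpersonal relationships, and designs a multi-class classification algorithm suited to predicting interpersonal relationships in that network. The algorithm first applies hierarchical clustering to the raw data and, combined with manual intervention, determines the final number of classes, which effectively avoids the excessive number of categories caused by the polysemy of contact-group labels. It then designs classification features based on information such as the records of communication between contacts. Finally, it trains a support vector machine (SVM), which can model complex decision boundaries even with small samples, to obtain a classification model for interpersonal relationship prediction, and uses that model to predict unknown relationships.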

Keywords: communication log, interpersonal network, user recognition, relationship prediction, SVM
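
The class-count step in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the group labels are invented, whitespace splitting stands in for a real Chinese word segmenter, and token-set Jaccard distance with average linkage is an assumed choice. The `n_clusters` argument plays the role of the class count that an analyst would confirm manually.

```python
from itertools import combinations


def tokens(label):
    # Whitespace splitting stands in for a real Chinese word segmenter.
    # Labels are assumed to be non-empty.
    return set(label.lower().split())


def distance(a, b):
    # Jaccard distance between the token sets of two group labels.
    ta, tb = tokens(a), tokens(b)
    return 1.0 - len(ta & tb) / len(ta | tb)


def linkage(c1, c2):
    # Average linkage: mean pairwise distance between two clusters.
    return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(labels, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[label] for label in labels]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical contact-group labels as users might have typed them.
group_labels = ["colleague", "work colleague", "classmate",
                "college classmate", "family", "my family"]

# n_clusters=3 represents the manually confirmed number of classes.
for cluster in agglomerate(group_labels, 3):
    print(cluster)
```

Differently worded labels that share a token end up in the same class, so the number of relationship categories stays small even when users name their groups inconsistently.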

Abstract:

Communication on the Internet has become one of the most representative products of the information age, and the social relationships between users are becoming clearer and more important. In this paper, we use simulated communication logs together with Chinese word segmentation and natural language processing (NLP) technologies to build a social network that reflects interpersonal contacts, and we design a multi-class classification algorithm for predicting interpersonal relationships in that network. The algorithm first applies hierarchical clustering to the raw data and, combined with manual intervention, determines the final number of classes, thus effectively avoiding the large number of class labels that polysemous group names would otherwise generate. It then designs classification features based on information such as communication records. Finally, we use a Support Vector Machine (SVM), which performs well on small samples and can model complex decision boundaries, to train a relationship prediction model that is then used to predict unknown relationships.

Key words: communication log, social network, user recognition, relationship prediction, SVM
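
The final classification step can be sketched as follows. This is an illustration under stated assumptions, not the model from the paper: the three features (share of contact during working hours, in the evening, and at weekends), the toy samples and the class names are all hypothetical, and a linear SVM trained by subgradient descent on the hinge loss in a one-vs-rest scheme is used only to keep the example dependency-free. The abstract's emphasis on complex decision boundaries suggests that a kernel SVM would be the more likely choice in practice.

```python
def score(w, b, x):
    # Linear decision value w.x + b.
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_binary_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM: subgradient descent on hinge loss + L2 penalty."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * score(w, b, xi)
            # Shrink w: gradient step on the (lam / 2) * ||w||^2 term.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def train_one_vs_rest(X, labels):
    # One binary SVM per class: that class (+1) against all others (-1).
    return {c: train_binary_svm(X, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}


def predict(models, x):
    # Pick the class whose binary SVM gives the largest decision value.
    return max(models, key=lambda c: score(models[c][0], models[c][1], x))


# Hypothetical features per contact pair:
# [working-hours share, evening share, weekend share] of all exchanges.
X = [
    [0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.85, 0.05, 0.10],
    [0.05, 0.90, 0.05], [0.10, 0.80, 0.10], [0.10, 0.85, 0.05],
    [0.05, 0.05, 0.90], [0.10, 0.10, 0.80], [0.05, 0.10, 0.85],
]
labels = ["colleague"] * 3 + ["family"] * 3 + ["classmate"] * 3

models = train_one_vs_rest(X, labels)
print(predict(models, [0.85, 0.10, 0.05]))
```

In this sketch the class labels would come from the clustering step, and the feature vectors from whatever statistics of the communication records prove discriminative for the relationship types of interest.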

CLC number: