Netinfo Security (信息网络安全), 2016, Vol. 16, Issue 1: 81-87. doi: 10.3969/j.issn.1671-1122.2016.01.015


Design of an Automatic Microblog Classification System

张士豪, 顾益军, 张俊豪

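  1. School of Cybersecurity, People's Public Security University of China, Beijing 102623
  • Received: 2015-11-16 Online: 2016-01-01 Published: 2020-05-13
  • About the authors:

    Zhang Shihao (b. 1992), male, from Shanxi, master's student, whose main research interests are network security and data mining; Gu Yijun (b. 1968), male, from Jiangsu, associate professor, Ph.D., whose main research interests are network security and data mining; Zhang Junhao (b. 1991), male, from Henan, master's student, whose main research interests are network security and data mining.

  • Funding:
    Key Research Program of the Ministry of Public Security [2011ZDYJGADX016]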

An Automatic Classification System for Microblogging

Shihao ZHANG, Yijun GU, Junhao ZHANG

  1. School of Cybersecurity, People's Public Security University of China, Beijing 102623, China
  • Received: 2015-11-16 Online: 2016-01-01 Published: 2020-05-13

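Abstract (translated from the Chinese original):

The article proposes a new approach to classifying popular microblog posts: the users who forward a popular post are clustered, and different kinds of popular posts are distinguished by the different aggregation states of those users. For user clustering it adopts X-means, an improved algorithm based on the K-means clustering algorithm, and further modifies X-means to suit the characteristics of microblog user data, taking attribute differences and user-node differences into account during clustering. In this modification, user attributes are weighted with a scheme based on the logarithmic function, which is intended to make the clustering results more sound and accurate. Users themselves are weighted by building a key-personnel database, which assigns weights to special user nodes and is updated dynamically with the HITS algorithm. After user clustering is complete, information on the important users obtained is entered into the key-personnel database by field, creating a feedback mechanism between the clustering process and the database. In addition, the experiment feeds the same data to the K-means and X-means algorithms before and after the improvement and evaluates the clustering results with the silhouette coefficient, showing that the improved X-means algorithm has an advantage in microblog user clustering.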

Keywords: microblog classification, user clustering, silhouette coefficient
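The abstract states that the key-personnel database is updated dynamically with the HITS algorithm, but it does not say how the underlying graph is built. The following is a minimal sketch of plain HITS power iteration, assuming a directed forwarding graph in which an edge (u, v) means "user u forwarded a post by user v". The function names `hits` and `_normalise`, the edge semantics, and the fixed iteration count are all illustrative assumptions, not the authors' implementation.

```python
import math


def _normalise(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {node: s / norm for node, s in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a directed graph.

    edges: iterable of (source, target) pairs. An edge (u, v) is ASSUMED
    to mean "u forwarded a post by v"; the abstract does not specify this.
    """
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of the nodes pointing at v.
        new_auth = {node: 0.0 for node in nodes}
        for u, v in edges:
            new_auth[v] += hub[u]
        # Hub of u: sum of the updated authority scores u points at.
        new_hub = {node: 0.0 for node in nodes}
        for u, v in edges:
            new_hub[u] += new_auth[v]
        auth = _normalise(new_auth)
        hub = _normalise(new_hub)
    return hub, auth


# Hypothetical forwarding graph: a and b both forwarded c; a also forwarded d.
hub, auth = hits([("a", "c"), ("b", "c"), ("a", "d")])
ranked = sorted(auth, key=auth.get, reverse=True)
```

Under these assumptions, users with high authority scores would be the candidates for entry into the database; how scores map to database updates is not described in the abstract.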

Abstract:

This paper proposes a new approach to classifying popular microblog posts: the users who forward a popular post are clustered, and different kinds of popular posts are distinguished by the aggregation state of those users. User clustering uses X-means, an improved algorithm based on K-means, which is further modified to fit the characteristics of microblog user data so that differences among attributes and among user nodes are both taken into account. User attributes are weighted with a scheme based on the logarithmic function, which is intended to make the clustering results more sound and accurate. Special user nodes are weighted by means of a key-personnel database, which is updated dynamically with the HITS algorithm. After user clustering is complete, information on the important users obtained is entered into the key-personnel database by field, which creates a feedback mechanism between the clustering process and the database. In addition, the same data are clustered with the K-means and X-means algorithms before and after the improvement, and the results, evaluated with the silhouette coefficient, show that the improved X-means algorithm has an advantage in microblog user clustering.

Key words: microblog classification, user clustering, silhouette coefficient
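The abstract names two ingredients without giving formulas: a logarithm-based weighting of user attributes and the silhouette coefficient as the evaluation measure. The sketch below illustrates both under stated assumptions: attributes are heavy-tailed counts compressed with log(1 + x), distance is Euclidean, and the user data are invented for illustration. The names `log_scale` and `silhouette` are hypothetical, and the paper's actual weighting may differ.

```python
import math


def log_scale(row):
    """Compress heavy-tailed counts with log(1 + x) (ASSUMED weighting form)."""
    return [math.log1p(x) for x in row]


def silhouette(points, labels):
    """Mean silhouette coefficient over all points, Euclidean distance."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # usual convention: s(i) = 0 for a singleton cluster
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Invented (followers, posts) pairs: two small accounts, two large ones.
users = [(10, 5), (12, 7), (100000, 3000), (150000, 2500)]
scaled = [log_scale(u) for u in users]
score_by_size = silhouette(scaled, [0, 0, 1, 1])   # groups similar accounts
score_mixed = silhouette(scaled, [0, 1, 0, 1])     # mixes small with large
```

A score closer to 1 indicates tighter, better-separated clusters, so comparing the scores of two labelings of the same data is one way to compare clustering algorithms as the abstract describes.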

Chinese Library Classification (CLC) number: