信息网络安全 (Netinfo Security), 2014, Vol. 14, Issue 9: 127-131. doi: 10.3969/j.issn.1671-1122.2014.09.029

• Selected Papers

Research on a Classification-based Hot Topic Detection Method for Chinese Micro-blogs

郑飞, 张蕾   

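  1. Shanghai Bureau of Public Security, Shanghai 200025
  • Received: 2014-08-06 Published: 2014-09-01
  • About the authors: ZHENG Fei (1980-), male, from Henan, assistant engineer, Ph.D.; main research interests: data mining and traffic flow theory. ZHANG Lei (1967-), female, from Jiangxi, senior engineer, bachelor's degree; main research interest: network security.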

Classification-based Hot Topic Detection Approach on Chinese Micro-blog

ZHENG Fei, ZHANG Lei   

  1. Shanghai Bureau of Public Security, Shanghai 200025, China
  • Received: 2014-08-06 Online: 2014-09-01

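Abstract: Smartphones and micro-blog clients have reinforced the media characteristics of micro-blogs, so discovering micro-blog topics in real time is of practical significance. This paper proposes a keyword-classification-based method for discovering hot topics in Chinese micro-blogs. Micro-blog messages are filtered and classified by keywords; topic words are extracted with a weighting function constructed from word frequency and growth rate within a time window; the conditional probability that words occur in the same text serves as the similarity criterion; and topic words are clustered with an improved single-pass clustering algorithm. Analysis of the system's running results shows that the method can cluster and discover hot micro-blog topics effectively and in real time.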
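The topic-word extraction step described in the abstract (filter and classify posts by keyword, then weight words by frequency and growth within a time window) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the keyword lists or the exact weighting function, so `CATEGORY_KEYWORDS`, `classify`, `topic_word_weights`, `extract_topic_words`, the parameter `alpha`, and the linear form `alpha * f / max_f + (1 - alpha) * growth` are all hypothetical. Posts are assumed to be already segmented into word lists.

```python
from collections import Counter

# Hypothetical category -> keyword sets for illustration only; the paper's
# actual keyword lists are not given in the abstract.
CATEGORY_KEYWORDS = {
    "traffic": {"jam", "accident"},
    "weather": {"rain", "typhoon"},
}


def classify(posts, category_keywords):
    """Filter and group posts (each a list of already-segmented words).

    A post is added to every category that shares at least one keyword
    with it; posts matching no category are dropped.
    """
    groups = {cat: [] for cat in category_keywords}
    for words in posts:
        for cat, keywords in category_keywords.items():
            if keywords.intersection(words):
                groups[cat].append(words)
    return groups


def topic_word_weights(curr_window, prev_window, alpha=0.5):
    """Score each word of the current time window by frequency and growth.

    Frequency is the number of posts containing the word. The assumed
    (illustrative) weight is alpha * f / max_f + (1 - alpha) * growth,
    where growth = (f_curr - f_prev) / (f_prev + 1).
    """
    curr = Counter(w for words in curr_window for w in set(words))
    prev = Counter(w for words in prev_window for w in set(words))
    if not curr:
        return {}
    max_f = max(curr.values())
    weights = {}
    for w, f in curr.items():
        growth = (f - prev[w]) / (prev[w] + 1)
        weights[w] = alpha * f / max_f + (1 - alpha) * growth
    return weights


def extract_topic_words(curr_window, prev_window, top_k=10, alpha=0.5):
    """Return the top_k highest-weighted words (ties broken alphabetically)."""
    weights = topic_word_weights(curr_window, prev_window, alpha)
    return sorted(weights, key=lambda w: (-weights[w], w))[:top_k]
```

With a previous window `[["rain", "city"], ["rain", "road"]]` and a current window `[["rain", "flood"], ["flood", "road"], ["flood", "city"]]`, the word "flood" ranks first because it is both the most frequent word and the fastest-growing one, while "rain" scores low because its count fell between windows.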

Keywords: classification, micro-blog, topic detection, clustering

Abstract: Smartphones and micro-blog clients have reinforced the media characteristics of micro-blogs, so real-time detection of hot micro-blog topics is of practical significance. This paper introduces a real-time method for detecting hot micro-blog topics based on keyword classification. Micro-blog messages are filtered and classified according to keywords. Topic words are extracted with a weighting function built from each word's frequency and growth within a time window. Using the same-text conditional probability of words as the similarity measure, an improved single-pass clustering algorithm groups the topic words into hot topics. An analysis of the system's running results shows that the approach effectively clusters and detects hot micro-blog topics in real time.
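The clustering step (same-text conditional probability as similarity, then single-pass clustering) can be sketched as below. The abstract names the similarity measure and the algorithm but neither the exact similarity formula nor what the "improvement" to single-pass clustering is, so this sketch is plain single-pass clustering under assumptions: P(b | a) is the fraction of posts containing word a that also contain word b; the word-to-word similarity is taken as the larger of P(b | a) and P(a | b); a word's similarity to a cluster is the average over the cluster's words; and the default `threshold=0.5` is hypothetical. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(posts):
    """Count how many posts contain each word and each unordered word pair."""
    df = Counter()
    pair_df = Counter()
    for words in posts:
        uniq = sorted(set(words))
        df.update(uniq)
        # Pairs come out as (smaller, larger) because uniq is sorted.
        pair_df.update(combinations(uniq, 2))
    return df, pair_df


def cond_prob(b, a, df, pair_df):
    """P(b | a): fraction of posts containing a that also contain b."""
    if df[a] == 0:
        return 0.0
    pair = (a, b) if a < b else (b, a)
    return pair_df[pair] / df[a]


def word_similarity(a, b, df, pair_df):
    # Assumed symmetric form: the larger of the two conditional probabilities.
    return max(cond_prob(b, a, df, pair_df), cond_prob(a, b, df, pair_df))


def single_pass_cluster(topic_words, posts, threshold=0.5):
    """Cluster topic_words in one pass, in the given order (e.g. by weight).

    Each word joins the existing cluster with the highest average similarity
    if that similarity reaches threshold; otherwise it starts a new cluster.
    """
    df, pair_df = cooccurrence_stats(posts)
    clusters = []
    for w in topic_words:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = sum(word_similarity(w, c, df, pair_df) for c in cluster) / len(cluster)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.append(w)
        else:
            clusters.append([w])
    return clusters
```

Feeding the topic words in descending weight order, as a single-pass scheme usually does, lets the strongest words seed the clusters; each word is examined only once, which keeps the cost low enough for a streaming, real-time setting.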

Key words: classification, micro-blog, topic detection, clustering