Netinfo Security (信息网络安全), 2015, Vol. 15, Issue 10: 46-52. doi: 10.3969/j.issn.1671-1122.2015.10.007

• Technical Research •

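Research on Microblog Hot Topic Detection Method Based on Term Energy Change

LIN Si-juan (林思娟)1,2, LIN Bo-gang (林柏钢)1,2, XU Wei (许为)1,2, YANG Yang (杨旸)1,2

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, China
  2. Key Lab of Information Security of Network System in Fujian Province, Fuzhou, Fujian 350108, China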
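  • Received: 2015-07-22 Online: 2015-10-01 Published: 2015-11-04
  • About the authors:

    LIN Si-juan (b. 1990), female, from Fujian, master's student; main research interests: information security and hot topic detection in social networks. LIN Bo-gang (b. 1953), male, from Fujian, professor and doctoral supervisor; main research interests: network and information security, coding and cryptography. XU Wei (b. 1990), female, from Shanxi, master's student; main research interests: information security and social network processing. YANG Yang (b. 1984), female, from Hubei, lecturer, Ph.D.; main research interests: cryptography and information security.

  • Funding:
    National Natural Science Foundation of China [61402112]; Fujian Provincial Security Project [828398]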

Research on Microblog Hot Topic Detection Method Based on Term Energy Change

LIN Si-juan1,2, LIN Bo-gang1,2, XU Wei1,2, YANG Yang1,2

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, China
  2. Key Lab of Information Security of Network System in Fujian Province, Fuzhou, Fujian 350108, China
  • Received: 2015-07-22 Online: 2015-10-01 Published: 2015-11-04

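Abstract (translated from the Chinese):

With the rapid growth of microblogging, discovering hot topics on microblogs has become an active research topic. Taking the strongly real-time nature of microblogs as its entry point, this paper studies how the energy values of terms change across time domains and proposes a microblog hot topic detection method based on changes in term energy. Building on traditional topic life-cycle theory, the method partitions microblog posts in chronological order. It introduces the concept of acceleration from physics and uses a term's acceleration to characterize the change in the term's velocity between adjacent windows. The term's acceleration and weight are considered together to construct a compound weight, which is better suited to quantifying term energy. On top of single-conditional probability, the method uses a double-conditional probability approach to compute context similarity, and it adds document distribution similarity to reduce the probability of topic confusion. Experiments demonstrate the method's effectiveness, stability, and robustness. Compared with the context similarity model based on single-conditional probability, the improved context similarity model achieves better clustering results under different keyword detection methods.

Keywords: hot topic detection, term energy, acceleration, context similarity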
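
The acceleration and compound-weight idea described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: velocity is taken as the change in a term's count between adjacent windows, acceleration as the change in velocity, and the compound weight as a linear blend controlled by a hypothetical parameter `alpha`. The function names are likewise hypothetical and are not the authors' definitions.

```python
from collections import Counter


def window_counts(windows):
    """Count term occurrences per window.

    `windows` is a chronological list of windows; each window is a list of
    posts, and each post is a list of tokens.
    """
    return [Counter(term for post in window for term in post) for window in windows]


def acceleration(counts, term, t):
    """Assumed definition: second difference of the term's count at window t."""
    if t < 2:
        raise ValueError("acceleration needs at least two earlier windows")
    f0, f1, f2 = (counts[i][term] for i in (t - 2, t - 1, t))
    velocity_prev = f1 - f0  # change between windows t-2 and t-1
    velocity_now = f2 - f1   # change between windows t-1 and t
    return velocity_now - velocity_prev


def compound_weight(counts, term, t, alpha=0.5):
    """Hypothetical blend of (non-negative) acceleration and relative frequency."""
    total = sum(counts[t].values()) or 1
    weight = counts[t][term] / total
    accel = max(acceleration(counts, term, t), 0)
    return alpha * accel / total + (1 - alpha) * weight


# Toy data: three chronological windows of tokenized posts.
windows = [
    [["quake", "news"]],
    [["quake", "news"], ["quake"]],
    [["quake", "quake", "quake", "news"], ["quake", "rescue"]],
]
counts = window_counts(windows)
ranked = sorted(counts[2], key=lambda term: compound_weight(counts, term, 2), reverse=True)
```

In this toy data, "quake" grows faster in each successive window, so it ranks first, while "news" stays flat and gets no acceleration bonus. A real system would likely replace the relative frequency with a TF-IDF style weight and normalize acceleration differently.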

Abstract:

With the growing popularity of microblogs, hot topic detection on microblogs has become an active research area. Taking the real-time nature of microblogs as its starting point, the paper studies how term energy changes across time windows and proposes a hot topic detection method based on those changes. Following traditional topic life-cycle (aging) theory, the method divides microblog posts chronologically into windows. It borrows the concept of acceleration from physics, using a term's acceleration to describe the change in the term's speed between adjacent windows. Term acceleration and term weight are combined into a compound weight that better quantifies term energy. Building on single-conditional probability, the method computes context similarity with double-conditional probability and adds document distribution similarity to reduce the probability of topic confusion. Experiments show that the method is effective, stable, and robust. Compared with the single-conditional probability context similarity model, the modified context similarity model yields better clustering results across different keyword detection methods.

Key words: hot topic detection, term energy, acceleration, context similarity
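
The contrast between single-conditional and double-conditional context similarity can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the formulas, so the sketch assumes that the single-conditional score is the co-occurrence probability P(b | a), that the double-conditional score averages P(b | a) and P(a | b), and that document distribution similarity is the Jaccard overlap of the sets of posts containing each term. All function names are hypothetical.

```python
def doc_sets(posts, terms):
    """Map each term to the set of indices of posts that contain it."""
    return {t: {i for i, post in enumerate(posts) if t in post} for t in terms}


def cond_prob(docs, a, b):
    """Single-conditional score P(b | a): share of posts with a that also contain b."""
    return len(docs[a] & docs[b]) / len(docs[a]) if docs[a] else 0.0


def double_conditional_similarity(docs, a, b):
    """Assumed symmetric score: mean of P(b | a) and P(a | b)."""
    return 0.5 * (cond_prob(docs, a, b) + cond_prob(docs, b, a))


def distribution_similarity(docs, a, b):
    """Assumed document distribution similarity: Jaccard overlap of post sets."""
    union = docs[a] | docs[b]
    return len(docs[a] & docs[b]) / len(union) if union else 0.0


# Toy data: each post is a set of tokens.
posts = [
    {"quake", "rescue"},
    {"quake", "rescue", "donate"},
    {"quake"},
    {"concert"},
]
docs = doc_sets(posts, ["quake", "rescue", "concert"])
```

The single-conditional score is asymmetric: here P(quake | rescue) is 1 while P(rescue | quake) is 2/3, so a rare term can look strongly tied to a frequent one. Averaging both directions, and additionally checking how much the two terms' post sets overlap, is one plausible way to reduce that kind of topic confusion during keyword clustering.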

CLC number: