信息网络安全 ›› 2014, Vol. 14 ›› Issue (9): 17-21. doi: 10.3969/j.issn.1671-1122.2014.09.004

• Outstanding Papers •

一种基于主题相关性分类的微博话题立场研判方法

王明元, 贾焰, 周斌, 黄九鸣   

  1. 国防科技大学计算机学院,湖南长沙 410073
  • Received: 2014-08-06 Published: 2014-09-01
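  • About the authors: WANG Ming-yuan (born 1985), male, from Gansu, master's student; main research interest: social network analysis. JIA Yan (born 1960), female, from Sichuan, doctoral supervisor and professor; main research interests: network and information security, databases and data mining. ZHOU Bin (born 1971), male, from Jiangxi, research fellow, Ph.D.; main research interests: information security and data mining. HUANG Jiu-ming (born 1981), male, from Fujian, assistant research fellow, Ph.D.; main research interests: information security and data mining.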
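  • Supported by:
    National Basic Research Program of China (973 Program) [2013CB329601, 2013CB329602]; National High Technology Research and Development Program of China (863 Program) [2012AA013002]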

A Method of Discriminating Microblog Topic Position based on the Text Classification with Correlation of Subject

WANG Ming-yuan, JIA Yan, ZHOU Bin, HUANG Jiu-ming   

  1. College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China
  • Received: 2014-08-06 Online: 2014-09-01

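Abstract: Accurately determining the stance expressed toward a microblog topic is a key problem in short-text mining. This paper proposes a method that classifies microblog posts by their relevance to the topic's subject before judging them, with the aim of identifying whether users support or oppose the topic. How strongly a post relates to the subject often produces large differences in its textual features. The method first obtains a set of subject words for the topic using keyword extraction and mutual information. It then classifies the topic corpus according to whether each post is relevant to the subject. Finally, it judges the two groups with machine learning and with dictionary rules respectively, and combines the results into the stance on the topic. Experimental results show that applying machine learning to subject-relevant texts and dictionary rules to subject-irrelevant texts greatly improves judgment accuracy. On this basis, the paper builds a stance-determination model for microblog topics that can be used by government departments to monitor online public opinion and by enterprises to assess product markets.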

Keywords: microblog, topic, stance, relevance, naive Bayes

Abstract: Accurately discriminating the position taken on a microblog topic is a key problem in short-text mining. This paper proposes a method based on text classification by correlation with the subject, which identifies whether users support or oppose a topic. The degree of correlation between a microblog post and the subject often leads to texts with markedly different features. The method first obtains the topic's subject keywords using keyword extraction and mutual information, then classifies the topic corpus by whether each text is correlated with the subject, and finally applies a different method to each class and combines the results into the overall position on the topic. Experimental results show that using machine learning for subject-correlated texts and a dictionary-based method for uncorrelated texts greatly improves discrimination accuracy. On this basis, we construct a model that relevant government departments can use to monitor Internet public opinion and that businesses can use to evaluate product markets.

Key words: microblog, topic, position, correlation, naive Bayes
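
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the mutual-information formula, the classifier settings, the dictionary, or the rule for combining results, so everything here is an assumption. Posts are assumed to be already tokenized into word lists, subject words are expanded from seed keywords by document-level pointwise mutual information, the machine-learning step is a small multinomial naive Bayes with add-one smoothing, and the per-post labels are combined by simple vote counting. All function names, thresholds, and word lists are hypothetical.

```python
import math
from collections import Counter


def pmi_subject_words(posts, seeds, min_pmi=0.0):
    """Expand seed keywords with words whose document-level PMI with a seed exceeds min_pmi."""
    n = len(posts)
    doc_sets = [set(p) for p in posts]
    df = Counter(w for s in doc_sets for w in s)  # document frequency of each word
    subject = set(seeds)
    for seed in seeds:
        if df[seed] == 0:
            continue
        # Count how often each word appears in the same post as the seed.
        co = Counter(w for s in doc_sets if seed in s for w in s)
        for w, c in co.items():
            pmi = math.log((c * n) / (df[seed] * df[w]))
            if w not in seeds and pmi > min_pmi:
                subject.add(w)
    return subject


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed ML component)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.vocab = {w for counter in self.counts.values() for w in counter}
        self.n = len(labels)
        return self

    def predict(self, doc):
        def score(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            s = math.log(self.prior[c] / self.n)
            for w in doc:
                if w in self.vocab:  # words never seen in training are ignored
                    s += math.log((self.counts[c][w] + 1) / total)
            return s

        return max(self.classes, key=score)


def lexicon_stance(doc, pos_words, neg_words):
    """Dictionary rule: compare counts of positive and negative words; None if tied."""
    score = sum(w in pos_words for w in doc) - sum(w in neg_words for w in doc)
    return "support" if score > 0 else "oppose" if score < 0 else None


def topic_stance(posts, subject, nb, pos_words, neg_words):
    """Route each post by subject relevance, then tally the per-post labels."""
    votes = Counter()
    for doc in posts:
        if subject & set(doc):  # subject-relevant: use the trained classifier
            label = nb.predict(doc)
        else:  # subject-irrelevant: use the dictionary rule
            label = lexicon_stance(doc, pos_words, neg_words)
        if label is not None:
            votes[label] += 1
    return votes


# Hypothetical toy data, for illustration only.
train_docs = [["policy", "good", "fair"], ["policy", "helps", "people"],
              ["policy", "unfair", "harm"], ["tax", "harm", "bad"]]
train_labels = ["support", "support", "oppose", "oppose"]
nb = NaiveBayes().fit(train_docs, train_labels)
posts = [["policy", "fair"], ["tax", "harm"], ["great", "agree"], ["nonsense"]]
votes = topic_stance(posts, {"policy", "tax"}, nb, {"great", "agree"}, {"nonsense", "bad"})
print(votes.most_common())
```

Routing on relevance before choosing a judge mirrors the abstract's observation that relevant and irrelevant posts have different textual features. Real microblog text would additionally need word segmentation and a much larger training set and dictionary.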