Netinfo Security, 2014, Vol. 14, Issue (12): 32-36. doi: 10.3969/j.issn.1671-1122.2014.12.007

• Technical Research •

A Public Opinion Monitoring System Based on Big Data Technology

CAO Bin1, GU Yi-li2, XIE Zhen-zhen3, CHEN Zhen4

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
  2. College of Arts & Science, New York University, New York 10012, USA;
  3. Department of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China;
  4. Research Institute of Information Technology, Tsinghua University, Beijing 100084, China
  • Received: 2014-11-02 Published: 2014-12-15
  • Corresponding author: CAO Bin caobni@gmail.com
  • About the authors: CAO Bin (born 1983), male, from Hebei, master's student; main research interests: network security and network traffic anomaly detection. GU Yi-li (born 1992), female, from Jiangsu, master's student; main research interest: network technology. XIE Zhen-zhen (born 1988), female, from Jilin, doctoral student; main research interests: network security and cloud computing. CHEN Zhen (born 1976), male, from Zhejiang, associate professor, Ph.D.; main research interests: high-speed networks, P2P systems, and trusted computing.
  • Funding:
    National Natural Science A3 Key Fund [61161140320]; National Key Basic Research Program of China (National 973 Program) [2012CB315800]

Abstract: With the spread of the Internet, social networks have become a vital part of people's lives. This new-media trend promotes the flow and dissemination of information, but it also produces massive volumes of media content and user data. Social media analysis is the main component of a public opinion monitoring system, and analyzing, processing, and monitoring public opinion data is one of the new technical problems of the new-media era. In recent years, big data processing technology has provided mature solutions for handling massive data. Among the many big data processing platforms, Hadoop has a mature community, and its architecture is stable and easy to use. For text classification, the LDA statistical model offers a new approach. This paper therefore proposes a public opinion monitoring system built on mature open-source architectures: the system runs on the Hadoop platform, uses Nutch as its crawler, and uses Solr to implement the core index and search functions. The platform shows high analysis and processing efficiency on massive data and, while coping with the problems that such data brings, also provides intelligent analysis and statistics functions.

Key words: public opinion monitoring, crawler, search, LDA algorithm, social media

CLC number: