Netinfo Security ›› 2015, Vol. 15 ›› Issue (7): 13-19. doi: 10.3969/j.issn.1671-1122.2015.07.003


Application and Implementation of Cloud Computing Technology in Junk Message Filtering

SUN Da-peng

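  1. Liaoning Branch, National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT), Shenyang, Liaoning 110035, China
  • Received: 2015-04-09 Online: 2015-07-01 Published: 2015-07-28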
  • About the author:

    SUN Da-peng (born 1976), male, from Liaoning, engineer, master's degree. Main research interest: network information security.

  • Funding:
    National 242 Information Security Program [2011A011]

Application and Implementation of Hadoop Cloud Computing Technology in Junk Message Filtering

SUN Da-peng

  1. Liaoning Branch of CNCERT, Shenyang Liaoning 110035, China
  • Received:2015-04-09 Online:2015-07-01 Published:2015-07-28

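Abstract (translated from the Chinese):

The problem of junk short messages (SMS) is increasingly prominent. Junk messages not only disrupt people's daily lives in many ways but also do a degree of harm to public safety and social stability, so filtering them accurately is especially important. The study finds that existing SMS filtering techniques have shortcomings: filtering based on blacklists and whitelists is too crude, while content-based filtering, although much more accurate, is overly complex to implement and can easily congest the information network. To address this shortcoming, the article surveys cloud computing technology, which has developed rapidly in recent years, and finds that it offers substantial advantages in scalability, reliability, and cost. In particular, its high scalability lets computing capacity grow essentially without limit at very low cost, which makes it a suitable computing platform. On this basis, the article examines the junk message filtering schemes currently in use and carefully analyzes and compares the principles and performance of each approach. It then analyzes the algorithms used by current content-based filters and finds that they can be implemented with the MapReduce programming model of Hadoop, an open-source cloud computing implementation.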

Keywords: cloud computing, junk message filtering, Hadoop, MapReduce

Abstract:

The problem of junk messages has become increasingly severe. The flood of junk messages has not only greatly disturbed people's lives but also endangered public security and social stability. Research on accurate and intelligent filtering of junk messages is therefore of great significance. A study of existing filtering methods indicates that their implementations have shortcomings. Methods based on blacklists and whitelists are too simple and crude. Although the accuracy of content-based filtering has improved greatly, the complexity of its algorithms often causes congestion in the operator's service network. The study indicates that cloud computing technology has great advantages in scalability, reliability, cost, and other aspects. In particular, thanks to its high scalability, computing power can be expanded essentially without limit at low cost, so cloud computing is a good platform. On this basis, the article carefully analyzes the algorithmic principles of content-based filtering and finds that almost all content-based filtering algorithms currently in use are based on Bayes classification theory. After detailed study and related experiments, the article finds that a content-based filter can be implemented on a cloud computing platform with the MapReduce programming model.

Key words: cloud computing, junk message filtering, Hadoop, MapReduce
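
The abstract states that Bayes-based content filtering can be expressed in the MapReduce programming model. The following is only a minimal sketch of that idea, not the article's implementation: it simulates the map, shuffle, and reduce phases inside a single Python process rather than calling any Hadoop API. The function names, the `spam`/`ham` labels, whitespace tokenization (Chinese text would need word segmentation), and Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def map_phase(records):
    """Mapper: emit one count per message and one count per token."""
    for label, text in records:
        yield ("doc", label), 1
        for token in text.lower().split():
            yield ("tok", label, token), 1


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: sum the counts collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def train(records):
    """Run the three phases to obtain the naive Bayes count table."""
    return reduce_phase(shuffle(map_phase(records)))


def classify(counts, text, labels=("spam", "ham")):
    """Pick the label with the highest log-posterior (Laplace smoothing)."""
    vocab = {key[2] for key in counts if key[0] == "tok"}
    total_docs = sum(counts.get(("doc", label), 0) for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        doc_count = counts.get(("doc", label), 0)
        if doc_count == 0:
            continue  # no training messages for this label
        token_total = sum(
            c for key, c in counts.items()
            if key[0] == "tok" and key[1] == label
        )
        score = math.log(doc_count / total_docs)
        for token in text.lower().split():
            seen = counts.get(("tok", label, token), 0)
            score += math.log((seen + 1) / (token_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical toy training set, for illustration only.
    training = [
        ("spam", "win free prize now"),
        ("spam", "free cash prize"),
        ("ham", "meeting at noon"),
        ("ham", "see you at lunch"),
    ]
    table = train(training)
    print(classify(table, "free prize"))
```

The counting step fits MapReduce because every mapper output is an independent key with a count of 1 and the reducer is a plain sum, so the work can in principle be split across as many nodes as there are message partitions. Classification then needs only the reduced count table.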

CLC Number: