Analysis of an Enhanced Apriori Algorithm in Data Mining

doi:10.3969/j.issn.1671-1122.2015.11.013

Netinfo Security, 2015, Vol. 15, Issue (11): 77-83.

HU Xue¹, FENG Hua-min¹,², LI Ming-wei¹, DING Zhao³

1. Beijing Electronic Science and Technology Institute, Beijing 100070, China
2. Communication Engineering Institute, Xidian University, Xi'an, Shaanxi 710071, China
3. School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China

Received: 2015-09-01 Publication date: 2015-11-25 Release date: 2015-11-20
About the authors:
HU Xue (1990-), male, from Shandong, master's student; main research interests: computer networks and data mining algorithms. FENG Hua-min (1963-), male, from Shaanxi, professor, Ph.D.; main research interests: information security, network security, and cryptography. LI Ming-wei (1991-), male, from Anhui, master's student; main research interests: research and application of data mining and machine learning algorithms. DING Zhao (1992-), male, from Shandong, master's student; main research interests: research and application of data mining algorithms, and information security.
Supported by:
National Natural Science Foundation of China [61103210]; Fundamental Research Funds for the Central Universities [2015XS1-LB, 38201541]

Abstract

In today's information-rich society, network data are growing rapidly, and much important information lies hidden in this surge of data, so analyzing large volumes of data is necessary. Apriori is a frequent-itemset algorithm for mining association rules. Its core idea is to mine frequent itemsets in two stages: candidate generation and a downward-closure check. Its two major drawbacks are that it may generate a large number of candidate sets and may need to scan the database repeatedly. This paper proposes an enhanced Apriori algorithm that requires less scanning time; it eliminates the generation of redundant sub-itemsets while pruning the candidate itemsets. By eliminating the transfer of unneeded records from the database, the enhanced algorithm effectively reduces the time spent on I/O and greatly improves the efficiency of Apriori. The paper presents the implementation idea and a proof of the algorithm, and compares and analyzes the traditional and enhanced Apriori algorithms.

Key words: data mining, association rule, frequent item sets, number of transactions, support count
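
The two stages named in the abstract can be illustrated with a minimal Python sketch of the classic Apriori procedure. It is not the enhanced algorithm proposed in the paper, whose details are not given in the abstract. The function name `apriori`, the `min_support` parameter (taken here as an absolute count), and the toy basket data are assumptions made for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return ({itemset: support count}, number of database scans).

    Illustrative classic Apriori; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]
    scans = 0

    # Scan 1: count single items to get the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    scans += 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Stage 1: join frequent (k-1)-itemsets into size-k candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Stage 2: downward-closure check. Every (k-1)-subset of a
            # frequent itemset must itself be frequent; otherwise prune.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        if not candidates:
            break

        # One more full pass over the database to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        scans += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent, scans


# Hypothetical toy data: five market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent, scans = apriori(baskets, min_support=3)
```

On this toy data the sketch makes one full pass over the transactions per itemset size, which is the repeated-scanning cost the abstract identifies as a drawback of the traditional algorithm.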

CLC number:

TP309

HU Xue, FENG Hua-min, LI Ming-wei, DING Zhao. Analysis of an Enhanced Apriori Algorithm in Data Mining[J]. Netinfo Security, 2015, 15(11): 77-83.

Figures/Tables: 20 (Tables 1-19, Figure 1)
