信息网络安全 (Netinfo Security) ›› 2015, Vol. 15 ›› Issue (11): 77-83. doi: 10.3969/j.issn.1671-1122.2015.11.013

• Technical Research •

数据挖掘中一种增强的Apriori算法分析

胡雪1, 封化民1,2, 李明伟1, 丁钊3   

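  1. Beijing Electronic Science and Technology Institute, Beijing 100070
  2. Communication Engineering Institute, Xidian University, Xi'an, Shaanxi 710071
  3. School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071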
  • Received: 2015-09-01 Online: 2015-11-25 Published: 2015-11-20
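  • About the authors:

    胡雪 (HU Xue, born 1990), male, from Shandong, master's student; research interests: computer networks and data mining algorithms. 封化民 (FENG Hua-min, born 1963), male, from Shaanxi, professor, Ph.D.; research interests: information security, network security, and cryptography. 李明伟 (LI Ming-wei, born 1991), male, from Anhui, master's student; research interests: research and application of data mining and machine learning algorithms. 丁钊 (DING Zhao, born 1992), male, from Shandong, master's student; research interests: research and application of data mining algorithms, and information security.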

  • Funding:
    National Natural Science Foundation of China [61103210]; Fundamental Research Funds for the Central Universities [2015XS1-LB, 38201541]

Analysis of an Enhanced Apriori Algorithm in Data Mining

HU Xue1, FENG Hua-min1,2, LI Ming-wei1, DING Zhao3   

  1. Beijing Electronic Science and Technology Institute, Beijing 100070, China
  2. Communication Engineering Institute, Xidian University, Xi'an, Shaanxi 710071, China
  3. School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
  • Received: 2015-09-01 Online: 2015-11-25 Published: 2015-11-20

Abstract:

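In today's information-rich society, network data is growing explosively, and a great deal of important information lies hidden behind this surge of data, so analyzing large volumes of data is necessary. The Apriori algorithm is a frequent-itemset algorithm for mining association rules; its core idea is to mine frequent item sets through two stages, candidate set generation and downward closure testing. Its two major drawbacks are that it may generate a large number of candidate sets and may need to scan the database repeatedly. This paper proposes an Apriori algorithm that requires less scanning time and that, while pruning the candidate item sets, also eliminates the generation of redundant sub-item sets. By eliminating the transfer of records that are not needed from the database, the improved Apriori algorithm effectively reduces the time spent on I/O, and the efficiency of the algorithm is greatly improved. The paper presents the idea behind the algorithm's implementation together with a proof, and compares and analyzes the traditional and the improved Apriori algorithms.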

Key words: data mining, association rule, frequent item sets, transaction number, support counting

Abstract:

In today's highly developed information society, network data is expanding rapidly, and much important information is hidden behind this surge of data, so it is necessary to analyze large amounts of data. The Apriori algorithm is a frequent item set algorithm for mining association rules. Its core idea is to mine frequent item sets in two stages: candidate set generation and downward closure testing. Its two major drawbacks are that it may generate a large number of candidate sets and that it may need to scan the database repeatedly. In this paper, an enhanced Apriori algorithm is proposed that takes less scanning time; while pruning the candidate item sets, it also eliminates the generation of redundant sub-item sets. By eliminating the unnecessary transfer of records from the database, the improved Apriori algorithm effectively reduces the time spent on I/O and greatly improves the efficiency of the algorithm. The paper presents the idea behind the algorithm's implementation and its proof, and compares and analyzes the traditional and enhanced Apriori algorithms.
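
The two stages named in the abstract, candidate generation and downward closure testing, can be illustrated with a minimal Python sketch of the classical Apriori loop. This is not the authors' enhanced algorithm, whose details the abstract does not give. The function name `apriori`, its parameters, and the transaction-dropping step (a standard "transaction reduction" heuristic, used here only as one plausible way to avoid re-reading records that are no longer needed) are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    db = [frozenset(t) for t in transactions]

    # Pass 1: count individual items.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Stage 1, candidate generation: join frequent (k-1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Stage 2, downward closure test: every (k-1)-subset of a
        # candidate must itself be frequent, otherwise prune it.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }

        # One scan of the (possibly reduced) database to count support.
        counts = dict.fromkeys(candidates, 0)
        kept = []
        for t in db:
            hit = False
            for c in candidates:
                if c <= t:
                    counts[c] += 1
                    hit = True
            # Assumed heuristic: a transaction containing no candidate
            # k-itemset cannot support any larger frequent itemset,
            # so it is skipped in later passes.
            if hit:
                kept.append(t)
        db = kept

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Calling `apriori(transactions, 3)` on a list of item sets returns every itemset that appears in at least three transactions, together with its support count. Dropping transactions between passes shrinks the data scanned in later iterations, which is the kind of I/O saving the abstract refers to, although the paper's own mechanism may differ.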

Key words: data mining, association rule, frequent item sets, transaction number, support counting

CLC Number: