• • 上一篇    下一篇

快速最小生成树 Sollin 求解算法

张毅%顾逸圣%王伟   

  • 基金资助:
    国家自然科学基金[61103068]、教育部博士点基金[20110072120017]、信息安全国家重点实验室开放课题[2013]、上海科委优秀技术带头人计划课题、中央高校基本科研业务费专项资金

Fast Parallel Sollin's Algorithm for Minimum Spanning Tree on the GPU

ZHANG Yi%GU Yi-sheng%WANG Wei   

  • About author:同济大学计算机科学与技术系,上海 200092; 国家高性能计算机工程技术中心同济分中心,上海 200092; 同济大学嵌入式系统与服务计算教育部重点实验室,上海 200092

摘要: 最小生成树算法在计算机网络、信息安全等领域中有着广泛的应用,目前比较普遍的求解算法有 Prim 算法和 Kruskal 算法,但这两种算法由于本身的数据结构特性和迭代过程的相关性限制而难以并行化,因而无法有效地利用通用 GPU 并行架构进行并行化加速。Sollin算法虽然是最古老的最小生成树算法之一,但是在算法中经过初始化的森林迭代过程,每次迭代可以同时扩展合并多棵最小生成树,经过数次扩展和合并,最终由初始森林合并为一棵树,一旦成功扩展,这棵树一定是最小生成树。在扩展合并的过程中,每次迭代中每棵树的扩展和合并过程相互独立,这一特征使 Sollin 算法具有较好的并行性,可以利用目前流行的 GPU 并行架构进行快速求解优化。正因如此,文章针对传统 CPU 执行的串行 Sollin 算法,结合适用于通用 GPU 并行运算的数据结构特性,在提出可扩展树这一并行适用的数据存储结构的基础上,提出了一种基于可扩展树的快速扩展、合并最小生成树的方法,并针对通用 GPU 平台进行了实现。我们采用多组不同规模的测试数据进行实验。结果表明,相对传统串行 Sollin 最小生成树求解算法,该算法在用于较大数据规模的情况中具有明显的性能提升,同由 CPU 执行的传统串行方案相比,文章提出的方案获得了10至18倍的加速度。

Abstract: Prim's algorithm and Kruskal's algorithm are two common ones for minimum spanning tree which has a wild application in computer networks, information security, and so on. However, due to the data structures and the interactive processes of these two algorithms, it is difficult to make them parallelized with the help of general parallel algorithm on GPU. Sollin’s algorithm, old though it is, on the other hand, can expand several spanning trees independently at the same time during iteration, which finally turns the forest into a MST after times of iteration. It is much easy for acceleration using the general parallel algorithm on GPU for it has such characteristics. It is for this reason that we present a novel parallel Sollin’s algorithm on the GPU. We design the data structure for Sollin’s algorithm in consideration of parallelism of GPU and then do some parallelization which is based on a fast algorithm for tree mergence. We adopt several groups of data of different scales to test our algorithms. The experimental results show that this method achieves an acceleration of 10 to 18 compared with the traditional CPU-based one when the data are of a large scale.