Netinfo Security, 2022, Vol. 22, Issue (9): 86-95. doi: 10.3969/j.issn.1671-1122.2022.09.010

• Theoretical Research •

A Job Performance Evaluation Method under Spark Platform

ZHANG Zhenghui1,2, CHEN Xingshu1,2, LUO Yonggang2, WU Tianxiong3

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  2. Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
  3. School of Computer, Sichuan University, Chengdu 610065, China
  • Received: 2022-06-15 Online: 2022-09-10 Published: 2022-11-14
  • Contact: LUO Yonggang E-mail: iamlyg98@scu.edu.cn
  • About the authors: ZHANG Zhenghui (born 1997), male, from Jiangxi, master's student; research interests: cloud computing and big data security | CHEN Xingshu (born 1968), female, from Guizhou, professor, Ph.D.; research interests: trusted computing, cloud computing, and big data security | LUO Yonggang (born 1980), male, from Guizhou, research fellow, Ph.D.; research interests: big data and network security | WU Tianxiong (born 1994), male, from Hubei, master's student; research interests: big data and information security
  • Supported by:
    National Natural Science Foundation of China (U19A2081); National Natural Science Foundation of China (61802270); National Natural Science Foundation of China (61802271); Ministry of Education-China Mobile Research Fund (CM20200409); Sichuan University Engineering Characteristic Team Project (2020SCUNG129)

Abstract:

To address performance evaluation and performance optimization for running Spark jobs, this paper proposes a performance evaluation and analysis method for Spark jobs based on the analytic hierarchy process (AHP). First, because feature selection limits the accuracy of traditional job type classification, more realistic CPU and I/O features are selected and combined with the K-Means clustering algorithm to build a job classifier, which improves classification accuracy. Second, the job workflow is optimized by eliminating operations such as data sorting, disk spilling, and file merging during job execution, and the performance metrics of the optimized job serve as the evaluation benchmark, which makes the performance evaluation more objective and general. Third, the performance metrics are quantified and organized into layers, AHP is used to compute the metric weights across layers from expert judgment, and the performance evaluation model is built by combining the job classifier with the evaluation benchmark. Finally, experiments are conducted on three aspects: job type classification, the workflow optimization method, and performance evaluation. The results demonstrate the effectiveness of the proposed job type classification and workflow optimization methods, as well as the accuracy of the evaluation model.

Key words: Spark, evaluation benchmark, quantification, analytic hierarchy process
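
The weighting step named in the abstract (hierarchical analysis, i.e. the analytic hierarchy process) can be illustrated with a minimal sketch. It is not the authors' implementation: the three metrics (CPU, I/O, memory) and the pairwise judgments are hypothetical, and the row geometric-mean method is used here as a common approximation of the principal-eigenvector weights. The consistency ratio uses Saaty's standard random index table.

```python
import math

# Random consistency index (RI) from Saaty's standard table, keyed by matrix order.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Return CR = CI / RI; judgments are usually accepted when CR < 0.1."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate the largest eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical expert judgments: entry [i][j] says how much more important
# metric i is than metric j. Order of metrics (assumed): CPU, I/O, memory.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

weights = ahp_weights(pairwise)
cr = consistency_ratio(pairwise, weights)
print("weights:", [round(w, 3) for w in weights])
print("consistency ratio:", round(cr, 4))
```

In a layered metric hierarchy, the same calculation would be repeated for each group of sibling metrics, and the weight of a leaf metric would be the product of the weights along its path to the root.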

CLC Number: