Netinfo Security, 2022, Vol. 22, Issue (9): 86-95. doi: 10.3969/j.issn.1671-1122.2022.09.010


A Job Performance Evaluation Method under Spark Platform

ZHANG Zhenghui1,2, CHEN Xingshu1,2, LUO Yonggang2, WU Tianxiong3

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  2. Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
  3. School of Computer, Sichuan University, Chengdu 610065, China
  • Received: 2022-06-15 Online: 2022-09-10 Published: 2022-11-14
  • Contact: LUO Yonggang E-mail: iamlyg98@scu.edu.cn

Abstract:

To address the evaluation and optimization of Spark job performance at run time, this paper proposes a method for evaluating and analyzing Spark job performance based on hierarchical analysis. First, because the accuracy of traditional job type classification depends heavily on feature selection, CPU and I/O features that better reflect actual job behavior are selected and combined with the K-Means clustering algorithm to build a job classifier with improved classification accuracy. Second, the job workflow is optimized by eliminating operations such as data sorting, spilling to disk, and file merging during job execution, and the performance metrics of the optimized job serve as the evaluation benchmark, which makes the performance evaluation more objective and general. Third, the performance metrics are quantified and organized into layers, hierarchical analysis is used to calculate their weights, and the performance evaluation model is constructed by combining the job classifier with the evaluation benchmark. Finally, experiments validate three aspects: job type classification, the workflow optimization method, and performance evaluation. The results demonstrate the effectiveness of the proposed job type classification and workflow optimization methods, as well as the accuracy of the evaluation model.

Key words: Spark, evaluation benchmark, quantification, hierarchical analysis
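
The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes that "hierarchical analysis" refers to the analytic hierarchy process (AHP), which the abstract does not state explicitly. The three metrics (run time, shuffle bytes, GC time), the pairwise comparison matrix, and the `score` function are hypothetical and chosen only for illustration; the paper's actual metrics, judgments, and scoring formula are not given here. Weights are approximated with the row geometric-mean method rather than an exact eigenvector computation.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    # Geometric mean of each row of the pairwise comparison matrix.
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    # Normalize so the weights sum to 1.
    return [g / total for g in geo_means]

def consistency_ratio(pairwise, weights):
    """Saaty's consistency ratio; values below about 0.1 are usually accepted."""
    n = len(pairwise)
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's tabulated values
    return consistency_index / random_index[n]

def score(measured, benchmark, weights):
    """Hypothetical score: weighted benchmark/measured ratio, capped at 1.

    Assumes every metric is lower-is-better, so a job that matches the
    benchmark on all metrics scores 1.0.
    """
    return sum(w * min(1.0, b / m)
               for w, m, b in zip(weights, measured, benchmark))

# Hypothetical judgments: run time vs. shuffle bytes vs. GC time.
pairwise = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]
weights = ahp_weights(pairwise)
print("weights:", [round(w, 3) for w in weights])
print("consistency ratio:", round(consistency_ratio(pairwise, weights), 3))
print("score:", round(score([120.0, 4e9, 6.0], [100.0, 3e9, 6.0], weights), 3))
```

In a setup like this, the benchmark values would come from the optimized workflow for the job type that the classifier assigns, so each job is compared against a reference of its own kind.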
