Netinfo Security ›› 2019, Vol. 19 ›› Issue (4): 11-19. doi: 10.3969/j.issn.1671-1122.2019.04.002


A Method for Improving the Performance of Spark on Container Cluster Based on Machine Learning

Chunqi TIAN1,2, Jing LI1,2, Wei WANG1,2,3, Liqing ZHANG1,2

    1. Department of Computer Science and Engineering, Tongji University, Shanghai 200092, China
    2. The Key Laboratory of Embedded System and Service Computing of Ministry of Education, Tongji University, Shanghai 200092, China
    3. Hubei Engineering Research Center for Education Information, Wuhan Hubei 430062, China
  • Received: 2018-11-19 Online: 2019-04-10 Published: 2020-05-11

Abstract:

Spark is now used in a wide range of applications, and a well-chosen configuration gives Spark jobs higher execution efficiency. Parameter tuning for Spark on virtual machine clusters has been studied in depth by many researchers. In recent years, containers have emerged as a cloud computing infrastructure and are increasingly used in service clusters, so parameter tuning for Spark on container clusters is also important. This paper studies the parameter configuration problem of Spark on a Docker container cluster and proposes a new parameter tuning method, ContainerOpt. The method uses machine learning to learn and predict job performance under different parameter combinations, and introduces an automatic node scaling mechanism that enables jobs with larger inputs to achieve better performance. To better balance job execution time against resource occupation, a performance representation model based on both time and resources is proposed to replace the traditional model based on execution time alone. Experimental results show that, compared with the default configuration, the proposed parameter tuning method can improve execution efficiency by 50%.
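
The abstract does not give the form of the time-and-resource performance model, so the following is only a minimal sketch of the general idea, not ContainerOpt itself. It assumes a weighted sum of normalized execution time and normalized resource occupation, and it replaces the trained machine learning model with a hypothetical stand-in function. The names `combined_cost`, `toy_predict_time`, `best_config`, `GRID`, the weight `alpha`, and the resource proxy are all illustrative assumptions.

```python
from itertools import product

# Candidate values for three real Spark properties:
# spark.executor.instances, spark.executor.cores, spark.executor.memory (GB).
# The specific values are illustrative, not taken from the paper.
GRID = {"executors": [2, 4, 8], "cores": [1, 2, 4], "memory_gb": [2, 4, 8]}


def combined_cost(exec_time_s, resource_units, time_ref, resource_ref, alpha=0.5):
    """Assumed cost: weighted sum of normalized time and normalized resources."""
    return alpha * (exec_time_s / time_ref) + (1 - alpha) * (resource_units / resource_ref)


def toy_predict_time(executors, cores, memory_gb):
    """Hypothetical stand-in for a trained performance predictor (seconds)."""
    parallel_part = 600.0 / (executors * cores)  # work shrinks with parallelism
    spill_penalty = 40.0 / memory_gb             # less memory, more spilling
    coordination = 2.0 * executors               # overhead grows with executors
    return parallel_part + spill_penalty + coordination


def best_config(grid, predict_time, alpha=0.5):
    """Score every combination in the grid and return the cheapest one."""
    candidates = []
    for executors, cores, memory_gb in product(
        grid["executors"], grid["cores"], grid["memory_gb"]
    ):
        time_s = predict_time(executors, cores, memory_gb)
        resources = executors * (cores + memory_gb)  # crude resource proxy
        candidates.append(((executors, cores, memory_gb), time_s, resources))

    # Normalize by the worst observed values so both terms lie in (0, 1].
    time_ref = max(c[1] for c in candidates)
    resource_ref = max(c[2] for c in candidates)
    return min(
        candidates,
        key=lambda c: combined_cost(c[1], c[2], time_ref, resource_ref, alpha),
    )


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        config, time_s, resources = best_config(GRID, toy_predict_time, alpha)
        print(f"alpha={alpha}: config={config}, time={time_s:.1f}s, resources={resources}")
```

Under these assumptions, setting `alpha` to 1 reduces the model to the traditional time-only objective, while smaller values favor configurations that occupy fewer resources.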

Key words: cloud computing, Spark, Docker, machine learning, parameter tuning
