A Scheme of Optimizing Deep Learning Model Using Bi-ADMM

doi:10.3969/j.issn.1671-1122.2023.02.007

Abstract

ADMM is widely used to optimize traditional machine learning models. It has also been applied to some deep learning optimization problems, where its performance has exceeded that of most gradient-based optimization algorithms. Compared with ADMM, Bi-ADMM converges faster and is more stable. This paper proposes an optimization scheme, dlBi-ADMM, for deep learning problems. The scheme uses an accelerated proximal gradient algorithm to optimize the coupled variables, which reduces the complexity of the matrix inversion operations, and the paper gives the specific form of the optimization subproblem for each variable in detail. Experiments show that models optimized with dlBi-ADMM achieve higher accuracy than those optimized with dlADMM, and that dlBi-ADMM is also more time-efficient than dlADMM.
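To illustrate why an accelerated proximal gradient step can stand in for a matrix inversion, the following is a minimal sketch on a toy problem, not the dlBi-ADMM subproblems themselves. It assumes an L1-regularized least-squares objective, 0.5*||Ax - b||^2 + lam*||x||_1, and a FISTA-style momentum schedule; the function names, the objective, and the step-size bound are all assumptions made for illustration and do not come from the paper.

```python
import math


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def accelerated_proximal_gradient(A, b, lam=0.0, iters=500):
    """Approximately minimize 0.5*||Ax - b||^2 + lam*||x||_1.

    Only matrix-vector products are used; no linear system is solved
    and no matrix is inverted.
    """
    At = transpose(A)
    # Assumption: the squared Frobenius norm is used as a cheap upper
    # bound on the Lipschitz constant of the gradient of the smooth part.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    y = x[:]          # extrapolated point
    t = 1.0           # momentum parameter
    for _ in range(iters):
        residual = [r - bi for r, bi in zip(matvec(A, y), b)]
        grad = matvec(At, residual)
        # Gradient step on the smooth part, then the proximal step.
        x_new = soft_threshold(
            [yi - g / L for yi, g in zip(y, grad)], lam / L
        )
        # Momentum update: extrapolate from the last two iterates.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xn + ((t - 1.0) / t_new) * (xn - xo)
             for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x
```

Each iteration costs two matrix-vector products, whereas a closed-form least-squares update would require solving a linear system in A^T A. That trade-off is the general motivation the abstract describes; the exact subproblems and step sizes used by dlBi-ADMM are given in the paper itself.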

Key words: deep learning, ADMM, dlADMM, Bi-ADMM, accelerated proximal gradient algorithms

CLC Number: TP309

XU Zhanyang, CHENG Luofei, CHENG Jianchun, XU Xiaolong. A Scheme of Optimizing Deep Learning Model Using Bi-ADMM[J]. Netinfo Security, 2023, 23(2): 54-63.
