# American Institute of Mathematical Sciences

doi: 10.3934/jimo.2021084
Online First



## Variable metric proximal stochastic variance reduced gradient methods for nonconvex nonsmooth optimization

1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
2. Institute of Mathematics, Hebei University of Technology, Tianjin 300401, China
3. LSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
4. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
5. School of Business, National University of Singapore, Singapore 119245, Singapore

* Corresponding author: Xin-Wei Liu

Received: January 2021 · Revised: February 2021 · Early access: April 2021

We study the problem of minimizing the sum of two functions, where the first is the average of a large number of nonconvex component functions and the second is a convex (possibly nonsmooth) function that admits a simple proximal mapping. Using a diagonal Barzilai-Borwein stepsize to update the metric, we propose a variable metric proximal stochastic variance reduced gradient method in the mini-batch setting, named VM-SVRG. We prove that VM-SVRG converges sublinearly to a stationary point in expectation, and we further suggest a variant that achieves a linear convergence rate in expectation for nonconvex problems satisfying the proximal Polyak-Łojasiewicz inequality. The complexity of VM-SVRG is lower than that of the proximal gradient and proximal stochastic gradient methods, and matches that of the proximal stochastic variance reduced gradient method (Prox-SVRG). Numerical experiments on standard data sets and comparisons with other modern proximal stochastic gradient methods demonstrate the efficiency of the proposed method.
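The update at the heart of the method — a proximal step under a diagonal Barzilai-Borwein metric — can be sketched as follows. This is a minimal illustration under assumptions, not the paper's algorithm: the $\ell_1$ prox, the safeguarding bounds, and the componentwise BB fit are choices made here for concreteness.

```python
import numpy as np

def diag_bb_metric(s, y, lo=1e-4, hi=1e4):
    """Diagonal Barzilai-Borwein metric: componentwise fit of D s = y,
    safeguarded to keep each entry in [lo, hi] (bounds are assumed here)."""
    d = np.where(np.abs(s) > 1e-12, y / s, 1.0)
    return np.clip(d, lo, hi)

def prox_l1_diag(v, d, lam):
    """Prox of lam*||w||_1 under the metric diag(d):
    argmin_w 0.5*(w-v)^T diag(d) (w-v) + lam*||w||_1,
    which separates into componentwise soft-thresholding at lam/d_i."""
    t = lam / d
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def vm_prox_step(w, grad_est, s, y, lam):
    """One variable-metric proximal step with a variance-reduced gradient
    estimate grad_est: w+ = prox^{D}_{lam||.||_1}(w - D^{-1} grad_est)."""
    d = diag_bb_metric(s, y)
    return prox_l1_diag(w - grad_est / d, d, lam)
```

Because the metric is diagonal, both the metric update and the prox cost $\mathcal{O}(d)$ per iteration, which is what makes the variable-metric step affordable in the stochastic setting.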

Citation: Tengteng Yu, Xin-Wei Liu, Yu-Hong Dai, Jie Sun. Variable metric proximal stochastic variance reduced gradient methods for nonconvex nonsmooth optimization. Journal of Industrial and Management Optimization, doi: 10.3934/jimo.2021084
##### Figures:

- Comparison of VM-SVRG with different mini-batch sizes
- Comparison of VM-SVRG and other modern methods for solving the LR problem
- Comparison of VM-SVRG with different mini-batch sizes
- Comparison of VM-SVRG and mS2GD for solving the SVM problem
**Algorithm 2** PL-VM-SVRG($w^0, m, b, U_0$)

Input: number of inner iterations $m$, initial point $\tilde{w}_0 = w^0 \in \mathbb{R}^d$, initial metric $U_0$, mini-batch size $b \in \{1, 2, \ldots, n\}$.
1: for $s = 0, 1, \ldots, S-1$ do
2:     $w^{s+1} =$ VM-SVRG($w^s, m, b, U_0$).
3: end for
Output: $w^S$.
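Algorithm 2 is simply a restart loop around the inner VM-SVRG solver. A sketch of that outer structure, with a toy placeholder standing in for the inner solver (the real VM-SVRG epoch uses mini-batch variance-reduced gradients and the diagonal BB metric, which this sketch does not reproduce):

```python
import numpy as np

def vm_svrg(w, m, b, U0, step=0.1):
    """Placeholder for one epoch of the VM-SVRG inner solver.
    Here it just runs m plain gradient steps on the toy objective
    0.5*||w||^2 so the restart structure can be exercised."""
    for _ in range(m):
        w = w - step * w  # gradient of 0.5*||w||^2 is w
    return w

def pl_vm_svrg(w0, m, b, U0, S):
    """Algorithm 2: run S restarts of VM-SVRG, feeding each epoch's
    output back in as the next epoch's starting point."""
    w = w0
    for _ in range(S):
        w = vm_svrg(w, m, b, U0)
    return w
```

Under the proximal PL inequality, each restart contracts the objective gap by a fixed factor in expectation, which is how the outer loop turns VM-SVRG's sublinear guarantee into a linear rate.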
Comparison of the SFO and PO complexities

| Complexity | Prox-GD | Prox-SGD | Prox-SVRG | VM-SVRG |
| --- | --- | --- | --- | --- |
| SFO | $\mathcal{O}(n/\epsilon)$ | $\mathcal{O}(1/\epsilon^2)$ | $\mathcal{O}(n+n^{2/3}/\epsilon)$ | $\mathcal{O}(n+n^{2/3}/\epsilon)$ |
| PO | $\mathcal{O}(1/\epsilon)$ | $\mathcal{O}(1/\epsilon)$ | $\mathcal{O}(1/\epsilon)$ | $\mathcal{O}(1/\epsilon)$ |
| SFO (PL) | $\mathcal{O}(n\kappa\log(1/\epsilon))$ | $\mathcal{O}(1/\epsilon^2)$ | $\mathcal{O}((n+\kappa n^{2/3})\log(1/\epsilon))$ | $\mathcal{O}((n+\kappa n^{2/3})\log(1/\epsilon))$ |
| PO (PL) | $\mathcal{O}(\kappa\log(1/\epsilon))$ | $\mathcal{O}(1/\epsilon)$ | $\mathcal{O}(\kappa\log(1/\epsilon))$ | $\mathcal{O}(\kappa\log(1/\epsilon))$ |
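The gap between the SFO bounds can be made concrete with a quick back-of-the-envelope computation (constants are dropped, so the numbers are only an order-of-magnitude illustration):

```python
def sfo_prox_gd(n, eps):
    # Prox-GD: O(n / eps) stochastic first-order oracle calls
    return n / eps

def sfo_prox_sgd(eps):
    # Prox-SGD: O(1 / eps^2), independent of n
    return 1.0 / eps ** 2

def sfo_vm_svrg(n, eps):
    # Prox-SVRG / VM-SVRG: O(n + n^(2/3) / eps)
    return n + n ** (2.0 / 3.0) / eps
```

For $n = 10^6$ and $\epsilon = 10^{-5}$, the VM-SVRG bound is roughly $10^9$, about two orders of magnitude below Prox-GD's $n/\epsilon = 10^{11}$, because the $1/\epsilon$ term is scaled by $n^{2/3}$ rather than $n$.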
Information on the data sets

| Data sets | $n$ | $d$ |
| --- | --- | --- |
| ijcnn1 | 49,990 | 22 |
| rcv1 | 20,242 | 47,236 |
| real-sim | 72,309 | 20,958 |
| covtype | 581,012 | 54 |
Best choices of parameters for the methods
| Data sets | mS2GD $(m,\eta)$ | mS2GD-BB $m$ | mSARAH $(m,\eta)$ | mSARAH-BB $m$ | VM-SVRG $m$ |
| --- | --- | --- | --- | --- | --- |
| ijcnn1 | $(0.02n,\frac{4}{L})$ | $0.04n$ | $(0.05n,\frac{1.8}{L})$ | $0.04n$ | $0.04n$ |
| rcv1 | $(0.1n,\frac{4}{L})$ | $0.11n$ | $(0.1n,\frac{3.5}{L})$ | $0.09n$ | $0.25n$ |
| real-sim | $(0.12n,\frac{0.6}{L})$ | $0.15n$ | $(0.07n,\frac{2}{L})$ | $0.06$ | $0.11n$ |
| covtype | $(0.07n,\frac{21}{L})$ | $0.03n$ | $(0.07n,\frac{25}{L})$ | $0.008n$ | $0.01n$ |

2020 Impact Factor: 1.801