Improved SVRG for finite sum structure optimization with application to binary classification
1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China
2. School of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China
This paper considers a stochastic variance reduced gradient (SVRG) method for minimizing the sum of a finite number of smooth convex functions, a problem that arises widely in machine learning and data mining. Inspired by the excellent performance of the two-point stepsize gradient method in batch learning, we present an improved SVRG algorithm, named the stochastic two-point stepsize gradient method. Under mild conditions, the proposed method achieves a linear convergence rate $ O(\rho^k) $ for smooth and strongly convex functions, where $ \rho\in(0.68, 1) $. Simulation experiments on several benchmark data sets demonstrate the performance of the proposed method.
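To illustrate the idea of combining SVRG with a two-point (Barzilai-Borwein) stepsize, the sketch below applies it to L2-regularized logistic regression for binary classification. This is a minimal illustration in the spirit of the method described, not the paper's exact algorithm: the epoch length `m = 2n`, the initial stepsize `eta0`, and the regularization weight `lam` are assumptions chosen for the example.

```python
import numpy as np

def svrg_bb(X, y, lam=1e-3, m=None, eta0=0.1, epochs=20, seed=0):
    """SVRG with a Barzilai-Borwein (two-point) stepsize for
    L2-regularized logistic regression. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = m or 2 * n                     # epoch length (common heuristic)
    w = np.zeros(d)
    eta = eta0                         # stepsize for the first epoch
    w_prev, g_prev = None, None

    def full_grad(v):
        # gradient of (1/n) sum_i log(1 + exp(-y_i x_i'v)) + (lam/2)||v||^2
        s = 1.0 / (1.0 + np.exp(y * (X @ v)))
        return X.T @ (-y * s) / n + lam * v

    for _ in range(epochs):
        mu = full_grad(w)              # full gradient at the snapshot
        if w_prev is not None:         # two-point (BB) stepsize update
            s_k, y_k = w - w_prev, mu - g_prev
            denom = s_k @ y_k
            if abs(denom) > 1e-12:
                eta = (s_k @ s_k) / (m * denom)
        w_prev, g_prev = w.copy(), mu.copy()

        w_t = w.copy()
        for _ in range(m):
            i = rng.integers(n)
            xi, yi = X[i], y[i]

            def gi(v):                 # stochastic gradient of component i
                s = 1.0 / (1.0 + np.exp(yi * (xi @ v)))
                return -yi * s * xi + lam * v

            # variance-reduced step: g_i(w_t) - g_i(snapshot) + full grad
            w_t -= eta * (gi(w_t) - gi(w) + mu)
        w = w_t
    return w
```

The stepsize rule `eta = ||s_k||^2 / (m * s_k'y_k)` uses two consecutive snapshots and their full gradients, so it adapts automatically to the local curvature instead of requiring a hand-tuned decaying schedule.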
References:
[1] J. Barzilai and J. M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal., 8 (1988), 141-148. doi: 10.1093/imanum/8.1.141.
[2] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, New York, 2004. doi: 10.1017/CBO9780511804441.
[3] Y. Chen, L. Qi and Q. Wang, Computing extreme eigenvalues of large scale Hankel tensors, J. Sci. Comput., 68 (2016), 716-738. doi: 10.1007/s10915-015-0155-8.
[4] W. Cheng and Y. Dai, Gradient-based method with active set strategy for l1 optimization, Math. Comp., 87 (2018), 1283-1305. doi: 10.1090/mcom/3238.
[5] Y. Dai and R. Fletcher, On the asymptotic behaviour of some new gradient methods, Math. Program., 103 (2005), 541-559. doi: 10.1007/s10107-004-0516-9.
[6] Y. Dai and R. Fletcher, Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming, Numer. Math., 100 (2005), 21-47. doi: 10.1007/s00211-004-0569-y.
[7] R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Adv. Neural Inf. Process. Syst., (2013), 315-323.
[8] H. Liu, Z. Liu and X. Dong, A new adaptive Barzilai and Borwein method for unconstrained optimization, Optim. Lett., 12 (2018), 845-873. doi: 10.1007/s11590-017-1150-9.
[9] J. Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning, SIAM J. Optim., 25 (2015), 829-855. doi: 10.1137/140957639.
[10] E. Min, Y. Zhao, J. Long, C. Wu, K. Li and J. Yin, SVRG with adaptive epoch size, in Proc. IEEE Int. Joint Conf. Neural Netw., (2017), 2935-2942. doi: 10.1109/IJCNN.2017.7966219.
[11] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer, Boston, 2004. doi: 10.1007/978-1-4419-8853-9.
[12] A. Nitanda, Accelerated stochastic gradient descent for minimizing finite sums, in Proc. Int. Conf. Artif. Intell. Statist., (2016), 195-203.
[13] A. Nitanda, Stochastic proximal gradient descent with acceleration techniques, in Adv. Neural Inf. Process. Syst., (2016), 1574-1582.
[14] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, John Wiley and Sons, New Jersey, 2007. doi: 10.1002/9780470182963.
[15] M. Raydan, On the Barzilai and Borwein choice of steplength for the gradient method, IMA J. Numer. Anal., 13 (1993), 321-326. doi: 10.1093/imanum/13.3.321.
[16] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., 22 (1951), 400-407. doi: 10.1214/aoms/1177729586.
[17] N. L. Roux, M. Schmidt and F. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, in Adv. Neural Inf. Process. Syst., (2012), 2663-2671.
[18] Z. Shen, H. Qian, T. Zhou and T. Mu, Adaptive variance reducing for stochastic gradient descent, in Proc. Int. Joint Conf. Artif. Intell., (2016), 1990-1996.
[19] C. Tan, S. Ma, Y. Dai and Y. Qian, Barzilai-Borwein step size for stochastic gradient descent, in Adv. Neural Inf. Process. Syst., (2016), 685-693.
[20] Z. Wen, C. Yang, X. Liu and Y. Zhang, Trace-penalty minimization for large-scale eigenspace computation, J. Sci. Comput., 66 (2016), 1175-1203. doi: 10.1007/s10915-015-0061-0.
[21] L. Xiao and T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM J. Optim., 24 (2014), 2057-2075. doi: 10.1137/140961791.
[22] W. Xue, W. Zhang and J. Ren, Online learning based on stochastic spectral gradient, Comput. Sci., 43 (2016), 47-51.
[23] G. Yu and S. Niu, Multivariate spectral gradient projection method for nonlinear monotone equations with convex constraints, J. Ind. Manag. Optim., 9 (2013), 117-129. doi: 10.3934/jimo.2013.9.117.
[24] B. Zhou, L. Gao and Y. Dai, Gradient methods with adaptive step-sizes, Comput. Optim. Appl., 35 (2006), 69-86. doi: 10.1007/s10589-006-6446-0.
[25] R. Zhou, X. Shen and L. Niu, A fast algorithm for nonsmooth penalized clustering, Neurocomputing, 273 (2018), 583-592. doi: 10.1016/j.neucom.2017.08.048.
Benchmark data sets used in the experiments (these are the standard LIBSVM binary classification sets; the column headers below are reconstructed from their well-known training size, testing size, and feature dimension):
Data set | # training samples | # testing samples | # features
a1a | 1605 | 30956 | 123
a9a | 32561 | 16281 | 123
w1a | 2477 | 47272 | 300
w8a | 49749 | 14951 | 300