September 2020, 16(5): 2253-2266. doi: 10.3934/jimo.2019052

Improved SVRG for finite sum structure optimization with application to binary classification

Guangmei Shao, Wei Xue, Gaohang Yu and Xiao Zheng

1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China
2. School of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China

* Corresponding author: Wei Xue (E-mail: cswxue@ahut.edu.cn)

Received: May 2018. Revised: November 2018. Published: May 2019.

This paper studies a stochastic variance reduced gradient (SVRG) method for minimizing the sum of a finite number of smooth convex functions, a problem that arises widely in machine learning and data mining. Motivated by the strong performance of the two-point stepsize gradient method in batch learning, we present an improved SVRG algorithm, called the stochastic two-point stepsize gradient method. Under mild conditions, the proposed method achieves a linear convergence rate $ O(\rho^k) $ for smooth and strongly convex functions, where $ \rho\in(0.68, 1) $. Simulation experiments on several benchmark data sets demonstrate the performance of the proposed method.

Citation: Guangmei Shao, Wei Xue, Gaohang Yu, Xiao Zheng. Improved SVRG for finite sum structure optimization with application to binary classification. Journal of Industrial & Management Optimization, 2020, 16 (5) : 2253-2266. doi: 10.3934/jimo.2019052
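To make the algorithmic idea in the abstract concrete, the following minimal Python sketch shows an SVRG-type loop whose stepsize is refreshed at every snapshot from a two-point (Barzilai-Borwein-style) quotient, applied here to $\ell_2$-regularized logistic regression for binary classification. It is an illustrative reconstruction under these assumptions, not the exact STSG update analyzed in the paper; all names, default parameters, and the inner-loop length are hypothetical choices.

import numpy as np

def full_gradient(w, X, y, lam):
    # Gradient of (1/n) * sum_i log(1 + exp(-y_i x_i^T w)) + (lam/2)||w||^2,
    # with labels y_i in {-1, +1} and X a dense (n, d) array.
    z = y * (X @ w)
    coef = -y / (1.0 + np.exp(z))
    return X.T @ coef / X.shape[0] + lam * w

def svrg_two_point(X, y, lam=1e-4, eta0=0.1, epochs=20, seed=0):
    # Standard SVRG outer/inner structure; only the stepsize rule differs:
    # eta is recomputed at each snapshot from the two-point quotient
    # ||s||^2 / (m * s^T t) built from successive snapshots and full gradients.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = n                                   # inner-loop length: one pass over the data
    w_snap = np.zeros(d)
    prev_w = prev_g = None
    eta = eta0
    for _ in range(epochs):
        g_snap = full_gradient(w_snap, X, y, lam)      # full gradient at the snapshot
        if prev_w is not None:
            s, t = w_snap - prev_w, g_snap - prev_g
            if s @ t > 1e-12:
                eta = (s @ s) / (m * (s @ t))          # two-point (BB-type) stepsize
        prev_w, prev_g = w_snap.copy(), g_snap.copy()
        w = w_snap.copy()
        for _ in range(m):
            i = rng.integers(n)
            xi, yi = X[i], y[i]
            gi = -yi * xi / (1.0 + np.exp(yi * (xi @ w))) + lam * w
            gi_snap = -yi * xi / (1.0 + np.exp(yi * (xi @ w_snap))) + lam * w_snap
            w -= eta * (gi - gi_snap + g_snap)         # variance-reduced update
        w_snap = w
    return w_snap

The inner step uses the usual SVRG correction, i.e. $x \leftarrow x - \eta\,(\nabla f_i(x) - \nabla f_i(\tilde{x}) + \nabla F(\tilde{x}))$; plain SVRG would keep $\eta$ fixed, whereas here it is adapted once per epoch.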
Figure 1.  Objective loss and test classification accuracy versus epochs on data set "a9a" (best viewed in color)
Figure 2.  Objective loss and test classification accuracy versus epochs on data set "w8a"
Figure 3.  Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "a9a"
Figure 4.  Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "w8a"
Figure 5.  Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "a1a"
Figure 6.  Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "w1a"
Figure 7.  Evolution of objective loss and test classification accuracy versus epochs on data set "a9a" with different $ \eta_0 $
Figure 8.  Evolution of objective loss and test classification accuracy versus epochs on data set "w8a" with different $ \eta_0 $
Figure 9.  Comparison of test classification accuracy between SVM and STSG on four test data sets
Table 1.  Details of data sets
Data set | # of training samples | # of test samples | # of dimensions
a1a      | 1,605                 | 30,956            | 123
a9a      | 32,561                | 16,281            | 123
w1a      | 2,477                 | 47,272            | 300
w8a      | 49,749                | 14,951            | 300
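The four data sets in Table 1 are standard LIBSVM binary-classification benchmarks. As a minimal sketch (the local file names "a9a" and "a9a.t" are assumptions about how the downloaded files are stored), they can be loaded with scikit-learn's load_svmlight_file and checked against the counts above:

from sklearn.datasets import load_svmlight_file

# Passing n_features keeps the train and test matrices dimensionally consistent.
X_train, y_train = load_svmlight_file("a9a", n_features=123)
X_test, y_test = load_svmlight_file("a9a.t", n_features=123)
print(X_train.shape, X_test.shape)   # expected: (32561, 123) (16281, 123)

The returned matrices are sparse and the labels are already in {-1, +1}, which matches the logistic-regression setting used in the sketch above.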