
Improved SVRG for finite sum structure optimization with application to binary classification

Abstract
  • This paper studies a stochastic variance reduced gradient (SVRG) method for minimizing the sum of a finite number of smooth convex functions, a problem that arises widely in machine learning and data mining. Inspired by the excellent performance of the two-point stepsize gradient method in batch learning, we present an improved SVRG algorithm, named the stochastic two-point stepsize gradient method. Under some mild conditions, the proposed method achieves a linear convergence rate $ O(\rho^k) $ for smooth and strongly convex functions, where $ \rho\in(0.68, 1) $. Simulation experiments on several benchmark data sets demonstrate the performance of the proposed method.

    Mathematics Subject Classification: Primary: 65K05; Secondary: 68W27.

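The abstract's key ingredient, a two-point (Barzilai–Borwein) step size used inside an SVRG loop, can be illustrated with a short sketch. The paper's exact update rules are not reproduced on this page, so the code below is only an approximation in the spirit of SVRG with a BB step size: the function names, the inner-loop length, and the parameter values (`eta0`, `lam`, `epochs`) are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def reg_logistic_grad(w, X, y, lam):
    """Gradient of the l2-regularized logistic loss; labels y are in {-1, +1}."""
    z = np.clip(y * (X @ w), -30.0, 30.0)     # clip to avoid overflow in exp
    s = 1.0 / (1.0 + np.exp(z))               # = sigmoid(-y * Xw)
    return -(X.T @ (y * s)) / len(y) + lam * w

def svrg_bb(X, y, lam=1e-3, epochs=15, eta0=0.5, seed=0):
    """SVRG whose step size is refreshed each epoch by the two-point (BB) rule."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    eta = eta0                                 # initial step size, used until the
    w_prev = g_prev = None                     # first BB estimate is available
    for _ in range(epochs):
        mu = reg_logistic_grad(w, X, y, lam)   # full gradient at the snapshot
        if w_prev is not None:
            s_k, y_k = w - w_prev, mu - g_prev
            curv = s_k @ y_k
            if curv > 1e-12:                   # BB step, scaled by inner-loop length
                eta = (s_k @ s_k) / (n * curv)
        w_prev, g_prev = w, mu
        w_t = w.copy()
        for _ in range(n):                     # one pass of variance-reduced steps
            i = rng.integers(n)
            Xi, yi = X[i:i+1], y[i:i+1]
            g = reg_logistic_grad(w_t, Xi, yi, lam)
            g_snap = reg_logistic_grad(w, Xi, yi, lam)
            w_t -= eta * (g - g_snap + mu)     # SVRG-corrected stochastic gradient
        w = w_t
    return w
```

On a small linearly separable problem (e.g. `y = np.sign(X @ w_true)` for random `X`), a few epochs of this sketch already drive the training classification accuracy close to 1, mirroring the qualitative behavior reported in the figures below.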
  • Figure 1.  Objective loss and test classification accuracy versus epochs on data set "a9a" (best viewed in color)

    Figure 2.  Objective loss and test classification accuracy versus epochs on data set "w8a"

    Figure 3.  Evolutions of objective loss and test classification accuracy w.r.t. epochs for Eq. (14) on data set "a9a"

    Figure 4.  Evolutions of objective loss and test classification accuracy w.r.t. epochs for Eq. (14) on data set "w8a"

    Figure 5.  Evolutions of objective loss and test classification accuracy w.r.t. epochs for Eq. (14) on data set "a1a"

    Figure 6.  Evolutions of objective loss and test classification accuracy w.r.t. epochs for Eq. (14) on data set "w1a"

    Figure 7.  Evolutions of objective loss and test classification accuracy versus epochs on data set "a9a" with different $ \eta_0 $

    Figure 8.  Evolutions of objective loss and test classification accuracy versus epochs on data set "w8a" with different $ \eta_0 $

    Figure 9.  Comparison results between SVM and STSG on test classification accuracy on four test data sets

    Table 1.  Details of data sets

    Data set    # of training samples    # of test samples    Dimension
    a1a         1,605                    30,956               123
    a9a         32,561                   16,281               123
    w1a         2,477                    47,272               300
    w8a         49,749                   14,951               300
