Communications on Pure and Applied Analysis, August 2020, 19(8): 4069-4083. doi: 10.3934/cpaa.2020180

Quantitative convergence analysis of kernel based large-margin unified machines

Jun Fan 1 and Dao-Hong Xiang 2

1. Department of Mathematics, Hong Kong Baptist University, Kowloon, Hong Kong, China
2. Department of Mathematics, Zhejiang Normal University, Jinhua, Zhejiang 321004, China

* Corresponding author

Received: August 2019. Revised: September 2019. Published: May 2020.

Fund Project: The work by J. Fan is partially supported by the Hong Kong RGC ECS grant 22303518, the HKBU FRG grant FRG2/17-18/091, and the National Natural Science Foundation of China (No. 11801478). The work by D. H. Xiang is supported by the National Natural Science Foundation of China under Grants 11871438 and 11771120.

High-dimensional binary classification has been intensively studied in the machine learning community over the last few decades. The support vector machine (SVM), one of the most popular classifiers, depends only on a portion of the training samples, called support vectors, which leads to suboptimal performance in the high-dimension, low-sample-size (HDLSS) setting. Large-margin unified machines (LUMs) are a family of margin-based classifiers proposed to solve the so-called "data piling" problem inherent in SVM under HDLSS settings. In this paper we study binary classification algorithms associated with LUM loss functions in the framework of reproducing kernel Hilbert spaces. We carry out a quantitative convergence analysis for these algorithms by means of a novel application of projection operators, which overcomes the main technical difficulty. The learning rates are derived explicitly under a priori conditions on the approximation ability and capacity of the reproducing kernel Hilbert space.
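The LUM loss family mentioned above has the closed form given by Liu, Zhang and Wu [14]: with index parameters a > 0 and c >= 0, V(u) = 1 - u for u < c/(1+c) and V(u) = (1/(1+c)) (a/((1+c)u - c + a))^a otherwise, so that c -> infinity recovers the SVM hinge loss and c = 0 gives a DWD-type loss. The sketch below is a minimal illustration, not the authors' implementation: the loss and its derivative follow [14], while the Gaussian kernel, the plain gradient-descent solver, and all parameter values (a, c, lambda, sigma, step size) are assumptions chosen only for demonstration.

```python
# Minimal sketch (assumptions noted above): kernel-based LUM classification by
# gradient descent on the regularized empirical risk
#   (1/m) * sum_i V(y_i f(x_i)) + lam * ||f||_K^2,  f = sum_j alpha_j K(., x_j).
import numpy as np

def lum_loss(u, a=1.0, c=1.0):
    """LUM loss V(u) with index a > 0 and c >= 0 (Liu, Zhang and Wu [14])."""
    u = np.asarray(u, dtype=float)
    # The clamp keeps the (unused) smooth branch finite for very negative u;
    # for u >= c/(1+c) the argument (1+c)u - c + a is >= a > 0, so no effect.
    denom = np.maximum((1.0 + c) * u - c + a, 1e-12)
    smooth = (1.0 / (1.0 + c)) * (a / denom) ** a
    return np.where(u < c / (1.0 + c), 1.0 - u, smooth)

def lum_grad(u, a=1.0, c=1.0):
    """Derivative V'(u); both branches equal -1 at u = c/(1+c), so V is C^1."""
    u = np.asarray(u, dtype=float)
    denom = np.maximum((1.0 + c) * u - c + a, 1e-12)
    return np.where(u < c / (1.0 + c), -1.0, -a ** (a + 1) * denom ** (-(a + 1)))

def gaussian_kernel(X, Z, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_kernel_lum(X, y, lam=0.1, a=1.0, c=1.0, sigma=1.0, lr=0.1, n_iter=2000):
    """Gradient descent on the regularized LUM risk over the RKHS span."""
    m = len(y)
    K = gaussian_kernel(X, X, sigma)
    alpha = np.zeros(m)
    for _ in range(n_iter):
        f = K @ alpha                      # f(x_i) at the training points
        g = lum_grad(y * f, a, c)          # V'(y_i f(x_i))
        # d/d(alpha) of (1/m) sum_i V(y_i (K alpha)_i) + lam alpha^T K alpha
        alpha -= lr * ((K @ (y * g)) / m + 2.0 * lam * (K @ alpha))
    return alpha

# Toy usage: labels given by the sign of the first coordinate plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=60))
alpha = fit_kernel_lum(X, y)
f_train = gaussian_kernel(X, X) @ alpha
# The paper's error analysis applies a projection (clipping f to [-1, 1]);
# the induced classifier sign(f) is unchanged by that projection.
print("training accuracy:", np.mean(np.sign(f_train) == y))
```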

Citation: Jun Fan, Dao-Hong Xiang. Quantitative convergence analysis of kernel based large-margin unified machines. Communications on Pure and Applied Analysis, 2020, 19 (8) : 4069-4083. doi: 10.3934/cpaa.2020180
References:
[1]

N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc., 68 (1950), 337-404.  doi: 10.2307/1990404.

[2]

P. L. Bartlett, The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network, IEEE Trans. Inform. Theory, 44 (1998), 525-536.  doi: 10.1109/18.661502.

[3]

P. L. Bartlett, M. I. Jordan and J. D. McAuliffe, Convexity, classification, and risk bounds, J. Amer. Statist. Assoc., 101 (2006), 138-156.  doi: 10.1198/016214505000000907.

[4]

B. E. Boser, I. Guyon and V. Vapnik, A training algorithm for optimal margin classifiers, in Computational Learning Theory, Madison, WI, ACM, (1992), 144–152.

[5]

D. R. Chen, Q. Wu, Y. Ying and D. X. Zhou, Support vector machine soft margin classifiers: error analysis, J. Mach. Learn. Res., 5 (2004), 1143-1175.

[6]

C. Cortes and V. Vapnik, Support-vector networks, Mach. Learn., 20 (1995), 273-297.

[7] F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, Cambridge, 2007.  doi: 10.1017/CBO9780511618796.
[8] D. E. Edmunds and H. Triebel, Function Spaces, Entropy Numbers, Differential Operators, Cambridge University Press, Cambridge, 1996.  doi: 10.1017/CBO9780511662201.
[9]

J. Fan and D. H. Xiang, Comparison theorems on large margin learning, Technical report, 2019.

[10]

Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci., 55 (1997), 119-139.  doi: 10.1006/jcss.1997.1504.

[11]

J. Friedman, T. Hastie and R. Tibshirani, Additive logistic regression: a statistical view of boosting (with discussion), Ann. Statist., 28 (2000), 337-407.  doi: 10.1214/aos/1016218223.

[12]

T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York, 2001. doi: 10.1007/978-0-387-21606-5.

[13]

Y. W. Lei, U. Dogan, D. X. Zhou and M. Kloft, Data-dependent generalization bounds for multi-class classification, IEEE Trans. Inform. Theory, 65 (2019), 2995-3021.  doi: 10.1109/TIT.2019.2893916.

[14]

Y. F. Liu, H. H. Zhang and Y. C. Wu, Hard or soft classification? Large-margin unified machines, J. Amer. Statist. Assoc., 106 (2011), 166-177.  doi: 10.1198/jasa.2011.tm10319.

[15]

J. S. Marron, M. Todd and J. Ahn, Distance weighted discrimination, J. Amer. Statist. Assoc., 102 (2007), 1267-1271.  doi: 10.1198/016214507000001120.

[16]

S. Smale and D. X. Zhou, Online learning with Markov sampling, Anal. Appl., 7 (2009), 87-113.  doi: 10.1142/S0219530509001293.

[17]

S. Smale and D. X. Zhou, Learning theory estimates via integral operators and their approximations, Constr. Approx., 26 (2007), 153-172.  doi: 10.1007/s00365-006-0659-y.

[18]

I. Steinwart, On the influence of the kernel on the consistency of support vector machines, J. Mach. Learn. Res., 2 (2001), 67-93.  doi: 10.1162/153244302760185252.

[19]

I. Steinwart and A. Christmann, Estimating conditional quantiles with the help of the pinball loss, Bernoulli, 17 (2011), 211-225.  doi: 10.3150/10-BEJ267.

[20]

I. Steinwart and C. Scovel, Fast rates for support vector machines using Gaussian kernels, Ann. Statist., 35 (2007), 575-607.  doi: 10.1214/009053606000001226.

[21]

A. B. Tsybakov, Optimal aggregation of classifiers in statistical learning, Ann. Statist., 32 (2004), 135-166.  doi: 10.1214/aos/1079120131.

[22]

G. Wahba, Spline Models for Observational Data, SIAM, 1990. doi: 10.1137/1.9781611970128.

[23]

C. Wang and T. Hu, Online minimum error entropy algorithm with unbounded sampling, Anal. Appl., 17 (2019), 293-322.  doi: 10.1142/S0219530518500148.

[24]

B. X. Wang and H. Zou, Another look at distance-weighted discrimination, J. R. Statist. Soc. Ser. B, 80 (2018), 177-198.  doi: 10.1111/rssb.12244.

[25]

R. Williamson, A. J. Smola and B. Schölkopf, Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators, IEEE Trans. Inform. Theory, 47 (2001), 2516-2532.  doi: 10.1109/18.945262.

[26]

Q. Wu, Classification and Regularization in Learning Theory, VDM Verlag, 2009.

[27]

Q. Wu, Y. M. Ying and D. X. Zhou, Multi-kernel regularized classifiers, J. Complexity, 23 (2007), 108-134.  doi: 10.1016/j.jco.2006.06.007.

[28]

Q. Wu, Y. Ying and D. X. Zhou, Learning rates of least-square regularized regression, Found. Comput. Math., 6 (2006), 171-192.  doi: 10.1007/s10208-004-0155-9.

[29]

Q. Wu and D. X. Zhou, SVM soft margin classifiers: linear programming versus quadratic programming, Neural Comput., 17 (2005), 1160-1187.  doi: 10.1162/0899766053491896.

[30]

Q. Wu and D. X. Zhou, Analysis of support vector machine classification, J. Comput. Anal. Appl., 8 (2006), 99-119.

[31]

D. H. Xiang, Classification with Gaussians and convex loss II: improving error bounds by noise conditions, Sci. China Math., 54 (2011), 165-171.  doi: 10.1007/s11425-010-4043-2.

[32]

D. H. Xiang, Logistic classification with varying Gaussians, Comput. Math. Appl., 61 (2011), 397-407.  doi: 10.1016/j.camwa.2010.11.016.

[33]

D. H. Xiang, Conditional quantiles with varying Gaussians, Adv. Comput. Math., 38 (2013), 723-735.  doi: 10.1007/s10444-011-9257-5.

[34]

D. H. Xiang and D. X. Zhou, Classification with Gaussians and convex loss, J. Mach. Learn. Res., 10 (2009), 1447-1468.

[35]

D. H. Xiang, T. Hu and D. X. Zhou, Approximation analysis of learning algorithms for support vector regression and quantile regression, J. Appl. Math., 2012 (2012), 17 pp. doi: 10.1155/2012/902139.

[36]

Y. Ying and D. X. Zhou, Learnability of Gaussians with flexible variances, J. Mach. Learn. Res., 8 (2007), 249-276.

[37]

T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, Ann. Statist., 32 (2004), 56-85.  doi: 10.1214/aos/1079120130.

[38]

Y. L. Zhao, J. Fan and L. Shi, Learning rates for regularized least squares ranking algorithm, Anal. Appl., 15 (2017), 815-836.  doi: 10.1142/S0219530517500063.

[39]

D. X. Zhou, The covering number in learning theory, J. Complexity, 18 (2002), 739-767.  doi: 10.1006/jcom.2002.0635.

[40]

D. X. Zhou, Capacity of reproducing kernel spaces in learning theory, IEEE Trans. Inform. Theory, 49 (2003), 1743-1752.  doi: 10.1109/TIT.2003.813564.
