doi: 10.3934/mfc.2022028
Online First


Error analysis of classification learning algorithms based on LUMs loss

Xuqing He and Hongwei Sun*

School of Mathematical Science, University of Jinan, Jinan 250022, China

*Corresponding author: Hongwei Sun

Received: April 2022. Revised: July 2022. Early access: August 2022.

Fund Project: The second author is supported by the National Natural Science Foundation of China (Grants No. 11671171 and 11871167).

In this paper, we study the learning performance of regularized large-margin unified machines (LUMs) for classification problems. The hypothesis space is taken to be a reproducing kernel Hilbert space $ {\mathcal H}_K $, and the penalty term is the norm of the function in $ {\mathcal H}_K $. Since the LUM loss functions are differentiable and convex, the data piling phenomenon can be avoided when dealing with high-dimension, low-sample-size data. The error analysis of this classification learning machine rests mainly on the comparison theorem of [3], which ensures that the excess classification error can be bounded by the excess generalization error. Under a mild source condition, which requires that the minimizer $ f_V $ of the generalization error can be approximated in the hypothesis space $ {\mathcal H}_K $, and by means of a leave-one-out variant technique proposed in [13], a satisfactory error bound and learning rates for the mean of the excess classification error are derived.
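To make the scheme concrete, the following is a minimal numerical sketch, not the authors' implementation. It assumes the LUM loss of [8], namely $ V(u) = 1-u $ for $ u < \frac{c}{1+c} $ and $ V(u) = \frac{1}{1+c}\left(\frac{a}{(1+c)u-c+a}\right)^{a} $ for $ u \ge \frac{c}{1+c} $ (with parameters $ a > 0 $ and $ c \ge 0 $), together with the kernel-regularized empirical risk $ f_{\mathbf z} = \arg\min_{f\in{\mathcal H}_K}\big\{\frac{1}{m}\sum_{i = 1}^m V(y_if(x_i)) + \lambda\|f\|_K^2\big\} $. By the representer theorem, $ f = \sum_j\alpha_jK(x_j,\cdot) $, so plain gradient descent on the coefficient vector $ \alpha $ suffices. The Gaussian kernel, the toy data, and all parameter values below are illustrative choices, not taken from the paper.

```python
import numpy as np

def lum_loss(u, a=1.0, c=1.0):
    """LUM loss V(u) of Liu, Zhang and Wu [8] at margin u = y*f(x).

    V(u) = 1 - u                                     if u <  c/(1+c),
    V(u) = (1/(1+c)) * (a / ((1+c)*u - c + a))**a    if u >= c/(1+c).
    """
    u = np.asarray(u, dtype=float)
    denom = np.maximum((1.0 + c) * u - c + a, 1e-12)  # positive on the active branch
    soft = (1.0 / (1.0 + c)) * (a / denom) ** a
    return np.where(u < c / (1.0 + c), 1.0 - u, soft)

def lum_grad(u, a=1.0, c=1.0):
    """Derivative V'(u); both branches equal -1 at u = c/(1+c), so V is C^1."""
    u = np.asarray(u, dtype=float)
    denom = np.maximum((1.0 + c) * u - c + a, 1e-12)
    return np.where(u < c / (1.0 + c), -1.0, -(a / denom) ** (a + 1.0))

def gaussian_kernel(X, Z, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-|x_i - z_j|^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_lum(X, y, lam=0.1, sigma=1.0, a=1.0, c=1.0, lr=0.05, n_iter=2000):
    """Gradient descent on (1/m) sum_i V(y_i f(x_i)) + lam * ||f||_K^2
    over the representer coefficients alpha, where f = sum_j alpha_j K(x_j, .)."""
    m = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    alpha = np.zeros(m)
    for _ in range(n_iter):
        u = y * (K @ alpha)                               # margins y_i f(x_i)
        # gradient of the empirical risk plus gradient 2*lam*K*alpha of the penalty
        grad = K @ (y * lum_grad(u, a, c)) / m + 2.0 * lam * (K @ alpha)
        alpha -= lr * grad
    return alpha

# Toy illustration: labels given by the sign of the first coordinate plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=40))
alpha = fit_lum(X, y)
pred = np.sign(gaussian_kernel(X, X) @ alpha)             # sign(f) is the classifier
print("training accuracy:", (pred == y).mean())
```

Because the LUM loss is differentiable everywhere (unlike the hinge loss), `lum_grad` is well defined at every margin value; this smoothness is exactly the property the abstract invokes to avoid data piling on high-dimension, low-sample-size data.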

Citation: Xuqing He, Hongwei Sun. Error analysis of classification learning algorithms based on LUMs loss. Mathematical Foundations of Computing, doi: 10.3934/mfc.2022028
References:

[1] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc., 68 (1950), 337-404. doi: 10.1090/S0002-9947-1950-0051437-7.
[2] P. L. Bartlett, M. I. Jordan and J. D. McAuliffe, Convexity, classification, and risk bounds, J. Amer. Statist. Assoc., 101 (2006), 138-156. doi: 10.1198/016214505000000907.
[3] A. Benabid, J. Fan and D.-H. Xiang, Comparison theorems on large-margin learning, Int. J. Wavelets Multiresolut. Inf. Process., 19 (2021), Paper No. 2150015, 18 pp. doi: 10.1142/S0219691321500156.
[4] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[5] F. Critchley and C. Vitiello, The influence of observations on misclassification probability estimates in linear discriminant analysis, Biometrika, 78 (1991), 677-690. doi: 10.1093/biomet/78.3.677.
[6] J. Fan and D.-H. Xiang, Quantitative convergence analysis of kernel based large-margin unified machines, Commun. Pure Appl. Anal., 19 (2020), 4069-4083. doi: 10.3934/cpaa.2020180.
[7] S. Huang, Y. Feng and Q. Wu, Learning theory of minimum error entropy under weak moment conditions, Anal. Appl. (Singap.), 20 (2022), 121-139. doi: 10.1142/S0219530521500044.
[8] Y. Liu, H. H. Zhang and Y. Wu, Hard or soft classification? Large-margin unified machines, J. Amer. Statist. Assoc., 106 (2011), 166-177. doi: 10.1198/jasa.2011.tm10319.
[9] J. S. Marron, M. J. Todd and J. Ahn, Distance-weighted discrimination, J. Amer. Statist. Assoc., 102 (2007), 1267-1271. doi: 10.1198/016214507000001120.
[10] S. Smale and D.-X. Zhou, Learning theory estimates via integral operators and their approximations, Constr. Approx., 26 (2007), 153-172. doi: 10.1007/s00365-006-0659-y.
[11] I. Steinwart, Consistency of support vector machines and other regularized kernel classifiers, IEEE Trans. Inform. Theory, 51 (2005), 128-142. doi: 10.1109/TIT.2004.839514.
[12] I. Steinwart and A. Christmann, Estimating conditional quantiles with the help of the pinball loss, Bernoulli, 17 (2011), 211-225. doi: 10.3150/10-BEJ267.
[13] H. W. Sun and Q. Wu, Optimal rates of distributed regression with imperfect kernels, J. Mach. Learn. Res., 22 (2021), Paper No. 171, 34 pp.
[14] A. B. Tsybakov, Optimal aggregation of classifiers in statistical learning, Ann. Statist., 32 (2004), 135-166. doi: 10.1214/aos/1079120131.
[15] B. Wang and H. Zou, Another look at distance weighted discrimination, J. R. Stat. Soc. Ser. B. Stat. Methodol., 80 (2018), 177-198. doi: 10.1111/rssb.12244.
[16] Q. Wu, Y. M. Ying and D.-X. Zhou, Learning rates of least-square regularized regression, Found. Comput. Math., 6 (2006), 171-192. doi: 10.1007/s10208-004-0155-9.
[17] D.-H. Xiang, T. Hu and D.-X. Zhou, Approximation analysis of learning algorithms for support vector regression and quantile regression, J. Appl. Math., (2012), Art. ID 902139, 17 pp. doi: 10.1155/2012/902139.
[18] T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, Ann. Statist., 32 (2004), 56-85. doi: 10.1214/aos/1079120130.


