# American Institute of Mathematical Sciences

doi: 10.3934/mfc.2022028
## Error analysis of classification learning algorithms based on LUMs loss

 School of Mathematical Science, University of Jinan, Jinan 250022, China

*Corresponding author: Hongwei Sun

Received April 2022; Revised July 2022; Early access August 2022

Fund Project: The second author is supported by the National Natural Science Foundation of China (Grants No. 11671171 and 11871167)

In this paper, we study the learning performance of regularized large-margin unified machines (LUMs) for classification problems. The hypothesis space is taken to be a reproducing kernel Hilbert space ${\mathcal H}_K$, and the penalty term is given by the norm of the function in ${\mathcal H}_K$. Since the LUM loss functions are differentiable and convex, the data piling phenomenon can be avoided when dealing with high-dimensional, low-sample-size data. The error analysis of this classification learning machine relies mainly on the comparison theorem [3], which ensures that the excess classification error can be bounded by the excess generalization error. Under a mild source condition, which states that the minimizer $f_V$ of the generalization error can be approximated by the hypothesis space ${\mathcal H}_K$, and using a leave-one-out variant technique proposed in [13], satisfactory error bounds and learning rates for the mean of the excess classification error are derived.
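To make the objects in the abstract concrete, the following is a minimal sketch of a LUM loss and a kernel-regularized empirical objective. It is not taken from the paper: the piecewise form of the loss and its parameters `a` and `c` follow the LUM family as introduced in [8], while the Gaussian kernel, the squared-norm penalty, the finite kernel expansion of $f$, and all function names are illustrative assumptions.

```python
import math


def lum_loss(u, a=1.0, c=1.0):
    """LUM loss V(u) at the margin u = y * f(x), in the form given in [8].

    Assumes a > 0 and c >= 0. The loss is linear below the threshold
    c / (1 + c) and decays polynomially above it.
    """
    threshold = c / (1.0 + c)
    if u < threshold:
        return 1.0 - u
    return (1.0 / (1.0 + c)) * (a / ((1.0 + c) * u - c + a)) ** a


def lum_loss_derivative(u, a=1.0, c=1.0):
    """Derivative V'(u); both branches equal -1 at the threshold."""
    threshold = c / (1.0 + c)
    if u < threshold:
        return -1.0
    return -((a / ((1.0 + c) * u - c + a)) ** (a + 1.0))


def gaussian_kernel(x, t, sigma=1.0):
    """Gaussian kernel on tuples of floats (an assumed choice of K)."""
    sq_dist = sum((xi - ti) ** 2 for xi, ti in zip(x, t))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def regularized_objective(alpha, xs, ys, lam, a=1.0, c=1.0, sigma=1.0):
    """Empirical LUM risk plus lam * ||f||_K^2 for f = sum_j alpha_j K(x_j, .).

    The squared-norm penalty is an assumption; the abstract only says the
    penalty is given by the norm in the RKHS.
    """
    m = len(xs)
    gram = [[gaussian_kernel(xs[i], xs[j], sigma) for j in range(m)]
            for i in range(m)]
    # f(x_i) = sum_j alpha_j K(x_i, x_j)
    f_vals = [sum(alpha[j] * gram[i][j] for j in range(m)) for i in range(m)]
    empirical_risk = sum(lum_loss(ys[i] * f_vals[i], a, c)
                         for i in range(m)) / m
    # ||f||_K^2 = alpha^T G alpha for the Gram matrix G
    norm_sq = sum(alpha[i] * gram[i][j] * alpha[j]
                  for i in range(m) for j in range(m))
    return empirical_risk + lam * norm_sq
```

In this form the two branches of the loss agree in value and in slope at the threshold $c/(1+c)$, which is the differentiability the abstract refers to. A classifier would then be obtained as the sign of a minimizer of `regularized_objective` over `alpha`.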

Citation: Xuqing He, Hongwei Sun. Error analysis of classification learning algorithms based on LUMs loss. Mathematical Foundations of Computing, doi: 10.3934/mfc.2022028
##### References:
[1] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc., 68 (1950), 337-404. doi: 10.1090/S0002-9947-1950-0051437-7.

[2] P. L. Bartlett, M. I. Jordan and J. D. McAuliffe, Convexity, classification, and risk bounds, J. Amer. Statist. Assoc., 101 (2006), 138-156. doi: 10.1198/016214505000000907.

[3] A. Benabid, J. Fan and D.-H. Xiang, Comparison theorems on large-margin learning, Int. J. Wavelets Multiresolut. Inf. Process., 19 (2021), Paper No. 2150015, 18 pp. doi: 10.1142/S0219691321500156.

[4] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.

[5] F. Critchley and C. Vitiello, The influence of observations on misclassification probability estimates in linear discriminant analysis, Biometrika, 78 (1991), 677-690. doi: 10.1093/biomet/78.3.677.

[6] J. Fan and D.-H. Xiang, Quantitative convergence analysis of kernel based large-margin unified machines, Commun. Pure Appl. Anal., 19 (2020), 4069-4083. doi: 10.3934/cpaa.2020180.

[7] S. Huang, Y. Feng and Q. Wu, Learning theory of minimum error entropy under weak moment conditions, Anal. Appl. (Singap.), 20 (2022), 121-139. doi: 10.1142/S0219530521500044.

[8] Y. Liu, H. H. Zhang and Y. Wu, Hard or soft classification? Large-margin unified machines, J. Amer. Statist. Assoc., 106 (2011), 166-177. doi: 10.1198/jasa.2011.tm10319.

[9] J. S. Marron, J. M. Todd and J. Ahn, Distance-weighted discrimination, J. Amer. Statist. Assoc., 102 (2007), 1267-1271. doi: 10.1198/016214507000001120.

[10] S. Smale and D.-X. Zhou, Learning theory estimates via integral operators and their approximations, Constr. Approx., 26 (2007), 153-172. doi: 10.1007/s00365-006-0659-y.

[11] I. Steinwart, Consistency of support vector machines and other regularized kernel classifiers, IEEE Trans. Inform. Theory, 51 (2005), 128-142. doi: 10.1109/TIT.2004.839514.

[12] I. Steinwart and A. Christmann, Estimating conditional quantiles with the help of the pinball loss, Bernoulli, 17 (2011), 211-225. doi: 10.3150/10-BEJ267.

[13] H. W. Sun and Q. Wu, Optimal rates of distributed regression with imperfect kernels, J. Mach. Learn. Res., 22 (2021), Paper No. 171, 34 pp. doi: 10.1007/s00023-020-00966-6.

[14] A. B. Tsybakov, Optimal aggregation of classifiers in statistical learning, Ann. Statist., 32 (2004), 135-166. doi: 10.1214/aos/1079120131.

[15] B. Wang and H. Zou, Another look at distance weighted discrimination, J. R. Stat. Soc. Ser. B. Stat. Methodol., 80 (2018), 177-198. doi: 10.1111/rssb.12244.

[16] Q. Wu, Y. M. Ying and D.-X. Zhou, Learning rates of least-square regularized regression, Found. Comput. Math., 6 (2006), 171-192. doi: 10.1007/s10208-004-0155-9.

[17] D.-H. Xiang, T. Hu and D.-X. Zhou, Approximation analysis of learning algorithms for support vector regression and quantile regression, J. Appl. Math., (2012), Art. ID 902139, 17 pp. doi: 10.1155/2012/902139.

[18] T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, Ann. Statist., 32 (2004), 56-85. doi: 10.1214/aos/1079120130.
