# American Institute of Mathematical Sciences

November  2020, 3(4): 279-300. doi: 10.3934/mfc.2020018

## Support vector machine classifiers by non-Euclidean margins

Ying Lin and Qi Ye, School of Mathematical Sciences, South China Normal University, Guangzhou, Guangdong 510631, China

* Corresponding author: Qi Ye

Received: January 2020; Revised: May 2020; Early access: June 2020; Published: November 2020

Fund Project: The first author is supported by the Innovation Project of the Graduate School of South China Normal University. The second author is supported by the Natural Science Foundation of Guangdong Province (2019A1515011995, 2020B1515310013) and the National Natural Science Foundation of China (11931003).

In this article, the classical support vector machine (SVM) classifiers are generalized to non-Euclidean margins. We first extend the linear models of the SVM classifiers to non-Euclidean margins, including the theorems and algorithms of the SVM classifiers with hard margins and soft margins. In particular, the SVM classifiers by the $\infty$-norm margins can be solved by 1-norm optimization, which promotes sparsity. Next, we show that the non-linear models of the SVM classifiers by the $q$-norm margins can be equivalently transformed into the SVM in the $p$-norm reproducing kernel Banach spaces induced by the hinge loss, where $1/p+1/q = 1$. Finally, we present numerical examples on artificial and real data to compare the different algorithms of the SVM classifiers by the $\infty$-norm margin.
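The reduction of the $\infty$-norm-margin classifier to a 1-norm optimization can be sketched concretely. The following is a minimal illustration, not the paper's algorithm: it poses the soft-margin 1-norm SVM as a linear program by splitting the weights as $\boldsymbol w = \boldsymbol u - \boldsymbol v$ with $\boldsymbol u, \boldsymbol v \ge 0$ (the helper `l1_svm` and the toy data are our own assumptions):

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm(X, y, C=1.0):
    """Soft-margin 1-norm SVM posed as a linear program.

    Splitting w = u - v with u, v >= 0 turns the objective
    ||w||_1 + C * sum(xi) into the linear objective
    sum(u) + sum(v) + C * sum(xi), subject to
    y_i * ((u - v) . x_i + b) >= 1 - xi_i and xi_i >= 0.
    """
    n, d = X.shape
    # variable layout: [u (d), v (d), b (1), xi (n)]
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    # margin constraints rewritten as A_ub @ z <= b_ub:
    # -y_i x_i . u + y_i x_i . v - y_i b - xi_i <= -1
    A_ub = np.hstack([-y[:, None] * X, y[:, None] * X,
                      -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w = res.x[:d] - res.x[d:2 * d]
    b = res.x[2 * d]
    return w, b

# toy, well-separated two-class data (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)),
               rng.normal(2.0, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
w, b = l1_svm(X, y, C=1.0)
pred = np.sign(X @ w + b)
```

At the optimum $\|\boldsymbol w\|_1 = \sum_i u_i + \sum_i v_i$, since at most one of $u_i, v_i$ is nonzero in each coordinate; the 1-norm objective is what drives many weight coordinates to exactly zero.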

Citation: Ying Lin, Qi Ye. Support vector machine classifiers by non-Euclidean margins. Mathematical Foundations of Computing, 2020, 3 (4) : 279-300. doi: 10.3934/mfc.2020018
##### References:

1. B. E. Boser, I. M. Guyon and V. N. Vapnik, A training algorithm for optimal margin classifiers, in *Proceedings of the Fifth Annual Workshop on Computational Learning Theory*, ACM, (1992), 144–152. doi: 10.1145/130385.130401.
2. P. Bühlmann and S. van de Geer, *Statistics for High-Dimensional Data: Methods, Theory and Applications*, Springer Series in Statistics, Springer, Heidelberg, 2011. doi: 10.1007/978-3-642-20192-9.
3. L. Chen and H. Zhang, Statistical margin error bounds for $\ell^1$-norm support vector machines, *Neurocomputing*, 339 (2019), 210–216. doi: 10.1016/j.neucom.2019.02.015.
4. C. Cortes and V. Vapnik, Support-vector networks, *Machine Learning*, 20 (1995), 273–297. doi: 10.1007/BF00994018.
5. R. Der and D. Lee, Large-margin classification in Banach spaces, *Journal of Machine Learning Research - Proceedings Track*, 2 (2007), 91–98.
6. I. Ekeland and T. Turnbull, *Infinite-Dimensional Optimization and Convexity*, Chicago Lectures in Mathematics, University of Chicago Press, Chicago, IL, 1983.
7. T. Hastie, R. Tibshirani and M. Wainwright, *Statistical Learning with Sparsity: The Lasso and Generalizations*, Monographs on Statistics and Applied Probability, 143, CRC Press, Boca Raton, FL, 2015.
8. L. Huang, C. Liu, L. Tan and Q. Ye, Generalized representer theorems in Banach spaces, *Anal. Appl. (Singap.)*, (2019). doi: 10.1142/S0219530519410100.
9. O. L. Mangasarian, Arbitrary-norm separating plane, *Operations Research Letters*, 24 (1999), 15–23. doi: 10.1016/S0167-6377(98)00049-2.
10. J. Platt, Sequential minimal optimization: A fast algorithm for training support vector machines, Microsoft Research Technical Report MSR-TR-98-14, 1998.
11. L. Q. Qi, H. B. Chen and Y. N. Chen, *Tensor Eigenvalues and Their Applications*, Advances in Mechanics and Mathematics, 39, Springer, Singapore, 2018. doi: 10.1007/978-981-10-8058-6.
12. R. T. Rockafellar, *Convex Analysis*, Princeton Mathematical Series, No. 28, Princeton University Press, Princeton, NJ, 1970.
13. B. Schölkopf and A. J. Smola, *Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond*, The MIT Press, Cambridge, 2001.
14. G. H. Song and H. Z. Zhang, Reproducing kernel Banach spaces with the $\ell^1$ norm II: Error analysis for regularized least square regression, *Neural Comput.*, 23 (2011), 2713–2729. doi: 10.1162/NECO_a_00178.
15. G. H. Song, H. Z. Zhang and F. J. Hickernell, Reproducing kernel Banach spaces with the $\ell^1$ norm, *Appl. Comput. Harmon. Anal.*, 34 (2013), 96–116. doi: 10.1016/j.acha.2012.03.009.
16. I. Steinwart and A. Christmann, *Support Vector Machines*, Information Science and Statistics, Springer, New York, 2008.
17. V. N. Vapnik, *The Nature of Statistical Learning Theory*, Springer-Verlag, New York, 1995. doi: 10.1007/978-1-4757-2440-0.
18. Y. S. Xu and Q. Ye, Generalized Mercer kernels and reproducing kernel Banach spaces, *Mem. Amer. Math. Soc.*, 258 (2019). doi: 10.1090/memo/1243.
19. H. Yang, X. Yang, F. Zhang, Q. Ye and X. Fan, Infinite norm large margin classifier, *International Journal of Machine Learning and Cybernetics*, 10 (2019), 2449–2457. doi: 10.1007/s13042-018-0881-y.
20. H. Z. Zhang, Y. S. Xu and J. Zhang, Reproducing kernel Banach spaces for machine learning, *J. Mach. Learn. Res.*, 10 (2009), 2741–2775.
21. L. Zhang and W. Zhou, On the sparseness of 1-norm support vector machines, *Neural Networks*, 23 (2010), 373–385. doi: 10.1016/j.neunet.2009.11.012.
22. J. Zhu, S. Rosset, R. Tibshirani and T. J. Hastie, 1-norm support vector machines, in *Advances in Neural Information Processing Systems*, (2004), 49–56.


Examples of Euclidean and non-Euclidean margins are illustrated. The black line is the decision boundary; the width of the gap between the two red dashed lines is the margin. We can see the difference between these two classifiers
The difference between hard-margin and soft-margin solutions
The geometrical interpretation of the distance from a point to a plane
Maximal margin classifiers by the 2-norm and the $\infty$-norm in the case of 2 distinct points. We can see the difference in their ability to obtain sparsity
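The contrast in this figure can be made explicit. For a general $q$-norm margin, the hard-margin maximal margin classifier minimizes the dual $p$-norm of the weights (a standard formulation consistent with the abstract's duality $1/p+1/q = 1$; the notation here is ours):

$$
\min_{\boldsymbol w,\, b} \ \|\boldsymbol w\|_p
\quad \text{s.t.} \quad
y_i\left(\boldsymbol w^{\top}\boldsymbol x_i + b\right) \ge 1, \quad i = 1, \dots, n,
\qquad \frac{1}{p} + \frac{1}{q} = 1.
$$

For $q = 2$ this is the classical SVM; for $q = \infty$ the objective becomes $\|\boldsymbol w\|_1$, which explains the sparsity visible in the figure.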
The geometric explanation for infinitely many solutions of the SVM classifier by the $\infty$-norm margin
The geometric explanation for the unique solution of the SVM classifier by the $\infty$-norm margin
The geometrical interpretation of the kernel trick. In the original space, the data set is not linearly separable, see Figure 6a. After the feature mapping given by the kernel function, the data set is linearly separable in a higher-dimensional space, see Figure 6b. Finally, the decision boundary in the higher-dimensional space can be projected back into the original space, where it becomes a non-linear boundary
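The feature mapping in this caption never needs to be formed explicitly: only kernel evaluations between pairs of points are required. A small sketch of computing a kernel matrix (our own helper, assuming the Gaussian kernel $K(\boldsymbol x, \boldsymbol x') = \exp(-\|\boldsymbol x-\boldsymbol x'\|_2^2/\sigma^2)$ used later in the experiments):

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """Kernel matrix K[i, j] = exp(-||x_i - z_j||_2^2 / sigma^2),
    computed from pairwise squared distances without ever building
    the high-dimensional feature vectors."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma ** 2)

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = gaussian_kernel(X, X, sigma=1.0)
# diagonal entries equal exp(0) = 1
```

The `np.maximum(sq, 0.0)` guard clips tiny negative squared distances caused by floating-point cancellation.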
Noiseless and noisy data sets. The classes overlap in the noisy data set
Three classifiers obtained by three different optimizations when $C = 0.5$, $\lambda = 2$
Two classifiers obtained by two different optimizations when $C = 0.1$, $\lambda = 10$
The comparison between handwritten digits 6 and 9. The first row is 6 and the second row is 9
The comparison between the handwritten letters O and Q. The first row is O and the second row is Q
The results of experiments on MNIST. The row "Linear SVM classifier by $\infty$-norm margin" is the result obtained by (9) with $\lambda = 0.1$; the row "Kernel SVM classifier by $\infty$-norm margin" is the result obtained by (15) with $\lambda = 0.1$. The kernel function used here is the Gaussian kernel $K(\boldsymbol x, \boldsymbol x') = \exp\left(-\frac{\|\boldsymbol x- \boldsymbol x'\|_{2}^{2}}{\sigma^2}\right)$ with $\sigma = 9.6$
| MNIST | Training Errors | Test Errors | Sparsity |
|---|---|---|---|
| Linear SVM classifier by $\infty$-norm margin | 0/9939 | 5/1967 | 654/785 |
| Kernel SVM classifier by $\infty$-norm margin | 0/9939 | 4/1967 | 1760/9939 |
The results of experiments on the Handwritten Alphabets Database. The row "Linear SVM classifier by $\infty$-norm margin" is the result obtained by (9) with $\lambda = 0.001$; the row "Kernel SVM classifier by $\infty$-norm margin" is the result obtained by (15) with $\lambda = 0.001$. The kernel function used here is the Gaussian kernel $K(\boldsymbol x, \boldsymbol x') = \exp\left(-\frac{\|\boldsymbol x- \boldsymbol x'\|_{2}^{2}}{\sigma^2}\right)$ with $\sigma = 2600$
| Handwritten Alphabets | Training Errors | Test Errors | Sparsity |
|---|---|---|---|
| Linear SVM classifier by $\infty$-norm margin | 0/7000 | 69/3000 | 401/785 |
| Kernel SVM classifier by $\infty$-norm margin | 0/7000 | 14/3000 | 1339/7000 |
