We propose $ \ell_1 $ norm regularized quadratic surface support vector machine models for binary classification in supervised learning. We establish several desirable theoretical properties, including the existence and uniqueness of the optimal solution, reduction to the standard SVMs over (almost) linearly separable data sets, and detection of the true sparsity pattern over (almost) quadratically separable data sets when the penalty parameter on the $ \ell_1 $ norm is large enough. We also demonstrate the models' promising practical efficiency through numerical experiments on both synthetic and publicly available benchmark data sets.
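To make the setup concrete, the following is a minimal sketch of one plausible instance of such a model: a soft quadratic surface SVM with an entrywise $ \ell_1 $ penalty on the quadratic term, solved as a convex program with cvxpy. The objective $ \lambda\|\mathbf{W}\|_1 + \mu\sum_i \xi_i $ and all parameter names are illustrative assumptions, not necessarily the exact formulation studied in the paper.

```python
# Hedged sketch of an l1-regularized soft quadratic surface SVM.
# Assumed model (illustrative; may differ from the paper's formulation):
#   min_{W=W^T, b, c, xi}  lambda * ||W||_1 + mu * sum(xi)
#   s.t.  y_i * (0.5 x_i^T W x_i + b^T x_i + c) >= 1 - xi_i,  xi_i >= 0.
import cvxpy as cp
import numpy as np

def l1_sqssvm(X, y, lam=1.0, mu=1.0):
    N, n = X.shape
    W = cp.Variable((n, n), symmetric=True)  # quadratic term of the surface
    b = cp.Variable(n)                       # linear term
    c = cp.Variable()                        # offset
    xi = cp.Variable(N, nonneg=True)         # slack variables (soft margin)
    # Margin constraints on the quadratic surface g(x) = 0.5 x'Wx + b'x + c.
    cons = [y[i] * (0.5 * X[i] @ W @ X[i] + b @ X[i] + c) >= 1 - xi[i]
            for i in range(N)]
    # Entrywise l1 penalty on W promotes a sparse quadratic term.
    cp.Problem(cp.Minimize(lam * cp.sum(cp.abs(W)) + mu * cp.sum(xi)),
               cons).solve()
    return W.value, b.value, c.value

def predict(W, b, c, X):
    # Classify by the sign of the learned quadratic surface.
    return np.sign(0.5 * np.einsum('ij,jk,ik->i', X, W, X) + X @ b + c)
```

With a large enough $ \lambda $, entries of the recovered $ \mathbf{W}^* $ are driven to zero, which is the sparsity-detection behavior summarized in Table 8; with $ \mathbf{W}^* = \boldsymbol{0} $ the classifier reduces to a linear one.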
Table 1. Basic information of the artificial data sets

| Data set | Artificial I | Artificial II | Artificial III | Artificial IV | Artificial 3-D |
| --- | --- | --- | --- | --- | --- |
| $ n $ | 3 | 3 | 5 | 10 | 3 |
| Sample size ($ N_1 $/$ N_2 $) | 67/58 | 79/71 | 106/81 | 204/171 | 99/101 |
Table 2. Description of the 2-class data sets used

| Data set | # of features | Class name | Sample size |
| --- | --- | --- | --- |
| Iris | 4 | versicolour | 50 |
| | | virginica | 50 |
| Car Evaluation | 6 | unacc | 1210 |
| | | acc | 384 |
| Diabetes | 8 | yes | 268 |
| | | no | 500 |
| German Credit Data | 20 | creditworthy | 700 |
| | | non-creditworthy | 300 |
| Ionosphere | 34 | good | 225 |
| | | bad | 126 |
Table 3. Iris results

| Training rate $k$% | Model | Accuracy mean (%) | Accuracy std (%) | Accuracy min (%) | Accuracy max (%) | CPU time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | L1-SQSSVM | 91.93 | 5.49 | 63.33 | 98.89 | 0.058 |
| | SQSSVM | 89.33 | 4.07 | 81.11 | 96.67 | 0.050 |
| | SVM-Quad | 89.49 | 4.91 | 80.00 | 97.78 | 0.003 |
| | SVM | 89.62 | 4.10 | 78.89 | 97.78 | 0.001 |
| 20 | L1-SQSSVM | 94.33 | 2.20 | 90.00 | 98.75 | 0.063 |
| | SQSSVM | 92.60 | 2.57 | 82.50 | 96.25 | 0.055 |
| | SVM-Quad | 93.03 | 2.72 | 86.25 | 98.75 | 0.002 |
| | SVM | 93.00 | 3.01 | 82.50 | 97.50 | 0.002 |
| 40 | L1-SQSSVM | 95.40 | 2.76 | 86.67 | 100.00 | 0.075 |
| | SQSSVM | 93.97 | 3.73 | 78.33 | 100.00 | 0.062 |
| | SVM-Quad | 94.30 | 3.38 | 81.67 | 98.33 | 0.002 |
| | SVM | 94.50 | 3.29 | 85.00 | 100.00 | 0.002 |
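Tables 3–7 report, for each training rate $k$%, the mean, standard deviation, minimum, and maximum of the test accuracy over repeated random train/test splits, together with CPU time. The following is a minimal sketch of such an evaluation protocol; the classifier (scikit-learn's linear `SVC`), the repetition count, and the stratified splitting are stand-in assumptions, not the paper's implementation.

```python
# Sketch of a repeated random-split evaluation at training rate k%.
# Repetition count and classifier are illustrative placeholders.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def evaluate(X, y, k_percent, n_reps=10, seed=0):
    rng = np.random.RandomState(seed)
    scores = []
    for _ in range(n_reps):
        # Draw a fresh random split with k% of the samples for training.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=k_percent / 100.0,
            random_state=rng.randint(2**31 - 1), stratify=y)
        model = SVC(kernel='linear').fit(X_tr, y_tr)  # stand-in classifier
        scores.append(100.0 * accuracy_score(y_te, model.predict(X_te)))
    s = np.array(scores)
    return s.mean(), s.std(), s.min(), s.max()  # as reported in Tables 3-7
```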
Table 4. Car Evaluation results

| Training rate $k$% | Model | Accuracy mean (%) | Accuracy std (%) | Accuracy min (%) | Accuracy max (%) | CPU time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | L1-SQSSVM | 90.48 | 2.13 | 83.48 | 95.05 | 0.961 |
| | SQSSVM | 90.48 | 2.35 | 80.98 | 94.49 | 0.937 |
| | SVM-Quad | 88.32 | 2.70 | 80.98 | 93.45 | 0.023 |
| | SVM | 84.40 | 1.09 | 81.88 | 86.90 | 0.001 |
| 20 | L1-SQSSVM | 92.81 | 1.17 | 89.50 | 95.30 | 1.109 |
| | SQSSVM | 92.77 | 1.21 | 89.58 | 95.30 | 1.117 |
| | SVM-Quad | 92.30 | 1.14 | 88.56 | 94.83 | 0.001 |
| | SVM | 85.08 | 0.91 | 83.23 | 86.91 | 0.008 |
| 40 | L1-SQSSVM | 95.80 | 0.73 | 93.83 | 97.07 | 1.501 |
| | SQSSVM | 95.76 | 0.77 | 93.83 | 97.28 | 1.521 |
| | SVM-Quad | 93.69 | 0.83 | 91.43 | 95.72 | 0.087 |
| | SVM | 85.26 | 1.09 | 81.71 | 87.36 | 0.003 |
Table 5. Diabetes results

| Training rate $k$% | Model | Accuracy mean (%) | Accuracy std (%) | Accuracy min (%) | Accuracy max (%) | CPU time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | L1-SQSSVM | 74.21 | 1.53 | 71.24 | 76.01 | 0.692 |
| | SQSSVM | 64.38 | 3.65 | 57.80 | 71.68 | 0.679 |
| | SVM-Quad | 66.07 | 4.53 | 57.66 | 71.53 | 0.102 |
| | SVM | 72.95 | 3.49 | 65.61 | 76.16 | 0.003 |
| 20 | L1-SQSSVM | 76.28 | 0.63 | 75.12 | 77.07 | 0.924 |
| | SQSSVM | 69.40 | 2.49 | 65.85 | 72.52 | 0.950 |
| | SVM-Quad | 70.28 | 2.30 | 65.85 | 73.82 | 9.080 |
| | SVM | 74.86 | 1.68 | 71.54 | 77.07 | 0.009 |
| 40 | L1-SQSSVM | 76.62 | 1.83 | 73.97 | 79.61 | 1.459 |
| | SQSSVM | 74.34 | 1.99 | 71.15 | 77.01 | 1.490 |
| | SVM-Quad | 75.21 | 1.23 | 73.54 | 77.22 | 86.561 |
| | SVM | 76.29 | 2.15 | 73.10 | 80.26 | 0.006 |
Table 6. German Credit Data results

| Training rate $k$% | Model | Accuracy mean (%) | Accuracy std (%) | Accuracy min (%) | Accuracy max (%) | CPU time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | L1-SQSSVM | 71.86 | 1.85 | 68.44 | 75.00 | 1.596 |
| | SQSSVM | 67.00 | 3.02 | 63.67 | 71.67 | 1.598 |
| | SVM-Quad | 68.29 | 2.61 | 64.00 | 72.44 | 0.006 |
| | SVM | 69.49 | 3.58 | 61.89 | 74.33 | 0.002 |
| 20 | L1-SQSSVM | 73.88 | 1.29 | 71.38 | 75.88 | 2.572 |
| | SQSSVM | 67.55 | 2.78 | 62.88 | 72.88 | 2.541 |
| | SVM-Quad | 67.78 | 2.75 | 64.13 | 72.13 | 0.005 |
| | SVM | 73.86 | 1.22 | 71.25 | 75.88 | 0.005 |
| 40 | L1-SQSSVM | 74.86 | 1.25 | 72.00 | 77.00 | 4.622 |
| | SQSSVM | 65.99 | 2.66 | 61.17 | 69.83 | 4.456 |
| | SVM-Quad | 65.13 | 1.19 | 63.50 | 67.00 | 0.262 |
| | SVM | 74.73 | 1.07 | 73.50 | 77.00 | 0.005 |
Table 7. Ionosphere results

| Training rate $k$% | Model | Accuracy mean (%) | Accuracy std (%) | Accuracy min (%) | Accuracy max (%) | CPU time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | L1-SQSSVM | 82.75 | 3.69 | 76.27 | 88.29 | 4.141 |
| | SQSSVM | 79.24 | 3.15 | 74.37 | 83.86 | 3.945 |
| | SVM-Quad | 83.48 | 2.39 | 78.48 | 78.48 | 0.003 |
| | SVM | 80.09 | 2.24 | 75.95 | 82.28 | 0.006 |
| 20 | L1-SQSSVM | 87.90 | 3.72 | 80.07 | 92.53 | 5.096 |
| | SQSSVM | 87.19 | 4.32 | 77.94 | 91.81 | 4.854 |
| | SVM-Quad | 86.16 | 1.24 | 84.34 | 84.34 | 0.005 |
| | SVM | 82.03 | 5.40 | 67.97 | 86.83 | 0.002 |
| 40 | L1-SQSSVM | 90.28 | 3.33 | 83.41 | 94.31 | 7.063 |
| | SQSSVM | 89.53 | 4.23 | 81.99 | 94.31 | 6.781 |
| | SVM-Quad | 86.40 | 3.03 | 81.04 | 91.00 | 0.007 |
| | SVM | 83.60 | 3.46 | 76.78 | 88.63 | 0.006 |
Table 8. Summary of the theoretical results obtained in this paper

| | L1-QSSVM | L1-SQSSVM |
| --- | --- | --- |
| Linearly separable | Solution existence; $\boldsymbol{z}^*$ is almost always unique; equivalence with SVM for large enough $\lambda$ | Solution existence; $\boldsymbol{z}^*$ is almost always unique; equivalence with SSVM for large enough $\lambda$; solution is almost always unique with $\boldsymbol{\xi}^* = \boldsymbol{0}$ for large enough $\mu$ |
| Quadratically separable | Solution existence; $\boldsymbol{z}^*$ is almost always unique; captures possible sparsity of $\mathbf{W}^*$ for large enough $\lambda$ | Solution existence; $\boldsymbol{z}^*$ is almost always unique; solution is almost always unique with $\boldsymbol{\xi}^* = \boldsymbol{0}$ for large enough $\mu$; captures possible sparsity of $\mathbf{W}^*$ for large enough $\lambda$ |
| Neither | Solution existence; $\boldsymbol{z}^*$ is almost always unique | Solution existence; $\boldsymbol{z}^*$ is almost always unique |
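For reference, the separability notions used in Table 8 admit the following standard formalization (a sketch; the paper's precise definitions may include additional normalization):

```latex
% A data set \{(x_i, y_i)\}_{i=1}^N with labels y_i \in \{-1, +1\} is
% quadratically separable if there exist a symmetric matrix W, a vector b,
% and a scalar c such that every point satisfies the margin condition
\[
  y_i \left( \tfrac{1}{2}\, x_i^{\top} \mathbf{W}\, x_i
             + \boldsymbol{b}^{\top} x_i + c \right) \;\ge\; 1,
  \qquad i = 1, \dots, N.
\]
% Linear separability is the special case \mathbf{W} = \boldsymbol{0};
% "Neither" in Table 8 means no such (W, b, c) exists even with W free.
```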
[Figures omitted in extraction. Recovered captions: "L1-QSSVM performance on linearly and quadratically separable data sets"; "Influence of the parameter …" (two figures; parameter symbols lost); "Accuracy score against the parameter …"; "Sparsity pattern detection using L1-SQSSVM as parameter …".]