
Quadratic surface support vector machine with L1 norm regularization

* Corresponding author: Ahmad Mousavi

This research was carried out in the Research and Development Department of Precima, which fully supported the work. The first two authors contributed equally.

Abstract
We propose $ \ell_1 $-norm regularized quadratic surface support vector machine models for binary classification in supervised learning. We establish several desirable theoretical properties, including existence and uniqueness of the optimal solution, reduction to the standard SVMs over (almost) linearly separable data sets, and detection of the true sparsity pattern over (almost) quadratically separable data sets when the penalty parameter on the $ \ell_1 $ norm is sufficiently large. We also demonstrate the models' practical efficiency through numerical experiments on both synthetic and publicly available benchmark data sets.
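For concreteness: kernel-free quadratic surface SVMs separate classes with a surface $ f(x) = \frac{1}{2}x^{\top}Wx + b^{\top}x + c $, with $ W $ symmetric. The display below is a hedged sketch of the soft $ \ell_1 $-regularized model (L1-SQSSVM) assembled from that standard template rather than copied from the paper, so the exact weighting of terms may differ:

$$ \begin{aligned} \min_{W = W^{\top},\, b,\, c,\, \xi}\quad & \sum_{i=1}^{N}\lVert Wx_i + b\rVert_2^2 \;+\; \mu\sum_{i=1}^{N}\xi_i \;+\; \lambda\,\lVert \operatorname{vec}(W)\rVert_1 \\ \text{s.t.}\quad & y_i\Big(\tfrac{1}{2}x_i^{\top}Wx_i + b^{\top}x_i + c\Big) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,N. \end{aligned} $$

Here $ \mu $ prices the slacks $ \xi $ (Figure 3) and $ \lambda $ drives entries of $ W $ to zero (Figures 2, 4, 5); dropping the slack variables gives the hard-margin L1-QSSVM.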

    Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.

Figure 1.  L1-QSSVM performance on linearly and quadratically separable data sets

    Figure 2.  Influence of the parameter $ \lambda $ on the curvature of the optimal solution of L1-SQSSVM

    Figure 3.  Influence of the parameter $ \mu $ on the behavior of the optimal solution of L1-SQSSVM

    Figure 4.  Accuracy score against the parameter $ \lambda $

    Figure 5.  Sparsity pattern detection using L1-SQSSVM as parameter $ \lambda $ varies
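The sweep behind Figures 4 and 5 can be imitated, loosely, without the paper's solver: expand inputs to degree-2 monomials (mimicking the quadratic surface) and fit scikit-learn's $ \ell_1 $-penalized linear classifier, whose parameter C behaves like $ 1/\lambda $. A minimal sketch on synthetic circles, not the authors' code:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

# Quadratically separable toy data: two concentric circles.
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

# Degree-2 monomials stand in for the quadratic surface's W and b terms.
Q = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
Q_tr, Q_te, y_tr, y_te = train_test_split(Q, y, train_size=0.4, random_state=0)

for C in [100.0, 1.0, 0.1, 0.01]:  # small C acts like a large lambda
    clf = LinearSVC(C=C, penalty="l1", loss="squared_hinge",
                    dual=False, max_iter=20000).fit(Q_tr, y_tr)
    nnz = int(np.sum(np.abs(clf.coef_) > 1e-6))
    print(f"C={C:<6} accuracy={clf.score(Q_te, y_te):.3f} "
          f"nonzeros={nnz}/{Q.shape[1]}")
```

Shrinking C (i.e., growing $ \lambda $) should zero out more of the five quadratic-feature coefficients while accuracy stays high until the penalty becomes too aggressive, qualitatively matching Figures 4-5.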

Table 1.  Basic information of artificial data sets

| Data set | Artificial I | Artificial II | Artificial III | Artificial IV | Artificial 3-D |
| --- | --- | --- | --- | --- | --- |
| $ n $ | 3 | 3 | 5 | 10 | 3 |
| Sample size ($ N_1 $/$ N_2 $) | 67/58 | 79/71 | 106/81 | 204/171 | 99/101 |

Table 2.  Description of the 2-class data sets used

| Data set | # of features | Class name | Sample size |
| --- | --- | --- | --- |
| Iris | 4 | versicolour | 50 |
| | | virginica | 50 |
| Car Evaluation | 6 | unacc | 1210 |
| | | acc | 384 |
| Diabetes | 8 | yes | 268 |
| | | no | 500 |
| German Credit Data | 20 | creditworthy | 700 |
| | | non-creditworthy | 300 |
| Ionosphere | 34 | good | 225 |
| | | bad | 126 |
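For reference, the Iris task in Table 2 (versicolour vs. virginica, 4 features, 50 samples each) ships with scikit-learn; the snippet below is a convenience loader, while the remaining data sets come from the UCI Machine Learning Repository and need their own loaders:

```python
from sklearn.datasets import load_iris

data = load_iris()
mask = data.target != 0  # drop setosa; keep versicolour (1) and virginica (2)
X, y = data.data[mask], data.target[mask]
print(X.shape, {int(c): int((y == c).sum()) for c in set(y)})
# -> (100, 4) {1: 50, 2: 50}
```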

Table 3.  Iris results (accuracy scores in %)

| Training rate $ k $% | Model | Mean | Std | Min | Max | CPU time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | L1-SQSSVM | 91.93 | 5.49 | 63.33 | 98.89 | 0.058 |
| | SQSSVM | 89.33 | 4.07 | 81.11 | 96.67 | 0.050 |
| | SVM-Quad | 89.49 | 4.91 | 80.00 | 97.78 | 0.003 |
| | SVM | 89.62 | 4.10 | 78.89 | 97.78 | 0.001 |
| 20 | L1-SQSSVM | 94.33 | 2.20 | 90.00 | 98.75 | 0.063 |
| | SQSSVM | 92.60 | 2.57 | 82.50 | 96.25 | 0.055 |
| | SVM-Quad | 93.03 | 2.72 | 86.25 | 98.75 | 0.002 |
| | SVM | 93.00 | 3.01 | 82.50 | 97.50 | 0.002 |
| 40 | L1-SQSSVM | 95.40 | 2.76 | 86.67 | 100.00 | 0.075 |
| | SQSSVM | 93.97 | 3.73 | 78.33 | 100.00 | 0.062 |
| | SVM-Quad | 94.30 | 3.38 | 81.67 | 98.33 | 0.002 |
| | SVM | 94.50 | 3.29 | 85.00 | 100.00 | 0.002 |
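Tables 3-7 all report the same statistics, which suggests one protocol: train on a random $ k $% of the samples, evaluate on the rest, and aggregate accuracy over repeated splits. The sketch below encodes that reading; the repetition count (50 here) and stratification are assumptions, as this excerpt does not state them:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def evaluate(model, X, y, k, n_reps=50):
    """Mean/std/min/max test accuracy (%) over n_reps random k% splits."""
    scores = []
    for rep in range(n_reps):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=k / 100.0, stratify=y, random_state=rep)
        scores.append(model.fit(X_tr, y_tr).score(X_te, y_te))
    s = 100.0 * np.asarray(scores)  # percent, matching the tables
    return s.mean(), s.std(), s.min(), s.max()
```

Calling, e.g., `evaluate(LinearSVC(dual=False), X, y, k=10)` yields the mean/std/min/max quadruple of one table row for a plain linear-SVM baseline.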

Table 4.  Car evaluation results (accuracy scores in %)

| Training rate $ k $% | Model | Mean | Std | Min | Max | CPU time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | L1-SQSSVM | 90.48 | 2.13 | 83.48 | 95.05 | 0.961 |
| | SQSSVM | 90.48 | 2.35 | 80.98 | 94.49 | 0.937 |
| | SVM-Quad | 88.32 | 2.70 | 80.98 | 93.45 | 0.023 |
| | SVM | 84.40 | 1.09 | 81.88 | 86.90 | 0.001 |
| 20 | L1-SQSSVM | 92.81 | 1.17 | 89.50 | 95.30 | 1.109 |
| | SQSSVM | 92.77 | 1.21 | 89.58 | 95.30 | 1.117 |
| | SVM-Quad | 92.30 | 1.14 | 88.56 | 94.83 | 0.001 |
| | SVM | 85.08 | 0.91 | 83.23 | 86.91 | 0.008 |
| 40 | L1-SQSSVM | 95.80 | 0.73 | 93.83 | 97.07 | 1.501 |
| | SQSSVM | 95.76 | 0.77 | 93.83 | 97.28 | 1.521 |
| | SVM-Quad | 93.69 | 0.83 | 91.43 | 95.72 | 0.087 |
| | SVM | 85.26 | 1.09 | 81.71 | 87.36 | 0.003 |

Table 5.  Diabetes results (accuracy scores in %)

| Training rate $ k $% | Model | Mean | Std | Min | Max | CPU time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | L1-SQSSVM | 74.21 | 1.53 | 71.24 | 76.01 | 0.692 |
| | SQSSVM | 64.38 | 3.65 | 57.80 | 71.68 | 0.679 |
| | SVM-Quad | 66.07 | 4.53 | 57.66 | 71.53 | 0.102 |
| | SVM | 72.95 | 3.49 | 65.61 | 76.16 | 0.003 |
| 20 | L1-SQSSVM | 76.28 | 0.63 | 75.12 | 77.07 | 0.924 |
| | SQSSVM | 69.40 | 2.49 | 65.85 | 72.52 | 0.950 |
| | SVM-Quad | 70.28 | 2.30 | 65.85 | 73.82 | 9.080 |
| | SVM | 74.86 | 1.68 | 71.54 | 77.07 | 0.009 |
| 40 | L1-SQSSVM | 76.62 | 1.83 | 73.97 | 79.61 | 1.459 |
| | SQSSVM | 74.34 | 1.99 | 71.15 | 77.01 | 1.490 |
| | SVM-Quad | 75.21 | 1.23 | 73.54 | 77.22 | 86.561 |
| | SVM | 76.29 | 2.15 | 73.10 | 80.26 | 0.006 |

Table 6.  German Credit Data results (accuracy scores in %)

| Training rate $ k $% | Model | Mean | Std | Min | Max | CPU time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | L1-SQSSVM | 71.86 | 1.85 | 68.44 | 75.00 | 1.596 |
| | SQSSVM | 67.00 | 3.02 | 63.67 | 71.67 | 1.598 |
| | SVM-Quad | 68.29 | 2.61 | 64.00 | 72.44 | 0.006 |
| | SVM | 69.49 | 3.58 | 61.89 | 74.33 | 0.002 |
| 20 | L1-SQSSVM | 73.88 | 1.29 | 71.38 | 75.88 | 2.572 |
| | SQSSVM | 67.55 | 2.78 | 62.88 | 72.88 | 2.541 |
| | SVM-Quad | 67.78 | 2.75 | 64.13 | 72.13 | 0.005 |
| | SVM | 73.86 | 1.22 | 71.25 | 75.88 | 0.005 |
| 40 | L1-SQSSVM | 74.86 | 1.25 | 72.00 | 77.00 | 4.622 |
| | SQSSVM | 65.99 | 2.66 | 61.17 | 69.83 | 4.456 |
| | SVM-Quad | 65.13 | 1.19 | 63.50 | 67.00 | 0.262 |
| | SVM | 74.73 | 1.07 | 73.50 | 77.00 | 0.005 |

Table 7.  Ionosphere results (accuracy scores in %)

| Training rate $ k $% | Model | Mean | Std | Min | Max | CPU time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | L1-SQSSVM | 82.75 | 3.69 | 76.27 | 88.29 | 4.141 |
| | SQSSVM | 79.24 | 3.15 | 74.37 | 83.86 | 3.945 |
| | SVM-Quad | 83.48 | 2.39 | 78.48 | 78.48 | 0.003 |
| | SVM | 80.09 | 2.24 | 75.95 | 82.28 | 0.006 |
| 20 | L1-SQSSVM | 87.90 | 3.72 | 80.07 | 92.53 | 5.096 |
| | SQSSVM | 87.19 | 4.32 | 77.94 | 91.81 | 4.854 |
| | SVM-Quad | 86.16 | 1.24 | 84.34 | 84.34 | 0.005 |
| | SVM | 82.03 | 5.40 | 67.97 | 86.83 | 0.002 |
| 40 | L1-SQSSVM | 90.28 | 3.33 | 83.41 | 94.31 | 7.063 |
| | SQSSVM | 89.53 | 4.23 | 81.99 | 94.31 | 6.781 |
| | SVM-Quad | 86.40 | 3.03 | 81.04 | 91.00 | 0.007 |
| | SVM | 83.60 | 3.46 | 76.78 | 88.63 | 0.006 |

Table 8.  Summary of theoretical results obtained in this paper

| Data set | L1-QSSVM | L1-SQSSVM |
| --- | --- | --- |
| Linearly separable | ● Solution existence ● $ \boldsymbol z^* $ is almost always unique ● Equivalence with SVM for large enough $ \lambda $ | ● Solution existence ● $ \boldsymbol z^* $ is almost always unique ● Equivalence with SSVM for large enough $ \lambda $ ● Solution is almost always unique with $ \boldsymbol\xi^* = \boldsymbol 0 $ for large enough $ \mu $ |
| Quadratically separable | ● Solution existence ● $ \boldsymbol z^* $ is almost always unique ● Capturing possible sparsity of $ \mathbf W^* $ for large enough $ \lambda $ | ● Solution existence ● $ \boldsymbol z^* $ is almost always unique ● Solution is almost always unique with $ \boldsymbol\xi^* = \boldsymbol 0 $ for large enough $ \mu $ ● Capturing possible sparsity of $ \mathbf W^* $ for large enough $ \lambda $ |
| Neither | ● Solution existence ● $ \boldsymbol z^* $ is almost always unique | ● Solution existence ● $ \boldsymbol z^* $ is almost always unique |