August  2020, 3(3): 205-218. doi: 10.3934/mfc.2020020

Sparse regularized learning in the reproducing kernel banach spaces with the $ \ell^1 $ norm

1. 

School of Mathematical Sciences, South China Normal University, Guangzhou, Guangdong 510631, China

2. 

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006, China

* Corresponding author: Rongrong Lin

Received  November 2019 Revised  June 2020 Published  June 2020

We present a sparse representer theorem for regularization networks in a reproducing kernel Banach space with the $ \ell^1 $ norm by the theory of convex analysis. The theorem states that extreme points of the solution set of regularization networks in such a sparsity-promoting space belong to the span of kernel functions centered on at most $ n $ adaptive points of the input space, where $ n $ is the number of training data. Under the Lebesgue constant assumptions on reproducing kernels, we can recover the relaxed representer theorem and the exact representer theorem in that space in the literature. Finally, we perform numerical experiments for synthetic data and real-world benchmark data in the reproducing kernel Banach spaces with the $ \ell^1 $ norm and the reproducing kernel Hilbert spaces both with Laplacian kernels. The numerical performance demonstrates the advantages of sparse regularized learning.

Citation: Ying Lin, Rongrong Lin, Qi Ye. Sparse regularized learning in the reproducing kernel banach spaces with the $ \ell^1 $ norm. Mathematical Foundations of Computing, 2020, 3 (3) : 205-218. doi: 10.3934/mfc.2020020
References:
[1]

S. BoydN. ParikhE. ChuB. Peleato and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3 (2011), 1-122.  doi: 10.1561/2200000016.

[2]

C. BoyerA. ChambolleY. De CastroV. DuvalF. de Gournay and P. Weiss, On representer theorems and convex regularization, SIAM J. Optim., 29 (2019), 1260-1281.  doi: 10.1137/18M1200750.

[3]

O. Christensen, An Introduction to Frames and Riesz Bases, 2nd edition, Applied and Numerical Harmonic Analysis, Birkhäuser/Springer, [Cham], 2016. doi: 10.1007/978-3-319-25613-9.

[4]

H. G. Dales, F. K. Dashiell Jr., A. T.-M. Lau and D. Strauss, Banach Spaces of Continuous Functions as Dual Spaces, CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, Springer, Cham, 2016. doi: 10.1007/978-3-319-32349-7.

[5]

S. De Marchi and R. Schaback, Stability of kernel-based interpolation, Adv. Comput. Math., 32 (2010), 155-161.  doi: 10.1007/s10444-008-9093-4.

[6]

D. L. Donoho, For most large underdetermined systems of equations, the minimal $l_1$-norm near-solution approximates the sparsest near-solution, Comm. Pure Appl. Math., 59 (2006), 907-934.  doi: 10.1002/cpa.20131.

[7]

G. Fasshauer and M. McCourt, Kernel-Based Approximation Methods using MATLAB, Interdisciplinary Mathematical Sciences. Vol. 19, World Scientific, 2015.

[8]

Z.-C. Guo and L. Shi, Learning with coefficient-based regularization and $\ell^1$-penalty, Adv. Comput. Math., 39 (2013), 493-510.  doi: 10.1007/s10444-012-9288-6.

[9]

T. HangelbroekF. J. Narcowich and J. D. Ward, Kernel approximation on manifolds I: bounding the Lebesgue constant, SIAM J. Math. Anal., 42 (2010), 1732-1760.  doi: 10.1137/090769570.

[10]

L. Huang, C. Liu, L. Tan and Q. Ye, Generalized representer theorems in Banach spaces, Anal. Appl. (Singap.), (2019). doi: 10.1142/S0219530519410100.

[11]

V. Klee, On a theorem of Dubins, J. Math. Anal. Appl., 7 (1963), 425-427.  doi: 10.1016/0022-247X(63)90063-5.

[12]

Z. Li, Y. Xu and Q. Ye, Sparse support vector machines in reproducing kernel banach spaces, Contemporary Computational Mathematics-A Celebration of the 80th Birthday of Ian Sloan, Springer, Cham, 1 (2018), 869–887.

[13]

R. Lin, G. Song and H. Zhang, Multi-task learning in vector-valued reproducing kernel Banach spaces with the $\ell^1$ norm, https://arXiv.org/abs/1901.01036.

[14]

R. Lin, H. Zhang and J. Zhang, On reproducing kernel Banach spaces: Generic definitions and unified framework of constructions, https://arXiv.org/abs/1901.01002.

[15]

A. Rudi, R. Camoriano and L. Rosasco, Less is more: Nyström computational regularization, in Advances in Neural Information Processing Systems, (2015), 1657–1665.

[16]

K. Schlegel, When is there a representer theorem? Nondifferentiable regularisers and Banach spaces, J. Global Optim., 74 (2019), 401-415.  doi: 10.1007/s10898-019-00767-0.

[17] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, The MIT Press, Cambridge, 2001. 
[18] B. Simon, Convexity: An Analytic Viewpoint, vol. 187 of Cambridge Tracts in Mathematics, Cambridge University Press, Cambridge, 2011.  doi: 10.1017/CBO9780511910135.
[19]

G. Song and H. Zhang, Reproducing kernel Banach spaces with the $\ell^1$ norm ii: Error analysis for regularized least square regression, Neural Comput., 23 (2011), 2713-2729.  doi: 10.1162/NECO_a_00178.

[20]

G. SongH. Zhang and F. J. Hickernell, Reproducing kernel Banach spaces with the $\ell^1$ norm, Appl. Comput. Harmon. Anal., 34 (2013), 96-116.  doi: 10.1016/j.acha.2012.03.009.

[21]

B. K. SriperumbudurK. Fukumizu and G. R. G. Lanckriet, Universality, characteristic kernels and RKHS embedding of measures, J. Mach. Learn. Res., 12 (2011), 2389-2410. 

[22]

I. Steinwart and A. Christmann, Support Vector Machines, Information Science and Statistics, Springer, New York, 2008.

[23]

M. UnserJ. Fageot and H. Gupta, Representer theorems for sparsity-promoting $\ell_1$ regularization, IEEE Trans. Inform. Theory, 62 (2016), 5167-5180.  doi: 10.1109/TIT.2016.2590421.

[24]

Y. Xu and Q. Ye, Generalized Mercer kernels and reproducing kernel Banach spaces, Mem. Amer. Math. Soc., 258 (2019), no. 1243,122 pp. doi: 10.1090/memo/1243.

[25]

H. ZhangY. Xu and J. Zhang, Reproducing kernel Banach spaces for machine learning, J. Mach. Learn. Res., 10 (2009), 2741-2775.  doi: 10.1109/IJCNN.2009.5179093.

[26]

H. Zhang and L. Zhao, On the inclusion relation of reproducing kernel Hilbert spaces, Anal. Appl. (Singap.), 11 (2013), 1350014, 31pp. doi: 10.1142/S0219530513500140.

show all references

References:
[1]

S. BoydN. ParikhE. ChuB. Peleato and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3 (2011), 1-122.  doi: 10.1561/2200000016.

[2]

C. BoyerA. ChambolleY. De CastroV. DuvalF. de Gournay and P. Weiss, On representer theorems and convex regularization, SIAM J. Optim., 29 (2019), 1260-1281.  doi: 10.1137/18M1200750.

[3]

O. Christensen, An Introduction to Frames and Riesz Bases, 2nd edition, Applied and Numerical Harmonic Analysis, Birkhäuser/Springer, [Cham], 2016. doi: 10.1007/978-3-319-25613-9.

[4]

H. G. Dales, F. K. Dashiell Jr., A. T.-M. Lau and D. Strauss, Banach Spaces of Continuous Functions as Dual Spaces, CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, Springer, Cham, 2016. doi: 10.1007/978-3-319-32349-7.

[5]

S. De Marchi and R. Schaback, Stability of kernel-based interpolation, Adv. Comput. Math., 32 (2010), 155-161.  doi: 10.1007/s10444-008-9093-4.

[6]

D. L. Donoho, For most large underdetermined systems of equations, the minimal $l_1$-norm near-solution approximates the sparsest near-solution, Comm. Pure Appl. Math., 59 (2006), 907-934.  doi: 10.1002/cpa.20131.

[7]

G. Fasshauer and M. McCourt, Kernel-Based Approximation Methods using MATLAB, Interdisciplinary Mathematical Sciences. Vol. 19, World Scientific, 2015.

[8]

Z.-C. Guo and L. Shi, Learning with coefficient-based regularization and $\ell^1$-penalty, Adv. Comput. Math., 39 (2013), 493-510.  doi: 10.1007/s10444-012-9288-6.

[9]

T. HangelbroekF. J. Narcowich and J. D. Ward, Kernel approximation on manifolds I: bounding the Lebesgue constant, SIAM J. Math. Anal., 42 (2010), 1732-1760.  doi: 10.1137/090769570.

[10]

L. Huang, C. Liu, L. Tan and Q. Ye, Generalized representer theorems in Banach spaces, Anal. Appl. (Singap.), (2019). doi: 10.1142/S0219530519410100.

[11]

V. Klee, On a theorem of Dubins, J. Math. Anal. Appl., 7 (1963), 425-427.  doi: 10.1016/0022-247X(63)90063-5.

[12]

Z. Li, Y. Xu and Q. Ye, Sparse support vector machines in reproducing kernel banach spaces, Contemporary Computational Mathematics-A Celebration of the 80th Birthday of Ian Sloan, Springer, Cham, 1 (2018), 869–887.

[13]

R. Lin, G. Song and H. Zhang, Multi-task learning in vector-valued reproducing kernel Banach spaces with the $\ell^1$ norm, https://arXiv.org/abs/1901.01036.

[14]

R. Lin, H. Zhang and J. Zhang, On reproducing kernel Banach spaces: Generic definitions and unified framework of constructions, https://arXiv.org/abs/1901.01002.

[15]

A. Rudi, R. Camoriano and L. Rosasco, Less is more: Nyström computational regularization, in Advances in Neural Information Processing Systems, (2015), 1657–1665.

[16]

K. Schlegel, When is there a representer theorem? Nondifferentiable regularisers and Banach spaces, J. Global Optim., 74 (2019), 401-415.  doi: 10.1007/s10898-019-00767-0.

[17] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, The MIT Press, Cambridge, 2001. 
[18] B. Simon, Convexity: An Analytic Viewpoint, vol. 187 of Cambridge Tracts in Mathematics, Cambridge University Press, Cambridge, 2011.  doi: 10.1017/CBO9780511910135.
[19]

G. Song and H. Zhang, Reproducing kernel Banach spaces with the $\ell^1$ norm ii: Error analysis for regularized least square regression, Neural Comput., 23 (2011), 2713-2729.  doi: 10.1162/NECO_a_00178.

[20]

G. SongH. Zhang and F. J. Hickernell, Reproducing kernel Banach spaces with the $\ell^1$ norm, Appl. Comput. Harmon. Anal., 34 (2013), 96-116.  doi: 10.1016/j.acha.2012.03.009.

[21]

B. K. SriperumbudurK. Fukumizu and G. R. G. Lanckriet, Universality, characteristic kernels and RKHS embedding of measures, J. Mach. Learn. Res., 12 (2011), 2389-2410. 

[22]

I. Steinwart and A. Christmann, Support Vector Machines, Information Science and Statistics, Springer, New York, 2008.

[23]

M. UnserJ. Fageot and H. Gupta, Representer theorems for sparsity-promoting $\ell_1$ regularization, IEEE Trans. Inform. Theory, 62 (2016), 5167-5180.  doi: 10.1109/TIT.2016.2590421.

[24]

Y. Xu and Q. Ye, Generalized Mercer kernels and reproducing kernel Banach spaces, Mem. Amer. Math. Soc., 258 (2019), no. 1243,122 pp. doi: 10.1090/memo/1243.

[25]

H. ZhangY. Xu and J. Zhang, Reproducing kernel Banach spaces for machine learning, J. Mach. Learn. Res., 10 (2009), 2741-2775.  doi: 10.1109/IJCNN.2009.5179093.

[26]

H. Zhang and L. Zhao, On the inclusion relation of reproducing kernel Hilbert spaces, Anal. Appl. (Singap.), 11 (2013), 1350014, 31pp. doi: 10.1142/S0219530513500140.

Figure 1.  Numerical results of models (12) and (13) for Tai Chi data set are illustrated in Figure 1(a) and Figure 1(b), respectively.
Figure 2.  Numerical results of models (12) and (13) for the second data set are illustrated in Figure 2(a) and Figure 2(b), respectively
Table 1.  Lebesgue constants of the Laplacian kernel $ e^{-\|x-x'\|_2} $ on a set of $ n $ grid points of $ [-1, 1]^2 $
n=100 n=400 n=900 n=1600 n=2500
1.237204 1.244770 1.249653 1.246516 1.246808
n=100 n=400 n=900 n=1600 n=2500
1.237204 1.244770 1.249653 1.246516 1.246808
Table 2.  Lebesgue constants of the Laplacian kernel $ e^{-\|x-x'\|_2} $, $ x, x'\in{\mathbb R}^3 $ on a set of $ n $ grid points of $ [-1, 1]^3 $
n=125 n=1000 n=1728 n=2197 n=3375
1.624012 1.705158 1.709775 1.711243 1.712867
n=125 n=1000 n=1728 n=2197 n=3375
1.624012 1.705158 1.709775 1.711243 1.712867
[1]

Pengbo Geng, Wengu Chen. Unconstrained $ \ell_1 $-$ \ell_2 $ minimization for sparse recovery via mutual coherence. Mathematical Foundations of Computing, 2020, 3 (2) : 65-79. doi: 10.3934/mfc.2020006

[2]

Jun Wang, Xing Tao Wang. Sparse signal reconstruction via the approximations of $ \ell_{0} $ quasinorm. Journal of Industrial and Management Optimization, 2020, 16 (4) : 1907-1925. doi: 10.3934/jimo.2019035

[3]

Peili Li, Xiliang Lu, Yunhai Xiao. Smoothing Newton method for $ \ell^0 $-$ \ell^2 $ regularized linear inverse problem. Inverse Problems and Imaging, 2022, 16 (1) : 153-177. doi: 10.3934/ipi.2021044

[4]

Umberto De Maio, Peter I. Kogut, Gabriella Zecca. On optimal $ L^1 $-control in coefficients for quasi-linear Dirichlet boundary value problems with $ BMO $-anisotropic $ p $-Laplacian. Mathematical Control and Related Fields, 2020, 10 (4) : 827-854. doi: 10.3934/mcrf.2020021

[5]

Lidan Li, Hongwei Zhang, Liwei Zhang. Inverse quadratic programming problem with $ l_1 $ norm measure. Journal of Industrial and Management Optimization, 2020, 16 (5) : 2425-2437. doi: 10.3934/jimo.2019061

[6]

Tianling Gao, Qiang Liu, Zhiguang Zhang. Fractional $ 1 $-Laplacian evolution equations to remove multiplicative noise. Discrete and Continuous Dynamical Systems - B, 2021  doi: 10.3934/dcdsb.2021254

[7]

Jiangang Qi, Bing Xie. Extremum estimates of the $ L^1 $-norm of weights for eigenvalue problems of vibrating string equations based on critical equations. Discrete and Continuous Dynamical Systems - B, 2021, 26 (7) : 3505-3516. doi: 10.3934/dcdsb.2020243

[8]

Chengxiang Wang, Li Zeng, Wei Yu, Liwei Xu. Existence and convergence analysis of $\ell_{0}$ and $\ell_{2}$ regularizations for limited-angle CT reconstruction. Inverse Problems and Imaging, 2018, 12 (3) : 545-572. doi: 10.3934/ipi.2018024

[9]

Chaoqian Li, Yajun Liu, Yaotang Li. Note on $ Z $-eigenvalue inclusion theorems for tensors. Journal of Industrial and Management Optimization, 2021, 17 (2) : 687-693. doi: 10.3934/jimo.2019129

[10]

Gang Wang, Yuan Zhang. $ Z $-eigenvalue exclusion theorems for tensors. Journal of Industrial and Management Optimization, 2020, 16 (4) : 1987-1998. doi: 10.3934/jimo.2019039

[11]

Lianjun Zhang, Lingchen Kong, Yan Li, Shenglong Zhou. A smoothing iterative method for quantile regression with nonconvex $ \ell_p $ penalty. Journal of Industrial and Management Optimization, 2017, 13 (1) : 93-112. doi: 10.3934/jimo.2016006

[12]

Federico Rodriguez Hertz, Zhiren Wang. On $ \epsilon $-escaping trajectories in homogeneous spaces. Discrete and Continuous Dynamical Systems, 2021, 41 (1) : 329-357. doi: 10.3934/dcds.2020365

[13]

Victor Vargas. On involution kernels and large deviations principles on $ \beta $-shifts. Discrete and Continuous Dynamical Systems, 2022, 42 (6) : 2699-2718. doi: 10.3934/dcds.2021208

[14]

Mihai Mihăilescu, Julio D. Rossi. Monotonicity with respect to $ p $ of the First Nontrivial Eigenvalue of the $ p $-Laplacian with Homogeneous Neumann Boundary Conditions. Communications on Pure and Applied Analysis, 2020, 19 (9) : 4363-4371. doi: 10.3934/cpaa.2020198

[15]

Mohan Mallick, R. Shivaji, Byungjae Son, S. Sundar. Bifurcation and multiplicity results for a class of $n\times n$ $p$-Laplacian system. Communications on Pure and Applied Analysis, 2018, 17 (3) : 1295-1304. doi: 10.3934/cpaa.2018062

[16]

Jong Yoon Hyun, Yoonjin Lee, Yansheng Wu. Connection of $ p $-ary $ t $-weight linear codes to Ramanujan Cayley graphs with $ t+1 $ eigenvalues. Advances in Mathematics of Communications, 2021  doi: 10.3934/amc.2020133

[17]

Rakesh Nandi, Sujit Kumar Samanta, Chesoong Kim. Analysis of $ D $-$ BMAP/G/1 $ queueing system under $ N $-policy and its cost optimization. Journal of Industrial and Management Optimization, 2021, 17 (6) : 3603-3631. doi: 10.3934/jimo.2020135

[18]

Abdeladim El Akri, Lahcen Maniar. Uniform indirect boundary controllability of semi-discrete $ 1 $-$ d $ coupled wave equations. Mathematical Control and Related Fields, 2020, 10 (4) : 669-698. doi: 10.3934/mcrf.2020015

[19]

Sugata Gangopadhyay, Constanza Riera, Pantelimon Stănică. Gowers $ U_2 $ norm as a measure of nonlinearity for Boolean functions and their generalizations. Advances in Mathematics of Communications, 2021, 15 (2) : 241-256. doi: 10.3934/amc.2020056

[20]

Victor Churchill, Rick Archibald, Anne Gelb. Edge-adaptive $ \ell_2 $ regularization image reconstruction from non-uniform Fourier data. Inverse Problems and Imaging, 2019, 13 (5) : 931-958. doi: 10.3934/ipi.2019042

 Impact Factor: 

Metrics

  • PDF downloads (304)
  • HTML views (454)
  • Cited by (0)

Other articles
by authors

[Back to Top]