Modeling interactive components by coordinate kernel polynomial models
1. | Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China |
2. | Division of Biostatistics, University of California, Berkeley, Berkeley, CA 94720, USA |
3. | Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, TN 37132, USA |
We propose the use of coordinate kernel polynomials in kernel regression. The resulting approach, called coordinate kernel polynomial regression, can simultaneously identify active variables and effective interactive components. Reparametrization refinement is found to be critical for improving modeling accuracy and predictive power. Post-training component selection then allows one to identify the effective interactive components. Generalization error bounds explain the effectiveness of the algorithm from a learning theory perspective, and simulation studies demonstrate its empirical effectiveness.
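The abstract does not spell out the form of the kernel, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes a degree-2 coordinate kernel polynomial built from univariate Gaussian kernels, one per coordinate, with nonnegative weights on the main-effect terms and on the pairwise product (interaction) terms, and it fits the model by plain kernel ridge regression. The names `coord_gaussian`, `ckp_gram`, `fit_krr`, `main_w`, and `inter_w` are hypothetical, and the reparametrization refinement and component selection steps mentioned in the abstract are not reproduced.

```python
import itertools
import numpy as np

def coord_gaussian(a, b, sigma=1.0):
    """Univariate Gaussian kernel matrix between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def ckp_gram(X, U, main_w, inter_w, sigma=1.0):
    """Gram matrix of an ASSUMED degree-2 coordinate kernel polynomial:
    sum_j main_w[j] * k_j  +  sum_{j<l} inter_w[(j, l)] * k_j * k_l,
    where k_j is a Gaussian kernel acting on coordinate j only."""
    p = X.shape[1]
    Ks = [coord_gaussian(X[:, j], U[:, j], sigma) for j in range(p)]
    G = np.zeros((X.shape[0], U.shape[0]))
    for j in range(p):
        G += main_w[j] * Ks[j]                # main-effect components
    for j, l in itertools.combinations(range(p), 2):
        G += inter_w[(j, l)] * Ks[j] * Ks[l]  # interactive components
    return G

def fit_krr(G, y, lam=1e-3):
    """Kernel ridge regression coefficients: solve (G + lam * n * I) alpha = y."""
    n = G.shape[0]
    return np.linalg.solve(G + lam * n * np.eye(n), y)

# Toy data (hypothetical): one main effect and one pairwise interaction.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(80, 3))
y = X[:, 0] + X[:, 1] * X[:, 2]

main_w = np.ones(3)
inter_w = {pair: 1.0 for pair in itertools.combinations(range(3), 2)}

G = ckp_gram(X, X, main_w, inter_w)
alpha = fit_krr(G, y)
train_mse = float(np.mean((G @ alpha - y) ** 2))
print(f"training MSE: {train_mse:.4f}")
```

In this sketch, setting a weight in `main_w` or `inter_w` to zero removes the corresponding component from the model, which is one simple way to picture what selecting active variables and interactive components could mean for a kernel of this shape.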
References:
[1] F. R. Bach, Consistency of the group lasso and multiple kernel learning, J. Mach. Learn. Res., 9 (2008), 1179-1225.
[2] P. L. Bartlett and S. Mendelson, Rademacher and Gaussian complexities: Risk bounds and structural results, J. Mach. Learn. Res., 3 (2003), 463-482. doi: 10.1162/153244303321897690.
[3] M. D. Buhmann, Radial Basis Functions: Theory and Implementations, Cambridge Monographs on Applied and Computational Mathematics, 12, Cambridge University Press, Cambridge, 2003. doi: 10.1017/CBO9780511543241.
[4] C. M. Carvalho, J. Chang, J. E. Lucas, J. R. Nevins, Q. Wang and M. West, High-dimensional sparse factor modeling: Applications in gene expression genomics, J. Amer. Statist. Assoc., 103 (2008), 1438-1456. doi: 10.1198/016214508000000869.
[5] C. Cortes, M. Mohri and A. Rostamizadeh, Learning non-linear combinations of kernels, in Advances in Neural Information Processing Systems, Curran Associates, Inc., (2009), 396-404.
[6] J. H. Friedman, Multivariate adaptive regression splines, Ann. Statist., 19 (1991), 1-141. doi: 10.1214/aos/1176347963.
[7] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning, 46 (2002), 389-422. doi: 10.1023/A:1012487302797.
[8] R. Kohavi and G. H. John, Wrappers for feature subset selection, Artificial Intelligence, 97 (1997), 273-324.
[9] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui and M. I. Jordan, Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res., 5 (2003/04), 27-72.
[10] S. L. Lauritzen, Graphical Models, Oxford Statistical Science Series, 17, The Clarendon Press, Oxford University Press, New York, 1996.
[11] L. Li and X. Yin, Sliced inverse regression with regularizations, Biometrics, 64 (2008), 124-131. doi: 10.1111/j.1541-0420.2007.00836.x.
[12] F. Liang, K. Mao, M. Liao, S. Mukherjee and M. West, Nonparametric Bayesian Kernel Models, Technical report, Department of Statistical Science, Duke University, 2007.
[13] Y. Lin and H. Zhang, Component selection and smoothing in multivariate nonparametric regression, Ann. Statist., 34 (2006), 2272-2297. doi: 10.1214/009053606000000722.
[14] C. McDiarmid, On the method of bounded differences, in Surveys in Combinatorics, London Math. Soc. Lecture Note Ser., 141, Cambridge Univ. Press, Cambridge, (1989), 148-188.
[15] R. Meir and T. Zhang, Generalization error bounds for Bayesian mixture algorithms, J. Mach. Learn. Res., 4 (2004), 839-860. doi: 10.1162/1532443041424300.
[16] S. Mukherjee and Q. Wu, Estimation of gradients and coordinate covariation in classification, J. Mach. Learn. Res., 7 (2006), 2481-2514.
[17] S. Mukherjee and D.-X. Zhou, Learning coordinate covariances via gradients, J. Mach. Learn. Res., 7 (2006), 519-549.
[18] M. Pontil and C. Micchelli, Learning the kernel function via regularization, J. Mach. Learn. Res., 6 (2005), 1099-1125.
[19] H. Qin and X. Guo, Semi-supervised learning with summary statistics, Anal. Appl. (Singap.), 17 (2019), 837-851. doi: 10.1142/S0219530519400037.
[20] B. Schölkopf, A. Smola and K.-R. Müller, Kernel principal component analysis, in Artificial Neural Networks-ICANN'97, Lecture Notes in Computer Science, 1327, Springer, Berlin, Heidelberg, (1997), 583-588.
[21] L. Shi, Distributed learning with indefinite kernels, Anal. Appl. (Singap.), 17 (2019), 947-975. doi: 10.1142/S021953051850032X.
[22] T. P. Speed and H. T. Kiiveri, Gaussian Markov distributions over finite graphs, Ann. Statist., 138-150. doi: 10.1214/aos/1176349846.
[23] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B, 58 (1996), 267-288. doi: 10.1111/j.2517-6161.1996.tb02080.x.
[24] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc., New York, 1998.
[25] G. Wahba, Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, 59, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990. doi: 10.1137/1.9781611970128.
[26] Q. Wang and X. Yin, A nonlinear multi-dimensional variable selection method for high dimensional data: Sparse MAVE, Comput. Statist. Data Anal., 52 (2008), 4512-4520. doi: 10.1016/j.csda.2008.03.003.
[27] Q. Wu, Y. Ying and D.-X. Zhou, Multi-kernel regularized classifiers, J. Complexity, 23 (2007), 108-134. doi: 10.1016/j.jco.2006.06.007.
[28] Y. Xu and H. Zhang, Refinable kernels, J. Mach. Learn. Res., 8 (2007), 2083-2120. doi: 10.1109/IIHMSP.2010.145.
[29] Y. Xu and H. Zhang, Refinement of reproducing kernels, J. Mach. Learn. Res., 10 (2009), 107-140.
[30] W. Yao and Q. Wang, Robust variable selection through MAVE, Comput. Statist. Data Anal., 63 (2013), 42-49. doi: 10.1016/j.csda.2013.01.021.
[31] Y. Ying and C. Campbell, Rademacher chaos complexities for learning the kernel problem, Neural Comput., 22 (2010), 2858-2886. doi: 10.1162/NECO_a_00028.
[32] H. H. Zhang, Variable selection for support vector machines via smoothing spline ANOVA, Statist. Sinica, 16 (2006), 659-674.
[33] N. Zhang, Z. Yu and Q. Wu, Overlapping sliced inverse regression for dimension reduction, Anal. Appl. (Singap.), 17 (2019), 715-736. doi: 10.1142/S0219530519400013.
[34] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol., 67 (2005), 301-320. doi: 10.1111/j.1467-9868.2005.00503.x.
Algorithm | TPR( | TPR( | FPR | MSE |
CKPR-L | 1.00 | 1.00 | 0.000 | 0.008 (0.000) |
CKPR-G | 1.00 | 1.00 | 0.011 | 0.109 (0.015) |
LASSO | 1.00 | 0.18 | 0.040 | 1.129 (0.015) |
COSSO | 0.90 | 0.02 | 0.020 | 10.879 (8.345) |
SR-SIR (AIC) | 1.00 | 0.89 | 0.460 | - |
SR-SIR (BIC) | 1.00 | 0.85 | 0.181 | - |
SR-SIR (RIC) | 1.00 | 0.75 | 0.053 | - |
CKPR-G | 0.119 (0.003) | 0.054 (0.001) | 0.025 (0.0004) |
COSSO(GCV) | 0.358 (0.009) | 0.100 (0.003) | 0.045 (0.001) |
COSSO(5CV) | 0.378 (0.005) | 0.094 (0.004) | 0.043 (0.001) |
MARS | 0.239 (0.008) | 0.109 (0.003) | 0.084 (0.001) |
Ionosphere | Sonar MR | Wisc. BC | |
351 | 208 | 683 | |
33 | 60 | 9 | |
CKPR-L | |||
CKPR-G | |||
Best in [5] |