August 2020, 19(8): 4159-4177. doi: 10.3934/cpaa.2020186

Kernel-based maximum correntropy criterion with gradient descent method

Ting Hu
School of Mathematics and Statistics, Wuhan University, Wuhan, China

Received: September 2019. Revised: December 2019. Published: May 2020.

Fund Project: The author is supported by NSFC grants 11671307 and 11571078.

In this paper, we study the convergence of the gradient descent method for the maximum correntropy criterion (MCC) associated with reproducing kernel Hilbert spaces (RKHSs). MCC is widely used in real-world applications because of its robustness and its ability to handle non-Gaussian impulsive noise. In the regression setting, we show that the gradient descent iterates of MCC can approximate the target function, and we derive a capacity-dependent convergence rate by choosing a suitable number of iterations. Our result nearly matches the optimal convergence rate established in previous work, and it shows that the scaling parameter is crucial to MCC's approximation ability and robustness. The novelty of our work lies in a sharp estimate for the norms of the gradient descent iterates and in the projection operation applied to the last iterate.
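To make the setting concrete, the sketch below implements kernel gradient descent for the correntropy-induced loss ℓ_σ(u) = σ²(1 − exp(−u²/σ²)) that is standard in the MCC literature (see [6] and [15] in the references). It is a minimal illustration under stated assumptions, not the paper's exact scheme: the step size eta, iteration count T, Gaussian kernel width, and toy data are illustrative choices, and the projection step from the paper's analysis is omitted.

    import numpy as np

    def gaussian_kernel(A, B, width=1.0):
        # Gaussian (RBF) kernel matrix between the rows of A and B.
        sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * width ** 2))

    def mcc_gradient_descent(X, y, sigma=0.5, eta=0.5, T=500):
        # Gradient descent in the RKHS for the empirical correntropy risk
        #   (1/m) * sum_i sigma^2 * (1 - exp(-(y_i - f(x_i))^2 / sigma^2)).
        # Each iterate stays in the span of the kernel sections K_{x_i}, so
        # we track coefficients alpha with f_t(x) = sum_i alpha_i k(x_i, x).
        m = len(y)
        K = gaussian_kernel(X, X)
        alpha = np.zeros(m)                    # f_0 = 0
        for _ in range(T):
            residual = y - K @ alpha           # u_i = y_i - f_t(x_i)
            # MCC loss derivative l'(u) = 2 u exp(-u^2 / sigma^2): large
            # residuals (outliers) are exponentially down-weighted.
            weights = 2.0 * residual * np.exp(-(residual ** 2) / sigma ** 2)
            alpha += (eta / m) * weights       # f_{t+1} = f_t - eta * gradient
        return alpha

    # Toy usage: a sinusoid contaminated with sparse impulsive outliers.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(100, 1))
    y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(100)
    y[::10] += 5.0                             # non-Gaussian impulsive noise
    alpha = mcc_gradient_descent(X, y)
    y_fit = gaussian_kernel(X, X) @ alpha      # fitted values at the samples

As sigma grows, the derivative 2u·exp(−u²/σ²) approaches 2u and the update reduces to kernel gradient descent for least squares; this is one elementary way to see why the scaling parameter mediates the trade-off between approximation ability and robustness that the paper quantifies.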

Citation: Ting Hu. Kernel-based maximum correntropy criterion with gradient descent method. Communications on Pure and Applied Analysis, 2020, 19 (8) : 4159-4177. doi: 10.3934/cpaa.2020186
References:
[1]

N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc., 68 (1950), 337-404.  doi: 10.2307/1990404.

[2]

R. J. Bessa, V. Miranda and J. Gama, Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting, IEEE Trans. Power Syst., 24 (2009), 1657-1666. doi: 10.1109/TPWRS.2009.2030291.

[3]

D. R. Chen, Q. Wu, Y. Ying and D. X. Zhou, Support vector machine soft margin classifiers: Error analysis, J. Mach. Learn. Res., 5 (2004), 1143-1175.

[4]

M. Debruyne, A. Christmann, M. Hubert and J. A. K. Suykens, Robustness of reweighted least squares kernel based regression, J. Multivariate Anal., 101 (2010), 447-463. doi: 10.1016/j.jmva.2009.09.007.

[5]

Y. Feng, J. Fan and J. A. K. Suykens, A statistical learning approach to modal regression, J. Mach. Learn. Res., 21 (2020), 1-35.

[6]

Y. Feng, X. Huang, S. Lei, Y. Yang and J. A. K. Suykens, Learning with the maximum correntropy criterion induced losses for regression, J. Mach. Learn. Res., 16 (2015), 993-1034.

[7]

Y. Feng and Y. Ying, Learning with correntropy-induced losses for regression with mixture of symmetric stable noise, Appl. Comput. Harmon. Anal., 48 (2020), 795-810.  doi: 10.1016/j.acha.2019.09.001.

[8]

Z. C. Guo, T. Hu and L. Shi, Gradient descent for robust kernel-based regression, Inverse Probl., 34 (2018), Art. 065009. doi: 10.1088/1361-6420/aabe55.

[9]

R. He, W. S. Zheng and B. G. Hu, Maximum correntropy criterion for robust face recognition, IEEE Trans. Pattern Anal. Mach. Intell., 33 (2011), 1561-1576. doi: 10.1109/TPAMI.2010.220.

[10]

R. He, W. S. Zheng, B. G. Hu and X. W. Kong, A regularized correntropy framework for robust pattern recognition, Neural Comput., 23 (2011), 2074-2100. doi: 10.1162/NECO_a_00155.

[11]

P. W. Holland and R. E. Welsch, Robust regression using iteratively reweighted least-squares, Commun. Statist., 6 (1977), 813-827. doi: 10.1080/03610927708827533.

[12]

T. Hu, Q. Wu and D. X. Zhou, Distributed kernel gradient descent algorithm for minimum error entropy principle, Appl. Comput. Harmon. Anal., 49 (2020), 229-256. doi: 10.1016/j.acha.2019.01.002.

[13]

P. J. Huber, Robust Statistics, Wiley, New York, 2004.

[14]

J. Lin, L. Rosasco and D. X. Zhou, Iterative regularization for learning with convex loss functions, J. Mach. Learn. Res., 17 (2016), 2718-2755.

[15]

W. Liu, P. P. Pokharel and J. C. Principe, Correntropy: Properties and applications in non-Gaussian signal processing, IEEE Trans. Signal Process., 55 (2007), 5286-5298. doi: 10.1109/TSP.2007.896065.

[16]

I. Pinelis, Optimum bounds for the distributions of martingales in Banach spaces, Ann. Probab., 22 (1994), 1679-1706.

[17]

K. N. Plataniotis, D. Androutsos and A. N. Venetsanopoulos, Nonlinear filtering of non-Gaussian noise, J. Intell. Robot. Syst., 19 (1997), 207-231. doi: 10.1023/A:1007974400149.

[18]

J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, New York, 2010. doi: 10.1007/978-1-4419-1570-2.

[19]

I. Santamaria, P. P. Pokharel and J. C. Principe, Generalized correlation function: Definition, properties, and application to blind equalization, IEEE Trans. Signal Process., 54 (2006), 2187-2197. doi: 10.1109/TSP.2006.872524.

[20]

S. Smale and D. X. Zhou, Estimating the approximation error in learning theory, Anal. Appl., 1 (2003), 17-41.  doi: 10.1142/S0219530503000089.

[21]

S. Smale and D. X. Zhou, Learning theory estimates via integral operators and their approximations, Constr. Approx., 26 (2007), 153-172.  doi: 10.1007/s00365-006-0659-y.

[22]

I. Steinwart, Oracle inequalities for support vector machines that are based on random entropy numbers, J. Complexity, 25 (2009), 437-454.  doi: 10.1016/j.jco.2009.06.002.

[23]

I. Steinwart and A. Christmann, Support Vector Machines, Springer Science & Business Media, 2008.

[24]

X. Wang, Y. Jiang, M. Huang and H. Zhang, Robust variable selection with exponential squared loss, J. Amer. Statist. Assoc., 108 (2013), 632-643. doi: 10.1080/01621459.2013.766613.

[25]

B. Weng and K. E. Barner, Nonlinear system identification in impulsive environments, IEEE Trans. Signal Process., 53 (2005), 2588-2594.  doi: 10.1109/TSP.2005.849213.

[26]

Q. Wu, Y. Ying and D. X. Zhou, Multi-kernel regularized classifiers, J. Complexity, 23 (2007), 108-134. doi: 10.1016/j.jco.2006.06.007.


