# American Institute of Mathematical Sciences

ISSN:
1534-0392

eISSN:
1553-5258


## Communications on Pure & Applied Analysis

August 2020 , Volume 19 , Issue 8


2020, 19(8): i-iii doi: 10.3934/cpaa.2020171
2020, 19(8): 3917-3932 doi: 10.3934/cpaa.2020172
Abstract:

This paper is concerned with learning rates for partial linear functional models (PLFM) within reproducing kernel Hilbert spaces (RKHS), where the covariates consist of two parts: functional-type covariates and scalar ones. In contrast to the functional principal component analysis frequently used for functional models, the finite number of basis functions in the proposed approach can be generated automatically by taking advantage of the reproducing property of the RKHS. This avoids the additional computational cost of PCA decomposition and of choosing the number of principal components. Moreover, the coefficient estimators with bounded covariates converge to the true coefficients at linear rates, as if the functional term in PLFM had no effect on the linear part. In contrast, the prediction error for the functional estimator is significantly affected by the ambient dimension of the scalar covariates. Finally, we develop a numerical algorithm for the proposed penalized approach, and simulated experiments are carried out to support our theoretical results.

Jia Cai and
2020, 19(8): 3933-3945 doi: 10.3934/cpaa.2020173
Abstract:

Canonical correlation analysis (CCA) is a powerful statistical tool for detecting mutual information between two sets of multi-dimensional random variables. Generalized CCA (GCCA), a natural extension of CCA, can detect the relations among multiple (more than two) datasets. To interpret canonical variates more efficiently, this paper presents a novel sparse GCCA algorithm via the linearized Bregman method, which is a generalization of traditional sparse CCA methods. Experimental results on both synthetic and real datasets demonstrate the effectiveness and efficiency of the proposed algorithm when compared with several state-of-the-art sparse CCA and deep CCA algorithms.

2020, 19(8): 3947-3956 doi: 10.3934/cpaa.2020174
Abstract:

The huge amount of available data nowadays is a challenge for kernel-based machine learning algorithms like SVMs with respect to runtime and storage capacities. Local approaches might help to relieve these issues and to improve statistical accuracy. It has already been shown that these local approaches are consistent and robust in a basic sense. This article refines the analysis of robustness properties towards the so-called influence function which expresses the differentiability of the learning method: We show that there is a differentiable dependency of our locally learned predictor on the underlying distribution. The assumptions of the proven theorems can be verified without knowing anything about this distribution. This makes the results interesting also from an applied point of view.

2020, 19(8): 3957-3971 doi: 10.3934/cpaa.2020175
Abstract:

We study the efficiency of the approximation of functions from the Besov space $B_{p\theta}^\Omega(\mathbf{T}^d)$ in the norm of $L_q(\mathbf{T}^d)$ by various random methods. We determine the exact asymptotic orders of the Kolmogorov widths, linear widths, and Gel'fand widths of the unit ball of $B_{p\theta}^\Omega(\mathbf{T}^d)$ in $L_q(\mathbf{T}^d)$. Our results show that, in some cases, the randomized linear and Gel'fand methods converge faster than their deterministic counterparts. The maximal improvement is roughly a factor of $n^{-1/2}$.

2020, 19(8): 3973-4005 doi: 10.3934/cpaa.2020176
Abstract:

We consider learning rates of kernel regularized regression (KRR) based on reproducing kernel Hilbert spaces (RKHSs) and differentiable strongly convex losses, and we provide some new strongly convex losses. We first show robustness with respect to the maximum mean discrepancy (MMD) and the Hutchinson metric, respectively, and, along this line, bound the learning rate of the KRR. We then provide a capacity-dependent learning rate and give the learning rates for four concrete strongly convex losses. In particular, we provide the learning rates when the logarithmic complexity exponent of the hypothesis RKHS is arbitrarily small as well as when it is sufficiently large.

2020, 19(8): 4007-4022 doi: 10.3934/cpaa.2020177
Abstract:

Excellent results have been obtained for $L^{p}\; (1\leq p<+\infty)$ risk estimation when the density function has compact support. However, in general no $L^{1}$ risk estimate exists for densities without compact support. Motivated by the work of Juditsky & Lambert-Lacroix (A. Juditsky and S. Lambert-Lacroix, On minimax density estimation on $\mathbb{R}$, Bernoulli, 10(2004), 187-220) and Goldenshluger & Lepski (A. Goldenshluger and O. Lepski, On adaptive minimax density estimation on $\mathbb{R}^{d}$, Probab. Theory Relat. Fields., 159(2014), 479-543), in this paper we provide an adaptive estimate for a family of density functions that do not necessarily have compact support.

2020, 19(8): 4023-4054 doi: 10.3934/cpaa.2020178
Abstract:

Recent investigations on the error analysis of kernel regularized pairwise learning initiate the theoretical research on pairwise reproducing kernel Hilbert spaces (PRKHSs). In the present paper, we provide a method of constructing PRKHSs with classical Jacobi orthogonal polynomials. The performance of the kernel regularized online pairwise regression learning algorithms based on a quadratic loss function is investigated. Applying convex analysis and Rademacher complexity techniques, the bounds for the generalization error are provided explicitly. It is shown that the convergence rate can be greatly improved by adjusting the scale parameters in the loss function.

Hang Xu and
2020, 19(8): 4055-4068 doi: 10.3934/cpaa.2020179
Abstract:

This paper considers the recovery of signals that are corrupted with noise. We focus on a novel model called the relaxed ALASSO (RALASSO) model, introduced by Z. Tan et al. (2014). Compared to the well-known ALASSO, RALASSO can be solved better in practice. Z. Tan et al. (2014) used the $D$-RIP to characterize the sparse or approximately sparse solutions for RALASSO when the $D$-RIP constant satisfies $\delta_{2k} < 0.1907$, where the solution is sparse or approximately sparse in terms of a tight frame $D$. However, their error bound for the solution depends heavily on the term $\Vert D^*D\Vert_{1, 1}$. Moreover, compared to other works on signal recovery via ALASSO, the condition $\delta_{2k} < 0.1907$ is even stronger. To address these shortcomings of the results by Z. Tan et al. (2014), in this article we use new methods, based on the RALASSO model, to obtain a better error bound and a weaker sufficient condition. One result of this paper uses another method, called the robust $\ell_2$ $D$-Null Space Property, to obtain the sparse or non-sparse solution of RALASSO and to give an error estimate for RALASSO in which the term $\Vert D^*D\Vert_{1, 1}$ is eliminated from the constants. Another result of the paper uses the $D$-RIP to obtain a new condition $\delta_{2k} < 0.3162$, which is weaker than the condition $\delta_{2k} < 0.1907$. To some extent, RALASSO is equivalent to ALASSO, and the condition is also weaker than the similar conditions $\delta_{3k} < 0.25$ by J. Lin and S. Li (2014) and $\delta_{2k}<0.25$ by Y. Xia and S. Li (2016).

2020, 19(8): 4069-4083 doi: 10.3934/cpaa.2020180
Abstract:

High-dimensional binary classification has been intensively studied in the machine learning community over the last few decades. The support vector machine (SVM), one of the most popular classifiers, depends on only a portion of the training samples, called support vectors, which leads to suboptimal performance in the high-dimension, low-sample-size (HDLSS) setting. Large-margin unified machines (LUMs) are a family of margin-based classifiers proposed to solve the so-called "data piling" problem, which is inherent in SVM under HDLSS settings. In this paper we study the binary classification algorithms associated with LUM loss functions in the framework of reproducing kernel Hilbert spaces. A quantitative convergence analysis is carried out for these algorithms by means of a novel application of projection operators to overcome the technical difficulty. The rates are explicitly derived under a priori conditions on the approximation ability and capacity of the reproducing kernel Hilbert space.

2020, 19(8): 4085-4095 doi: 10.3934/cpaa.2020181
Abstract:

We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows one to "lift" theorems about shallow networks to theorems about deep networks with an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts, in which each channel in the deep network calculates a spherical polynomial, a non-smooth ReLU network, or another zonal function network closely related to the ReLU network.

2020, 19(8): 4097-4109 doi: 10.3934/cpaa.2020182
Abstract:

Inverses of certain positive linear operators have been investigated in several recent papers, in connection with problems such as the decomposition of classical operators, the representation of Lagrange-type operators, and asymptotic formulas of Voronovskaja type. Motivated by this research, in this paper we give some representations for the inverses of certain positive linear operators, such as the Bernstein, Beta, Bernstein-Durrmeyer, genuine Bernstein-Durrmeyer and Kantorovich operators. Moreover, some Voronovskaja-type formulas for the inverses of these operators are obtained. Several techniques are used to obtain these results.

2020, 19(8): 4111-4126 doi: 10.3934/cpaa.2020183
Abstract:

In this paper, we consider the nonlinear ill-posed inverse problem with noisy data in the statistical learning setting. The Tikhonov regularization scheme in Hilbert scales is considered to reconstruct the estimator from the random noisy data. In this statistical learning setting, we derive the rates of convergence for the regularized solution under certain assumptions on the nonlinear forward operator and the prior assumptions. We discuss estimates of the reconstruction error using the approach of reproducing kernel Hilbert spaces.

2020, 19(8): 4127-4142 doi: 10.3934/cpaa.2020184
Abstract:

The aim of this paper is twofold. Firstly, we derive an explicit expression for the (theoretical) solutions of stochastic differential equations with affine coefficients driven by $\alpha$-stable white noise. This is done by means of the Itô formula. Secondly, we develop a detection algorithm for the first jump time in the simulation of the sampling trajectories described by these solutions. The algorithm is carried out through a multivariate Lagrange interpolation approach. To this end, we utilise a computer simulation algorithm in MATLAB to visualise the sampling trajectories of the jump-diffusions for two combinations of parameters arising in the modelling structure of stochastic differential equations with affine coefficients.

2020, 19(8): 4143-4158 doi: 10.3934/cpaa.2020185
Abstract:

The use of sampling methods in computing eigenpairs of two-parameter boundary value problems is extremely rare. As far as we know, only two studies so far have used the bivariate version of the classical and regularized sampling series. These series have a slow convergence rate. In this paper, we use the bivariate sinc-Gauss sampling formula proposed in [6] to construct a new sampling method to compute eigenpairs of a two-parameter Sturm-Liouville system. The convergence rate of this method is of exponential order, i.e. $O(\mathrm{e}^{-\delta N}/\sqrt{N})$, where $\delta$ is a positive number and $N$ is the number of terms in the bivariate sinc-Gaussian formula. We estimate the amplitude error associated with this formula, which makes it possible to establish a rigorous error analysis of this method. Illustrative numerical examples are presented to demonstrate our method in comparison with the results of the bivariate classical sampling method.

2020, 19(8): 4159-4177 doi: 10.3934/cpaa.2020186
Abstract:

In this paper, we study the convergence of the gradient descent method for the maximum correntropy criterion (MCC) associated with reproducing kernel Hilbert spaces (RKHSs). MCC is widely used in many real-world applications because of its robustness and its ability to deal with non-Gaussian impulse noise. In the regression context, we show that the gradient descent iterates of MCC can approximate the target function, and we derive a capacity-dependent convergence rate by choosing a suitable iteration number. Our result nearly matches the optimal convergence rate stated in previous work, and from it we can see that the scaling parameter is crucial to MCC's approximation ability and robustness. The novelty of our work lies in a sharp estimate for the norms of the gradient descent iterates and in the projection operation on the last iterate.
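As a rough illustration of the kind of iteration this abstract describes, the following minimal sketch runs functional gradient descent on an empirical correntropy-induced risk with a Gaussian kernel. Everything specific here is an assumption made for illustration and is not taken from the paper: the loss normalization $\sigma^2(1-\mathrm{e}^{-r^2/\sigma^2})$, the kernel width, the step size, the synthetic data, and all function names. The projection of the last iterate mentioned in the abstract is omitted.

```python
import math
import random

def gauss_kernel(x, z, width=0.3):
    # Gaussian kernel on the real line; the width is an illustrative choice.
    return math.exp(-(x - z) ** 2 / (2 * width ** 2))

def mcc_loss(r, sigma):
    # Correntropy-induced loss in one common normalization (assumed here).
    return sigma ** 2 * (1.0 - math.exp(-r ** 2 / sigma ** 2))

def mcc_gradient_descent(xs, ys, sigma=1.0, step=0.4, iters=200):
    """Gradient descent in the RKHS, with f_t = sum_j alpha[j] * K(x_j, .)."""
    m = len(xs)
    K = [[gauss_kernel(a, b) for b in xs] for a in xs]
    alpha = [0.0] * m
    history = []  # empirical MCC risk before each update
    for _ in range(iters):
        preds = [sum(K[i][j] * alpha[j] for j in range(m)) for i in range(m)]
        res = [p - y for p, y in zip(preds, ys)]
        history.append(sum(mcc_loss(r, sigma) for r in res) / m)
        # d/dr mcc_loss(r) = 2 r exp(-r^2 / sigma^2), so large residuals
        # (outliers) receive exponentially small weight in the update.
        for i in range(m):
            weight = math.exp(-res[i] ** 2 / sigma ** 2)
            alpha[i] -= step * 2.0 * res[i] * weight / m
    return alpha, history

# Synthetic regression data with a few impulse-noise outliers (hypothetical).
random.seed(0)
m = 40
xs = [i / (m - 1) for i in range(m)]
ys = [math.sin(2 * math.pi * x) + 0.05 * random.gauss(0, 1) for x in xs]
for i in (5, 17, 30):
    ys[i] += 5.0

alpha, history = mcc_gradient_descent(xs, ys)
```

With this bounded kernel and this step size the recorded risk should not increase from one iteration to the next, which is the only behaviour the sketch is meant to show; it makes no claim about the convergence rates derived in the paper.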

2020, 19(8): 4179-4189 doi: 10.3934/cpaa.2020187
Abstract:

Negative binomial regression has been widely applied in various research settings to account for counts with overdispersion. Yet, when the gamma scale parameter, $\nu$, is parameterized, there is no direct algorithmic solution to the Fisher information matrix of the associated heterogeneous negative binomial regression, which seriously limits its applications to a wide range of complex problems. In this research, we propose a numerical method to calculate the Fisher information of heterogeneous negative binomial regression and accordingly develop a preliminary framework for analyzing incomplete counts with overdispersion. This method is implemented in R and illustrated using an empirical example of teenage drug use in America.

2020, 19(8): 4191-4212 doi: 10.3934/cpaa.2020188
Abstract:

Recently, there has been considerable work on developing efficient stochastic optimization algorithms for AUC maximization. However, most of these algorithms focus on the least square loss, which may not be the best option in practice. The main difficulty in dealing with a general convex loss is the pairwise nonlinearity w.r.t. the sampling distribution generating the data. In this paper, we use Bernstein polynomials to uniformly approximate general losses, which makes it possible to decouple the pairwise nonlinearity. In particular, we show that this reduction for AUC maximization with a general loss is equivalent to a weakly convex (nonconvex) min-max formulation. Then, we develop a novel SGD algorithm for AUC maximization with per-iteration cost linear in the data dimension, making it amenable to streaming data analysis. Despite the non-convexity, we prove global convergence by exploiting the appealing convexity-preserving property of Bernstein polynomials and the intrinsic structure of the min-max formulation. Experiments are performed to validate the effectiveness of the proposed approach.
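The two properties of Bernstein polynomials that this abstract relies on, uniform approximation and preservation of convexity, can be checked numerically with a short sketch. The example loss (a squared hinge rescaled to $[0, 1]$), the degrees, and the grid sizes are hypothetical choices for illustration; the sketch does not reproduce the paper's min-max reformulation or its SGD algorithm.

```python
from math import comb

def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1]:
    B_n f(x) = sum_k f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)."""
    coeffs = [f(k / n) for k in range(n + 1)]

    def b(x):
        return sum(c * comb(n, k) * x ** k * (1 - x) ** (n - k)
                   for k, c in enumerate(coeffs))

    return b

def sq_hinge(t):
    # Hypothetical convex loss on [0, 1], chosen only for illustration.
    return max(0.0, 1.0 - 2.0 * t) ** 2

def max_error(f, g, grid=200):
    # Largest gap between f and g on a uniform grid over [0, 1].
    return max(abs(f(i / grid) - g(i / grid)) for i in range(grid + 1))

# Uniform approximation: the grid error shrinks as the degree grows.
errors = {n: max_error(sq_hinge, bernstein(sq_hinge, n)) for n in (5, 20, 80)}

# Convexity preservation: second differences of B_20 f stay nonnegative
# (up to floating-point rounding).
b20 = bernstein(sq_hinge, 20)
h = 0.01
second_diffs = [b20((i - 1) * h) - 2 * b20(i * h) + b20((i + 1) * h)
                for i in range(1, 100)]
```

Printing `errors` shows how quickly the approximation improves with the degree, and `min(second_diffs)` gives a quick check that the approximant of a convex loss remains convex.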

2020, 19(8): 4213-4225 doi: 10.3934/cpaa.2020189
Abstract:

In a recent paper, for univariate max-product sampling operators based on general kernels with bounded generalized absolute moments, we obtained several $L^{p}_{\mu}$ convergence properties on bounded intervals or on the whole real axis. In this paper, we obtain quantitative estimates with respect to a $K$-functional for the multivariate Kantorovich variant of these max-product sampling operators, with the integrals written in terms of Borel probability measures. Applications of these approximation results to learning theory are also given.

2019  Impact Factor: 1.105
