September 2019, 1(3): 329-387. doi: 10.3934/fods.2019015

Randomized learning of the second-moment matrix of a smooth function

Armin Eftekhari, Michael B. Wakin, Ping Li and Paul G. Constantine

1. Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
2. Department of Electrical Engineering, Colorado School of Mines, Golden, CO 80401, USA
3. Departments of Statistics and Computer Science, Rutgers University, Piscataway, NJ 08854, USA
4. Department of Computer Science, University of Colorado Boulder, Boulder, CO 80309, USA

* Corresponding author: Armin Eftekhari

Published September 2019

Fund Project: AE was partially supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1 and also by the Turing Seed Funding grant SF019. MBW was partially supported by NSF grant CCF-1409258 and NSF CAREER grant CCF-1149225. PGC was partially supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under award DE-SC0011077 and by the Defense Advanced Research Projects Agency's Enabling Quantification of Uncertainty in Physical Systems (EQUiPS) program.

Consider an open set $ \mathbb{D}\subseteq\mathbb{R}^n $, equipped with a probability measure $ \mu $. An important characteristic of a smooth function $ f:\mathbb{D}\rightarrow\mathbb{R} $ is its second-moment matrix $ \Sigma_{\mu}: = \int \nabla f(x) \nabla f(x)^* \mu(dx) \in\mathbb{R}^{n\times n} $, where $ \nabla f(x)\in\mathbb{R}^n $ is the gradient of $ f(\cdot) $ at $ x\in\mathbb{D} $ and $ * $ stands for transpose. For instance, the span of the leading $ r $ eigenvectors of $ \Sigma_{\mu} $ forms an active subspace of $ f(\cdot) $, which contains the directions along which $ f(\cdot) $ changes the most and is of particular interest in ridge approximation. In this work, we propose a simple algorithm for estimating $ \Sigma_{\mu} $ from random point evaluations of $ f(\cdot) $ without imposing any structural assumptions on $ \Sigma_{\mu} $. Theoretical guarantees for this algorithm are established with the aid of the same technical tools that have proved valuable in the context of covariance matrix estimation from partial measurements.
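To make these quantities concrete, here is a minimal numerical sketch (not the randomized estimator analyzed in the paper): $ \Sigma_{\mu} $ is approximated by averaging rank-one matrices $ g_i g_i^* $, where each $ g_i $ is a forward-difference approximation of $ \nabla f(x_i) $ at a point $ x_i $ drawn from $ \mu $, and the leading $ r $ eigenvectors of the average serve as an estimate of the active subspace. The ridge function `f`, the Gaussian choice of $ \mu $, and all helper names below are illustrative assumptions.

```python
import numpy as np

def estimate_second_moment(f, sample_mu, n, N=500, eps=1e-4, r=1, rng=None):
    """Monte Carlo sketch of the second-moment matrix of f.

    Gradients are approximated by forward differences along the coordinate
    axes; this is an illustrative choice, not the randomized sampling
    scheme studied in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    Sigma = np.zeros((n, n))
    for _ in range(N):
        x = sample_mu(rng)                 # draw a center point from mu
        fx = f(x)
        g = np.empty(n)
        for k in range(n):                 # forward-difference gradient estimate
            e = np.zeros(n)
            e[k] = eps
            g[k] = (f(x + e) - fx) / eps
        Sigma += np.outer(g, g) / N        # average of rank-one terms grad f grad f^*
    vals, vecs = np.linalg.eigh(Sigma)     # eigenvalues in ascending order
    U = vecs[:, ::-1][:, :r]               # leading r eigenvectors: active subspace
    return Sigma, U

# Toy usage: a ridge function f(x) = sin(a^T x), whose active subspace is span{a}.
n = 20
a = np.ones(n) / np.sqrt(n)
f = lambda x: np.sin(a @ x)
sample_mu = lambda rng: rng.standard_normal(n)   # mu = standard Gaussian (assumption)
Sigma_hat, U = estimate_second_moment(f, sample_mu, n)
print(np.abs(a @ U[:, 0]))                       # close to 1 when span{a} is recovered
```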

Citation: Armin Eftekhari, Michael B. Wakin, Ping Li, Paul G. Constantine. Randomized learning of the second-moment matrix of a smooth function. Foundations of Data Science, 2019, 1 (3) : 329-387. doi: 10.3934/fods.2019015
References:
[1]

R. Adamczak, Logarithmic Sobolev inequalities and concentration of measure for convex functions and polynomial chaoses, Bull. Pol. Acad. Sci. Math., 53 (2005), 221-238.  doi: 10.4064/ba53-2-10.

[2]

F. P. Anaraki and S. Hughes, Memory and computation efficient PCA via very sparse random projections, in Proceedings of the International Conference on Machine Learning (ICML-14), 2014, 1341–1349.

[3]

M. Azizyan, A. Krishnamurthy and A. Singh, Extreme compressive sampling for covariance estimation, IEEE Trans. Inform. Theory, 64 (2018), 7613–7635, arXiv: 1506.00898. doi: 10.1109/TIT.2018.2871077.

[4]

D. S. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas (Second Edition), Princeton reference, Princeton University Press, 2009, https://books.google.co.uk/books?id=x7isojLkDTcC. doi: 10.1515/9781400833344.

[5]

I. Bogunovic, V. Cevher, J. Haupt and J. Scarlett, Active learning of self-concordant like multi-index functions, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015, 2189–2193. doi: 10.1109/ICASSP.2015.7178359.

[6]

T. T. Cai and A. Zhang, ROP: Matrix recovery via rank-one projections, The Annals of Statistics, 43 (2015), 102-138.  doi: 10.1214/14-AOS1267.

[7]

E. J. Candes, Ridgelets: Theory and applications, PhD thesis, Stanford University, 1998.

[8]

Y. Chen, Y. Chi and A. J. Goldsmith, Exact and stable covariance estimation from quadratic sampling via convex programming, IEEE Transactions on Information Theory, 61 (2015), 4034-4059. doi: 10.1109/TIT.2015.2429594.

[9]

A. Cohen, I. Daubechies, R. DeVore, G. Kerkyacharian and D. Picard, Capturing ridge functions in high dimensions from point queries, Constructive Approximation, 35 (2012), 225-243. doi: 10.1007/s00365-011-9147-6.

[10]

P. Constantine and D. Gleich, Computing active subspaces with Monte Carlo, arXiv preprint, arXiv: 1408.0545.

[11]

P. G. Constantine, Active Subspaces: Emerging Ideas for Dimension Reduction in Parameter Studies, SIAM, Philadelphia, 2015, https://books.google.com/books?id=TOJ9BwAAQBAJ.

[12]

P. G. Constantine, A. Eftekhari and R. Ward, A near-stationary subspace for ridge approximation, Computer Methods in Applied Mechanics and Engineering, 326 (2017), 402–421, arXiv: 1606.01929. doi: 10.1016/j.cma.2017.07.038.

[13]

R. D. Cook, Using dimension-reduction subspaces to identify important inputs in models of physical systems, in Proceedings of the Section on Physical and Engineering Sciences, American Statistical Association, Alexandria, VA, 1994, 18–25.

[14]

G. Dasarathy, P. Shah, B. N. Bhaskar and R. D. Nowak, Sketching sparse matrices, covariances, and graphs via tensor products, IEEE Transactions on Information Theory, 61 (2015), 1373-1388. doi: 10.1109/TIT.2015.2391251.

[15]

R. DeVore, G. Petrova and P. Wojtaszczyk, Approximation of functions of few variables in high dimensions, Constructive Approximation, 33 (2011), 125-143. doi: 10.1007/s00365-010-9105-8.

[16]

D. L. Donoho and I. M. Johnstone, Projection-based approximation and a duality with kernel methods, The Annals of Statistics, 17 (1989), 58-106.  doi: 10.1214/aos/1176347004.

[17]

A. Eftekhari, L. Balzano and M. B. Wakin, What to expect when you are expecting on the Grassmannian, IEEE Signal Processing Letters, 24 (2017), 872–876, arXiv: 1611.07216. doi: 10.1109/LSP.2017.2684784.

[18]

A. Eftekhari, M. B. Wakin and R. A. Ward, MC$^2$: A two-phase algorithm for leveraged matrix completion, Inf. Inference, 7 (2018), 581–604, arXiv: 1609.01795. doi: 10.1093/imaiai/iax020.

[19]

M. Fornasier, K. Schnass and J. Vybiral, Learning functions of few arbitrary linear parameters in high dimensions, Foundations of Computational Mathematics, 12 (2012), 229-262. doi: 10.1007/s10208-012-9115-y.

[20]

J. Friedman, T. Hastie and R. Tibshirani, The Elements of Statistical Learning, vol. 1, Springer Series in Statistics, Springer, Berlin, 2001. doi: 10.1007/978-0-387-21606-5.

[21]

J. Friedman, T. Hastie and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9 (2008), 432-441. doi: 10.1093/biostatistics/kxm045.

[22]

J. H. Friedman and W. Stuetzle, Projection pursuit regression, Journal of the American Statistical Association, 76 (1981), 817-823. doi: 10.1080/01621459.1981.10477729.

[23]

K. Fukumizu, F. R. Bach and M. I. Jordan, Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, The Journal of Machine Learning Research, 5 (2004), 73-99.

[24]

S. Gaiffas and G. Lecue, Optimal rates and adaptation in the single-index model using aggregation, Electronic Journal of Statistics, 1 (2007), 538-573.  doi: 10.1214/07-EJS077.

[25]

A. T. Glaws, P. G. Constantine and R. D. Cook, Inverse regression for ridge recovery, arXiv preprint, arXiv: 1702.02227.

[26]

G. K. Golubev, Asymptotic minimax estimation of regression in the additive model, Problemy Peredachi Informatsii, 28 (1992), 3-15. 

[27]

A. Gonen, D. Rosenbaum, Y. Eldar and S. Shalev-Shwartz, The sample complexity of subspace learning with partial information, J. Mach. Learn. Res., 17 (2016), Paper No. 52, 21 pp. arXiv: 1402.4844.

[28]

D. Gross, Recovering low-rank matrices from few coefficients in any basis, IEEE Transactions on Information Theory, 57 (2011), 1548-1566.  doi: 10.1109/TIT.2011.2104999.

[29]

N. Halko, P. G. Martinsson and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review, 53 (2011), 217-288. doi: 10.1137/090771806.

[30]

T. J. Hastie and R. J. Tibshirani, Generalized Additive Models, Chapman and Hall/CRC Monographs on Statistics and Applied Probability, Taylor and Francis, 1990, https://books.google.com/books?id=qa29r1Ze1coC.

[31]

J. Haupt, R. M. Castro and R. Nowak, Distilled sensing: Adaptive sampling for sparse detection and estimation, IEEE Transactions on Information Theory, 57 (2011), 6222-6235. doi: 10.1109/TIT.2011.2162269.

[32]

M. Hristache, A. Juditsky, J. Polzehl and V. Spokoiny, Structure adaptive approach for dimension reduction, The Annals of Statistics, 29 (2001), 1537-1566. doi: 10.1214/aos/1015345954.

[33]

P. J. Huber, Projection pursuit, The Annals of Statistics, 13 (1985), 435-475. doi: 10.1214/aos/1176349519.

[34]

A. B. Juditsky, O. V. Lepski and A. B. Tsybakov, Nonparametric estimation of composite functions, The Annals of Statistics, 37 (2009), 1360-1404. doi: 10.1214/08-AOS611.

[35]

S. Keiper, Analysis of generalized ridge functions in high dimensions, in International Conference on Sampling Theory and Applications (SampTA), IEEE, 2015, 259–263. doi: 10.1109/SAMPTA.2015.7148892.

[36]

M. Kolar and E. P. Xing, Consistent covariance selection from data with missing values, in Proceedings of the International Conference on Machine Learning (ICML-12), 2012, 551–558.

[37]

A. Krishnamurthy, M. Azizyan and A. Singh, Subspace learning from extremely compressed measurements, arXiv preprint, arXiv: 1404.0751.

[38]

R. Lam, O. Zahm, Y. Marzouk and K. Willcox, Multifidelity dimension reduction via active subspaces, arXiv preprint, arXiv: 1809.05567.

[39]

K. C. Li, Sliced inverse regression for dimension reduction, Journal of the American Statistical Association, 86 (1991), 316-327.  doi: 10.1080/01621459.1991.10475035.

[40]

E. Liberty, F. Woolfe, P. G. Martinsson, V. Rokhlin and M. Tygert, Randomized algorithms for the low-rank approximation of matrices, Proceedings of the National Academy of Sciences, 104 (2007), 20167-20172. doi: 10.1073/pnas.0709640104.

[41]

P. L. Loh and M. J. Wainwright, High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity, Ann. Statist., 40 (2012), 1637-1664.  doi: 10.1214/12-AOS1018.

[42]

K. Lounici, High-dimensional covariance matrix estimation with missing observations, Bernoulli, 20 (2014), 1029-1058.  doi: 10.3150/12-BEJ487.

[43]

E. Novak and H. Woźniakowski, Tractability of Multivariate Problems: Standard Information for Functionals, vol. 12, European Mathematical Society, 2010. doi: 10.4171/084.

[44]

F. W. J. Olver, NIST Handbook of Mathematical Functions, Cambridge University Press, 2010.

[45]

A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numerica, 8 (1999), 143-195.  doi: 10.1017/S0962492900002919.

[46]

A. Pinkus, Ridge Functions, Cambridge Tracts in Mathematics, Cambridge University Press, 2015, https://books.google.com/books?id=dtMmCgAAQBAJ. doi: 10.1017/CBO9781316408124.

[47]

F. Pourkamali-Anaraki, Estimation of the sample covariance matrix from compressive measurements, IET Signal Processing, 10 (2016), 1089, arXiv: 1512.08887. doi: 10.1049/iet-spr.2016.0169.

[48]

C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006.

[49]

P. Ravikumar, M. J. Wainwright, G. Raskutti and B. Yu, High-dimensional covariance estimation by minimizing $l_1$-penalized log-determinant divergence, Electronic Journal of Statistics, 5 (2011), 935-980. doi: 10.1214/11-EJS631.

[50]

B. Recht, A simpler approach to matrix completion, The Journal of Machine Learning Research, 12 (2011), 3413-3430. 

[51]

B. Recht, M. Fazel and P. A. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Review, 52 (2010), 471-501. doi: 10.1137/070697835.

[52]

A. M. Samarov, Exploring regression structure using nonparametric functional estimation, Journal of the American Statistical Association, 88 (1993), 836-847.  doi: 10.1080/01621459.1993.10476348.

[53]

T. Sarlos, Improved approximation algorithms for large matrices via random projections, in Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), IEEE, 2006, 143–152. doi: 10.1109/FOCS.2006.37.

[54]

C. J. Stone, Additive regression and other nonparametric models, The Annals of Statistics, 13 (1985), 689-705.  doi: 10.1214/aos/1176349548.

[55]

J. F. Traub and H. Wozniakowski, A General Theory of Optimal Algorithms, Technical report, Academic Press New York, 1980.

[56]

R. Tripathy, I. Bilionis and M. Gonzalez, Gaussian processes with built-in dimensionality reduction: Applications to high-dimensional uncertainty propagation, Journal of Computational Physics, 321 (2016), 191-223. doi: 10.1016/j.jcp.2016.05.039.

[57]

H. Tyagi and V. Cevher, Learning non-parametric basis independent models from point queries via low-rank methods, Applied and Computational Harmonic Analysis, 37 (2014), 389-412.  doi: 10.1016/j.acha.2014.01.002.

[58]

R. Vershynin, Introduction to the non-asymptotic analysis of random matrices, Compressed Sensing, Cambridge Univ. Press, Cambridge, 2012, 210–268, arXiv: 1011.3027.

[59]

F. Vivarelli and C. K. I. Williams, Discovering hidden features with Gaussian processes regression, Advances in Neural Information Processing Systems, 613–619.

[60]

P. Wedin, Perturbation bounds in connection with singular value decomposition, BIT Numerical Mathematics, 12 (1972), 99-111.  doi: 10.1007/bf01932678.

[61]

H. Wendland, Scattered Data Approximation, vol. 17, Cambridge University Press, Cambridge, 2005.

[62]

Y. Xia, H. Tong, W. K. Li and L. X. Zhu, An adaptive estimation of dimension reduction space, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64 (2002), 363-410. doi: 10.1111/1467-9868.03411.

[63]

X. Yin and B. Li, Sufficient dimension reduction based on an ensemble of minimum average variance estimators, The Annals of Statistics, 39 (2011), 3392-3416.  doi: 10.1214/11-AOS950.

[64]

O. Zahm, P. Constantine, C. Prieur and Y. Marzouk, Gradient-based dimension reduction of multivariate vector-valued functions, arXiv preprint, arXiv: 1801.07922.

Figure 1.  Visualization of the problem setup. The probability measure $ \mu $ is supported on the domain $ \mathbb{D}\subseteq \mathbb{R}^n $ of a smooth function $ f(\cdot) $. Here, $ N = 3 $ and $ X = \{x_i\}_{i = 1}^N $ are drawn independently from $ \mu $. For sufficiently small $ \epsilon $, we let $ \mathbb{B}_{x_i, \epsilon} $ denote the $ \epsilon $-neighborhood of each $ x_i $ and set $ \mathbb{B}_{X, \epsilon} = \cup_{i = 1}^N \mathbb{B}_{x_i, \epsilon} $. On $ \mathbb{B}_{X, \epsilon} $, $ \mu $ induces the conditional measure $ \mu_{X, \epsilon} $, from which $ N_{X, \epsilon} $ points are independently drawn and collected in $ Y_{X, \epsilon} = \{y_{ij}\}_{i, j} $. Here, $ x_1 $ has $ N_{x_1, \epsilon} = 2 $ neighbors in $ Y_{X, \epsilon} $ and we set $ Y_{x_1, \epsilon} = \{y_{1, j} \}_{j = 1}^{N_{x_1, \epsilon}} $. Similarly, $ Y_{x_2, \epsilon} $ and $ Y_{x_3, \epsilon} $ are formed. Note that $ Y_{X, \epsilon} = \cup_{i = 1}^N Y_{x_i, \epsilon} $. Our objective is to estimate the second-moment matrix of $ f(\cdot) $ (with respect to the probability measure $ \mu $) given $ \{x_i, f(x_i)\} $ and $ \{y_{ij}, f(y_{ij})\} $
Figure 2.  A simulation study using a 500-dimensional ($ n = 500 $) quadratic function for the relative error in the approximation (7) as a function of the number $ N_{X, \epsilon} $ of total samples. All simulations use $ \epsilon = 10^{-4} $. Each subfigure uses a different value for the minimum number $ N_{X, \text{min}, \epsilon} $ of points in the $ \epsilon $-neighborhood of each center point in $ X $, and varies the number $ N $ of centers. Each of the five groups of blue dots in each subfigure indicates a different number $ N $. Within each group, there are ten replications. The slope of each line is $ -1/2 $, which verifies the expected convergence rate from (6)
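For readers who want to reproduce the qualitative behavior in Figure 2, the sketch below uses a quadratic function with symmetric $ A $, namely $ f(x) = \frac{1}{2}x^* A x $, so that $ \nabla f(x) = Ax $ and, for a standard Gaussian $ \mu $, $ \Sigma_{\mu} = A^2 $. It tracks the relative Frobenius error of a plain Monte Carlo estimate of $ \Sigma_{\mu} $ as the sample size grows. Exact gradients and a smaller dimension are used here, so this only illustrates the $ N^{-1/2} $ trend; it is not the exact finite-difference setup behind (6) and (7).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                    # illustrative dimension (Figure 2 uses n = 500)
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                         # symmetric, so f(x) = x^T A x / 2 has gradient A x
Sigma_true = A @ A                        # E[(A x)(A x)^T] = A^2 for x ~ N(0, I)

for N in [100, 400, 1600, 6400]:
    X = rng.standard_normal((N, n))       # N samples from mu
    G = X @ A                             # rows are the exact gradients A x_i
    Sigma_hat = G.T @ G / N               # Monte Carlo second-moment estimate
    rel_err = (np.linalg.norm(Sigma_hat - Sigma_true, "fro")
               / np.linalg.norm(Sigma_true, "fro"))
    print(f"N = {N:5d}   relative error = {rel_err:.3f}")
# The error roughly halves each time N is quadrupled, i.e., it decays like N^{-1/2}.
```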