
# Probabilistic learning on manifolds

Corresponding author: Christian Soize
This paper presents novel mathematical results in support of the probabilistic learning on manifolds (PLoM) recently introduced by the authors. An initial dataset, consisting of a small number of points in a Euclidean space, is given. The points are independent realizations of a vector-valued random variable whose non-Gaussian probability measure is unknown but is, a priori, concentrated in an unknown subset of the Euclidean space. A learned dataset, consisting of additional realizations, is constructed. The probability measure estimated from the initial dataset is transported through a linear transformation constructed using a reduced-order diffusion-maps basis. It is proven that this transported measure is a marginal distribution of the invariant measure of a reduced-order Itô stochastic differential equation. The concentration of the probability measure is preserved. This property is shown by analyzing a distance between the random matrix constructed with the PLoM and the matrix representing the initial dataset, as a function of the dimension of the basis. It is further proven that this distance has a minimum for a dimension of the reduced-order diffusion-maps basis that is strictly smaller than the number of points in the initial dataset.
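To make the notion of a reduced-order diffusion-maps basis concrete, the following is a minimal sketch in Python, not the authors' implementation. The abstract does not state the kernel or the normalization, so the Gaussian kernel with scaling `4 * eps`, the weighting exponent `kappa`, the function name `diffusion_maps_basis`, and the toy dataset are all assumptions made for illustration. The parameters `eps` and `m` play the roles of the smoothing parameter and the basis dimension that appear in the figure captions.

```python
import numpy as np

def diffusion_maps_basis(X, eps, m, kappa=1):
    """Sketch of a reduced-order diffusion-maps basis (assumed construction).

    X     : (N, n) array, N points of the initial dataset in R^n
    eps   : kernel smoothing parameter (assumed Gaussian kernel)
    m     : number of basis vectors kept, m <= N
    kappa : integer exponent applied to the eigenvalues (assumption)
    Returns the m largest eigenvalues and the (N, m) basis matrix.
    """
    # Pairwise squared Euclidean distances, clipped against round-off.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian kernel matrix and its row sums.
    K = np.exp(-d2 / (4.0 * eps))
    b = K.sum(axis=1)
    # Symmetric matrix with the same spectrum as the transition
    # matrix P = diag(b)^{-1} K of the associated random walk.
    S = K / np.sqrt(np.outer(b, b))
    lam, phi = np.linalg.eigh(S)
    # Keep the m largest eigenvalues, in decreasing order.
    order = np.argsort(lam)[::-1][:m]
    lam = lam[order]
    # Map back to right eigenvectors of P, then weight by lam**kappa.
    psi = phi[:, order] / np.sqrt(b)[:, None]
    return lam, psi * lam**kappa

# Toy initial dataset: N = 50 noisy points near the unit circle in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=50)
X = np.column_stack([np.cos(theta), np.sin(theta)])
X += 0.05 * rng.standard_normal(X.shape)

lam, g = diffusion_maps_basis(X, eps=0.5, m=5)
```

Because the transition matrix is row-stochastic, the largest eigenvalue equals 1 and the first basis vector is constant. The remaining vectors carry the geometric information, and the choice of `m` strictly smaller than the number of points is what the distance analysis in the abstract addresses.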

Mathematics Subject Classification: Primary: 68Q32, 62G09; Secondary: 60J22.


Figure 1.  Left figure: for $\varepsilon_ {\hbox{DM}} = \varepsilon_ {\hbox{opt}}$, distribution of the eigenvalues $\lambda_\alpha(\varepsilon_ {\hbox{opt}})$ in log scale as a function of rank $\alpha$. Right figure: graph of function $m\mapsto \varepsilon_d(m)$

Figure 2.  Left figure: distribution of the eigenvalues $\lambda_\alpha(\varepsilon_ {\hbox{opt}})$ in log scale as a function of rank $\alpha\leq 50$ for $\varepsilon_ {\hbox{DM}} = \varepsilon_ {\hbox{opt}} = 60$. Right figure: graph of function $m\mapsto \varepsilon_d(m)$ for $m\leq 50$

Figure 3.  Left figure: graph of function $m \mapsto f_d(m)$. Right figure: graph of function $m\mapsto \underline g(m)$

Figure 4.  Left figure: graph of function $m \mapsto d_N^{2, {\hbox{sim}}}(m)$. Right figure: graph of function $m \mapsto d_N^{2, {\hbox{sim}}}(m)$ (blue dashed line), and for $m\geq m_ {\hbox{opt}}$, graphs of $m \mapsto d_N^{2,c}(m)$ (dark thick straight line) and $m \mapsto d_N^{2, {\hbox{app}}}(m)$ (red thick curve)

