# American Institute of Mathematical Sciences

September  2021, 3(3): 589-614. doi: 10.3934/fods.2021019

## A surrogate-based approach to nonlinear, non-Gaussian joint state-parameter data assimilation

John Maclean 1 and Elaine T. Spiller 2

1. School of Mathematical Sciences, University of Adelaide, SA 5005, Australia
2. Department of Mathematical and Statistical Sciences, Marquette University, Milwaukee, WI 53201, USA

* Corresponding author: John Maclean

Received October 2020; Revised June 2021; Early access August 2021; Published September 2021

Fund Project: The first author is supported by ARC grant DP180100050 and acknowledges past support from ONR grant N00014-18-1-2204. The second author is supported by NSF grant DMS-1821338.

Many recent advances in sequential assimilation of data into nonlinear high-dimensional models are modifications to particle filters which employ efficient searches of a high-dimensional state space. In this work, we present a complementary strategy that combines statistical emulators and particle filters. The emulators are used to learn and offer a computationally cheap approximation to the forward dynamic mapping. This emulator-particle filter (Emu-PF) approach requires a modest number of forward-model runs, but yields well-resolved posterior distributions even in non-Gaussian cases. We explore several modifications to the Emu-PF that utilize mechanisms for dimension reduction to efficiently fit the statistical emulator, and present a series of simulation experiments on an atypical Lorenz-96 system to demonstrate their performance. We conclude with a discussion on how the Emu-PF can be paired with modern particle filtering algorithms.
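The core idea described above — fit a cheap statistical emulator of the forward map from a modest number of model runs, then drive a large-sample particle filter update through the emulator — can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the forward map, kernel length-scale, prior, and observation are all hypothetical choices, and the Gaussian process is reduced to its posterior mean for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, ell=0.5, sig=1.0):
    # Squared-exponential kernel between 1-D point sets a and b
    d = a[:, None] - b[None, :]
    return sig**2 * np.exp(-0.5 * (d / ell) ** 2)

# Hypothetical forward map from a scalar parameter theta to a scalar state
def forward(theta):
    return np.sin(3.0 * theta) + 0.5 * theta

# Fit a GP emulator on a small design (n_D expensive forward-model runs)
theta_D = np.linspace(-2.0, 2.0, 8)            # design points
y_D = forward(theta_D)                         # "expensive" model runs
K = rbf(theta_D, theta_D) + 1e-8 * np.eye(8)   # jitter for conditioning
alpha = np.linalg.solve(K, y_D)

def emulate(theta):
    # GP posterior mean: a cheap surrogate for the forward map
    return rbf(theta, theta_D) @ alpha

# Particle filter update using the emulator on N_F cheap samples
N_F = 10_000
prior = np.concatenate([rng.normal(-1.0, 0.3, N_F // 2),
                        rng.normal(1.0, 0.3, N_F // 2)])   # bi-modal prior
obs, sigma_o = forward(0.9), 0.2               # synthetic observation
w = np.exp(-0.5 * ((emulate(prior) - obs) / sigma_o) ** 2)
w /= w.sum()                                   # normalized likelihood weights
posterior = rng.choice(prior, size=N_F, p=w)   # resampled posterior ensemble
```

Only 8 forward-model evaluations are spent on the design, yet the weighting step touches all 10,000 prior samples — the cost asymmetry that motivates the Emu-PF.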

Citation: John Maclean, Elaine T. Spiller. A surrogate-based approach to nonlinear, non-Gaussian joint state-parameter data assimilation. Foundations of Data Science, 2021, 3 (3) : 589-614. doi: 10.3934/fods.2021019
##### Figures and Tables
Figure 1.  Schematic for state dependence on parameters: we plot the state at eight different samples (1a), then apply a variety of interpolating schemes (1b), and lastly a statistical surrogate (1c). The shaded region in the rightmost plot shows one standard deviation of uncertainty. The second and third plots allow the state to be estimated at a variety of parameter values
Figure 2.  Here we demonstrate how the GP example mapping from parameter space to state space (as in Figure 1) can be used in a particle filter update step. (A) The same GP mapping from parameter to state space is plotted (black line) along with the design points (blue dots along black line) used to fit that mapping and an observation (red dot and line) in state space along the left axis. A bi-modal prior distribution is plotted (light blue) along with samples from that distribution ($10^3$ in light blue and 8 in black) along the horizontal axis. (B) Stem plots of the eight-sample PF posterior along with the $10^3$-sample normalized posterior histogram of parameters, taking into account the likelihood of the GP-mapped prior samples given the observation in (A). Plotted behind the Emu-PF histogram is the equivalent (and nearly identical) histogram using the true mapping instead of the GP mapping for each of the $10^3$ samples
Figure 3.  Overview of the novel synthesis of Gaussian process emulators with data assimilation methods
Figure 4.  Visualisation of the internal Emu-PF mechanisms over one assimilation step. The left column shows components of dimension $n_D = 100$; the right column shows components of dimension $N_F = 10,000$. (a): parameter ensembles at time $t_j$. (b): distribution of one state variable as a function of parameters. (c): parameter ensembles at time $t_{j+1}$. Full details for this $8$-state, $2$-parameter experiment are given in Section 4
Figure 5.  Long-term error statistics for the implementation of Emu-PF from Figure 4, compared to: a "coarse" PF that employs $n_D = 100$ model runs (as in the Emu-PF), and a "fine" PF that employs $N_F = 10,000$ model runs, equal to the number of samples in the Emu-PF emulator. The performance of the Emu-PF is markedly better than that of the coarse PF
Figure 6.  Error statistics for Experiment One, with $m = 8$ observations at each observation time, of accuracy $\sigma_o = 1$. In this (and every) plot, only every $20$th data point is shown. For this mildly difficult filtering problem, we observe that the $\Gamma = -1$ implementation of Section 3.2, which uses no state variables at all as emulator inputs, is stable and reasonably accurate
Figure 7.  Error statistics for Experiment Two, with $m = 2$ observations at each observation time, of accuracy $\sigma_o = 1$. The $\Gamma = -1$ Emu-PF and the fine PF both under-perform relative to their mean behaviour; the Emu-PF employing PCA is stable and accurate
Figure 8.  Error statistics for Experiment Three, with $m = 4$ observations at each observation time, of accuracy $\sigma_o = 0.5$. In this case the $\Gamma = -1$ Emu-PF performs only as well as the coarse PF. However, the Emu-PF employing PCA remains competitive with the much more expensive fine PF
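Several of the experiments above use PCA to compress the state ensemble before it enters the emulator as an input. A minimal sketch of that reduction step, under assumed dimensions (the ensemble size, state dimension, and number of retained components here are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ensemble of n_D model states, each of dimension n_x
n_D, n_x, k = 100, 40, 3
states = rng.normal(size=(n_D, n_x)) @ rng.normal(size=(n_x, n_x)) * 0.1
states[:, :5] += rng.normal(size=(n_D, 1)) * 3.0  # inject dominant directions

# PCA via SVD of the centered ensemble
mean = states.mean(axis=0)
U, S, Vt = np.linalg.svd(states - mean, full_matrices=False)
components = Vt[:k]                       # leading k principal directions

# Low-dimensional emulator inputs: project each state onto k components
inputs = (states - mean) @ components.T   # shape (n_D, k)
```

The emulator is then fit on `inputs` (dimension $k$) rather than the full state (dimension $n_x$), which is what makes the GP fit tractable when the state is high-dimensional.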
Figure 9.  Summary statistics for Experiment Four, long-time state estimation with $m = 4$ observations of accuracy $\sigma_o = 1$. The median RMSE for the EnKF and the fine PF are similar; however, the EnKF error occasionally spikes. The sliced Emu-PF of Section 3.2 is stable, with no large error spikes, and performs close to the fine PF in accuracy
Figure 10.  RMSE against time for Experiment Five: dashed red lines plot the fine PF (formulated under the Optimal Proposal), and solid blue lines plot the best-performing Emu-PF according to Table 5. There is a clear improvement in parameter-estimation skill. State estimates are similar in skill (and, importantly, do possess some skill: the state RMSE is well below $5$, the approximate long-term or climatic mean RMSE of forecasting with no DA)
Table 1.  Summary statistics for twenty repetitions of Experiment One. The 'Resampling' column counts how many resampling steps, out of a thousand, were performed by each algorithm.

| Method | RMSE ($\theta$) | Var ($\theta$) | RMSE ($\mathbf{x}$) | Var ($\mathbf{x}$) | Resampling |
| --- | --- | --- | --- | --- | --- |
| Fine PF | 0.066 | 0.0035 | 0.34 | 0.15 | 226 |
| Coarse PF | 0.79 | 0.0015 | 2.1 | 0.16 | 663 |
| EnKF | 0.048 | 0.0018 | 0.32 | 0.12 | - |
| Emu-PF ($\Gamma=-1$) | 0.13 | 0.00026 | 2.4 | 5.1 | 483 |
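The resampling counts reported in these tables record how often each filter's weights degenerated enough to trigger a resampling step. A common trigger (assumed here for illustration — the paper's exact criterion may differ) is the Kish effective sample size falling below half the ensemble size:

```python
import numpy as np

def effective_sample_size(w):
    # Kish effective sample size of normalized weights
    return 1.0 / np.sum(w**2)

def maybe_resample(particles, w, rng, threshold=0.5):
    # Resample only when ESS drops below threshold * N; report whether we did
    n = len(w)
    if effective_sample_size(w) < threshold * n:
        idx = rng.choice(n, size=n, p=w)
        return particles[idx], np.full(n, 1.0 / n), True
    return particles, w, False

rng = np.random.default_rng(2)
particles = np.arange(1000.0)

uniform = np.full(1000, 1e-3)                    # healthy weights: ESS = 1000
_, _, did_uniform = maybe_resample(particles, uniform, rng)

degenerate = np.zeros(1000); degenerate[0] = 1.0  # collapsed weights: ESS = 1
_, _, did_degenerate = maybe_resample(particles, degenerate, rng)
```

Under this criterion, a higher resampling count (e.g. the coarse PF's 663 out of 1000 steps) indicates more frequent weight degeneracy.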
Table 2.  Summary statistics for twenty repetitions of Experiment Two.

| Method | RMSE ($\theta$) | Var ($\theta$) | RMSE ($\mathbf{x}$) | Var ($\mathbf{x}$) | Resampling |
| --- | --- | --- | --- | --- | --- |
| Fine PF | 0.074 | 0.0043 | 0.7 | 0.61 | 173 |
| Coarse PF | 0.49 | 0.0016 | 4.9 | 0.13 | 312 |
| EnKF | 0.065 | 0.0027 | 0.78 | 0.66 | - |
| Emu-PF ($\Gamma=-1$) | 0.38 | 0.00085 | 3.8 | 6.1 | 526 |
| Emu-PF (PCA) | 0.27 | 0.00051 | 3.1 | 0.58 | 339 |
Table 3.  Summary statistics for twenty repetitions of Experiment Three.

| Method | RMSE ($\theta$) | Var ($\theta$) | RMSE ($\mathbf{x}$) | Var ($\mathbf{x}$) | Resampling |
| --- | --- | --- | --- | --- | --- |
| Fine PF | 0.062 | 0.0032 | 0.28 | 0.13 | 243 |
| Coarse PF | 1 | 0.0012 | 3.4 | 0.11 | 739 |
| EnKF | 0.045 | 0.0017 | 0.25 | 0.1 | - |
| Emu-PF ($\Gamma=-1$) | 0.15 | 0.00034 | 2.4 | 5.1 | 590 |
| Emu-PF (PCA) | 0.084 | 0.00075 | 1.5 | 0.085 | 334 |
Table 4.  Summary statistics for Experiment Four.

| Method | RMSE ($\mathbf{x}$) | Var ($\mathbf{x}$) | Resampling |
| --- | --- | --- | --- |
| Fine PF | 0.47 | 0.15 | 1706 |
| Coarse PF | 5.1 | 0.16 | 9917 |
| EnKF | 1 | 0.096 | - |
| Emu-PF (Localized) | 0.83 | 0.31 | 3566 |
Table 5.  Summary statistics for twenty repetitions of Experiment Five.

| Method | RMSE ($\theta$) | Var ($\theta$) | RMSE ($\mathbf{x}$) | Var ($\mathbf{x}$) | Resampling |
| --- | --- | --- | --- | --- | --- |
| Fine OP-PF | 1.2 | 0.0075 | 1.9 | 3.3 | 226 |
| Coarse OP-PF | 1.2 | 0.004 | 2.0 | 2.8 | 205 |
| EnKF | 1.1 | 0.00042 | 1.5 | 1.8 | - |
| Emu-PF ($\Gamma=-1$) | 0.5 | 0.0017 | 2.6 | 3.5 | 243 |
| Emu-PF ($\Gamma=+2$) | 0.75 | 0.0035 | 2.0 | 3.7 | 232 |
| Emu-PF (PCA) | 1.1 | 0.061 | 2.0 | 2.8 | 238 |
