
A surrogate-based approach to nonlinear, non-Gaussian joint state-parameter data assimilation

  • * Corresponding author: John Maclean

The first author is supported by the ARC grant DP180100050, and acknowledges past support from ONR grant N00014-18-1-2204. The second author is supported by NSF grant DMS-1821338.
  • Many recent advances in sequential assimilation of data into nonlinear high-dimensional models are modifications to particle filters which employ efficient searches of a high-dimensional state space. In this work, we present a complementary strategy that combines statistical emulators and particle filters. The emulators are used to learn and offer a computationally cheap approximation to the forward dynamic mapping. This emulator-particle filter (Emu-PF) approach requires a modest number of forward-model runs, but yields well-resolved posterior distributions even in non-Gaussian cases. We explore several modifications to the Emu-PF that utilize mechanisms for dimension reduction to efficiently fit the statistical emulator, and present a series of simulation experiments on an atypical Lorenz-96 system to demonstrate their performance. We conclude with a discussion on how the Emu-PF can be paired with modern particle filtering algorithms.

    Mathematics Subject Classification: Primary: 62R07, 62M20, 62C12; Secondary: 62M05, 62G07, 86A10.


  • Figure 1.  Schematic for state dependence on parameters: we plot the state at eight different samples (1a), then apply a variety of interpolating schemes (1b), and lastly a statistical surrogate (1c). The shaded region in the rightmost plot shows one standard deviation in uncertainty. The second and third plots allow the state to be estimated at a variety of parameter values.

    Figure 2.  Here we demonstrate how the example GP mapping from parameter space to state space (as in Figure 1) can be used in a particle filter update step. (A) The same GP mapping from parameter to state space is plotted (black line), along with the design points used to fit that mapping (blue dots along the black line) and an observation in state space (red dot and line) on the left axis. A bimodal prior distribution is plotted (light blue), along with samples from that distribution ($ 10^3 $ in light blue and 8 in black) along the horizontal axis. (B) Stem plots of the eight-sample PF posterior, along with the normalized $ 10^3 $-sample posterior histogram of parameters, obtained by weighting the GP-mapped prior samples by their likelihood given the observation in (A). Plotted behind the Emu-PF histogram is the equivalent (and nearly identical) histogram that uses the true mapping instead of the GP mapping for each of the $ 10^3 $ samples.
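The update step described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the forward model `forward_model`, the squared-exponential kernel and its length scale, the mixture used for the bimodal prior, and the observation values are all hypothetical, and the emulator's predictive variance is ignored when weighting.

```python
import numpy as np

def forward_model(theta):
    """Hypothetical stand-in for an expensive parameter-to-state mapping."""
    return np.sin(3.0 * theta) + theta

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_mean(theta_design, y_design, theta_new, jitter=1e-8):
    """Posterior mean of a zero-mean GP fitted to the design points."""
    K = rbf_kernel(theta_design, theta_design) + jitter * np.eye(theta_design.size)
    alpha = np.linalg.solve(K, y_design)
    return rbf_kernel(theta_new, theta_design) @ alpha

def emulated_weights(theta_samples, theta_design, y_obs, obs_std):
    """Normalized importance weights using the emulator in place of the model."""
    y_design = forward_model(theta_design)           # the only expensive runs
    y_emulated = gp_mean(theta_design, y_design, theta_samples)
    log_w = -0.5 * ((y_obs - y_emulated) / obs_std) ** 2  # Gaussian likelihood
    log_w -= log_w.max()                             # stabilize the exponent
    w = np.exp(log_w)
    return w / w.sum()

rng = np.random.default_rng(0)
theta_design = np.linspace(-2.0, 2.0, 8)             # eight design points
# Bimodal prior (assumed): equal mixture of two Gaussians.
component = rng.integers(0, 2, size=1000)
theta_prior = rng.normal(np.where(component == 0, -1.0, 1.0), 0.3)

weights = emulated_weights(theta_prior, theta_design, y_obs=1.2, obs_std=0.2)
posterior_mean = np.sum(weights * theta_prior)
```

Only the eight design points require forward-model runs here; the $ 10^3 $ prior samples are weighted using the emulator alone, which is the cost saving the caption illustrates.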

    Figure 3.  Overview of the novel synthesis of Gaussian process emulators with data assimilation methods.

    Figure 4.  Visualisation of the internal Emu-PF mechanisms over one assimilation step. The left column shows components of dimension $ n_D = 100 $. The right column shows components of dimension $ N_F = 10,000 $. (a): parameter ensembles at time $ t_j $. (b): distribution of one state variable as a function of parameters. (c): parameter ensembles at time $ t_{j+1} $. Full details for this $ 8 $ state, $ 2 $ parameter experiment are given in section 4.

    Figure 5.  Long term error statistics for the implementation of Emu-PF from fig. 4, compared to a "coarse" PF that employs $ n_D = 100 $ model runs (as in the Emu-PF), and a "fine" PF that employs $ N_F = 10,000 $ model runs, equal to the number of samples in the Emu-PF emulator. Performance of the Emu-PF is markedly better than that of the coarse PF.

    Figure 6.  Error statistics for Experiment One, $ m = 8 $ observations at each observation time, of accuracy $ \sigma_0 = 1 $. In this (and every) plot, only every $ 20 $th data point is shown. For this mildly difficult filtering problem, we observe that the $ \Gamma = -1 $ implementation of section 3.2, which uses no state variables at all as emulator inputs, is stable and reasonably accurate.

    Figure 7.  Error statistics for Experiment Two, $ m = 2 $ observations at each observation time, of accuracy $ \sigma_0 = 1 $. The $ \Gamma = -1 $ Emu-PF and fine PF both under-perform compared to their mean behaviour; the Emu-PF employing PCA is stable and accurate.
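For readers unfamiliar with the dimension-reduction step behind the PCA variant, the following is a generic sketch of compressing an ensemble of state vectors with PCA before an emulator is fitted to the reduced coordinates. How PCA enters the Emu-PF is defined in the article itself and is not reproduced here; the function names, the synthetic ensemble, and the choice of two components are assumptions for illustration only.

```python
import numpy as np

def pca_reduce(states, n_components):
    """Project an (n_samples, n_state) ensemble onto its leading principal components."""
    mean = states.mean(axis=0)
    centered = states - mean
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T                 # reduced coordinates
    explained = np.sum(singular_values[:n_components] ** 2) / np.sum(singular_values ** 2)
    return mean, components, scores, explained

def pca_reconstruct(mean, components, scores):
    """Map reduced coordinates back to the full state space."""
    return mean + scores @ components

# Synthetic ensemble (assumed): 100 states of dimension 8 lying near a 2-D subspace.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
basis = rng.normal(size=(2, 8))
states = latent @ basis + 0.01 * rng.normal(size=(100, 8))

mean, components, scores, explained = pca_reduce(states, n_components=2)
approx = pca_reconstruct(mean, components, scores)
```

An emulator fitted to `scores` has two outputs instead of eight, at the price of the reconstruction error `states - approx`.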

    Figure 8.  Error statistics for Experiment Three, $ m = 4 $ observations at each observation time, of accuracy $ \sigma_0 = 0.5 $. In this case the $ \Gamma = 1 $ Emu-PF performs only as well as the coarse PF. However, the Emu-PF employing PCA is still competitive with the much more expensive fine PF.

    Figure 9.  Summary statistics for Experiment Four, long-time state estimation with $ m = 4 $ observations of accuracy $ \sigma_o = 1 $. The median RMSE for EnKF and fine PF are similar; however, the EnKF error occasionally spikes. The sliced Emu-PF of section 3.2 is stable, with no large error spikes, and performs close to the fine PF in accuracy.

    Figure 10.  RMSE against time for Experiment Five: dashed red lines plot the fine PF (formulated under the Optimal Proposal), and solid blue lines plot the best-performing Emu-PF according to table 5. There is a clear improvement in skill in parameter estimation. State estimates are similar in skill (and, importantly, do possess some skill: the state RMSE is well below $ 5 $, the approximate long-term or climatic mean RMSE of forecasting with no DA).

    Table 1.  Summary statistics for twenty repetitions of Experiment One. The 'Resampling' column counts how many resampling steps, out of a thousand, were performed by each algorithm.

    Method                  RMSE ($ \theta $)  Var ($ \theta $)  RMSE ($ \mathbf{x} $)  Var ($ \mathbf{x} $)  Resampling
    Fine PF                 0.066              0.0035            0.34                   0.15                  226
    Coarse PF               0.79               0.0015            2.1                    0.16                  663
    EnKF                    0.048              0.0018            0.32                   0.12                  -
    Emu-PF ($ \Gamma=-1 $)  0.13               0.00026           2.4                    5.1                   483
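To make the 'Resampling' column concrete, here is a minimal sketch of how a particle filter might decide when to resample and how such steps could be counted. The trigger used in the experiments is not stated on this page; the effective-sample-size rule, the threshold of one half, and the systematic resampling scheme below are common choices assumed purely for illustration.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = w.size
    positions = (rng.random() + np.arange(n)) / n    # one random offset, n strata
    cumulative = np.cumsum(w)
    cumulative[-1] = 1.0                             # guard against round-off
    return np.searchsorted(cumulative, positions)

def maybe_resample(particles, weights, rng, threshold=0.5):
    """Resample only if the ESS falls below threshold * n (assumed rule)."""
    n = len(weights)
    if effective_sample_size(weights) < threshold * n:
        idx = systematic_resample(weights, rng)
        return particles[idx], np.full(n, 1.0 / n), True
    return particles, weights, False

rng = np.random.default_rng(0)
particles = np.array([0.0, 1.0, 2.0, 3.0])
degenerate = np.array([1.0, 0.0, 0.0, 0.0])          # all weight on one particle
new_particles, new_weights, resampled = maybe_resample(particles, degenerate, rng)
```

Summing the returned flag over all assimilation steps gives a count comparable in kind to the 'Resampling' column.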

    Table 2.  Summary statistics for twenty repetitions of Experiment Two.

    Method                  RMSE ($ \theta $)  Var ($ \theta $)  RMSE ($ \mathbf{x} $)  Var ($ \mathbf{x} $)  Resampling
    Fine PF                 0.074              0.0043            0.7                    0.61                  173
    Coarse PF               0.49               0.0016            4.9                    0.13                  312
    EnKF                    0.065              0.0027            0.78                   0.66                  -
    Emu-PF ($ \Gamma=-1 $)  0.38               0.00085           3.8                    6.1                   526
    Emu-PF (PCA)            0.27               0.00051           3.1                    0.58                  339

    Table 3.  Summary statistics for twenty repetitions of Experiment Three.

    Method                  RMSE ($ \theta $)  Var ($ \theta $)  RMSE ($ \mathbf{x} $)  Var ($ \mathbf{x} $)  Resampling
    Fine PF                 0.062              0.0032            0.28                   0.13                  243
    Coarse PF               1                  0.0012            3.4                    0.11                  739
    EnKF                    0.045              0.0017            0.25                   0.1                   -
    Emu-PF ($ \Gamma=-1 $)  0.15               0.00034           2.4                    5.1                   590
    Emu-PF (PCA)            0.084              0.00075           1.5                    0.085                 334

    Table 4.  Summary statistics for Experiment Four.

    Method                  RMSE ($ \mathbf{x} $)  Var ($ \mathbf{x} $)  Resampling
    Fine PF                 0.47                   0.15                  1706
    Coarse PF               5.1                    0.16                  9917
    EnKF                    1                      0.096                 -
    Emu-PF (Localized)      0.83                   0.31                  3566

    Table 5.  Summary statistics for twenty repetitions of Experiment Five.

    Method                  RMSE ($ \theta $)  Var ($ \theta $)  RMSE ($ \mathbf{x} $)  Var ($ \mathbf{x} $)  Resampling
    Fine OP-PF              1.2                0.0075            1.9                    3.3                   226
    Coarse OP-PF            1.2                0.004             2.0                    2.8                   205
    EnKF                    1.1                0.00042           1.5                    1.8                   -
    Emu-PF ($ \Gamma=-1 $)  0.5                0.0017            2.6                    3.5                   243
    Emu-PF ($ \Gamma=+2 $)  0.75               0.0035            2.0                    3.7                   232
    Emu-PF (PCA)            1.1                0.061             2.0                    2.8                   238



