\`x^2+y_1+z_12^34\`
Advanced Search
Article Contents
Article Contents

A perturbation approach to a class of discounted approximate value iteration algorithms with borel spaces

Abstract Related Papers Cited by
  • The present paper gives computable performance bounds for the approximate value iteration (AVI) algorithm when are used approximation operators satisfying the following properties: (i) they are positive linear operators; (ii) constant functions are fixed points of such operators; (iii) they have certain continuity property. Such operators define transition probabilities on the state space of the controlled systems. This has two important consequences: (a) one can see the approximating function as the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be thought of as a perturbation of the original Markov model. These two facts enable us to give finite-time bounds for the AVI algorithm performance depending on the operators accuracy to approximate the cost function and the transition law of the system. The results are illustrated with numerical approximations for a class of inventory systems.
    Mathematics Subject Classification: Primary: 93E20, 90C59; Secondary: 90C40.

    Citation:

    \begin{equation} \\ \end{equation}
  • [1]

    A. Almudevar, Approximate fixed point iteration with an application to infinite horizon Markov decision processes, SIAM Journal on Control and Optimization, 47 (2008), 2303-2347.doi: 10.1137/S0363012904441520.

    [2]

    E. F. Arruda, M. D. Fragoso and J. B. R. do Val, Approximate dynamic programming via direct search in the space of value function approximations, European Journal of Operational Research, 211 (2011), 343-351.doi: 10.1016/j.ejor.2010.11.019.

    [3]

    D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall, Englewood Cliffs, NJ, 1987.

    [4]

    D. P. Bertsekas, Approximate policy iteration: A survey and some new methods, Journal of Control Theory and Applications, 9 (2011), 310-335.doi: 10.1007/s11768-011-1005-3.

    [5]

    D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.

    [6]

    L. Beutel, H. Gonska and D. Kacsó, On variation-diminishing Shoenberg operators: new quantitative statements, in Multivariate approximation and interpolations with applications (ed. M. Gasca), Monografías de la Academia de Ciencias de Zaragoza, 20 (2002), 9-58.

    [7]

    R. A. DeVore, The Approximation of Continuous Functions by Positive Linear Operators, Lectures Notes in Mathematics 293, Springer-Verlag, Berlin Heidelberg, 1972.

    [8]

    F. Dufour and T. Prieto-Rumeau, Approximation of infinite horizon discounted cost Markov decision processes, in Optimization, Control, and Applications of Stochastic Systems, In Honor of Onésimo Hernández-Lerma (eds. D. Hernández-Hernández and J. A. Minjárez-Sosa), Birkhäuser, (2012), 59-76.doi: 10.1007/978-0-8176-8337-5_4.

    [9]

    D. P. de Farias and B. Van Roy, On the existence of fixed points for approximate value iteration and temporal difference learning, Journal of Optimization Theory and Applications, 105 (2000), 589-608.doi: 10.1023/A:1004641123405.

    [10]

    G. J. Gordon, Stable function approximation in dynamic programming, Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, 1995, 261-268.doi: 10.1016/B978-1-55860-377-6.50040-2.

    [11]

    O. Hernández-Lerma, Adaptive Markov Control Processes, Springer-Verlag, NY, 1989.doi: 10.1007/978-1-4419-8714-3.

    [12]

    O. Hernández-Lerma and J. B. Lasserre, Discrete-Time Markov Control Processes. Basic Optimality Criteria, Springer-Verlag, NY, 1996.doi: 10.1007/978-1-4612-0729-0.

    [13]

    O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, NY, 1999.doi: 10.1007/978-1-4612-0561-6.

    [14]

    D. R. Jiang and W. B. Powel, An approximate dynamic programming algorithm for monotone value functions, Operations Research, 63 (2015), 1489-1511.doi: 10.1287/opre.2015.1425.

    [15]

    R. Munos, Performance bounds in $L_p$ -norm for approximate value iteration, SIAM Journal on Control and Optimization, 46 (2007), 541-561.doi: 10.1137/040614384.

    [16]

    W. P. Powell, Approximate Dynamic Programming. Solving the Curse of Dimensionality, John Wiley & Sons Inc., 2007.doi: 10.1002/9780470182963.

    [17]

    W. P. Powell and J. Ma, A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration alogrithms for multidimensional continuous applications, Journal of Control Theory and Applications, 9 (2011), 336-352.doi: 10.1007/s11768-011-0313-y.

    [18]

    W. P. Powell, Perspectives of approximate dynamic programming, Annals of Operations Research, 241 (2016), 319-356.doi: 10.1007/s10479-012-1077-6.

    [19]

    M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, NY, 1994.

    [20]

    J. Rust, Numerical dynamic programming in economics, in Handbook of Computational Economics, (eds. H. Amman, D. Kendrick and J. Rust), Elsevier, 1 (1996), 619-729.

    [21]

    J. Stachurski, Continuous state dynamic programming via nonexpansive approximation, Computational Economics, 31 (2008), 141-160.doi: 10.1007/s10614-007-9111-5.

    [22]

    J. Stachurski, Dynamic Economic: Theory and Computation, MIT Press, Cambridge, MA, 2009.

    [23]

    S. Stidham Jr. and R. Weber, A survey of Markov decision models for control of networks of queues, Queueing Systems, 13 (1993), 291-314.doi: 10.1007/BF01158935.

    [24]

    R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.

    [25]

    B. Van Roy, Performance loss bounds for approximate value iteration with state space aggregation, Mathematics of Operations Research, 31 (2006), 234-244.doi: 10.1287/moor.1060.0188.

    [26]

    O. Vega-Amaya and R. Montes-de-Oca, Application of average dynamic programming to inventory systems, Mathematical Methods of Operations Research, 47 (1998), 451-471.doi: 10.1007/BF01198405.

    [27]

    D. J. White, Real applications of Markov decision processes, Interfaces, 15 (1985), 73-83.doi: 10.1287/inte.15.6.73.

    [28]

    D. J. White, Further real applications of Markov decision processes, Interfaces, 18 (1988), 55-61.doi: 10.1287/inte.18.5.55.

    [29]

    D. J. White, A survey of applications of Markov decision processes, The Journal of the Operational Research Society, 44 (1993), 1073-1096.

    [30]

    S. Yakowitz, Dynamic programming applications in water resources, Water Resource Research, 18 (1982), 673-696.doi: 10.1029/WR018i004p00673.

    [31]

    W. W.-G. Yeh, Reservoir management and operation models: A state-of-the-art review, Water Resource Research, 21 (1985), 1797-1818.doi: 10.1029/WR021i012p01797.

  • 加载中
SHARE

Article Metrics

HTML views() PDF downloads(184) Cited by(0)

Access History

Other Articles By Authors

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return