# American Institute of Mathematical Sciences

July  2016, 3(3): 261-278. doi: 10.3934/jdg.2016014

## A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces

Óscar Vega-Amaya and Joaquín López-Borbón

1 Departamento de Matemáticas, Universidad de Sonora, Rosales s/n, Col. Centro, 83000, Hermosillo, Sonora, Mexico

Received: December 2015. Revised: July 2016. Published: August 2016.

The present paper gives computable performance bounds for the approximate value iteration (AVI) algorithm when the approximation operators used satisfy the following properties: (i) they are positive linear operators; (ii) constant functions are fixed points of such operators; (iii) they have a certain continuity property. Such operators define transition probabilities on the state space of the controlled system. This has two important consequences: (a) the approximating function can be seen as the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be thought of as a perturbation of the original Markov model. These two facts enable us to give finite-time bounds on the performance of the AVI algorithm, depending on the accuracy with which the operators approximate the cost function and the transition law of the system. The results are illustrated with numerical approximations for a class of inventory systems.
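As a minimal sketch of this class of operators, the code below runs AVI on a hypothetical single-item inventory model: each Bellman backup over a coarse grid of nodes is followed by piecewise-linear interpolation (`np.interp`), which is a positive linear operator that fixes constant functions, so every query value is an average of node values under an induced transition probability. All model parameters (capacity, costs, demand law, grids) are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical inventory model (illustrative parameters only).
B, alpha = 10.0, 0.9                 # storage capacity, discount factor
c, h, p = 2.0, 1.0, 4.0              # ordering, holding, shortage unit costs
demand = np.array([0.0, 1.0, 2.0, 3.0])   # demand support
probs = np.full(4, 0.25)                  # uniform demand law

nodes = np.linspace(0.0, B, 11)      # coarse grid carrying the value function
actions = np.linspace(0.0, B, 21)    # candidate order quantities

def avi_step(v_nodes):
    """One AVI step: Bellman backup at each node; the approximation
    operator is np.interp, a positive linear operator that fixes
    constant functions (its weights are convex combinations)."""
    v_new = np.empty_like(v_nodes)
    for i, x in enumerate(nodes):
        best = np.inf
        for a in actions:
            if x + a > B:                        # respect storage capacity
                continue
            y = np.clip(x + a - demand, 0.0, B)  # next state per demand value
            stage = (c * a
                     + h * np.maximum(x + a - demand, 0.0)
                     + p * np.maximum(demand - x - a, 0.0))
            q = np.dot(probs, stage + alpha * np.interp(y, nodes, v_nodes))
            best = min(best, q)
        v_new[i] = best
    return v_new

v = np.zeros_like(nodes)
gaps = []                            # sup-norm gaps between successive iterates
for _ in range(100):
    v_next = avi_step(v)
    gaps.append(np.max(np.abs(v_next - v)))
    v = v_next
```

Because interpolation is nonexpansive in the sup norm, the composed operator is still an alpha-contraction, so the gaps between successive iterates shrink geometrically; this is the mechanism behind the kind of computable finite-time bound the paper studies.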
Citation: Óscar Vega-Amaya, Joaquín López-Borbón. A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces. Journal of Dynamics and Games, 2016, 3 (3) : 261-278. doi: 10.3934/jdg.2016014


