A probability criterion for zero-sum stochastic games
1. School of Computer Science and Network Security, Dongguan University of Technology, Dongguan, 523808, China
a. School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China
b. Sun Yat-sen Business School, Sun Yat-Sen University, Guangzhou, 510275, China
This paper introduces a probability criterion for two-person zero-sum stochastic games. The criterion is the probability that the payoff accumulated before the first passage time to a target state set exceeds a level specified by both players; it measures the security for player 1 and the risk for player 2. For the game model based on discrete-time Markov chains, under a suitable condition on the game's primitive data, we establish the Shapley equation, from which the existence of the value of the game and of a pair of optimal policies follows. We also provide a recursive method for computing, or at least approximating, the value of the game. Finally, we illustrate the main result with an inventory system.
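To give a feel for the kind of recursion the abstract refers to, the following is a minimal sketch on a toy game, not the paper's own algorithm or notation. Everything in it is an assumption made for illustration: the two-state model, the `REWARD` and `TRANS` tables, the boundary conventions (the value is 1 once the remaining level is negative, and 0 at a target state otherwise), and the restriction to positive integer rewards and 2x2 action sets so that each stage game has a closed-form value.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy model: state 0 is the only non-target state, state 1 is the target.
TARGET = {1}
HALF = Fraction(1, 2)

# REWARD[(state, a, b)]: positive integer payoff when player 1 plays a and player 2 plays b.
REWARD = {(0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 2, (0, 1, 1): 1}

# TRANS[(state, a, b)]: next-state distribution (here: stay or hit the target, each w.p. 1/2).
TRANS = {key: {0: HALF, 1: HALF} for key in REWARD}


def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # best guaranteed payoff with pure rows
    upper = min(max(a, c), max(b, d))  # best guaranteed cap with pure columns
    if lower == upper:                 # pure saddle point
        return lower
    # No saddle point: standard mixed-strategy formula for 2x2 games.
    return (a * d - b * c) / (a + d - b - c)


@lru_cache(maxsize=None)
def value(state, level):
    """Assumed value: probability that the payoff collected before hitting
    TARGET exceeds `level`, under optimal play by both players."""
    if level < 0:
        return Fraction(1)   # the level has already been exceeded
    if state in TARGET:
        return Fraction(0)   # target reached without exceeding the level
    stage = [
        [
            sum(p * value(j, level - REWARD[(state, a, b)])
                for j, p in TRANS[(state, a, b)].items())
            for b in (0, 1)
        ]
        for a in (0, 1)
    ]
    return matrix_game_value(stage)


if __name__ == "__main__":
    for lam in range(4):
        print(lam, value(0, lam))
```

Because every reward in this toy model is at least 1, each recursive call strictly lowers the remaining level, so the recursion terminates and the values are exact. With rewards that can be zero or action sets larger than 2x2, one would instead iterate an operator of this form and solve each stage game by linear programming, which yields an approximation rather than an exact value.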