# American Institute of Mathematical Sciences

doi: 10.3934/jimo.2022047
Online First


## A heuristically accelerated reinforcement learning method for maintenance policy of an assembly line

1 School of Safety Engineering, Shenyang Aerospace University, Shenyang 110136, China

2 School of Mechanical Engineering, Shenyang Institute of Engineering, Shenyang 110136, China

*Corresponding author: Xiao Wang

Received June 2021; Revised February 2022; Early access April 2022

This paper investigates the maintenance policy for a two-machine one-buffer (2M1B) assembly line system. The observed quality states of the deteriorating machines are assumed to be characterized by multiple decreasing yield stages. A semi-Markov decision process (SMDP) model describes the deterioration process of the system, and a heuristically accelerated multi-agent reinforcement learning (HAMRL) method is applied to solve the model. The HAMRL method uses asynchronous updating rules; the production time, preventive maintenance (PM) time, and corrective repair (CR) time are random, and the deterioration mode of each device is not fixed. The method is compared with a simulated annealing search (SAS) based exploration algorithm and a neighborhood search (NS) based exploration algorithm in reinforcement learning (RL). The empirical results indicate that the proposed HAMRL algorithm speeds up the learning process and has an advantage on larger problem spaces and more practical problems. The maintenance strategy for the 2M1B assembly line system is obtained once the system average cost rate has converged. This paper provides new and practical insights into the application and selection of techniques for the maintenance policy of the 2M1B assembly line system.

Citation: Xiao Wang, Guowei Zhang, Yongqiang Li, Na Qu. A heuristically accelerated reinforcement learning method for maintenance policy of an assembly line. Journal of Industrial and Management Optimization, doi: 10.3934/jimo.2022047

Figure: A 2M1B assembly line system

Figure: Flow chart of heuristic accelerated cost-sharing-RL method

Figure: The statistical box diagram of learning times of the three methods

Figure: Change curve of the system average cost rate

Numerical empirical parameters

| Parameter | M$_1$ | M$_2$ |
|---|---|---|
| Production time per part: gamma distr $(A,B)^1$ | (18, 0.05) | (22, 0.05) |
| Mean time to failure: gamma distr $(A,B)$ | (300, 0.3) | (300, 0.3) |
| Time for preventive maintenance: uniform distr $(a,b)^2$ | (10, 20) | (10, 20) |
| Time for corrective repair: gamma distr $(A,B)$ | (300, 0.2) | (300, 0.2) |
| Production costs | 2 | 3 |
| Preventive maintenance costs | 80 | 80 |
| Corrective repair costs | 800 | 800 |
| Scrapped unit loss costs | 2 | 3 |

Production lost cost for machine M$_2$: 30

$^1$ For a gamma distribution characterized by $(A, B)$, the mean is $A \times B$.

$^2$ For a uniform distribution characterized by $(a, b)$, the mean is $(a + b)/2$.

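The two footnote formulas can be checked directly against the table. The sketch below is illustrative only: it assumes that $A$ is a shape parameter and $B$ a scale parameter (which is consistent with the stated mean $A \times B$), and the dictionary keys and helper names are hypothetical, not taken from the paper.

```python
import random

# Gamma (A, B) pairs copied from the parameter table.
# Assumption: A is the shape and B is the scale, so the mean is A * B.
GAMMA_PARAMS = {
    "production_time_M1": (18, 0.05),
    "production_time_M2": (22, 0.05),
    "time_to_failure": (300, 0.3),
    "corrective_repair_time": (300, 0.2),
}
# Uniform (a, b) bounds for the preventive maintenance time.
PM_TIME_BOUNDS = (10, 20)


def gamma_mean(shape, scale):
    """Mean of a gamma distribution parameterized by shape and scale."""
    return shape * scale


def uniform_mean(low, high):
    """Mean of a uniform distribution on [low, high]."""
    return (low + high) / 2


def sample_times(rng):
    """Draw one random duration of each kind from the tabulated distributions."""
    draws = {name: rng.gammavariate(a, b) for name, (a, b) in GAMMA_PARAMS.items()}
    draws["preventive_maintenance_time"] = rng.uniform(*PM_TIME_BOUNDS)
    return draws


means = {name: gamma_mean(a, b) for name, (a, b) in GAMMA_PARAMS.items()}
means["preventive_maintenance_time"] = uniform_mean(*PM_TIME_BOUNDS)

for name, value in means.items():
    print(f"{name}: mean = {value:.2f}")
```

Under this reading, the mean production time per part is 0.9 for M$_1$ and 1.1 for M$_2$, the mean time to failure is 90, the mean corrective repair time is 60, and the mean preventive maintenance time is 15, all in the table's unstated time unit.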
System average cost rate based on three search technology methods

| | HAMRL | NS | SAS |
|---|---|---|---|
| System average cost rate | 14.127 | 12.842 | 12.431 |

The $t$-Test for a comparison (the significance level for these tests is 5%)

| | (a) HAMRL | (a) NS | (b) HAMRL | (b) SAS | (c) SAS | (c) NS |
|---|---|---|---|---|---|---|
| Mean updating steps | 21480.74 | 26696.58 | 21480.74 | 32399.68 | 32399.68 | 26696.58 |
| Variance | 75659236.2 | 127955422.3 | 75659236.2 | 207746426.1 | 207746426.1 | 127955422.3 |
| Number of cases | 100 | 100 | 100 | 100 | 100 | 100 |
| $T$-value | -3.6553 | | -6.486 | | 3.1127 | |
| Critical | 1.972 | | 1.972 | | 1.972 | |

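The tabulated $T$-values can be recomputed from the summary statistics alone. The sketch below assumes a two-sample statistic with standard error $\sqrt{s_x^2/n_x + s_y^2/n_y}$; because both samples have 100 cases, the pooled-variance form gives the same number. Function and variable names are illustrative.

```python
import math


def two_sample_t(mean_x, var_x, n_x, mean_y, var_y, n_y):
    """Two-sample t statistic computed from means, variances and sample sizes."""
    standard_error = math.sqrt(var_x / n_x + var_y / n_y)
    return (mean_x - mean_y) / standard_error


# (mean updating steps, variance, number of cases), copied from the table.
SUMMARY = {
    "HAMRL": (21480.74, 75659236.2, 100),
    "NS": (26696.58, 127955422.3, 100),
    "SAS": (32399.68, 207746426.1, 100),
}
COMPARISONS = [("HAMRL", "NS"), ("HAMRL", "SAS"), ("SAS", "NS")]  # (a), (b), (c)
CRITICAL_VALUE = 1.972  # critical value reported in the table for the 5% level

t_values = {
    pair: two_sample_t(*SUMMARY[pair[0]], *SUMMARY[pair[1]]) for pair in COMPARISONS
}

for (first, second), t in t_values.items():
    significant = abs(t) > CRITICAL_VALUE
    print(f"{first} vs {second}: t = {t:.4f}, significant at 5%: {significant}")
```

All three statistics exceed the critical value in absolute terms, so each pairwise difference in mean updating steps is significant at the 5% level.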
Percentage improvement of the HAMRL algorithm versus other methods

| Compared method | Min | Max | Average |
|---|---|---|---|
| NS | 5.15 | 31.69 | 18.42 |
| SAS | 21.49 | 43.93 | 32.71 |

1) Percentage improvement over $X$ = ($X$ performance - HAMRL performance)/$X$ performance $\times$ 100%.

2) $X$ represents the NS and SAS based algorithms, with a 95% confidence level.

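As a rough illustration of footnote 1, the formula can be applied to the mean updating steps reported in the $t$-test table. Treating mean updating steps as the "performance" measure is an assumption made here, and this sketch does not reproduce the Min and Max columns, whose derivation is not given on this page; it only yields point estimates.

```python
def percentage_improvement(x_performance, hamrl_performance):
    """Footnote 1: (X performance - HAMRL performance) / X performance * 100%."""
    return (x_performance - hamrl_performance) / x_performance * 100.0


# Mean updating steps from the t-test table (assumed performance measure).
MEAN_STEPS = {"HAMRL": 21480.74, "NS": 26696.58, "SAS": 32399.68}

improvement = {
    method: percentage_improvement(MEAN_STEPS[method], MEAN_STEPS["HAMRL"])
    for method in ("NS", "SAS")
}

for method, value in improvement.items():
    print(f"Improvement over {method}: {value:.2f}%")
```

With these inputs the point estimates are about 19.54% over NS and 33.70% over SAS, both inside the corresponding Min to Max ranges in the table.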
The statistical results of updating times of $Q$-value

| Method | Min | Median | Max | Mean | Std dev | Ave cost rate |
|---|---|---|---|---|---|---|
| Heuristic function | 6865 | 19873 | 39511 | 21480 | 8698 | 14.127 |
| NS | 9085 | 25118 | 51859 | 26696 | 11311 | 12.842 |
| SAS | 8527 | 30916 | 60013 | 32399 | 14413 | 12.431 |

Maintenance actions of different buffer inventory numbers in different states for device M$_1$

| Buffer inventory \ State | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | | | | |
| 1 | 0 | 0 | 0 | 0 | 1 | | | |
| 2 | 0 | 0 | 0 | 0 | 1 | | | |
| 3 | 0 | 0 | 0 | 0 | 1 | | | |
| 4 | 0 | 0 | 0 | 0 | 1 | | | |
| 5 | 0 | 0 | 0 | 0 | 0 | 1 | | |
| 6 | 0 | 0 | 0 | 0 | 0 | 1 | | |
| 7 | 0 | 0 | 0 | 0 | 0 | 1 | | |
| 8 | 0 | 0 | 0 | 0 | 0 | 1 | | |

Maintenance actions of different buffer inventory numbers in different states for device M$_2$

| Buffer inventory \ State | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | | |
| 2 | 0 | 0 | 0 | 0 | 1 | | | |
| 3 | 0 | 0 | 0 | 0 | 1 | | | |
| 4 | 0 | 0 | 0 | 0 | 1 | | | |
| 5 | 0 | 0 | 0 | 0 | 1 | | | |
| 6 | 0 | 0 | 0 | 1 | | | | |
| 7 | 0 | 0 | 0 | 1 | | | | |
| 8 | 0 | 0 | 1 | | | | | |

Maintenance strategies for the device M$_1$ and M$_2$

| Buffer inventory level $x$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Maintenance strategies $\{S_{M_1},S_{M_2}\}$ | {4, 7} | {5, 6} | {5, 5} | {5, 5} | {5, 5} | {6, 5} | {6, 4} | {6, 4} | {6, 3} |

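The strategy pairs above agree with the two maintenance-action tables if each $S$ is read as the first state in which the tabulated action switches from 0 to 1. The sketch below checks that reading. It assumes that action 1 means "perform maintenance", that states are numbered from 1, and that each row of the flattened action tables belongs to one buffer inventory level; the names used are hypothetical.

```python
# Action rows per buffer inventory level 0..8, one entry per state 1, 2, ...
# Rows end at the first 1 because the tables list no further states.
M1_ACTIONS = [
    [0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
M2_ACTIONS = [
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 1],
]


def control_limit(actions):
    """Return the 1-based index of the first state whose action is 1."""
    return actions.index(1) + 1


# One (S_M1, S_M2) pair per buffer inventory level x = 0..8.
strategies = [
    (control_limit(m1_row), control_limit(m2_row))
    for m1_row, m2_row in zip(M1_ACTIONS, M2_ACTIONS)
]

for level, (s_m1, s_m2) in enumerate(strategies):
    print(f"x = {level}: {{S_M1, S_M2}} = {{{s_m1}, {s_m2}}}")
```

Read this way, the threshold for M$_1$ rises from 4 to 6 as the buffer fills, while the threshold for M$_2$ falls from 7 to 3.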
