doi: 10.3934/mfc.2022014

Adaptive attitude determination of bionic polarization integrated navigation system based on reinforcement learning strategy

Huiyi Bao, Tao Du and Luyue Sun

1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China
2. Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing 210096, China

*Corresponding author: Tao Du

Received: March 2022. Revised: April 2022. Early access: May 2022.

The bionic polarization integrated navigation system includes three-axis gyroscopes, three-axis accelerometers, three-axis magnetometers, and polarization sensors, which together provide pitch, roll, and yaw. When the magnetometers suffer interference or the polarization sensors are obscured, the abnormal measurements degrade the attitude accuracy. To improve the attitude accuracy of the integrated navigation system in such complex environments, an adaptive complementary filter based on a DQN (Deep Q-learning Network) is proposed. The complementary filter is first designed to fuse the measurements from the gyroscopes, accelerometers, magnetometers, and polarization sensors. A reward function for the bionic polarization integrated navigation system is then defined as a function of the absolute value of the attitude angle error. The action-value function is approximated by a fully connected network trained on historical sensor data. The policy is computed by the deep Q-learning network, and the action that maximizes the action-value function is selected. Based on the selected action, the system switches automatically among three types of integration to adapt to different environments. Three simulation cases are conducted to validate the effectiveness of the proposed algorithm. The results show that adaptive attitude determination of the bionic polarization integrated navigation system based on DQN improves the accuracy of attitude estimation.
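
To make the decision step concrete, the sketch below illustrates the kind of DQN-based integration-mode switching described above. It is a minimal sketch, not the authors' implementation: the state vector, the network sizes, the encoding of the three integration modes, and the names QNetwork, reward, and select_mode are illustrative assumptions; only the overall structure (a fully connected action-value network, a reward built from the absolute attitude angle error, and selection of the action that maximizes the action-value function over three fusion modes) follows the abstract.

```python
# Minimal sketch (not the authors' implementation) of DQN-based switching
# among three integration modes. All names, sizes, and the mode encoding
# below are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

# Assumed action set: which sensors the complementary filter fuses.
MODES = ["gyro+accel+mag+pol", "gyro+accel+mag", "gyro+accel+pol"]


class QNetwork(nn.Module):
    """Fully connected action-value network Q(s, a)."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def reward(attitude_error_deg: np.ndarray) -> float:
    """Reward built from the absolute attitude-angle error (sign convention assumed)."""
    return -float(np.sum(np.abs(attitude_error_deg)))


def select_mode(q_net: QNetwork, state: np.ndarray, epsilon: float = 0.0) -> int:
    """Greedy action selection (epsilon-greedy during training)."""
    if np.random.rand() < epsilon:
        return int(np.random.randint(len(MODES)))
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(torch.argmax(q_values).item())


# Usage: the state could stack recent magnetometer-intensity and
# polarization-angle residuals; the chosen index picks the fusion mode.
q_net = QNetwork(state_dim=6, n_actions=len(MODES))
state = np.zeros(6, dtype=np.float32)
print(MODES[select_mode(q_net, state)])
```

In a full pipeline, the selected mode index would determine which sensor corrections the complementary filter fuses at each step, and the network would be trained offline on historical sensor data, as the abstract describes.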

Citation: Huiyi Bao, Tao Du, Luyue Sun. Adaptive attitude determination of bionic polarization integrated navigation system based on reinforcement learning strategy. Mathematical Foundations of Computing, doi: 10.3934/mfc.2022014
References:
[1] R. Wehner, Desert ant navigation: How miniature brains solve complex tasks, J. Comparative Physiology A, 189 (2003), 579-588. doi: 10.1007/s00359-003-0431-1.
[2] S. M. Reppert, H. Zhu and R. H. White, Polarized light helps monarch butterflies navigate, Current Biology, 14 (2004), 155-158. doi: 10.1016/j.cub.2003.12.034.
[3] R. Muheim, Behavioural and physiological mechanisms of polarized light sensitivity in birds, Philos. Trans. Roy. Soc. B Bio. Sci., 366 (2011), 763-771. doi: 10.1098/rstb.2010.0196.
[4] J. Chu, Z. Wang, L. Guan and Z. Liu, Integrated polarization dependent photodetector and its application for polarization navigation, IEEE Photonics Tech. Lett., 26 (2014), 469-472. doi: 10.1109/LPT.2013.2296945.
[5] D. Lambrinos, H. Kobayashi, R. Pfeifer, M. Maris and T. Labhart, An autonomous agent navigating with a polarized light compass, Adaptive Behavior, 6 (1997), 131-161. doi: 10.1177/105971239700600104.
[6] J. Chu, K. Zhao, Q. Zhang and T. Wang, Construction and performance test of a novel polarization sensor for navigation, Sensors Actuators A Phys., 148 (2008), 75-82. doi: 10.1016/j.sna.2008.07.016.
[7] D. Wang, H. Liang, H. Zhu and S. Zhang, A bionic camera-based polarization navigation sensor, Sensors, 14 (2014), 13006-13023. doi: 10.3390/s140713006.
[8] J. Chahl and A. Mizutani, Biomimetic attitude and orientation sensors, IEEE Sensors J., 12 (2012), 289-297. doi: 10.1109/JSEN.2010.2078806.
[9] T. Du, Y. H. Zeng, J. Yang, C. Z. Tian and P. F. Bai, Multi-sensor fusion SLAM approach for the mobile robot with a bio-inspired polarised skylight sensor, IET Radar Sonar Navigation, 14 (2020), 1950-1957. doi: 10.1049/iet-rsn.2020.0260.
[10] C. J. C. H. Watkins and P. Dayan, Q-learning, Machine Learning, 8 (1992), 279-292. doi: 10.1007/BF00992698.
[11] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu and J. Veness, Human-level control through deep reinforcement learning, Nature, 518 (2015), 529-533. doi: 10.1038/nature14236.
[12] T. Hester, M. Vecerik, O. Pietquin, M. Lanctot and T. Schaul, et al., Deep Q-learning from demonstrations, in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018, 3223-3230.
[13] Q. Zhang, T. Du and C. Tian, A Sim2real method based on DDQN for training a self-driving scale car, Math. Found. Comput., 2 (2019), 315-331. doi: 10.3934/mfc.2019020.
[14] J. Fan, Z. Wang, Y. Xie and Z. Yang, A theoretical analysis of deep Q-learning, preprint, 2020, arXiv: 1901.00137.
[15] J. Yang, T. Du, X. Liu, B. Niu and L. Guo, Method and implementation of a bioinspired polarization-based attitude and heading reference system by integration of polarization compass and inertial sensors, IEEE Trans. Industrial Electron., 67 (2020), 9802-9812. doi: 10.1109/TIE.2019.2952799.
[16] G. Shani, D. Heckerman and R. I. Brafman, An MDP-based recommender system, J. Mach. Learn. Res., 6 (2005), 1265-1295.
[17] S. James and E. Johns, 3D simulation for robot arm control with deep Q-learning, preprint, 2016, arXiv: 1609.03759.
[18] J. L. Crassidis and J. L. Junkins, Optimal Estimation of Dynamic Systems, Chapman & Hall/CRC Applied Mathematics and Nonlinear Science Series, 2, Chapman & Hall/CRC, Boca Raton, FL, 2004. doi: 10.1201/9780203509128.
[19] S. de Marco, M.-D. Hua, T. Hamel and C. Samson, Position, velocity, attitude and accelerometer-bias estimation from IMU and bearing measurements, 2020 European Control Conference (ECC), St. Petersburg, Russia, 2020. doi: 10.23919/ECC51009.2020.9143918.
[20] R. L. Farrenkopf, Analytic steady-state accuracy solutions for two common spacecraft attitude estimators, J. Guidance Control, 1 (1978), 282-284. doi: 10.2514/3.55779.
[21] S. O. H. Madgwick, A. J. L. Harrison and R. Vaidyanathan, Estimation of IMU and MARG orientation using a gradient descent algorithm, IEEE International Conference on Rehabilitation Robotics, Zurich, Switzerland, 2011. doi: 10.1109/ICORR.2011.5975346.
[22] T. Du, C. Tian, J. Yang, S. Wang, X. Liu and L. Guo, An autonomous initial alignment and observability analysis for SINS with bio-inspired polarized skylight sensors, IEEE Sensors J., 20 (2020), 7941-7956. doi: 10.1109/JSEN.2020.2981171.
[23] R. Mahony, T. Hamel and J.-M. Pflimlin, Nonlinear complementary filters on the special orthogonal group, IEEE Trans. Automat. Control, 53 (2008), 1203-1218. doi: 10.1109/TAC.2008.923738.
[24] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves and I. Antonoglou, et al., Playing Atari with deep reinforcement learning, preprint, 2013, arXiv: 1312.5602.

Figure 1.  Illustration of DQN
Figure 2.  Variation of geomagnetic field intensity under geomagnetic interference
Figure 3.  Comparison of decision actions under geomagnetic interference
Figure 4.  Attitude estimation errors under geomagnetic interference
Figure 5.  Variation of polarization angle under polarization interference
Figure 6.  Comparison of decision actions under polarization interference
Figure 7.  Attitude estimation errors under polarization interference
Figure 8.  (a) Polarization angle under polarization interference. (b) Geomagnetic field intensity under geomagnetic interference
Figure 9.  Comparison of decision actions when the magnetometer is disturbed and the polarization sensor is blocked
Figure 10.  Attitude estimation errors when the magnetometer is disturbed and the polarization sensor is blocked
Table 1.  Standard deviation of attitude angle for decision comparison in experiment 1
Method | Pitch (°) | Roll (°) | Yaw (°)
Complementary filter without DQN | 0.4958 | 0.5057 | 0.5866
Complementary filter with DQN | 0.4790 | 0.5022 | 0.3540
Table 2.  Standard deviation of attitude angle for decision comparison in experiment 2
Method | Pitch (°) | Roll (°) | Yaw (°)
Complementary filter without DQN | 0.5450 | 0.5078 | 0.4700
Complementary filter with DQN | 0.4640 | 0.5031 | 0.4241
Table 3.  Standard deviation of attitude angle for decision comparison in experiment 3
Method | Pitch (°) | Roll (°) | Yaw (°)
Complementary filter without DQN | 0.4966 | 0.5052 | 0.6005
Complementary filter with DQN | 0.4735 | 0.5031 | 0.3822