# Stereo visual odometry based on dynamic and static features division

* Corresponding author: Guangbin Cai

The first author is mainly supported by NSSF of China under Grant No. 61773387.

Accurate camera pose estimation in dynamic scenes is an important challenge for visual simultaneous localization and mapping, and reducing the effect of moving objects on pose estimation is critical. To tackle this problem, a robust visual odometry approach for dynamic scenes is proposed that can precisely distinguish between dynamic and static features. The key to the proposed method is combining the scene flow with the principle that the relative spatial distance between static features is invariant. Moreover, a new threshold is proposed to distinguish dynamic features. The dynamic features are then eliminated after matching with the virtual map points. In addition, a new similarity calculation function is proposed to improve the performance of loop-closure detection. Finally, the camera pose is optimized after a closed loop is obtained. Experiments conducted on the TUM datasets and in real scenes show that the proposed method reduces tracking errors significantly and estimates the camera pose precisely in dynamic scenes.
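The distance-invariance idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `split_static_dynamic`, the fixed tolerance `tol` (a placeholder for the threshold proposed in the paper, which is not given here), and the choice of reference features are all assumptions, and the scene-flow step is omitted. The sketch relies only on the fact that a rigid camera motion preserves the distance between any two static 3D points.

```python
import numpy as np


def split_static_dynamic(prev_pts, curr_pts, ref_idx, tol=0.05):
    """Label features as static (True) or dynamic (False).

    prev_pts, curr_pts: (N, 3) arrays holding the 3D positions of the same
        N features in two frames, each expressed in its own camera frame.
    ref_idx: indices of features assumed to be static (hypothetical input).
    tol: assumed tolerance on the change in distance, in the units of the points.
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    curr_pts = np.asarray(curr_pts, dtype=float)

    # Distance from every feature to every reference feature, shape (N, R).
    d_prev = np.linalg.norm(
        prev_pts[:, None, :] - prev_pts[ref_idx][None, :, :], axis=2
    )
    d_curr = np.linalg.norm(
        curr_pts[:, None, :] - curr_pts[ref_idx][None, :, :], axis=2
    )

    # A static feature keeps its distance to all static references,
    # so the largest change over the references must stay within tol.
    max_change = np.abs(d_curr - d_prev).max(axis=1)
    return max_change <= tol
```

Under these assumptions, a feature that moves independently of the scene changes its distance to at least one reference and is flagged as dynamic, while features that only appear to move because of camera motion are kept.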

Mathematics Subject Classification: Primary: 65D19, 65D18; Secondary: 68U10.

Figure 1.  Stereo camera model

Figure 2.  Generation of a visual vocabulary tree

Figure 3.  Overview of the proposed algorithm in dynamic scenes

Figure 4.  Classification of the scene flow based on angles [26]

Figure 5.  Invariance of the relative spatial distance of the static points

Figure 6.  Construction of the virtual map points

Figure 7.  Three static features selected by the algorithm

Figure 8.  Dynamic features obtained by the algorithm

Figure 9.  Experiment scene sets

Figure 10.  Experimental results of ORB-VO in lab scenes

Figure 11.  Experimental results of the proposed method in lab scenes

Figure 12.  Loop-closure detection result of the inverse proportional function

Figure 13.  Loop-closure detection result of the negative exponential power function

Figure 14.  Loop-closure detection result of the negative exponential power function

Figure 15.  Comparisons between estimated trajectories and the ground truth in walking sequences

Figure 16.  Comparisons between estimated trajectories and the ground truth in sitting sequences

Table 1.  Translational and rotational drift of VO methods on the TUM dataset

| Sequences | DVO, transl. [m/s] | BaMVO, transl. [m/s] | SPW-VO, transl. [m/s] | Our Method, transl. [m/s] | DVO, rot. [$^{\circ}$/s] | BaMVO, rot. [$^{\circ}$/s] | SPW-VO, rot. [$^{\circ}$/s] | Our Method, rot. [$^{\circ}$/s] |
|---|---|---|---|---|---|---|---|---|
| sitting-static | 0.0157 | 0.0248 | 0.0231 | 0.0112 | 0.6084 | 0.6977 | 0.7228 | 0.3356 |
| sitting-xyz | 0.0453 | 0.0482 | 0.0219 | 0.0132 | 1.4980 | 1.3885 | 0.8466 | 0.5753 |
| sitting-rpy | 0.1735 | 0.1872 | 0.0843 | 0.0280 | 6.0164 | 5.9834 | 5.6258 | 0.6811 |
| sitting-halfsphere | 0.1005 | 0.0589 | 0.0389 | 0.0151 | 4.6490 | 2.8804 | 1.8836 | 0.6103 |
| walking-static | 0.3818 | 0.1339 | 0.0327 | 0.0293 | 6.3502 | 2.0833 | 0.8085 | 0.5500 |
| walking-xyz | 0.4360 | 0.2326 | 0.0651 | 0.1034 | 7.6669 | 4.3911 | 1.6442 | 2.3273 |
| walking-rpy | 0.4038 | 0.3584 | 0.2252 | 0.2143 | 7.0662 | 6.3898 | 5.6902 | 3.9555 |
| walking-halfsphere | 0.2628 | 0.1738 | 0.0527 | 0.1061 | 5.2179 | 4.2863 | 2.4048 | 2.2983 |

Table 2.  RMSE of the ATE of camera pose estimation (m)

| Sequences | ORB-SLAM2 | MR-SLAM | SPW-SLAM | SF-SLAM | Our Method |
|---|---|---|---|---|---|
| sitting-static | 0.0082 | – | – | 0.0081 | 0.0073 |
| sitting-xyz | 0.0094 | 0.0482 | 0.0397 | 0.0101 | 0.0090 |
| sitting-rpy | 0.0197 | – | – | 0.0180 | 0.0162 |
| sitting-halfsphere | 0.0211 | 0.0470 | 0.0432 | 0.0239 | 0.0164 |
| walking-static | 0.1028 | 0.0656 | 0.0261 | 0.0120 | 0.0108 |
| walking-xyz | 0.4278 | 0.0932 | 0.0601 | 0.2251 | 0.0884 |
| walking-rpy | 0.7407 | 0.1333 | 0.1791 | 0.1961 | 0.3620 |
| walking-halfsphere | 0.4939 | 0.1252 | 0.0489 | 0.0423 | 0.0411 |
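For readers unfamiliar with the metric in Table 2, the RMSE of the absolute trajectory error (ATE) can be sketched as follows. This is an illustrative sketch, not the evaluation script behind the reported numbers: the function name `ate_rmse` is hypothetical, and the code assumes the estimated and ground-truth positions are already associated by timestamp and aligned in a common frame, steps a full evaluation would perform first.

```python
import numpy as np


def ate_rmse(est_xyz, gt_xyz):
    """RMSE of per-pose translational error between two (N, 3) trajectories.

    Assumes est_xyz[i] and gt_xyz[i] refer to the same timestamp and that
    both trajectories are expressed in the same, already aligned, frame.
    """
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    # Euclidean distance between matching positions, one value per pose.
    err = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

The result is in the same length unit as the input positions, and a lower value means the estimated trajectory stays closer to the ground truth over the whole sequence.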
