
SchurVINS：基于Schur补的轻量级视觉惯性导航系统

Yunfei Fan, Tianyu Zhao, Guidong Wang

范云飞，赵天宇，王朝栋

ByteDance

字节跳动

{frank.01, zhaotianyu.1998, guidong.wang}@bytedance.com

Abstract

摘要

Accuracy and computational efficiency are the most important metrics for a Visual Inertial Navigation System (VINS). Existing VINS algorithms offer either high accuracy or low computational complexity, and therefore struggle to provide high-precision localization on resource-constrained devices. To this end, we propose a novel filter-based VINS framework named SchurVINS (SV), which guarantees both high accuracy by building a complete residual model and low computational complexity by using the Schur complement. Technically, we first formulate the full residual model, in which the gradient, Hessian and observation covariance are explicitly modeled. Then the Schur complement is employed to decompose the full model into an ego-motion residual model and a landmark residual model. Finally, the Extended Kalman Filter (EKF) update is implemented on these two models with high efficiency. Experiments on the EuRoC and TUM-VI datasets show that our method notably outperforms state-of-the-art (SOTA) methods in both accuracy and computational complexity. The experimental code of SchurVINS is available at https://github.com/bytedance/SchurVINS.

准确性和计算效率是视觉惯性导航系统（VINS）最重要的性能指标。现有的VINS算法要么具有较高的准确性，要么具有较低的计算复杂度，但在资源受限的设备上难以提供高精度的定位。为此，我们提出了一种名为SchurVINS（SV）的新型滤波器基础上的VINS框架，该框架能够通过构建完整的残差模型保证高准确性，并通过Schur补实现低计算复杂度。在技术层面，我们首先构建了完整的残差模型，其中梯度、Hessian矩阵和观测协方差都被显式建模。然后，使用Schur补将完整模型分解为自我运动残差模型和地标残差模型。最后，在这些两个模型中高效地实施扩展卡尔曼滤波器（EKF）更新。在EuRoC和TUM-VI数据集上的实验表明，我们的方法在准确性和计算复杂度上都显著优于现有最先进的方法。SchurVINS的实验代码可在 https://github.com/bytedance/SchurVINS 获取。

1. Introduction

1. 引言

High-precision localization technologies have become a cornerstone in various industrial fields, playing an indispensable role particularly in robotics, augmented reality (AR), and virtual reality (VR). In recent decades, the visual inertial navigation system (VINS) has attracted significant attention due to its low cost and ubiquity. Composed of only cameras and inertial measurement units (IMU), a VINS module can provide six-degree-of-freedom (6-DOF) positioning as accurate as expensive sensors such as Lidar, and is better suited to installation in portable devices such as smartphones and micro aerial vehicles (MAV).

高精度定位技术已成为各个工业领域的基础，尤其在机器人技术、增强现实（AR）和虚拟现实（VR）中发挥着不可或缺的作用。在过去的几十年中，视觉惯性导航系统（VINS）因其低成本和普遍存在的优势而受到了广泛关注。VINS模块仅由摄像头和惯性测量单元（IMU）组成，能够提供与昂贵传感器如激光雷达（Lidar）一样精确的六自由度（6-DOF）定位，并且更适合安装在如智能手机和微型飞行器（MAV）等便携式设备上。

It has been reported that many excellent open-source VINS algorithms can achieve high-precision pose estimation. They mainly follow two methodologies: optimization-based and filter-based methods. Typical optimization-based methods $\left\lbrack {4,{17},{21},{24},{33},{34}}\right\rbrack$ model poses and the corresponding observed landmarks jointly. Benefitting from the Schur complement technique [1], this high-dimensional model with special sparsity can be solved efficiently by bundle adjustment (BA [32]). In theory [11], although notable for their high localization precision, optimization-based methods may suffer from high computational complexity. In contrast, mainstream filter-based methods $\left\lbrack {2,7,{10},{30}}\right\rbrack$ derived from MSCKF [22] utilize the left nullspace method to simplify the residual model. The EKF [29] update is then executed on the simplified residual model to estimate the corresponding poses. As a result, they achieve high efficiency but compromise accuracy, since landmarks are not optimized jointly with camera poses and all observations are utilized only once. To sum up, optimization-based methods are advantageous in accuracy while filter-based methods are more efficient.

报道称，各种优秀的开源VINS算法能够实现高精度的姿态估计，主要包括两种方法：基于优化和基于滤波的方法。典型的基于优化的方法 $\left\lbrack {4,{17},{21},{24},{33},{34}}\right\rbrack$ 同时建模姿态和相应的观测地标。得益于Schur补技术 [1]，这种具有特殊稀疏性的高维模型可以通过束调整（BA [32]）高效地解决。理论上 [11]，尽管基于优化的方法在定位高精度方面表现显著，但可能遭受高计算复杂度的问题。相比之下，主流的基于滤波的方法 $\left\lbrack {2,7,{10},{30}}\right\rbrack$ 源于MSCKF [22]，使用左零空间方法简化残差模型。然后执行EKF [29] 更新以简化残差模型估计相应的姿态。最终，它们以牺牲精度为代价实现高效率，因为地标不是与相机姿态共同优化的，所有观测仅使用一次。总之，基于优化的方法在精度上有优势，而基于滤波的方法则更有效率。

Figure 1. Comparison of run time, CPU usage and RMSE evaluated on EuRoC dataset. Different shapes and colors indicate different methods and precision, respectively.

图1. 在EuRoC数据集上评估的运行时间、CPU使用率和均方根误差（RMSE）的比较。不同的形状和颜色分别表示不同的方法和精度。

Therefore, it is urgent to develop a framework that combines their high precision and efficiency. As discussed above, the traditional residual model without simplification can achieve high accuracy. In spite of this, when both landmarks and poses are incorporated into the state vector for joint estimation, the efficiency of EKF-SLAM decreases significantly [22]. Inspired by the Schur complement in optimization-based methods, we make full use of the sparse structure inherent in the high-dimensional residual model constructed with poses and landmarks to achieve high efficiency in EKF. Thus, an EKF-based VINS framework that achieves both high efficiency and accuracy is presented. In the framework, the equivalent residual model, consisting of the gradient, Hessian and the corresponding observation covariance, is derived from the traditional residual model. Taking the special sparse structure of the Hessian into account, the Schur complement is carried out to break the equivalent residual equation into two smaller equations: the equivalent pose residual model and the equivalent landmark residual model. The equivalent landmark residual model can be further split into a collection of small equivalent residual models due to its own sparse structure. Finally, the EKF update is implemented with the derived equivalent residual models to estimate the poses and corresponding landmarks jointly. As shown in Fig. 1, the resulting framework outperforms SOTA methods in latency, computational complexity and accuracy. Our main contributions are summarized as follows:

因此，迫切需要开发一个框架，将它们的高精度和效率结合起来。如上所述，传统的未简化的残差模型能够达到高准确性。尽管如此，当将地标和姿态同时纳入状态向量进行联合估计时，EKF-SLAM的效率显著降低[22]。受到基于优化方法的Schur补的概念启发，我们充分利用了由姿态和地标构建的高维残差模型固有的稀疏结构，在EKF中实现高效率。因此，我们提出了一个基于EKF的VINS框架，该框架既具有高效率，又具有高准确性。在框架中，基于传统残差模型，导出了包含梯度、Hessian矩阵以及相应的观测协方差等价残差模型。考虑到Hessian矩阵的特殊稀疏结构，使用Schur补将等价残差方程分解为两个较小的方程：等价姿态残差模型和等价地标残差模型。由于等价地标残差模型自身的稀疏结构，它可以进一步分解为一系列小的等价残差模型。最终，使用导出的等价残差模型进行EKF更新，以联合估计姿态和相应的地标。如图1所示，该框架在延迟、计算复杂度和准确性方面均优于现有最先进的方法。我们的主要贡献概括如下：

An equivalent residual model is proposed to deal with hyper high-dimension observations, which consists of gradient, Hessian and the corresponding observation covariance. This method is of great generality in EKF systems.
提出了一个等价残差模型来处理超高维观测值，该模型包括梯度、Hessian矩阵以及相应的观测协方差。这种方法在EKF系统中具有很高的通用性。
A lightweight EKF-based landmark solver is proposed to estimate position of landmarks with high efficiency.
提出了一个基于EKF的轻量级地标求解器，用于高效估计地标的位置。
A novel EKF-based VINS framework is developed to achieve ego-motion and landmark estimation simultaneously with high accuracy and efficiency. The experimental code is published to benefit community.
开发了一种新颖的基于EKF的VINS框架，能够同时实现自运动和地标估计，具有高准确性和效率。实验代码已公开发布，以惠及社区。

2. Related Work

2. 相关工作

Improving the efficiency and accuracy is an ongoing effort for VINS algorithms. To date, significant research has been carried out to reduce the computational complexity and improve the precision.

提高效率和精度是VINS算法持续努力的方向。迄今为止，已经进行了大量研究以降低计算复杂性和提高精度。

Many VINS algorithms focus on efficiency improvement. Some studies reuse the intermediate results of previous optimization to decrease the amount of repetitive computation [14-16, 21]. While these approaches may yield a slight loss in accuracy, the computational process can be notably accelerated. Some other studies try to achieve high efficiency through engineering technologies. In [23, 36], efficient Hessian construction and Schur complement calculation is employed to improve cache efficiency and avoid redundant matrix representation. In $\left\lbrack {6,{35}}\right\rbrack$ ,variables are declared by single precision instead of traditional double precision to speed up the algorithm.

许多VINS算法专注于效率的提升。一些研究重用前一次优化的中间结果来减少重复计算量 [14-16, 21]。虽然这些方法可能会导致精度略有损失，但计算过程可以显著加快。还有一些研究尝试通过工程技术实现高效率。在 [23, 36] 中，采用了高效的海森矩阵构建和舒尔补运算来提高缓存效率并避免冗余矩阵表示。在 $\left\lbrack {6,{35}}\right\rbrack$ 中，变量使用单精度声明，而不是传统的双精度，以加快算法速度。

Besides efficiency, some studies concentrate on improving accuracy. In $\left\lbrack {{12},{13},{20}}\right\rbrack$, high accuracy is guaranteed by improving the consistency of EKF-based VINS. Hybrid MSCKF [10, 18], an improved MSCKF that combines MSCKF and EKF-SLAM, has recently been proposed to balance efficiency and accuracy; it selectively models informative landmarks as part of the state variables for joint estimation [19]. Some researchers run local bundle adjustment (LBA) on other threads to reduce drift $\left\lbrack {4,9}\right\rbrack$. However, LBA requires massive computational resources, which might not be practical on small devices.

除了效率之外，一些研究集中在提高精度上。在 $\left\lbrack {{12},{13},{20}}\right\rbrack$ 中，通过提高基于EKF的VINS的一致性来保证高精度。一些改进的MSCKF，即混合MSCKF [10, 18]（结合MSCKF和EKF-SLAM），最近提出以平衡效率和精度，选择性地将信息丰富的地标作为状态变量的一部分进行联合估计 [19]。一些研究人员构建了在其它线程上运行的局部束调整（LBA）以减少漂移 $\left\lbrack {4,9}\right\rbrack$。然而，LBA需要大量的计算资源，这可能不适合在小型设备上实施。

3. SchurVINS Framework

3. SchurVINS框架

In this paper, the proposed SchurVINS is developed based on open-source SVO2.0 [8, 9] with stereo configuration, in which sliding window based EKF back-end is employed to replace the original back-end in SVO2.0, and EKF-based landmark solver is utilized to replace the original landmark optimizer. The framework of SchurVINS algorithm and the relationship between SVO and SchurVINS are shown in Fig. 2.

在本文中，提出的SchurVINS是基于开源的SVO2.0 [8, 9] 并具有立体配置开发的，其中滑动窗口基于EKF后端被用来替换SVO2.0中的原始后端，并且使用基于EKF的地标求解器替换原始的地标优化器。SchurVINS算法的框架以及SVO与SchurVINS之间的关系如图2所示。

3.1. State Definition

3.1. 状态定义

Normally, for a traditional EKF-based VINS system [7, 10, 20], the basic IMU state is defined as:

通常，对于一个传统的基于EKF的VINS系统 [7, 10, 20]，基本的IMU状态定义如下：

\[{\mathbf{x}}_{I} = {\left\lbrack \begin{array}{lllll} {}_{I}^{G}{\mathbf{q}}^{\top } & {}^{G}{\mathbf{p}}_{I}{}^{\top } & {}^{G}{\mathbf{v}}_{I}{}^{\top } & {\mathbf{b}}_{a}^{\top } & {\mathbf{b}}_{g}^{\top } \end{array}\right\rbrack }^{\top } \tag{1} \]

where $\{ \mathrm{G}\} ,\{ \mathrm{I}\}$ and $\{ \mathrm{C}\}$ are the global frame, local frame and camera frame, respectively. ${}^{G}{\mathbf{p}}_{I}$ and ${}^{G}{\mathbf{v}}_{I}$ are the position and velocity of the IMU expressed in $\{ \mathrm{G}\}$, respectively. ${}_{I}^{G}\mathbf{q}$ represents the rotation quaternion from $\{ \mathrm{I}\}$ to $\{ \mathrm{G}\}$ (in this paper, quaternions follow the Hamilton convention [29]). The vectors ${\mathbf{b}}_{a}$ and ${\mathbf{b}}_{g}$ represent the biases of the linear acceleration and angular velocity measured by the IMU, respectively. The corresponding EKF error-state of ${\mathbf{x}}_{I}$ is defined as Eq. (2):

其中 $\{ \mathrm{G}\} ,\{ \mathrm{I}\}$ 和 $\{ \mathrm{C}\}$ 分别是全局坐标系、局部坐标系和相机坐标系。${}^{G}{\mathbf{p}}_{I}$ 和 ${}^{G}{\mathbf{v}}_{I}$ 分别表示在 $\{ \mathrm{G}\}$ 中表示的IMU的位置和速度。${}_{I}^{G}\mathbf{q}$ 表示从 $\{ \mathrm{I}\}$ 到 $\{ \mathrm{G}\}$ 的旋转四元数（在本文中，四元数遵循Hamilton规则 [29]）。向量 ${\mathbf{b}}_{a}$ 和 ${\mathbf{b}}_{g}$ 分别代表IMU设备测量的线性加速度和角速度的偏差。因此，${\mathbf{x}}_{I}$ 的相应EKF误差状态定义为式（2）。

\[{\widetilde{\mathbf{x}}}_{I} = {\left\lbrack \begin{array}{lllll} {}_{I}^{G}{\widetilde{\mathbf{\theta }}}^{\top } & {}^{G}{\widetilde{\mathbf{p}}}_{I}{}^{\top } & {}^{G}{\widetilde{\mathbf{v}}}_{I}{}^{\top } & {\widetilde{\mathbf{b}}}_{a}{}^{\top } & {\widetilde{\mathbf{b}}}_{g}{}^{\top } \end{array}\right\rbrack }^{\top } \tag{2} \]

where, ${}_{I}^{G}\widetilde{\mathbf{\theta }}$ represents the error-state of ${}_{I}^{G}\mathbf{q}$ . Except for quaternion, other states can be used with standard additive error (e.g. $\mathbf{x} = \widehat{\mathbf{x}} + \widetilde{\mathbf{x}}$ ). Similar to [29],the extended additive error of quaternion is defined as Eq. (3) (in this paper, quaternion error is defined in frame $\{ G\}$ )

其中，${}_{I}^{G}\widetilde{\mathbf{\theta }}$ 表示 ${}_{I}^{G}\mathbf{q}$ 的误差状态。除四元数外，其他状态可以使用标准的加性误差（例如 $\mathbf{x} = \widehat{\mathbf{x}} + \widetilde{\mathbf{x}}$ ）。与 [29] 类似，四元数的扩展加性误差定义为式（3）（本文中，四元数误差定义在 $\{ G\}$ 坐标系中）。

\[{}_{I}^{G}\mathbf{q} = \delta {}_{I}^{G}\mathbf{q} \otimes {}_{I}^{G}\widehat{\mathbf{q}},\;\delta {}_{I}^{G}\mathbf{q} = {\left\lbrack \begin{array}{ll} 1 & \frac{1}{2}{}_{I}^{G}{\widetilde{\mathbf{\theta }}}^{\top } \end{array}\right\rbrack }^{\top } \tag{3} \]

Similarly, the extended additive error of rotation matrix is defined as Eq. (4)

同样，旋转矩阵的扩展加性误差定义为式（4）。

\[\mathbf{R}\left( {{}_{I}^{G}\mathbf{q}}\right) = {}_{I}^{G}\mathbf{R},\;{}_{I}^{G}\mathbf{R} = \left( {\mathbf{I} + {\left\lbrack {}_{I}^{G}\widetilde{\mathbf{\theta }}\right\rbrack }_{ \times }}\right) {}_{I}^{G}\widehat{\mathbf{R}} \tag{4} \]
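The first-order rotation error model of Eq. (4) can be checked numerically. The following is a minimal NumPy sketch, not code from SchurVINS: the helper names `skew` and `exp_so3` and the numerical values are illustrative assumptions. It compares the exact left-multiplied (global-frame) perturbation with the linearized form $\left( \mathbf{I} + {\left\lbrack \widetilde{\mathbf{\theta }}\right\rbrack }_{\times }\right) \widehat{\mathbf{R}}$.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_so3(theta):
    """Exact rotation matrix of a rotation vector (Rodrigues' formula)."""
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.eye(3) + skew(theta)
    K = skew(theta / angle)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

# Hypothetical estimate and a small error angle expressed in {G}.
R_hat = exp_so3(np.array([0.3, -0.2, 0.5]))
theta_err = np.array([1e-3, -2e-3, 1.5e-3])

R_true = exp_so3(theta_err) @ R_hat            # exact global perturbation
R_lin = (np.eye(3) + skew(theta_err)) @ R_hat  # first-order model of Eq. (4)
approx_error = np.abs(R_true - R_lin).max()    # second order in |theta_err|
```

The residual `approx_error` scales with the square of the error angle, which is why the additive-style error of Eq. (4) is adequate for the small corrections produced by each filter update.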

Figure 2. Framework of SchurVINS,which shows the relationship between SVO and SchurVINS. ${P}_{1}$ to ${P}_{m}$ represent the valid landmarks of the surrounding environment which are employed to construct residual model.

图2. SchurVINS框架，展示了SVO与SchurVINS之间的关系。${P}_{1}$ 到 ${P}_{m}$ 表示周围环境中的有效地标，用于构建残差模型。

3.2. Propagation and Augmentation

3.2. 传播与增强

SchurVINS follows the policy introduced in [29] for state propagation. The time evolution of the IMU state is described as

SchurVINS遵循[29]中引入的状态传播策略。IMU状态的时间演化描述为

\[{}_{I}^{G}\dot{\widehat{\mathbf{q}}} = \frac{1}{2}{}_{I}^{G}\widehat{\mathbf{q}} \otimes \Omega \left( \widehat{\mathbf{\omega }}\right) ,\;\Omega \left( \widehat{\mathbf{\omega }}\right) = \left( \begin{matrix} 0 & - {\widehat{\mathbf{\omega }}}^{\top } \\ \widehat{\mathbf{\omega }} & - {\left\lbrack \widehat{\mathbf{\omega }}\right\rbrack }_{ \times } \end{matrix}\right) \tag{5} \]

\[{\dot{\widehat{\mathbf{b}}}}_{g} = {\mathbf{0}}_{3 \times 1},\;{\dot{\widehat{\mathbf{b}}}}_{a} = {\mathbf{0}}_{3 \times 1} \tag{6} \]

\[{}^{G}{\dot{\widehat{\mathbf{p}}}}_{I} = {}^{G}{\widehat{\mathbf{v}}}_{I},\;{}^{G}{\dot{\widehat{\mathbf{v}}}}_{I} = {}_{I}^{G}\widehat{\mathbf{R}}\widehat{\mathbf{a}} + {}^{G}\mathbf{g} \tag{7} \]

where $\widehat{\mathbf{\omega }} = {\mathbf{\omega }}_{m} - {\widehat{\mathbf{b}}}_{g}$ and $\widehat{\mathbf{a}} = {\mathbf{a}}_{m} - {\widehat{\mathbf{b}}}_{a}$ are the IMU measurements with the estimated biases removed, and ${\left\lbrack \widehat{\mathbf{\omega }}\right\rbrack }_{ \times }$ is the skew-symmetric matrix of $\widehat{\mathbf{\omega }}$. Based on Eqs. (5) to (7), the linearized continuous dynamics for the error IMU state is defined as

其中 $\widehat{\mathbf{\omega }} = {\mathbf{\omega }}_{m} - {\widehat{\mathbf{b}}}_{g}$ 和 $\widehat{\mathbf{a}} = {\mathbf{a}}_{m} - {\widehat{\mathbf{b}}}_{a}$ 是去除了偏差的IMU测量值。其中 ${\left\lbrack \widehat{\omega }\right\rbrack }_{ \times }$ 是 $\widehat{\omega }$ 的反对称矩阵。基于式（5）至式（7），为误差IMU状态定义了线性化连续动力学。

\[{\dot{\widetilde{\mathbf{x}}}}_{I} = \mathbf{F}{\widetilde{\mathbf{x}}}_{I} + \mathbf{G}{\mathbf{n}}_{I} \tag{8} \]

where ${\mathbf{n}}_{I} = {\left\lbrack \begin{array}{llll} {\mathbf{n}}_{a}{}^{\top } & {\mathbf{n}}_{a\omega }{}^{\top } & {\mathbf{n}}_{g}{}^{\top } & {\mathbf{n}}_{g\omega }{}^{\top } \end{array}\right\rbrack }^{\top }$ . Vectors ${\mathbf{n}}_{a}$ and ${\mathbf{n}}_{g}$ represent the Gaussian noise of the accelerometer and gyroscope measurement,while ${\mathbf{n}}_{a\omega }$ and ${\mathbf{n}}_{g\omega }$ are the random walk rate of the accelerometer and gyroscope measurement biases. $\mathbf{F}$ and $\mathbf{G}$ are defined as

其中 ${\mathbf{n}}_{I} = {\left\lbrack \begin{array}{llll} {\mathbf{n}}_{a}{}^{\top } & {\mathbf{n}}_{a\omega }{}^{\top } & {\mathbf{n}}_{g}{}^{\top } & {\mathbf{n}}_{g\omega }{}^{\top } \end{array}\right\rbrack }^{\top }$ 。向量 ${\mathbf{n}}_{a}$ 和 ${\mathbf{n}}_{g}$ 分别代表加速度计和陀螺仪测量的高斯噪声，而 ${\mathbf{n}}_{a\omega }$ 和 ${\mathbf{n}}_{g\omega }$ 是加速度计和陀螺仪测量偏差的随机游走率。$\mathbf{F}$ 和 $\mathbf{G}$ 定义为

\[\mathbf{F} = \left\lbrack \begin{matrix} {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & - {}_{I}^{G}\mathbf{R} \\ {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{I}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} \\ - {\left\lbrack {}_{I}^{G}\mathbf{R}\widehat{\mathbf{a}}\right\rbrack }_{ \times } & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & - {}_{I}^{G}\mathbf{R} & {\mathbf{0}}_{3 \times 3} \\ {\mathbf{0}}_{6 \times 3} & {\mathbf{0}}_{6 \times 3} & {\mathbf{0}}_{6 \times 3} & {\mathbf{0}}_{6 \times 3} & {\mathbf{0}}_{6 \times 3} \end{matrix}\right\rbrack \tag{9} \]

\[\mathbf{G} = \left\lbrack \begin{matrix} {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & - {}_{I}^{G}{\mathbf{R}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} \\ {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} \\ - {}_{I}^{G}{\mathbf{R}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} \\ {\mathbf{0}}_{3 \times 3} & {\mathbf{I}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} \\ {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{I}}_{3 \times 3} \end{matrix}\right\rbrack \tag{10} \]

The ${4}^{\text{th}}$-order Runge-Kutta numerical integration method is applied to Eqs. (5) to (7) to propagate the estimated IMU state. Based on Eq. (8), the discrete time state transition matrix $\mathbf{\Phi }$ and discrete time noise covariance $\mathbf{Q}$ are formulated as follows:

在方程（5）至（7）中采用四阶龙格-库塔数值积分方法来传播估计的IMU状态。基于方程（8），离散时间状态转移矩阵 $\mathbf{\Phi }$ 和离散时间噪声协方差 $\mathbf{Q}$ 被构建如下：

\[\mathbf{\Phi } = {\mathbf{I}}_{{15} \times {15}} + \mathbf{F}{dt} + \frac{1}{2}{\mathbf{F}}^{2}d{t}^{2} + \frac{1}{6}{\mathbf{F}}^{3}d{t}^{3} \tag{11} \]

\[\mathbf{Q} = \mathbf{\Phi }\mathbf{G}{\mathbf{Q}}_{I}{\mathbf{G}}^{\top }{\mathbf{\Phi }}^{\top }{dt} \]

where ${\mathbf{Q}}_{I} = E\left\lbrack {{\mathbf{n}}_{I}{\mathbf{n}}_{I}{}^{\top }}\right\rbrack$ is the continuous time noise covariance matrix of the system. Hence, the formulations of covariance propagation are built as:

其中 ${\mathbf{Q}}_{I} = E\left\lbrack {{\mathbf{n}}_{I}{\mathbf{n}}_{I}{}^{\top }}\right\rbrack$ 是系统的连续时间噪声协方差矩阵。因此，协方差传播的公式构建为：

\[{\mathbf{P}}_{II} \leftarrow \mathbf{\Phi }{\mathbf{P}}_{II}{\mathbf{\Phi }}^{\top } + \mathbf{Q},\;{\mathbf{P}}_{IA} \leftarrow \mathbf{\Phi }{\mathbf{P}}_{IA} \tag{12} \]

The covariance $\mathbf{P}$ is partitioned as in Eq. (13). ${\mathbf{P}}_{II}$ is the covariance of the basic state. ${\mathbf{P}}_{IA}$ and ${\mathbf{P}}_{AI} = {\mathbf{P}}_{IA}^{\top }$ are the cross-covariances between the basic state and the augmented state. ${\mathbf{P}}_{AA}$ is the covariance of the augmented state.

协方差 $\mathbf{P}$ 被划分为方程（13）。${\mathbf{P}}_{II}$ 是基本状态的协方差。${\mathbf{P}}_{IA}$ 和 ${\mathbf{P}}_{AI}$ 是基本状态与增广状态之间的协方差。${\mathbf{P}}_{AA}$ 是增广状态的协方差。

\[\mathbf{P} = \left\lbrack \begin{array}{ll} {\mathbf{P}}_{II} & {\mathbf{P}}_{IA} \\ {\mathbf{P}}_{IA}^{\top } & {\mathbf{P}}_{AA} \end{array}\right\rbrack \tag{13} \]
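The discretization and covariance propagation of Eqs. (9) to (13) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the SchurVINS implementation: the error-state order follows Eq. (2), the noise order follows ${\mathbf{n}}_{I}$, and the function names (`build_F_G`, `discretize`, `propagate_covariance`) as well as all numerical values are hypothetical.

```python
import numpy as np

def skew(v):
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def build_F_G(R, a_hat):
    """Continuous-time F (15x15) and G (15x12) of Eqs. (9) and (10).
    Error-state order: theta, p, v, b_a, b_g. Noise order: n_a, n_aw, n_g, n_gw."""
    F = np.zeros((15, 15))
    F[0:3, 12:15] = -R                  # theta <- gyroscope bias
    F[3:6, 6:9] = np.eye(3)             # p <- v
    F[6:9, 0:3] = -skew(R @ a_hat)      # v <- theta
    F[6:9, 9:12] = -R                   # v <- accelerometer bias
    G = np.zeros((15, 12))
    G[0:3, 6:9] = -R                    # theta <- n_g
    G[6:9, 0:3] = -R                    # v <- n_a
    G[9:12, 3:6] = np.eye(3)            # b_a <- n_aw
    G[12:15, 9:12] = np.eye(3)          # b_g <- n_gw
    return F, G

def discretize(F, G, Q_I, dt):
    """Third-order expansion of Phi (Eq. (11)) and the discrete noise covariance Q."""
    Fdt = F * dt
    Phi = np.eye(15) + Fdt + Fdt @ Fdt / 2.0 + Fdt @ Fdt @ Fdt / 6.0
    Q = Phi @ G @ Q_I @ G.T @ Phi.T * dt
    return Phi, Q

def propagate_covariance(P, Phi, Q):
    """Eq. (12): update P_II and P_IA, leave P_AA untouched."""
    P = P.copy()
    P_IA_old = P[:15, 15:].copy()
    P[:15, :15] = Phi @ P[:15, :15] @ Phi.T + Q
    P[:15, 15:] = Phi @ P_IA_old
    P[15:, :15] = P[:15, 15:].T
    return P

# Hypothetical example: level IMU, two augmented poses (15 + 6 * 2 = 27 error states).
F, G = build_F_G(np.eye(3), np.array([0.0, 0.0, 9.81]))
Q_I = np.diag(np.repeat([1e-4, 1e-8, 1e-6, 1e-10], 3))
Phi, Q = discretize(F, G, Q_I, dt=0.005)
A = np.random.default_rng(0).standard_normal((27, 27))
P = 1e-3 * A @ A.T
P_new = propagate_covariance(P, Phi, Q)
```

Only the first 15 rows and columns are touched during propagation, so the cost of this step does not grow quadratically with the number of augmented poses.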

When a new image arrives, the current IMU pose ${\mathbf{x}}_{Ai} = {\left\lbrack \begin{array}{ll} {}_{I}^{G}{\mathbf{q}}^{\top } & {}^{G}{\mathbf{p}}_{I}{}^{\top } \end{array}\right\rbrack }^{\top }$ is augmented into the state together with its covariance. The augmentation formulations are:

当新的图像到达时，当前的IMU姿态 ${\mathbf{x}}_{Ai} = {\left\lbrack \begin{array}{ll} {}_{I}^{G}{\mathbf{q}}^{\top } & {}^{G}{\mathbf{p}}_{I}{}^{\top } \end{array}\right\rbrack }^{\top }$ 以及其协方差都会被增广。增广公式为：

\[\mathbf{X} = {\left\lbrack \begin{array}{lllll} {\mathbf{x}}_{I}{}^{\top } & {\mathbf{x}}_{A0}{}^{\top } & {\mathbf{x}}_{A1}{}^{\top } & \cdots & {\mathbf{x}}_{Ai}{}^{\top } \end{array}\right\rbrack }^{\top } \]

\[\mathbf{P} \leftarrow \left\lbrack \begin{matrix} \mathbf{P} & {\mathbf{P}}_{21}{}^{\top } \\ {\mathbf{P}}_{21} & {\mathbf{P}}_{22} \end{matrix}\right\rbrack \tag{14} \]

where ${\mathbf{P}}_{21} = {\mathbf{J}}_{a}\mathbf{P}$ and ${\mathbf{P}}_{22} = {\mathbf{J}}_{a}\mathbf{P}{\mathbf{J}}_{a}{}^{\top }$. ${\mathbf{J}}_{a}$ is the Jacobian of ${\widetilde{\mathbf{x}}}_{Ai}$ with respect to the error states, which is defined as follows:

其中 ${\mathbf{P}}_{21} = {\mathbf{J}}_{a}\mathbf{P},{\mathbf{P}}_{22} = {\mathbf{J}}_{a}\mathbf{P}{\mathbf{J}}_{a}{}^{\top }$ 。并且 ${\mathbf{J}}_{a}$ 是 ${\widetilde{\mathbf{x}}}_{Ai}$ 关于误差状态的雅可比矩阵，定义如下：

\[{\mathbf{J}}_{a} = \left\lbrack \begin{array}{lll} {\mathbf{I}}_{3 \times 3} & {\mathbf{0}}_{3 \times 3} & {\mathbf{0}}_{3 \times \left( {9 + {6N}}\right) } \\ {\mathbf{0}}_{3 \times 3} & {\mathbf{I}}_{3 \times 3} & {\mathbf{0}}_{3 \times \left( {9 + {6N}}\right) } \end{array}\right\rbrack \tag{15} \]

Figure 3. A schematic of our system for ten landmarks and a sliding window of size three is shown in (a), and the Hessian or covariance of different methods is shown in (b)-(d). (b) shows our algorithm, in which the covariance of every single landmark is independent from the entire covariance of the poses in the sliding window. (c) shows the Hessian of both landmarks and poses in the sliding window. (d) shows the traditional hybrid MSCKF, with the covariance of both the selected landmarks and the poses in the sliding window.

图3。我们系统的示意图，展示了十个地标和图（a）中的大小为三的滑动窗口，以及不同方法在（b）-（d）中的海森矩阵或协方差。（b）展示了我们的算法，其中每个单独地标的协方差独立于滑动窗口中姿态的整体协方差。（c）展示了滑动窗口中地标和姿态的海森矩阵。（d）展示了传统的混合MSCKF，其中展示了选择的滑动窗口内地标和姿态的协方差。
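The covariance augmentation of Eqs. (14) and (15) can be sketched as below. This is a minimal NumPy illustration, assuming the error-state order of Eq. (2) so that the orientation and position errors occupy the first six entries; the function name `augment_covariance` and the test covariance are hypothetical.

```python
import numpy as np

def augment_covariance(P):
    """Append the current IMU pose to the covariance, following Eqs. (14) and (15).
    P has size (15 + 6N) x (15 + 6N), with N poses already in the sliding window."""
    n = P.shape[0]
    J_a = np.zeros((6, n))
    J_a[0:3, 0:3] = np.eye(3)   # orientation error of the new pose
    J_a[3:6, 3:6] = np.eye(3)   # position error of the new pose
    P21 = J_a @ P
    P22 = J_a @ P @ J_a.T
    return np.block([[P, P21.T],
                     [P21, P22]])

# Hypothetical example with N = 2 poses already in the window.
A = np.random.default_rng(1).standard_normal((27, 27))
P = 1e-3 * A @ A.T
P_aug = augment_covariance(P)
```

Because ${\mathbf{J}}_{a}$ only selects rows, the new blocks are copies of the first six rows and columns of $\mathbf{P}$; an implementation could copy them directly instead of forming the matrix products.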

3.3. Schur Complement-Based State Update

3.3. 基于Schur补的状态更新

In the SchurVINS scheme, unlike MSCKF methods [10, 30], the EKF update is conducted based on all the successfully triangulated landmarks and their observations in the sliding window, which eliminates as much as possible the drift caused by state propagation in every single image timestamp interval. For a single observation, the reprojection error ${\mathbf{r}}_{i,j}$ of the camera measurement is formulated as:

在SchurVINS方案中，与MSCKF方法[10, 30]不同，EKF更新是基于滑动窗口中所有成功三角化的地标及其观测进行的，这可以尽可能多地消除由每个单独图像时间戳间隔中的状态传播引起的漂移。对于单个观测，相机测量的重投影误差${\mathbf{r}}_{i,j}$被公式化为：

\[{\mathbf{r}}_{i,j} = {\mathbf{z}}_{i,j} - {\widehat{\mathbf{z}}}_{i,j} \]

\[{\mathbf{r}}_{i,j} = {\mathbf{J}}_{x,i,j}\widetilde{\mathbf{X}} + {\mathbf{J}}_{f,i,j}{}^{G}{\widetilde{\mathbf{p}}}_{{f}_{j}} + {\mathbf{n}}_{i,j} \tag{16} \]

\[{\widehat{\mathbf{z}}}_{i,j} = \frac{1}{{}^{{C}_{i}}{\widehat{Z}}_{j}}\left\lbrack \begin{matrix} {}^{{C}_{i}}{\widehat{X}}_{j} \\ {}^{{C}_{i}}{\widehat{Y}}_{j} \end{matrix}\right\rbrack \]

where ${\mathbf{r}}_{i,j}$ and ${\mathbf{z}}_{i,j}$ are the reprojection error and the camera measurement of the ${j}^{th}$ landmark at the ${i}^{th}$ pose in the sliding window, respectively, and ${\widehat{\mathbf{z}}}_{i,j}$ is the corresponding predicted measurement computed from the estimated states. ${}^{{C}_{i}}{\widehat{\mathbf{p}}}_{j} = {\left\lbrack \begin{array}{lll} {}^{{C}_{i}}{\widehat{X}}_{j} & {}^{{C}_{i}}{\widehat{Y}}_{j} & {}^{{C}_{i}}{\widehat{Z}}_{j} \end{array}\right\rbrack }^{\top }$ is the estimated landmark coordinate in the camera frame of the ${i}^{th}$ pose in the sliding window. ${\mathbf{n}}_{i,j}$ represents the corresponding measurement noise. $\widetilde{\mathbf{X}}$ and ${}^{G}{\widetilde{\mathbf{p}}}_{{f}_{j}}$ are the state perturbation and the landmark position perturbation, respectively. ${\mathbf{J}}_{x,i,j}$ and ${\mathbf{J}}_{f,i,j}$ are the Jacobians of the residual with respect to the system state and the landmark position, respectively. The Jacobians are defined as follows:

其中${\mathbf{r}}_{i,j}$和${\mathbf{z}}_{i,j}$分别是滑动窗口中${j}^{th}$地标在${i}^{th}$姿态下的重投影误差和相机测量值，${\widehat{\mathbf{z}}}_{i,j}$是由估计状态得出的相应预测测量值。${}^{{C}_{i}}{\widehat{\mathbf{p}}}_{j} = {\left\lbrack \begin{array}{lll} {}^{{C}_{i}}{\widehat{X}}_{j} & {}^{{C}_{i}}{\widehat{Y}}_{j} & {}^{{C}_{i}}{\widehat{Z}}_{j} \end{array}\right\rbrack }^{\top }$是滑动窗口中${i}^{th}$姿态的相机坐标系下的地标坐标估计。${\mathbf{n}}_{i,j}$表示相应的测量噪声。$\widetilde{\mathbf{X}}$和${}^{G}{\widetilde{\mathbf{p}}}_{{f}_{j}}$分别是状态扰动和地标位置扰动。${\mathbf{J}}_{x,i,j}$和${\mathbf{J}}_{f,i,j}$分别是残差关于系统状态和地标位置的雅可比矩阵。雅可比矩阵定义如下：

\[{\mathbf{J}}_{x,i,j} = \left\lbrack \begin{array}{lll} {\mathbf{0}}_{2 \times \left( {{15} + {6i}}\right) } & {\mathbf{J}}_{A} & {\mathbf{0}}_{2 \times 6\left( {N - i - 1}\right) } \end{array}\right\rbrack \]

\[{\mathbf{J}}_{A} = {\mathbf{J}}_{i,j}\left\lbrack {{}_{C}^{I}{\widehat{\mathbf{R}}}^{\top }{\left\lbrack {}^{{I}_{i}}{\widehat{\mathbf{p}}}_{{f}_{j}}\right\rbrack }_{ \times }{}_{{I}_{i}}^{G}{\widehat{\mathbf{R}}}^{\top }\; - {}_{{C}_{i}}^{G}{\widehat{\mathbf{R}}}^{\top }}\right\rbrack \tag{17} \]

\[{\mathbf{J}}_{f,i,j} = \left\lbrack {{\mathbf{J}}_{i,j}{}_{{C}_{i}}^{G}{\widehat{\mathbf{R}}}^{\top }}\right\rbrack \]

where, for convenience, we define the camera model using the pinhole model. Therefore, ${\mathbf{J}}_{i,j}$ is defined as:

其中，为了方便起见，我们使用针孔模型定义相机模型。因此，${\mathbf{J}}_{i,j}$被定义为：

\[{\mathbf{J}}_{i,j} = \frac{1}{{}^{{C}_{i}}{\widehat{Z}}_{j}^{2}}\left\lbrack \begin{matrix} {}^{{C}_{i}}{\widehat{Z}}_{j} & 0 & - {}^{{C}_{i}}{\widehat{X}}_{j} \\ 0 & {}^{{C}_{i}}{\widehat{Z}}_{j} & - {}^{{C}_{i}}{\widehat{Y}}_{j} \end{matrix}\right\rbrack \tag{18} \]
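The projection Jacobian of Eq. (18) can be verified against finite differences. The snippet below is a minimal NumPy sketch for the normalized pinhole projection; the function names `project` and `projection_jacobian` and the landmark coordinates are illustrative assumptions.

```python
import numpy as np

def project(p_c):
    """Normalized pinhole projection of a point given in the camera frame."""
    X, Y, Z = p_c
    return np.array([X / Z, Y / Z])

def projection_jacobian(p_c):
    """Analytic 2x3 Jacobian of project() with respect to p_c, as in Eq. (18)."""
    X, Y, Z = p_c
    return np.array([[Z, 0.0, -X],
                     [0.0, Z, -Y]]) / Z**2

# Hypothetical landmark 2.5 m in front of the camera.
p_c = np.array([0.4, -0.3, 2.5])
J = projection_jacobian(p_c)

# Central finite differences, one column per coordinate axis.
eps = 1e-6
J_num = np.column_stack([
    (project(p_c + eps * e) - project(p_c - eps * e)) / (2.0 * eps)
    for e in np.eye(3)
])
```

The same check extends to ${\mathbf{J}}_{A}$ and ${\mathbf{J}}_{f,i,j}$ of Eq. (17) by chaining this Jacobian with the derivative of the camera-frame point with respect to the pose and landmark perturbations.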

Aiming at all the observations of landmarks in the sliding window, we can acquire the full residual model by stacking all the residual equations:

针对滑动窗口中所有地标观测，我们可以通过堆叠所有残差方程来获得完整的残差模型：

\[\mathbf{r} = \left\lbrack \begin{array}{ll} {\mathbf{J}}_{x} & {\mathbf{J}}_{f} \end{array}\right\rbrack \left\lbrack \begin{matrix} \widetilde{\mathbf{X}} \\ {}^{G}{\widetilde{\mathbf{p}}}_{f} \end{matrix}\right\rbrack + \mathbf{n} \tag{19} \]

where $\mathbf{r}$ and $\left\lbrack \begin{array}{ll} {\mathbf{J}}_{x} & {\mathbf{J}}_{f} \end{array}\right\rbrack$ are the stacked residual and the stacked Jacobian, respectively. ${\mathbf{J}}_{x}$ and ${\mathbf{J}}_{f}$ are the Jacobians with respect to the states and the landmark positions, respectively. $\mathbf{n}$ is the stacked measurement noise, and its measurement covariance is $\mathbf{R} = \operatorname{diag}\left( {{u}^{2},{u}^{2},\cdots ,{u}^{2}}\right)$, where $u$ is the standard deviation of each element of $\mathbf{n}$.

其中，$\mathbf{r}$和$\left\lbrack \begin{array}{ll} {\mathbf{J}}_{x} & {\mathbf{J}}_{f} \end{array}\right\rbrack$分别是堆叠的残差和堆叠的雅可比矩阵。${\mathbf{J}}_{x}$和${\mathbf{J}}_{f}$分别是关于状态和地标位置的雅可比矩阵。$\mathbf{n}$是堆叠的测量噪声，$\mathbf{n}$的测量协方差是$\mathbf{R} = \operatorname{diag}\left( {{u}^{2},{u}^{2},\cdots ,{u}^{2}}\right)$，其中$u$是$\mathbf{n}$的标准偏差元素。

Unlike $\left\lbrack {7,{10},{30}}\right\rbrack$, in this paper the residual model of Eq. (19) is projected into the Jacobian space ${\left\lbrack \begin{array}{ll} {\mathbf{J}}_{x} & {\mathbf{J}}_{f} \end{array}\right\rbrack }^{\top }$ to formulate the equivalent residual equations, which consist of the gradient, the Hessian and the observation covariance shown in Eqs. (20) and (21) below. It is worth highlighting that this strategy is an alternative to the QR decomposition strategy [22] for speeding up any EKF system with high-dimensional measurements.

与 $\left\lbrack {7,{10},{30}}\right\rbrack$ 不同，在本文中，残差模型式（19）被投影到雅可比空间 ${\left\lbrack \begin{array}{ll} {\mathbf{J}}_{x} & {\mathbf{J}}_{f} \end{array}\right\rbrack }^{\top }$ 中，以构建等价残差方程，该方程包含梯度、海森矩阵和观测协方差，如下面的式（20）和（21）所示。值得强调的是，此策略是加速任何具有高维测量的EKF系统的QR分解策略 [22] 的替代方法。

\[\left\lbrack \begin{matrix} {\mathbf{J}}_{x}{}^{\top } \\ {\mathbf{J}}_{f}{}^{\top } \end{matrix}\right\rbrack \mathbf{r} = \left\lbrack \begin{matrix} {\mathbf{J}}_{x}{}^{\top } \\ {\mathbf{J}}_{f}{}^{\top } \end{matrix}\right\rbrack \left\lbrack \begin{array}{ll} {\mathbf{J}}_{x} & {\mathbf{J}}_{f} \end{array}\right\rbrack \left\lbrack \begin{matrix} \widetilde{\mathbf{X}} \\ {}^{G}{\widetilde{\mathbf{p}}}_{f} \end{matrix}\right\rbrack + {\mathbf{n}}^{\prime } \tag{20} \]

\[{\mathbf{R}}^{\prime } = \left\lbrack \begin{matrix} {\mathbf{J}}_{x}{}^{\top } \\ {\mathbf{J}}_{f}{}^{\top } \end{matrix}\right\rbrack \mathbf{R}\left\lbrack \begin{array}{ll} {\mathbf{J}}_{x} & {\mathbf{J}}_{f} \end{array}\right\rbrack \tag{21} \]

where ${\mathbf{n}}^{\prime }$ and ${\mathbf{R}}^{\prime }$ are the equivalent observation noise and covariance, respectively. Obviously, Eqs. (20) and (21) could be simplified as:

其中 ${\mathbf{n}}^{\prime }$ 和 ${\mathbf{R}}^{\prime }$ 分别是等价观测噪声和协方差。显然，式（20）和（21）可以简化为：

\[\underset{\left\lbrack \begin{matrix} {\mathbf{b}}_{1} \\ {\mathbf{b}}_{2} \end{matrix}\right\rbrack }{\underbrace{\left\lbrack \begin{matrix} {{\mathbf{J}}_{x}}^{\top }\mathbf{r} \\ {{\mathbf{J}}_{f}}^{\top }\mathbf{r} \end{matrix}\right\rbrack }} = \underset{\left\lbrack \begin{matrix} {\mathbf{C}}_{1} & {\mathbf{C}}_{2} \\ {{\mathbf{C}}_{2}}^{\top } & {\mathbf{C}}_{3} \end{matrix}\right\rbrack }{\underbrace{\left\lbrack \begin{matrix} {{\mathbf{J}}_{x}}^{\top }{\mathbf{J}}_{x} & {{\mathbf{J}}_{x}}^{\top }{\mathbf{J}}_{f} \\ {{\mathbf{J}}_{f}}^{\top }{\mathbf{J}}_{x} & {{\mathbf{J}}_{f}}^{\top }{\mathbf{J}}_{f} \end{matrix}\right\rbrack }}\left\lbrack \begin{matrix} \widetilde{\mathbf{X}} \\ {}^{G}{\widetilde{\mathbf{p}}}_{f} \end{matrix}\right\rbrack + \underset{\left\lbrack \begin{matrix} {\mathbf{n}}_{1}^{\prime } \\ {\mathbf{n}}_{2}^{\prime } \end{matrix}\right\rbrack }{\underbrace{{\mathbf{n}}^{\prime }}} \tag{22} \]

\[{\mathbf{R}}^{\prime } = \left\lbrack \begin{array}{ll} {\mathbf{J}}_{x}{}^{\top }{\mathbf{J}}_{x} & {\mathbf{J}}_{x}{}^{\top }{\mathbf{J}}_{f} \\ {\mathbf{J}}_{f}{}^{\top }{\mathbf{J}}_{x} & {\mathbf{J}}_{f}{}^{\top }{\mathbf{J}}_{f} \end{array}\right\rbrack {u}^{2} \tag{23} \]

Since ${}^{G}{\widetilde{\mathbf{p}}}_{f}$ is not included in the states in Eq. (14), it is necessary to employ the Schur complement [28] on Eqs. (20) and (21) to marginalize the implicit states. Specifically, Eqs. (22) and (23) are projected into the $\mathbf{L}$ space as Eqs. (24) and (25).

由于 ${}^{G}{\widetilde{\mathbf{p}}}_{f}$ 没有包含在式（14）的状态中，因此有必要在式（20）和（21）上应用Schur补 [28] 来边缘化隐含状态。具体而言，式（22）和（23）被投影到 $\mathbf{L}$ 空间，形成式（24）和（25）。

\[\mathbf{L}\left\lbrack \begin{array}{l} {\mathbf{b}}_{1} \\ {\mathbf{b}}_{2} \end{array}\right\rbrack = \mathbf{L}\left\lbrack \begin{matrix} {\mathbf{C}}_{1} & {\mathbf{C}}_{2} \\ {\mathbf{C}}_{2}{}^{\top } & {\mathbf{C}}_{3} \end{matrix}\right\rbrack \left\lbrack \begin{matrix} \widetilde{\mathbf{X}} \\ {}^{G}{\widetilde{\mathbf{p}}}_{f} \end{matrix}\right\rbrack + \left\lbrack \begin{matrix} {\mathbf{n}}_{1}^{\prime \prime } \\ {\mathbf{n}}_{2}^{\prime \prime } \end{matrix}\right\rbrack \tag{24} \]

\[{\mathbf{R}}^{\prime \prime } = \mathbf{L}\left\lbrack \begin{matrix} {\mathbf{C}}_{1} & {\mathbf{C}}_{2} \\ {\mathbf{C}}_{2}{}^{\top } & {\mathbf{C}}_{3} \end{matrix}\right\rbrack {\mathbf{L}}^{\top }{u}^{2} = \left\lbrack \begin{matrix} {\mathbf{R}}_{1}^{\prime \prime } & \mathbf{0} \\ \mathbf{0} & {\mathbf{R}}_{2}^{\prime \prime } \end{matrix}\right\rbrack \tag{25} \]

Figure 4. The experimental trajectory and point cloud of SchurVINS on TUM-VI and EuRoC datasets.

图4. SchurVINS在TUM-VI和EuRoC数据集上的实验轨迹和点云。

where ${\left\lbrack \begin{array}{ll} {\mathbf{n}}_{1}^{\prime \prime \mathrm{T}} & {\mathbf{n}}_{2}^{\prime \prime \mathrm{T}} \end{array}\right\rbrack }^{\mathrm{T}}$ and ${\mathbf{R}}^{\prime \prime }$ are the derived observation noise and covariance. And $\mathbf{L}$ is defined as:

其中 ${\left\lbrack \begin{array}{ll} {\mathbf{n}}_{1}^{\prime \prime \mathrm{T}} & {\mathbf{n}}_{2}^{\prime \prime \mathrm{T}} \end{array}\right\rbrack }^{\mathrm{T}}$ 和 ${\mathbf{R}}^{\prime \prime }$ 是推导出的观测噪声和协方差。而 $\mathbf{L}$ 定义为：

\[\mathbf{L} = \left\lbrack \begin{matrix} \mathbf{I} & - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1} \\ \mathbf{0} & \mathbf{I} \end{matrix}\right\rbrack \tag{26} \]

Substituting Eq. (26) into Eqs. (24) and (25) yields the simplified formulations:

将式（26）代入式（24）和（25）得到简化公式：

\[\left\lbrack \begin{matrix} {\mathbf{b}}_{1} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{b}}_{2} \\ {\mathbf{b}}_{2} \end{matrix}\right\rbrack = \mathbf{C}\left\lbrack \begin{matrix} \widetilde{\mathbf{X}} \\ {}^{G}{\widetilde{\mathbf{p}}}_{f} \end{matrix}\right\rbrack + \left\lbrack \begin{matrix} {\mathbf{n}}_{1}^{\prime \prime } \\ {\mathbf{n}}_{2}^{\prime \prime } \end{matrix}\right\rbrack \tag{27} \]

\[{\mathbf{R}}^{\prime \prime } = \left\lbrack \begin{matrix} \left( {{\mathbf{C}}_{1} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{2}^{\top }}\right) & \mathbf{0} \\ \mathbf{0} & {\mathbf{C}}_{3} \end{matrix}\right\rbrack {u}^{2} \tag{28} \]

where

其中

\[\mathbf{C} = \left\lbrack \begin{matrix} \left( {{\mathbf{C}}_{1} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{2}^{\top }}\right) & \mathbf{0} \\ {\mathbf{C}}_{2}^{\top } & {\mathbf{C}}_{3} \end{matrix}\right\rbrack \tag{29} \]
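As a quick check on Eqs. (28) and (29) (a worked expansion added here for clarity, not part of the original derivation), multiplying out the blocks and using the symmetry of ${\mathbf{C}}_{3}$ shows why the coupling terms vanish:

\[\mathbf{L}\left\lbrack \begin{matrix} {\mathbf{C}}_{1} & {\mathbf{C}}_{2} \\ {\mathbf{C}}_{2}^{\top } & {\mathbf{C}}_{3} \end{matrix}\right\rbrack = \left\lbrack \begin{matrix} {\mathbf{C}}_{1} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{2}^{\top } & {\mathbf{C}}_{2} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{3} \\ {\mathbf{C}}_{2}^{\top } & {\mathbf{C}}_{3} \end{matrix}\right\rbrack = \mathbf{C}\]

\[\mathbf{C}{\mathbf{L}}^{\top } = \left\lbrack \begin{matrix} {\mathbf{C}}_{1} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{2}^{\top } & \mathbf{0} \\ {\mathbf{C}}_{2}^{\top } & {\mathbf{C}}_{3} \end{matrix}\right\rbrack \left\lbrack \begin{matrix} \mathbf{I} & \mathbf{0} \\ - {\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{2}^{\top } & \mathbf{I} \end{matrix}\right\rbrack = \left\lbrack \begin{matrix} {\mathbf{C}}_{1} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{2}^{\top } & \mathbf{0} \\ {\mathbf{C}}_{2}^{\top } - {\mathbf{C}}_{3}{\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{2}^{\top } & {\mathbf{C}}_{3} \end{matrix}\right\rbrack = \left\lbrack \begin{matrix} {\mathbf{C}}_{1} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{2}^{\top } & \mathbf{0} \\ \mathbf{0} & {\mathbf{C}}_{3} \end{matrix}\right\rbrack\]

The first line reproduces $\mathbf{C}$ of Eq. (29), and the second line, scaled by ${u}^{2}$, is the block-diagonal covariance of Eq. (28).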

Eqs. (27) and (28) can be decomposed into Eqs. (30) to (31) and Eqs. (32) to (33) as follows:

式（27）和（28）可以分解为如下式（30）到（31）以及式（32）到（33）：

\[\left\lbrack {{\mathbf{b}}_{1} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{b}}_{2}}\right\rbrack = \left\lbrack {{\mathbf{C}}_{1} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{2}^{\top }}\right\rbrack \widetilde{\mathbf{X}} + {\mathbf{n}}_{1}^{\prime \prime } \tag{30} \]

\[{\mathbf{R}}_{1}^{\prime \prime } = \left\lbrack {{\mathbf{C}}_{1} - {\mathbf{C}}_{2}{\mathbf{C}}_{3}^{-1}{\mathbf{C}}_{2}^{\top }}\right\rbrack {u}^{2} \tag{31} \]

\[\left\lbrack {{\mathbf{b}}_{2} - {\mathbf{C}}_{2}^{\top }\widetilde{\mathbf{X}}}\right\rbrack = \left\lbrack {\mathbf{C}}_{3}\right\rbrack {}^{W}{\widetilde{\mathbf{P}}}_{f} + {\mathbf{n}}_{2}^{\prime \prime } \tag{32} \]

\[{\mathbf{R}}_{2}^{\prime \prime } = \left\lbrack {\mathbf{C}}_{3}\right\rbrack {u}^{2} \tag{33} \]
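The elimination behind Eqs. (30) and (32) can be illustrated on a toy problem. The following is a minimal sketch under simplifying assumptions that are not from the paper: a 2-D pose error, three scalar landmarks (so ${\mathbf{C}}_{3}$ is a plain diagonal), made-up numbers, and a noise-free linear solve in place of the EKF update. It is kept dependency-free on purpose.

```python
# Toy illustration of the Schur-complement elimination of Eqs. (30) and (32).
# Assumptions (hypothetical, for illustration only): 2-D pose error, scalar
# landmarks, no noise and no EKF prior; the system is simply solved.

def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (b[0] * a[1][1] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - b[0] * a[1][0]) / det,
    ]

def schur_solve(c1, c2, c3_diag, b1, b2):
    """Solve [[C1, C2], [C2^T, C3]] [x; p] = [b1; b2] with diagonal C3."""
    m = len(c3_diag)
    # Reduced pose system: S = C1 - C2 C3^{-1} C2^T and g = b1 - C2 C3^{-1} b2.
    s = [[c1[i][j] - sum(c2[i][k] * c2[j][k] / c3_diag[k] for k in range(m))
          for j in range(2)] for i in range(2)]
    g = [b1[i] - sum(c2[i][k] * b2[k] / c3_diag[k] for k in range(m))
         for i in range(2)]
    x = solve_2x2(s, g)
    # Back-substitution, one landmark at a time: C3_k p_k = b2_k - C2_k^T x.
    p = [(b2[k] - sum(c2[i][k] * x[i] for i in range(2))) / c3_diag[k]
         for k in range(m)]
    return x, p

C1 = [[4.0, 1.0], [1.0, 3.0]]
C2 = [[0.5, 0.2, 0.1], [0.3, 0.4, 0.2]]   # 2 x 3 pose-landmark coupling
C3 = [2.0, 3.0, 5.0]                      # diagonal entries of C3
B1 = [1.0, 2.0]
B2 = [0.5, 1.5, 2.5]

x_hat, p_hat = schur_solve(C1, C2, C3, B1, B2)

# Residuals of the full, un-eliminated system; both should vanish.
res_pose = [sum(C1[i][j] * x_hat[j] for j in range(2))
            + sum(C2[i][k] * p_hat[k] for k in range(3)) - B1[i]
            for i in range(2)]
res_land = [sum(C2[i][k] * x_hat[i] for i in range(2))
            + C3[k] * p_hat[k] - B2[k] for k in range(3)]
```

Only a 2x2 system is ever solved jointly; each landmark is then recovered by a single division, which is the sparsity the method relies on.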

Eqs. (30) and (31) are the equivalent residual equation and the corresponding observation noise covariance. They can be substituted directly into the standard EKF model, Eqs. (34) to (37), to perform the state update.

显然，式（30）和（31）分别是等价残差方程和相应的观测噪声协方差。它们可以直接代入标准EKF模型式（34）至（37）进行状态更新。

\[\mathbf{K} = {\mathbf{{PJ}}}^{\mathsf{T}}{\left( {\mathbf{{JPJ}}}^{\mathsf{T}} + \mathbf{R}\right) }^{ - 1} \tag{34} \]

\[\Delta \mathbf{x} = \mathbf{{Kr}} \tag{35} \]

\[\mathbf{P} \leftarrow \left( {\mathbf{I} - \mathbf{{KJ}}}\right) \mathbf{P}{\left( \mathbf{I} - \mathbf{{KJ}}\right) }^{\top } + {\mathbf{{KRK}}}^{\top } \tag{36} \]

\[\mathbf{x} \leftarrow \mathbf{x} \oplus \Delta \mathbf{x} \tag{37} \]
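Eqs. (34) to (37) can be sketched in a few lines. The example below is an illustrative, dependency-free version restricted to a scalar residual, so that the inverse in Eq. (34) becomes a division; the function name `ekf_update` and the numbers are assumptions made for this sketch, and the state is assumed to live in a vector space so that $\oplus$ is plain addition.

```python
def ekf_update(x, P, J, r, R):
    """One EKF update (Eqs. 34-37) for a scalar residual r = J dx + n, var(n) = R.

    x: state as a list of n floats, P: n x n covariance (list of lists),
    J: Jacobian row as a list of n floats. Assumes a vector-space state.
    """
    n = len(x)
    PJt = [sum(P[i][j] * J[j] for j in range(n)) for i in range(n)]  # P J^T
    s = sum(J[i] * PJt[i] for i in range(n)) + R                     # J P J^T + R
    K = [v / s for v in PJt]                                         # Eq. (34)
    dx = [k * r for k in K]                                          # Eq. (35)
    # Joseph form, Eq. (36): P <- (I - K J) P (I - K J)^T + K R K^T
    A = [[(1.0 if i == j else 0.0) - K[i] * J[j] for j in range(n)]
         for i in range(n)]
    AP = [[sum(A[i][k] * P[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    P_new = [[sum(AP[i][k] * A[j][k] for k in range(n)) + K[i] * R * K[j]
              for j in range(n)] for i in range(n)]
    x_new = [xi + di for xi, di in zip(x, dx)]                       # Eq. (37)
    return x_new, P_new

x0 = [0.0, 0.0]
P0 = [[1.0, 0.0], [0.0, 4.0]]
x1, P1 = ekf_update(x0, P0, J=[1.0, 1.0], r=2.0, R=1.0)
```

The Joseph form of Eq. (36) costs a few more multiplications than $\left( {\mathbf{I} - \mathbf{{KJ}}}\right) \mathbf{P}$ but keeps the covariance symmetric in floating point.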

3.4. EKF-based Landmark Solver

3.4. 基于EKF的地标求解器

$\widetilde{\mathbf{X}}$ can be obtained by substituting Eqs. (30) and (31) into Eqs. (34) to (37). The resulting $\widetilde{\mathbf{X}}$ is then substituted into Eq. (32) to establish the landmark equivalent residual equation:

通过将等式（30）和（31）代入等式（34）至（37），可以得到 $\widetilde{\mathbf{X}}$。然后，将得到的 $\widetilde{\mathbf{X}}$ 代入等式（32）以建立地标等效残差方程：

\[\left\lbrack \begin{matrix} {\mathbf{r}}_{1} \\ {\mathbf{r}}_{2} \\ \vdots \\ {\mathbf{r}}_{m} \end{matrix}\right\rbrack = \left\lbrack \begin{array}{llll} {\mathbf{C}}_{{3}_{1}} & & & \\ & {\mathbf{C}}_{{3}_{2}} & & \\ & & \ddots & \\ & & & {\mathbf{C}}_{{3}_{m}} \end{array}\right\rbrack \left\lbrack \begin{matrix} {}^{W}{\widetilde{\mathbf{P}}}_{{f}_{1}} \\ {}^{W}{\widetilde{\mathbf{P}}}_{{f}_{2}} \\ \vdots \\ {}^{W}{\widetilde{\mathbf{P}}}_{{f}_{m}} \end{matrix}\right\rbrack + {\mathbf{n}}_{2}^{\prime \prime } \tag{38} \]

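where ${\mathbf{r}}_{i}$ is the block of ${\mathbf{b}}_{2} - {\mathbf{C}}_{2}^{\top }\widetilde{\mathbf{X}}$ in Eq. (32) that belongs to the $i$-th landmark, and ${\mathbf{C}}_{{3}_{1}},\cdots ,{\mathbf{C}}_{{3}_{m}}$ are the diagonal blocks of ${\mathbf{C}}_{3}$ given in Eq. (22). The corresponding covariance ${\mathbf{R}}_{2}^{\prime \prime }$ is:

其中 ${\mathbf{r}}_{i}$ 是等式（32）中 ${\mathbf{b}}_{2} - {\mathbf{C}}_{2}^{\top }\widetilde{\mathbf{X}}$ 对应第 $i$ 个地标的分块，${\mathbf{C}}_{{3}_{1}},\cdots ,{\mathbf{C}}_{{3}_{m}}$ 是等式（22）中给出的 ${\mathbf{C}}_{3}$ 的对角块。相应的协方差 ${\mathbf{R}}_{2}^{\prime \prime }$ 为：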

\[{\mathbf{R}}_{2}^{\prime \prime } = \left\lbrack \begin{array}{llll} {\mathbf{C}}_{{3}_{1}}{u}^{2} & & & \\ & {\mathbf{C}}_{{3}_{2}}{u}^{2} & & \\ & & \ddots & \\ & & & {\mathbf{C}}_{{3}_{m}}{u}^{2} \end{array}\right\rbrack \tag{39} \]

Owing to the sparsity of the resulting landmark equivalent residual equation, Eqs. (38) and (39) can be split into a set of small independent residual models, shown in Eq. (40), so that the EKF update of each landmark can be carried out one by one. This significantly reduces the computational complexity.

由于地标等效残差方程的稀疏性，等式（38）和（39）被拆分为一组小的独立残差模型，如等式（40）所示，这允许每个地标的EKF更新逐一进行。这显著降低了计算复杂度。

\[\left\lbrack {\mathbf{r}}_{i}\right\rbrack = \left\lbrack {\mathbf{C}}_{{3}_{i}}\right\rbrack \left\lbrack {{}^{W}{\widetilde{\mathbf{P}}}_{{f}_{i}}}\right\rbrack + {\mathbf{n}}_{{2}_{i}}^{\prime \prime },i = 1,\cdots ,m \tag{40} \]

\[\mathbf{R} = \left\lbrack {{\mathbf{C}}_{{3}_{i}}{u}^{2}}\right\rbrack \]
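To give a feel for the saving, the sketch below compares a joint update over all $m$ landmarks with $m$ independent per-landmark updates. It rests on a rough assumption that is not from the paper: inverting an $n \times n$ innovation matrix costs on the order of ${n}^{3}$ operations, with constants and all other steps ignored. The function names are hypothetical.

```python
def dense_update_cost(m, dim=3):
    """Rough cost of inverting one (dim*m) x (dim*m) innovation matrix."""
    return (dim * m) ** 3

def per_landmark_cost(m, dim=3):
    """Rough cost of inverting m independent dim x dim innovation matrices."""
    return m * dim ** 3

# Under the cubic cost model the ratio grows as m squared.
for m in (10, 100, 1000):
    ratio = dense_update_cost(m) // per_landmark_cost(m)
    print(f"m={m:5d}  joint={dense_update_cost(m):>13d}  "
          f"per-landmark={per_landmark_cost(m):>6d}  ratio={ratio}")
```

Under this model the per-landmark cost grows linearly in $m$, whereas the joint cost grows cubically.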

3.5. Frontend

3.5. 前端

Our implementation uses SVO2.0 as the front end of SchurVINS. The components retained from the original SVO2.0 are the feature alignment and depth-filter modules. The sparse image alignment module is replaced by the proposed EKF propagation scheme, so that an accurate pose is delivered to the feature alignment module. Compared with frame-to-frame feature tracking $\left\lbrack {{10},{24},{30}}\right\rbrack$, feature alignment, which projects the co-visible landmarks of the local map into each frame and matches them there, achieves excellent long-term landmark tracking because landmarks lost for a short time can be tracked again. The depth filter is used to initialize landmark positions. Once a landmark is sufficiently initialized, it is handed over to the proposed EKF-based landmark solver, where it is estimated jointly with the sliding window.

我们的代码实现使用SVO2.0作为SchurVINS的前端。SchurVINS集成的组件包括来自原始SVO2.0的特征对齐和深度滤波模块。同时，稀疏图像对齐模块被所提出的EKF传播方案取代，以确保向特征对齐模块传递准确的位姿。与帧间特征跟踪 $\left\lbrack {{10},{24},{30}}\right\rbrack$ 相比，特征对齐策略（通过将局部地图中的共视地标投影并匹配到帧上实现）能够重新跟踪短时间内丢失的地标，因此实现了优秀的长期地标跟踪性能。深度滤波器用于执行地标位置初始化。一旦地标被充分初始化，它将被转移到所提出的基于EKF的地标求解器，与滑动窗口联合进行估计。

Sequence	S/M ${}^{1}$	F/O ${}^{2}$	MH1	MH2	MH3	MH4	MH5	V11	V12	V13	V21	V22	Avg
OKVIS ${}^{4}$ [17]	M	O	0.160	0.220	0.240	0.340	0.470	0.090	0.200	0.240	0.130	0.160	0.225
VINS-mono [24]	M	O	0.150	0.150	0.220	0.320	0.300	0.079	0.110	0.180	0.080	0.160	0.174
Kimera [26]	S	O	0.110	0.100	0.160	0.240	0.350	0.050	0.080	0.070	0.080	0.100	0.134
ICE-BA [21]	S	O	0.090	0.070	0.110	0.160	0.270	0.050	0.050	0.110	0.120	0.090	0.112
SVO2.0 ${}^{5}$ [9]	S	O	0.080	0.080	0.088	0.211	0.231	0.052	0.082	0.073	0.084	0.116	0.109
BASALT [33]	S	O	0.070	0.060	0.070	0.130	0.110	0.040	0.050	0.100	0.040	0.050	0.072
DM-VIO [34]	M	O	0.065	0.044	0.097	0.102	0.096	0.048	0.045	0.069	0.029	0.050	0.064
MSCKF ${}^{4}$ [22]	S	F	0.420	0.450	0.230	0.370	0.480	0.340	0.200	0.670	0.100	0.160	0.342
ROVIO ${}^{4}$ [2]	M	F	0.210	0.250	0.250	0.490	0.520	0.100	0.100	0.140	0.120	0.140	0.232
OpenVINS-4 ${}^{3,5}$ [10]	S	F	0.084	0.084	0.127	0.218	0.360	0.038	0.054	0.050	0.064	0.061	0.114
OpenVINS ${}^{5}$ [10]	S	F	0.072	0.143	0.086	0.173	0.247	0.055	0.060	0.059	0.054	0.047	0.096
SV (ours) ${}^{5}$	S	F	0.049	0.077	0.086	0.125	0.125	0.035	0.053	0.082	0.046	0.075	0.075

${}^{1}\mathrm{\;S}$ and $\mathrm{M}$ mean stereo and monocular methods, respectively.

${}^{1}\mathrm{\;S}$ 和 $\mathrm{M}$ 分别代表立体和单目方法。

${}^{2}\mathrm{\;F}$ and $\mathrm{O}$ mean filter-based and optimization-based methods, respectively.

${}^{2}\mathrm{\;F}$ 和 $\mathrm{O}$ 分别代表基于滤波器和基于优化的方法。

${}^{3}$ OpenVINS-4 means that the maximum size of the sliding window in OpenVINS is configured to be 4.

${}^{3}$ OpenVINS-4意味着OpenVINS中的滑动窗口最大尺寸被配置为4。

${}^{4}$ results taken from [5].

${}^{4}$ 结果取自文献[5]。

${}^{5}$ evaluated by author manually.

${}^{5}$ 由作者手动评估。

All other results are taken from the respective paper.

所有其他结果均取自各自论文。

Table 1. Accuracy evaluation of various mono and stereo VINS algorithms on EuRoC. In the upper part, we summarize the results for the optimization-based methods that run sliding window optimization to estimate pose. In the lower part, we evaluate the results of filter-based methods. Best result in bold, underline is the best result among filter-based methods. SchurVINS achieves the lowest average APE RMSE in filter-based methods and surpasses the majority of optimization-based methods.

表1.各种单目和双目VINS算法在EuRoC上的精度评估。在上部，我们总结了运行滑动窗口优化以估计位姿的基于优化的方法的结果。在下部，我们评估了基于滤波的方法的结果。最佳结果用粗体表示，下划线表示基于滤波方法中的最佳结果。SchurVINS在基于滤波的方法中实现了最低的平均APE RMSE，并超过了大多数基于优化的方法。

Sequence	S/M	F/O	c1	c2	c3	c4	c5	r1	r2	r3	r4	r5	r6	Avg
VINS-Mono ${}^{1}$	M	O	0.630	0.950	1.560	0.250	0.770	0.070	0.070	0.110	0.040	0.200	0.080	0.430
OKVIS ${}^{1}$	M	O	0.330	0.470	0.570	0.260	0.390	0.060	0.110	0.070	0.030	0.070	0.040	0.218
BASALT ${}^{1}$	S	O	0.340	0.420	0.350	0.210	0.370	0.090	0.070	0.130	0.050	0.130	0.020	0.198
DM-VIO	M	O	0.190	0.470	0.240	0.130	0.160	0.030	0.130	0.090	0.040	0.060	0.020	0.141
ROVIO ${}^{1}$	M	F	0.470	0.750	0.850	0.130	2.090	0.160	0.330	0.150	0.090	0.120	0.050	0.471
OpenVINS ${}^{2}$	S	F	0.413	0.322	1.536	0.186	0.644	0.062	0.093	0.079	0.027	0.074	0.020	0.314
SV ${}^{2}$	S	F	0.329	0.285	0.555	0.162	0.274	0.048	0.160	0.066	0.049	0.054	0.021	0.182

${}^{1}$ results taken from [34].

${}^{1}$ 结果来源于[34]。

${}^{2}$ evaluated by author manually.

${}^{2}$ 由作者手动评估。

Table 2. Accuracy evaluation on TUM-VI datasets, c1 to c5 denote corridor1 to corridor5 in TUM-VI datasets, r1 to r6 denote room1 to room6 in TUM-VI datasets. Best result in bold, underline is the best result among filter-based methods.

表2.在TUM-VI数据集上的精度评估，c1到c5表示TUM-VI数据集中的走廊1到走廊5，r1到r6表示TUM-VI数据集中的房间1到房间6。最佳结果用粗体表示，下划线表示基于滤波方法中的最佳结果。

Following a First In First Out (FIFO) strategy, the local map maintains only the ten most recent keyframes to support landmark tracking. Because high accuracy is already achieved without it, the traditional LBA is no longer necessary and is dropped from the proposed SchurVINS.

基于先进先出（FIFO）策略，局部地图仅维护最近的十个关键帧以支持地标跟踪。由于已经实现了高精度，传统的LBA不再是必要的，这在提出的SchurVINS中被放弃。
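The FIFO policy of the local map described above can be sketched with a bounded queue. The class name `LocalMap` and its methods are hypothetical and chosen for illustration; they are not taken from the SVO2.0 or SchurVINS code base.

```python
from collections import deque

class LocalMap:
    """Hypothetical FIFO keyframe buffer (not the actual SVO2.0 class)."""

    def __init__(self, max_keyframes=10):
        # deque(maxlen=...) silently drops the oldest item when full.
        self._keyframes = deque(maxlen=max_keyframes)

    def add_keyframe(self, keyframe_id):
        """Insert a keyframe and return the evicted one, or None."""
        evicted = None
        if len(self._keyframes) == self._keyframes.maxlen:
            evicted = self._keyframes[0]
        self._keyframes.append(keyframe_id)
        return evicted

    def keyframes(self):
        """Return the retained keyframes, oldest first."""
        return list(self._keyframes)

local_map = LocalMap()
evictions = [local_map.add_keyframe(k) for k in range(12)]
```

Returning the evicted keyframe lets the caller release the landmarks that were only observed from it, should an implementation need that.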

3.6. Keyframe Selection

3.6. 关键帧选择

Keyframe selection is an important part of a VINS pipeline. SchurVINS uses three criteria to select keyframes. A candidate frame is declared a keyframe if its average parallax with respect to the previous keyframe reaches a threshold, or if the number of tracked landmarks drops below a threshold. Once a keyframe is selected, FAST corners [31] are extracted and new landmarks are generated from them via the depth-filter module. In addition, a keyframe is selected when the gap in both orientation and position between the candidate frame and the co-visible keyframes maintained in the local map exceeds a given range, which allows the tracking module to overcome divergence between the candidate frame and the local map.

关键帧选择策略在VINS系统中至关重要。在SchurVINS中有三种关键帧选择策略。如果候选帧与前一个关键帧的平均视差达到阈值，或者跟踪到的地标数量降至特定阈值以下，相应的帧被定义为关键帧。一旦选择了关键帧，就会提取FAST角点[31]，通过深度滤波模块生成新的地标。此外，当候选帧与局部地图中维护的共视关键帧在方向和位置上的差距超出特定范围时，也会选择关键帧，由此跟踪模块能够克服候选帧与局部地图之间的偏离。

Method	Avg CPU	Std CPU	Speed
DM-VIO ${}^{1}$	98/172	-/30	1x/1.76x
BASALT ${}^{1}$	46/203	-/46	1x/4.37x
VINS-Mono	45	13	1x
OpenVINS	37	10	1x
OpenVINS-4	32	8	1x
SMSCKF [30]	25	4	1x
SVO2.0	89	20	1x
SVO2.0-wo ${}^{2}$	17	6	1x
SV	18	6	1x

${}^{1}$ The $1\mathrm{x}$ evaluation results of DM-VIO and BASALT are the converted results by author manually.

${}^{1}$ DM-VIO和BASALT的 $1\mathrm{x}$ 评估结果是作者手动换算的结果。

${}^{2}$ SVO2.0-wo means SVO2.0 without the enabled LBA.

${}^{2}$ SVO2.0-wo表示未启用LBA的SVO2.0。

Table 3. Evaluation of CPU overhead for different well-known VINS algorithms. GBA, PGO and LC are disabled on all the mentioned algorithms; SVO2.0 is the only one that has the LBA module enabled. Our method provides a notable improvement in efficiency compared to the SOTA VINS algorithms.

表3. 不同知名VINS算法的CPU开销评估。在提到的所有算法中，GBA、PGO和LC均被禁用；只有SVO2.0启用了LBA模块。我们的方法与最先进的VINS算法相比，在效率上有了显著提升。
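The three keyframe criteria of Sec. 3.6 can be sketched as a single predicate. Everything specific in this sketch is an assumption made for illustration: the threshold values, the names `KeyframeThresholds` and `is_keyframe`, and the reading of the third criterion as "both the rotation and the translation gap exceed their range with respect to every co-visible keyframe".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KeyframeThresholds:
    # Placeholder values for illustration only; not taken from the paper.
    min_parallax_px: float = 20.0
    min_tracked_landmarks: int = 50
    max_rotation_deg: float = 15.0
    max_translation_m: float = 0.5

def is_keyframe(avg_parallax_px, num_tracked, gaps_to_local_keyframes, th=None):
    """Return True if any of the three criteria of Sec. 3.6 fires.

    gaps_to_local_keyframes: list of (rotation_deg, translation_m) pairs
    between the candidate frame and each co-visible keyframe of the local map.
    """
    th = th or KeyframeThresholds()
    # Criterion 1: enough parallax with respect to the previous keyframe.
    if avg_parallax_px >= th.min_parallax_px:
        return True
    # Criterion 2: too few landmarks are still being tracked.
    if num_tracked < th.min_tracked_landmarks:
        return True
    # Criterion 3 (assumed reading): the candidate is far, in both rotation
    # and translation, from every co-visible keyframe in the local map.
    return bool(gaps_to_local_keyframes) and all(
        rot > th.max_rotation_deg and trans > th.max_translation_m
        for rot, trans in gaps_to_local_keyframes
    )
```

For example, `is_keyframe(5.0, 100, [(20.0, 0.8), (1.0, 0.1)])` is `False` under these placeholder thresholds, because one co-visible keyframe is still close to the candidate.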

4. Experiments

4. 实验

The accuracy and efficiency of SchurVINS are evaluated in two experiments. An additional ablation experiment is carried out to demonstrate the effectiveness of the proposed EKF-based landmark solver.

通过两个实验评估了SchurVINS算法的准确性和效率。并且进行了额外的消融实验，以证明所提出基于EKF的地标求解器的有效性。

System Configuration: We have developed SchurVINS on top of the open-source code repository of SVO2.0, specifically svo_pro_open. Most system parameters do not need to be modified. For high efficiency, edgelet features, loop closure (LC), pose graph optimization (PGO), LBA and Global BA (GBA) are discarded or deactivated. In the experiments below, the local map is configured to hold at most ten keyframes. It mainly maintains co-visible keyframes and landmarks for feature alignment. The backend of SchurVINS uses a sliding window consisting of 2 old keyframes and the 2 latest temporal frames. The keyframe strategy is similar to that of the original SVO2.0.

系统配置：我们在 SVO2.0 的开源代码库基础上开发了 SchurVINS，具体来说，是基于 svo_pro_open。大多数系统参数无需修改。为了提高效率，我们舍弃或禁用了边缘特征、闭环检测（LC）、位姿图优化（PGO）、局部束调整（LBA）和全局束调整（GBA）。在下面的实验中，我们将局部图中的关键帧数量阈值设置为最多十个。这个局部图主要维护共视关键帧和地标，以实现特征对齐。在 SchurVINS 的后端，有一个由 2 个旧关键帧和 2 个最新时间帧组成的滑动窗口。关键帧策略与原始 SVO2.0 类似。

4.1. Accuracy

4.1. 精确度

The overall accuracy of the algorithms is evaluated using the Root Mean Square Error (RMSE) on two well-known datasets, EuRoC [3] and TUM-VI [27]. The trajectories and point clouds produced by SchurVINS on the TUM-VI and EuRoC datasets are shown in Fig. 4. To keep run-to-run fluctuation from distorting the evaluation, we run each algorithm for 7 rounds, remove the maximum and minimum values, and report the average of the remaining results. In Tab. 1, our method obtains the lowest average RMSE among the filter-based methods reported on this dataset so far and outperforms the majority of optimization-based methods. Its accuracy is similar to that of the well-known optimization-based method BASALT and slightly lower than that of the recent competitor DM-VIO. The well-known VINS algorithms VINS-Fusion [24] and SMSCKF [30] are not included in Tab. 1, since VINS-mono and OpenVINS surpass VINS-Fusion and SMSCKF in accuracy, respectively [10, 25]. The results of the re-evaluation in Tab. 2 are fully in line with expectations. It is worth highlighting that, although slightly less accurate than the two optimization-based competitors, our method has a clearly lower computational cost than both of them, as detailed in the next subsection.

所提及算法的整体准确性通过在两个知名数据集 EuRoC [3] 和 TUM-VI [27] 上使用均方根误差（RMSE）进行评估。图 4 展示了 SchurVINS 在 TUM-VI 和 EuRoC 数据集上的实验轨迹和点云。为了防止算法的波动导致不合理的评估结果，我们自己的评估方法是运行算法 7 轮，移除最大值和最小值，然后计算剩余结果的平均值作为评估结果。在表 1 中，我们的方法在基于滤波的方法中获得了迄今为止数据集上报告的最低平均 RMSE，并且优于大多数基于优化的方法。此外，我们的方法与知名的基于优化的方法 BASALT 达到相似的准确性，并且比最近的竞争者 DM-VIO 略逊一筹。另外，表 1 中没有包含知名的 VINS 算法 VINS-Fusion [24] 和 SMSCKF [30]，因为 VINS-mono 和 OpenVINS 分别在准确性上超过了 VINS-Fusion 和 SMSCKF [10, 25]。表 2 中的重新评估实验完全符合预期。值得强调的是，尽管与两个基于优化的竞争者相比准确性略有下降，但我们的方法在计算复杂度上明显低于它们，具体细节将在下一小节中介绍。
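The evaluation protocol described above (7 runs, drop the maximum and the minimum, average the rest) can be sketched as follows. The helper names and the sample numbers are hypothetical, and the RMSE helper assumes the estimated and ground-truth positions are already time-associated and aligned, a step the text does not detail.

```python
import math

def ape_rmse(estimated, ground_truth):
    """RMSE of position errors between two aligned, time-associated trajectories."""
    squared = [sum((e - g) ** 2 for e, g in zip(pe, pg))
               for pe, pg in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))

def trimmed_average(run_results):
    """Drop one maximum and one minimum value, then average the rest."""
    if len(run_results) < 3:
        raise ValueError("need at least three runs")
    kept = sorted(run_results)[1:-1]
    return sum(kept) / len(kept)

# Seven made-up RMSE values from repeated runs on one sequence.
runs = [0.051, 0.047, 0.049, 0.090, 0.048, 0.050, 0.030]
score = trimmed_average(runs)

# Two-pose toy trajectory: only the second pose is off, by 2 along z.
toy_rmse = ape_rmse([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```

Trimming both extremes makes the reported number insensitive to a single diverged or unusually lucky run.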

Module	SV	SV-GN ${}^{1}$	SVO2.0-wo	SVO2.0	SMSCKF [30]	OpenVINS	OpenVINS-4
SparseImageAlign	-	-	1.35	1.43	-	-	-
FeatureAlign	1.39	1.39	1.79	1.91	-	-	-
KLT	-	-	-	-	2.63	2.69	2.67
Propagation	0.11	0.11	-	-	0.55	0.21	0.18
optimizePose	0.67	0.67	0.48	-	3.16	0.99/4.30 ${}^{2}$	0.34/2.46 ${}^{2}$
optimizeStructure	0.11	0.42	0.07	-	-	0.93	0.44
LBA	-	-	-	26.3 ${}^{3}$	-	-	-
Total time ${}^{4}$	3.83	4.11	3.77	9.28	8.53	10.91	7.89

${}^{1}$ denotes SchurVINS with the Gauss-Newton optimization-based (GN-based) landmark solver.

${}^{1}$ 表示采用基于高斯-牛顿优化（GN-based）的地标求解器的SchurVINS。

${}^{2}$ Running time of MSCKF update and SLAM update.

${}^{2}$ MSCKF 更新和 SLAM 更新的运行时间。

${}^{3}$ It contains some running time of SVO2.0 LBA in asynchronous thread.

${}^{3}$ 其中包含 SVO2.0 LBA 在异步线程中的一些运行时间。

${}^{4}$ The total time also contains other modules.

${}^{4}$ 总时间还包含其他模块。

Table 4. Running time evaluation of the main parts of SchurVINS compared with SVO2.0 and OpenVINS on EuRoC MH01 (mean time in ms). Note that the different overhead of optimizeStructure between SVO-NonBA and SchurVINS-GN is primarily attributed to the variation in the count of feature matches, which is a consequence of the localization accuracy.

表4. SchurVINS与SVO2.0和OpenVINS在EuRoC MH01上的运行时间评估（主要部分的平均时间，单位为毫秒）。注意，SVO-NonBA与SchurVINS-GN之间优化结构的时间开销差异主要归因于特征匹配数量的变化，这是定位精度的结果。

4.2. Efficiency

4.2. 效率

The efficiency evaluations are carried out on a desktop platform with an Intel i7-9700 (3.00 GHz). Global BA (GBA), pose graph optimization and loop closure are disabled in all of the following algorithms, and LBA is enabled only in the original SVO2.0. The efficiency experiment is divided into two parts, processor usage and per-module run time, which are reported in Tab. 3 and Tab. 4, respectively.

效率评估在Intel i7-9700（3.00GHZ）桌面平台上进行。全局BA（GBA）、位姿图优化和闭环检测在以下所有算法中均被禁用。此外，LBA仅在原始的SVO2.0中启用。效率实验分为两部分：分析处理器使用情况和时间开销，分别报告在表3和表4中。

As shown in Tab. 3, SchurVINS achieves nearly the lowest processor usage among all the listed VINS algorithms. SVO2.0-wo requires similar CPU usage, but it is notably less accurate since it is almost pure Visual Odometry (VO). To investigate where the efficiency advantage of SchurVINS comes from, Tab. 4 breaks down its run time per module and compares it with SVO2.0 and the widely recognized filter-based OpenVINS and SMSCKF.

如表3所示，SchurVINS在所有提到的VINS算法中实现了几乎最低的处理器使用率。特别是，SVO2.0-wo与SchurVINS的CPU使用率相似，但由于其几乎是纯视觉里程计（VO），因此存在明显的准确性问题。为了彻底调查导致SchurVINS效率优势的根本原因，我们在表4中对SchurVINS的时间开销进行了仔细分析，包括与广泛认可的基于滤波器的OpenVINS和SMSCKF的比较。
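As a small arithmetic aid for reading Tab. 3, the sketch below expresses each method's average CPU figure relative to SchurVINS (SV). The numbers are copied from the 1x column of Tab. 3; the variable names are chosen here for illustration.

```python
# Average CPU figures from Tab. 3 (1x playback speed).
avg_cpu = {
    "DM-VIO": 98, "BASALT": 46, "VINS-Mono": 45, "OpenVINS": 37,
    "OpenVINS-4": 32, "SMSCKF": 25, "SVO2.0": 89, "SVO2.0-wo": 17, "SV": 18,
}

# CPU load of each method as a multiple of SchurVINS' load.
relative = {name: cpu / avg_cpu["SV"] for name, cpu in avg_cpu.items()}

# Methods that use less CPU than SchurVINS.
lower_than_sv = sorted(n for n, c in avg_cpu.items() if c < avg_cpu["SV"])

for name, ratio in sorted(relative.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {ratio:4.2f}x the CPU load of SV")
```

With these figures only SVO2.0-wo sits below SV, and SV's load is roughly 39% of BASALT's and 18% of DM-VIO's.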

Sequence	MH1	MH2	MH3	MH4	MH5	V11	V12	V13	V21	V22	Avg
SV	0.049	0.077	0.086	0.125	0.125	0.035	0.053	0.082	0.046	0.075	0.075
SV-GN	0.057	0.055	0.097	0.135	0.116	0.038	0.051	0.068	0.037	0.083	0.073
SV-OFF	0.067	0.103	0.107	0.137	0.143	0.038	0.062	-	0.057	0.255	0.107

${}^{1}$ SV-OFF denotes SchurVINS with the EKF-based landmark solver disabled, so that landmarks are only initialized by the depth filter.

${}^{1}$ SV-OFF表示禁用EKF-based地标求解器的SchurVINS，仅使用深度滤波器初始化地标。

Table 5. Ablation Evaluation on EuRoC.

表5. 在EuRoC上的消融评估。

In Tab. 4, the optimizeStructure module of SchurVINS is nearly 3 times faster than that of SchurVINS-GN, because our method obtains significant computational savings by reusing the intermediate results of the Schur complement, whereas SchurVINS-GN constructs the problem anew to estimate landmarks. Compared with SVO2.0-wo, SchurVINS is faster because it replaces the computationally expensive SparseImageAlign with the propagation module. In contrast, the optimizeStructure of SVO2.0-wo is clearly faster than that of SchurVINS-GN, because the latter uses almost 4 times as many measurements as the former for optimization. For SVO2.0, the root cause of the clearly increased run time is the high computational complexity of LBA. As for OpenVINS, neither the default configuration nor the configuration with a maximum sliding window size of 4 is more efficient than SchurVINS. What stands out from this analysis is that updating SLAM points in OpenVINS requires noticeably more computation than the EKF-based landmark estimation in SchurVINS. As illustrated in Fig. 3, SchurVINS exploits the sparsity of the problem more fully than both hybrid MSCKF and optimization-based methods.

在表4中，SchurVINS中的optimizeStructure模块几乎比SchurVINS-GN的快3倍。这是因为我们的方法通过利用Schur补的中间结果获得了显著的计算节省。相比之下，SchurVINS-GN重新构建问题以估计地标。与SVO2.0-wo相比，由于将计算量大的SparseImageAlign替换为传播模块，SchurVINS更快。相反，SVO2.0-wo的optimizeStructure明显比SchurVINS-GN快。原因是后者几乎使用了比前者多4倍的测量数据进行优化。与SVO2.0相比，导致算法运行时间显著增加的根本原因是LBA的高计算复杂度。考虑到OpenVINS，值得注意的是，无论是默认配置还是滑动窗口最大尺寸为4的配置，Open-VINS在效率上都没有超过SchurVINS。这一分析中最突出的是，OpenVINS中SLAM点的更新相比SchurVINS中基于EKF的地标估计需要明显更多的计算资源。如图3所示，SchurVINS比混合MSCKF和基于优化的方法更充分利用了问题的稀疏性。

4.3. Ablation Study

4.3. 消融研究

The experiments above strongly support SchurVINS. It is therefore worthwhile to study the impact of the individual components of our algorithm. Starting from SchurVINS, we replace or discard the EKF-based landmark solver to analyse its effectiveness.

上述实验强烈支持SchurVINS。因此，研究我们算法不同组件的影响是必要的。基于SchurVINS，我们替换或丢弃基于EKF的地标求解器以分析其有效性。

As shown in Tab. 5, without either the GN-based or the EKF-based landmark solver, SchurVINS cannot sufficiently limit the global drift. Moreover, in some challenging scenarios, not estimating landmarks simultaneously may lead to system divergence. The comparison between SchurVINS and SchurVINS-GN in Tab. 5 indicates that both the proposed EKF-based landmark solver and the GN-based landmark solver from the original SVO2.0 are effective and reliable in guaranteeing high precision. In addition, comparing them in Tab. 4 and Tab. 5 shows that although the proposed EKF-based landmark solver leads to a slight accuracy degradation, it achieves a clearly lower computational cost. An intuitive explanation for the decreased accuracy is that our method uses only the observations within the sliding window for landmark estimation.

如表5所示，如果没有基于GN或基于EKF的地标求解器，SchurVINS无法充分限制全局漂移。此外，在某些具有挑战性的场景中，SchurVINS同时估计地标的能力不足可能导致系统发散。表5中SchurVINS与SchurVINS-GN的比较表明，所提出的基于EKF的地标求解器和原始SVO2.0中的基于GN的地标求解器均有效可靠，能够保证高精度。此外，表4和表5中的比较说明，尽管所提出的基于EKF的地标求解器导致精度略有下降，但它能够显著降低计算复杂度。对于精度降低的直观解释是，我们的方法只使用了滑动窗口中的所有观测值来进行地标估计。

5. Conclusions and Future Work

5. 结论与未来工作

In this paper, we have developed an EKF-based VINS algorithm, including a novel EKF-based landmark solver, to achieve 6-DoF estimation with both high efficiency and high accuracy. In particular, the formulated equivalent residual model, consisting of the Hessian, the Gradient and the corresponding observation covariance, is used to estimate poses and landmarks jointly to guarantee high-precision positioning. To achieve high efficiency, the equivalent residual model is decomposed by the Schur complement into a pose residual model and a landmark residual model, on which the EKF update is conducted separately. Owing to the probabilistic independence of the surrounding environment elements, the resulting landmark residual model is split into a set of small independent residual models for the EKF update of each landmark, which significantly reduces the computational complexity. To the best of our knowledge, we are the first to use the Schur complement to factorize the residual model in EKF-based VINS algorithms for acceleration. The experiments on the EuRoC and TUM-VI datasets demonstrate that our approach notably outperforms the EKF-based methods $\left\lbrack {{10},{30}}\right\rbrack$ overall and the majority of optimization-based methods in both accuracy and efficiency. Besides, our approach requires less than about ${50}\%$ of the computational resources of the SOTA optimization-based methods [33, 34] while achieving comparable accuracy. Meanwhile, the ablation studies clearly demonstrate that the proposed EKF-based landmark solver is not only highly efficient but also capable of ensuring high accuracy.

在本文中，我们开发了一种基于EKF的VINS算法，包括新颖的基于EKF的地标求解器，以实现具有高效和精确性的6自由度估计。特别是，所构建的等价残差模型，包括Hessian、梯度以及相应的观测协方差，被用于联合估计姿态和地标，以确保高精度定位。为了实现高效性，等价残差模型通过Schur补分解为姿态残差模型和地标残差模型，分别进行EKF更新。得益于周围环境元素的概率独立性，得到的地标残差模型被拆分为一组小的独立残差模型，用于每个地标的EKF更新，从而显著降低了计算复杂度。据我们所知，我们是第一个在基于EKF的VINS算法中利用Schur补分解残差模型来加速计算的研究者。基于EuRoC和TUM-VI数据集的实验表明，我们的方法在准确性和效率上都明显优于大多数基于EKF的方法 $\left\lbrack {{10},{30}}\right\rbrack$ 和大多数基于优化的方法。此外，我们的方法所需的计算资源几乎比最先进的基于优化的方法 [33, 34] 少 ${50}\%$，并且具有相当的准确性。同时，消融研究清楚地表明，我们提出的基于EKF的地标求解器不仅效率高，而且能够确保高准确性。

In future work, we will focus on local map refinement in SchurVINS to further improve accuracy.

在未来的工作中，我们将关注SchurVINS中的局部地图精炼，以探索更高的准确性。

6. Acknowledgment

6. 致谢

We would like to thank Taoran Chen, Chen Chen, and Jiatong Li at ByteDance as well as Zihuan Cheng at SCUT for their kind help. Moreover, I (Frank) would like to deeply thank my wife, Linan Guo.

我们感谢字节跳动的Taoran Chen、Chen Chen和Jiatong Li以及华南理工大学（SCUT）的Zihuan Cheng的热心帮助。此外，我（Frank）想要深深地感谢我的妻子Linan Guo。

References

参考文献

[1] Sameer Agarwal, Noah Snavely, Steven M Seitz, and Richard Szeliski. Bundle adjustment in the large. In Computer Vision-ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part II 11, pages 29-42. Springer, 2010. 1

[2] Michael Bloesch, Sammy Omari, Marco Hutter, and Roland Siegwart. Robust visual inertial odometry using a direct ekf-based approach. In 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 298-304. IEEE, 2015. 1, 6

[3] Michael Burri, Janosch Nikolic, Pascal Gohl, Thomas Schneider, Joern Rehder, Sammy Omari, Markus W Achtelik, and Roland Siegwart. The euroc micro aerial vehicle datasets. The International Journal of Robotics Research, 35(10):1157-1163, 2016. 7

[4] Carlos Campos, Richard Elvira, Juan J Gómez Rodríguez, José MM Montiel, and Juan D Tardós. Orb-slam3: An accurate open-source library for visual, visual-inertial, and multimap slam. IEEE Transactions on Robotics, 37(6):1874-1890, 2021. 1, 2

[5] Jeffrey Delmerico and Davide Scaramuzza. A benchmark comparison of monocular visual-inertial odometry algorithms for flying robots. In 2018 IEEE international conference on robotics and automation (ICRA), pages 2502-2509. IEEE, 2018. 6

[6] Nikolaus Demmel, Christiane Sommer, Daniel Cremers, and Vladyslav Usenko. Square root bundle adjustment for large-scale reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11723-11732, 2021. 2

[7] Yunfei Fan, Ruofu Wang, and Yinian Mao. Stereo visual inertial odometry with online baseline calibration. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 1084-1090. IEEE, 2020. 1, 2, 4

[8] Christian Forster, Matia Pizzoli, and Davide Scaramuzza. SVO: Fast semi-direct monocular visual odometry. In IEEE Int. Conf. Robot. Autom. (ICRA), pages 15-22, 2014. 2

[9] Christian Forster, Zichao Zhang, Michael Gassner, Manuel Werlberger, and Davide Scaramuzza. SVO: Semidirect visual odometry for monocular and multicamera systems. IEEE Trans. Robot., 33(2):249-265, 2017. 2, 6

[10] Patrick Geneva, Kevin Eckenhoff, Woosik Lee, Yulin Yang, and Guoquan Huang. Openvins: A research platform for visual-inertial estimation. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 4666-4672. IEEE, 2020. 1, 2, 4, 5, 6, 7, 8

[11] Guoquan Huang. Visual-inertial navigation: A concise review. In 2019 international conference on robotics and automation (ICRA), pages 9572-9582. IEEE, 2019. 1

[12] Guoquan P Huang, Anastasios I Mourikis, and Stergios I Roumeliotis. Analysis and improvement of the consistency of extended kalman filter based slam. In 2008 IEEE International Conference on Robotics and Automation, pages 473-479. IEEE, 2008. 2

[13] Guoquan P Huang, Anastasios I Mourikis, and Stergios I Roumeliotis. A first-estimates jacobian ekf for improving slam consistency. In Experimental Robotics: The Eleventh International Symposium, pages 373-382. Springer, 2009. 2

[14] Viorela Ila, Lukas Polok, Marek Solony, and Pavel Svoboda. Slam++-a highly efficient and temporally scalable incremental slam framework. The International Journal of Robotics Research, 36(2):210-230, 2017. 2

[15] Michael Kaess, Ananth Ranganathan, and Frank Dellaert. isam: Incremental smoothing and mapping. IEEE Transactions on Robotics, 24(6):1365-1378, 2008.

[16] Michael Kaess, Hordur Johannsson, Richard Roberts, Viorela Ila, John J Leonard, and Frank Dellaert. isam2: Incremental smoothing and mapping using the bayes tree. The International Journal of Robotics Research, 31(2):216-235, 2012. 2

[17] Stefan Leutenegger, Simon Lynen, Michael Bosse, Roland Siegwart, and Paul Furgale. Keyframe-based visual-inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 34(3):314-334, 2015. 1, 6

[18] Mingyang Li and Anastasios I Mourikis. Vision-aided inertial navigation for resource-constrained systems. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1057-1063. IEEE, 2012. 2

[19] Mingyang Li and Anastasios I Mourikis. Optimization-based estimator design for vision-aided inertial navigation. In Robotics: Science and Systems, pages 241-248. Berlin Germany, 2013. 2

[20] Mingyang Li and Anastasios I Mourikis. High-precision, consistent ekf-based visual-inertial odometry. The International Journal of Robotics Research, 32(6):690-711, 2013. 2

[21] Haomin Liu, Mingyu Chen, Guofeng Zhang, Hujun Bao, and Yingze Bao. Ice-ba: Incremental, consistent and efficient bundle adjustment for visual-inertial slam. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1974-1982, 2018. 1, 2, 6

[22] Anastasios I Mourikis and Stergios I Roumeliotis. A multistate constraint kalman filter for vision-aided inertial navigation. In Proceedings 2007 IEEE international conference on robotics and automation, pages 3565-3572. IEEE, 2007. 1, 2, 4, 6

[23] Lukas Polok, Marek Solony, Viorela Ila, Pavel Smrz, and Pavel Zemcik. Efficient implementation for block matrix operations for nonlinear least squares problems in robotic applications. In 2013 IEEE International Conference on Robotics and Automation, pages 2263-2269. IEEE, 2013. 2

[24] Tong Qin, Peiliang Li, and Shaojie Shen. Vins-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004-1020, 2018. 1, 5, 6, 7

[25] Tong Qin, Jie Pan, Shaozu Cao, and Shaojie Shen. A general optimization-based framework for local odometry estimation with multiple sensors. CoRR, abs/1901.03638, 2019. 7

[26] Antoni Rosinol, Marcus Abate, Yun Chang, and Luca Carlone. Kimera: an open-source library for real-time metric-semantic localization and mapping. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 1689-1696. IEEE, 2020. 6

[27] David Schubert, Thore Goll, Nikolaus Demmel, Vladyslav Usenko, Jörg Stückler, and Daniel Cremers. The tum vi benchmark for evaluating visual-inertial odometry. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1680-1687. IEEE, 2018. 7

[28] Gabe Sibley, Larry Matthies, and Gaurav Sukhatme. Sliding window filter with application to planetary landing. Journal of Field Robotics, 27(5):587-608, 2010. 4

[29] Joan Sola. Quaternion kinematics for the error-state kalman filter. arXiv preprint arXiv:1711.02508, 2017. 1, 2, 3

[30] Ke Sun, Kartik Mohta, Bernd Pfrommer, Michael Watterson, Sikang Liu, Yash Mulgaonkar, Camillo J Taylor, and Vijay Kumar. Robust stereo visual inertial odometry for fast autonomous flight. IEEE Robotics and Automation Letters, 3(2):965-972, 2018. 1, 4, 5, 7, 8

[31] Miroslav Trajković and Mark Hedley. Fast corner detection. Image and vision computing, 16(2):75-87, 1998. 6

[32] Bill Triggs, Philip F McLauchlan, Richard I Hartley, and Andrew W Fitzgibbon. Bundle adjustment-a modern synthesis. In Vision Algorithms: Theory and Practice: International Workshop on Vision Algorithms Corfu, Greece, September 21-22, 1999 Proceedings, pages 298-372. Springer, 2000. 1

[33] Vladyslav Usenko, Nikolaus Demmel, David Schubert, Jörg Stückler, and Daniel Cremers. Visual-inertial mapping with non-linear factor recovery. IEEE Robotics and Automation Letters, 5(2):422-429, 2019. 1, 6, 8

[34] Lukas Von Stumberg and Daniel Cremers. Dm-vio: Delayed marginalization visual-inertial odometry. IEEE Robotics and Automation Letters, 7(2):1408-1415, 2022. 1, 6, 8

[35] Kejian Wu, Ahmed M Ahmed, Georgios A Georgiou, and Stergios I Roumeliotis. A square root inverse filter for efficient vision-aided inertial navigation on mobile devices. In Robotics: Science and Systems, page 2. Rome, Italy, 2015. 2

[36] Zhichao Ye, Guanglin Li, Haomin Liu, Zhaopeng Cui, Hujun Bao, and Guofeng Zhang. Coli-ba: Compact linearization based solver for bundle adjustment. IEEE Transactions on Visualization and Computer Graphics, 28(11):3727-3736, 2022. 2

From： https://www.cnblogs.com/odesey/p/18345592
