标签：right mathbf system GNSS GVIS VINS left

GVINS: Tightly Coupled GNSS-Visual-Inertial Fusion for Smooth and Consistent State Estimation

GVINS：紧密耦合的GNSS-视觉-惯性融合系统，用于平滑和一致的状态估计

Shaozu Cao, Xiuyuan Lu, and Shaojie Shen

曹少祖，吕修媛，沈邵杰

Abstract-Visual-Inertial odometry (VIO) is known to suffer from drifting especially over long-term runs. In this paper, we present GVINS, a non-linear optimization based system that tightly fuses GNSS raw measurements with visual and inertial information for real-time and drift-free state estimation. Our system aims to provide accurate global 6-DoF estimation under complex indoor-outdoor environment where GNSS signals may be intermittent or even totally unavailable. To connect global measurements with local states, a coarse-to-fine initialization procedure is proposed to efficiently calibrate the transformation online and initialize GNSS states from only a short window of measurements. The GNSS code pseudorange and Doppler shift measurements, along with visual and inertial information, are then modelled and used to constrain the system states in a factor graph framework. For complex and GNSS-unfriendly areas, the degenerate cases are discussed and carefully handled to ensure robustness. Thanks to the tightly-coupled multi-sensor approach and system design, our system fully exploits the merits of three types of sensors and is capable to seamlessly cope with the transition between indoor and outdoor environments, where satellites are lost and reacquired. We extensively evaluate the proposed system by both simulation and real-world experiments, and the result demonstrates that our system substantially eliminates the drift of VIO and preserves the local accuracy in spite of noisy GNSS measurements. The challenging indoor-outdoor and urban driving experiments verify the availability and robustness of GVINS in complex environments. In addition, experiments also show that our system can gain from even a single satellite while conventional GNSS algorithms need four at least.

摘要-视觉-惯性里程计（VIO）在长期运行中尤其容易产生漂移。在本文中，我们提出了GVINS，这是一种基于非线性优化的系统，它紧密融合了GNSS原始测量值与视觉和惯性信息，用于实时无漂移的状态估计。我们的系统旨在在复杂的室内外环境中提供准确的全球6自由度估计，在这些环境中，GNSS信号可能是间歇性的，甚至完全不可用。为了将全球测量值与局部状态连接起来，我们提出了一种粗到细的初始化过程，以在线有效地校准变换并仅从短时间窗口的测量值中初始化GNSS状态。然后，将GNSS码伪距和多普勒频移测量值与视觉和惯性信息一起建模，并在因子图框架中用于约束系统状态。对于复杂且对GNSS不友好的区域，讨论了退化情况，并谨慎处理以确保鲁棒性。得益于紧密耦合的多传感器方法和系统设计，我们的系统完全利用了三种类型传感器的优点，能够无缝应对室内外环境之间的过渡，其中卫星丢失并重新获取。我们通过仿真和实际世界实验对所提出的系统进行了广泛评估，结果表明，我们的系统显著消除了VIO的漂移，并尽管GNSS测量值噪声较大，仍保持了局部精度。具有挑战性的室内外和城市驾驶实验验证了GVINS在复杂环境中的可用性和鲁棒性。此外，实验还表明，我们的系统即使只有一个卫星也能受益，而传统的 GNSS 算法至少需要四个。

Index Terms-state estimation, sensor fusion, SLAM, localization

索引术语-状态估计，传感器融合，SLAM，定位

I. INTRODUCTION

I. 引言

L OCALIZATION is an essential functionality for many spatial-aware applications, such as autonomous driving, Unmanned Aerial Vehicle (UAV) navigation and Augmented Reality (AR). Estimating system states with various sensors has been widely studied for decades. Among these, the sensor fusion approach has been more and more popular in recent years. Due to the complementary properties provided by heterogeneous sensors, sensor fusion algorithms can significantly improve the accuracy and robustness of the state estimation system.

定位是许多空间感知应用的基本功能，例如自动驾驶、无人机（UAV）导航和增强现实（AR）。数十年来，利用各种传感器估计系统状态已被广泛研究。在这些方法中，传感器融合方法近年来越来越受欢迎。由于异构传感器提供的互补特性，传感器融合算法可以显著提高状态估计系统的准确性和鲁棒性。

The camera provides rich visual information with only a low cost and small footprint, thus attracting much attention from both computer vision and robotics area. Combined with a MEMS IMU, which offers high frequency and outlier-free inertial measurement, Visual-Inertial Navigation (VIN) algorithms can often achieve high accuracy and be more robust in complex environments. Nevertheless, both camera and IMU operate in the local frame and it has been proven that the VIN system has four unobservable directions [1],namely $x,y,z$ and yaw. Thus the odometry drift is inevitable for any VIN system. On the other hand, Global Navigation Satellite System (GNSS) provides a drift-free and global-aware solution for localization tasks, and has been extensively used in various scenarios. GNSS signal is freely available and conveys the range information between the receiver and satellites. With at least 4 satellites being tracked simultaneously, the receiver is able to obtain its unique coordinate in the global Earth frame. Considering the complementary characteristics between the VIN and GNSS system, it seems natural that improvements can be made by fusing information from both systems together.

摄像头仅以低成本和小体积提供丰富的视觉信息，因此在计算机视觉和机器人领域受到了广泛关注。结合MEMS惯性测量单元（IMU），后者提供高频率和无异常的惯性测量，视觉-惯性导航（VIN）算法通常能够在复杂环境中实现高准确性和更强的鲁棒性。然而，摄像头和IMU都在局部坐标系中运行，已经证明VIN系统有四个不可观测方向[1]，即$x,y,z$和偏航(yaw)。因此，对于任何VIN系统来说，里程计漂移是不可避免的。另一方面，全球导航卫星系统（GNSS）为定位任务提供了无漂移和全局感知的解决方案，并在各种场景中得到了广泛应用。GNSS信号免费提供，并传达接收器与卫星之间的距离信息。当同时跟踪至少4颗卫星时，接收器能够获得其在全球地球坐标系中的唯一坐标。考虑到VIN和GNSS系统之间的互补特性，通过融合两个系统的信息似乎很自然能够实现改进。

However, many challenges exist during the fusion of two systems. Firstly, a stable initialization from noisy GNSS measurement is indispensable. Among quantities need to be initialized, the 4-DoF transformation between the local VIN frame and the global GNSS frame is essential. The transformation is necessary to associate measurements from local and global systems together. Unlike the extrinsic transformation between camera and IMU, this transformation cannot be offline calibrated because each time the VIN system starts such transformation will vary. In addition, one-shot alignment using a portion of sequence does not work well as the drift of the fusion system makes such alignment invalid during GNSS outage situations. Thus, an online initialization and calibration between local frame and global frame is necessary to fuse heterogeneous measurements and cope with complex indoor-outdoor environments. Secondly, the precision of the GNSS measurement does not match with that of the VIN system, and various error sources exist during the GNSS signal propagation. In practice, the code pseudorange measurement, which is used for global localization, can only achieve meter-level precision while the VIN system is capable to provide centimeter-level estimation over a short range. As a result, the fusion system will be susceptible to the noisy GNSS measurement if not formulated carefully. Thirdly, degeneration happens when the fusion system experiences certain movements such as pure rotation or the number of locked satellites is insufficient. Normally the GNSS-visual-inertial fusion system can offer a drift-free 6-DoF global estimation, but the conclusion no longer holds in degenerate cases. In addition, the transition between indoor and outdoor environments, during which all satellites are lost and gradually reacquired again, also poses challenges to the system design.

然而，在两个系统的融合过程中存在许多挑战。首先，从噪声GNSS测量中获取稳定的初始化是必不可少的。在需要初始化的量中，本地VIN坐标系与全球GNSS坐标系之间的4自由度变换至关重要。这种变换对于将本地和全球系统的测量数据关联起来是必要的。与相机和IMU之间的外部变换不同，这种变换不能离线校准，因为每次VIN系统启动时这种变换都会发生变化。此外，使用序列的一部分进行的一次性对准在融合系统的漂移使得GNSS中断情况下这种对准无效。因此，本地坐标系和全球坐标系之间的在线初始化和校准对于融合异构测量数据以及应对复杂的室内外环境是必要的。其次，GNSS测量的精度与VIN系统的精度不匹配，GNSS信号传播过程中存在各种误差源。实际上，用于全球定位的码伪距测量只能达到米级精度，而VIN系统能够在短距离内提供厘米级的估计。因此，如果融合系统没有仔细构建，将会对噪声GNSS测量敏感。第三，当融合系统经历某些运动（如纯旋转）或锁定卫星数量不足时，会发生退化。通常，GNSS-视觉-惯性融合系统能够提供无漂移的6自由度全局估计，但在退化情况下这一结论不再成立。此外，在室内外环境过渡期间，所有卫星都会丢失然后逐渐重新获取，这也对系统设计提出了挑战。

To address the above-mentioned issues, we propose a non-linear optimization-based system to tightly fuse GNSS raw measurements (code pseudorange and Doppler frequency shift) with visual and inertial data for accurate and drift-free state estimation. The 4-DoF transformation between local and global frames is recovered via a coarse-to-fine approach during initialization phase and is further optimized subsequently. To incorporate noisy GNSS raw measurements, all GNSS constraints are formulated under a probabilistic factor graph in which all states are jointly optimized. In addition, degenerate cases are discussed and carefully handled to ensure robustness. Thanks to the tightly-coupled approach and system design, our system fully exploits the complementary properties among GNSS, visual and inertial measurements and is able to provide locally smooth and globally consistent estimation even in complex environments, as shown in Fig. 1. We highlight the contributions of this paper as follows:

为了解决上述问题，我们提出了一种基于非线性优化的系统，将GNSS原始测量数据（码伪距和多普勒频移）与视觉和惯性数据进行紧密融合，以实现准确且无漂移的状态估计。在初始化阶段，通过粗到细的方法恢复了本地框架与全局框架之间的4自由度变换，并在之后进一步优化。为了纳入有噪声的GNSS原始测量数据，所有GNSS约束都在一个概率因子图中进行公式化，其中所有状态都是联合优化的。此外，还讨论了退化情况，并进行了仔细处理以确保鲁棒性。得益于紧密耦合的方法和系统设计，我们的系统能够充分利用GNSS、视觉和惯性测量之间的互补特性，即使在复杂环境中也能提供局部平滑且全局一致的估计，如图1所示。本文的贡献如下：

All authors are with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China. {scaoad, xluaj}@connect.ust.hk, eeshao jie@ust.hk. (Corresponding author: Shaozu Cao.)

所有作者均隶属于香港科技大学电子与计算机工程系，香港，中国。{scaoad, xluaj}@connect.ust.hk, eeshao jie@ust.hk。（通讯作者：曹少祖。）

Fig. 1. A snapshot of our system in a complex indoor-outdoor environment. The global estimation result is plotted on Google Maps directly and aligns well with the ground truth RTK trajectory as shown in part (a). Part (b) depicts the distribution of satellites with tangential direction representing the azimuth and radial direction being the elevation angle. The blue arrow is a compass-like application which indicts the global yaw orientation of the camera. Subplot (c) and (d) illustrate the altitude information and the local-ENU yaw offset respectively. The measurement noise level of each tracked satellite is shown in part (e). Note that there is an obvious failure on the RTK trajectory when we walk the indoor stairs, while our system can still perform global estimation even in indoor environment.

图 1. 我们系统在复杂室内外环境中的一个快照。全局估计结果直接绘制在谷歌地图上，并与部分（a）所示的真实RTK轨迹对齐良好。部分（b）描述了卫星的分布，切向方向代表方位角，径向方向代表高度角。蓝色箭头是一个类似指南针的应用，指示相机的全局偏航方向。子图（c）和（d）分别说明了高度信息和本地-ENU偏航偏移。每个跟踪卫星的测量噪声水平在部分（e）中显示。注意，当我们在室内楼梯上走动时，RTK轨迹有一个明显的失败，而我们的系统即使在室内环境中也能进行全局估计。

an online coarse-to-fine approach to initialize GNSS-visual-inertial states.
一种在线粗到细的方法来初始化GNSS-视觉-惯性状态。
an optimization-based, tightly-coupled approach to fuse visual-inertial data with multi-constellation GNSS raw measurements under the probabilistic framework.
一种基于优化的、紧密耦合的方法，在概率框架下融合视觉-惯性数据与多星系统GNSS原始测量值。
a real-time estimator which is capable to provide drift-free 6-DoF global estimation in complex environment where GNSS signals may be largely intercepted or even totally unavailable.
一种实时估计器，能够在GNSS信号可能被大量截获或甚至完全不可用的复杂环境中提供无漂移的6自由度全局估计。
an evaluation of the proposed system in both simulation and real-world environments.
对所提出系统在模拟和真实世界环境中的评估。

For the benefit of the research community, the proposed ${\text{system}}^{1}$ ,along with the well-synchronized datasets ${}^{2}$ ,have been open-sourced.

为了研究社区的利益，所提出的 ${\text{system}}^{1}$ 以及良好同步的数据集 ${}^{2}$ 已经开源。

The rest of this work is structured as follows: in Section II we discuss the existing relevant literature. Section III describes the notation and coordinate system involved in the system. In Section IV we briefly introduce relevant background knowledge of GNSS. Section V shows the structure and workflow of the proposed system. The problem formulation and methodology are illustrated in Section VI. In Section VII we address the GNSS initialization issues and discuss several degenerate cases that degrade the performance of our system. The experiment setup and evaluation are given in Section VIII. Finally Section IX concludes this paper.

本工作的其余部分结构如下：在第二节中，我们讨论现有的相关文献。第三节描述了系统中涉及的符号和坐标系。第四节简要介绍GNSS的相关背景知识。第五节展示了所提出系统的结构和工作流程。第六节阐述了问题公式化和方法。第七节我们讨论GNSS初始化问题，并讨论了几种使系统性能退化的退化解。第八节给出了实验设置和评估。最后，第九节总结本文。

II. 相关工作

State estimation via multiple sensors fusion approach has been proven to be effective and robust, and there is extensive literature on this area. Among those, we are particularly interested in the combination of small size and low cost sensors such as camera, IMU and GNSS receiver, to produce a real-time accurate estimation in the unknown environment.

通过多传感器融合方法进行状态估计已被证明是有效且健壮的，并且这一领域有大量的文献。在这些文献中，我们特别关注小型低成本传感器（如相机、IMU和GNSS接收器）的组合，以在未知环境中实现实时准确估计。

The fusion of visual and inertial measurement in a tightly-coupled manner can be classified into either filter-based method or optimization-based method. MSCKF [2] is an excellent filter-based state estimator which utilizes the geometric constraints between multiple camera poses to efficiently optimize the system states. Based on MSCKF, [3] makes improvements on its accuracy and consistency, and [4] aims to overcome its numerical stability issue especially on mobile devices. Compared with the filter based approach, nonlinear batch optimization method can achieve better performance by re-linearization at the expense of computational cost. OKVIS [5] utilizes keyframe-based sliding window optimization approach for state estimation. VINS-Mono [6] also optimizes system states within the sliding window but is more complete with online relocalization and pose graph optimization. Since camera and IMU only impose local relative constraint among states, accumulated drift is a critical issue in the VIN system, especially over long-term operation.

视觉和惯性测量的紧耦合融合可以归类为基于滤波器的方法或基于优化的方法。MSCKF [2] 是一种优秀的基于滤波器的状态估计器，它利用多个相机姿态之间的几何约束来有效地优化系统状态。基于MSCKF，[3] 提高了其准确性和一致性，[4] 则旨在克服其在移动设备上的数值稳定性问题。与基于滤波器的方法相比，非线性批量优化方法通过在计算成本上重新线性化可以实现更好的性能。OKVIS [5] 利用基于关键帧的滑动窗口优化方法进行状态估计。VINS-Mono [6] 也在滑动窗口内优化系统状态，但更完整，具有在线重定位和姿态图优化。由于相机和IMU仅对状态之间施加局部相对约束，累积漂移是VIN系统中一个关键问题，特别是在长期运行过程中。

${}^{1}$ https://github.com/HKUST-Aerial-Robotics/GVINS

${}^{2}$ https://github.com/HKUST-Aerial-Robotics/GVINS-Dataset

As GNSS provides absolute measurement in the global Earth frame, incorporating GNSS information is a natural way to reduce accumulated drift. In terms of loosely-coupled manner, [7] [8] describe state estimation systems which fuse GNSS solution with visual and inertial data under the EKF framework. [9] proposes a UKF algorithm that fuses visual, inertial, LiDAR and GNSS solution to produce a smooth and consistent trajectory in different environments. [10], [11] and our previous work VINS-Fusion [12] fuse the result from local VIO with GNSS solution under the optimization framework. In [13] the authors combine the result from Precise Point Positioning (PPP) with a stereo VIO to achieve low drift estimation. Both the GNSS code and phase measurements are used in their formulation and precise satellite products are utilized to improve the accuracy. All aforementioned works rely on the GNSS solution to perform estimation so system failure will occur once the GNSS solution is highly corrupted or unavailable in the situation where the number of tracked satellites is below than 4.

由于GNSS在全球地球坐标系中提供绝对测量，因此融合GNSS信息是减少累积漂移的自然方法。在松耦合方式方面，文献[7] [8] 描述了在EKF框架下融合GNSS解算结果与视觉及惯性数据的态估计系统。文献[9] 提出了一个UKF算法，该算法融合了视觉、惯性、LiDAR和GNSS解算结果，以在不同环境中生成平滑且一致的轨迹。文献[10]、[11]以及我们之前的工作VINS-Fusion [12] 在优化框架下融合了本地VIO的结果与GNSS解算结果。在文献[13]中，作者将精确点定位（PPP）的结果与立体VIO相结合，以实现低漂移估计。他们的公式中同时使用了GNSS码和相位测量，并利用精确的卫星产品来提高精度。所有上述工作都依赖于GNSS解算结果来进行估计，因此一旦GNSS解算结果严重受损或不可用，且跟踪到的卫星数量少于4颗的情况下，系统将会失效。

In the line of tightly-coupled GNSS-visual approaches, [14] tightly fuses GNSS code pseudorange data and visual measurements from a sky-pointing camera in the EKF manner. The image from the upward-facing camera is segmented as the sky and non-sky areas. The non-sky areas are used for feature detection and matching. In addition, only GNSS signals coming from sky directions are used to avoid potential multipath effect. However, the upward-facing camera means that their system cannot work in an open-sky scenario and is only suitable for urban environments. In addition, the transformation between the local vehicle frame and the global frame is assumed known in their work. In [15] the authors proposed a system that tightly combine the stereo visual odometry with the GNSS code pseudorange and Doppler shift measurements using the EKF framework. Three driving tests with moderate distance were conducted to evaluate their system. However, only horizontal errors are reported in their first data sequence and the majority of their experiments are just qualitatively analysed.

在紧密耦合的GNSS-视觉方法中，文献[14]通过EKF方式紧密融合了GNSS码伪距数据和天空指向相机的视觉测量。天空指向相机的图像被分割为天空和非天空区域。非天空区域用于特征检测和匹配。此外，仅使用来自天空方向的GNSS信号，以避免潜在的multipath效应。然而，天空指向相机的使用意味着他们的系统无法在开阔天空场景中工作，仅适用于城市环境。此外，在他们的工作中，假设本地车辆框架与全局框架之间的转换是已知的。在文献[15]中，作者提出了一种系统，该系统通过EKF框架紧密地将立体视觉里程计与GNSS码伪距和多普勒频移测量相结合。进行了三次中等距离的驾驶测试以评估他们的系统。然而，在他们的第一个数据序列中仅报告了水平误差，而且他们的大多数实验只是定性地分析。

There are also some works on tightly fusing GNSS raw measurement with visual and inertial information. [16], [17] and [18] combine camera, IMU and GNSS RTK measurements under the EKF framework for localization. The RTK solution, which usually owns centimeter-level accuracy, requires a static GNSS reference station with known position as infrastructure. [19] and [20] investigate the performance of the fusion system in cluttered urban environment where less than 4 satellites are tracked. However, the transformation between local and global frames is not handled and the scale of their real-world experiments is limited. In addition, the result of the underlying VIN system in [20], as is tested in standalone mode, shows large drift over a short period of time. Recently we found a similar work [21] that tightly fuse GNSS raw measurements with visual-inertial SLAM. An RMSE error of ${14.33}\mathrm{\;m}$ is reported on the longest sequence $\left( {{5.9}\mathrm{\;{km}}}\right)$ in their evaluation,whereas the value is only ${4.51}\mathrm{\;m}$ for our system, even on a more challenging urban driving sequence with a total distance of ${22.9}\mathrm{\;{km}}$ . In GNSS-unfriendly areas where the number of GNSS measurements becomes insufficient, [21] drops all GNSS measurements which may still benefit the estimator as shown in our experiments. In addition, the indoor environments within the sequence such as tunnels cannot be handled by their system, which again limits the potential of the tightly multi-sensor fusion approach.

也有一些研究致力于将GNSS原始测量值与视觉和惯性信息紧密融合。[16]、[17]和[18]在EKF框架下结合了相机、IMU和GNSS RTK测量进行定位。RTK解决方案通常具有厘米级的精度，但需要一个静止的、位置已知的GNSS参考站作为基础设施。[19]和[20]研究了在少于4颗卫星跟踪的杂乱城市环境中融合系统的性能。然而，他们没有处理局部坐标系和全局坐标系之间的转换，并且他们在现实世界实验的规模有限。此外，文献[20]中底层VIN系统在独立模式下的测试结果显示，在短时间内存在较大的漂移。最近我们发现了一项类似的工作[21]，该工作将GNSS原始测量值与视觉-惯性SLAM紧密融合。在他们的评估中，最长序列${14.33}\mathrm{\;m}$的RMSE误差报道为$\left( {{5.9}\mathrm{\;{km}}}\right)$，而我们的系统即使在更具挑战性的城市驾驶序列中，总距离为${22.9}\mathrm{\;{km}}$时，误差值仅为${4.51}\mathrm{\;m}$。在GNSS不友好的区域，GNSS测量值数量不足，[21]会丢弃所有GNSS测量值，这可能会像我们的实验所示，对估计器仍有好处。此外，序列中的室内环境，如隧道，他们的系统无法处理，这再次限制了紧密多传感器融合方法的潜力。

Fig. 2. An illustration of the local world, ECEF and ENU frames.

图2. 局部世界、地球中心地球固定坐标系（ECEF）和东北天（ENU）坐标系的示意图。

To this end, we aim to build a robust and accurate state estimator with GNSS raw measurements, visual and inertial data tightly fused. By leveraging the global measurement from GNSS, the accumulated error from visual-inertial system will be eliminated. The transformation between the local and global frame will be estimated without any offline calibration. The system is capable to work in complex indoor and outdoor environments and achieves local smoothness and global consistency.

为此，我们旨在利用GNSS原始测量值、视觉和惯性数据紧密融合，构建一个健壮且准确的状态估计器。通过利用GNSS提供的全局测量值，视觉-惯性系统的累积误差将被消除。本地框架与全局框架之间的转换将在不需要任何离线校准的情况下进行估计。该系统能够在复杂的室内外环境中工作，并实现局部平滑性和全局一致性。

III. Notation and Definitions

III. 符号与定义

A. Frames

A. 坐标系

The spatial frames involved in our system consist of:

我们系统中涉及的空间坐标系包括：

Sensor Frame: The sensor frame is attached to the sensor and is a local frame in which sensor reports its reading. In our system,sensor frames include the camera frame ${\left( \cdot \right) }^{c}$ and the IMU frame ${\left( \cdot \right) }^{i}$ ,and we choose IMU frame as our estimation target frame and denote it as body frame ${\left( \cdot \right) }^{b}$ .
传感器坐标系：传感器坐标系附着在传感器上，是传感器报告读数的局部坐标系。在我们的系统中，传感器坐标系包括相机坐标系 ${\left( \cdot \right) }^{c}$ 和IMU坐标系 ${\left( \cdot \right) }^{i}$ ，我们选择IMU坐标系作为我们的估计目标坐标系，并将其表示为机体坐标系 ${\left( \cdot \right) }^{b}$ 。
Local World Frame: We represent the conventional frame in which visual-inertial system operates as the local world frame ${\left( \cdot \right) }^{w}$ . In VIN system,the origin of the local world frame is arbitrarily set and the $\mathrm{z}$ axis is often chosen to be gravity-aligned as illustrated in Fig. 2.
本地世界坐标系：我们将视觉-惯性系统操作的常规坐标系表示为本地世界坐标系 ${\left( \cdot \right) }^{w}$ 。在VIN系统中，本地世界坐标系的原点可以任意设置，且如图2所示，$\mathrm{z}$ 轴通常选择与重力对齐。
ECEF Frame: The Earth-Centered, Earth-Fixed (ECEF) frame ${\left( \cdot \right) }^{e}$ is a Cartesian coordinate system that is fixed with respect to Earth. As shown in Fig. 2, the origin of ECEF frame is attached to the center of mass of Earth. The x-y plane coincides with the Earth's equatorial plane with x-axis pointing to the prime meridian. The z-axis is chosen to be perpendicular to the Earth's equatorial plane in the direction of the geographical North Pole. Finally the y-axis is taken to make ECEF frame a right-handed coordinate system. In this paper we use the WGS84 realization of ECEF frame.
地心地球固定坐标系（ECEF）：地心地球固定（ECEF）坐标系 ${\left( \cdot \right) }^{e}$ 是一个相对于地球固定的笛卡尔坐标系。如图2所示，ECEF坐标系的原点附着在地球质心上。x-y平面与地球赤道平面重合，x轴指向本初子午线。z轴选择垂直于地球赤道平面，指向地理北极的方向。最后，y轴取值使得ECEF坐标系成为一个右手坐标系。在本文中，我们使用WGS84实现的ECEF坐标系。
ENU Frame: In order to connect the local world and global ECEF frames, a semi-global frame, ENU, is introduced. The $\mathrm{x},\mathrm{y},\mathrm{z}$ axis of the ENU frame ${\left( \cdot \right) }^{n}$ point to the east, north, and up direction respectively (Fig. 2). Given a point in ECEF frame, a unique ENU frame can be determined with its origin sitting on that point. Note that the $\mathrm{z}$ axis of ENU frame is also gravity aligned.
ENU坐标系：为了连接本地世界和全球地心地心坐标系（ECEF），引入了一个半全局坐标系，即ENU。ENU坐标系的$\mathrm{x},\mathrm{y},\mathrm{z}$轴${\left( \cdot \right) }^{n}$分别指向东、北和上方向（图2）。给定一个在ECEF坐标系中的点，可以确定一个唯一的ENU坐标系，其原点位于该点上。注意ENU坐标系的$\mathrm{z}$轴也是与重力对齐的。
ECI Frame: The Earth-Centered Inertial (ECI) frame is an inertial coordinate system with the center of mass of the Earth as its origin. The three axes of the ECI frame ${\left( \cdot \right) }^{E}$ are taken to point in fixed directions with respect to the stars, i.e., do not rotate with the Earth. The GNSS signal travels in straight line in the ECI frame, which can greatly simplify the formulation. In this paper the ECI frame is formed by freezing the ECEF frame at the time of reception of the GNSS signal.
ECI坐标系：地球中心惯性（ECI）坐标系是一个惯性坐标系，以地球质心为原点。ECI坐标系的三个轴${\left( \cdot \right) }^{E}$指向相对于恒星固定的方向，即不随地球旋转。GNSS信号在ECI坐标系中沿直线传播，这可以大大简化公式。在本文中，ECI坐标系是通过在接收到GNSS信号的时刻冻结ECEF坐标系形成的。

In terms of temporal frames, GNSS data is tagged in GNSS time system (for example, GPS time), while visual and inertial measurements are marked in the local time system. We assume that these two time systems are aligned beforehand and do not distinguish them accordingly.

在时间坐标系方面，GNSS数据以GNSS时间系统（例如，GPS时间）标记，而视觉和惯性测量以本地时间系统标记。我们假设这两个时间系统事先已经对齐，因此不加以区分。

B. Notation

B. 符号

In this paper we use ${\mathbf{R}}_{a}^{z}$ and ${\mathbf{p}}_{a}^{z}$ to denote the rotational and translational part of the transformation from frame $a$ to frame $z$ . For rotational part,the corresponding Hamilton quaternion ${\mathbf{q}}_{a}^{z}$ is also used,with $\otimes$ representing its multiplication operation. We use subscript to refer a moving frame at a specific time instance. For example, ${\mathbf{R}}_{{a}_{t}}^{z}$ stands for the rotation from the moving frame $a$ at time $t$ to the fixed frame $z$ .

在本文中，我们使用${\mathbf{R}}_{a}^{z}$和${\mathbf{p}}_{a}^{z}$表示从坐标系$a$到坐标系$z$的变换的旋转部分和平移部分。对于旋转部分，还使用相应的Hamilton四元数${\mathbf{q}}_{a}^{z}$，其中$\otimes$表示其乘法运算。我们使用下标来指代特定时间实例的移动坐标系。例如，${\mathbf{R}}_{{a}_{t}}^{z}$表示从时间$t$的移动坐标系$a$到固定坐标系$z$的旋转。

For constant quantities,we use ${\mathbf{g}}^{w}$ to represent the gravity vector in the local world frame. $c$ is the speed of light in vacuum and ${\omega }_{E}$ stands for the angular velocity of the Earth.

对于常量，我们使用 ${\mathbf{g}}^{w}$ 来表示局部世界坐标系中的重力向量。 $c$ 是真空中的光速，${\omega }_{E}$ 代表地球的角速度。

C. States

C. 状态

The system states to be estimated include:

需要估计的系统状态包括：

the position ${\mathbf{p}}_{b}^{w}$ and orientation ${\mathbf{q}}_{b}^{w}$ of the body frame with respect to the local world frame,
躯体框架相对于局部世界坐标系的位置 ${\mathbf{p}}_{b}^{w}$ 和方向 ${\mathbf{q}}_{b}^{w}$，
the velocity ${\mathbf{v}}_{b}^{w}$ ,accelerometer bias ${\mathbf{b}}_{a}$ and gyroscope bias ${\mathbf{b}}_{w}$ ,
速度 ${\mathbf{v}}_{b}^{w}$、加速度计偏差 ${\mathbf{b}}_{a}$ 和陀螺仪偏差 ${\mathbf{b}}_{w}$，
the inverse depth $\rho$ for each feature,
每个特征的逆深度 $\rho$，
the yaw offset $\psi$ between the local world frame and ENU frame,receiver clock bias $\delta \mathbf{t}$ and receiver clock drifting rate $\delta \dot{t}$ . Because our system support all four constellations, the clock biases for GPS, GLONASS,
局部世界坐标系与ENU坐标系之间的偏航偏移 $\psi$、接收器时钟偏差 $\delta \mathbf{t}$ 和接收器时钟漂移率 $\delta \dot{t}$。因为我们的系统支持所有四个星座，GPS、GLONASS、

Galileo and BeiDou are estimated separately. Note that the receiver clock drifting rate for each constellation is the same.

Galileo 和 BeiDou 的时钟偏差是分别估计的。注意，每个星座的接收器时钟漂移率是相同的。

Our system adopt a sliding window optimization manner and states $\mathcal{X}$ inside the window can be summarized as

我们的系统采用滑动窗口优化方式，窗口内的状态 $\mathcal{X}$ 可以概括为

\[\mathcal{X} = \left\lbrack {{\mathbf{x}}_{0},{\mathbf{x}}_{1},\cdots {\mathbf{x}}_{n},{\rho }_{0},{\rho }_{1},\cdots {\rho }_{m},\psi }\right\rbrack \tag{1a} \]

\[{\mathbf{x}}_{k} = \left\lbrack {{\mathbf{p}}_{{b}_{{t}_{k}}}^{w},{\mathbf{v}}_{{b}_{{t}_{k}}}^{w},{\mathbf{q}}_{{b}_{{t}_{k}}}^{w},{\mathbf{b}}_{a},{\mathbf{b}}_{w},\delta \mathbf{t},\dot{\delta t}}\right\rbrack ,k \in \left\lbrack {0,n}\right\rbrack \tag{1b} \]

\[\delta \mathbf{t} = \left\lbrack {\delta {t}_{G},\delta {t}_{R},\delta {t}_{E},\delta {t}_{C}}\right\rbrack \tag{1c} \]

where $n$ is the window size and $m$ is the number of feature points in the window. The four components in $\delta \mathbf{t}$ correspond to receiver's clock biases with respect to the time of GPS, GLONASS, Galileo and BeiDou respectively.

其中 $n$ 是窗口大小，$m$ 是窗口中的特征点数量。$\delta \mathbf{t}$ 中的四个分量分别对应于接收器相对于GPS、GLONASS、Galileo 和 BeiDou 的时间偏差。

IV. GNSS FUNDAMENTALS

IV. GNSS 基础

Since our system requires GNSS raw measurement processing, background knowledge about GNSS is necessary. In this section, we first give an overview about GNSS. Then two types of raw measurements, namely code pseudorange and Doppler shift, are introduced and modelled. Finally the principle of Single Point Positioning (SPP) algorithm for global localization is described in the end of this section.

由于我们的系统需要GNSS原始测量数据处理，因此有必要了解GNSS的背景知识。在本节中，我们首先概述了GNSS。然后介绍了两种类型的原始测量，即码伪距和多普勒频移，并进行了建模。最后在本节的结尾描述了用于全局定位的单点定位（SPP）算法的原理。

A. GNSS Overview

A. GNSS 概述

Global Navigation Satellite System (GNSS), as its name suggests, is a satellite-based system which is capable to provide global localization service. Currently there are four independent and fully operational systems, namely GPS, GLONASS, Galileo and BeiDou. The navigation satellite continuously transmits radio signal from which the receiver can uniquely identify the satellite and retrieve the navigation message. Taking the GPS L1C signal as an example, the final transmitted signal is composed of three layers, as illustrated in Fig. 3. The navigation message contains parameters of the orbit, corrections of the clock error, coefficients of ionospheric delay and other information related to satellite's status. The orbit parameters, also know as ephemeris, contains 14 variables and is used to calculate the satellite's ECEF coordinate at a particular time. The satellite's clock error is modelled as a second-order polynomial, i.e., with 3 parameters. Each satellite is assigned a unique Pseudo Random Noise (PRN) code that repeats every 1 millisecond. The 50 bit/s navigation message is first exclusive-ored with the PRN code and then used to modulate the high frequency carrier signal. After receiving the signal, the receiver obtains the Doppler shift (Section. IV-C) by measuring the frequency difference between the received one and designed one. The code pseudorange measurement (Section. IV-B) is inferred from the PRN code shift which indicts the propagation time. Finally the navigation message is uncovered by a reverse demodulation process.

全球导航卫星系统（GNSS），顾名思义，是一种基于卫星的系统，能够提供全球定位服务。目前有四个独立且完全运行的系统，分别是GPS、GLONASS、Galileo和北斗。导航卫星持续地从 which the receiver can uniquely identify the satellite and retrieve the navigation message. 以GPS L1C信号为例，最终传输的信号由三层组成，如图3所示。导航信息包含轨道参数、时钟误差修正、电离层延迟系数以及与卫星状态相关的其他信息。轨道参数，也称为星历，包含14个变量，用于计算特定时间卫星的地球中心坐标系（ECEF）坐标。卫星的时钟误差被建模为二阶多项式，即有3个参数。每颗卫星都被分配了一个唯一的伪随机噪声（PRN）码，每1毫秒重复一次。首先将50比特/秒的导航信息与PRN码进行异或运算，然后用于调制高频载波信号。接收信号后，接收机通过测量接收信号与设计信号的频率差来获取多普勒频移（第IV-C节）。代码伪距测量（第IV-B节）是从PRN码的偏移中推断出来的，这表明了传播时间。最后，导航信息通过反向解调过程揭示出来。

Fig. 3. The hierarchical structure of the GPS L1C signal. The navigation message first mixes with the satellite-specific PRN code, and then the resulting sequence is used to modulate the high frequency carrier signal. The final signal is transmitted by the satellite and captured by the receiver, which applies a reverse process to obtain the measurement and retrieve the message.

图3. GPS L1C信号的层次结构。导航信息首先与卫星特定的PRN码混合，然后使用得到的序列调制高频载波信号。最终信号由卫星传输并被接收机捕获，接收机应用反向过程来获取测量值并检索信息。

B. Code Pseudorange Measurement

B. 代码伪距测量

Upon the reception of the signal, the Time of Flight (ToF) of the signal is measured from the PRN code shift. By multiplying with the speed of light, the receiver obtains code pseudorange measurement. The code pseudorange is prefixed with "pseudo" because it not only contains the geometric distance between the satellite and the receiver, but also includes various errors during the signal generation, propagation and processing.

接收到信号后，从PRN码移位测量信号的时间飞行（ToF）。通过乘以光速，接收器获得码伪距测量值。码伪距前面加上“伪”，因为它不仅包含卫星与接收器之间的几何距离，还包括信号生成、传播和处理过程中的各种误差。

The error source on the satellite side mainly consists of satellite orbit and clock error. The orbit error comes from the influence of other celestial objects which are not precisely modelled by the ephemeris, and the clock error is the result of imperfect satellite onboard atomic clock with respect to the standard system time. The orbit and clock errors are monitored and constantly corrected by the system control segment. During the signal propagation from satellite to receiver, it goes through the ionosphere and troposphere, where the speed of the electromagnetic signal is no longer as same as that in vacuum and the signal gets delayed according to the atmosphere components and propagation path. The phenomenon that signal reaches the receiver with different ways, known as multipath effect, may occur and add extra delay especially for low elevation satellites. When the signal arrivals, the ToF is calculated by comparing the signal transmission time, which is marked by the satellite's atomic clock, with the receiver's less accurate local clock time. Thus the range information is also offset by the receiver clock error with respect to the GNSS system time. In conclusion, the code pseudorange measurement can be modelled as

卫星端的误差源主要是由卫星轨道和时钟误差组成。轨道误差来源于其他天体的影响，这些影响没有被星历精确建模，而时钟误差是相对于标准系统时间，卫星上原子钟不完美造成的。轨道和时钟误差由系统控制段监控并不断校正。在信号从卫星传播到接收器的过程中，它穿过电离层和对流层，在这两个层级中，电磁信号的速度不再与真空中的速度相同，信号会根据大气成分和传播路径延迟。信号以不同方式到达接收器，这种现象称为多径效应，可能会增加额外的延迟，尤其对于低仰角的卫星。当信号到达时，通过比较卫星原子钟标记的信号传输时间和接收器不太准确的本地时钟时间，计算ToF。因此，距离信息也因接收器时钟相对于GNSS系统时间的误差而偏移。总之，码伪距测量可以建模为

\[{\widetilde{P}}_{r}^{s} = \begin{Vmatrix}{{\mathbf{p}}_{s}^{E} - {\mathbf{p}}_{r}^{E}}\end{Vmatrix} + c\left( {{\mathbf{\zeta }}_{s}^{T}\delta \mathbf{t} - \Delta {t}^{s}}\right) \tag{2} \]

\[+ {T}_{r}^{s} + {I}_{r}^{s} + {M}_{r}^{s} + {\epsilon }_{r}^{s} \]

where ${\mathbf{p}}_{s}^{E}$ and ${\mathbf{p}}_{r}^{E}$ is the ECI coordinate of the satellite $s$ and receiver $r$ ,respectively. ${\mathbf{\zeta }}_{s}$ is designed to be a $4 \times 1$ indictor vector with the corresponding satellite constellation entity being 1 and other three entities being $0.\Delta {t}^{s}$ is the satellite clock error, which can be calculated from the broadcast navigation message. ${T}_{r}^{s}$ and ${I}_{r}^{s}$ stand for the tropospheric and the ionospheric delay respectively. We use ${M}_{r}^{s}$ to denote the delay caused by multipath effect and ${\epsilon }_{r}^{s}$ for the measurement noise. Here the delay terms ${T}_{r}^{s},{I}_{r}^{s}$ and ${M}_{r}^{s}$ are expressed in unit of length,i.e.,multiplied by $c$ .

其中 ${\mathbf{p}}_{s}^{E}$ 和 ${\mathbf{p}}_{r}^{E}$ 分别是卫星 $s$ 和接收器 $r$ 的 ECI 坐标。${\mathbf{\zeta }}_{s}$ 被设计为一个 $4 \times 1$ 指示向量，相应的卫星星座实体为 1，其他三个实体为 $0.\Delta {t}^{s}$ 是卫星钟误差，可以从广播导航消息中计算得出。${T}_{r}^{s}$ 和 ${I}_{r}^{s}$ 分别代表对流层和电离层的延迟。我们使用 ${M}_{r}^{s}$ 来表示多径效应引起的延迟，${\epsilon }_{r}^{s}$ 表示测量噪声。这里的延迟项 ${T}_{r}^{s},{I}_{r}^{s}$ 和 ${M}_{r}^{s}$ 以长度单位表示，即乘以 $c$。

C. Doppler Measurement

C. 多普勒测量

The Doppler frequency shift is measured from the difference between the received carrier signal and the designed one, and it reflects the receiver-satellite relative motion along the signal propagation path. Due to the characteristic of the GNSS signal structure, the accuracy of the Doppler measurement is usually an order of magnitude higher than that of code pseudorange. The Doppler shift is modelled as:

多普勒频率偏移是通过接收到的载波信号与设计信号的差异来测量的，它反映了接收器-卫星在信号传播路径上的相对运动。由于 GNSS 信号结构的特性，多普勒测量的精度通常比码伪距的精度高一个数量级。多普勒偏移建模为：

\[\Delta {\widetilde{f}}_{r}^{s} = - \frac{1}{\lambda }\left\lbrack {{\mathbf{\kappa }}_{r}^{sT}\left( {{\mathbf{v}}_{s}^{E} - {\mathbf{v}}_{r}^{E}}\right) + c\left( {\dot{\delta t} - \dot{\Delta {t}^{s}}}\right) }\right\rbrack + {\eta }_{r}^{s}, \tag{3} \]

where ${\mathbf{v}}_{r}^{E}$ and ${\mathbf{v}}_{s}^{E}$ represent the receiver’s and satellite’s velocity in ECI frame respectively. We use $\lambda$ to denote the wavelength of the carrier signal,and ${\mathbf{\kappa }}_{r}^{s}$ for the unit vector from receiver to satellite in ECI frame. $\Delta {t}^{s}$ is the drift rate of the satellite clock error which is reported in the navigation message,and finally ${\eta }_{r}^{s}$ represents the Doppler measurement noise.

其中 ${\mathbf{v}}_{r}^{E}$ 和 ${\mathbf{v}}_{s}^{E}$ 分别代表接收器和卫星在 ECI 帧中的速度。$\lambda$ 表示载波信号的波长，${\mathbf{\kappa }}_{r}^{s}$ 表示从接收器到卫星在 ECI 帧中的单位向量。$\Delta {t}^{s}$ 是卫星钟误差的漂移率，该值在导航消息中报告，最后 ${\eta }_{r}^{s}$ 表示多普勒测量噪声。

D. SPP Algorithm

D. 单点定位算法

The Single Point Positioning (SPP) algorithm utilizes code pseudorange measurements to determine the 3-DOF global position of the GNSS receiver via trilateration. Thus in theory the coordinate of the receiver can be obtained by the aid of 3 different satellites. However, as mentioned in Section IV-B, code pseudorange measurement is offset by the receiver clock bias. Because the receiver clock bias can cause an error of hundreds of kilometers, it must be estimated along with the location in order to get a reasonable result. To this end, at least 4 code pseudorange measurements are required to fully constrain the 3-DOF global position and receiver clock bias. Because different navigation systems use different time references, there exists clock offset between different systems. Additional measurements are necessary in order to estimate the inter-system clock offset if the satellites are from multiple constellations. To summarize,at least $\left( {N + 3}\right)$ satellites are required to be simultaneously tracked in order to obtain the uniquely localize the receiver,where $N$ is the number of constellations among the tracked satellites.

单点定位（SPP）算法利用码伪距测量来确定GNSS接收机的3自由度全球位置，通过三角测量法实现。因此在理论上，通过3颗不同的卫星的辅助就可以获得接收机的坐标。然而，如第IV-B节所述，码伪距测量受到接收机时钟偏差的影响。由于接收机时钟偏差可能会引起数百公里的误差，因此必须与位置一起估计，才能得到合理的结果。为此，至少需要4个码伪距测量值来完全约束3自由度全球位置和接收机时钟偏差。由于不同的导航系统使用不同的时间参考，不同系统之间存在时钟偏移。如果卫星来自多个星座，则需要额外的测量值来估计系统间时钟偏移。总之，至少需要同时跟踪 $\left( {N + 3}\right)$ 颗卫星才能唯一确定接收机的位置，其中 $N$ 是被跟踪卫星中的星座数量。

After collecting enough measurements, constraints from Eq. 2 are stacked together to form a series of equations with ${\mathbf{p}}_{r}^{E}$ and $\delta \mathbf{t}$ unknown. Corrections are applied to code pseudorange measurement making it only a function of ${\mathbf{p}}_{r}^{E}$ and $\delta \mathbf{t}$ . In our system the tropospheric delay ${T}_{r}^{s}$ is estimated by Saastamoinen model [22],and ionospheric delay ${I}_{r}^{s}$ is computed using Klobuchar model [23] and parameters in the ephemeris. By excluding the low elevation satellites, we ignore the delay ${M}_{r}^{s}$ caused by multipath effect. In practice,more than $\left( {N + 3}\right)$ measurements will be used and the solution is obtained by minimizing the sum of the squared residuals. As is shown in [24], the noise of the SPP solution not only depends on the measurement noise but also has a relationship with the geometric distribution of satellites. A simplified 2D case in Fig. 4 shows the effect of satellites distribution on the noise characteristic of the final solution. Thus the performance of SPP algorithm will be better with evenly distributed satellites, even with the measurement noise unchanged.

在收集到足够的测量数据后，来自等式2的约束被堆叠在一起，形成了一系列包含 ${\mathbf{p}}_{r}^{E}$ 和 $\delta \mathbf{t}$ 未知数的方程。对码伪距测量应用了校正，使其仅成为 ${\mathbf{p}}_{r}^{E}$ 和 $\delta \mathbf{t}$ 的函数。在我们的系统中，对流层延迟 ${T}_{r}^{s}$ 是通过Saastamoinen模型 [22] 估算的，电离层延迟 ${I}_{r}^{s}$ 是通过Klobuchar模型 [23] 以及星历中的参数计算得出的。通过排除低仰角卫星，我们忽略了由多径效应引起的延迟 ${M}_{r}^{s}$。实际上，会使用超过 $\left( {N + 3}\right)$ 的测量数据，并通过最小化残差平方和来获得解。如 [24] 中所示，SPP解的噪声不仅取决于测量噪声，还与卫星的几何分布有关。图4中的简化2D案例展示了卫星分布对最终解噪声特性的影响。因此，即使测量噪声不变，SPP算法的性能也会随着卫星均匀分布而提高。

Fig. 4. A simplified 2D illustration of how satellites distribution affects the uncertainty of SPP solution. Here we assume the time between the receiver and satellite are synchronized thus two satellites are enough for localization. The dash line represents ground truth range while the area in between the two solid lines denotes the possible noisy measurement. The uncertainties of SPP solutions are represented by the shadows.

图4。卫星分布如何影响SPP解不确定性的简化2D示意图。这里我们假设接收器和卫星之间时间是同步的，因此两个卫星就足够进行定位。虚线表示真实的地距，而两条实线之间的区域表示可能的噪声测量。SPP解的不确定性由阴影表示。

V. System Overview

V. 系统概述

The structure of our proposed system is illustrated in Fig. 5. The estimator takes raw GNSS, IMU and camera measurements as input, and applies necessary preprocessing on each type of measurement afterwards. As in [6], the IMU measurements are pre-integrated and sparse feature points are detected and tracked from the image sequence. For GNSS raw data, we first filter out low-elevation and unhealthy satellites which are prone to errors. In order to reject unstable satellite signal, only satellites which are continuously locked for a certain amount of epochs are allowed to enter the system. Because the ephemeris data is acquired via the slow satellite-receiver wireless link (50 bit/s on GPS L1C), a GNSS measurement is unusable until its corresponding ephemeris is fully transmitted. After the preprocessing phase, all measurements are ready for the estimator. Before performing optimization, an initialization phase is necessary to properly initialize the system states of the non-linear estimator.

我们提出系统的结构如图5所示。估计器接收原始的GNSS、IMU和相机测量数据作为输入，并在之后对每种类型的测量数据进行必要的预处理。与文献[6]中一样，IMU的测量数据被预积分，并且从图像序列中检测和跟踪稀疏特征点。对于GNSS原始数据，我们首先过滤掉低仰角和不健康的卫星，因为它们容易产生误差。为了拒绝不稳定的卫星信号，只有连续锁定一定数量的历元的卫星才被允许进入系统。由于星历数据是通过慢速的卫星-接收器无线链路（GPS L1C上的50 bit/s）获取的，直到相应的星历数据完全传输后，GNSS测量数据才可用。预处理阶段完成后，所有测量数据都准备好供估计器使用。在执行优化之前，一个初始化阶段是必要的，以正确初始化非线性估计器的系统状态。

The initialization starts with a vision-only Structure from Motion (SfM), from which an up-to-similarity motion and structure are jointly estimated, then the trajectory from IMU is aligned to the SfM result in order to recover the scale, velocity, gravity and IMU bias. After VI initialization is finished, a coarse-to-fine GNSS initialization process is conducted. At first a coarse anchor localization result is obtained by the SPP algorithm, then the local and global frames are associated in the yaw alignment stage using the local velocity from VI initialization and GNSS Doppler measurement. Finally the initialization phase ends with the anchor refinement, which utilizes accurate local trajectory and imposes clock constraints to further refine the anchor's global position.

初始化从仅视觉的Structure from Motion（SfM）开始，从中联合估计出相似性运动和结构，然后将IMU的轨迹与SfM结果对齐，以恢复尺度、速度、重力和IMU偏差。VI初始化完成后，进行从粗到细的GNSS初始化过程。首先通过SPP算法获得一个粗略的锚点定位结果，然后在偏航对齐阶段，使用VI初始化得到的局部速度和GNSS多普勒测量将局部框架和全局框架关联起来。最后，初始化阶段以锚点细化结束，该过程利用精确的局部轨迹并施加时钟约束，以进一步细化锚点的全局位置。

After the initialization phase, the GNSS degeneration cases are checked and carefully handled to ensure robust performance. Then constraints from all measurements are formulated to jointly estimate system states within the sliding window under the non-linear optimization framework. Note that our system is naturally degraded to a VIO if GNSS is not available or cannot be properly initialized. To ensure the real-time performance and handle visual-inertial degenerate motions, the two-way marginalization strategy [25], which selects the frame to remove based on a parallax test, is also applied after each optimization.

初始化阶段完成后，检查并仔细处理 GNSS 退化情况，以确保鲁棒性能。然后，在非线性优化框架下，将所有测量值的约束制定为在滑动窗口内联合估计系统状态。请注意，如果 GNSS 不可用或无法正确初始化，我们的系统将自然退化为视觉-惯性里程计（VIO）。为确保实时性能并处理视觉-惯性退化运动，在每次优化后还应用了双向边缘化策略 [25]，该策略基于视差测试选择要移除的帧。

VI. Probabilistic Formulation

VI. 概率形式化

In this section, we first formulate and derive our state estimation problem under the probabilistic framework. As shown later, the whole problem is organized as a factor graph and measurements from sensors form a series of factors which in turn constrain the system states. Each type of factor in the probabilistic graph will be discussed in detail through this section. Note that the formulation of visual and inertial factors are inherited from [6] [26] [27] thus not the contribution of this work. The relevant content is listed only for the completeness of this literature.

在本节中，我们首先在概率框架下制定并推导出我们的状态估计问题。如后续所示，整个问题被组织为一个因子图，传感器测量值形成一系列因子，进而约束系统状态。在本节中，将详细讨论概率图中的每种因子。请注意，视觉和惯性因子的制定继承自 [6] [26] [27]，因此不是本工作的贡献。相关内容仅为了文献完整性而列出。

A. MAP Estimation

A. 最大后验概率估计（MAP）

We define the optimum system state as the one that maximizes a posterior (MAP) given all the measurements. Assuming that all measurements are independent to each other and the noise with each measurement is zero-mean Gaussian distributed, the MAP problem can be further transformed to the one that minimize the sum of a series of costs, with each cost corresponding to one specific measurement.

我们定义最优系统状态为在所有测量值给定下最大化后验概率（MAP）的状态。假设所有测量值相互独立，且每个测量的噪声为零均值高斯分布，那么 MAP 问题可以进一步转化为最小化一系列成本之和的问题，每个成本对应于一个特定的测量值。

\[{\mathcal{X}}^{ \star } = \underset{\mathcal{X}}{\arg \max }p\left( {\mathcal{X} \mid \mathbf{z}}\right) \]

\[= \underset{\mathcal{X}}{\arg \max }p\left( \mathcal{X}\right) p\left( {\mathbf{z} \mid \mathcal{X}}\right) \]

\[= \underset{\mathcal{X}}{\arg \max }p\left( \mathcal{X}\right) \mathop{\prod }\limits_{{i = 1}}^{n}p\left( {{\mathbf{z}}_{i} \mid \mathcal{X}}\right) \tag{4} \]

\[= \underset{\mathcal{X}}{\arg \min }\left\{ {{\begin{Vmatrix}{\mathbf{r}}_{p} - {\mathbf{H}}_{p}\mathcal{X}\end{Vmatrix}}^{2} + \mathop{\sum }\limits_{{i = 1}}^{n}{\begin{Vmatrix}\mathbf{r}\left( {\mathbf{z}}_{i},\mathcal{X}\right) \end{Vmatrix}}_{{\mathbf{P}}_{i}}^{2}}\right\} , \]

where $\mathbf{z}$ stands for the aggregation of $\mathrm{n}$ independent sensor measurements and $\left\{ {{\mathbf{r}}_{p},{\mathbf{H}}_{p}}\right\}$ encapsulates the prior information of the system state. $\mathbf{r}\left( \cdot \right)$ denotes the residual function of each measurement and $\parallel \cdot {\parallel }_{\mathbf{P}}$ is the Mahalanobis norm.

其中 $\mathbf{z}$ 表示 $\mathrm{n}$ 独立传感器测量的聚合，$\left\{ {{\mathbf{r}}_{p},{\mathbf{H}}_{p}}\right\}$ 包含系统状态的先验信息。$\mathbf{r}\left( \cdot \right)$ 表示每个测量的残差函数，$\parallel \cdot {\parallel }_{\mathbf{P}}$ 是马哈拉诺比斯范数。

Note that such formulation naturally fits with the factor graph representation [28], thus we decompose our optimization problem as individual factors that relate states and measurements. Fig. 6 shows the factor graph of our system. Besides factors derived from measurements, a prior factor is used to constrain the four unobservable directions of the initial pose of the local world frame, and later it will become a densely connected prior as we marginalize old frames. In the following we will discuss each factor in details.

请注意，此类公式自然符合因子图表示法 [28]，因此我们将优化问题分解为与状态和测量相关的独立因子。图6展示了我们系统的因子图。除了来自测量的因子外，还使用先验因子来约束局部世界初始姿态的四个不可观测方向，随后在边际化旧帧时，它将变成一个密集连接的先验。接下来，我们将详细讨论每个因子。

B. Inertial Factor

B. 惯性因子

The measurements involved in the inertial factor consist of the biased, noisy linear acceleration and angular velocity of the platform. As the accelerometer operates near the Earth's surface, the linear acceleration measurement also contains the gravity component. The Coriolis and centrifugal forces due to Earth's rotation are ignored in the IMU's formulation considering the noisy measurement of the low-cost IMU. Thus the inertial measurement can be modelled as

惯性因子涉及的测量包括平台的带偏差、噪声的线性加速度和角速度。由于加速度计在地球表面附近工作，线性加速度测量还包含重力分量。考虑到低成本IMU的噪声测量，IMU的公式中忽略了由于地球自转产生的科里奥利力和离心力。因此，惯性测量可以建模为

\[{\widetilde{\mathbf{a}}}_{t} = {\mathbf{a}}_{t} + {\mathbf{b}}_{{a}_{t}} + {\mathbf{R}}_{w}^{{b}_{t}}{\mathbf{g}}^{w} + {\mathbf{n}}_{a} \tag{5a} \]

\[{\widetilde{\mathbf{\omega }}}_{t} = {\mathbf{\omega }}_{t} + {\mathbf{b}}_{{w}_{t}} + {\mathbf{n}}_{w}, \tag{5b} \]

Fig. 5. The diagram above shows the workflow of our proposed system. At first measurements from all sensors are preprocessed before going into follow-up procedures. In the initialization stage, visual-inertial initialization is accomplished by aligning the inertial information with the result of vision-only SfM. If visual-inertial successfully gets aligned, a coarse-to-fine process is performed in order to initialize the GNSS states. The system monitors and handles GNSS degeneration cases once GNSS states get involved. Finally constraints from all measurements within the sliding window are optimized by the non-linear optimization. Note that if GNSS states cannot get initialized, our system can still work in visual-inertial mode. The marginalization strategy is also adopted to ensure real-time estimation.

图5. 上面的图表展示了我们提出系统的流程。首先，所有传感器的测量数据在进入后续步骤之前都会进行预处理。在初始化阶段，通过将惯性信息与仅视觉SfM的结果对齐，完成视觉-惯性初始化。如果视觉-惯性成功对齐，则执行从粗到细的过程，以初始化GNSS状态。一旦GNSS状态参与，系统就会监控和处理GNSS退化情况。最后，通过非线性优化对滑动窗口内所有测量的约束进行优化。请注意，如果GNSS状态无法初始化，我们的系统仍然可以在视觉-惯性模式下工作。还采用了边际化策略以确保实时估计。

where $\left\{ {{\widetilde{\mathbf{a}}}_{t},{\widetilde{\mathbf{\omega }}}_{t}}\right\}$ is the output of the IMU at time $t$ ,and $\left\{ {{\mathbf{a}}_{t},{\mathbf{\omega }}_{t}}\right\}$ stands for the linear acceleration and angular velocity of the platform in IMU sensor frame. The additive noise ${\mathbf{n}}_{a}$ and ${\mathbf{n}}_{w}$ are assumed to be zero-mean Gaussian distributed, e.g., ${\mathbf{n}}_{a} \sim \mathcal{N}\left( {\mathbf{0},{\mathbf{\sum }}_{a}}\right) ,{\mathbf{n}}_{w} \sim \mathcal{N}\left( {\mathbf{0},{\mathbf{\sum }}_{w}}\right)$ . The slowly varying biases associated with the accelerometer and gyroscope are modelled as a random walk as follows:

其中 $\left\{ {{\widetilde{\mathbf{a}}}_{t},{\widetilde{\mathbf{\omega }}}_{t}}\right\}$ 是 IMU 在时间 $t$ 的输出，$\left\{ {{\mathbf{a}}_{t},{\mathbf{\omega }}_{t}}\right\}$ 表示平台在 IMU 传感器坐标系中的线性加速度和角速度。假设加性噪声 ${\mathbf{n}}_{a}$ 和 ${\mathbf{n}}_{w}$ 为零均值高斯分布，例如 ${\mathbf{n}}_{a} \sim \mathcal{N}\left( {\mathbf{0},{\mathbf{\sum }}_{a}}\right) ,{\mathbf{n}}_{w} \sim \mathcal{N}\left( {\mathbf{0},{\mathbf{\sum }}_{w}}\right)$ 。与加速度计和陀螺仪相关的缓慢变化的偏差被建模为随机游走，如下所示：

\[{\dot{\mathbf{b}}}_{{a}_{t}} = {\mathbf{n}}_{{b}_{a}},\;{\dot{\mathbf{b}}}_{{w}_{t}} = {\mathbf{n}}_{{b}_{w}}, \tag{6} \]

with ${\mathbf{n}}_{{b}_{a}} \sim \mathcal{N}\left( {\mathbf{0},{\mathbf{\sum }}_{{b}_{a}}}\right) ,{\mathbf{n}}_{{b}_{w}} \sim \mathcal{N}\left( {\mathbf{0},{\mathbf{\sum }}_{{b}_{w}}}\right)$ .

其中 ${\mathbf{n}}_{{b}_{a}} \sim \mathcal{N}\left( {\mathbf{0},{\mathbf{\sum }}_{{b}_{a}}}\right) ,{\mathbf{n}}_{{b}_{w}} \sim \mathcal{N}\left( {\mathbf{0},{\mathbf{\sum }}_{{b}_{w}}}\right)$ 。

In practice, the frequency of IMU is often an order of magnitude higher than that of camera, thus it is computationally intractable to estimate each state of the IMU measurements. To this end, IMU pre-integration approach [26] is adopted to aggregate multiple measurements into a single one. For inertial measurements within the time interval $\left\lbrack {{t}_{k},{t}_{k + 1}}\right\rbrack$ ,the derived measurements are computed as

实际上，IMU 的频率通常比相机的频率高一个数量级，因此计算上不可行对 IMU 测量的每个状态进行估计。为此，采用了 IMU 预积分方法 [26] 来将多个测量值聚合成一个。对于时间间隔 $\left\lbrack {{t}_{k},{t}_{k + 1}}\right\rbrack$ 内的惯性测量，派生的测量值计算为

\[{\mathbf{\alpha }}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}} = {\iint }_{t \in \left\lbrack {{t}_{k},{t}_{k + 1}}\right\rbrack }{\mathbf{R}}_{{b}_{t}}^{{b}_{{t}_{k}}}\left( {{\widetilde{\mathbf{a}}}_{t} - {\mathbf{b}}_{{a}_{t}}}\right) d{t}^{2} \tag{7a} \]

\[{\mathbf{\beta }}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}} = {\int }_{t \in \left\lbrack {{t}_{k},{t}_{k + 1}}\right\rbrack }{\mathbf{R}}_{{b}_{t}}^{{b}_{{t}_{k}}}\left( {{\widetilde{\mathbf{a}}}_{t} - {\mathbf{b}}_{{a}_{t}}}\right) {dt} \tag{7b} \]

\[{\mathbf{\gamma }}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}} = {\int }_{t \in \left\lbrack {{t}_{k},{t}_{k + 1}}\right\rbrack }\frac{1}{2}\mathbf{\Omega }\left( {{\widetilde{\mathbf{\omega }}}_{t} - {\mathbf{b}}_{{w}_{t}}}\right) {\mathbf{\gamma }}_{{b}_{t}}^{{b}_{{t}_{k}}}{dt} \tag{7c} \]

with

其中

\[\mathbf{\Omega }\left( \mathbf{\omega }\right) = \left\lbrack \begin{matrix} - \lfloor \mathbf{\omega }{\rfloor }_{ \times } & \mathbf{\omega } \\ - {\mathbf{\omega }}^{T} & 0 \end{matrix}\right\rbrack ,\lfloor \mathbf{\omega }{\rfloor }_{ \times } = \left\lbrack \begin{matrix} 0 & - {\omega }_{z} & {\omega }_{y} \\ {\omega }_{z} & 0 & - {\omega }_{x} \\ - {\omega }_{y} & {\omega }_{x} & 0 \end{matrix}\right\rbrack . \tag{8} \]

Here ${b}_{k}$ stands for the body frame in time ${t}_{k}.\{ \mathbf{\alpha },\mathbf{\beta },\mathbf{\gamma }\}$ encapsulates the relative position, velocity and rotation information between frame ${b}_{k}$ and ${b}_{k + 1}$ ,and can be constructed without the initial position, velocity and rotation profiles given IMU biases. Finally the residual relates the system states and pre-integrated IMU measurements can be formulated as:

这里 ${b}_{k}$ 表示时间 ${t}_{k}.\{ \mathbf{\alpha },\mathbf{\beta },\mathbf{\gamma }\}$ 下的机体坐标系，${t}_{k}.\{ \mathbf{\alpha },\mathbf{\beta },\mathbf{\gamma }\}$ 包含了坐标系 ${b}_{k}$ 和 ${b}_{k + 1}$ 之间的相对位置、速度和旋转信息，并且可以在没有初始位置、速度和旋转剖面给定 IMU 偏差的情况下构建。最后，残差将系统状态和预积分的 IMU 测量关联起来，可以表示为：

\[{\mathbf{r}}_{\mathcal{B}}\left( {{\widetilde{\mathbf{z}}}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}},\mathcal{X}}\right) = \left\lbrack \begin{matrix} \delta {\mathbf{\alpha }}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}} \\ \delta {\mathbf{\beta }}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}} \\ \delta {\mathbf{\theta }}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}} \\ \delta {\mathbf{b}}_{a} \\ \delta {\mathbf{b}}_{g} \end{matrix}\right\rbrack \]

\[= \left\lbrack \begin{matrix} {\mathbf{R}}_{w}^{{b}_{{t}_{k}}}\left( {{\mathbf{p}}_{{b}_{{t}_{k + 1}}}^{w} - {\mathbf{p}}_{{b}_{{t}_{k}}}^{w} + \frac{1}{2}{\mathbf{g}}^{w}\Delta {t}_{k}^{2} - {\mathbf{v}}_{{b}_{{t}_{k}}}^{w}\Delta {t}_{k}}\right) - {\widehat{\mathbf{\alpha }}}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}} \\ {\mathbf{R}}_{w}^{{b}_{{t}_{k}}}\left( {{\mathbf{v}}_{{b}_{{t}_{k + 1}}}^{w} + {\mathbf{g}}^{w}\Delta {t}_{k} - {\mathbf{v}}_{{b}_{{t}_{k}}}^{w}}\right) - {\widehat{\mathbf{\beta }}}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}} \\ 2{\left\lbrack {\mathbf{q}}_{{b}_{{t}_{k}}}^{{w}^{-1}} \otimes {\mathbf{q}}_{{b}_{{t}_{k + 1}}}^{w} \otimes {\left( {\widehat{\mathbf{\gamma }}}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}}\right) }^{-1}\right\rbrack }_{xyz} \\ {\mathbf{b}}_{a{b}_{{t}_{k + 1}}} - {\mathbf{b}}_{a{b}_{{t}_{k}}} \\ {\mathbf{b}}_{w{b}_{{t}_{k + 1}}} - {\mathbf{b}}_{w{b}_{{t}_{k}}} \end{matrix}\right\rbrack , \tag{9} \]

where $\delta {\mathbf{\theta }}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}}$ represents the relative rotation error in 3D Euclidean space,and the operator ${\left\lbrack \cdot \right\rbrack }_{xyz}$ returns the imaginary part of a quaternion.

其中 $\delta {\mathbf{\theta }}_{{b}_{{t}_{k + 1}}}^{{b}_{{t}_{k}}}$ 表示三维欧几里得空间中的相对旋转误差，运算符 ${\left\lbrack \cdot \right\rbrack }_{xyz}$ 返回四元数的虚部。

C. Visual Factor

C. 视觉因子

The visual measurement used in our system is a bunch of sparse feature points extracted from image frames. The strong corners [29] within the image are detected as feature points and are further tracked by the iterative Lucas-Kanade method [30]. After distortion correction [31] being applied to feature points, the projection process can be modelled as:

我们系统中使用的视觉测量是从图像帧中提取的一组稀疏特征点。图像中的强角点 [29] 被检测为特征点，并通过迭代 Lucas-Kanade 方法 [30] 进一步跟踪。在特征点应用了畸变校正 [31] 之后，投影过程可以建模为：

\[\widetilde{\mathcal{P}} = {\pi }_{c}\left( {{\mathbf{R}}_{b}^{c}\left( {{\mathbf{R}}_{w}^{b}{\mathbf{x}}^{w} + {\mathbf{p}}_{w}^{b}}\right) + {\mathbf{p}}_{b}^{c}}\right) + {\mathbf{n}}_{c}, \tag{10} \]

where $\widetilde{\mathcal{P}} = {\left\lbrack u,v\right\rbrack }^{T}$ is the feature coordinate in image plane, and ${\mathbf{x}}^{w}$ is its corresponding 3D landmark position in local world frame. ${\pi }_{c}\left( \cdot \right)$ represents the camera projection function and ${\mathbf{n}}_{c}$ is the measurement noise. Thus for a feature $l$ with inverse depth ${\rho }_{l}$ in frame $i$ ,if it is observed again in frame $j$ , the residual that relates two frames can be expressed as

其中 $\widetilde{\mathcal{P}} = {\left\lbrack u,v\right\rbrack }^{T}$ 是图像平面中的特征坐标，${\mathbf{x}}^{w}$ 是其在局部世界坐标系中的相应三维标记位置。${\pi }_{c}\left( \cdot \right)$ 表示相机投影函数，${\mathbf{n}}_{c}$ 是测量噪声。因此对于一个在帧 $l$ 中具有逆深度 ${\rho }_{l}$ 的特征，如果它在帧 $j$ 中再次被观测到，那么关联两帧的残差可以表示为

\[{\mathbf{r}}_{\mathcal{C}}\left( {{\widetilde{\mathbf{z}}}_{l},\mathcal{X}}\right) = {\widetilde{\mathcal{P}}}_{l}^{{c}_{{t}_{j}}} - {\pi }_{c}\left( {\widehat{\mathbf{x}}}_{l}^{{c}_{{t}_{j}}}\right) \tag{11a} \]

\[{\widehat{\mathbf{x}}}_{l}^{{c}_{{t}_{j}}} = {\mathbf{R}}_{b}^{c}\left( {{\mathbf{R}}_{w}^{{b}_{{t}_{j}}}\left( {{\mathbf{R}}_{{b}_{{t}_{i}}}^{w}\left( {{\mathbf{R}}_{c}^{b}\frac{1}{{\rho }_{l}}{\pi }_{c}^{-1}\left( {\widetilde{\mathcal{P}}}_{l}^{{c}_{{t}_{i}}}\right) + {\mathbf{p}}_{c}^{b}}\right) + }\right. }\right. \tag{11b} \]

\[\left. {\left. {\mathbf{p}}_{{b}_{{t}_{i}}}^{w}\right) + {\mathbf{p}}_{w}^{{b}_{{t}_{j}}}}\right) + {\mathbf{p}}_{b}^{c} \]

Fig. 6. Factor graph representation of the optimization problem in our system, where system states are denoted by large colored circles and factors are represented by small black circles. The factors from various measurements consist of inertial factor $i$ ,visual factor $f$ ,code pseudorange and Doppler factor $g$ and clock factor $c$ . A prior factor $p$ is used to constrain the first pose of the local world frame.

图6. 我们系统中优化问题的因子图表示，其中系统状态由大色圈表示，因子由小黑圈表示。来自各种测量的因子包括惯性因子 $i$ ，视觉因子 $f$ ，码伪距和多普勒因子 $g$ 以及时钟因子 $c$ 。先验因子 $p$ 用于约束局部世界框架的第一个姿态。

where $\left\{ {{\mathbf{R}}_{c}^{b},{\mathbf{t}}_{c}^{b}}\right\}$ is the transformation between IMU and camera.

其中 $\left\{ {{\mathbf{R}}_{c}^{b},{\mathbf{t}}_{c}^{b}}\right\}$ 是IMU与相机之间的变换。

D. Code Pseudorange Factor

D. 码伪距因子

Consider a GNSS receiver $r$ which locks a navigation satellite $s$ ,it measures the code shift to obtain the code pseudorange information as illustrated in Eq. (2). The satellite clock error and atmospheric delay are compensated using the models described in Section IV-D. In our system, the code pseudorange noise ${\epsilon }_{r}^{s}$ is assumed to be zero-mean Gaussian distributed such as ${\epsilon }_{r}^{s} \sim N\left( {0,{\sigma }_{r,{pr}}^{s}}\right)$ ,where the variance ${\sigma }_{r,{pr}}^{s}$ is modelled as

考虑一个锁定导航卫星 $s$ 的GNSS接收器 $r$ ，它测量码移以获得如式（2）所示的码伪距信息。卫星时钟误差和大气延迟使用第IV-D节中描述的模型进行补偿。在我们的系统中，假设码伪距噪声 ${\epsilon }_{r}^{s}$ 是零均值高斯分布，如 ${\epsilon }_{r}^{s} \sim N\left( {0,{\sigma }_{r,{pr}}^{s}}\right)$ ，其中方差 ${\sigma }_{r,{pr}}^{s}$ 被建模为

\[{\sigma }_{r,{pr}}^{s} = \frac{{n}_{s} \times {n}_{pr}}{{\sin }^{2}{\theta }_{el}}. \tag{12} \]

Here ${n}_{s}$ is the broadcast satellite space accuracy index,and ${n}_{pr}$ is the code pseudorange measurement noise index reported by the receiver. ${\theta }_{el}$ represents the satellite elevation angle at the view of the receiver, and there are two reasons for this denominator term. Firstly it can suppress the noise caused by GNSS multiple path effect that usually occurs on low elevation satellites. Furthermore, the ionospheric delay obtained by Klobuchar model, which is widely adopted by navigation system,still contains an error up to ${50}\%$ [23]. As the low elevation satellites will suffer from a significant ionospheric delay, the denominator term can also reduce the error coming with the ionospheric compensation.

这里 ${n}_{s}$ 是广播卫星空间精度指数，而 ${n}_{pr}$ 是接收机报告的码伪距测量噪声指数 ${\theta }_{el}$ 代表接收机视角下的卫星高度角，这个分母项有两个原因。首先，它可以抑制低高度卫星上通常发生的 GNSS 多路径效应引起的噪声。此外，广泛被导航系统采用的 Klobuchar 模型得到的电离层延迟仍然存在高达 ${50}\%$ [23] 的误差。由于低高度卫星会受到显著的电离层延迟影响，分母项也可以减少电离层补偿带来的误差。

A coordinate in the ECEF frame can be transformed to the local world frame via an anchor point, at which the ENU frame is built. Given the ECEF coordinate of the anchor point, the rotation from ENU frame to ECEF frame is

在 ECEF 坐标系中的坐标可以通过一个锚点转换到局部世界坐标系，在该锚点处建立 ENU 坐标系。给定锚点的 ECEF 坐标，从 ENU 坐标系到 ECEF 坐标系的旋转是

\[{\mathbf{R}}_{n}^{e} = \left\lbrack \begin{matrix} - \sin \lambda & - \sin \phi \cos \lambda & \cos \phi \cos \lambda \\ \cos \lambda & - \sin \phi \sin \lambda & \cos \phi \sin \lambda \\ 0 & \cos \phi & \sin \phi \end{matrix}\right\rbrack \tag{13} \]

where $\phi$ and $\lambda$ is the latitude and longitude of the reference point in geographic coordinate system. The 1-DOF rotation between ENU and local world frame ${\mathbf{R}}_{w}^{n}$ is given by the yaw offset $\psi$ . Then the relationship between ECEF and local world coordinates of the receiver's antenna can be expressed as

其中 $\phi$ 和 $\lambda$ 是地理坐标系中参考点的纬度和经度。ENU 与局部世界坐标系之间的 1-自由度旋转 ${\mathbf{R}}_{w}^{n}$ 由偏航偏移 $\psi$ 给出。然后接收机天线在 ECEF 和局部世界坐标之间的关系可以表示为

\[{\mathbf{p}}_{r}^{e} = {\mathbf{R}}_{n}^{e}{\mathbf{R}}_{w}^{n}\left( {{\mathbf{p}}_{r}^{w} - {\mathbf{p}}_{\text{anc }}^{w}}\right) + {\mathbf{p}}_{\text{anc }}^{e}. \tag{14} \]

In our implementation we set the anchor point to the origin of the local world frame, that is, the origin of the local world frame coincides with the origin of the ENU frame, as illustrated in Fig. 2. Thus ${\mathbf{p}}_{anc}^{w}$ ,the anchor’s coordinate in the local world frame, becomes a zero vector. The position of the receiver's antenna in the local world frame can be associated with the system states by

在我们的实现中，我们将锚点设置为局部世界坐标系的原点，即局部世界坐标系的原点与 ENU 坐标系的原点重合，如图 2 所示。因此 ${\mathbf{p}}_{anc}^{w}$ ，即锚点在局部世界坐标系中的坐标，变成了零向量。接收机天线在局部世界坐标系中的位置可以通过以下方式与系统状态相关联

\[{\mathbf{p}}_{r}^{w} = {\mathbf{p}}_{b}^{w} + {\mathbf{R}}_{b}^{w}{\mathbf{p}}_{r}^{b} \tag{15} \]

where ${\mathbf{p}}_{r}^{b}$ is the offset of the antenna expressed in body frame.

其中 ${\mathbf{p}}_{r}^{b}$ 是天线在机体坐标系中的偏移。

So far we are able to compute the ECEF coordinate of the receiver's antenna at any time given the corresponding system states. Because the GNSS measurements are time tagged by the receiver, we define the ECI frame to be coincident with the ECEF frame at the signal reception time. In this way, we have ${\mathbf{p}}_{r}^{E} = {\mathbf{p}}_{r}^{e}$ when the signal arrives at the receiver. On the other hand, the satellite's position in ECEF frame at the signal transmission time,which we denote as ${\mathbf{p}}_{s}^{{e}^{\prime }}$ ,can be obtained by the broadcast ephemeris and code pseudorange measurement. As a result of Earth’s rotation,the ECEF frame ${\left( \cdot \right) }^{{e}^{\prime }}$ when the signal leaves the satellite is different from the one ${\left( \cdot \right) }^{e}$ when the signal arrives. To this end, the satellite's position need to be transformed to the ECI frame (also the ECEF frame at reception time) by

到目前为止，我们能够计算出在任意给定系统状态下接收机天线的ECEF坐标。因为GNSS测量由接收机打上时间标签，我们定义ECI框架在信号接收时间与ECEF框架重合。这样，当信号到达接收机时，我们有 ${\mathbf{p}}_{r}^{E} = {\mathbf{p}}_{r}^{e}$。另一方面，卫星在信号传输时间的ECEF框架中的位置，我们称之为 ${\mathbf{p}}_{s}^{{e}^{\prime }}$，可以通过广播星历和码伪距测量获得。由于地球自转，信号离开卫星时的ECEF框架 ${\left( \cdot \right) }^{{e}^{\prime }}$ 与信号到达时的框架 ${\left( \cdot \right) }^{e}$ 不同。为此，卫星的位置需要转换到ECI框架（也是接收时间的ECEF框架）通过

\[{\mathbf{p}}_{s}^{E} = {\mathbf{R}}_{z}\left( {-{\omega }_{E}{t}_{f}}\right) {\mathbf{p}}_{s}^{{e}^{\prime }}, \tag{16} \]

where ${\mathbf{R}}_{z}\left( \theta \right)$ represents a rotation about the $z$ axis of the ECI frame with magnitude $\theta$ ,and ${t}_{f}$ is the ToF of the GNSS signal.

其中 ${\mathbf{R}}_{z}\left( \theta \right)$ 表示绕ECI框架的 $z$ 轴旋转，旋转量为 $\theta$，而 ${t}_{f}$ 是GNSS信号的飞行时间。

In the end, the residual of a single code pseudorange measured in ${t}_{k}$ ,which connects system states $\left\{ {{\mathbf{p}}_{{b}_{{t}_{k}}}^{w},{\mathbf{q}}_{{b}_{{t}_{k}}}^{w},\delta {\mathbf{t}}_{k},\psi }\right\}$ and satellite ${s}_{j}$ ,can be formulated as

最后，单个码伪距测量残差在 ${t}_{k}$ 中，它连接系统状态 $\left\{ {{\mathbf{p}}_{{b}_{{t}_{k}}}^{w},{\mathbf{q}}_{{b}_{{t}_{k}}}^{w},\delta {\mathbf{t}}_{k},\psi }\right\}$ 和卫星 ${s}_{j}$，可以表示为

\[{r}_{\mathcal{P}}\left( {{\widetilde{\mathbf{z}}}_{{r}_{k}}^{{s}_{j}},\mathcal{X}}\right) = \begin{Vmatrix}{{\mathbf{R}}_{z}\left( {{\omega }_{E}{t}_{f}}\right) {\mathbf{p}}_{s}^{{e}^{\prime }} - {\mathbf{p}}_{{r}_{k}}^{E}}\end{Vmatrix} + c\left( {{\mathbf{\zeta }}_{{s}_{j}}^{T}\delta {\mathbf{t}}_{k} - \Delta {t}^{{s}_{j}}}\right) + \]

\[{T}_{{r}_{k}}^{{s}_{j}} + {I}_{{r}_{k}}^{{s}_{j}} - {\widetilde{P}}_{{r}_{k}}^{{s}_{j}} \tag{17} \]

where ${r}_{k}$ stands for the GNSS receiver at time ${t}_{k}$ .

其中 ${r}_{k}$ 代表在时间 ${t}_{k}$ 的GNSS接收机。

E. Doppler Factor

E. 多普勒因子

The Doppler frequency shift, as shown in Eq. (3), is a result of the relative velocity along the line of the signal propagation path between the receiver and satellite. Similar to code pseudorange noise,the Doppler measurement noise ${\eta }_{r,{dp}}^{s}$ is assumed to be Gaussian distributed and the corresponding variance is modelled as

如公式（3）所示，多普勒频率偏移是接收机和卫星之间信号传播路径上的相对速度的结果。与码伪距噪声类似，多普勒测量噪声 ${\eta }_{r,{dp}}^{s}$ 被假定为高斯分布，相应的方差被建模为

\[{\sigma }_{r,{dp}}^{s} = \frac{{n}_{s} \times {n}_{dp}}{{\sin }^{2}{\theta }_{el}}, \tag{18} \]

where ${n}_{dp}$ is the measurement noise index reported by the receiver. The receiver's velocity in ECEF frame can be obtained from the local world velocity via

其中 ${n}_{dp}$ 是接收机报告的测量噪声指数。接收机在ECEF框架中的速度可以通过本地世界速度获得通过

\[{\mathbf{v}}_{r}^{e} = {\mathbf{R}}_{n}^{e}{\mathbf{R}}_{w}^{n}{\mathbf{v}}_{b}^{w} \tag{19} \]

By defining the ECI frame as the ECEF frame at reception time,we have ${\mathbf{v}}_{r}^{E} = {\mathbf{v}}_{r}^{e}$ . Then the satellite’s velocity in the signal-transmission ECEF frame, ${\mathbf{v}}_{s}^{{e}^{\prime }}$ ,can be transformed to the ECI frame by

通过将ECI框架定义为接收时间点的ECEF框架，我们得到 ${\mathbf{v}}_{r}^{E} = {\mathbf{v}}_{r}^{e}$ 。然后信号传输ECEF框架中的卫星速度 ${\mathbf{v}}_{s}^{{e}^{\prime }}$ 可以通过

\[{\mathbf{v}}_{s}^{E} = {\mathbf{R}}_{z}\left( {-{\omega }_{E}{t}_{f}}\right) {\mathbf{v}}_{s}^{{e}^{\prime }} \tag{20} \]

Finally the residual with related to Doppler measurement in ${t}_{k}$ ,which connects system states $\left\{ {{\mathbf{p}}_{{b}_{{t}_{k}}}^{w},{\mathbf{v}}_{{b}_{{t}_{k}}}^{w},\dot{\delta }{t}_{k},\psi }\right\}$ and satellite ${s}_{j}$ ,can be formulated as

最后，与多普勒测量相关的残差 ${t}_{k}$ ，它连接系统状态 $\left\{ {{\mathbf{p}}_{{b}_{{t}_{k}}}^{w},{\mathbf{v}}_{{b}_{{t}_{k}}}^{w},\dot{\delta }{t}_{k},\psi }\right\}$ 和卫星 ${s}_{j}$ ，可以表示为

\[{r}_{\mathcal{D}}\left( {{\widetilde{\mathbf{z}}}_{{r}_{k}}^{{s}_{j}},\mathcal{X}}\right) = \frac{1}{\lambda }{{\mathbf{\kappa }}_{{r}_{k}}^{{s}_{j}}}^{T}\left( {{\mathbf{v}}_{{s}_{j}}^{E} - {\mathbf{v}}_{{r}_{k}}^{E}}\right) + \tag{21} \]

\[\frac{c}{\lambda }\left( {\dot{\delta }{t}_{k} - \Delta {\dot{t}}^{{s}_{j}}}\right) + \Delta {\widetilde{f}}_{{r}_{k}}^{{s}_{j}}. \]

F. Receiver clock factors

F. 接收机时钟因素

The receiver clock biases in ${t}_{k}$ and ${t}_{k + 1}$ relate the clock drift rate by

${t}_{k}$ 和 ${t}_{k + 1}$ 中的接收机时钟偏差与时钟漂移率相关联

\[\delta {\mathbf{t}}_{k} = \delta {\mathbf{t}}_{k - 1} + {\mathbf{1}}_{4 \times 1}{\int }_{{t}_{k - 1}}^{{t}_{k}}\dot{\delta t}{dt} \tag{22} \]

where ${\mathbf{1}}_{n \times m}$ stands for $n$ by $m$ all-ones matrix,and the residual in discrete case is

其中 ${\mathbf{1}}_{n \times m}$ 代表由 $n$ 形成的全1矩阵，离散情况下的残差是

\[{\mathbf{r}}_{\mathcal{T}}\left( {{\widetilde{\mathbf{z}}}_{k - 1}^{k},\mathcal{X}}\right) = \delta {\mathbf{t}}_{k} - \delta {\mathbf{t}}_{k - 1} - {\mathbf{1}}_{4 \times 1}\dot{\delta }{t}_{k - 1}{\tau }_{k - 1}^{k}, \tag{23} \]

where ${\tau }_{k - 1}^{k}$ is the time difference between measurement $k - 1$ and $k$ . The covariance matrix associate with this residual is defined as a 4 by 4 diagonal matrix ${\mathbf{D}}_{t,k}$ with its elements describe the discretization error.

其中 ${\tau }_{k - 1}^{k}$ 是测量 $k - 1$ 和 $k$ 之间的时间差。与此残差相关的协方差矩阵定义为4x4的对角矩阵 ${\mathbf{D}}_{t,k}$ ，其元素描述了离散化误差。

The GNSS receiver clock drift rate, on the other hand, is determined by the frequency stability of the receiver clock. Temperature Controlled X'tal(crystal) Oscillator (TCXO) is often chosen as the clock source on low-cost GNSS receivers. Due to the noise characteristic of TCXO, the receiver clock drift rate is modelled as a random walk process, thus the residual becomes

另一方面，GNSS接收机时钟的漂移率由接收机时钟的频率稳定性决定。温度控制晶体振荡器（TCXO）通常被选为低成本GNSS接收机的时钟源。由于TCXO的噪声特性，接收机时钟的漂移率被建模为随机游走过程，因此残差变为

\[{r}_{\mathcal{W}}\left( {{\widetilde{\mathbf{z}}}_{k - 1}^{k},\mathcal{X}}\right) = \dot{\delta }{t}_{k} - \dot{\delta }{t}_{k - 1}. \tag{24} \]

The corresponding variance ${\sigma }_{{dt},k}$ is determined by the stability of the clock frequency drift.

对应的方差 ${\sigma }_{{dt},k}$ 由时钟频率漂移的稳定性决定。

Fig. 7. An illustration of the proposed coarse-to-fine initialization process. The module takes the local position and velocity result from VIO and outputs the corresponding trajectory in global ECEF frame.

图7. 提出的粗到细初始化过程的示意图。该模块接收来自VIO的本地位置和速度结果，并输出相应的全球ECEF框架轨迹。

VII. GNSS INITIALIZATION AND DEGENERATION

VII. GNSS初始化与退化

The state estimation process described in the last section is non-linear with respect to the system states thus its performance heavily relies on the initial values. With online initialization, the initial states can be well recovered from an unknown situation without any assumption or manual intervention. During the system operation, the estimator may also encounter imperfect situations where some of sensors experience failure or degeneration. As there is already extensive literature on the topics of initialization and degeneration with respect to the visual-inertial system, in this section we limit the scope to the GNSS part. In the following we first introduce the proposed coarse-to-fine GNSS initialization approach, then we discuss several scenarios that degrade the performance of our system.

上一个部分描述的状态估计过程相对于系统状态是非线性的，因此其性能严重依赖于初始值。通过在线初始化，可以在没有任何假设或人工干预的情况下，从未知情况中很好地恢复初始状态。在系统运行期间，估计器可能会遇到不完美的情况，其中一些传感器可能会出现故障或退化。由于已经有大量文献涉及视觉-惯性系统的初始化和退化问题，本节我们将范围限制在GNSS部分。接下来，我们首先介绍所提出的粗到细GNSS初始化方法，然后讨论几种会降低我们系统性能的场景。

A. Initialization

A. 初始化

As mentioned before, an anchor point with known global and local coordinate is necessary to fuse the global GNSS measurement with the local visual and inertial information. As the anchor point is already set to the origin of the local world frame, the ECEF coordinate of the local world origin need to be calibrated beforehand. In addition,the yaw offset $\psi$ between ENU and local world frame, which brings nonlinearity into the system, also needs a reasonable initial value in order to converge at the non-linear optimization stage. In this paper, we propose a multi-stage GNSS-VI initialization procedure to online calibrate the anchor point and the yaw offset. Before the GNSS-VI initialization, we assume that the VIO has been successfully initialized, i.e. the gravity vector, initial velocity, initial IMU bias and scale have obtained initial values [32]. After that, a smooth trajectory in the local world frame is formed and is ready to be used in the GNSS-VI initialization phase. The GNSS-VI initialization procedure requires at least 4 satellites being tracked (if all satellites belongs to a single system, $\left( {N + 3}\right)$ if $N$ satellite systems are involved). In addition, a minimum distance of 4 meters is also required to obtain reliable initial quantities. As illustrated in Fig. 7, the online GNSS-VI initialization is conducted in a coarse-to-fine manner and consists of three steps:

如前所述，一个已知全局和局部坐标的锚点对于将全局GNSS测量与局部视觉和惯性信息融合是必要的。由于锚点已经设置为局部世界坐标系的原点，因此需要预先校准局部世界原点的ECEF坐标。此外，ENU坐标系与局部世界坐标系之间的偏航偏移 $\psi$ ，它给系统带来了非线性，也需要一个合理的初始值，以便在非线性优化阶段收敛。在本文中，我们提出了一种多阶段的GNSS-VI初始化程序，以在线校准锚点和偏航偏移。在GNSS-VI初始化之前，我们假设VIO已经成功初始化，即重力矢量、初始速度、初始IMU偏差和尺度已经获得初始值 [32]。之后，局部世界框架中形成了一个平滑的轨迹，并准备好在GNSS-VI初始化阶段使用。GNSS-VI初始化程序至少需要跟踪4颗卫星（如果所有卫星都属于单一系统，$\left( {N + 3}\right)$ 如果涉及 $N$ 卫星系统）。此外，还需要至少4米的最低距离以获得可靠的初始量。如图7所示，在线GNSS-VI初始化以粗到细的方式进行，并包括三个步骤：

Coarse Anchor Point Localization: At first a coarse ECEF coordinate is generated by the GNSS SPP algorithm without any prior information. The SPP algorithm takes all code pseudorange measurements from the most recent epoch as input.
粗略锚点定位：首先，在没有先验信息的情况下，通过GNSS单点定位(SPP)算法生成粗略的ECEF坐标。SPP算法将最近历元的全部伪距测量值作为输入。
Yaw Offset Calibration: In the second step, we calibrate the yaw offset between the ENU frame and the local world frame using the less noisy Doppler measurement. The initial yaw offset and receiver clock drift rate are obtained through the following optimization problem
偏航偏移校准：在第二步中，我们使用较不噪声的多普勒测量值来校准ENU坐标系与局部世界坐标系之间的偏航偏移。通过以下优化问题获得初始偏航偏移和接收机时钟漂移率。

\[\mathop{\operatorname{minimize}}\limits_{{\dot{\delta }t,\psi }}\mathop{\sum }\limits_{{k = 1}}^{n}\mathop{\sum }\limits_{{j = 1}}^{{p}_{k}}{\begin{Vmatrix}{r}_{\mathcal{D}}\left( {\widetilde{\mathbf{z}}}_{{r}_{k}}^{{s}_{j}},\mathcal{X}\right) \end{Vmatrix}}_{{\sigma }_{{r}_{k},{dp}}^{{s}_{j}}}^{2}, \tag{25} \]

where $\mathrm{n}$ is the sliding window size and ${p}_{k}$ is the number of satellites observed in $k$ -th epoch inside the window. Here we fix the velocity ${\mathbf{v}}_{b}^{w}$ to the result of VIO and assume that $\delta {t}_{k}$ is constant within the window. The coarse anchor coordinate obtained from the first step is used to calculated the direction vector ${\mathbf{\kappa }}_{r}^{s}$ and rotation ${\mathbf{R}}_{n}^{e}.{\mathbf{\kappa }}_{r}^{s}$ and ${\mathbf{R}}_{n}^{e}$ are not sensitive to the receiver's location thus a coarse anchor point coordinate is sufficient. The parameters to be estimated only include the yaw offset $\psi$ and the average clock bias drift rate ${\delta t}$ over the entire window measurements. After that, the transformation between the ENU frame and local world frame is fully calibrated.

其中 $\mathrm{n}$ 是滑动窗口大小，${p}_{k}$ 是在 $k$ -th历元内观测到的卫星数量。在这里，我们将速度 ${\mathbf{v}}_{b}^{w}$ 固定为VIO的结果，并假设 $\delta {t}_{k}$ 在窗口内是恒定的。第一步获得的粗略锚点坐标用于计算方向向量 ${\mathbf{\kappa }}_{r}^{s}$，而 ${\mathbf{R}}_{n}^{e}.{\mathbf{\kappa }}_{r}^{s}$ 和 ${\mathbf{R}}_{n}^{e}$ 对接收器的位置不敏感，因此一个粗略的锚点坐标就足够了。需要估计的参数只包括偏航偏移 $\psi$ 和在整个窗口测量期间的平均时钟偏差漂移率 ${\delta t}$。

Anchor Point Refinement: Finally we are ready to refine the previous coarse anchor point and align the local world trajectory with that in ECEF frame. Different from the first step, the position result from VIO is used as prior information. The following problem is optimized over the sliding window measurements.
锚点细化：最后，我们准备细化之前的粗略锚点，并将本地世界轨迹与ECEF框架中的轨迹对齐。与第一步不同，这里将VIO的位置结果用作先验信息。以下问题是在滑动窗口测量上优化的。

\[\mathop{\operatorname{minimize}}\limits_{{\delta \mathbf{t},{\mathbf{p}}_{anc}^{e}}}\left( {\mathop{\sum }\limits_{{k = 1}}^{n}\mathop{\sum }\limits_{{j = 1}}^{{p}_{k}}{\begin{Vmatrix}{r}_{\mathcal{P}}\left( {\widetilde{\mathbf{z}}}_{{r}_{k}}^{{s}_{j}},\mathcal{X}\right) \end{Vmatrix}}_{{\sigma }_{{r}_{k},{pr}}^{{s}_{j}}}^{2} + }\right. \tag{26} \]

\[\left. {\mathop{\sum }\limits_{{k = 1}}^{n}{\begin{Vmatrix}{\mathbf{r}}_{\mathcal{T}}\left( {\widetilde{\mathbf{z}}}_{k - 1}^{k},\mathcal{X}\right) \end{Vmatrix}}_{{\mathbf{D}}_{t,k}}^{2}}\right) \]

The anchor point coordinate and the receiver clock biases associate with each GNSS epoch are refined through the optimization of the above problem. After this step, the anchor point, origin of the ENU frame, is set to the origin of the local world frame. Finally the initialization phase of the entire estimator is finished and all necessary quantities have been assigned initial values.

通过优化上述问题，锚点坐标和与每个GNSS历元关联的接收器时钟偏差得到细化。在此步骤之后，将锚点，即ENU框架的原点，设置为本地世界框架的原点。最终，整个估计器的初始化阶段完成，所有必要的量都已分配了初始值。

B. Degenerate Cases

B. 退化解

There is no doubt that our fusion system will perform best in an open-area where GNSS signal is stable and satellites are well-distributed. In the following we will discuss several situations which may degrade the performance of our system.

毫无疑问，我们的融合系统在开阔区域，GNSS信号稳定且卫星分布良好的情况下表现最佳。接下来，我们将讨论几种可能会降低我们系统性能的情况。

Low speed movement: Since the noise level of Doppler shift measurement is an order of magnitude lower than that of code pseudorange, the yaw offset between the local world frame and ENU frame can be well constrained by a short window of Doppler shift measurements. Once the velocity of the GNSS receiver is below the noise level of the Doppler shift, the estimated yaw offset may be corrupted by the measurement noise. In addition, low speed movement also implies that the translational distance within the window is short, thus the yaw estimation may be affected by code pseudorange as well. In an extreme case where the platform experiences a rotation-only movement, GNSS cannot provide any information on the rotational directions and in turn the yaw component will drift as that in VIO. Thus we fix the yaw offset variable if the average velocity inside the window is below the threshold ${v}_{ths}$ . In our system, ${v}_{ths}$ is set to ${0.3}\mathrm{\;m}/\mathrm{s}$ which can be easily satisfied even by a pedestrian.
低速运动：由于多普勒频移测量的噪声水平比码伪距的噪声水平低一个数量级，因此通过短时间窗口内的多普勒频移测量可以很好地约束本地世界坐标系与东北天（ENU）坐标系之间的偏航偏移。一旦GNSS接收机的速度低于多普勒频移的噪声水平，估计的偏航偏移可能会受到测量噪声的破坏。此外，低速运动也意味着窗口内的平移距离较短，因此偏航估计可能还会受到码伪距的影响。在平台仅经历旋转运动的极端情况下，GNSS无法提供关于旋转方向的任何信息，从而导致偏航分量像VIO中那样漂移。因此，如果窗口内的平均速度低于阈值 ${v}_{ths}$，我们就固定偏航偏移变量。在我们的系统中，${v}_{ths}$ 设置为 ${0.3}\mathrm{\;m}/\mathrm{s}$，即使是对行人来说也可以轻松满足。

Fig. 8. Relative pose error of GVINS, VINS-Fusion and VINS-Mono with respect to the evaluation distance on the simulation environment. The top two figures correspond to the four unobservable directions (x, y, z and yaw) of VIO and the bottom figure is the overall relative rotation error.

图8. GVINS、VINS-Fusion和VINS-Mono在仿真环境下的评估距离对应的相对位姿误差。上面两个图对应于VIO的四个不可观测方向（x、y、z和偏航），下面图是整体的相对旋转误差。

Less than 4 satellites being tracked: If the number of satellites being tracked is less than 4 , the SPP or loosely-coupled approaches will fail to resolve the receiver's location. However, with the help of the tightly-coupled structure, our system is still able to make use of available satellites and subsequently update the states vector. Later in Section VIII-B we will investigate the performance degradations under various satellite configurations.
跟踪的卫星少于4颗：如果跟踪的卫星数量少于4颗，SPP或松耦合方法将无法确定接收机的位置。然而，在紧耦合结构的帮助下，我们的系统仍然能够利用可用的卫星，并随后更新状态向量。在第VIII-B节中，我们将研究在不同卫星配置下的性能下降情况。
No GNSS signal: In indoor or cluttered environments where GNSS signal is totally unavailable, the states related to global information,namely the yaw offset $\psi$ ,receiver clock bias $\delta \mathbf{t}$ and drift rate ${\delta t}$ are no longer observable. However, constraints from Eq. (23) and (24) are still kept during the optimization. The clock drift rate of low-cost receivers is quite stable as we found in the receiver stand-still analysis, thus the (near-)optimum clock drift rate is maintained by the constraint from Eq. (24). Similarly, the receiver bias is propagated by the constraint from Eq. (23), which in turn provides a good initial value when the GNSS signal is reacquired. This mechanism improves the stability of our fusion system when the GNSS signal is intermittent and eliminates the need for re-initialization when signal is lost-and-reacquired.
无GNSS信号：在室内或杂乱环境中，当GNSS信号完全不可用时，与全局信息相关的状态，即偏航偏移 $\psi$、接收器时钟偏差 $\delta \mathbf{t}$ 和漂移率 ${\delta t}$ 将不再可观测。然而，在优化过程中，仍然保持来自方程（23）和（24）的约束。我们在接收器静止分析中发现，低成本接收器的时钟漂移率相当稳定，因此通过方程（24）的约束维持了（接近）最优时钟漂移率。同样，接收器偏差通过方程（23）的约束传播，这反过来在GNSS信号重新捕获时提供了一个良好的初始值。当GNSS信号间歇性时，该机制提高了我们的融合系统的稳定性，并消除了信号丢失和重新捕获时重新初始化的需要。

VIII. EXPERIMENTAL RESULTS

VIII. 实验结果

We conduct both simulation and real-world experiments to verify the performance of our proposed system. In this section, we compare our system with the open-source VINS-Mono [6], VINS-Fusion [12] (Monocular+IMU+GNSS) and RTKLIB [33]. Since we are only interested in the real-time estimation result, the loop function of VINS-Mono and VINS-Fusion, which optimizes pose graph based on revisited scene, is disabled. We use RTKLIB ${}^{3}$ to compute the GNSS SPP solution and feed the obtained GNSS location to VINS-Fusion for a loose-coupled result. The window size of our system, as well as that of VINS-Mono and VINS-Fusion, are set to 10. Table. I lists the maximum velocity and the overall RTK fixed rate in each experiment. All experiments in this section are performed on a desktop PC with an Intel i7-8700K at 3.7 $\mathrm{{GHz}}$ and ${32}\mathrm{{GB}}$ memory.

我们进行了模拟和现实世界的实验来验证我们提出系统的性能。在本节中，我们将我们的系统与开源的VINS-Mono [6]、VINS-Fusion [12]（单目+IMU+GNSS）和RTKLIB [33]进行了比较。因为我们只对实时估计结果感兴趣，所以禁用了VINS-Mono和VINS-Fusion的循环函数，该函数基于重新访问的场景优化姿态图。我们使用RTKLIB ${}^{3}$ 计算GNSS SPP解决方案，并将获得的GNSS位置输入到VINS-Fusion中，以获得松耦合结果。我们的系统以及VINS-Mono和VINS-Fusion的窗口大小均设置为10。表I列出了每次实验中的最大速度和整体RTK固定率。本节的所有实验都是在配备有Intel i7-8700K，3.7 $\mathrm{{GHz}}$ 和 ${32}\mathrm{{GB}}$ 内存的台式PC上进行的。

TABLE I

VELOCITY AND RTK FIXED RATE PROFILES IN EACH EXPERIMENT

每次实验中的速度和RTK固定率分布

	maximum velocity $\left\lbrack {\mathrm{m}/\mathrm{s}}\right\rbrack$	RTK fixed rate [%]
Simulation	10.000	N/A
Sports field	1.676	100 %
Indoor-outdoor	2.108	81.3 %
Urban driving	21.424	84.7 %

A. Simulation

A. 模拟

Setup: The simulation environment is a ${30}\mathrm{\;m} \times {30}\mathrm{\;m} \times$ ${30}\mathrm{\;m}$ cube with random generated 3D landmarks. The landmarks are projected to a ${10} - \mathrm{{Hz}}$ virtual camera with 75 degree horizontal FOV and 55 degree vertical FOV, which in turn generates around 100 visible features per frame. An additional white noise term with a standard deviation of 0.5 pixel is added to all feature points. A virtual ${200} - \mathrm{{Hz}}$ IMU is rigidly connected to the camera and moves along a pre-designed 3D path. The standard deviation associate with the white noise of the accelerometer and gyroscope is set to ${0.05}\mathrm{\;m}/{\mathrm{s}}^{2}$ and ${0.005}\mathrm{{rad}}/\mathrm{s}$ respectively,and the standard deviation of the accelerometer and gyroscope bias random walk is set to ${3.5} \times {10}^{-4}\mathrm{\;m}/{\mathrm{s}}^{2}$ and ${3.5} \times {10}^{-5}\mathrm{{rad}}/\mathrm{s}$ respectively. In the meantime,a ${10} - \mathrm{{Hz}}$ virtual GNSS receiver generates code pseudorange and Doppler shift measurements using the past or real-time broadcast ephemeris data. The standard deviation of code pseudorange and Doppler white noise shift is set to ${1m}$ and ${0.5}\mathrm{\;{Hz}}\left( { \sim {0.1}\mathrm{\;m}/\mathrm{s}\text{equivalent}}\right)$ respectively. The simulation experiment lasts for 30 minutes, with a trajectory over 10 kilometers.
设置：仿真环境是一个 ${30}\mathrm{\;m} \times {30}\mathrm{\;m} \times$ ${30}\mathrm{\;m}$ 立方体，其中随机生成了三维地标。地标被投影到一个 ${10} - \mathrm{{Hz}}$ 虚拟摄像头中，该摄像头具有75度水平视场角和55度垂直视场角，进而每帧生成大约100个可见特征点。所有特征点均添加了标准差为0.5像素的白噪声项。一个虚拟的 ${200} - \mathrm{{Hz}}$ IMU 与摄像头刚性连接，并沿着预设的三维路径移动。加速度计和陀螺仪的白噪声标准差分别设置为 ${0.05}\mathrm{\;m}/{\mathrm{s}}^{2}$ 和 ${0.005}\mathrm{{rad}}/\mathrm{s}$，加速度计和陀螺仪偏差随机游走的标准差分别设置为 ${3.5} \times {10}^{-4}\mathrm{\;m}/{\mathrm{s}}^{2}$ 和 ${3.5} \times {10}^{-5}\mathrm{{rad}}/\mathrm{s}$。与此同时，一个 ${10} - \mathrm{{Hz}}$ 虚拟GNSS接收机使用过去或实时广播的星历数据生成码伪距和多普勒频移测量值。码伪距和多普勒白噪声频移的标准差分别设置为 ${1m}$ 和 ${0.5}\mathrm{\;{Hz}}\left( { \sim {0.1}\mathrm{\;m}/\mathrm{s}\text{equivalent}}\right)$。仿真实验持续30分钟，轨迹超过10公里。

Fig. 9. Absolute trajectory error of GVINS, VINS-Fusion, VINS-Mono and RTKLIB with respect to the traveled distance on the simulation environment.

图9. GVINS、VINS-Fusion、VINS-Mono和RTKLIB在仿真环境下的绝对轨迹误差与行驶距离的关系。

TABLE II

INITIALIZATION QUALITY METRICS IN SIMULATION AND REAL-WORLD EXPERIMENTS

仿真和实际世界实验中的初始化质量指标

	Yaw offset error [degree]	Anchor point error $\left\lbrack \mathrm{m}\right\rbrack$
Simulation	0.183	0.635
Sports field	0.35	1.491
Indoor-outdoor	0.478	4.370
Urban driving	2.490	4.816

result: In the simulation environment, the GNSS-VI gets initialized immediately after visual-inertial alignment since the system do not need to wait for ephemerides. The initialization quality, which we measure by the error of local-ENU yaw offset and anchor point, is listed in Table. II. Fig. 8 shows the relative pose error (RPE) [34] with respect to the evaluation distance. As can be seen from the figure, The relative error of VINS-Mono increases with the evaluation distance in both translational and rotational directions. Among those the rotational error mainly comes from yaw component. This indicts that VINS-Mono suffers from accumulated drift in the four unobservable directions,namely $\mathrm{x},\mathrm{y},\mathrm{z}$ and yaw. The error of VINS-Fusion exhibits similar tendency when the evaluation distance is short, and remains at a constant level when the distance increases further. This implies that VINS-Fusion is able to bound the accumulated drift by loosely incorporating the GNSS solution. However, the magnitude of its relative error is much larger compared with the result of VINS-Mono and GVINS, thus the smoothness of the estimator is highly affected by the noisy GNSS measurement. Thanks to the tightly-coupled approach we adopted, our proposed system combines advantages of both VINS-Mono and VINS-Fusion. On the one hand, the relative error is comparable to that of VINS-Mono for short range thus the smoothness is preserved. On the other hand, the error no longer accumulates in all directions and the global consistency is also guaranteed.
结果：在仿真环境中，由于系统无需等待星历数据，GNSS-VI在视觉-惯性对准后立即初始化。初始化质量，我们通过本地-ENU偏航偏移量和锚点的误差来衡量，列在表II中。图8显示了相对于评估距离的相对位姿误差（RPE）[34]。从图中可以看出，VINS-Mono的相对误差在平移和旋转方向上均随评估距离增加。其中旋转误差主要来自偏航分量。这表明VINS-Mono在四个不可观测方向上，即 $\mathrm{x},\mathrm{y},\mathrm{z}$ 和偏航方向上存在累积漂移。当评估距离较短时，VINS-Fusion的误差表现出类似的趋势，当距离进一步增加时，误差保持恒定水平。这意味着VINS-Fusion通过松散地结合GNSS解能够限制累积漂移。然而，与VINS-Mono和GVINS的结果相比，其相对误差的幅度要大得多，因此估计器的平滑性受到GNSS测量噪声的影响。得益于我们采用的紧耦合方法，我们提出的系统结合了VINS-Mono和VINS-Fusion的优点。一方面，对于短距离，相对误差与VINS-Mono相当，因此保持了平滑性。另一方面，误差不再在所有方向上累积，同时也保证了全局一致性。

Fig. 9 depicts the absolute trajectory error (ATE) along with the traveled distance. The error plot of VINS-Mono keeps increasing as a result of accumulated drift, while it remains constant for all other three approaches. The ATE of RTKLIB SPP algorithm shows the noise level of the GNSS code pseudorange measurement, and VINS-Fusion is able to reduce the magnitude of ATE by combine the result of VIO in a loosely-coupled manner. By tightly fusing GNSS raw measurements and visual inertial data in a unified framework, our algorithm effectively suppresses the noise of GNSS signal and keeps the ATE at a low level. The final Root Mean Square Error (RMSE) of each approach is shown in Table. III.

图9描述了沿行驶距离的绝对轨迹误差（ATE）。VINS-Mono的误差曲线由于累积漂移而持续增加，而其他三种方法的误差保持不变。RTKLIB SPP算法的ATE显示了GNSS码伪距测量的噪声水平，而VINS-Fusion能够通过松耦合方式结合VIO的结果来减小ATE的大小。通过在统一框架内紧密融合GNSS原始测量值和视觉惯性数据，我们的算法有效地抑制了GNSS信号的噪声，并保持ATE在低水平。每种方法的最终均方根误差（RMSE）如表III所示。

${}^{3}$ https://github.com/tomojitakasu/RTKLIB/tree/rtklib_2.4.3

Fig. 10. The equipment used in our real-world experiments is a helmet with a VI-Sensor and a u-blox ZED-F9P attached. The camera and IMU measurements are well synchronized by VI-Sensor itself. The PPS signal from the GNSS receiver is used to trigger the VI-Sensor to align the global time with the local time.

图10。我们在现实世界实验中使用的设备是一个带有VI-Sensor和u-blox ZED-F9P的头盔。摄像头和IMU的测量通过VI-Sensor本身很好地同步。来自GNSS接收器的PPS信号用于触发VI-Sensor，以将全球时间与本地时间对齐。

TABLE III

RMSE[M] STATISTICS COMPARED WITH DIFFERENT APPROACHES IN SIMULATION ENVIRONMENT

RMSE[M] 在仿真环境中与不同方法的统计比较

	GVINS	VINS-Fusion	VINS-Mono	RTKLIB
Simulation	0.202	1.162	7.471	2.076
Sports field	0.806	2.149	8.537	2.835
Indoor-outdoor	3.700	6.905	36.651	6.036
Urban driving	4.508	N/A	N/A	11.106

B. Real-world Experiments

B. 现实世界实验

As illustrated in Fig. 10, the device used in our real-world experiments is a helmet with a VI-Sensor [35] and an u-blox ZED-F9P GNSS receiver ${}^{4}$ attached. The detailed specifications of each sensor are shown in Table. IV. The VI-Sensor provides two cameras as a stereo pair and we only use the left one for all experiments. The u-blox ZED-F9P is a low-cost multi-band receiver with multiple constellations support. In addition, ZED-F9P owns an internal RTK engine which is capable to provide receiver's location at an accuracy of $1\mathrm{\;{cm}}$ in open area. The real-time RTCM stream from a nearby base station is fed to to the ZED-F9P receiver for the ground truth RTK solution. In terms of time synchronization, the camera and IMU are synchronized by VI-Sensor, and the local time is aligned with the global GNSS time via Pluse Per Second (PPS) signal of ZED-F9P and hardware trigger of VI-Sensor.

如图10所示，我们在现实世界实验中使用的设备是一个头盔，配备有VI-Sensor [35] 和一个u-blox ZED-F9P GNSS接收器 ${}^{4}$。每个传感器的详细规格如表IV所示。VI-Sensor提供了一对立体相机，我们只使用左边的相机进行所有实验。u-blox ZED-F9P是一款低成本的多频接收器，支持多个星座。此外，ZED-F9P拥有一个内部RTK引擎，能够在开阔区域提供精度为 $1\mathrm{\;{cm}}$ 的接收器位置。来自附近基站的真实时RTCM数据流被输入到ZED-F9P接收器，以获取真实RTK解。在时间同步方面，相机和IMU通过VI-Sensor同步，本地时间通过ZED-F9P的每秒脉冲（PPS）信号和VI-Sensor的硬件触发与全球GNSS时间对齐。

Sports Field Experiment: This experiment is conducted on a sports field at our campus where we follow an athletic track for 5 laps. The sports field is a typical outdoor environment with an opened area on one side and some buildings the other side. During the experiment most of the satellites are well locked and the status of RTK remains fixed throughout the whole path. In this experiment the global consistency of our estimator is examined against the repeated trajectory and the unstable signal near buildings also poses challenges to the local smoothness of the result.
体育场实验：这项实验在我们的校园内的一个体育场上进行，我们在跑道上跑了5圈。体育场是一个典型的户外环境，一侧是开阔区域，另一侧有一些建筑物。在实验过程中，大多数卫星信号都锁定良好，RTK的状态在整个路径上保持固定。在这个实验中，我们通过重复轨迹检验了我们的估计器的全局一致性，而建筑物附近的信号不稳定也给结果的局部平滑性带来了挑战。

TABLE IV

SENSOR SPECIFICATIONS FOR THE DEVICE USED IN REAL-WORLD ENVIRONMENT

现实世界环境中使用的设备传感器规格

Sensor Type/Item	Value	Unit
Camera
Sensor	Aptina MT9V034
Shutter	Global shutter
Resolution	${752} \times {480}$	pixel
Horizontal field of view	98	degree
Vertical field of view	73	degree
Frequency	20	Hz
IMU
Sensor	ADIS 16448
Frequency	200	Hz
Gyroscope noise density	${7.0} \times {10}^{-3}$	${}^{ \circ }/s{\mathrm{\;{Hz}}}^{-{0.5}}$
Accelerometer noise density	${6.6} \times {10}^{-4}$	$m{s}^{-2}{\mathrm{\;{Hz}}}^{-{0.5}}$
GNSS
Receiver	u-blox ZED-F9P
Antenna	Tallysman TW3882
Raw measurement frequency	10	Hz
RTK solution frequency	10	Hz

Fig. 11. Positioning error of GVINS, VINS-Fusion, VINS-Mono and RTKLIB at the sports field experiment. The three sub-figures correspond to the three directions of ENU frame. The result from GVINS, VINS-Fusion and RTKLIB are compared directly against the RTK ground truth without any alignment, while the result from VINS-Mono is aligned to the ground truth trajectory beforehand.

图 11. 在运动场实验中 GVINS、VINS-Fusion、VINS-Mono 和 RTKLIB 的定位误差。三个子图分别对应于 ENU 坐标系的三个方向。GVINS、VINS-Fusion 和 RTKLIB 的结果直接与没有进行任何对齐的 RTK 实际值进行比较，而 VINS-Mono 的结果是在实验前与实际轨迹对齐的。

In this experiment,the GNSS-VI is initialized in ${4.1}\mathrm{\;s}$ after the visual-inertial finish its alignment. The positioning error of this experiment is plotted against ENU axes as depicted in Fig. 11. A reference point, which is used to transform the ECEF result to a ENU frame, is arbitrarily selected on the sports field. Since VINS-Fusion, RTKLIB and our system can directly output estimation results in ECEF frame, we do not apply any alignment for their trajectories. For VINS-Mono which only gives results in local frame, we perform a 4-DOF alignment between its trajectory and the ENU path of RTK using the first 2000 poses. Note that the global positioning results from VINS-Fusion, RTKLIB and our system suffer from a certain bias due to satellites' orbit error, inaccurate atmospheric delay modeling and multipath effect, while that of VINS-Mono does not have this issue because of the pre-alignment we made.

在本次实验中，GNSS-VI 在视觉-惯性完成对齐后在 ${4.1}\mathrm{\;s}$ 处初始化。本次实验的定位误差如图 11 所示，绘制在 ENU 轴上。在运动场上任意选择一个参考点，用于将 ECEF 结果转换到 ENU 坐标系。由于 VINS-Fusion、RTKLIB 和我们的系统可以直接输出 ECEF 坐标系下的估计结果，因此我们没有对它们的轨迹进行任何对齐。对于只提供局部框架结果的 VINS-Mono，我们使用前 2000 个姿态在其轨迹和 RTK 的 ENU 路径之间执行 4-自由度对齐。请注意，由于卫星轨道误差、不精确的大气延迟建模和多径效应，VINS-Fusion、RTKLIB 和我们的系统的全局定位结果存在一定的偏差，而 VINS-Mono 的结果由于我们进行的预对齐，没有这个问题。

${}^{4}$ https://www.u-blox.com/en/product/zed-f9p-module

Fig. 12. The trajectory of RTK, GVINS, VINS-Fusion, VINS-Mono and RTKLIB in the sports field experiment. The resulting trajectory of our proposed system is smooth and aligns well with that of the RTK.

图 12. 在运动场实验中 RTK、GVINS、VINS-Fusion、VINS-Mono 和 RTKLIB 的轨迹。我们提出系统的轨迹平滑，并且与 RTK 的轨迹对齐良好。

Fig. 13. Positioning error of our proposed system in situations where the number of locked satellites is insufficient. In the "All" setting, the system utilizes all available (around 20) satellites to perform estimation. The " 3 ", "2", "1" corresponds to cases where only that number of satellites are used in the system. When the number becomes 0 , our system does not use any satellite and degrades to a VIO.

图 13. 在锁定卫星数量不足的情况下，我们提出系统的定位误差。在 “全部” 设置中，系统利用所有可用（大约 20 个）卫星进行估计。 “3”、“2”、“1” 分别对应系统中只使用那个数量的卫星的情况。当数量变为 0 时，我们的系统不使用任何卫星并降级为 VIO。

From Fig. 11 we see that VINS-Mono suffers from drifting among all three directions. In addition, the periodic fluctuations on horizontal directions (east and north) implies an obvious drift on the yaw estimation. On the other hand, the SPP solution from RTKLIB does not drift at all, but is highly affected by the noisy GNSS measurement. The error of VINS-Fusion is bounded as a result of combining the global information from SPP result. However, the local accuracy oscillates a lot and the local smoothness is ruined in the meantime. As a comparison, the positioning error of our proposed system does not grow with the traveled distance and is always maintained at a low level. Meanwhile, the error varies slowly and continuously, which also indicates our system effectively suppresses the noise from unstable GNSS signals. Table. III lists the RMSE of each method and Fig. 12 shows final trajectories on Google Maps. The resulting 5 laps of our system overlap with each other and align well with those of RTK. Through this experiment, we show that our system is able to achieve global consistency to eliminate drifts of VIO and also preserve the local smoothness under noisy GNSS conditions.

从图 11 我们可以看到 VINS-Mono 在所有三个方向上都存在漂移。此外，水平方向（东和北）的周期性波动意味着偏航估计上有一个明显的漂移。另一方面，来自 RTKLIB 的 SPP 解算方案根本不会漂移，但受到噪声 GNSS 测量的严重影响。由于结合了 SPP 结果的全局信息，VINS-Fusion 的误差被限制在一定范围内。然而，局部精度波动很大，同时局部平滑性也被破坏。相比之下，我们提出系统的定位误差不会随着行驶距离的增长而增加，并且始终保持在较低水平。同时，误差变化缓慢且连续，这也表明我们的系统能够有效抑制来自不稳定 GNSS 信号的噪声。表 III 列出了每种方法的 RMSE，图 12 显示了在谷歌地图上的最终轨迹。我们系统的 5 圈轨迹彼此重叠，并与 RTK 的轨迹对齐良好。通过这个实验，我们展示了我们的系统能够实现全局一致性以消除 VIO 的漂移，并且在噪声 GNSS 条件下保持局部平滑性。

Fig. 14. The trajectories of our proposed system with different satellite configurations. GVINS performs best by utilizing all available satellites ("All"), and degrades to VIO with zero satellite configuration ("0"). A small bias occurs when only 3 satellites are used ("3"), and translational drift emerges when the satellite number is further reduced to 2 ("2"). If there is only 1 satellite available ("1"), yaw estimation starts to drift as well, but with a smaller magnitude compared to VIO ( 0 satellite).

图 14. 我们提出系统的轨迹，在不同卫星配置下。GVINS 通过利用所有可用卫星（“全部”）表现最佳，而在没有卫星配置时（“0”）降级为 VIO。当仅使用 3 颗卫星时（“3”），会出现小偏差，当卫星数量进一步减少到 2 颗时（“2”），会出现平移漂移。如果只有 1 颗卫星可用（“1”），偏航估计也会开始漂移，但与 VIO（0 颗卫星）相比，漂移幅度较小。

Insufficient Satellites Experiment: Based on the data sequence of sports field experiment, we further investigated the degenerate case where the number of tracked satellites is less than 4. Normally there are about 20 satellites being locked in this sequence, and we intentionally remove most of the satellites in the non-linear optimization phase in order to test the system behavior. Starting from the zero-satellite setting, we sequentially add satellite G2, G13 and G5, which are well tracked during the experiment, to the system to simulate the one, two, and three-satellite situations respectively. In this experiment we only use satellites from a single constellation (GPS) because the general case where $M$ satellites coming from $N$ constellations is equivalent to the $\left( {M - N + 1}\right)$ single-constellation case due to unknown clock offsets between different systems. It is worth to mention that our system naturally degrades to a VIO when there is no satellite available.
不充分卫星实验：基于运动场实验的数据序列，我们进一步研究了跟踪卫星数量少于 4 颗的退化情况。通常这个序列中会有大约 20 颗卫星被锁定，我们故意在非线性优化阶段移除大多数卫星，以测试系统行为。从零卫星设置开始，我们依次向系统中添加在实验中跟踪良好的卫星 G2、G13 和 G5，以分别模拟一、二、三颗卫星的情况。在这个实验中，我们只使用来自单一星座（GPS）的卫星，因为当 $M$ 颗卫星来自 $N$ 个星座的一般情况，由于不同系统间未知时钟偏移，等价于 $\left( {M - N + 1}\right)$ 单一星座的情况。值得一提的是，当没有卫星可用时，我们的系统自然会降级为 VIO。

The positioning error with 5 different settings is illustrated in Fig. 13. Obviously our system performs best in the normal setting where all available satellites are used for estimation. In the up direction, the errors of all other 4 configurations accumulate in a similar manner. This indicts that the drift in the up direction can no longer be eliminated with 3 satellites or less. In terms of horizontal directions, no accumulated error but only a small bias occurs for the three-satellite setting, which means our system is still able to suppress drifts in east, north and yaw directions. If the number of satellites is further reduced to 2 , the horizontal positioning error starts growing with the traveled distance, and we observed small periodic fluctuations in north direction which coincides with that of VIO. This implies that the drift in horizontal plane occurs and yaw error also emerges although the magnitude is very small. Finally with the one-satellite configuration, accumulated errors occur on all four unobservable directions of VIO. However, the error of the yaw component is still smaller compared to that of VIO, which can be inferred from the amplitude of the sine-wave-like error curve. The final trajectories with different satellite settings is shown in Fig. 14. Through this experiment, we claim that our system gradually degrades to different extents when the number of locked satellites varies from 3 to 0 . However, the proposed system outperforms a pure VIO in all different settings which indicts that our tightly fusion approach can still gain information from limited satellites.

图13展示了5种不同设置下的定位误差。显然，在正常设置下，我们的系统表现最佳，此时使用了所有可用卫星进行估计。在向上方向，所有其他4种配置的误差以相似的方式累积。这表明，当卫星数量降至3颗或更少时，向上方向的漂移已无法消除。在水平方向上，三卫星设置的累积误差很小，仅表现为微小的偏差，这意味着我们的系统仍然能够抑制东、北和偏航方向的漂移。如果卫星数量进一步减少到2颗，水平定位误差将随行驶距离增长，并且我们观察到北方向的周期性波动，这与VIO的波动一致。这表明水平面内发生了漂移，尽管偏航误差的幅度非常小，但也出现了。最后，在单卫星配置下，VIO的四个不可观测方向上都出现了累积误差。然而，偏航分量的误差仍然小于VIO的误差，这一点可以从正弦波形误差曲线的振幅推断出来。不同卫星设置下的最终轨迹如图14所示。通过这个实验，我们声称当锁定卫星的数量从3颗变为0颗时，我们的系统会逐渐退化到不同的程度。然而，与纯VIO相比，所提出的系统在所有不同设置下均表现更优，这表明我们的紧密融合方法仍然能够从有限的卫星中获取信息。

Indoor-outdoor Experiment: The GNSS-VI initialization takes ${9.0}\mathrm{\;s}$ in this experiment,with the majority of time waiting for GNSS navigation messages. This experiment, through which we aim to test the robustness of our system, is performed in a complex indoor-outdoor environment. The path of this experiment goes through many challenging scenarios which may bring a single-sensor-based system to failure. For example, no features are detected and tracked in dim or bright area, and the GNSS signal is highly corrupted or totally unavailable in cluttered or indoor environment. In addition, the path is similar to the one in a typical exploration task where no large loops exist, thus drifting is inevitable for any visual-inertial SLAM system. The overall distance of the resulting trajectory is over 3 kilometers and the altitude change is around 130 meters.
室内外实验：在本次实验中，GNSS-VI初始化耗时 ${9.0}\mathrm{\;s}$，大部分时间在等待GNSS导航消息。我们通过这个实验来测试系统的鲁棒性，实验在一个复杂的室内外环境中进行。实验路径经过许多可能使基于单一传感器的系统失败的挑战性场景。例如，在昏暗或明亮区域无法检测和跟踪特征，而在杂乱或室内环境中，GNSS信号严重受损或完全不可用。此外，路径与典型探索任务中的路径相似，不存在大环，因此对于任何视觉-惯性SLAM系统来说，漂移是不可避免的。生成的轨迹总距离超过3公里，高度变化约为130米。

Fig. 15 shows the ENU positioning error on the indoor-outdoor sequence. During this experiment the RTK ground truth is no longer always available because of the GNSS-unfriendly environment. Thus we only compare with segments where RTK is in fix status. The gaps around 300s occurs when we were under a bridge and passing through the woods which blocked most of the sky, and the blanks around 1200s and 1800s correspond to the situation where we were going up the indoor stairs. The result of VINS-Fusion is not shown in the figure because of huge errors and oscillations. It can be observed from the figure that VINS-Mono still experiences large accumulated errors on horizontal and yaw directions, while the error in the up direction is smaller than the previous experiment because of the attitude excitation on this sequence. The result of RTKLIB, although does not drift, varies a lot around the ground truth value. Those oscillations indict the condition of GNSS signal and severely affect the performance of VINS-Fusion. Our proposed system outperforms other three approaches in terms of positioning error and overcomes the harsh condition brought by the noisy GNSS measurement. The result of our system still has a bias on the up direction because of imperfect GNSS modelling and various error sources, while the up error of VINS-Mono starts from zero because of pre-alignment. The final trajectories of RTK, aligned VINS-Mono and our system is shown in Fig. 17. The figure shows that both VINS-Mono and our proposed system work well across the whole sequence, although obvious drift occurs on the result of VINS-Mono. The discontinuities on the trajectory of RTK is the result of cluttered and indoor environment. The trajectory of our system follows the RTK result well, and the positioning result, even in GNSS-unfriendly area, can be effectively recovered. Although the duration where RTK fails is short in the whole sequence, the impact can be significant. As shown in Fig. 16, the result of RTK is smooth and aligns well with that of GVINS when GNSS is reliable. However, the solution reported by RTK results in an error of up to 80 meters during GNSS outage, and such behavior is catastrophic for any location-based tasks. The final RMSE of all four approaches is shown in table. III.

图 15 显示了室内外序列上的 ENU 定位误差。在此次实验中，由于不友好的 GNSS 环境，RTK 真实值不再总是可用。因此，我们只比较 RTK 处于固定状态下的段。大约 300 秒周围的间隙发生在我们经过桥梁和穿过树林时，这些地方遮挡了大部分天空，而大约 1200 秒和 1800 秒周围的空白对应于我们上室内楼梯的情况。由于巨大的误差和振荡，VINS-Fusion 的结果没有在图中显示。从图中可以看出，VINS-Mono 仍在水平和偏航方向上经历较大的累积误差，而由于该序列上的姿态激励，向上的误差小于前一次实验。RTKLIB 的结果虽然不漂移，但在真实值附近波动较大。这些振荡表明了 GNSS 信号的条件，并严重影响 VINS-Fusion 的性能。我们提出的系统在定位误差方面优于其他三种方法，并克服了由噪声 GNSS 测量带来的严酷条件。我们系统的结果在向上方向上仍有偏差，这是由于不完美的 GNSS 建模和各种误差源，而 VINS-Mono 的向上误差从零开始，因为预对齐。RTK、对齐的 VINS-Mono 和我们系统的最终轨迹如图 17 所示。该图显示 VINS-Mono 和我们提出的系统在整个序列中都能很好地工作，尽管 VINS-Mono 的结果上有明显的漂移。RTK 轨迹上的不连续性是杂乱和室内环境的结果。我们系统的轨迹很好地跟随 RTK 结果，即使在 GNSS 不友好区域，定位结果也可以有效恢复。虽然在整个序列中 RTK 失败的持续时间很短，但影响可能很大。如图 16 所示，当 GNSS 可靠时，RTK 的结果平滑且与 GVINS 的结果对齐良好。然而，在 GNSS 中断期间，RTK 报告的解决方案误差高达 80 米，这种行为对于任何基于位置的任务都是灾难性的。所有四种方法的最终均方根误差 (RMSE) 显示在表 III 中。

Fig. 15. Positioning error of GVINS, VINS-Mono and RTKLIB in the complex indoor-outdoor experiment. We only compare with the RTK fix solutions, so the gaps in the figure corresponding to the situations where ground truth is not available. The result of VINS-Fusion is not shown because of huge errors and oscillations.

图 15. GVINS、VINS-Mono 和 RTKLIB 在复杂室内外实验中的定位误差。我们只与 RTK 固定解进行比较，因此图中对应于真实值不可用的情况有间隙。由于巨大的误差和振荡，未显示 VINS-Fusion 的结果。

GNSS Factor Experiment: Based on the previous indoor-outdoor sequence, we further investigate the role of each GNSS measurement(i.e. code pseudorange, Doppler shift) on the performance of our proposed system. By removing the corresponding graph factor after initialization phase, we obtain the positioning error on code pseudorange-only and Doppler-only configurations as depicted in Fig. 18. In the situation where we only employ Doppler shift measurement, an obvious drift occurs as the system no longer has global position constraints. In addition, the initialization error, which is inevitable because we initialize from only a short window of measurements, cannot be eliminated and acts like a bias subsequently. If we instead conduct the code pseudorange-only optimization, the system behaves like a normal GVINS, e.g. the system does not drift any more and the initialization error can be eliminated after a short period. However, as the code pseudorange measurement tends to be noisy and receiver clock biases are no longer constrained by Doppler shift, the smoothness of the estimation result is affected by the unstable signal, as shown in the magnified portion of Fig. 18. Through this experiment, we show that the code pseudorange measurement is the key to eliminating the accumulated drift of VIO. However, with the aid of the Doppler shift measurement, the estimation result tends to be smoother under unstable GNSS conditions.
GNSS 因子实验：基于之前的室内外序列，我们进一步研究每个 GNSS 测量（即代码伪距、多普勒频移）对我们提出系统性能的影响。在初始化阶段之后移除相应的图因子，我们得到仅使用代码伪距和仅使用多普勒配置的定位误差，如图 18 所示。在我们仅使用多普勒频移测量的情况下，由于系统不再具有全局位置约束，因此出现了明显的漂移。此外，初始化误差由于我们仅从短时间窗口的测量中初始化而是不可避免的，无法消除，并且后续表现为偏差。如果我们改为仅进行代码伪距优化，系统的表现就像正常的 GVINS，例如系统不再漂移，且在短时间后可以消除初始化误差。然而，由于代码伪距测量倾向于噪声较大，且接收器时钟偏差不再受多普勒频移约束，估计结果的平滑性受到不稳定信号的影响，如图 18 放大部分所示。通过这个实验，我们表明代码伪距测量是消除 VIO 累积漂移的关键。然而，在 GNSS 条件不稳定的情况下，借助多普勒频移测量，估计结果趋于更平滑。

Fig. 16. Positioning result of RTK and GVINS in the complex indoor-outdoor experiment.

图 16. RTK 和 GVINS 在复杂室内外实验中的定位结果。

Fig. 17. Final trajectories in the complex indoor-outdoor experiment. The result of RTKLIB and VINS-Fusion are not plotted because of large noise and jitters. The discontinuities on the RTK path is the result of bad GNSS signal and fix-lost events.

图 17. 复杂室内外实验的最终轨迹。由于 RTKLIB 和 VINS-Fusion 的结果存在大量噪声和抖动，因此没有绘制。RTK 路径上的不连续性是由于 GNSS 信号不良和定位丢失事件造成的。

Urban Driving Experiment: In this experiment we test our system with a challenging urban driving scenario in one of the most populous districts of Hong Kong. The experiment begins at dusk and lasts over 40 minutes until complete dark,with a total distance of ${22.9}\mathrm{\;{km}}$ . The data sequence covers heterogeneous situations, such as day and night, urban canyon and open-sky outdoors, etc. The challenging cases, including high-rise buildings, low illumination, fast movement and highly dynamic environments, make it impracticable for a single-sensor based algorithm to deal with. Two image samples from the data sequence are shown in Fig. 19.
城市驾驶实验：在这个实验中，我们在香港人口最稠密的地区之一的一个具有挑战性的城市驾驶场景中测试我们的系统。实验从黄昏开始，持续了40多分钟直到完全黑暗，总行驶距离为 ${22.9}\mathrm{\;{km}}$ 。数据序列涵盖了异质环境，例如白天和夜晚、城市峡谷和开阔天空等。具有挑战性的情况，包括高层建筑、低光照、快速移动和高度动态环境，使得基于单一传感器的算法无法处理。数据序列中的两个图像样本显示在图 19 中。

Fig. 18. Positioning error of normal GVINS, GVINS w/o Doppler factor and GVINS w/o code pseudorange factor.

图 18. 正常 GVINS、不带多普勒因子 GVINS 和不带码伪距因子的 GVINS 定位误差。

Fig. 19. Two image samples illustrate the challenging situations in the urban driving experiment. In the left image the GNSS receiver is surrounded by high-rise buildings, where multipath effect is obvious. The right image shows a highly dynamic scenario with low illumination and a high traffic flow on an expressway.

图 19. 两个图像样本说明了城市驾驶实验中的挑战性情况。左图中的 GNSS 接收器被高层建筑环绕，多径效应明显。右图显示了一个动态场景，其中光照低，高速公路上车流量大。

During the experiment, the GNSS outage occur constantly even in outdoor environments, because of the traffic signs and bridges above the road. In addition, severe multipath effect is observed on the GNSS measurements when the receiver is surrounded by high-rise buildings in urban canyon. To this end, a robust norm is applied on the code pseudorange and Doppler shift factors to re-weight GNSS outliers.

在实验过程中，即使在户外环境中，由于道路上的交通标志和桥梁，GNSS 断电也经常发生。此外，当接收器在城市峡谷中的高层建筑周围时，GNSS 测量值上观察到了严重多径效应。为此，对码伪距和多普勒频移因子应用了一种鲁棒范数，以重新加权 GNSS 异常值。

On this sequence, the VINS-Mono, which only uses visual and inertial sensors to perform estimation, fails at 1200s when the sky becomes dark and many vehicles pass by. The failure of VINS-Mono occurs at ${54}\%$ of the total distance,with a RMSE of ${760.22}\mathrm{\;m}$ indicting a large drift. The loosely-coupled GNSS-visual-inertial algorithm, VINS-Fusion, does not explicitly report any failure. However, hugh oscillations are observed on its result, with the corresponding RMSE at the order of ${10}^{5}\mathrm{\;m}$ . To this end,we also mark the result of VINS-Fusion as a failure case.

在这个序列中，仅使用视觉和惯性传感器进行估计的VINS-Mono在1200秒时失败，当时天空变暗，许多车辆经过。VINS-Mono的失败发生在总距离的 ${54}\%$ 处，其均方根误差（RMSE）为 ${760.22}\mathrm{\;m}$ ，表明有较大的漂移。松耦合的GNSS-视觉-惯性算法VINS-Fusion没有明确报告任何失败。然而，其结果上观察到了巨大的振荡，相应的RMSE为 ${10}^{5}\mathrm{\;m}$ 量级。为此，我们还把VINS-Fusion的结果标记为失败案例。

Fig. 20. Positioning error of GVINS, VINS-Mono and RTKLIB in the urban driving experiment. The gaps in the figure corresponding to the RTK non-fixed status. The results of VINS-Fusion and VINS-Mono are not shown because of failure.

图20. 在城市驾驶实验中GVINS、VINS-Mono和RTKLIB的位置误差。图中的间隙对应于RTK非固定状态。由于失败，未显示VINS-Fusion和VINS-Mono的结果。

In this experiment, the GNSS-VI is successfully initialized in ${2.0}\mathrm{\;s}$ after the visual-inertial initialization finishes. Fig. 20 shows the positioning error of GVINS and RTKLIB on three axes of ENU frame respectively. The extreme errors from the result of RTKLIB,which we define as above ${100}\mathrm{\;m}$ ,are not shown to limit the scale of the plot. The large-magnitude oscillations of RTKLIB on this data sequence clearly illustrate the terrible quality of GNSS signal in the harsh environment, especially around ${400}\mathrm{\;s}$ and ${1350}\mathrm{\;s}$ ,where the receiver is surrounded by high-rise buildings and multipath effect is severe. Our proposed system, GVINS, survives through the whole sequence, which again proves the availability and robustness of our system. The slowly varying and well bounded positioning error of GVINS in Fig. 20 shows the local smoothness and global consistency properties of the proposed method.

在这个实验中，GNSS-VI在视觉-惯性初始化完成后在 ${2.0}\mathrm{\;s}$ 成功初始化。图20显示了GVINS和RTKLIB在ENU坐标系三个轴上的定位误差。为了限制图表的规模，未显示RTKLIB结果中的极端误差，我们将其定义为超过 ${100}\mathrm{\;m}$ 的误差。RTKLIB在该数据序列上的大幅度振荡清楚地说明了恶劣环境中GNSS信号的质量很差，特别是在 ${400}\mathrm{\;s}$ 和 ${1350}\mathrm{\;s}$ 附近，接收器被高层建筑环绕，多径效应严重。我们提出的系统GVINS在整个序列中都能存活下来，这再次证明了我们的系统的可用性和鲁棒性。图20中GVINS的缓慢变化且界限明确的定位误差显示了所提出方法的局部平滑性和全局一致性。

Fig. 21 illustrates the positioning results of RTK and GVINS. From Fig. 21 we see that the trajectory of our system aligns well with that of RTK on horizontal directions. Since we do not perform any alignment on the result of GVINS, a obvious bias, in addition to the varying error, can be observed on the vertical direction. The RMSE of GVINS and RTKLIB is also included in Table. III, and the fields of VINS-Mono and VINS-Fusion are marked as N/A because of failure. The final trajectories on Google Maps are depicted in Fig. 22, where the result of RTK is plotted on the top of GVINS. Note that the GNSS outage occurs constantly even on the open-sky expressway because of the traffic signs and viaducts. By stack the trajectory of RTK on the top of that of GVINS, the discontinuities on the path of RTK, corresponding to the RTK non-fixed status over long distance, can be clearly illustrated in the figure. Due to the large scale of the map, the frequent short-term RTK outages cannot be observed.

图21展示了RTK和GVINS的定位结果。从图21中我们可以看到，我们的系统轨迹在水平方向上与RTK的轨迹对齐良好。由于我们未对GVINS的结果执行任何对齐操作，除了误差变化外，在垂直方向上还可以观察到明显的偏差。GVINS和RTKLIB的均方根误差（RMSE）也包含在表III中，而VINS-Mono和VINS-Fusion的字段标记为不可用（N/A），因为它们失败了。最终在谷歌地图上的轨迹如图22所示，其中RTK的结果绘制在GVINS的顶部。请注意，即使在开阔的高速公路上，由于交通标志和立交桥，GNSS信号中断也是持续发生的。通过将RTK的轨迹堆叠在GVINS的轨迹之上，图中断续的路径，对应于RTK在长距离上的非固定状态，可以清楚地展示出来。由于地图的比例较大，频繁的短期RTK中断无法观察到。

Fig. 21. Positioning result of RTK and GVINS in the challenging urban driving experiment.

图21。在具有挑战性的城市驾驶实验中RTK和GVINS的定位结果。

In terms of the computational time, the feature detection and tracking, which are same for VINS-Mono, VINS-Fusion and GVINS,costs ${7.28}\mathrm{\;{ms}}$ per frame. The window optimization of VINS-Mono takes ${21.76}\mathrm{\;{ms}}$ on average. The time spent in the pose graph module of VINS-Fusion, which is used to fuse GNSS solution with visual-inertial odometry, grows as the travelled distance increases. The lower limit is ${1.12}\mathrm{\;{ms}}$ at the beginning and the upper bound is ${1018.46}\mathrm{\;{ms}}$ in the end, with an average value of ${404.83}\mathrm{\;{ms}}$ . In contrast,our proposed GVINS only needs ${21.91}\mathrm{\;{ms}}$ on the window optimization thanks to the tightly-coupled and sliding-window approaches we adopted. Considering the ${20} - \mathrm{{Hz}}$ camera we used in our experiments, our system can safely run at real-time while obvious lags may be observed in the case of VINS-Fusion as the travelled distance grows.

在计算时间方面，特征检测和跟踪对于VINS-Mono、VINS-Fusion和GVINS来说是相同的，每帧的成本为 ${7.28}\mathrm{\;{ms}}$。VINS-Mono的窗口优化平均耗时 ${21.76}\mathrm{\;{ms}}$。VINS-Fusion的位姿图模块用于将GNSS解决方案与视觉-惯性里程计融合，其耗时随行驶距离的增加而增长。初始阶段的下限是 ${1.12}\mathrm{\;{ms}}$，而最终的上限是 ${1018.46}\mathrm{\;{ms}}$，平均值为 ${404.83}\mathrm{\;{ms}}$。相比之下，我们提出的GVINS由于采用了紧耦合和滑动窗口方法，窗口优化仅需 ${21.91}\mathrm{\;{ms}}$。考虑到我们在实验中使用的 ${20} - \mathrm{{Hz}}$ 相机，我们的系统可以实时安全运行，而在行驶距离增长的情况下，VINS-Fusion可能会出现明显的延迟。

IX. CONCLUSION

IX. 结论

In this paper, we propose a tightly-coupled system to fuse measurements from camera, IMU and GNSS receiver under a non-linear optimization-based framework. Our system starts with an initialization phase, during which a coarse-to-fine procedure is employed to online calibrate the transformation between the local and global frames. In the optimization phase, GNSS raw measurements are modelled and formulated using the probabilistic factor graph. The degenerate cases are considered and carefully handled to keep the system robust in the complex environment. We conduct experiments in both simulation and real-world environments to evaluate the performance of our system, and the results show that our system effectively eliminates the accumulated drift and preserves the local accuracy of a typical VIO system. To this end, we state that our system can achieve both local smoothness and global consistency.

在本文中，我们提出了一种基于非线性优化框架的紧耦合系统，用于融合相机、IMU和GNSS接收机的测量数据。我们的系统以初始化阶段开始，在此阶段采用粗到细的过程在线校准本地框架与全局框架之间的变换。在优化阶段，GNSS原始测量数据被建模并用概率因子图进行公式化。考虑了退化情况，并仔细处理以保持系统在复杂环境中的鲁棒性。我们在模拟环境和真实世界环境中进行实验以评估我们系统的性能，结果显示我们的系统能够有效消除累积漂移并保持典型VIO系统的局部精度。为此，我们指出我们的系统可以实现局部平滑性和全局一致性。

Fig. 22. The trajectories of RTK and GVINS in the urban driving experiment. The two paths totally align with each other. The trajectory of RTK is plotted on the top of that of GVINS so that RTK non-fixed status can be clearly shown by the discontinuities.

图 22. 在城市驾驶实验中 RTK 和 GVINS 的轨迹。两条路径完全重合。RTK 的轨迹绘制在 GVINS 轨迹之上，以便通过不连续性清楚地显示 RTK 的非固定状态。

In future work, the theoretical observability analysis will be conducted under various degenerate situations and we aim to build an online observability-aware state estimator to deal with complex environments and possible sensor failures. In addition, we are also interested in reducing the absolute positioning error by GNSS measurements combination [36] or PPP [37] techniques to handle distributed localization tasks in swarm systems.

在未来的工作中，我们将在各种退化情况下进行理论可观测性分析，并旨在构建一个在线可观测性感知的状态估计器，以处理复杂环境和可能的传感器故障。此外，我们还对通过 GNSS 测量组合 [36] 或 PPP [37] 技术减少绝对定位误差感兴趣，以处理群体系统中的分布式定位任务。

REFERENCES

参考文献

[1] G. Huang, M. Kaess, and J. J. Leonard, "Towards consistent visual-inertial navigation," in Proc. of the IEEE Int. Conf. on Robot. and Autom., 2014, pp. 4926-4933.

[2] A. I. Mourikis and S. I. Roumeliotis, "A multi-state constraint Kalman filter for vision-aided inertial navigation," in Proc. of the IEEE Int. Conf. on Robot. and Autom., Roma, Italy, Apr. 2007, pp. 3565-3572.

[3] M. Li and A. Mourikis, "High-precision, consistent EKF-based visual-inertial odometry," Int. J. Robot. Research, vol. 32, no. 6, pp. 690-711, May 2013.

[4] K. Wu, A. Ahmed, G. A. Georgiou, and S. I. Roumeliotis, "A square root inverse filter for efficient vision-aided inertial navigation on mobile devices." in Robot.: Science and Systems, 2015, p. 2.

[5] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, "Keyframe-based visual-inertial odometry using nonlinear optimization," Int. J. Robot. Research, vol. 34, no. 3, pp. 314-334, Mar. 2014.

[6] T. Qin, P. Li, and S. Shen, "Vins-mono: A robust and versatile monocular visual-inertial state estimator," IEEE Trans. Robot., vol. 34, no. 4, pp. 1004-1020, 2018.

[7] C. V. Angelino, V. R. Baraniello, and L. Cicala, "Uav position and attitude estimation using imu, gnss and camera," in Proc. of the Int. Conf. on Information Fusion, 2012, pp. 735-742.

[8] S. Lynen, M. W. Achtelik, S. Weiss, M. Chli, and R. Siegwart, "A robust and modular multi-sensor fusion approach applied to mav navigation," in Proc. of the IEEE/RSJ Int. Conf. on Intell. Robots and Syst. IEEE, 2013, pp. 3923-3929.

[9] S. Shen, Y. Mulgaonkar, N. Michael, and V. Kumar, "Multi-sensor fusion for robust autonomous flight in indoor and outdoor environments with a rotorcraft MAV," in Proc. of the IEEE Int. Conf. on Robot. and Autom., Hong Kong, China, May 2014, pp. 4974-4981.

[10] R. Mascaro, L. Teixeira, T. Hinzmann, R. Siegwart, and M. Chli, "Gomsf: Graph-optimization based multi-sensor fusion for robust uav pose estimation," in Proc. of the IEEE Int. Conf. on Robot. and Autom., 2018, pp. 1421-1428.

[11] Y. Yu, W. Gao, C. Liu, S. Shen, and M. Liu, "A gps-aided omnidirectional visual-inertial state estimator in ubiquitous environments," in Proc. of the IEEE/RSJ Int. Conf. on Intell. Robots and Syst., 2019, pp. 7750-7755.

[12] T. Qin, S. Cao, J. Pan, and S. Shen, "A general optimization-based framework for global pose estimation with multiple sensors," CoRR, vol. abs/1901.03642, 2019. [Online]. Available: http://arxiv.org/abs/ 1901.03642

[13] X. Li, X. Wang, J. Liao, X. Li, S. Li, and H. Lyu, "Semi-tightly coupled integration of multi-gnss ppp and s-vins for precise positioning in gnss-challenged environments," Satellite Navigation, vol. 2, no. 1, pp. 1-14, 2021.

[14] P. V. Gakne and K. O'Keefe, "Tightly-coupled gnss/vision using a sky-pointing camera for vehicle navigation in urban areas," Sensors, vol. 18, no. 4, p. 1244, 2018.

[15] M. Schreiber, H. Königshof, A.-M. Hellmund, and C. Stiller, "Vehicle localization with tightly coupled gnss and visual odometry," in Proc. of the IEEE Sym. on Intell. Vehicles, 2016, pp. 858-863.

[16] D. P. Shepard and T. E. Humphreys, "High-precision globally-referenced position and attitude via a fusion of visual slam, carrier-phase-based gps, and inertial measurements," in Proc. of the IEEE/ION Sym. on Pos. Loc. and Nav., 2014, pp. 1309-1328.

[17] T. Li, H. Zhang, Z. Gao, X. Niu, and N. El-Sheimy, "Tight fusion of a monocular camera, mems-imu, and single-frequency multi-gnss rtk for precise navigation in gnss-challenged environments," Remote Sensing, vol. 11, no. 6, p. 610, 2019.

[18] J. E. Yoder, P. A. Iannucci, L. Narula, and T. E. Humphreys, "Multi-antenna vision-and-inertial-aided edgnss for micro aerial vehicle pose estimation," in Proc. of the Int. Tech. Meet. of the Sate. Div. of The Inst. of Nav., 2020, pp. 2281-2298.

[19] A. Soloviev and D. Venable, "Integration of gps and vision measurements for navigation in gps-challenged environments," in Proc. of the IEEE/ION Sym. on Pos. Loc. and Nav., 2010, pp. 826-833.

[20] D. H. Won, E. Lee, M. Heo, S. Sung, J. Lee, and Y. J. Lee, "Gnss integration with vision-based navigation for low gnss visibility conditions," GPS solutions, vol. 18, no. 2, pp. 177-187, 2014.

[21] J. Liu, W. Gao, and Z. Hu, "Optimization-based visual-inertial SLAM tightly coupled with raw GNSS measurements," CoRR, vol. abs/2010.11675, 2020. [Online]. Available: https://arxiv.org/abs/2010. 11675

[22] J. Saastamoinen, "Contributions to the theory of atmospheric refraction," Bulletin Géodésique (1946-1975), vol. 105, no. 1, pp. 279-298, 1972.

[23] J. A. Klobuchar, "Ionospheric time-delay algorithm for single-frequency gps users," IEEE Trans. on aero. and elec. syst., no. 3, pp. 325-331, 1987.

[24] E. Kaplan and C. Hegarty, Understanding GPS: principles and applications. Artech house, 2005.

[25] S. Shen, N. Michael, and V. Kumar, "Tightly-coupled monocular visual-inertial fusion for autonomous flight of rotorcraft MAVs," in Proc. of the IEEE Int. Conf. on Robot. and Autom., Seattle, WA, May 2015.

[26] C. Forster, L. Carlone, F. Dellaert, and D. Scaramuzza, "On-manifold preintegration for real-time visual-inertial odometry," IEEE Trans. Robot., vol. 33, no. 1, pp. 1-21, 2017.

[27] S. Shen, N. Michael, and V. Kumar, "Tightly-coupled monocular visual-inertial fusion for autonomous flight of rotorcraft mavs," in Proc. of the IEEE Int. Conf. on Robot. and Autom., 2015, pp. 5303-5310.

[28] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. on info. theo., vol. 47, no. 2, pp. 498-519, 2001.

[29] J. Shi et al., "Good features to track," in Proc. of the IEEE Int. Conf. on Pattern Recognition, 1994, pp. 593-600.

[30] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. of the Intl. Joint Conf. on Artificial Intelligence, Vancouver, Canada, Aug. 1981, pp. 24-28.

[31] L. Heng, B. Li, and M. Pollefeys, "Camodocal: Automatic intrinsic and extrinsic calibration of a rig with multiple generic cameras and odometry," in Proc. of the IEEE/RSJ Int. Conf. on Intell. Robots and Syst., 2013, pp. 1793-1800.

[32] T. Qin and S. Shen, "Robust initialization of monocular visual-inertial estimation on aerial robots." in Proc. of the IEEE/RSJ Int. Conf. on Intell. Robots and Syst., Vancouver, Canada, 2017.

[33] T. Takasu and A. Yasuda, "Development of the low-cost rtk-gps receiver with an open source program package rtklib," in Proc. of the Int. Sym. on GPS/GNSS, vol. 1, 2009.

[34] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? the kitti vision benchmark suite," in Proc. of the IEEE Int. Conf. on Pattern Recognition, 2012, pp. 3354-3361.

[35] J. Nikolic, J. Rehder, M. Burri, P. Gohl, S. Leutenegger, P. T. Furgale, and R. Siegwart, "A synchronized visual-inertial sensor system with fpga pre-processing for accurate real-time slam," in Proc. of the IEEE Int. Conf. on Robot. and Autom., 2014, pp. 431-437.

[36] G. Blewitt, "An automatic editing algorithm for gps data," Geophysical research letters, vol. 17, no. 3, pp. 199-202, 1990.

[37] J. Zumberge, M. Heflin, D. Jefferson, M. Watkins, and F. Webb, "Precise point positioning for the efficient and robust analysis of gps data from large networks," J. of geophysical research: solid earth, vol. 102, no. B3, pp. 5005-5017, 1997.

标签：right,mathbf,system,GNSS,GVIS,VINS,left
From： https://www.cnblogs.com/odesey/p/18325172

GVIS