Paper Walkthrough 2: SuperGlue, a Network That Matches Features and Rejects Outliers Jointly


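1. SuperGlue Abstract

The paper proposes a network that performs feature matching and outlier rejection at the same time. Matching is cast as a differentiable optimal transport problem. The paper also introduces an attention-based context aggregation mechanism over 2D keypoints, which lets SuperGlue reason about the underlying 3D scene while it matches features. The network runs in real time on a GPU and is intended to be integrated into SLAM systems; the figure below shows where it sits in the pipeline:

[Figure: position of SuperGlue in a SLAM pipeline]

In a classical SLAM framework, the front end extracts features and the back end performs nonlinear optimization. Feature matching is the crucial step in between. Traditional pipelines usually handle it with nearest-neighbor search combined with algorithms such as RANSAC. SuperGlue is an important milestone toward end-to-end deep learning for SLAM.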

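2. SuperGlue Network Architecture

[Figure: SuperGlue network architecture]

The framework consists of two main modules, an attentional GNN and an optimal matching layer:

The input first passes through the attentional GNN. A Keypoint Encoder maps each keypoint position p and its visual descriptor d into a single vector (which can be thought of as a matching feature vector). Self-attention and cross-attention layers, alternated L times, then produce a more powerful representation f.

Next comes the optimal matching layer. The inner products of the matching feature vectors form an M x N score matrix, which is augmented with a dustbin score. The Sinkhorn algorithm, run for T iterations, then finds the optimal partial assignment.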

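2.1 Attentional Graph Neural Network

2.1.1 Keypoint Encoder

Suppose images A and B have M and N detected keypoints, indexed by A = {1, 2, ..., M} and B = {1, 2, ..., N}. Each keypoint is represented by a pair (p, d), where p_i = (x, y, c) holds the normalized position (x, y) and the detection confidence c of the i-th keypoint, and d_i is its descriptor. The keypoint positions and descriptors fed to the network are first encoded:
x_i^(0) = d_i + MLP_enc(p_i)

This folds the keypoint position and the descriptor into a single feature, so that the network can take both appearance and position into account when matching.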
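
As a rough illustration of what the encoder consumes, the sketch below normalizes pixel coordinates the same way normalize_keypoints does in the full listing of section 3 (center on the image, divide by 0.7 times the longer side) and appends the detection confidence to form p_i = (x, y, c). It is plain Python rather than PyTorch so that it runs without dependencies, and the helper name encoder_input is made up for this example.

```python
def encoder_input(kpts, scores, width, height):
    """Build p_i = (x, y, c) from pixel keypoints and detection scores."""
    cx, cy = width / 2, height / 2          # image center
    scale = 0.7 * max(width, height)        # 0.7 x the longer image side
    return [((x - cx) / scale, (y - cy) / scale, c)
            for (x, y), c in zip(kpts, scores)]


# Hypothetical 640x480 image with two keypoints: the center and the top-left corner.
points = encoder_input([(320.0, 240.0), (0.0, 0.0)], [0.9, 0.4],
                       width=640, height=480)
```

In the real network these three numbers per keypoint become the 3 input channels of the encoder MLP, whose output is added to the descriptor.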

self.kenc = KeypointEncoder(self.descriptor_dim, self.keypoint_encoder)  # instantiate the keypoint encoder

# Encode each set of keypoints (position + detection score) with self.kenc and add
# the result to the descriptors, so that the descriptors also carry positional information.
desc0 = desc0 + self.kenc(kpts0, data['scores0'])
desc1 = desc1 + self.kenc(kpts1, data['scores1'])

# Multi-layer perceptron: 1x1 convolution + BatchNorm + ReLU
def MLP(channels: list, do_bn=True):
    """ Multi-layer perceptron """
    n = len(channels)
    layers = []
    # one 1x1 convolution per pair of consecutive channel sizes
    for i in range(1, n):
        layers.append(
            nn.Conv1d(channels[i - 1], channels[i], kernel_size=1, bias=True))
        # every layer except the last is followed by BatchNorm and ReLU
        if i < (n-1):
            if do_bn:
                layers.append(nn.BatchNorm1d(channels[i]))
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)


class KeypointEncoder(nn.Module):
    """ Joint encoding of keypoint position and detection score using an MLP """
    def __init__(self, feature_dim, layers):
        """
        Args:
            feature_dim (int): output feature dimension
            layers (list): hidden layer sizes
        """
        super().__init__()
        self.encoder = MLP([3] + layers + [feature_dim])
        nn.init.constant_(self.encoder[-1].bias, 0.0)

    def forward(self, kpts, scores):
        """
        Args:
            kpts (Tensor): keypoints, shape (N, K, 2); N samples, K keypoints, 2D coordinates each
            scores (Tensor): detection scores, shape (N, K)
        Returns:
            Tensor: encoded features, shape (N, feature_dim, K)
        """
        inputs = [kpts.transpose(1, 2), scores.unsqueeze(1)]
        return self.encoder(torch.cat(inputs, dim=1))

2.1.2 Attentional Aggregation

[Figure: attentional aggregation]

message = self.attn(x, source, source)

self.attn = MultiHeadedAttention(num_heads, feature_dim)

def attention(query, key, value):
    """
    Scaled dot-product attention.
    Args:
        query (Tensor): queries, shape (B, Dq, H, N); B batch size, Dq query dimension, H number of heads, N sequence length
        key (Tensor): keys, shape (B, Dk, H, M); Dk key dimension (must equal Dq), M sequence length
        value (Tensor): values, shape (B, Dv, H, M); Dv value dimension
    Returns:
        Tuple[Tensor, Tensor]: the attention-weighted values and the attention weights
    """
    dim = query.shape[1]
    scores = torch.einsum('bdhn,bdhm->bhnm', query, key) / dim**.5
    prob = torch.nn.functional.softmax(scores, dim=-1)
    return torch.einsum('bhnm,bdhm->bdhn', prob, value), prob


class MultiHeadedAttention(torch.jit.ScriptModule):
    """ Multi-head attention to increase model expressivity """
    prob: List[torch.Tensor]  # requires `from typing import List`

    def __init__(self, num_heads: int, d_model: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.dim = d_model // num_heads
        self.num_heads = num_heads
        self.merge = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.proj = nn.ModuleList([deepcopy(self.merge) for _ in range(3)])  # one projection W each for query, key, value
        self.prob = []

    @torch.jit.script_method
    def forward(self, query, key, value):
        batch_dim = query.size(0)
        # project query, key and value with 1x1 convolutions and split them into heads
        query, key, value = [l(x).view(batch_dim, self.dim, self.num_heads, -1)
                             for l, x in zip(self.proj, (query, key, value))]
        x, prob = attention(query, key, value)  # attention per head
        self.prob.append(prob)
        return self.merge(x.contiguous().view(batch_dim, self.dim*self.num_heads, -1))  # merge the heads


self.gnn = AttentionalGNN(self.descriptor_dim, self.GNN_layers)

desc0, desc1 = self.gnn(desc0, desc1)

class AttentionalPropagation(torch.jit.ScriptModule):
    def __init__(self, feature_dim: int, num_heads: int):
        super().__init__()
        # multi-head attention that computes the message
        self.attn = MultiHeadedAttention(num_heads, feature_dim)
        # MLP that fuses the node state with the incoming message
        self.mlp = MLP([feature_dim*2, feature_dim*2, feature_dim])
        # initialize the bias of the last MLP layer to 0.0
        nn.init.constant_(self.mlp[-1].bias, 0.0)

    @torch.jit.script_method
    def forward(self, x, source):
        # attention message from source to x (query = x, key = value = source)
        message = self.attn(x, source, source)
        # concatenate x with the message and fuse them: MLP([x || message])
        return self.mlp(torch.cat([x, message], dim=1))


# Attention-based graph neural network (GNN)
class AttentionalGNN(torch.jit.ScriptModule):
    def __init__(self, feature_dim: int, layer_names: list):
        super().__init__()
        self.layers = nn.ModuleList([
            AttentionalPropagation(feature_dim, 4)
            for _ in range(len(layer_names))])
        self.names = layer_names

    @torch.jit.script_method
    def forward(self, desc0, desc1):
        for i, layer in enumerate(self.layers):
            layer.attn.prob = []
            if self.names[i] == 'cross':
                src0, src1 = desc1, desc0
            else:  # if name == 'self':
                src0, src1 = desc0, desc1
            delta0, delta1 = layer(desc0, src0), layer(desc1, src1)
            desc0, desc1 = (desc0 + delta0), (desc1 + delta1)  # residual update
        return desc0, desc1
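
Written as equations, the classes above compute roughly the following for each keypoint i and each attention head. The notation is chosen for this note and read off the code, not copied from the paper's figures: q_i is projected from x, k_j and v_j are projected from source, and d_h = feature_dim / num_heads.

```latex
\alpha_{ij} = \operatorname{softmax}_j\!\left(\frac{q_i^{\top} k_j}{\sqrt{d_h}}\right),
\qquad
m_i = \sum_j \alpha_{ij}\, v_j,
\qquad
x_i \leftarrow x_i + \mathrm{MLP}\big([\, x_i \,\|\, m_i \,]\big)
```

In the code, the per-head outputs are concatenated and passed through the merge convolution before they are used as the message m_i. For a 'self' layer, source is the same image as x; for a 'cross' layer, it is the other image.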

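According to the paper, the repeated alternation of self- and cross-attention mimics how a human glances back and forth between two images when matching them. In effect, self-attention makes each feature more distinctive for matching, while cross-attention compares these distinctive points across the two images. The authors visualize this process nicely, as shown below:
[Figure: visualization of self- and cross-attention across layers]
Here green, blue, and red denote keypoints that are easy, moderately hard, and hard to match. In the self-attention visualization on the left, a keypoint in the shallow layers attends to all keypoints in the image. As depth increases, self-attention gradually converges onto the keypoints most similar to it in both position and descriptor. Cross-attention behaves the same way, converging onto the correct match as depth increases. The green keypoints converge more easily, and the attended region shrinks as the network gets deeper.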
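
The self/cross alternation can be sketched in a few lines of plain Python. This is a deliberately stripped-down illustration, not the network itself: it uses a single head, no learned projections and no MLP, and the function names attend and propagate are invented for the example. What it does keep is the source selection ('self' attends within the image, 'cross' attends to the other image) and the residual update.

```python
import math


def attend(x, source):
    """Single-head scaled dot-product attention; each row of x queries source."""
    d = len(x[0])
    out = []
    for q in x:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in source]
        top = max(logits)                          # subtract the max for stability
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        # weighted average of the source vectors (values = source in this sketch)
        out.append([sum(w / total * v[c] for w, v in zip(weights, source))
                    for c in range(d)])
    return out


def propagate(desc0, desc1, layer_names):
    """Alternate self- and cross-attention with residual updates."""
    for name in layer_names:
        src0, src1 = (desc1, desc0) if name == 'cross' else (desc0, desc1)
        delta0, delta1 = attend(desc0, src0), attend(desc1, src1)
        desc0 = [[a + b for a, b in zip(r, dl)] for r, dl in zip(desc0, delta0)]
        desc1 = [[a + b for a, b in zip(r, dl)] for r, dl in zip(desc1, delta1)]
    return desc0, desc1


# Two keypoints in image 0, one in image 1, 2-dimensional toy descriptors.
out0, out1 = propagate([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]], ['self', 'cross'])
```

Note that the number of keypoints per image never changes; only the descriptors are refined layer by layer.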

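2.2 Optimal Matching Layer

After the attentional graph neural network, a one-layer MLP (a single linear projection) is applied to every keypoint feature to obtain the matching descriptors f. Their pairwise inner products form the score matrix S_ij used for matching:
S_ij = <f_i^A, f_j^B>

The corresponding code: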

self.final_proj = nn.Conv1d(self.descriptor_dim, self.descriptor_dim, kernel_size=1, bias=True)

# Final MLP projection.
mdesc0, mdesc1 = self.final_proj(desc0), self.final_proj(desc1)

# Compute matching descriptor distance.
scores = torch.einsum('bdn,bdm->bnm', mdesc0, mdesc1)
scores = scores / self.descriptor_dim**.5
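
To make the einsum concrete, here is a small plain-Python sketch of the same computation for a single image pair. It assumes one descriptor per list entry (keypoint-major layout, unlike the channel-first tensors above), and the function name score_matrix is made up for the example.

```python
import math


def score_matrix(mdesc0, mdesc1):
    """S[i][j] = <f_i, f_j> / sqrt(d) for every pair of matching descriptors."""
    d = len(mdesc0[0])
    return [[sum(a * b for a, b in zip(f0, f1)) / math.sqrt(d) for f1 in mdesc1]
            for f0 in mdesc0]


# 2 keypoints in image 0, 3 keypoints in image 1, 4-dimensional toy descriptors.
scores = score_matrix(
    [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
)
```

The result is an M x N matrix (here 2 x 3); the division by sqrt(d) matches the scaling by descriptor_dim**.5 in the snippet above.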

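Next, the Sinkhorn algorithm is applied to the scores matrix:

# The next two functions implement the Sinkhorn algorithm.
# Sinkhorn normalization iterations, computed in log-space for numerical stability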
def log_sinkhorn_iterations(Z, log_mu, log_nu, iters: int):
    """ Perform Sinkhorn Normalization in Log-space for stability"""
    u, v = torch.zeros_like(log_mu), torch.zeros_like(log_nu)
    for _ in range(iters):
        u = log_mu - torch.logsumexp(Z + v.unsqueeze(1), dim=2)
        v = log_nu - torch.logsumexp(Z + u.unsqueeze(2), dim=1)
    return Z + u.unsqueeze(2) + v.unsqueeze(1)

# Differentiable optimal transport, computed in log-space for numerical stability
def log_optimal_transport(scores, alpha, iters: int):
    """ Perform Differentiable Optimal Transport in Log-space for stability"""
    b, m, n = scores.shape
    one = scores.new_tensor(1)
    ms, ns = (m*one).to(scores), (n*one).to(scores)

    bins0 = alpha.expand(b, m, 1)
    bins1 = alpha.expand(b, 1, n)
    alpha = alpha.expand(b, 1, 1)

    couplings = torch.cat([torch.cat([scores, bins0], -1),
                           torch.cat([bins1, alpha], -1)], 1)

    norm = - (ms + ns).log()
    log_mu = torch.cat([norm.expand(m), ns.log()[None] + norm])
    log_nu = torch.cat([norm.expand(n), ms.log()[None] + norm])
    log_mu, log_nu = log_mu[None].expand(b, -1), log_nu[None].expand(b, -1)

    Z = log_sinkhorn_iterations(couplings, log_mu, log_nu, iters)
    Z = Z - norm  # multiply probabilities by M+N
    return Z
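
For readers who want to trace the algorithm without tensors, the following is a plain-Python sketch of the same steps for one image pair: append a dustbin row and column filled with alpha, set the target marginals, and alternate the two log-space normalizations. It mirrors the structure of the functions above but is an illustration under simplifying assumptions (no batching, nested lists instead of tensors, a fixed alpha instead of a learned one).

```python
import math


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_optimal_transport(scores, alpha, iters):
    m, n = len(scores), len(scores[0])
    # augment the M x N scores with a dustbin column and a dustbin row
    Z = [row + [alpha] for row in scores] + [[alpha] * (n + 1)]
    norm = -math.log(m + n)
    # each keypoint has mass 1/(M+N); the dustbins absorb N and M keypoints
    log_mu = [norm] * m + [math.log(n) + norm]
    log_nu = [norm] * n + [math.log(m) + norm]
    u, v = [0.0] * (m + 1), [0.0] * (n + 1)
    for _ in range(iters):
        u = [log_mu[i] - logsumexp([Z[i][j] + v[j] for j in range(n + 1)])
             for i in range(m + 1)]
        v = [log_nu[j] - logsumexp([Z[i][j] + u[i] for i in range(m + 1)])
             for j in range(n + 1)]
    # subtracting norm multiplies the probabilities by M+N
    return [[Z[i][j] + u[i] + v[j] - norm for j in range(n + 1)]
            for i in range(m + 1)]


# 3 keypoints in image 0, 2 in image 1, toy scores.
P = log_optimal_transport([[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]], alpha=1.0, iters=100)
```

After the final rescaling, exp(P) has rows and columns that sum to about 1 for every real keypoint, while the dustbin row and column sum to about N and M, which is what allows some keypoints to stay unmatched.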


2.3 Loss Function

The SuperGlue loss function is shown below:

[Figure: SuperGlue loss function]
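
As a hedged sketch of the idea, the loss can be read as a negative log-likelihood over the log-assignment matrix returned by the optimal transport step: one term for every ground-truth match, plus terms that send unmatched keypoints to the dustbins. The plain-Python function below is an illustration under that reading; the name matching_loss and its argument layout are assumptions for this example. The training code in section 3 uses only the matched-pair term (averaged) and leaves the dustbin terms commented out.

```python
import math


def matching_loss(log_p, matches, unmatched0=(), unmatched1=()):
    """Negative log-likelihood of the ground-truth assignment.

    log_p is the (M+1) x (N+1) log-assignment matrix; the last row and the
    last column are the dustbins.
    """
    loss = -sum(log_p[i][j] for i, j in matches)      # matched pairs
    loss -= sum(log_p[i][-1] for i in unmatched0)     # image-0 points with no match
    loss -= sum(log_p[-1][j] for j in unmatched1)     # image-1 points with no match
    return loss


# Toy case with M = N = 1: a 2 x 2 matrix of log-probabilities.
log_p = [[math.log(0.8), math.log(0.2)],
         [math.log(0.5), math.log(0.5)]]
loss_matched = matching_loss(log_p, matches=[(0, 0)])
loss_unmatched = matching_loss(log_p, matches=[], unmatched0=[0])
```

The loss is small when the matrix puts high probability on the ground-truth cells, whether that cell is a real match or a dustbin.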

3. Full Code Walkthrough

import torch
from torch import nn
from copy import deepcopy
from pathlib import Path

# Multi-layer perceptron: 1x1 convolution + BatchNorm + ReLU
def MLP(channels: list, do_bn=True):
    """ Multi-layer perceptron """
    n = len(channels)
    layers = []
    for i in range(1, n):
        layers.append(
            nn.Conv1d(channels[i - 1], channels[i], kernel_size=1, bias=True))
        if i < (n-1):
            if do_bn:
                layers.append(nn.BatchNorm1d(channels[i]))
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

def normalize_keypoints(kpts, image_shape):
    """ Normalize keypoint locations based on the image size.

    Args:
        kpts (Tensor): keypoints, shape (N, K, 2); N samples, K keypoints, 2D coordinates each
        image_shape (tuple): shape of the image tensor, (B, C, H, W); only H and W are used
    Returns:
        Tensor: normalized keypoint locations, same shape as kpts
    """
    _, _, height, width = image_shape
    one = kpts.new_tensor(1)
    size = torch.stack([one*width, one*height])[None]  # image size (width, height), shape (1, 2)
    center = size / 2  # image center
    scaling = size.max(1, keepdim=True).values * 0.7  # scale: 0.7 x the longer side
    return (kpts - center[:, None, :]) / scaling[:, None, :]

# Keypoint encoder
class KeypointEncoder(nn.Module):
    """ Joint encoding of keypoint position and detection score using an MLP """
    def __init__(self, feature_dim, layers):
        """
        Args:
            feature_dim (int): output feature dimension
            layers (list): hidden layer sizes
        """
        super().__init__()
        self.encoder = MLP([3] + layers + [feature_dim])
        nn.init.constant_(self.encoder[-1].bias, 0.0)

    def forward(self, kpts, scores):
        """
        Args:
            kpts (Tensor): keypoints, shape (N, K, 2); N samples, K keypoints, 2D coordinates each
            scores (Tensor): detection scores, shape (N, K)
        Returns:
            Tensor: encoded features, shape (N, feature_dim, K)
        """
        inputs = [kpts.transpose(1, 2), scores.unsqueeze(1)]
        return self.encoder(torch.cat(inputs, dim=1))

def attention(query, key, value):
    """
    Scaled dot-product attention.
    Args:
        query (Tensor): queries, shape (B, Dq, H, N); B batch size, Dq query dimension, H number of heads, N sequence length
        key (Tensor): keys, shape (B, Dk, H, M); Dk key dimension (must equal Dq), M sequence length
        value (Tensor): values, shape (B, Dv, H, M); Dv value dimension
    Returns:
        Tuple[Tensor, Tensor]: the attention-weighted values and the attention weights
    """
    dim = query.shape[1]
    scores = torch.einsum('bdhn,bdhm->bhnm', query, key) / dim**.5
    prob = torch.nn.functional.softmax(scores, dim=-1)
    return torch.einsum('bhnm,bdhm->bdhn', prob, value), prob


class MultiHeadedAttention(nn.Module):
    """ Multi-head attention to increase model expressivity """
    def __init__(self, num_heads: int, d_model: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.dim = d_model // num_heads
        self.num_heads = num_heads
        self.merge = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.proj = nn.ModuleList([deepcopy(self.merge) for _ in range(3)])

    def forward(self, query, key, value):
        batch_dim = query.size(0)
        query, key, value = [l(x).view(batch_dim, self.dim, self.num_heads, -1)
                             for l, x in zip(self.proj, (query, key, value))]
        x, _ = attention(query, key, value)
        return self.merge(x.contiguous().view(batch_dim, self.dim*self.num_heads, -1))


class AttentionalPropagation(nn.Module):
    def __init__(self, feature_dim: int, num_heads: int):
        super().__init__()
        # multi-head attention that computes the message
        self.attn = MultiHeadedAttention(num_heads, feature_dim)
        # MLP that fuses the node state with the incoming message
        self.mlp = MLP([feature_dim*2, feature_dim*2, feature_dim])
        # initialize the bias of the last MLP layer to 0.0
        nn.init.constant_(self.mlp[-1].bias, 0.0)

    def forward(self, x, source):
        # attention message from source to x (query = x, key = value = source)
        message = self.attn(x, source, source)
        # concatenate x with the message and fuse them with the MLP
        return self.mlp(torch.cat([x, message], dim=1))

# Attention-based graph neural network (GNN)
class AttentionalGNN(nn.Module):
    def __init__(self, feature_dim: int, layer_names: list):
        super().__init__()
        # one AttentionalPropagation layer per entry in layer_names, held in a ModuleList
        self.layers = nn.ModuleList([
            AttentionalPropagation(feature_dim, 4)
            for _ in range(len(layer_names))])
        # layer types ('self' or 'cross'), in order
        self.names = layer_names

    def forward(self, desc0, desc1):
        for layer, name in zip(self.layers, self.names):
            if name == 'cross':
                src0, src1 = desc1, desc0
            else:  # if name == 'self':
                src0, src1 = desc0, desc1
            # run the AttentionalPropagation layer on both images
            delta0, delta1 = layer(desc0, src0), layer(desc1, src1)
            # residual update: add the propagated message to the current descriptors
            desc0, desc1 = (desc0 + delta0), (desc1 + delta1)
        return desc0, desc1

# The next two functions implement the Sinkhorn algorithm.
# Sinkhorn normalization iterations, computed in log-space for numerical stability
def log_sinkhorn_iterations(Z, log_mu, log_nu, iters: int):
    """ Perform Sinkhorn Normalization in Log-space for stability"""
    u, v = torch.zeros_like(log_mu), torch.zeros_like(log_nu)
    for _ in range(iters):
        u = log_mu - torch.logsumexp(Z + v.unsqueeze(1), dim=2)
        v = log_nu - torch.logsumexp(Z + u.unsqueeze(2), dim=1)
    return Z + u.unsqueeze(2) + v.unsqueeze(1)

# Differentiable optimal transport, computed in log-space for numerical stability
def log_optimal_transport(scores, alpha, iters: int):
    """ Perform Differentiable Optimal Transport in Log-space for stability"""
    b, m, n = scores.shape
    one = scores.new_tensor(1)
    ms, ns = (m*one).to(scores), (n*one).to(scores)

    bins0 = alpha.expand(b, m, 1)
    bins1 = alpha.expand(b, 1, n)
    alpha = alpha.expand(b, 1, 1)

    couplings = torch.cat([torch.cat([scores, bins0], -1),
                           torch.cat([bins1, alpha], -1)], 1)

    norm = - (ms + ns).log()
    log_mu = torch.cat([norm.expand(m), ns.log()[None] + norm])
    log_nu = torch.cat([norm.expand(n), ms.log()[None] + norm])
    log_mu, log_nu = log_mu[None].expand(b, -1), log_nu[None].expand(b, -1)

    Z = log_sinkhorn_iterations(couplings, log_mu, log_nu, iters)
    Z = Z - norm  # multiply probabilities by M+N
    return Z


def arange_like(x, dim: int):
    return x.new_ones(x.shape[dim]).cumsum(0) - 1  # traceable in 1.1


class matchNet(nn.Module):
    # default configuration
    default_config = {
        'descriptor_dim': 256,
        'weights': '',
        'keypoint_encoder': [32, 64, 128, 256],
        'GNN_layers': ['self', 'cross'] * 9,
        'sinkhorn_iterations': 100,
        'match_threshold': 0.2,
    }

    def __init__(self, config):
        super().__init__()
        self.config = {**self.default_config, **config}
        # Build the model components.
        # Attentional graph neural network
        # Keypoint encoder: embeds keypoint position and detection score
        self.kenc = KeypointEncoder(
            self.config['descriptor_dim'], self.config['keypoint_encoder'])
        # Attentional GNN (attentional aggregation)
        self.gnn = AttentionalGNN(
            self.config['descriptor_dim'], self.config['GNN_layers'])
        # Optimal matching layer
        # 1x1 convolution that produces the matching descriptors f
        self.final_proj = nn.Conv1d(
            self.config['descriptor_dim'], self.config['descriptor_dim'],
            kernel_size=1, bias=True)
        # Learnable dustbin score used to augment the score matrix S
        bin_score = torch.nn.Parameter(torch.tensor(1.))
        self.register_parameter('bin_score', bin_score)

        # Load pretrained weights if requested
        if self.config['weights'] in ['indoor','outdoor','mytrain']:
            path=Path(__file__).parent
            path=path / 'weights/MatchNet_{}.pth'.format(self.config['weights'])
            if self.config['weights'] in ['indoor','outdoor']:
                self.load_state_dict(torch.load(path))
            else:
                model=torch.load(path)
                self.load_state_dict(model['net'])
            print('Loaded matchNet model ("{}" weights)'.format(self.config['weights']))

    def forward(self, data):
        """Run matchNet on a pair of keypoints and descriptors"""
        # Fetch keypoints and descriptors
        desc0, desc1 = data['descriptors0'], data['descriptors1']
        kpts0, kpts1 = data['keypoints0'], data['keypoints1']

        # Rearrange to (batch, dim, num_keypoints) and (batch, num_keypoints, 2)
        desc0 = desc0.transpose(0,1)  # (1,128,1024)
        desc1 = desc1.transpose(0,1)  # (1,128,1024)
        kpts0 = torch.reshape(kpts0, (1, -1, 2))  # (1,1024,2)
        kpts1 = torch.reshape(kpts1, (1, -1, 2))  # (1,1024,2)

        # Return early if either image has no keypoints
        if kpts0.shape[1] == 0 or kpts1.shape[1] == 0:  # no keypoints
            shape0, shape1 = kpts0.shape[:-1], kpts1.shape[:-1]
            return {
                'matches0': kpts0.new_full(shape0, -1, dtype=torch.int)[0],
                'matches1': kpts1.new_full(shape1, -1, dtype=torch.int)[0],
                'matching_scores0': kpts0.new_zeros(shape0)[0],
                'matching_scores1': kpts1.new_zeros(shape1)[0],
                'skip_train': True
            }

        # file_name = data['file_name']
        # Ground-truth matches
        all_matches = data['all_matches'].permute(1,2,0) # shape=torch.Size([batch, 1525, 2])

        # Keypoint normalization.
        kpts0 = normalize_keypoints(kpts0, data['image0'].shape)
        kpts1 = normalize_keypoints(kpts1, data['image1'].shape)

        # Keypoint MLP encoder.
        scores0,scores1= data['scores0'],data['scores1']
        desc0 = desc0 + self.kenc(kpts0, torch.transpose(scores0, 0, 1))  # (batch,128,1024)
        desc1 = desc1 + self.kenc(kpts1, torch.transpose(scores1, 0, 1))  # (batch,128,717)

        # Multi-layer Transformer network (the attentional GNN).
        desc0, desc1 = self.gnn(desc0, desc1)  # (batch,128,1024),(batch,128,717)

        # Final MLP projection: matching descriptors f.
        mdesc0, mdesc1 = self.final_proj(desc0), self.final_proj(desc1)  # (batch,128,1024),(batch,128,717)

        # Compute matching descriptor distance: score matrix S_ij.
        scores = torch.einsum('bdn,bdm->bnm', mdesc0, mdesc1)  # (batch,1024,717)
        scores = scores / self.config['descriptor_dim']**.5

        # Run the optimal transport (Sinkhorn).
        scores = log_optimal_transport(
            scores, self.bin_score,
            iters=self.config['sinkhorn_iterations'])  # (batch,1025,718)

        # Get the matches with score above "match_threshold".
        # max0 and max1 are (values, indices) pairs of shape (batch,1024) and (batch,717)
        max0, max1 = scores[:, :-1, :-1].max(2), scores[:, :-1, :-1].max(1)
        indices0, indices1 = max0.indices, max1.indices
        # True where a keypoint and its best match are mutual nearest neighbours
        mutual0 = arange_like(indices0, 1)[None] == indices1.gather(1, indices0)  # for keypoints0
        mutual1 = arange_like(indices1, 1)[None] == indices0.gather(1, indices1)  # for keypoints1
        zero = scores.new_tensor(0)
        # keep exp(score) for mutual matches and 0 elsewhere
        mscores0 = torch.where(mutual0, max0.values.exp(), zero)  # matching scores for keypoints0
        mscores1 = torch.where(mutual1, mscores0.gather(1, indices1), zero)  # matching scores for keypoints1
        valid0 = mutual0 & (mscores0 > self.config['match_threshold'])  # mutual and above the threshold
        valid1 = mutual1 & valid0.gather(1, indices1)
        indices0 = torch.where(valid0, indices0, indices0.new_tensor(-1))  # match index, or -1 for no match
        indices1 = torch.where(valid1, indices1, indices1.new_tensor(-1))

        # Compute the loss over the ground-truth matches
        loss = []
        for i in range(len(all_matches[0])):
            x = all_matches[0][i][0]
            y = all_matches[0][i][1]
            # scores are already log-probabilities, so -log(exp(s)) is just -s
            loss.append(-scores[0][x][y])  # check batch size == 1 ?
        # for p0 in unmatched0:
        #     loss += -torch.log(scores[0][p0][-1])
        # for p1 in unmatched1:
        #     loss += -torch.log(scores[0][-1][p1])
        loss_mean = torch.mean(torch.stack(loss))
        loss_mean = torch.reshape(loss_mean, (1, -1))
        return {
            'matches0': indices0[0], # use -1 for invalid match
            'matches1': indices1[0], # use -1 for invalid match
            'matching_scores0': mscores0[0],
            'matching_scores1': mscores1[0],
            'loss': loss_mean[0],
            'skip_train': False
        }
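
The match-selection step at the end of forward (mutual nearest neighbours plus a score threshold) can be sketched in plain Python as follows. This is an illustration for a single image pair with nested lists; the function name select_matches is made up, and only the matches0 direction is shown.

```python
import math


def select_matches(log_p, threshold=0.2):
    """Return, for each keypoint in image 0, its match index in image 1 or -1."""
    core = [row[:-1] for row in log_p[:-1]]      # drop the dustbin row and column
    m, n = len(core), len(core[0])
    best0 = [max(range(n), key=lambda j: core[i][j]) for i in range(m)]
    best1 = [max(range(m), key=lambda i: core[i][j]) for j in range(n)]
    matches0 = []
    for i, j in enumerate(best0):
        mutual = best1[j] == i                   # i and j pick each other
        score = math.exp(core[i][j])             # back from log-space
        matches0.append(j if mutual and score > threshold else -1)
    return matches0


# Toy 3 x 3 assignment matrix (2 keypoints per image plus dustbins), given as probabilities.
probs = [[0.70, 0.10, 0.20],
         [0.10, 0.15, 0.75],
         [0.20, 0.75, 0.05]]
matches = select_matches([[math.log(p) for p in row] for row in probs])
```

In this toy case keypoint 0 is matched to keypoint 0, while keypoint 1 is a mutual nearest neighbour of keypoint 1 but falls below the 0.2 threshold, so it is reported as -1.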

From： https://blog.csdn.net/kaszxc/article/details/142095516
