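In deep-learning-based face recognition, researchers proposed ArcFace to improve recognition accuracy. ArcFace adds an angular margin to the Softmax loss so that the learned features become more discriminative. According to the paper, ArcFace consistently outperforms the state-of-the-art methods it was compared against, is easy to implement, and adds negligible computational overhead.
Paper: ArcFace: Additive Angular Margin Loss for Deep Face Recognition, available at https://arxiv.org/pdf/1801.07698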
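Feature normalization: the extracted feature vector is L2-normalized into a unit vector. After normalization, comparing two features depends only on their direction (the angle between them) and not on their magnitude, which makes the features more stable.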
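The normalization step can be sketched in a few lines of plain Python. This is only an illustration: the helper names `l2_normalize` and `cosine_similarity` and the `eps` guard are assumptions made here, not part of the original code, which relies on `F.normalize` from PyTorch.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit L2 norm; eps (an assumed guard) avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def cosine_similarity(a, b):
    """Dot product of the two L2-normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


feature = [3.0, 4.0]
unit = l2_normalize(feature)                   # [0.6, 0.8], a vector of length 1
sim = cosine_similarity(feature, [6.0, 8.0])   # same direction, different magnitude
```

The two vectors differ in length by a factor of two, yet their cosine similarity is (up to rounding) 1, which is why normalization removes the influence of feature magnitude.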
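Cosine similarity is then used as the metric, which makes samples easier to separate. Specifically, each class has a learnable weight vector that acts as its class center. The margin m, by contrast, is a fixed hyperparameter and not a learned weight. The cosine similarity between the input feature and every class weight vector is computed, and training pushes the similarity within a class to be as high as possible and the similarity between different classes to be as low as possible.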
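Once both the feature and the weights are normalized, the dot product between the deep convolutional neural network (DCNN) feature and the weights of the last fully connected (FC) layer equals their cosine similarity. First, the arc-cosine function gives the angle between the current feature and the target weight. Next, an additive angular margin is added to the target angle, and the cosine function turns the result back into the target logit. Finally, all logits are rescaled by a fixed feature norm s, and the remaining steps are exactly the same as in the Softmax loss.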
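The four steps above can be traced for a single sample in plain Python. This is a minimal sketch under stated assumptions: the function names `arcface_logits` and `softmax` and the example cosine values are made up for illustration, and the special handling of angles where theta + m exceeds pi is deliberately left out.

```python
import math


def arcface_logits(cosines, target, s=64.0, m=0.5):
    """cosines[j] is cos(theta_j) between one normalized feature and class weight j."""
    logits = []
    for j, c in enumerate(cosines):
        if j == target:
            theta = math.acos(max(-1.0, min(1.0, c)))  # step 1: angle to the target weight
            c = math.cos(theta + m)                    # steps 2-3: add the margin, take the cosine
        logits.append(s * c)                           # step 4: rescale by the feature norm s
    return logits


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


cosines = [0.8, 0.3, -0.1]                       # hypothetical similarities, class 0 is the target
plain = [64.0 * c for c in cosines]              # logits without a margin
with_margin = arcface_logits(cosines, target=0)  # only the target logit changes
```

Only the target logit is lowered, so the network has to produce a smaller angle to the target weight to reach the same softmax probability. That extra effort is what tightens each class.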
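After the two steps above, ArcFace trains the classifier: on top of the Softmax loss, cosine similarity is used as the metric, which makes samples easier to separate.
As described above, each class has a learnable weight vector, and the margin m is a fixed hyperparameter. The cosine similarity between the input feature and every class weight vector is computed, with the goal of making the similarity within a class as high as possible and the similarity between classes as low as possible. Building on earlier angular and cosine margins, the paper proposes the Additive Angular Margin Loss (ArcFace), which replaces the target logit with cos(θ + m), where θ is the angle between the current feature and the target weight. The loss optimizes the normalized weights and features directly in angular space to maximize the decision boundary. Its geometric interpretation is more intuitive, and the paper's extensive experiments report better recognition results.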
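Written out, the loss described here takes the following form. The symbols N (batch size), n (number of classes), y_i (label of sample i) and θ_j (angle between the feature of sample i and the weight vector of class j) are notation assumed for this sketch; s is the feature norm and m the margin, as in the text.

```latex
L = -\frac{1}{N}\sum_{i=1}^{N}
    \log\frac{e^{\,s\cos(\theta_{y_i}+m)}}
             {e^{\,s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n} e^{\,s\cos\theta_{j}}}
```

Setting m = 0 reduces this to a Softmax loss over normalized features and weights, so the margin is the only change to the target logit.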
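# Imports needed by the ResNet building blocks below.
# conv1x1 and conv3x3 are convolution helper functions that are not shown in this excerpt.
import torch.nn as nn
from torch.nn import BatchNorm2d, Module, ReLU, Sequential


class Bottleneck(Module):
    """ResNet bottleneck block: 1x1 -> 3x3 -> 1x1 convolutions plus a skip connection."""
    expansion = 4  # output channels = planes * expansion

    def __init__(self, inplanes, planes, stride = 1, downsample = None):
        super(Bottleneck, self).__init__()
        self.conv1 = conv1x1(inplanes, planes)
        self.bn1 = BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes, stride)
        self.bn2 = BatchNorm2d(planes)
        self.conv3 = conv1x1(planes, planes * self.expansion)
        self.bn3 = BatchNorm2d(planes * self.expansion)
        self.relu = ReLU(inplace = True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        # Match the shortcut's shape to the main branch when stride or channels differ.
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)  # final activation and return were missing from the excerpt
        return out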
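        # Tail of ResNet.__init__ (the class header and layer definitions are not shown in this excerpt).
        # Zero-initialize the last BatchNorm of every residual block so that each block
        # starts out as an identity mapping.
        for m in self.modules():
            if isinstance(m, Bottleneck):
                nn.init.constant_(m.bn3.weight, 0)
            elif isinstance(m, BasicBlock):
                nn.init.constant_(m.bn2.weight, 0)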
    def _make_layer(self, block, planes, blocks, stride = 1):
        downsample = None
        # A projection shortcut is needed when the spatial size or channel count changes.
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                BatchNorm2d(planes * block.expansion),
            )

        layers = []
        # Only the first block of a stage may change stride and channel count.
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return Sequential(*layers)
    def forward(self, x):
        # Stem
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        # Four residual stages
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        # Embedding head: BatchNorm -> Dropout -> flatten -> FC -> BatchNorm
        x = self.bn_o1(x)
        x = self.dropout(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        x = self.bn_o2(x)

        return x
def ResNet_50(input_size, **kwargs):
    """Constructs a ResNet-50 model."""
    model = ResNet(input_size, Bottleneck, [3, 4, 6, 3], **kwargs)

    return model
# Imports needed by the ArcFace head and the loss below.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter


class ArcFace(nn.Module):
    r"""Implementation of ArcFace (https://arxiv.org/pdf/1801.07698v1.pdf).

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        device_id: list of GPU IDs used for model-parallel training;
            if device_id=None, training runs on CPU without model parallelism
        s: norm of input feature
        m: margin
    """

    def __init__(self, in_features, out_features, device_id, s = 64.0, m = 0.50, easy_margin = False):
        super(ArcFace, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.device_id = device_id

        self.s = s
        self.m = m

        self.weight = Parameter(torch.FloatTensor(out_features, in_features))
        # torch.FloatTensor leaves the memory uninitialized, so initialize the weights explicitly.
        nn.init.xavier_uniform_(self.weight)

        self.easy_margin = easy_margin
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)
        self.th = math.cos(math.pi - m)      # cos(theta) at the threshold theta = pi - m
        self.mm = math.sin(math.pi - m) * m  # penalty used beyond it: cos(theta) - sin(pi - m) * m
    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------------
        if self.device_id is None:
            cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        else:
            # Model parallel: split the class weights across GPUs and gather the cosines on the first one.
            x = input
            sub_weights = torch.chunk(self.weight, len(self.device_id), dim=0)
            temp_x = x.cuda(self.device_id[0])
            weight = sub_weights[0].cuda(self.device_id[0])
            cosine = F.linear(F.normalize(temp_x), F.normalize(weight))
            for i in range(1, len(self.device_id)):
                temp_x = x.cuda(self.device_id[i])
                weight = sub_weights[i].cuda(self.device_id[i])
                cosine = torch.cat((cosine, F.linear(F.normalize(temp_x), F.normalize(weight)).cuda(self.device_id[0])), dim=1)
        # The clamp guards against tiny negative values caused by rounding when |cosine| is close to 1.
        sine = torch.sqrt((1.0 - torch.pow(cosine, 2)).clamp(min=0.0))
        phi = cosine * self.cos_m - sine * self.sin_m  # cos(theta + m)
        if self.easy_margin:
            phi = torch.where(cosine > 0, phi, cosine)
        else:
            # For theta > pi - m, fall back to cos(theta) - m * sin(pi - m).
            phi = torch.where(cosine > self.th, phi, cosine - self.mm)
        # --------------------------- convert label to one-hot ---------------------------
        one_hot = torch.zeros(cosine.size())
        if self.device_id is not None:
            one_hot = one_hot.cuda(self.device_id[0])
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # ------------- torch.where(out_i = x_i if condition_i else y_i) -------------
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)  # you can use torch.where if your torch.__version__ is 0.4
        output *= self.s

        return output
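The `phi` computation in `ArcFace.forward` never calls arc-cosine. It expands cos(θ + m) with the angle-addition identity and switches to a linear penalty once θ + m would pass π. The scalar sketch below reproduces that logic in plain Python. The function name `phi_from_cosine` is an assumption made for illustration, and the code mirrors the excerpt rather than any official API.

```python
import math


def phi_from_cosine(cosine, m=0.5, easy_margin=False):
    """Scalar version of the target-logit adjustment used in ArcFace.forward."""
    sine = math.sqrt(max(0.0, 1.0 - cosine * cosine))
    # cos(theta + m) = cos(theta) * cos(m) - sin(theta) * sin(m)
    phi = cosine * math.cos(m) - sine * math.sin(m)
    if easy_margin:
        # Apply the margin only when theta is below 90 degrees.
        return phi if cosine > 0 else cosine
    th = math.cos(math.pi - m)       # cos(theta) at theta = pi - m
    mm = math.sin(math.pi - m) * m   # fixed penalty used beyond the threshold
    return phi if cosine > th else cosine - mm


# Sweep theta from 0 to pi and record the adjusted target value.
thetas = [i * math.pi / 200 for i in range(201)]
values = [phi_from_cosine(math.cos(t)) for t in thetas]
```

Without the fallback, cos(θ + m) would start increasing again once θ exceeds π - m, which would reward very poor predictions. With it, the adjusted value keeps decreasing as θ grows.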
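class FocalLoss(nn.Module):
    def __init__(self, gamma = 2, eps = 1e-7):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.eps = eps
        # Note: nn.CrossEntropyLoss defaults to reduction='mean', so logp below is the
        # batch-mean cross entropy and the focal factor is applied to that mean.
        self.ce = nn.CrossEntropyLoss()

    def forward(self, input, target):
        logp = self.ce(input, target)
        p = torch.exp(-logp)  # probability corresponding to the cross-entropy value
        loss = (1 - p) ** self.gamma * logp  # down-weight the loss when p is high
        return loss.mean()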
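Focal Loss introduces a tunable parameter gamma that makes the model pay more attention to hard-to-classify samples during training, which improves performance when classes are imbalanced. It is particularly effective in object detection and classification tasks because it rebalances how much different samples contribute to the loss, so the many easy samples do not dominate it.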
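The effect of gamma is easiest to see per sample. The sketch below uses the standard per-sample form FL = -(1 - p)^gamma * log(p), where p is the predicted probability of the true class. The function names and the two example probabilities are assumptions chosen for illustration. Note that the `FocalLoss` class above applies the same factor to the batch-mean cross entropy instead, because `nn.CrossEntropyLoss` averages by default.

```python
import math


def cross_entropy(p_true):
    """Cross entropy for one sample, given the probability of its true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0):
    """Per-sample focal loss: cross entropy scaled by (1 - p)^gamma."""
    return ((1.0 - p_true) ** gamma) * cross_entropy(p_true)


easy, hard = 0.95, 0.2  # hypothetical probabilities for an easy and a hard sample

# Fraction of the cross-entropy loss that survives the focal factor.
ratio_easy = focal_loss(easy) / cross_entropy(easy)  # about 0.0025
ratio_hard = focal_loss(hard) / cross_entropy(hard)  # about 0.64
```

With gamma = 2, the easy sample keeps roughly 0.25% of its cross-entropy loss while the hard sample keeps about 64%, so the gradient is dominated by hard samples. With gamma = 0 the focal loss equals the plain cross entropy.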
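The paper's authors note that, in practice, building a large-scale labeled face training set can take a great deal of labor and time, which makes it expensive. Noisy data can instead be collected from the web, and sub-classes are introduced into ArcFace to relax the intra-class constraint that forces every sample toward its single positive center. K sub-centers are designed for each class, and a training sample only needs to be close to any one of the K positive sub-centers, rather than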
