Machine Learning: Shoe Category Recognition
I. Background
With the continuous development of computer hardware, artificial intelligence has once again come into researchers' view. By building artificial systems with a degree of intelligence, machines can take over work that used to require human intellect, such as image recognition, natural language processing, and piloting drones. Supported by computer hardware, the field studies the basic theories, methods, and techniques for letting computers simulate certain intelligent human behaviours.
This project recognises shoe categories. With the steady growth of online shopping and the express-delivery industry, people now buy most of their clothes and shoes through online stores, which not only cuts staffing costs but also creates new kinds of jobs. As purchase volumes grow, however, problems emerge, the first of which is how warehouses sort and manage goods. Here, photos of different kinds of shoes are captured by camera, and a deep convolutional neural network is used to recognise the shoe category.
II. Design
The overall plan: collect a dataset online and process it into the format we need; label the files in the dataset and preprocess the data; build and train a convolutional neural network with the PyTorch framework; analyse the training/validation accuracy and loss curves; and finally feed in test images, save the test image names, and output the recognition results.
The main technical difficulties are turning the large amount of raw data into the format we need and raising the recognition accuracy. For the data, a small script can sort the images using the csv file that ships with the dataset, as sketched below. Recognition accuracy can then be improved by adjusting the image size, adding convolutional layers, re-filtering the data, and training for more epochs.
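A minimal sketch of such a csv-driven preprocessing script, assuming hypothetical "filename" and "label" columns (the actual header of the Kaggle csv may differ):

import os
import shutil
import pandas as pd

# hypothetical column names: adjust "filename" and "label" to the real csv header
df = pd.read_csv("shoes.csv")

for _, row in df.iterrows():
    # one folder per class, which is the layout ImageFolder expects
    class_dir = os.path.join("Shoes Dataset", "Train", row["label"])
    os.makedirs(class_dir, exist_ok=True)
    shutil.copy(row["filename"], class_dir)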
Dataset source: Kaggle: https://www.kaggle.com/datasets/utkarshsaxenadn/shoes-classification-dataset-13k-images?resource=download
Reference examples: the Kaggle discussion forum, a licence-plate recognition project, a flower recognition project, and others.
III. Implementation Steps
1. Collecting the dataset
Download the dataset from Kaggle and unzip it. The five classes are ballet flats, boat shoes, brogues, clogs, and sneakers; the expected folder layout is sketched below.
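ImageFolder, used later, expects one sub-folder per class, so the unzipped dataset should look roughly like this (the exact folder names come from the Kaggle archive and may differ slightly):

Shoes Dataset/
├── Train/
│   ├── Ballet Flat/
│   ├── Boat/
│   ├── Brogue/
│   ├── Clog/
│   └── Sneaker/
└── Test/
    └── (the same five class folders)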
2. Analysing and preprocessing the dataset
The dataset contains images of five shoe classes, one folder per class. First, count the number of images in each class and draw a bar chart of the class distribution; see the sketch after this paragraph.
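The counting code is not shown in the original post; a minimal sketch, assuming the Train folder layout described above (the train_dir path is an assumption, adjust it to your setup):

import os
import matplotlib.pyplot as plt

train_dir = "Shoes Dataset/Train"  # hypothetical path to the training folders
counts = {cls: len(os.listdir(os.path.join(train_dir, cls)))
          for cls in sorted(os.listdir(train_dir))}

plt.bar(list(counts.keys()), list(counts.values()))
plt.ylabel("number of images")
plt.title("images per shoe class")
plt.show()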
Import the Python data-processing packages, then read and process the images under the data folder.
import torch
import torch.nn as nn
from torchvision import transforms, datasets
import json
import matplotlib.pyplot as plt
import os
import torch.optim as optim
from model import resnet34, resnet101  # pretrained models for transfer learning
First, build the file paths and load the data, separating the training set from the test set.
data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # data root path
image_path = os.path.join(data_root, "Python Shoe", "shoe", "Shoes Dataset")  # shoe image directory
train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "Train"),
                                     transform=data_transform["train"])
train_num = len(train_dataset)

# class_to_idx maps each shoe class folder name to an index
class_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in class_list.items())

# write the index-to-class dict into a json file
json_str = json.dumps(cla_dict, indent=4)
with open('class_indices.json', 'w') as json_file:
    json_file.write(json_str)
Randomly crop the images, then resize the crops to a common size, apply random horizontal flips, and normalise.
data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize([0.485, 0.456, 0.406],
                                                      [0.229, 0.224, 0.225])]),
    "val": transforms.Compose([transforms.Resize(256),  # scale the shorter side to 256
                               transforms.CenterCrop(224),
                               transforms.ToTensor(),
                               transforms.Normalize([0.485, 0.456, 0.406],
                                                    [0.229, 0.224, 0.225])])}
Next, shuffle the order of the samples so that every batch mixes the classes; training on data ordered by class encourages overfitting.
batch_size = 16
# shuffle=True randomly shuffles the samples
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
                                           shuffle=True, num_workers=0)
validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "Test"),
                                        transform=data_transform["val"])
val_num = len(validate_dataset)
validate_loader = torch.utils.data.DataLoader(validate_dataset, batch_size=batch_size,
                                              shuffle=False, num_workers=0)
3. Building the ResNet model (residual network)
ResNet (Residual Neural Network) was proposed by Kaiming He and three colleagues at Microsoft Research. Using residual units they trained a 152-layer network that won ILSVRC 2015 with a top-5 error rate of 3.57%, while using fewer parameters than VGGNet. The residual structure greatly speeds up training and clearly improves accuracy, and it generalises well; it can even be used directly inside InceptionNet.
The core idea of ResNet is to add shortcut connections to the network, much like the Highway Network. Earlier architectures applied a non-linear transformation to each layer's input, whereas the Highway Network preserves a certain proportion of the previous layer's output. ResNet takes a similar approach and lets the original input flow directly to later layers, as the figure below shows.
That way a layer no longer has to learn the entire output, only the residual relative to the previous layer's output, which is why ResNet is called a residual network.
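Written out: if H(x) denotes the mapping the layer is supposed to learn, the block instead learns the residual F(x) = H(x) − x, and its output is

H(x) = F(x) + x

where the "+ x" is the identity shortcut, which adds no extra parameters.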
class ResNet(nn.Module):
    # block: BasicBlock or Bottleneck; blocks_num: a list giving how many residual
    # blocks each of the four stages uses
    # num_classes: number of classes in the training set
    # include_top: lets you select whether to keep the final pooling/fc layers,
    # so more complex networks can later be built on top of this ResNet
    def __init__(self, block, blocks_num, num_classes=1000, include_top=True):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64
        # input depth 3 (the three channels of a colour image), output depth 64,
        # 7x7 kernel, stride 2, padding 3 -> the feature map shrinks to 1/2
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        # max pooling, 3x3 kernel, stride 2, padding 1 -> the feature map shrinks
        # to 1/2 again; at this point it is 1/4 of the input size
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])             # conv2_x
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)  # conv3_x
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)  # conv4_x
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)  # conv5_x
        if self.include_top:
            # adaptive average pooling, output size = (1, 1)
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        # parameter initialisation; note that conv layers and batch-norm layers
        # are initialised differently
        for m in self.modules():  # self.modules() traverses all submodules depth-first
            if isinstance(m, nn.Conv2d):
                # Kaiming normal initialisation (one of several nn.init methods)
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                # constant initialisation: weight 1, bias 0
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    # channel: depth of the 3x3 conv layers in this stage;
    # block_num: how many residual blocks the stage contains
    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        # When the stride is not 1, or the stage's input depth differs from
        # channel * block.expansion, the shortcut gets a 1x1 conv plus batch norm
        # (and the feature map also shrinks by 1/2 when stride=2). The conv is
        # needed because, e.g., when moving from a 64-deep stage to a 128-deep
        # stage the main branch already outputs 128 channels while the shortcut x
        # is still 64 deep, so the element-wise addition would fail; the 1x1 conv
        # raises x to the matching depth. In Bottleneck the main branch raises the
        # depth by itself (expansion = 4), but the shortcut still needs this conv
        # so that its depth and stride match the block's output.
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel, channel, downsample=downsample,
                            stride=stride))
        self.in_channel = channel * block.expansion  # out_channels is always the depth of the 3x3 convs
        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))
        return nn.Sequential(*layers)  # chain all the blocks in order

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)  # no torch.no_grad() here: gradients must flow during training

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)  # flatten the multi-dimensional output to one dimension
            x = self.fc(x)
        return x
ResNet keeps VGG's design of using 3×3 convolutions throughout. A residual block first applies two 3×3 conv layers with the same number of output channels, each followed by batch normalisation and ReLU; the block's input is then added in just before the final ReLU. This structure is used in shallower networks such as ResNet-34. When the channel count is large, a 1×1 conv layer is introduced to adjust the number of channels; this variant is called the bottleneck block and is used in deeper networks.
# basic block: two 3x3 convs, used by ResNet-18/34
class BasicBlock(nn.Module):
    expansion = 1  # downsample corresponds to the dashed-shortcut variant

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)  # depth of the output feature map
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample  # the shortcut's downsampling operation

    def forward(self, x):
        identity = x  # shortcut connection
        if self.downsample is not None:
            # run x through the downsample ops (1x1 conv etc.) so the shortcut
            # depth matches the main branch
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)  # no ReLU here: the shortcut is added first, then ReLU

        out += identity
        out = self.relu(out)
        return out
# bottleneck block: three convs (1x1, 3x3, 1x1) that lower the dimension,
# convolve, then raise the dimension again. Compared with BasicBlock it has far
# fewer parameters at similar accuracy, which also cuts computation and helps
# the model converge faster.
class Bottleneck(nn.Module):  # ResNet-50/101/152
    expansion = 4  # the 3rd conv of each block has 4x the kernels of the 1st/2nd (e.g. 64, 64, 256)

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(out_channel)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=out_channel,
                               out_channels=out_channel * self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        # inplace=True modifies the tensor from the previous Conv2d directly,
        # saving memory by not storing an extra copy
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # the shortcut's downsampling operation

    def forward(self, x):
        identity = x  # shortcut connection
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)  # no ReLU here: the shortcut is added first, then ReLU

        out += identity
        out = self.relu(out)
        return out
4. Writing the training function
First, load a pretrained ResNet model downloaded from the official site.
ResNet performs strongly on the ImageNet classification task and comes in several depths: ResNet-34, ResNet-50, ResNet-101, ResNet-152. This project compares ResNet-34 and ResNet-101; the transfer-learning setup is sketched below.
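A minimal sketch of the transfer-learning setup, excerpted from the complete code in section VI (it assumes the official checkpoint has already been saved locally as resnet34-pre.pth):

import torch
import torch.nn as nn
from model import resnet34

net = resnet34()  # num_classes defaults to 1000 so the ImageNet weights fit
# load the pretrained checkpoint; strict=False tolerates the mismatched head
missing_keys, unexpected_keys = net.load_state_dict(
    torch.load("resnet34-pre.pth"), strict=False)

# replace the 1000-way fc head with a 5-way one for the five shoe classes
inchannel = net.fc.in_features
net.fc = nn.Linear(inchannel, 5)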
The experiment trains for 30 epochs and compares the two network structures.
best_acc = 0.0
save_path = './resNet34.pth'
for epoch in range(30):
    # train
    net.train()
    running_loss = 0.0
    for step, data in enumerate(train_loader, start=0):
        images, labels = data
        optimizer.zero_grad()
        if hasattr(torch.cuda, 'empty_cache'):
            torch.cuda.empty_cache()
        logits = net(images.to(device))
        loss = loss_function(logits, labels.to(device))
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        # print training progress
        rate = (step + 1) / len(train_loader)
        a = "*" * int(rate * 50)
        b = "." * int((1 - rate) * 50)
        print("\rtrain loss: {:^3.0f}%[{}->{}]{:.4f}".format(int(rate * 100), a, b, loss), end="")
    print()

    # validate
    net.eval()
    acc = 0.0  # accumulate the number of correct predictions per epoch
    with torch.no_grad():
        for val_data in validate_loader:
            val_images, val_labels = val_data
            outputs = net(val_images.to(device))
            predict_y = torch.max(outputs, dim=1)[1]
            acc += (predict_y == val_labels.to(device)).sum().item()
        val_accurate = acc / val_num
        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)
        # average loss over all batches of the epoch
        print('[epoch %d] train_loss: %.3f  test_accuracy: %.3f' %
              (epoch + 1, running_loss / len(train_loader), val_accurate))
5. Writing the test function
Read the test images, load the trained model, run the predictions, and compare them with the original labels to judge how well the network predicts.
# load image (replace "tulip.jpg" with the path of a shoe test image)
img = Image.open("tulip.jpg")
plt.imshow(img)
# [N, C, H, W]
img = data_transform(img)
# expand batch dimension
img = torch.unsqueeze(img, dim=0)

# read class_indict
try:
    json_file = open('./class_indices.json', 'r')
    class_indict = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)

try:
    # create model
    model = resnet34(num_classes=5)
except RuntimeError as exception:
    if "out of memory" in str(exception):
        print("WARNING: out of memory")
        if hasattr(torch.cuda, 'empty_cache'):
            torch.cuda.empty_cache()
    else:
        raise exception

# load model weights
model_weight_path = "./resNet34.pth"
model.load_state_dict(torch.load(model_weight_path, map_location=device))
model.eval()
with torch.no_grad():
    # predict class
    output = torch.squeeze(model(img))
    predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).numpy()
print(class_indict[str(predict_cla)], predict[predict_cla].numpy())
plt.show()
IV. Experimental Results
1. Training results
From the results of 30 training epochs, the model's accuracy settles at roughly 75%; a sketch for plotting the corresponding curves follows.
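The original post shows the accuracy/loss figures here. A minimal sketch of how such curves could be plotted, assuming the per-epoch values were appended to two lists inside the training loop (train_losses and val_accs are hypothetical names, not variables the original script defines):

import matplotlib.pyplot as plt

def plot_curves(train_losses, val_accs):
    # one value per epoch, collected inside the training loop above
    epochs = range(1, len(train_losses) + 1)
    plt.plot(epochs, train_losses, label="train loss")
    plt.plot(epochs, val_accs, label="validation accuracy")
    plt.xlabel("epoch")
    plt.legend()
    plt.show()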
2. Test-set results
The final results of the resnet34 network on the test set are shown below.
V. Summary
Judging from the final results, the shoe-image recognition system meets the design goal, but the accuracy can still be improved. First, the dataset is small; it could later be enlarged with cropping, flipping, and similar augmentation, as sketched below. Second, only two network structures, resnet34 and resnet101, were tried; the networks were barely tuned, and other architectures should also be tested. Third, training is slow because it ran on the CPU; the next improvement is to run the program on a GPU.
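A minimal sketch of the stronger augmentation suggested above, extending the existing train transform (the added transforms and their parameters are illustrative choices, not what this project used):

from torchvision import transforms

# a stronger "train" transform; rotation/jitter values are illustrative
stronger_train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),  # small random rotations
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])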
VI. Complete Code
The complete code, collected from the three files used above: the training script, model.py, and the prediction script.

# ========== training script ==========
import torch
import torch.nn as nn
from torchvision import transforms, datasets
import json
import matplotlib.pyplot as plt
import os
import torch.optim as optim
from model import resnet34, resnet101  # pretrained models for transfer learning

os.environ["CUDA_VISIBLE_DEVICES"] = "0,2,3"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

# note: these Normalize parameters differ from the earlier models
data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize([0.485, 0.456, 0.406],
                                                      [0.229, 0.224, 0.225])]),
    "val": transforms.Compose([transforms.Resize(256),  # scale the shorter side to 256
                               transforms.CenterCrop(224),
                               transforms.ToTensor(),
                               transforms.Normalize([0.485, 0.456, 0.406],
                                                    [0.229, 0.224, 0.225])])}

data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path
image_path = os.path.join(data_root, "Python Shoe", "shoe", "Shoes Dataset")  # shoe data set path
train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "Train"),
                                     transform=data_transform["train"])
train_num = len(train_dataset)

# class_to_idx maps each shoe class folder name to an index
class_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in class_list.items())
# write the dict into a json file
json_str = json.dumps(cla_dict, indent=4)
with open('class_indices.json', 'w') as json_file:
    json_file.write(json_str)

batch_size = 16
# shuffle=True randomly shuffles the samples
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
                                           shuffle=True, num_workers=0)
validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "Test"),
                                        transform=data_transform["val"])
val_num = len(validate_dataset)
validate_loader = torch.utils.data.DataLoader(validate_dataset, batch_size=batch_size,
                                              shuffle=False, num_workers=0)

try:
    # instantiate without num_classes: keep the 1000-way head so the
    # pretrained ImageNet weights can be loaded
    net = resnet34()
    # net.load_state_dict(torch.load(model_weight_path), strict=False)  # alternative loading method
except RuntimeError as exception:
    if "out of memory" in str(exception):
        print("WARNING: out of memory")
        if hasattr(torch.cuda, 'empty_cache'):
            torch.cuda.empty_cache()
    else:
        raise exception

# load pretrained weights
model_weight_path = "resnet34-pre.pth"  # path of the saved weights
missing_keys, unexpected_keys = net.load_state_dict(torch.load(model_weight_path),
                                                    strict=False)
# for param in net.parameters():
#     param.requires_grad = False

# change the fc layer to output the five shoe classes
inchannel = net.fc.in_features  # depth of the input feature matrix
net.fc = nn.Linear(inchannel, 5)
net.to(device)

loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)
best_acc = 0.0
save_path = './resNet34.pth'
for epoch in range(30):
    # train
    net.train()
    running_loss = 0.0
    for step, data in enumerate(train_loader, start=0):
        images, labels = data
        optimizer.zero_grad()
        if hasattr(torch.cuda, 'empty_cache'):
            torch.cuda.empty_cache()
        logits = net(images.to(device))
        loss = loss_function(logits, labels.to(device))
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        # print training progress
        rate = (step + 1) / len(train_loader)
        a = "*" * int(rate * 50)
        b = "." * int((1 - rate) * 50)
        print("\rtrain loss: {:^3.0f}%[{}->{}]{:.4f}".format(int(rate * 100), a, b, loss), end="")
    print()

    # validate
    net.eval()
    acc = 0.0  # accumulate the number of correct predictions per epoch
    with torch.no_grad():
        for val_data in validate_loader:
            val_images, val_labels = val_data
            outputs = net(val_images.to(device))
            predict_y = torch.max(outputs, dim=1)[1]
            acc += (predict_y == val_labels.to(device)).sum().item()
        val_accurate = acc / val_num
        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)
        print('[epoch %d] train_loss: %.3f  test_accuracy: %.3f' %
              (epoch + 1, running_loss / len(train_loader), val_accurate))

print('Finished Training')

# ========== model.py ==========
import torch.nn as nn
import torch

# basic block: two 3x3 convs, used by ResNet-18/34
class BasicBlock(nn.Module):
    expansion = 1  # downsample corresponds to the dashed-shortcut variant

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)  # depth of the output feature map
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample  # the shortcut's downsampling operation

    def forward(self, x):
        identity = x  # shortcut connection
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)  # no ReLU here: the shortcut is added first, then ReLU

        out += identity
        out = self.relu(out)
        return out

# bottleneck block: three convs (1x1, 3x3, 1x1) that lower the dimension,
# convolve, then raise the dimension again; far fewer parameters than
# BasicBlock at similar accuracy, which also cuts computation
class Bottleneck(nn.Module):  # ResNet-50/101/152
    expansion = 4  # the 3rd conv of each block has 4x the kernels of the 1st/2nd

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.conv3 = nn.Conv2d(in_channels=out_channel,
                               out_channels=out_channel * self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)  # in-place to save memory
        self.downsample = downsample  # the shortcut's downsampling operation

    def forward(self, x):
        identity = x  # shortcut connection
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)  # no ReLU here: the shortcut is added first, then ReLU

        out += identity
        out = self.relu(out)
        return out

class ResNet(nn.Module):
    # block: BasicBlock or Bottleneck; blocks_num: residual blocks per stage
    # num_classes: number of classes; include_top: keep the final pooling/fc layers
    def __init__(self, block, blocks_num, num_classes=1000, include_top=True):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64
        # 7x7 conv, stride 2, padding 3 -> the feature map shrinks to 1/2
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        # 3x3 max pooling, stride 2 -> the feature map is now 1/4 of the input
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])             # conv2_x
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)  # conv3_x
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)  # conv4_x
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)  # conv5_x
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        # parameter initialisation: conv and batch-norm layers differ
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        # the shortcut needs a 1x1 conv + batch norm when the stride or the
        # channel depth changes, so that the element-wise addition matches
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel, channel, downsample=downsample,
                            stride=stride))
        self.in_channel = channel * block.expansion
        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))
        return nn.Sequential(*layers)  # chain all the blocks in order

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)  # no torch.no_grad() here: gradients must flow during training

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)  # flatten the multi-dimensional output
            x = self.fc(x)
        return x

def resnet34(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes,
                  include_top=include_top)

def resnet101(num_classes=1000, include_top=True):
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes,
                  include_top=include_top)

# ========== prediction script ==========
import torch
from model import resnet34
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
import json
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

data_transform = transforms.Compose(
    [transforms.Resize(256),
     transforms.CenterCrop(224),
     transforms.ToTensor(),
     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

# load image (replace "tulip.jpg" with the path of a shoe test image)
img = Image.open("tulip.jpg")
plt.imshow(img)
# [N, C, H, W]
img = data_transform(img)
# expand batch dimension
img = torch.unsqueeze(img, dim=0)

# read class_indict
try:
    json_file = open('./class_indices.json', 'r')
    class_indict = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)

try:
    # create model
    model = resnet34(num_classes=5)
except RuntimeError as exception:
    if "out of memory" in str(exception):
        print("WARNING: out of memory")
        if hasattr(torch.cuda, 'empty_cache'):
            torch.cuda.empty_cache()
    else:
        raise exception

# load model weights
model_weight_path = "./resNet34.pth"
model.load_state_dict(torch.load(model_weight_path, map_location=device))
model.eval()
with torch.no_grad():
    # predict class
    output = torch.squeeze(model(img))
    predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).numpy()
print(class_indict[str(predict_cla)], predict[predict_cla].numpy())
plt.show()