
38. Notes on using Huawei ModelArts to train a YOLOv5 model on an NPU and run inference

Date: 2022-09-28 11:35:46
Tags: yolov5 38 ModelArts label ckpt lr file path size


Motivation: I had the chance to use Huawei's ModelArts cloud service, so I gave it a try and recorded the process here.

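Step 1: Log in to your account and check the service configuration. Select an image yourself and pay for the instance; nothing else is needed.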

[ma-user ~]$npu-smi info
+------------------------------------------------------------------------------------+
| npu-smi 21.0.3.1                 Version: 21.0.2                                   |
+----------------------+---------------+---------------------------------------------+
| NPU   Name           | Health        | Power(W)    Temp(C)                         |
| Chip                 | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB) |
+======================+===============+=============================================+
| 0     910PremiumA    | OK            | 87.5        38                              |
| 0                    | 0000:C1:00.0  | 0           758 / 15171       0    / 32768  |
+======================+===============+=============================================+

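Download the official YOLOv5 source code for testing. Make sure to download it into the work directory; otherwise the files will be gone the next time you log in. The source code below is the official code (Appendix 1):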

Link: https://pan.baidu.com/s/11hsEUT-1abqIZNQy8rc4Dw  Extraction code: cp7k
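Because only the work directory survives a restart, a quick check of the target path before cloning or unpacking can save a re-download. A minimal sketch, assuming the persistent directory is /home/ma-user/work (the usual location for the ma-user notebook account; verify it on your own instance). The helper name is hypothetical:

```python
from pathlib import PurePosixPath

# Assumption: persistent storage of the notebook instance; verify on your instance.
PERSISTENT_ROOT = PurePosixPath("/home/ma-user/work")


def is_persistent(path: str) -> bool:
    """Return True if `path` is the assumed work directory or lies inside it."""
    p = PurePosixPath(path)
    return p == PERSISTENT_ROOT or PERSISTENT_ROOT in p.parents


if __name__ == "__main__":
    for candidate in ["/home/ma-user/work/yolov5", "/home/ma-user/yolov5"]:
        where = "inside work/" if is_persistent(candidate) else "outside work/ (may be lost)"
        print(f"{candidate}: {where}")
```

The check is purely lexical (PurePosixPath), so it does not follow symlinks; if your work directory is a mount point reached through a link, compare resolved paths instead.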

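Step 2: Upload the project folder and the training data. Uploading follows the same rules as in a notebook: only files can be uploaded, not folders. Following the official training code, I converted the data into a COCO-format dataset for training. Because there are too many images, pack them into an archive and transfer it through OBS.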
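Packing the dataset folder into a single archive before the OBS hop, and unpacking it on the notebook side, can be done with the standard library alone. A minimal sketch: the folder name coco and the helper names are assumptions (coco is chosen only to match modelarts_dataset_unzip_name in the config), and the OBS upload/download itself is still done through the console or your usual OBS tool:

```python
import shutil
from pathlib import Path


def pack_dataset(dataset_dir: str, archive_base: str) -> str:
    """Zip dataset_dir into <archive_base>.zip and return the archive path.

    The top-level folder name is kept inside the archive, so unpacking
    recreates e.g. coco/... instead of scattering files into the target.
    """
    src = Path(dataset_dir).resolve()
    return shutil.make_archive(archive_base, "zip", root_dir=src.parent, base_dir=src.name)


def unpack_dataset(archive_path: str, target_dir: str) -> Path:
    """Extract the archive on the notebook side, e.g. under the work directory."""
    shutil.unpack_archive(archive_path, target_dir)
    return Path(target_dir)


if __name__ == "__main__":
    # Local machine: produce ./coco.zip from the folder ./coco (hypothetical paths).
    if Path("coco").is_dir():
        print(pack_dataset("coco", "coco"))
```

One archive instead of thousands of small image files keeps the number of upload operations down; unpack it into the work directory so it survives a restart.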

1) Configure default_config.yaml for this training run

# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
output_dir: "/cache"
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path"
device_target: "Ascend"
need_modelarts_dataset_unzip: True
modelarts_dataset_unzip_name: "coco"

# ==============================================================================
# Train options
data_dir: "dataset/"
per_batch_size: 32
yolov5_version: "yolov5s"
pretrained_backbone: ""
resume_yolov5: ""
pretrained_checkpoint: ""

lr_scheduler: "cosine_annealing"
lr: 0.013
lr_epochs: "220,250"
lr_gamma: 0.1
eta_min: 0.0
T_max: 300
# number of training epochs
max_epoch: 3000
warmup_epochs: 20
weight_decay: 0.0005
momentum: 0.9
loss_scale: 1024
label_smooth: 0
label_smooth_factor: 0.1
log_interval: 100
# checkpoint output location
ckpt_path: "outputs/"
ckpt_interval: 1
is_save_on_master: 1
is_distributed: 0
rank: 0
group_size: 1
need_profiler: 0
training_shape: ""
resize_rate: 10
is_modelArts: 0

# Eval options
pretrained: ""
# log output location
log_path: "outputs/"
ann_val_file: "annotations/val.json"
# output channels per detection head = 3 * (num_classes + 5)

input_shape: [[3, 32, 64, 128, 256, 512, 1],
[3, 48, 96, 192, 384, 768, 2],
[3, 64, 128, 256, 512, 1024, 3],
[3, 80, 160, 320, 640, 1280, 4]]

# test_param
test_img_shape: [640, 640]

labels: [ 'seatbelt', 'helmet', 'no_coverall', 'no_helmet', 'coverall', 'no_seatbelt']

coco_ids: [ 1, 2, 3, 4, 5, 6 ]

result_files: './result_Files'

---

# Help description for each configuration
# Train options
data_dir: "Train dataset directory."
per_batch_size: "Batch size for Training."
pretrained_backbone: "The ckpt file of CspDarkNet53."
resume_yolov5: "The ckpt file of YOLOv5, used for fine-tuning."
pretrained_checkpoint: "The ckpt file of YOLOv5CspDarkNet53."
lr_scheduler: "Learning rate scheduler, options: exponential, cosine_annealing."
lr: "Learning rate."
lr_epochs: "Epochs at which the lr changes, separated by ','."
lr_gamma: "Factor by which the exponential lr_scheduler decreases the lr."
eta_min: "Eta_min in cosine_annealing scheduler."
T_max: "T-max in cosine_annealing scheduler."
max_epoch: "Max epoch num to train the model."
warmup_epochs: "Warmup epochs."
weight_decay: "Weight decay factor."
momentum: "Momentum."
loss_scale: "Static loss scale."
label_smooth: "Whether to use label smoothing in the CE loss."
label_smooth_factor: "Smoothing strength applied to the original one-hot labels."
log_interval: "Logging interval steps."
ckpt_path: "Checkpoint save location."
ckpt_interval: "Save checkpoint interval."
is_save_on_master: "Save ckpt on master or all rank, 1 for master, 0 for all ranks."
is_distributed: "Distribute train or not, 1 for yes, 0 for no."
rank: "Local rank of distributed."
group_size: "World size of device."
need_profiler: "Whether to use the profiler. 0 for no, 1 for yes."
training_shape: "Fix training shape."
resize_rate: "Resize rate for multi-scale training."
ann_file: "path to annotation"
each_multiscale: "Apply multi-scale for each scale"
labels: "the label of train data"
multi_label: "Use multiple labels per box in NMS."
multi_label_thresh: "multi label thresh"

# Eval options
pretrained: "model_path, local pretrained model to load"
log_path: "Log save location."
ann_val_file: "path to annotation"

# Export options
device_id: "Device id for export"
batch_size: "batch size for export"
testing_shape: "shape for test"
ckpt_file: "Checkpoint file path for export"
file_name: "output file name for export"
file_format: "file format for export"
result_files: 'path to 310 infer result folder'
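To see what the learning-rate options above actually produce, the two scheduler choices can be sketched per epoch in plain Python. This is a generic linear-warmup + cosine-annealing formula and a milestone ("step") decay, which is how I read lr_epochs and lr_gamma; it is an illustration under those assumptions, not the repository's own scheduler code, which may work per step and differ in details:

```python
import math


def warmup_cosine_lr(epoch: int, base_lr: float, warmup_epochs: int,
                     t_max: int, eta_min: float = 0.0) -> float:
    """Linear warmup from 0 to base_lr, then cosine annealing (period 2 * t_max)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


def step_decay_lr(epoch: int, base_lr: float, lr_epochs: str, gamma: float,
                  warmup_epochs: int = 0) -> float:
    """Linear warmup, then multiply base_lr by gamma once per milestone already passed."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    milestones = [int(e) for e in lr_epochs.split(",")]
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    # lr, warmup_epochs, T_max, eta_min, lr_epochs and lr_gamma from default_config.yaml
    for ep in (0, 19, 20, 150, 220, 250, 300, 600, 2999):
        print(ep,
              round(warmup_cosine_lr(ep, 0.013, 20, 300, 0.0), 6),
              round(step_decay_lr(ep, 0.013, "220,250", 0.1, 20), 6))
```

With T_max: 300 and max_epoch: 3000, this formula reaches eta_min at epoch 300 and climbs back to lr by epoch 600, repeating every 600 epochs. If the repository's scheduler behaves the same way, setting T_max equal to max_epoch gives a single smooth decay instead.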

The class definitions and dataset paths in postprocess.py and src/yolo.py also have to be modified.
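Since the class count has to agree across default_config.yaml, postprocess.py and src/yolo.py, a small consistency check can catch mismatches early. A minimal sketch: it pairs labels with coco_ids into a COCO-style categories list and computes each detection head's output channels with the usual YOLO rule 3 × (num_classes + 5), i.e. 3 anchors per scale, each predicting 4 box values, 1 objectness score and one score per class. The function names are hypothetical, not part of the repository:

```python
LABELS = ["seatbelt", "helmet", "no_coverall", "no_helmet", "coverall", "no_seatbelt"]
COCO_IDS = [1, 2, 3, 4, 5, 6]


def build_categories(labels, coco_ids):
    """Pair each label with its COCO category id, as in a COCO annotation file."""
    if len(labels) != len(coco_ids):
        raise ValueError(f"{len(labels)} labels but {len(coco_ids)} coco_ids")
    if len(set(coco_ids)) != len(coco_ids):
        raise ValueError("coco_ids must be unique")
    return [{"id": cid, "name": name} for name, cid in zip(labels, coco_ids)]


def head_out_channels(num_classes: int, anchors_per_scale: int = 3) -> int:
    """Channels per detection head: anchors x (x, y, w, h, objectness + class scores)."""
    return anchors_per_scale * (num_classes + 5)


if __name__ == "__main__":
    cats = build_categories(LABELS, COCO_IDS)
    print(cats)
    print("num_classes:", len(cats), "out_channel:", head_out_channels(len(cats)))
```

For the six labels here this gives 3 × (6 + 5) = 33 channels, compared with 255 for the 80 COCO classes, so any place still assuming 80 classes will show up as a shape mismatch.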

2) Start training

[ma-user outputs]$python3 train.py


During training, you can monitor the NPU utilization.
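To watch utilization from a second terminal, the npu-smi info table from step 1 can be polled and parsed. A rough sketch, assuming the chip row keeps the layout printed above (Chip | Bus-Id | AICore(%) Memory-Usage HBM-Usage); the format may change between driver versions, so the regex is an assumption, and the function names are hypothetical:

```python
import re
import subprocess
import time

# Assumed layout of the second row per NPU in `npu-smi info`:
# | Chip | Bus-Id | AICore(%)  Memory-Usage(MB)  HBM-Usage(MB) |
CHIP_ROW = re.compile(
    r"^\|\s*(?P<chip>\d+)\s*\|\s*(?P<bus>[0-9A-Fa-f]{4}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.\d)\s*\|"
    r"\s*(?P<aicore>\d+)\s+(?P<mem_used>\d+)\s*/\s*(?P<mem_total>\d+)"
    r"\s+(?P<hbm_used>\d+)\s*/\s*(?P<hbm_total>\d+)\s*\|"
)


def parse_npu_smi(text: str) -> list[dict]:
    """Extract AICore % and memory/HBM usage (MB) for every chip row in `text`."""
    rows = []
    for line in text.splitlines():
        m = CHIP_ROW.match(line.strip())
        if m:
            rows.append({k: (v if k == "bus" else int(v)) for k, v in m.groupdict().items()})
    return rows


def watch(interval_s: float = 10.0) -> None:
    """Poll `npu-smi info` and print one summary line per chip (Ctrl+C to stop)."""
    while True:
        out = subprocess.run(["npu-smi", "info"], capture_output=True,
                             text=True, check=True).stdout
        for r in parse_npu_smi(out):
            print(f"chip {r['chip']} ({r['bus']}): AICore {r['aicore']}%, "
                  f"HBM {r['hbm_used']}/{r['hbm_total']} MB")
        time.sleep(interval_s)
```

Call watch() from a second terminal while train.py is running; an AICore(%) that stays at 0 while training is in progress usually means the work is not reaching the NPU.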


Step 3: Evaluate the training results

[ma-user yolov5_self]$python eval.py --data_dir=dataset/ --eval_shape=640 --pretrained="outputs/2022-09-07_time_14_04_50/ckpt_0/0-2_168.ckpt"
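When several checkpoints accumulate, the newest one can be picked automatically instead of typing the path for --pretrained. A minimal sketch, assuming the file names follow the <rank>-<epoch>_<step>.ckpt pattern suggested by 0-2_168.ckpt above; the helper name is hypothetical:

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming: <rank>-<epoch>_<step>.ckpt, e.g. 0-2_168.ckpt
CKPT_NAME = re.compile(r"^(\d+)-(\d+)_(\d+)\.ckpt$")


def latest_checkpoint(ckpt_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest (epoch, step) in ckpt_dir, or None."""
    best, best_key = None, None
    for path in Path(ckpt_dir).glob("*.ckpt"):
        m = CKPT_NAME.match(path.name)
        if not m:
            continue  # skip files that do not follow the assumed pattern
        key = (int(m.group(2)), int(m.group(3)))  # numeric compare: epoch 10 > epoch 9
        if best_key is None or key > best_key:
            best, best_key = path, key
    return best


if __name__ == "__main__":
    import sys
    found = latest_checkpoint(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(found if found else "no checkpoint found")
```

For example, pointing it at outputs/2022-09-07_time_14_04_50/ckpt_0 prints the path to pass to --pretrained. Comparing the parsed numbers rather than the raw file names avoids the string-sort trap where 0-10_168.ckpt sorts before 0-9_168.ckpt.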


Detection results


References

LegnaDoc

models: Models of MindSpore - Gitee.com

From: https://blog.51cto.com/u_12504263/5719075
