Basic idea: I had the chance to use Huawei's ModelArts cloud service, so I gave it a try and documented the process step by step.
Step 1: Log in to your account and check the service configuration; just pick an image yourself and pay for the resources. Then confirm the NPU is visible:
[ma-user ~]$ npu-smi info
+------------------------------------------------------------------------------------+
| npu-smi 21.0.3.1                       Version: 21.0.2                              |
+----------------------+---------------+---------------------------------------------+
| NPU     Name         | Health        | Power(W)   Temp(C)                          |
| Chip                 | Bus-Id        | AICore(%)  Memory-Usage(MB)  HBM-Usage(MB)  |
+======================+===============+=============================================+
| 0       910PremiumA  | OK            | 87.5       38                               |
| 0                    | 0000:C1:00.0  | 0          758 / 15171       0 / 32768      |
+======================+===============+=============================================+
Download the official YOLOv5 source code for testing. Note that it must go into the work directory; otherwise the files will be gone the next time you log in. The source code below comes from the official repository (Appendix 1).
Link: https://pan.baidu.com/s/11hsEUT-1abqIZNQy8rc4Dw  Extraction code: cp7k
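Because only the work directory (typically /home/ma-user/work) persists across notebook sessions, it is worth unpacking the source archive there right away. A minimal sketch, assuming the zip has already been uploaded; the archive name below is hypothetical:

import zipfile

WORK_DIR = "/home/ma-user/work"                  # the only path that survives between logins
SRC_ZIP = f"{WORK_DIR}/yolov5_mindspore.zip"     # hypothetical name of the uploaded source archive

# Unpack the source tree inside the persistent work directory.
with zipfile.ZipFile(SRC_ZIP) as zf:
    zf.extractall(WORK_DIR)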
Step 2: Upload the source folder and the training data. Note that the same rules as in a notebook apply: only individual files can be uploaded. Following the official training code, the data was converted into a COCO-format dataset for training; since there are too many images, they have to be packed into an archive and transferred through OBS.
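For the dataset itself, one route is to upload coco.zip to an OBS bucket and pull it onto the instance. A rough sketch, assuming the moxing helper that ships with ModelArts images; the bucket path is hypothetical. (When enable_modelarts is switched on, the repository's own ModelArts wrapper is meant to do this unzip itself, which is what need_modelarts_dataset_unzip and modelarts_dataset_unzip_name below control.)

import os
import zipfile
import moxing as mox                          # OBS transfer helper available in ModelArts images

DATA_PATH = "/cache/data"                     # matches data_path in default_config.yaml
OBS_ZIP = "obs://my-bucket/coco.zip"          # hypothetical OBS object holding the zipped COCO-format dataset

os.makedirs(DATA_PATH, exist_ok=True)
mox.file.copy(OBS_ZIP, os.path.join(DATA_PATH, "coco.zip"))   # pull the archive down from OBS
with zipfile.ZipFile(os.path.join(DATA_PATH, "coco.zip")) as zf:
    zf.extractall(DATA_PATH)                                  # unpack to /cache/data/coco/...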
1) Configure default_config.yaml for this training run
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
output_dir: "/cache"
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path"
device_target: "Ascend"
need_modelarts_dataset_unzip: True
modelarts_dataset_unzip_name: "coco"
# ==============================================================================
# Train options
data_dir: "dataset/"
per_batch_size: 32
yolov5_version: "yolov5s"
pretrained_backbone: ""
resume_yolov5: ""
pretrained_checkpoint: ""
lr_scheduler: "cosine_annealing"
lr: 0.013
lr_epochs: "220,250"
lr_gamma: 0.1
eta_min: 0.0
T_max: 300
# number of training epochs
max_epoch: 3000
warmup_epochs: 20
weight_decay: 0.0005
momentum: 0.9
loss_scale: 1024
label_smooth: 0
label_smooth_factor: 0.1
log_interval: 100
# checkpoint output location
ckpt_path: "outputs/"
ckpt_interval: 1
is_save_on_master: 1
is_distributed: 0
rank: 0
group_size: 1
need_profiler: 0
training_shape: ""
resize_rate: 10
is_modelArts: 0
# Eval options
pretrained: ""
# log output location
log_path: "outputs/"
ann_val_file: "annotations/val.json"
# out_channel = 3 * (num_classes + 5)
input_shape: [[3, 32, 64, 128, 256, 512, 1],
[3, 48, 96, 192, 384, 768, 2],
[3, 64, 128, 256, 512, 1024, 3],
[3, 80, 160, 320, 640, 1280, 4]]
# test_param
test_img_shape: [640, 640]
labels: [ 'seatbelt', 'helmet', 'no_coverall', 'no_helmet', 'coverall', 'no_seatbelt']
coco_ids: [ 1, 2, 3, 4, 5, 6 ]
result_files: './result_Files'
---
# Help description for each configuration
# Train options
data_dir: "Train dataset directory."
per_batch_size: "Batch size for Training."
pretrained_backbone: "The ckpt file of CspDarkNet53."
resume_yolov5: "The ckpt file of YOLOv5, used for fine-tuning."
pretrained_checkpoint: "The ckpt file of YOLOv5CspDarkNet53."
lr_scheduler: "Learning rate scheduler, options: exponential, cosine_annealing."
lr: "Learning rate."
lr_epochs: "Epochs at which lr changes, split with ','."
lr_gamma: "Factor by which the exponential lr_scheduler decreases lr."
eta_min: "Eta_min in cosine_annealing scheduler."
T_max: "T-max in cosine_annealing scheduler."
max_epoch: "Max epoch num to train the model."
warmup_epochs: "Warmup epochs."
weight_decay: "Weight decay factor."
momentum: "Momentum."
loss_scale: "Static loss scale."
label_smooth: "Whether to use label smooth in CE."
label_smooth_factor: "Smooth strength of original one-hot."
log_interval: "Logging interval steps."
ckpt_path: "Checkpoint save location."
ckpt_interval: "Save checkpoint interval."
is_save_on_master: "Save ckpt on master or all rank, 1 for master, 0 for all ranks."
is_distributed: "Distribute train or not, 1 for yes, 0 for no."
rank: "Local rank of distributed."
group_size: "World size of device."
need_profiler: "Whether use profiler. 0 for no, 1 for yes."
training_shape: "Fix training shape."
resize_rate: "Resize rate for multi-scale training."
ann_file: "path to annotation"
each_multiscale: "Apply multi-scale for each scale"
labels: "the label of train data"
multi_label: "use multi label to nms"
multi_label_thresh: "multi label thresh"
# Eval options
pretrained: "model_path, local pretrained model to load"
log_path: "checkpoint save location"
ann_val_file: "path to annotation"
# Export options
device_id: "Device id for export"
batch_size: "batch size for export"
testing_shape: "shape for test"
ckpt_file: "Checkpoint file path for export"
file_name: "output file name for export"
file_format: "file format for export"
result_files: 'Path to the Ascend 310 inference result folder'
You also need to update the class list and dataset paths in postprocess.py and src/yolo.py to match.
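As a quick sanity check (not part of the repository), the sketch below loads the first YAML document in default_config.yaml, verifies that labels and coco_ids agree, and prints the detection-head channel count implied by the usual YOLO formula 3 * (num_classes + 5):

import yaml                                   # PyYAML

with open("default_config.yaml") as f:
    cfg = next(yaml.safe_load_all(f))         # the file holds two YAML documents; the first contains the values

num_classes = len(cfg["labels"])
assert len(cfg["coco_ids"]) == num_classes, "labels and coco_ids must have the same length"

print("num_classes:", num_classes)
print("expected detection-head out_channel:", 3 * (num_classes + 5))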
2) Start training
[ma-user outputs]$ python3 train.py
During training you can keep an eye on the NPU utilization.
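A rough monitoring helper (my own addition, not part of the training code): run it in a second terminal to poll npu-smi info at a fixed interval and watch AICore utilization and HBM usage while train.py runs.

import subprocess
import time

def watch_npu(interval_s: int = 30, rounds: int = 20) -> None:
    """Print the output of `npu-smi info` every interval_s seconds."""
    for _ in range(rounds):
        result = subprocess.run(["npu-smi", "info"], capture_output=True, text=True)
        print(result.stdout)
        time.sleep(interval_s)

if __name__ == "__main__":
    watch_npu()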
Step 3: Evaluate the training results
[ma-user yolov5_self]$ python eval.py --data_dir=dataset/ --eval_shape=640 --pretrained="outputs/2022-09-07_time_14_04_50/ckpt_0/0-2_168.ckpt"
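eval.py evaluates against the COCO-style annotation file configured in ann_val_file. For reference, the same mAP summary can be reproduced directly with pycocotools; the detection-result file name below is hypothetical.

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("dataset/annotations/val.json")   # ground truth: data_dir + ann_val_file
coco_dt = coco_gt.loadRes("predictions.json")    # hypothetical json of detection results
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()                            # prints the AP/AR table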
Detection results:
References
models: Models of MindSpore, Gitee: https://gitee.com/mindspore/models
From: https://blog.51cto.com/u_12504263/5719075