
PyTorch training: RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Date: 2023-06-02 12:06:57




# Problem description:


python:3.9.5

pytorch:1.10.2

python train.py --img 640 --batch 64 --epochs 600 --data voc.yaml --weights yolov5x.pt  --device 0,1,2,3
Traceback (most recent call last):
  File "/home/yons/mtl/pytorch/yolov5/train.py", line 643, in <module>
    main(opt)
  File "/home/yons/mtl/pytorch/yolov5/train.py", line 539, in main
    train(opt.hyp, opt, device, callbacks)
  File "/home/yons/mtl/pytorch/yolov5/train.py", line 330, in train
    pred = model(imgs)  # forward
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yons/mtl/pytorch/yolov5/models/yolo.py", line 126, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/home/yons/mtl/pytorch/yolov5/models/yolo.py", line 149, in _forward_once
    x = m(x)  # run
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yons/mtl/pytorch/yolov5/models/common.py", line 139, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yons/mtl/pytorch/yolov5/models/common.py", line 105, in forward
    return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yons/mtl/pytorch/yolov5/models/common.py", line 47, in forward
    return self.act(self.bn(self.conv(x)))
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/yons/miniconda3/envs/ptest/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Running nvidia-smi shows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
|  0%   35C    P8    34W / 350W |     18MiB / 12051MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:08:00.0 Off |                  N/A |
|  0%   36C    P8    22W / 350W |      5MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:85:00.0 Off |                  N/A |
|  0%   36C    P8    37W / 350W |      5MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:89:00.0 Off |                  N/A |
|  0%   34C    P8    27W / 350W |      5MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1430      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      1726      G   /usr/bin/gnome-shell                6MiB |
|    1   N/A  N/A      1430      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      1430      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      1430      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

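# Cause analysis:

The GPU memory limit was probably reached: when cuDNN cannot allocate the workspace memory its convolution algorithms need, PyTorch can raise this error instead of an explicit CUDA out-of-memory error.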
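
As a rough sketch of why the batch size matters here: the traceback goes through data_parallel.py, so the batch is being split across the four GPUs by DataParallel. Assuming an even split (rounded up for the first replicas), the per-GPU load can be estimated as below. The helper name `per_gpu_batch` is made up for this illustration and is not part of yolov5 or PyTorch.

```python
def per_gpu_batch(total_batch: int, num_gpus: int) -> int:
    """Largest number of images a single replica receives per step,
    assuming the batch is split as evenly as possible across GPUs."""
    return -(-total_batch // num_gpus)  # ceiling division


# The failing command used --batch 64 with --device 0,1,2,3.
print(per_gpu_batch(64, 4))  # 16 images per GPU
# Halving the total batch halves the per-GPU load as well.
print(per_gpu_batch(32, 4))  # 8 images per GPU
```

Each 12 GB card therefore has to hold the yolov5x weights plus the activations for 16 images at 640 px, which is plausibly more than it can fit.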


# Solution:

Reduce the batch size:

python train.py --img 640 --batch 32 --epochs 600 --data voc.yaml --weights yolov5x.pt  --device 0,1,2,3
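
The manual fix above could also be automated. The following is only a sketch of that idea, not something yolov5 provides in this form: `run_with_batch_backoff` and `fake_train` are hypothetical names, and the list of error substrings treated as memory-related is an assumption.

```python
from typing import Callable

# Substrings of RuntimeError messages treated as memory-related (an assumption).
MEMORY_ERRORS = ("Unable to find a valid cuDNN algorithm", "out of memory")


def run_with_batch_backoff(
    train_fn: Callable[[int], None], batch_size: int, min_batch: int = 1
) -> int:
    """Call train_fn(batch_size), halving batch_size after each
    memory-related RuntimeError. Returns the batch size that succeeded
    and re-raises any other error."""
    while True:
        try:
            train_fn(batch_size)
            return batch_size
        except RuntimeError as err:
            if not any(msg in str(err) for msg in MEMORY_ERRORS):
                raise  # unrelated failure, do not hide it
            if batch_size // 2 < min_batch:
                raise  # cannot shrink any further
            batch_size //= 2
            print(f"memory-related error, retrying with batch size {batch_size}")


# Stand-in for a real training call: fails above batch 32, like the run in this post.
def fake_train(batch_size: int) -> None:
    if batch_size > 32:
        raise RuntimeError("Unable to find a valid cuDNN algorithm to run convolution")


print(run_with_batch_backoff(fake_train, 64))
```

In a real training loop it would probably also make sense to call torch.cuda.empty_cache() before each retry so that memory cached by the failed attempt is released first.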


From: https://blog.51cto.com/u_16015778/6401415
