
50. Ubuntu 18.04 & 20.04 + CUDA 11.1 + cuDNN 8.2 + TensorRT 7.2 + DeepStream 5.1 + Vulkan environment setup and YOLOv5 deployment



The basic idea: I want to learn how to use TensorRT, so I'm jotting down notes as I go.

Link: https://pan.baidu.com/s/1uFOktdF-bHcDDsufIqmNSA
Extraction code: k55w

For the record, the pip install command (using the Aliyun mirror):

pip install **** -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
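To make the mirror the default for every pip call, it can also go into pip's config file; a minimal sketch (on Linux, pip reads ~/.config/pip/pip.conf, which is the standard location rather than anything set up in this post):

mkdir -p ~/.config/pip
cat > ~/.config/pip/pip.conf <<'EOF'
[global]
index-url = http://mirrors.aliyun.com/pypi/simple
trusted-host = mirrors.aliyun.com
EOF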

I. Install the graphics driver (the card here is an RTX 2080)

Note: first set Secure Boot to Disabled.

Avoid installing via sudo apt-get install nvidia-*, which can leave you stuck in a login-screen loop.

1. On Ubuntu 18.04, before installing the NVIDIA driver you first need to disable nouveau.

ubuntu@ubuntu:~$ sudo vim /etc/modprobe.d/blacklist.conf

Insert the following two lines at the end of the file:

blacklist nouveau
options nouveau modeset=0

Regenerate the initramfs:

ubuntu@ubuntu:~$ sudo update-initramfs -u

Reboot, then verify that nouveau is disabled:

ubuntu@ubuntu:~$ lsmod | grep nouveau

If nothing is printed, nouveau has been disabled; reboot once more.

2. Look up your card's model on NVIDIA's site and download the matching driver: Official GeForce Drivers | NVIDIA

The version I downloaded: NVIDIA-Linux-x86_64-460.67.run

Copy the downloaded .run file to your home directory.

3. Press Ctrl+Alt+F1 to switch to a text console,

then run:

ubuntu@ubuntu:~$ sudo apt-get install lightdm

ubuntu@ubuntu:~$ sudo service lightdm stop

4. Remove any existing driver:

ubuntu@ubuntu:~$ sudo apt-get remove nvidia-*   

Make the .run file executable:

ubuntu@ubuntu:~$ sudo chmod  a+x NVIDIA-Linux-x86_64-460.67.run

Run the installer (skipping the X check, the nouveau check, and the OpenGL files):

ubuntu@ubuntu:~$ sudo ./NVIDIA-Linux-x86_64-460.67.run -no-x-check -no-nouveau-check -no-opengl-files

Check whether the driver installed successfully:

ubuntu@ubuntu:~$ nvidia-smi

If output like the following appears, the installation succeeded:

Mon Mar 22 13:00:57 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0 Off |                  N/A |
| 22%   43C    P0    53W / 250W |      0MiB / 11016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:03:00.0 Off |                  N/A |
| 19%   42C    P0    62W / 250W |      0MiB / 11019MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:82:00.0 Off |                  N/A |
|  0%   40C    P0     1W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

One more thing to set up: disable kernel updates, to avoid the driver breaking later.

Disable automatic kernel updates from the command line:

ubuntu@ubuntu:~$ cat /etc/apt/apt.conf.d/10periodic

APT::Periodic::Update-Package-Lists "1";

APT::Periodic::Download-Upgradeable-Packages "0";

APT::Periodic::AutocleanInterval "0";

In this config, set the "Update-Package-Lists" parameter to "0".
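Alternatively, the kernel packages can be pinned so apt never upgrades them; a minimal sketch, where the package names are an assumption that depends on which kernel flavor is installed (check with uname -r):

sudo apt-mark hold linux-image-generic linux-headers-generic linux-generic
# undo later with: sudo apt-mark unhold linux-image-generic linux-headers-generic linux-generic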

II. Next, download the CUDA 11.1 runfile and run it:

ubuntu@ubuntu:~$ sudo ./cuda_11.1.0_455.23.05_linux.run

The installer options, in order: accept the EULA, then on the component screen deselect the Driver entry and install CUDA 11.1 without the bundled driver (460.67 is already installed). When it finishes, the summary is shown:


===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.1/
Samples: Installed in /home/ubuntu/

Please make sure that
- PATH includes /usr/local/cuda-11.1/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.1/lib64, or, add /usr/local/cuda-11.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.1/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 455.00 is required for CUDA 11.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log

Then add the entries to your shell config:

ubuntu@ubuntu:~$ sudo gedit ~/.bashrc

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/lib64
export PATH=$PATH:/usr/local/cuda-11.1/bin
export CUDA_HOME=/usr/local/cuda-11.1

ubuntu@ubuntu:~$ source ~/.bashrc

Check the version:

nvcc -V

PyTorch (the CUDA 11.1 build):

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
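A quick way to confirm the wheel actually sees CUDA (torch.version.cuda and torch.cuda.is_available() are standard PyTorch API):

python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"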

III. Download cuDNN from https://developer.nvidia.com/rdp/cudnn-download

Unpack the archive (this is cuDNN 8.2.0, the cuda-11.3 package) and copy the files over:

ubuntu@ubuntu:~$ tar -zxvf cudnn-11.3-linux-x64-v8.2.0.53.tgz 
ubuntu@ubuntu:~$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
ubuntu@ubuntu:~$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
ubuntu@ubuntu:~$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Have a look:

ubuntu@ubuntu:~$ cat /usr/local/cuda/include/cudnn.h | grep cudnn
/* cudnn : Neural Networks Library
#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"
#include "cudnn_backend.h"

IV. Symlinks (use the block matching your cuDNN version, 8.1.0 or 8.2.0):

sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8

or

sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8
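The same links can be generated with a short loop instead of typing each one out; a minimal sketch, assuming the cuDNN 8.2.0 files are already in that directory:

CUDNN_DIR=/usr/local/cuda-11.1/targets/x86_64-linux/lib
for f in "$CUDNN_DIR"/libcudnn*.so.8.2.0; do
    sudo ln -sf "$f" "${f%.so.8.2.0}.so.8"   # strip the full version suffix, link the .so.8 name
done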

V. Install TensorRT from https://developer.nvidia.com/nvidia-tensorrt-7x-download

ubuntu@ubuntu:~$ tar xzvf TensorRT-7.2.2.3.Ubuntu-18.04.x86_64-gnu.cuda-11.1.cudnn8.0.tar.gz -C /home/ubuntu/NVIDIA_CUDA-11.1_Samples
ubuntu@ubuntu:~$ sudo vim ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib
ubuntu@ubuntu:~$ source ~/.bashrc
ubuntu@ubuntu:~$ cd NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3
ubuntu@ubuntu:~$ cd python
ubuntu@ubuntu:~$ sudo pip3 install tensorrt-7.2.2.3-cp37-none-linux_x86_64.whl
ubuntu@ubuntu:~$ cd ../uff
ubuntu@ubuntu:~$ sudo pip3 install uff-0.6.5-py2.py3-none-any.whl
ubuntu@ubuntu:~$ cd ../graphsurgeon
ubuntu@ubuntu:~$ sudo pip3 install graphsurgeon-0.4.1-py2.py3-none-any.whl
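A quick check that the Python bindings landed:

python3 -c "import tensorrt; print(tensorrt.__version__)"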

Copy the headers and libraries:

# from inside the TensorRT directory
sudo cp -r ./lib/* /usr/lib
sudo cp -r ./include/* /usr/include

If code later fails at runtime (Python or C++) because of a missing .so, copy the needed library from this TensorRT directory into /usr/lib, for example:

ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvinfer.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvonnxparser.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvinfer_plugin.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvparsers.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libmyelin.so.1 /usr/lib/

The final .bashrc configuration:

export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib:$LD_LIBRARY_PATH
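An alternative to LD_LIBRARY_PATH that also covers binaries run under sudo (sudo strips LD_LIBRARY_PATH) is registering the directory with the dynamic loader; a sketch, assuming the same TensorRT path as above:

echo "/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib" | sudo tee /etc/ld.so.conf.d/tensorrt.conf
sudo ldconfig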

Check the version info:

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
ubuntu@ubuntu:~$ sudo ldconfig /usr/local/cuda/lib64
ubuntu@ubuntu:~$ python3
Python 3.8.6 (default, Sep 25 2020, 09:36:53)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> from torch.backends import cudnn
>>> print(cudnn.is_available())
True
>>> import tensorrt
>>>

Next comes the YOLOv5 code (section VII below). But first, install Vulkan as well, for ncnn's Vulkan acceleration later:

wget -qO - http://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-1.1.97-bionic.list http://packages.lunarg.com/vulkan/1.1.97/lunarg-vulkan-1.1.97-bionic.list
sudo apt update
sudo apt install lunarg-vulkan-sdk
sudo apt-get install cmake git gcc g++ mesa-* libwayland-dev libxrandr-dev
sudo apt-get install libvulkan1 mesa-vulkan-drivers vulkan-utils
vulkaninfo

Test it:

ubuntu@ubuntu:~/ncnn/build/benchmark$ vulkaninfo
ERROR: [Loader Message] Code 0 : libGLX_nvidia.so.0: cannot open shared object file: No such file or directory
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_radeon.so: wrong ELF class: ELFCLASS32
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_intel.so: wrong ELF class: ELFCLASS32
INTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0

==========
VULKANINFO
==========

Vulkan Instance Version: 1.2.131


Instance Extensions: count = 18
====================
VK_EXT_acquire_xlib_display : extension revision 1
VK_EXT_debug_report : extension revision 9
VK_EXT_debug_utils : extension revision 1
VK_EXT_direct_mode_display : extension revision 1
VK_EXT_display_surface_counter : extension revision 1
VK_KHR_device_group_creation : extension revision 1
VK_KHR_display : extension revision 23
VK_KHR_external_fence_capabilities : extension revision 1
VK_KHR_external_memory_capabilities : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2 : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 1
VK_KHR_get_surface_capabilities2 : extension revision 1
VK_KHR_surface : extension revision 25
VK_KHR_surface_protected_capabilities : extension revision 1
VK_KHR_wayland_surface : extension revision 6
VK_KHR_xcb_surface : extension revision 6
VK_KHR_xlib_surface : extension revision 6

Layers: count = 12
=======
....
---------------------------------
samplerMirrorClampToEdge = true
drawIndirectCount = true
storageBuffer8BitAccess = true
uniformAndStorageBuffer8BitAccess = true
storagePushConstant8 = true
shaderBufferInt64Atomics = true
shaderSharedInt64Atomics = false
shaderFloat16 = true
shaderInt8 = true
descriptorIndexing = true
shaderInputAttachmentArrayDynamicIndexing = false
shaderUniformTexelBufferArrayDynamicIndexing = true
shaderStorageTexelBufferArrayDynamicIndexing = true
shaderUniformBufferArrayNonUniformIndexing = false
shaderSampledImageArrayNonUniformIndexing = true
shaderStorageBufferArrayNonUniformIndexing = true
shaderStorageImageArrayNonUniformIndexing = true
shaderInputAttachmentArrayNonUniformIndexing = false
shaderUniformTexelBufferArrayNonUniformIndexing = true
shaderStorageTexelBufferArrayNonUniformIndexing = true
descriptorBindingUniformBufferUpdateAfterBind = false
descriptorBindingSampledImageUpdateAfterBind = true
descriptorBindingStorageImageUpdateAfterBind = true
descriptorBindingStorageBufferUpdateAfterBind = true
descriptorBindingUniformTexelBufferUpdateAfterBind = true
descriptorBindingStorageTexelBufferUpdateAfterBind = true
descriptorBindingUpdateUnusedWhilePending = true
descriptorBindingPartiallyBound = true
descriptorBindingVariableDescriptorCount = false
runtimeDescriptorArray = true
samplerFilterMinmax = true
scalarBlockLayout = true
imagelessFramebuffer = true
uniformBufferStandardLayout = true
shaderSubgroupExtendedTypes = true
separateDepthStencilLayouts = true
hostQueryReset = true
timelineSemaphore = true
bufferDeviceAddress = true
bufferDeviceAddressCaptureReplay = true
bufferDeviceAddressMultiDevice = false
vulkanMemoryModel = true
vulkanMemoryModelDeviceScope = true
vulkanMemoryModelAvailabilityVisibilityChains = true
shaderOutputViewportIndex = true
shaderOutputLayer = true
subgroupBroadcastDynamicId = true

VkPhysicalDeviceVulkanMemoryModelFeatures:
------------------------------------------
vulkanMemoryModel = true
vulkanMemoryModelDeviceScope = true
vulkanMemoryModelAvailabilityVisibilityChains = true

VkPhysicalDeviceYcbcrImageArraysFeaturesEXT:
--------------------------------------------
ycbcrImageArrays = true


ubuntu@ubuntu:~/ncnn/build/benchmark$ vkcube
INTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0

(Screenshot of the vkcube test result omitted.)

It seems ncnn wants Vulkan supplied via the SDK. I downloaded vulkansdk-linux-x86_64-1.2.182.0 from https://vulkan.lunarg.com/sdk/home, unpacked it, and put it under /usr/local:

ubuntu@ubuntu:~$ sudo cp -r  vulkansdk-linux-x86_64-1.2.182.0/ /usr/local/
[sudo] password for ubuntu:
ubuntu@ubuntu:~$ cd /usr/local/

Add the environment variables (remember to source the file afterwards):

export Vulkan_LIBRARY=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/lib
export Vulkan_INCLUDE_DIR=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/include
export Vulkan_BIN=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/bin
export PATH=$PATH:$Vulkan_LIBRARY
export PATH=$PATH:$Vulkan_INCLUDE_DIR
export PATH=$PATH:$Vulkan_BIN
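With the SDK in place, ncnn can be built with Vulkan turned on. A minimal sketch, assuming ncnn is cloned at ~/ncnn; NCNN_VULKAN is ncnn's CMake switch, and VULKAN_SDK is the environment variable its build looks for:

export VULKAN_SDK=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64
cd ~/ncnn && mkdir -p build && cd build
cmake -DNCNN_VULKAN=ON ..
make -j$(nproc)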

(VII) Download wang-xinyu's tensorrtx code and YOLOv5

ubuntu@ubuntu:~$ git clone https://github.com/wang-xinyu/tensorrtx.git
ubuntu@ubuntu:~$ git clone https://github.com/ultralytics/yolov5.git
ubuntu@ubuntu:~$ cp tensorrtx/yolov5/gen_wts.py yolov5

Then modify the script's contents:

import torch
import struct
from utils.torch_utils import select_device

# Initialize
device = select_device('cpu')
# Load model
model = torch.load('/home/ubuntu/yolov5/runs/train/exp/weights/best.pt', map_location=device)['model'].float()  # load to FP32
model.to(device).eval()

f = open('/home/ubuntu/yolov5/runs/train/exp/weights/bestyolov5x.wts', 'w')
f.write('{}\n'.format(len(model.state_dict().keys())))
for k, v in model.state_dict().items():
    vr = v.reshape(-1).cpu().numpy()
    f.write('{} {} '.format(k, len(vr)))
    for vv in vr:
        f.write(' ')
        f.write(struct.pack('>f', float(vv)).hex())
    f.write('\n')
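A quick sanity check on the generated file: in the .wts format the first line is the tensor count, so it should be exactly one less than the total number of lines (paths assume the training run above):

head -n 1 /home/ubuntu/yolov5/runs/train/exp/weights/bestyolov5x.wts
wc -l /home/ubuntu/yolov5/runs/train/exp/weights/bestyolov5x.wts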

Fixing a problem that came up (Python's lzma module):

ubuntu@ubuntu:~/yolov5$ sudo apt-get install liblzma-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
liblzma-dev is already the newest version (5.2.2-1.3).
0 upgraded, 0 newly installed, 0 to remove and 190 not upgraded.
ubuntu@ubuntu:~/yolov5$ sudo pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple backports.lzma
ubuntu@ubuntu:~/yolov5$ sudo vim /usr/local/lib/python3.7/lzma.py
The original code:
from _lzma import *
from _lzma import _encode_filter_properties, _decode_filter_properties
Change it to:

try:
    from _lzma import *
    from _lzma import _encode_filter_properties, _decode_filter_properties
except ImportError:
    from backports.lzma import *
    from backports.lzma import _encode_filter_properties, _decode_filter_properties

Then it runs through successfully:

ubuntu@ubuntu:~/yolov5$ python3 gen_wts.py 
ubuntu@ubuntu:~/yolov5$ ls runs/train/exp/weights/
best.pt bestyolov5x.wts last.pt
ubuntu@ubuntu:~/yolov5$ cp runs/train/exp/weights/bestyolov5x.wts ../tensorrtx/yolov5

Convert the model:

ubuntu@ubuntu:~$ mkdir tensorrtx/yolov5/build
ubuntu@ubuntu:~$ cd tensorrtx/yolov5/build
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ cp /home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/include/* .
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ cmake ..
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ make
[ 20%] Linking CXX shared library libmyplugins.so
/usr/bin/ld: cannot find -lnvinfer
collect2: error: ld returned 1 exit status
CMakeFiles/myplugins.dir/build.make:341: recipe for target 'libmyplugins.so' failed
make[2]: *** [libmyplugins.so] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/myplugins.dir/all' failed
make[1]: *** [CMakeFiles/myplugins.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libnvinfer.so /usr/local/lib/
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ make
[ 20%] Linking CXX shared library libmyplugins.so
[ 40%] Built target myplugins
Scanning dependencies of target yolov5
[ 60%] Building CXX object CMakeFiles/yolov5.dir/calibrator.cpp.o
[ 80%] Building CXX object CMakeFiles/yolov5.dir/yolov5.cpp.o
[100%] Linking CXX executable yolov5
[100%] Built target yolov5
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s
./yolov5: error while loading shared libraries: libnvinfer.so.7: cannot open shared object file: No such file or directory
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libnvinfer.so.7 /usr/local/lib/
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s
./yolov5: error while loading shared libraries: libmyelin.so.1: cannot open shared object file: No such file or directory
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libmyelin.so.1 /usr/local/lib/
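After copying libraries into /usr/local/lib, refresh the loader cache so they are found even without LD_LIBRARY_PATH (which sudo strips; on stock Ubuntu /usr/local/lib is already on the loader's search path):

sudo ldconfig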

Converting the model fails:

ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s ../bestyolov5x.wts ../bestyolov5.eigine x
Loading weights: ../bestyolov5x.wts
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
Building engine, please wait for a while...
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] Could not compute dimensions for (Unnamed Layer* 475) [Convolution]_output, because the network is not valid.
[03/23/2021-09:24:48] [E] [TRT] Network validation failed.
Build engine successfully!
yolov5: /home/ubuntu/tensorrtx/yolov5/yolov5.cpp:143: void APIToModel(unsigned int, nvinfer1::IHostMemory**, float&, float&, std::__cxx11::string&): Assertion `engine != nullptr' failed.
Aborted

Fix the class count:

In yololayer.h, change static constexpr int CLASS_NUM = 2; (the default is 80; the weight-count errors above come from this mismatch, since 3 × (2 + 5) × 320 = 6720).

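The same edit as a one-liner, assuming the default value of 80 is still in place:

sed -i 's/static constexpr int CLASS_NUM = 80;/static constexpr int CLASS_NUM = 2;/' ~/tensorrtx/yolov5/yololayer.h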
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s ../bestyolov5x.wts ../bestyolov5.eigine x
Loading weights: ../bestyolov5x.wts
Building engine, please wait for a while...
[03/23/2021-09:32:55] [W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[03/23/2021-09:33:03] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[03/23/2021-09:34:10] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Build engine successfully!

Compare PyTorch and TensorRT speed:

ubuntu@ubuntu:~/yolov5$ python3 detect.py --source /home/ubuntu/Downloads/video/defect/20210201090237.avi --weights runs/train/exp/weights/best.pt --device "0"

Video length: 1 minute 11 seconds;

PyTorch processing time: 166.625 s

ubuntu@ubuntu:~/yolov5$ python3 detect.py --source /home/ubuntu/Downloads/video/defect/20210201090749.avi --weights runs/train/exp/weights/best.pt --device "0"

Video length: 5 seconds;

PyTorch processing time: 12.689 s

ubuntu@ubuntu:~/tensorrtx/yolov5/build$ ./yolov5 -v ../bestyolov5.eigine /home/ubuntu/Downloads/video/defect/20210201090237.avi

Video length: 1 minute 11 seconds;

TensorRT processing time: 137200 ms

ubuntu@ubuntu:~/tensorrtx/yolov5/build$ ./yolov5 -v ../bestyolov5.eigine /home/ubuntu/Downloads/video/defect/20210201090749.avi

Video length: 5 seconds;

TensorRT processing time: 12832 ms
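So on the 71-second video TensorRT comes out at roughly 166.6 s / 137.2 s ≈ 1.2× faster end to end, while on the 5-second clip (12.7 s vs 12.8 s) the two are essentially tied; there, decoding, drawing, and display presumably dominate rather than inference.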

Appendix: the modified /home/ubuntu/tensorrtx/yolov5/yolov5.cpp that reads a video file; the change is simple:

#include <iostream>
#include <chrono>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
// we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;
const char *INPUT_BLOB_NAME = "data";
const char *OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

static int get_width(int x, float gw, int divisor = 8) {
    // return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    } else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}

ICudaEngine *build_engine(unsigned int maxBatchSize, IBuilder *builder, IBuilderConfig *config, DataType dt,
                          float &gd, float &gw, std::string &wts_name) {
    INetworkDefinition *network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor *data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{3, INPUT_H, INPUT_W});
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);
    Weights emptywts{DataType::kFLOAT, nullptr, 0};

    /* ------ yolov5 backbone ------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw),
                              get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw),
                              get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw),
                              get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13,
                    "model.8");

    /* ------ yolov5 head ------ */
    auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw),
                              get_depth(3, gd), false, 1, 0.5, "model.9");
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1,
                            "model.10");

    // all-ones weights: a grouped 2x2 deconvolution with these acts as nearest-neighbour upsampling
    float *deval = reinterpret_cast<float *>(malloc(sizeof(float) * get_width(512, gw) * 2 * 2));
    for (int i = 0; i < get_width(512, gw) * 2 * 2; i++) {
        deval[i] = 1.0;
    }
    Weights deconvwts11{DataType::kFLOAT, deval, get_width(512, gw) * 2 * 2};
    IDeconvolutionLayer *deconv11 = network->addDeconvolutionNd(*conv10->getOutput(0), get_width(512, gw),
                                                                DimsHW{2, 2}, deconvwts11, emptywts);
    deconv11->setStrideNd(DimsHW{2, 2});
    deconv11->setNbGroups(get_width(512, gw));
    weightMap["deconv11"] = deconvwts11;

    ITensor *inputTensors12[] = {deconv11->getOutput(0), bottleneck_csp6->getOutput(0)};
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.13");
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1,
                            "model.14");

    Weights deconvwts15{DataType::kFLOAT, deval, get_width(256, gw) * 2 * 2};
    IDeconvolutionLayer *deconv15 = network->addDeconvolutionNd(*conv14->getOutput(0), get_width(256, gw),
                                                                DimsHW{2, 2}, deconvwts15, emptywts);
    deconv15->setStrideNd(DimsHW{2, 2});
    deconv15->setNbGroups(get_width(256, gw));
    ITensor *inputTensors16[] = {deconv15->getOutput(0), bottleneck_csp4->getOutput(0)};
    auto cat16 = network->addConcatenation(inputTensors16, 2);

    auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.17");

    // yolo layer 0
    IConvolutionLayer *det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),
                                                        DimsHW{1, 1}, weightMap["model.24.m.0.weight"],
                                                        weightMap["model.24.m.0.bias"]);
    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1,
                            "model.18");
    ITensor *inputTensors19[] = {conv18->getOutput(0), conv14->getOutput(0)};
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.20");
    // yolo layer 1
    IConvolutionLayer *det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),
                                                        DimsHW{1, 1}, weightMap["model.24.m.1.weight"],
                                                        weightMap["model.24.m.1.bias"]);
    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1,
                            "model.21");
    ITensor *inputTensors22[] = {conv21->getOutput(0), conv10->getOutput(0)};
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.23");
    IConvolutionLayer *det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),
                                                        DimsHW{1, 1}, weightMap["model.24.m.2.weight"],
                                                        weightMap["model.24.m.2.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, det0, det1, det2);
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2 *calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine *engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto &mem : weightMap) {
        free((void *) (mem.second.values));
    }

    return engine;
}

void APIToModel(unsigned int maxBatchSize, IHostMemory **modelStream, float &gd, float &gw, std::string &wts_name) {
    // Create builder
    IBuilder *builder = createInferBuilder(gLogger);
    IBuilderConfig *config = builder->createBuilderConfig();

    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine *engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();
}

void doInference(IExecutionContext &context, cudaStream_t &stream, void **buffers, float *input, float *output,
                 int batchSize) {
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float),
                               cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost,
                               stream));
    cudaStreamSynchronize(stream);
}

bool parse_args(int argc, char **argv, std::string &wts, std::string &engine, float &gd, float &gw,
                std::string &img_dir, std::string &video_path) {
    if (argc < 4) return false;
    if (std::string(argv[1]) == "-s" && (argc == 5 || argc == 7)) {
        wts = std::string(argv[2]);
        engine = std::string(argv[3]);
        auto net = std::string(argv[4]);
        if (net == "s") {
            gd = 0.33;
            gw = 0.50;
        } else if (net == "m") {
            gd = 0.67;
            gw = 0.75;
        } else if (net == "l") {
            gd = 1.0;
            gw = 1.0;
        } else if (net == "x") {
            gd = 1.33;
            gw = 1.25;
        } else if (net == "c" && argc == 7) {
            gd = atof(argv[5]);
            gw = atof(argv[6]);
        } else {
            return false;
        }
    } else if (std::string(argv[1]) == "-d" && argc == 4) {
        engine = std::string(argv[2]);
        img_dir = std::string(argv[3]);
    } else if (std::string(argv[1]) == "-v" && argc == 4) {
        engine = std::string(argv[2]);
        video_path = std::string(argv[3]);
    } else {
        return false;
    }
    return true;
}

int main(int argc, char **argv) {
    cudaSetDevice(DEVICE);

    std::string wts_name = "";
    std::string engine_name = "";
    float gd = 0.0f, gw = 0.0f;
    std::string img_dir;
    std::string video_path = "";
    if (!parse_args(argc, argv, wts_name, engine_name, gd, gw, img_dir, video_path)) {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./yolov5 -s [.wts] [.engine] [s/m/l/x or c gd gw]  // serialize model to plan file" << std::endl;
        std::cerr << "./yolov5 -d [.engine] ../samples  // deserialize plan file and run inference" << std::endl;
        std::cerr << "./yolov5 -v [.engine] [.mp4]  // deserialize plan file and run inference" << std::endl;  // sxj731533730
        return -1;
    }

    // create a model using the API directly and serialize it to a stream
    if (!wts_name.empty()) {
        IHostMemory *modelStream{nullptr};
        APIToModel(BATCH_SIZE, &modelStream, gd, gw, wts_name);
        assert(modelStream != nullptr);
        std::ofstream p(engine_name, std::ios::binary);
        if (!p) {
            std::cerr << "could not open plan output file" << std::endl;
            return -1;
        }
        p.write(reinterpret_cast<const char *>(modelStream->data()), modelStream->size());
        modelStream->destroy();
        return 0;
    }

    // deserialize the .engine and run inference
    std::ifstream file(engine_name, std::ios::binary);
    if (!file.good()) {
        std::cerr << "read " << engine_name << " error!" << std::endl;
        return -1;
    }
    char *trtModelStream = nullptr;
    size_t size = 0;
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream = new char[size];
    assert(trtModelStream);
    file.read(trtModelStream, size);
    file.close();

    std::vector<std::string> file_names;
    if (read_files_in_dir(img_dir.c_str(), file_names) < 0 && video_path.empty()) {
        std::cerr << "read_files_in_dir failed." << std::endl;
        return -1;
    }

    // prepare input data ---------------------------
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    //for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    //    data[i] = 1.0;
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime *runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine *engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    IExecutionContext *context = engine->createExecutionContext();
    assert(context != nullptr);
    delete[] trtModelStream;
    assert(engine->getNbBindings() == 2);
    void *buffers[2];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    assert(inputIndex == 0);
    assert(outputIndex == 1);
    // Create GPU buffers on device
    CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    if (!video_path.empty()) {
        cv::Mat frame;
        std::cout << video_path << std::endl;
        cv::VideoCapture capture(video_path);

        if (!capture.isOpened()) {
            printf("could not read this video file...\n");
            return -1;
        }
        int type = static_cast<int>(capture.get(cv::CAP_PROP_FOURCC));
        cv::Size S = cv::Size((int) capture.get(cv::CAP_PROP_FRAME_WIDTH), (int) capture.get(cv::CAP_PROP_FRAME_HEIGHT));
        int fps = capture.get(cv::CAP_PROP_FPS);
        printf("video FPS: %d \n", fps);

        cv::VideoWriter out("/home/ubuntu/yolov5/runs/detect/tensorRTbest/20210201090237.mp4", type, fps, S, true);

        auto Tstart = std::chrono::system_clock::now();
        while (true) {
            capture >> frame;  // read the current frame
            if (frame.empty()) {  // end of stream
                break;
            }
            cv::Mat pr_img = preprocess_img(frame, INPUT_W, INPUT_H);  // letterbox BGR to RGB
            int i = 0;
            for (int row = 0; row < INPUT_H; ++row) {
                uchar *uc_pixel = pr_img.data + row * pr_img.step;
                for (int col = 0; col < INPUT_W; ++col) {
                    data[0 * 3 * INPUT_H * INPUT_W + i] = (float) uc_pixel[2] / 255.0;
                    data[0 * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float) uc_pixel[1] / 255.0;
                    data[0 * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float) uc_pixel[0] / 255.0;
                    uc_pixel += 3;
                    ++i;
                }
            }

            // Run inference
            auto start = std::chrono::system_clock::now();
            doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
            auto end = std::chrono::system_clock::now();
            std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms"
                      << std::endl;
            std::vector<std::vector<Yolo::Detection>> batch_res(1);

            auto &res = batch_res[0];
            nms(res, &prob[0 * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
            //std::cout << res.size() << std::endl;
            for (size_t j = 0; j < res.size(); j++) {
                cv::Rect r = get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                cv::putText(frame, std::to_string((int) res[j].class_id), cv::Point(r.x, r.y - 1),
                            cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
            }
            cv::imshow("demo", frame);
            out << frame;
            if (cv::waitKey(20) == 'q')  // wait 20 ms for a key press; quit if 'q' is pressed
                break;
        }
        auto Tend = std::chrono::system_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(Tend - Tstart).count() << "ms"
                  << std::endl;
        out.release();
        capture.release();  // release the video source
    } else {
        int fcount = 0;
        for (int f = 0; f < (int) file_names.size(); f++) {
            fcount++;
            if (fcount < BATCH_SIZE && f + 1 != (int) file_names.size()) continue;
            for (int b = 0; b < fcount; b++) {
                cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
                if (img.empty()) continue;
                cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H);  // letterbox BGR to RGB
                int i = 0;
                for (int row = 0; row < INPUT_H; ++row) {
                    uchar *uc_pixel = pr_img.data + row * pr_img.step;
                    for (int col = 0; col < INPUT_W; ++col) {
                        data[b * 3 * INPUT_H * INPUT_W + i] = (float) uc_pixel[2] / 255.0;
                        data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float) uc_pixel[1] / 255.0;
                        data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float) uc_pixel[0] / 255.0;
                        uc_pixel += 3;
                        ++i;
                    }
                }
            }

            // Run inference
            auto start = std::chrono::system_clock::now();
            doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
            auto end = std::chrono::system_clock::now();
            std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms"
                      << std::endl;
            std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
            for (int b = 0; b < fcount; b++) {
                auto &res = batch_res[b];
                nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
            }
            for (int b = 0; b < fcount; b++) {
                auto &res = batch_res[b];
                //std::cout << res.size() << std::endl;
                cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
                for (size_t j = 0; j < res.size(); j++) {
                    cv::Rect r = get_rect(img, res[j].bbox);
                    cv::rectangle(img, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                    cv::putText(img, std::to_string((int) res[j].class_id), cv::Point(r.x, r.y - 1),
                                cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
                }
                cv::imwrite("_" + file_names[f - fcount + 1 + b], img);
            }
            fcount = 0;
        }
    }
    // Release stream and buffers
    cudaStreamDestroy(stream);
    CUDA_CHECK(cudaFree(buffers[inputIndex]));
    CUDA_CHECK(cudaFree(buffers[outputIndex]));
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();

    // Print histogram of the output distribution
    //std::cout << "\nOutput:\n\n";
    //for (unsigned int i = 0; i < OUTPUT_SIZE; i++)
    //{
    //    std::cout << prob[i] << ", ";
    //    if (i % 10 == 0) std::cout << std::endl;
    //}
    //std::cout << std::endl;

    return 0;
}

An occasional problem: cuDNN fails to initialize. Installing this build fixes it:

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

If one day you run into the following problem:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Solution: first disable automatic kernel updates as described above, then repair the driver module as follows:

ubuntu@ubuntu:~/ncnn/build/benchmark$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

ubuntu@ubuntu:~/ncnn/build/benchmark$ sudo apt-get install dkms
[sudo] password for ubuntu:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
linux-headers-5.8.0-50-generic linux-hwe-5.8-headers-5.8.0-50 linux-image-5.8.0-50-generic linux-modules-5.8.0-50-generic
linux-modules-extra-5.8.0-50-generic
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
dctrl-tools
Suggested packages:
debtags menu
The following NEW packages will be installed:
dctrl-tools dkms
0 upgraded, 2 newly installed, 0 to remove and 173 not upgraded.
Need to get 128 kB of archives.
After this operation, 599 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://mirrors.aliyun.com/ubuntu focal/main amd64 dctrl-tools amd64 2.24-3 [61.5 kB]
Get:2 http://mirrors.aliyun.com/ubuntu focal-updates/main amd64 dkms all 2.8.1-5ubuntu2 [66.8 kB]
Fetched 128 kB in 1s (157 kB/s)
Selecting previously unselected package dctrl-tools.
(Reading database ... 288405 files and directories currently installed.)
Preparing to unpack .../dctrl-tools_2.24-3_amd64.deb ...
Unpacking dctrl-tools (2.24-3) ...
Selecting previously unselected package dkms.
Preparing to unpack .../dkms_2.8.1-5ubuntu2_all.deb ...
Unpacking dkms (2.8.1-5ubuntu2) ...
Setting up dctrl-tools (2.24-3) ...
Setting up dkms (2.8.1-5ubuntu2) ...
Processing triggers for man-db (2.9.1-1) ...
ubuntu@ubuntu:~/ncnn/build/benchmark$ sudo dkms install -m nvidia -v 460.67

Creating symlink /var/lib/dkms/nvidia/460.67/source ->
/usr/src/nvidia-460.67

DKMS: add completed.

Kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area...
'make' -j12 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.8.0-53-generic IGNORE_CC_MISMATCH='' modules.............
Signing module:
Generating a new Secure Boot signing key:
Can't load /var/lib/shim-signed/mok/.rnd into RNG
140140304328000:error:2406F079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:98:Filename=/var/lib/shim-signed/mok/.rnd
Generating a RSA private key
.+++++
......................................+++++
writing new private key to '/var/lib/shim-signed/mok/MOK.priv'
-----
- /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia.ko
- /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-drm.ko
- /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-uvm.ko
- /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-modeset.ko
Secure Boot not enabled on this system.
cleaning build area...

DKMS: build completed.

nvidia.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-uvm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-modeset.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-drm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

depmod...........

DKMS: install completed.
ubuntu@ubuntu:~/ncnn/build/benchmark$ nvidia-smi
Wed Jul 7 22:35:07 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8    N/A /  N/A |      0MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

If it still does not take effect, open the GRUB advanced options at boot and try each of the three listed kernel versions; use whichever one works.

(VIII) Installing DeepStream 5.1

See: 25. Jetson Xavier NX: comparing pt, onnx, engine and DeepStream acceleration for yolov5 on GPU (sxj731533730).

相关文章