
50. Ubuntu 18.04 & 20.04 + CUDA 11.1 + cuDNN 8.2 + TensorRT 7.2 + DeepStream 5.1 + Vulkan environment setup and YOLOv5 deployment



The basic idea: I want to learn how to use TensorRT, so I'm jotting down notes as I go.

Link: https://pan.baidu.com/s/1uFOktdF-bHcDDsufIqmNSA
Extraction code: k55w

For the record, the pip install command (using the Aliyun mirror):

pip install **** -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
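To make the mirror the default for every pip call, it can also go into pip's config file; a minimal sketch (on Linux, pip reads ~/.config/pip/pip.conf, which is the standard location rather than anything set up in this post):

mkdir -p ~/.config/pip
cat > ~/.config/pip/pip.conf <<'EOF'
[global]
index-url = http://mirrors.aliyun.com/pypi/simple
trusted-host = mirrors.aliyun.com
EOF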

I. Install the graphics driver (the card here is an RTX 2080)

Note: first set Secure Boot to Disabled.

Avoid installing via sudo apt-get install nvidia-*, which can leave you stuck in a login-screen loop.

1. On Ubuntu 18.04, before installing the NVIDIA driver you first need to disable nouveau.

ubuntu@ubuntu:~$ sudo vim /etc/modprobe.d/blacklist.conf

Insert the following two lines at the end of the file:

blacklist nouveau
options nouveau modeset=0

Regenerate the initramfs:

ubuntu@ubuntu:~$ sudo update-initramfs -u

Reboot, then verify that nouveau is disabled:

ubuntu@ubuntu:~$ lsmod | grep nouveau

If nothing is printed, nouveau has been disabled; reboot once more.

2. Look up your card's model on NVIDIA's site and download the matching driver: Official GeForce Drivers | NVIDIA

The version I downloaded: NVIDIA-Linux-x86_64-460.67.run

Copy the downloaded .run file to your home directory.

3. Press Ctrl+Alt+F1 to switch to a text console,

then run:

ubuntu@ubuntu:~$ sudo apt-get install lightdm

ubuntu@ubuntu:~$ sudo service lightdm stop

4. Remove any existing driver:

ubuntu@ubuntu:~$ sudo apt-get remove nvidia-*   

Make the .run file executable:

ubuntu@ubuntu:~$ sudo chmod  a+x NVIDIA-Linux-x86_64-460.67.run

Run the installer (skipping the X check, the nouveau check, and the OpenGL files):

ubuntu@ubuntu:~$ sudo ./NVIDIA-Linux-x86_64-460.67.run -no-x-check -no-nouveau-check -no-opengl-files

Check whether the driver installed successfully:

ubuntu@ubuntu:~$ nvidia-smi

If output like the following appears, the installation succeeded:

Mon Mar 22 13:00:57 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0 Off |                  N/A |
| 22%   43C    P0    53W / 250W |      0MiB / 11016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:03:00.0 Off |                  N/A |
| 19%   42C    P0    62W / 250W |      0MiB / 11019MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:82:00.0 Off |                  N/A |
|  0%   40C    P0     1W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

One more thing to set up: disable kernel updates, to avoid the driver breaking later.

Disable automatic kernel updates from the command line:

ubuntu@ubuntu:~$ cat /etc/apt/apt.conf.d/10periodic

APT::Periodic::Update-Package-Lists "1";

APT::Periodic::Download-Upgradeable-Packages "0";

APT::Periodic::AutocleanInterval "0";

In this config, set the "Update-Package-Lists" parameter to "0".
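Alternatively, the kernel packages can be pinned so apt never upgrades them; a minimal sketch, where the package names are an assumption that depends on which kernel flavor is installed (check with uname -r):

sudo apt-mark hold linux-image-generic linux-headers-generic linux-generic
# undo later with: sudo apt-mark unhold linux-image-generic linux-headers-generic linux-generic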

II. Next, download the CUDA 11.1 runfile and run it:

ubuntu@ubuntu:~$ sudo ./cuda_11.1.0_455.23.05_linux.run

The installer options, in order: accept the EULA, then on the component screen deselect the Driver entry and install CUDA 11.1 without the bundled driver (460.67 is already installed). When it finishes, the summary is shown:


===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.1/
Samples: Installed in /home/ubuntu/

Please make sure that
- PATH includes /usr/local/cuda-11.1/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.1/lib64, or, add /usr/local/cuda-11.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.1/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 455.00 is required for CUDA 11.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log

Then add the entries to your shell config:

ubuntu@ubuntu:~$ sudo gedit ~/.bashrc

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/lib64
export PATH=$PATH:/usr/local/cuda-11.1/bin
export CUDA_HOME=/usr/local/cuda-11.1

ubuntu@ubuntu:~$ source ~/.bashrc

Check the version:

nvcc -V

PyTorch (the CUDA 11.1 build):

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
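A quick way to confirm the wheel actually sees CUDA (torch.version.cuda and torch.cuda.is_available() are standard PyTorch API):

python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"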

III. Download cuDNN from https://developer.nvidia.com/rdp/cudnn-download

Unpack the archive (this is cuDNN 8.2.0, the cuda-11.3 package) and copy the files over:

ubuntu@ubuntu:~$ tar -zxvf cudnn-11.3-linux-x64-v8.2.0.53.tgz 
ubuntu@ubuntu:~$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
ubuntu@ubuntu:~$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
ubuntu@ubuntu:~$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Have a look:

ubuntu@ubuntu:~$ cat /usr/local/cuda/include/cudnn.h | grep cudnn
/* cudnn : Neural Networks Library
#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"
#include "cudnn_backend.h"

IV. Symlinks (use the block matching your cuDNN version, 8.1.0 or 8.2.0):

sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8

or

sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8
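The same links can be generated with a short loop instead of typing each one out; a minimal sketch, assuming the cuDNN 8.2.0 files are already in that directory:

CUDNN_DIR=/usr/local/cuda-11.1/targets/x86_64-linux/lib
for f in "$CUDNN_DIR"/libcudnn*.so.8.2.0; do
    sudo ln -sf "$f" "${f%.so.8.2.0}.so.8"   # strip the full version suffix, link the .so.8 name
done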

V. Install TensorRT from https://developer.nvidia.com/nvidia-tensorrt-7x-download

ubuntu@ubuntu:~$ tar xzvf TensorRT-7.2.2.3.Ubuntu-18.04.x86_64-gnu.cuda-11.1.cudnn8.0.tar.gz -C /home/ubuntu/NVIDIA_CUDA-11.1_Samples
ubuntu@ubuntu:~$ sudo vim ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib
ubuntu@ubuntu:~$ source ~/.bashrc
ubuntu@ubuntu:~$ cd NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3
ubuntu@ubuntu:~$ cd python
ubuntu@ubuntu:~$ sudo pip3 install tensorrt-7.2.2.3-cp37-none-linux_x86_64.whl
ubuntu@ubuntu:~$ cd ../uff
ubuntu@ubuntu:~$ sudo pip3 install uff-0.6.5-py2.py3-none-any.whl
ubuntu@ubuntu:~$ cd ../graphsurgeon
ubuntu@ubuntu:~$ sudo pip3 install graphsurgeon-0.4.1-py2.py3-none-any.whl
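A quick check that the Python bindings landed:

python3 -c "import tensorrt; print(tensorrt.__version__)"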

Copy the headers and libraries:

# from inside the TensorRT directory
sudo cp -r ./lib/* /usr/lib
sudo cp -r ./include/* /usr/include

If code later fails at runtime (Python or C++) because of a missing .so, copy the needed library from this TensorRT directory into /usr/lib, for example:

ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvinfer.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvonnxparser.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvinfer_plugin.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvparsers.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libmyelin.so.1 /usr/lib/

The final .bashrc configuration:

export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib:$LD_LIBRARY_PATH
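An alternative to LD_LIBRARY_PATH that also covers binaries run under sudo (sudo strips LD_LIBRARY_PATH) is registering the directory with the dynamic loader; a sketch, assuming the same TensorRT path as above:

echo "/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib" | sudo tee /etc/ld.so.conf.d/tensorrt.conf
sudo ldconfig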

Check the version info:

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
ubuntu@ubuntu:~$ sudo ldconfig /usr/local/cuda/lib64
ubuntu@ubuntu:~$ python3
Python 3.8.6 (default, Sep 25 2020, 09:36:53)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> from torch.backends import cudnn
>>> print(cudnn.is_available())
True
>>> import tensorrt
>>>

Next comes the YOLOv5 code (section VII below). But first, install Vulkan as well, for ncnn's Vulkan acceleration later:

wget -qO - http://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-1.1.97-bionic.list http://packages.lunarg.com/vulkan/1.1.97/lunarg-vulkan-1.1.97-bionic.list
sudo apt update
sudo apt install lunarg-vulkan-sdk
sudo apt-get install cmake git gcc g++ mesa-* libwayland-dev libxrandr-dev
sudo apt-get install libvulkan1 mesa-vulkan-drivers vulkan-utils
vulkaninfo

Test it:

ubuntu@ubuntu:~/ncnn/build/benchmark$ vulkaninfo
ERROR: [Loader Message] Code 0 : libGLX_nvidia.so.0: cannot open shared object file: No such file or directory
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_radeon.so: wrong ELF class: ELFCLASS32
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_intel.so: wrong ELF class: ELFCLASS32
INTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0

==========
VULKANINFO
==========

Vulkan Instance Version: 1.2.131


Instance Extensions: count = 18
====================
VK_EXT_acquire_xlib_display : extension revision 1
VK_EXT_debug_report : extension revision 9
VK_EXT_debug_utils : extension revision 1
VK_EXT_direct_mode_display : extension revision 1
VK_EXT_display_surface_counter : extension revision 1
VK_KHR_device_group_creation : extension revision 1
VK_KHR_display : extension revision 23
VK_KHR_external_fence_capabilities : extension revision 1
VK_KHR_external_memory_capabilities : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2 : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 1
VK_KHR_get_surface_capabilities2 : extension revision 1
VK_KHR_surface : extension revision 25
VK_KHR_surface_protected_capabilities : extension revision 1
VK_KHR_wayland_surface : extension revision 6
VK_KHR_xcb_surface : extension revision 6
VK_KHR_xlib_surface : extension revision 6

Layers: count = 12
=======
....
---------------------------------
samplerMirrorClampToEdge = true
drawIndirectCount = true
storageBuffer8BitAccess = true
uniformAndStorageBuffer8BitAccess = true
storagePushConstant8 = true
shaderBufferInt64Atomics = true
shaderSharedInt64Atomics = false
shaderFloat16 = true
shaderInt8 = true
descriptorIndexing = true
shaderInputAttachmentArrayDynamicIndexing = false
shaderUniformTexelBufferArrayDynamicIndexing = true
shaderStorageTexelBufferArrayDynamicIndexing = true
shaderUniformBufferArrayNonUniformIndexing = false
shaderSampledImageArrayNonUniformIndexing = true
shaderStorageBufferArrayNonUniformIndexing = true
shaderStorageImageArrayNonUniformIndexing = true
shaderInputAttachmentArrayNonUniformIndexing = false
shaderUniformTexelBufferArrayNonUniformIndexing = true
shaderStorageTexelBufferArrayNonUniformIndexing = true
descriptorBindingUniformBufferUpdateAfterBind = false
descriptorBindingSampledImageUpdateAfterBind = true
descriptorBindingStorageImageUpdateAfterBind = true
descriptorBindingStorageBufferUpdateAfterBind = true
descriptorBindingUniformTexelBufferUpdateAfterBind = true
descriptorBindingStorageTexelBufferUpdateAfterBind = true
descriptorBindingUpdateUnusedWhilePending = true
descriptorBindingPartiallyBound = true
descriptorBindingVariableDescriptorCount = false
runtimeDescriptorArray = true
samplerFilterMinmax = true
scalarBlockLayout = true
imagelessFramebuffer = true
uniformBufferStandardLayout = true
shaderSubgroupExtendedTypes = true
separateDepthStencilLayouts = true
hostQueryReset = true
timelineSemaphore = true
bufferDeviceAddress = true
bufferDeviceAddressCaptureReplay = true
bufferDeviceAddressMultiDevice = false
vulkanMemoryModel = true
vulkanMemoryModelDeviceScope = true
vulkanMemoryModelAvailabilityVisibilityChains = true
shaderOutputViewportIndex = true
shaderOutputLayer = true
subgroupBroadcastDynamicId = true

VkPhysicalDeviceVulkanMemoryModelFeatures:
------------------------------------------
vulkanMemoryModel = true
vulkanMemoryModelDeviceScope = true
vulkanMemoryModelAvailabilityVisibilityChains = true

VkPhysicalDeviceYcbcrImageArraysFeaturesEXT:
--------------------------------------------
ycbcrImageArrays = true


ubuntu@ubuntu:~/ncnn/build/benchmark$ vkcube
INTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0

(Screenshot of the vkcube test result omitted.)

It seems ncnn wants Vulkan supplied via the SDK. I downloaded vulkansdk-linux-x86_64-1.2.182.0 from https://vulkan.lunarg.com/sdk/home, unpacked it, and put it under /usr/local:

ubuntu@ubuntu:~$ sudo cp -r  vulkansdk-linux-x86_64-1.2.182.0/ /usr/local/
[sudo] password for ubuntu:
ubuntu@ubuntu:~$ cd /usr/local/

Add the environment variables (remember to source the file afterwards):

export Vulkan_LIBRARY=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/lib
export Vulkan_INCLUDE_DIR=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/include
export Vulkan_BIN=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/bin
export PATH=$PATH:$Vulkan_LIBRARY
export PATH=$PATH:$Vulkan_INCLUDE_DIR
export PATH=$PATH:$Vulkan_BIN
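With the SDK in place, ncnn can be built with Vulkan turned on. A minimal sketch, assuming ncnn is cloned at ~/ncnn; NCNN_VULKAN is ncnn's CMake switch, and VULKAN_SDK is the environment variable its build looks for:

export VULKAN_SDK=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64
cd ~/ncnn && mkdir -p build && cd build
cmake -DNCNN_VULKAN=ON ..
make -j$(nproc)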

(VII) Download wang-xinyu's tensorrtx code and YOLOv5

ubuntu@ubuntu:~$ git clone https://github.com/wang-xinyu/tensorrtx.git
ubuntu@ubuntu:~$ git clone https://github.com/ultralytics/yolov5.git
ubuntu@ubuntu:~$ cp tensorrtx/yolov5/gen_wts.py yolov5

Then modify the script's contents:

import torch
import struct
from utils.torch_utils import select_device

# Initialize
device = select_device('cpu')
# Load model
model = torch.load('/home/ubuntu/yolov5/runs/train/exp/weights/best.pt', map_location=device)['model'].float()  # load to FP32
model.to(device).eval()

f = open('/home/ubuntu/yolov5/runs/train/exp/weights/bestyolov5x.wts', 'w')
f.write('{}\n'.format(len(model.state_dict().keys())))
for k, v in model.state_dict().items():
    vr = v.reshape(-1).cpu().numpy()
    f.write('{} {} '.format(k, len(vr)))
    for vv in vr:
        f.write(' ')
        f.write(struct.pack('>f', float(vv)).hex())
    f.write('\n')
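A quick sanity check on the generated file: in the .wts format the first line is the tensor count, so it should be exactly one less than the total number of lines (paths assume the training run above):

head -n 1 /home/ubuntu/yolov5/runs/train/exp/weights/bestyolov5x.wts
wc -l /home/ubuntu/yolov5/runs/train/exp/weights/bestyolov5x.wts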

Fixing a problem that came up (Python's lzma module):

ubuntu@ubuntu:~/yolov5$ sudo apt-get install liblzma-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
liblzma-dev is already the newest version (5.2.2-1.3).
0 upgraded, 0 newly installed, 0 to remove and 190 not upgraded.
ubuntu@ubuntu:~/yolov5$ sudo pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple backports.lzma
ubuntu@ubuntu:~/yolov5$ sudo vim /usr/local/lib/python3.7/lzma.py
The original code:
from _lzma import *
from _lzma import _encode_filter_properties, _decode_filter_properties
Change it to:

try:
    from _lzma import *
    from _lzma import _encode_filter_properties, _decode_filter_properties
except ImportError:
    from backports.lzma import *
    from backports.lzma import _encode_filter_properties, _decode_filter_properties

Then it runs through successfully:

ubuntu@ubuntu:~/yolov5$ python3 gen_wts.py 
ubuntu@ubuntu:~/yolov5$ ls runs/train/exp/weights/
best.pt bestyolov5x.wts last.pt
ubuntu@ubuntu:~/yolov5$ cp runs/train/exp/weights/bestyolov5x.wts ../tensorrtx/yolov5

Convert the model:

ubuntu@ubuntu:~$ mkdir tensorrtx/yolov5/build
ubuntu@ubuntu:~$ cd tensorrtx/yolov5/build
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ cp /home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/include/* .
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ cmake ..
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ make
[ 20%] Linking CXX shared library libmyplugins.so
/usr/bin/ld: cannot find -lnvinfer
collect2: error: ld returned 1 exit status
CMakeFiles/myplugins.dir/build.make:341: recipe for target 'libmyplugins.so' failed
make[2]: *** [libmyplugins.so] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/myplugins.dir/all' failed
make[1]: *** [CMakeFiles/myplugins.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libnvinfer.so /usr/local/lib/
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ make
[ 20%] Linking CXX shared library libmyplugins.so
[ 40%] Built target myplugins
Scanning dependencies of target yolov5
[ 60%] Building CXX object CMakeFiles/yolov5.dir/calibrator.cpp.o
[ 80%] Building CXX object CMakeFiles/yolov5.dir/yolov5.cpp.o
[100%] Linking CXX executable yolov5
[100%] Built target yolov5
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s
./yolov5: error while loading shared libraries: libnvinfer.so.7: cannot open shared object file: No such file or directory
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libnvinfer.so.7 /usr/local/lib/
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s
./yolov5: error while loading shared libraries: libmyelin.so.1: cannot open shared object file: No such file or directory
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libmyelin.so.1 /usr/local/lib/
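After copying libraries into /usr/local/lib, refresh the loader cache so they are found even without LD_LIBRARY_PATH (which sudo strips; on stock Ubuntu /usr/local/lib is already on the loader's search path):

sudo ldconfig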

Converting the model fails:

ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s ../bestyolov5x.wts ../bestyolov5.eigine x
Loading weights: ../bestyolov5x.wts
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
Building engine, please wait for a while...
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] Could not compute dimensions for (Unnamed Layer* 475) [Convolution]_output, because the network is not valid.
[03/23/2021-09:24:48] [E] [TRT] Network validation failed.
Build engine successfully!
yolov5: /home/ubuntu/tensorrtx/yolov5/yolov5.cpp:143: void APIToModel(unsigned int, nvinfer1::IHostMemory**, float&, float&, std::__cxx11::string&): Assertion `engine != nullptr' failed.
Aborted

Fix the class count:

In yololayer.h, change static constexpr int CLASS_NUM = 2; (the default is 80; the weight-count errors above come from this mismatch, since 3 × (2 + 5) × 320 = 6720).

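The same edit as a one-liner, assuming the default value of 80 is still in place:

sed -i 's/static constexpr int CLASS_NUM = 80;/static constexpr int CLASS_NUM = 2;/' ~/tensorrtx/yolov5/yololayer.h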
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s ../bestyolov5x.wts ../bestyolov5.eigine x
Loading weights: ../bestyolov5x.wts
Building engine, please wait for a while...
[03/23/2021-09:32:55] [W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[03/23/2021-09:33:03] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[03/23/2021-09:34:10] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Build engine successfully!

Compare PyTorch and TensorRT speed:

ubuntu@ubuntu:~/yolov5$ python3 detect.py --source /home/ubuntu/Downloads/video/defect/20210201090237.avi --weights runs/train/exp/weights/best.pt --device "0"

Video length: 1 minute 11 seconds;

PyTorch processing time: 166.625 s

ubuntu@ubuntu:~/yolov5$ python3 detect.py --source /home/ubuntu/Downloads/video/defect/20210201090749.avi --weights runs/train/exp/weights/best.pt --device "0"

Video length: 5 seconds;

PyTorch processing time: 12.689 s

ubuntu@ubuntu:~/tensorrtx/yolov5/build$ ./yolov5 -v ../bestyolov5.eigine /home/ubuntu/Downloads/video/defect/20210201090237.avi

Video length: 1 minute 11 seconds;

TensorRT processing time: 137200 ms

ubuntu@ubuntu:~/tensorrtx/yolov5/build$ ./yolov5 -v ../bestyolov5.eigine /home/ubuntu/Downloads/video/defect/20210201090749.avi

Video length: 5 seconds;

TensorRT processing time: 12832 ms
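So on the 71-second video TensorRT comes out at roughly 166.6 s / 137.2 s ≈ 1.2× faster end to end, while on the 5-second clip (12.7 s vs 12.8 s) the two are essentially tied; there, decoding, drawing, and display presumably dominate rather than inference.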

Appendix: the modified /home/ubuntu/tensorrtx/yolov5/yolov5.cpp that reads a video file; the change is simple:

#include <iostream>
#include <chrono>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
// we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;
const char *INPUT_BLOB_NAME = "data";
const char *OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

static int get_width(int x, float gw, int divisor = 8) {
    // return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    } else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}

ICudaEngine *build_engine(unsigned int maxBatchSize, IBuilder *builder, IBuilderConfig *config, DataType dt,
                          float &gd, float &gw, std::string &wts_name) {
    INetworkDefinition *network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor *data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{3, INPUT_H, INPUT_W});
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);
    Weights emptywts{DataType::kFLOAT, nullptr, 0};

    /* ------ yolov5 backbone ------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw),
                              get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw),
                              get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw),
                              get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13,
                    "model.8");

    /* ------ yolov5 head ------ */
    auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw),
                              get_depth(3, gd), false, 1, 0.5, "model.9");
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1,
                            "model.10");

    // all-ones weights: a grouped 2x2 deconvolution with these acts as nearest-neighbour upsampling
    float *deval = reinterpret_cast<float *>(malloc(sizeof(float) * get_width(512, gw) * 2 * 2));
    for (int i = 0; i < get_width(512, gw) * 2 * 2; i++) {
        deval[i] = 1.0;
    }
    Weights deconvwts11{DataType::kFLOAT, deval, get_width(512, gw) * 2 * 2};
    IDeconvolutionLayer *deconv11 = network->addDeconvolutionNd(*conv10->getOutput(0), get_width(512, gw),
                                                                DimsHW{2, 2}, deconvwts11, emptywts);
    deconv11->setStrideNd(DimsHW{2, 2});
    deconv11->setNbGroups(get_width(512, gw));
    weightMap["deconv11"] = deconvwts11;

    ITensor *inputTensors12[] = {deconv11->getOutput(0), bottleneck_csp6->getOutput(0)};
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.13");
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1,
                            "model.14");

    Weights deconvwts15{DataType::kFLOAT, deval, get_width(256, gw) * 2 * 2};
    IDeconvolutionLayer *deconv15 = network->addDeconvolutionNd(*conv14->getOutput(0), get_width(256, gw),
                                                                DimsHW{2, 2}, deconvwts15, emptywts);
    deconv15->setStrideNd(DimsHW{2, 2});
    deconv15->setNbGroups(get_width(256, gw));
    ITensor *inputTensors16[] = {deconv15->getOutput(0), bottleneck_csp4->getOutput(0)};
    auto cat16 = network->addConcatenation(inputTensors16, 2);

    auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.17");

    // yolo layer 0
    IConvolutionLayer *det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),
                                                        DimsHW{1, 1}, weightMap["model.24.m.0.weight"],
                                                        weightMap["model.24.m.0.bias"]);
    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1,
                            "model.18");
    ITensor *inputTensors19[] = {conv18->getOutput(0), conv14->getOutput(0)};
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.20");
    // yolo layer 1
    IConvolutionLayer *det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),
                                                        DimsHW{1, 1}, weightMap["model.24.m.1.weight"],
                                                        weightMap["model.24.m.1.bias"]);
    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1,
                            "model.21");
    ITensor *inputTensors22[] = {conv21->getOutput(0), conv10->getOutput(0)};
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.23");
    IConvolutionLayer *det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),
                                                        DimsHW{1, 1}, weightMap["model.24.m.2.weight"],
                                                        weightMap["model.24.m.2.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, det0, det1, det2);
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2 *calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine *engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto &mem : weightMap) {
        free((void *) (mem.second.values));
    }

    return engine;
}

void APIToModel(unsigned int maxBatchSize, IHostMemory **modelStream, float &gd, float &gw, std::string &wts_name) {
    // Create builder
    IBuilder *builder = createInferBuilder(gLogger);
    IBuilderConfig *config = builder->createBuilderConfig();

    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine *engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();
}

void doInference(IExecutionContext &context, cudaStream_t &stream, void **buffers, float *input, float *output,
                 int batchSize) {
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float),
                               cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost,
                               stream));
    cudaStreamSynchronize(stream);
}

bool parse_args(int argc, char **argv, std::string &wts, std::string &engine, float &gd, float &gw,
                std::string &img_dir, std::string &video_path) {
    if (argc < 4) return false;
    if (std::string(argv[1]) == "-s" && (argc == 5 || argc == 7)) {
        wts = std::string(argv[2]);
        engine = std::string(argv[3]);
        auto net = std::string(argv[4]);
        if (net == "s") {
            gd = 0.33;
            gw = 0.50;
        } else if (net == "m") {
            gd = 0.67;
            gw = 0.75;
        } else if (net == "l") {
            gd = 1.0;
            gw = 1.0;
        } else if (net == "x") {
            gd = 1.33;
            gw = 1.25;
        } else if (net == "c" && argc == 7) {
            gd = atof(argv[5]);
            gw = atof(argv[6]);
        } else {
            return false;
        }
    } else if (std::string(argv[1]) == "-d" && argc == 4) {
        engine = std::string(argv[2]);
        img_dir = std::string(argv[3]);
    } else if (std::string(argv[1]) == "-v" && argc == 4) {
        engine = std::string(argv[2]);
        video_path = std::string(argv[3]);
    } else {
        return false;
    }
    return true;
}

int main(int argc, char **argv) {
    cudaSetDevice(DEVICE);

    std::string wts_name = "";
    std::string engine_name = "";
    float gd = 0.0f, gw = 0.0f;
    std::string img_dir;
    std::string video_path = "";
    if (!parse_args(argc, argv, wts_name, engine_name, gd, gw, img_dir, video_path)) {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./yolov5 -s [.wts] [.engine] [s/m/l/x or c gd gw]  // serialize model to plan file" << std::endl;
        std::cerr << "./yolov5 -d [.engine] ../samples  // deserialize plan file and run inference" << std::endl;
        std::cerr << "./yolov5 -v [.engine] [.mp4]  // deserialize plan file and run inference" << std::endl;  // sxj731533730
        return -1;
    }

    // create a model using the API directly and serialize it to a stream
    if (!wts_name.empty()) {
        IHostMemory *modelStream{nullptr};
        APIToModel(BATCH_SIZE, &modelStream, gd, gw, wts_name);
        assert(modelStream != nullptr);
        std::ofstream p(engine_name, std::ios::binary);
        if (!p) {
            std::cerr << "could not open plan output file" << std::endl;
            return -1;
        }
        p.write(reinterpret_cast<const char *>(modelStream->data()), modelStream->size());
        modelStream->destroy();
        return 0;
    }

    // deserialize the .engine and run inference
    std::ifstream file(engine_name, std::ios::binary);
    if (!file.good()) {
        std::cerr << "read " << engine_name << " error!" << std::endl;
        return -1;
    }
    char *trtModelStream = nullptr;
    size_t size = 0;
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream = new char[size];
    assert(trtModelStream);
    file.read(trtModelStream, size);
    file.close();

    std::vector<std::string> file_names;
    if (read_files_in_dir(img_dir.c_str(), file_names) < 0 && video_path.empty()) {
        std::cerr << "read_files_in_dir failed." << std::endl;
        return -1;
    }

    // prepare input data ---------------------------
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    //for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    //    data[i] = 1.0;
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime *runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine *engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    IExecutionContext *context = engine->createExecutionContext();
    assert(context != nullptr);
    delete[] trtModelStream;
    assert(engine->getNbBindings() == 2);
    void *buffers[2];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    assert(inputIndex == 0);
    assert(outputIndex == 1);
    // Create GPU buffers on device
    CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    if (!video_path.empty()) {
        cv::Mat frame;
        std::cout << video_path << std::endl;
        cv::VideoCapture capture(video_path);

        if (!capture.isOpened()) {
            printf("could not read this video file...\n");
            return -1;
        }
        int type = static_cast<int>(capture.get(cv::CAP_PROP_FOURCC));
        cv::Size S = cv::Size((int) capture.get(cv::CAP_PROP_FRAME_WIDTH), (int) capture.get(cv::CAP_PROP_FRAME_HEIGHT));
        int fps = capture.get(cv::CAP_PROP_FPS);
        printf("video FPS: %d \n", fps);

        cv::VideoWriter out("/home/ubuntu/yolov5/runs/detect/tensorRTbest/20210201090237.mp4", type, fps, S, true);

        auto Tstart = std::chrono::system_clock::now();
        while (true) {
            capture >> frame;  // read the current frame
            if (frame.empty()) {  // end of stream
                break;
            }
            cv::Mat pr_img = preprocess_img(frame, INPUT_W, INPUT_H);  // letterbox BGR to RGB
            int i = 0;
            for (int row = 0; row < INPUT_H; ++row) {
                uchar *uc_pixel = pr_img.data + row * pr_img.step;
                for (int col = 0; col < INPUT_W; ++col) {
                    data[0 * 3 * INPUT_H * INPUT_W + i] = (float) uc_pixel[2] / 255.0;
                    data[0 * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float) uc_pixel[1] / 255.0;
                    data[0 * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float) uc_pixel[0] / 255.0;
                    uc_pixel += 3;
                    ++i;
                }
            }

            // Run inference
            auto start = std::chrono::system_clock::now();
            doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
            auto end = std::chrono::system_clock::now();
            std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms"
                      << std::endl;
            std::vector<std::vector<Yolo::Detection>> batch_res(1);

            auto &res = batch_res[0];
            nms(res, &prob[0 * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
            //std::cout << res.size() << std::endl;
            for (size_t j = 0; j < res.size(); j++) {
                cv::Rect r = get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                cv::putText(frame, std::to_string((int) res[j].class_id), cv::Point(r.x, r.y - 1),
                            cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
            }
            cv::imshow("demo", frame);
            out << frame;
            if (cv::waitKey(20) == 'q')  // wait 20 ms for a key press; quit if 'q' is pressed
                break;
        }
        auto Tend = std::chrono::system_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(Tend - Tstart).count() << "ms"
                  << std::endl;
        out.release();
        capture.release();  // release the video source
    } else {
        int fcount = 0;
        for (int f = 0; f < (int) file_names.size(); f++) {
            fcount++;
            if (fcount < BATCH_SIZE && f + 1 != (int) file_names.size()) continue;
            for (int b = 0; b < fcount; b++) {
                cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
                if (img.empty()) continue;
                cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H);  // letterbox BGR to RGB
                int i = 0;
                for (int row = 0; row < INPUT_H; ++row) {
                    uchar *uc_pixel = pr_img.data + row * pr_img.step;
                    for (int col = 0; col < INPUT_W; ++col) {
                        data[b * 3 * INPUT_H * INPUT_W + i] = (float) uc_pixel[2] / 255.0;
                        data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float) uc_pixel[1] / 255.0;
                        data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float) uc_pixel[0] / 255.0;
                        uc_pixel += 3;
                        ++i;
                    }
                }
            }

            // Run inference
            auto start = std::chrono::system_clock::now();
            doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
            auto end = std::chrono::system_clock::now();
            std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms"
                      << std::endl;
            std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
            for (int b = 0; b < fcount; b++) {
                auto &res = batch_res[b];
                nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
            }
            for (int b = 0; b < fcount; b++) {
                auto &res = batch_res[b];
                //std::cout << res.size() << std::endl;
                cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
                for (size_t j = 0; j < res.size(); j++) {
                    cv::Rect r = get_rect(img, res[j].bbox);
                    cv::rectangle(img, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                    cv::putText(img, std::to_string((int) res[j].class_id), cv::Point(r.x, r.y - 1),
                                cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
                }
                cv::imwrite("_" + file_names[f - fcount + 1 + b], img);
            }
            fcount = 0;
        }
    }
    // Release stream and buffers
    cudaStreamDestroy(stream);
    CUDA_CHECK(cudaFree(buffers[inputIndex]));
    CUDA_CHECK(cudaFree(buffers[outputIndex]));
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();

    // Print histogram of the output distribution
    //std::cout << "\nOutput:\n\n";
    //for (unsigned int i = 0; i < OUTPUT_SIZE; i++)
    //{
    //    std::cout << prob[i] << ", ";
    //    if (i % 10 == 0) std::cout << std::endl;
    //}
    //std::cout << std::endl;

    return 0;
}

An occasional problem: cuDNN fails to initialize. Installing this build fixes it:

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

If one day you run into the following problem:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Solution: first disable automatic kernel updates as described above, then repair the driver module as follows:

ubuntu@ubuntu:~/ncnn/build/benchmark$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

ubuntu@ubuntu:~/ncnn/build/benchmark$ sudo apt-get install dkms
[sudo] password for ubuntu:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
linux-headers-5.8.0-50-generic linux-hwe-5.8-headers-5.8.0-50 linux-image-5.8.0-50-generic linux-modules-5.8.0-50-generic
linux-modules-extra-5.8.0-50-generic
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
dctrl-tools
Suggested packages:
debtags menu
The following NEW packages will be installed:
dctrl-tools dkms
0 upgraded, 2 newly installed, 0 to remove and 173 not upgraded.
Need to get 128 kB of archives.
After this operation, 599 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://mirrors.aliyun.com/ubuntu focal/main amd64 dctrl-tools amd64 2.24-3 [61.5 kB]
Get:2 http://mirrors.aliyun.com/ubuntu focal-updates/main amd64 dkms all 2.8.1-5ubuntu2 [66.8 kB]
Fetched 128 kB in 1s (157 kB/s)
Selecting previously unselected package dctrl-tools.
(Reading database ... 288405 files and directories currently installed.)
Preparing to unpack .../dctrl-tools_2.24-3_amd64.deb ...
Unpacking dctrl-tools (2.24-3) ...
Selecting previously unselected package dkms.
Preparing to unpack .../dkms_2.8.1-5ubuntu2_all.deb ...
Unpacking dkms (2.8.1-5ubuntu2) ...
Setting up dctrl-tools (2.24-3) ...
Setting up dkms (2.8.1-5ubuntu2) ...
Processing triggers for man-db (2.9.1-1) ...
ubuntu@ubuntu:~/ncnn/build/benchmark$ sudo dkms install -m nvidia -v 460.67

Creating symlink /var/lib/dkms/nvidia/460.67/source ->
/usr/src/nvidia-460.67

DKMS: add completed.

Kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area...
'make' -j12 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.8.0-53-generic IGNORE_CC_MISMATCH='' modules.............
Signing module:
Generating a new Secure Boot signing key:
Can't load /var/lib/shim-signed/mok/.rnd into RNG
140140304328000:error:2406F079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:98:Filename=/var/lib/shim-signed/mok/.rnd
Generating a RSA private key
.+++++
......................................+++++
writing new private key to '/var/lib/shim-signed/mok/MOK.priv'
-----
- /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia.ko
- /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-drm.ko
- /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-uvm.ko
- /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-modeset.ko
Secure Boot not enabled on this system.
cleaning build area...

DKMS: build completed.

nvidia.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-uvm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-modeset.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-drm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

depmod...........

DKMS: install completed.
ubuntu@ubuntu:~/ncnn/build/benchmark$ nvidia-smi
Wed Jul 7 22:35:07 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8    N/A /  N/A |      0MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

If it still does not take effect, open the GRUB advanced options at boot and try each of the three listed kernel versions; use whichever one works.

(VIII) Installing DeepStream 5.1

See: 25. Jetson Xavier NX: comparing pt, onnx, engine and DeepStream acceleration for yolov5 on GPU (sxj731533730).

相关文章