
GPU monitoring with gpu-exporter and Prometheus

Date: 2023-03-15 16:25:12
Tags: exporter, HELP, bytes, prometheus, smi, nvidia, go, gpu, TYPE

This document sets up GPU monitoring using the open-source project nvidia_gpu_exporter.

I. Deploy nvidia_gpu_exporter on the GPU server

1. Download nvidia_gpu_exporter

wget https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v1.2.0/nvidia_gpu_exporter_1.2.0_linux_x86_64.tar.gz

2. Run nvidia_gpu_exporter

tar xf nvidia_gpu_exporter_1.2.0_linux_x86_64.tar.gz

mkdir -p /usr/local/gpu-exporter

mv nvidia_gpu_exporter /usr/local/gpu-exporter/nvidia_gpu_exporter

nohup /usr/local/gpu-exporter/nvidia_gpu_exporter >/var/log/gpu-exporter.log 2>&1 &

3. Check the monitoring data

Once started, the exporter listens on port 9835.
Run curl against the local host to confirm that monitoring data is being returned:

curl localhost:9835/metrics
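
The same check can be scripted. The sketch below is one possible approach, not part of the original write-up: it fetches the endpoint with Python's standard library and keeps only the samples whose names start with `nvidia_smi_`, which is where the GPU data appears in the exporter's output. The function names and the 5-second timeout are illustrative assumptions.

```python
"""Sketch: confirm that the exporter is serving GPU metrics."""
import urllib.request

# Listen address from the step above; adjust if the exporter runs elsewhere.
METRICS_URL = "http://localhost:9835/metrics"


def gpu_samples(metrics_text):
    """Return the sample lines whose metric name starts with nvidia_smi_."""
    return [
        line
        for line in metrics_text.splitlines()
        if line.startswith("nvidia_smi_")  # also skips '# HELP' / '# TYPE' lines
    ]


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Download the raw text exposition from the exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `gpu_samples(fetch_metrics())` on the GPU server should return a non-empty list. An empty list suggests the exporter is running but could not collect data from nvidia-smi.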

II. Add the exporter to Prometheus

Add the exporter address as a job under scrape_configs in prometheus.yml:

- job_name: gpu-exporter
  static_configs:
  - targets: ['192.168.2.23:9835']
    labels:
      gpu: nvidia-4090
      app: gpu-exporter
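
Once Prometheus has reloaded its configuration, the new target should report `up == 1`. As a hedged sketch (not from the original post), the snippet below builds an instant query against the Prometheus HTTP API endpoint `/api/v1/query` and extracts each instance's `up` value from the JSON response. The Prometheus address `http://localhost:9090` is an assumption (the default port), and the function names are illustrative.

```python
"""Sketch: check via the Prometheus HTTP API that the gpu-exporter target is up."""
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable on its default port on this host.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url, job):
    """Return the instant-query URL for up{job="<job>"}."""
    query = f'up{{job="{job}"}}'
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def parse_up(response_body):
    """Map each instance label to True/False from an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    return {
        item["metric"].get("instance", "?"): item["value"][1] == "1"
        for item in payload["data"]["result"]
    }


def target_status(job="gpu-exporter", base_url=PROMETHEUS_URL, timeout=5):
    """Query Prometheus and return {instance: is_up} for the given job."""
    url = build_query_url(base_url, job)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_up(resp.read().decode("utf-8"))
```

With the configuration above, `target_status()` would be expected to contain the key `192.168.2.23:9835`. A `False` value means Prometheus knows the target but cannot scrape it, and a missing key usually means the job was not loaded.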

III. Common metrics

Metric name                         Meaning
nvidia_smi_utilization_gpu_ratio    GPU utilization (ratio, 0 to 1)
nvidia_smi_memory_used_bytes        GPU memory used (bytes)
nvidia_smi_memory_total_bytes       GPU memory total (bytes)
nvidia_smi_temperature_gpu          GPU temperature (°C)
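
The two memory metrics are usually combined into a usage ratio. The following is a minimal sketch, assuming the text format that the exporter returns from /metrics: it parses `name{uuid="..."} value` lines with a simple regular expression (adequate for this exporter's per-GPU samples, not a full parser of the Prometheus exposition format) and divides used by total for each GPU. The helper names are hypothetical.

```python
"""Sketch: derive per-GPU memory usage from the exporter's text output."""
import re

# Matches lines such as: nvidia_smi_memory_used_bytes{uuid="..."} 7.06740224e+08
SAMPLE_RE = re.compile(r'^(\w+)\{uuid="([^"]+)"\}\s+(\S+)$')


def parse_gpu_samples(metrics_text):
    """Return {uuid: {metric_name: value}} for lines labelled only by uuid."""
    gpus = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            name, uuid, value = match.groups()
            gpus.setdefault(uuid, {})[name] = float(value)
    return gpus


def memory_usage_ratio(gpus):
    """Return {uuid: used / total} for GPUs that report both memory metrics."""
    ratios = {}
    for uuid, metrics in gpus.items():
        used = metrics.get("nvidia_smi_memory_used_bytes")
        total = metrics.get("nvidia_smi_memory_total_bytes")
        if used is not None and total:  # skip GPUs with missing or zero total
            ratios[uuid] = used / total
    return ratios
```

For example, 7.06740224e+08 bytes used out of 8.589934592e+09 bytes total gives a ratio of about 0.082, roughly 8% of GPU memory in use. Multiply a ratio by 100 when a percentage is needed.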

IV. Sample output of the /metrics endpoint

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 7
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.16.5"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.169224e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.169224e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.44498e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 273
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 4.110176e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.169224e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.397952e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 2.637824e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 6126
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.397952e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6617344e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 0
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 6399
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 9600
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 46240
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 49152
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.473924e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 885044
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 491520
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 491520
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.36146e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 8
# HELP nvidia_gpu_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which nvidia_gpu_exporter was built.
# TYPE nvidia_gpu_exporter_build_info gauge
nvidia_gpu_exporter_build_info{branch="",goversion="go1.16.5",revision="",version=""} 1
# HELP nvidia_smi_accounting_buffer_size accounting.buffer_size
# TYPE nvidia_smi_accounting_buffer_size gauge
nvidia_smi_accounting_buffer_size{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 4000
# HELP nvidia_smi_accounting_mode accounting.mode
# TYPE nvidia_smi_accounting_mode gauge
nvidia_smi_accounting_mode{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_clocks_current_graphics_clock_hz clocks.current.graphics [MHz]
# TYPE nvidia_smi_clocks_current_graphics_clock_hz gauge
nvidia_smi_clocks_current_graphics_clock_hz{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 6e+06
# HELP nvidia_smi_clocks_current_memory_clock_hz clocks.current.memory [MHz]
# TYPE nvidia_smi_clocks_current_memory_clock_hz gauge
nvidia_smi_clocks_current_memory_clock_hz{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 1.6e+07
# HELP nvidia_smi_clocks_current_sm_clock_hz clocks.current.sm [MHz]
# TYPE nvidia_smi_clocks_current_sm_clock_hz gauge
nvidia_smi_clocks_current_sm_clock_hz{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 6e+06
# HELP nvidia_smi_clocks_current_video_clock_hz clocks.current.video [MHz]
# TYPE nvidia_smi_clocks_current_video_clock_hz gauge
nvidia_smi_clocks_current_video_clock_hz{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 5.4e+08
# HELP nvidia_smi_clocks_max_graphics_clock_hz clocks.max.graphics [MHz]
# TYPE nvidia_smi_clocks_max_graphics_clock_hz gauge
nvidia_smi_clocks_max_graphics_clock_hz{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 2.28e+09
# HELP nvidia_smi_clocks_max_memory_clock_hz clocks.max.memory [MHz]
# TYPE nvidia_smi_clocks_max_memory_clock_hz gauge
nvidia_smi_clocks_max_memory_clock_hz{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 7.751e+09
# HELP nvidia_smi_clocks_max_sm_clock_hz clocks.max.sm [MHz]
# TYPE nvidia_smi_clocks_max_sm_clock_hz gauge
nvidia_smi_clocks_max_sm_clock_hz{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 2.28e+09
# HELP nvidia_smi_clocks_throttle_reasons_active clocks_throttle_reasons.active
# TYPE nvidia_smi_clocks_throttle_reasons_active gauge
nvidia_smi_clocks_throttle_reasons_active{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 4
# HELP nvidia_smi_clocks_throttle_reasons_applications_clocks_setting clocks_throttle_reasons.applications_clocks_setting
# TYPE nvidia_smi_clocks_throttle_reasons_applications_clocks_setting gauge
nvidia_smi_clocks_throttle_reasons_applications_clocks_setting{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_clocks_throttle_reasons_gpu_idle clocks_throttle_reasons.gpu_idle
# TYPE nvidia_smi_clocks_throttle_reasons_gpu_idle gauge
nvidia_smi_clocks_throttle_reasons_gpu_idle{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_clocks_throttle_reasons_hw_power_brake_slowdown clocks_throttle_reasons.hw_power_brake_slowdown
# TYPE nvidia_smi_clocks_throttle_reasons_hw_power_brake_slowdown gauge
nvidia_smi_clocks_throttle_reasons_hw_power_brake_slowdown{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_clocks_throttle_reasons_hw_slowdown clocks_throttle_reasons.hw_slowdown
# TYPE nvidia_smi_clocks_throttle_reasons_hw_slowdown gauge
nvidia_smi_clocks_throttle_reasons_hw_slowdown{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_clocks_throttle_reasons_hw_thermal_slowdown clocks_throttle_reasons.hw_thermal_slowdown
# TYPE nvidia_smi_clocks_throttle_reasons_hw_thermal_slowdown gauge
nvidia_smi_clocks_throttle_reasons_hw_thermal_slowdown{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_clocks_throttle_reasons_supported clocks_throttle_reasons.supported
# TYPE nvidia_smi_clocks_throttle_reasons_supported gauge
nvidia_smi_clocks_throttle_reasons_supported{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 511
# HELP nvidia_smi_clocks_throttle_reasons_sw_power_cap clocks_throttle_reasons.sw_power_cap
# TYPE nvidia_smi_clocks_throttle_reasons_sw_power_cap gauge
nvidia_smi_clocks_throttle_reasons_sw_power_cap{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 1
# HELP nvidia_smi_clocks_throttle_reasons_sw_thermal_slowdown clocks_throttle_reasons.sw_thermal_slowdown
# TYPE nvidia_smi_clocks_throttle_reasons_sw_thermal_slowdown gauge
nvidia_smi_clocks_throttle_reasons_sw_thermal_slowdown{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_clocks_throttle_reasons_sync_boost clocks_throttle_reasons.sync_boost
# TYPE nvidia_smi_clocks_throttle_reasons_sync_boost gauge
nvidia_smi_clocks_throttle_reasons_sync_boost{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_compute_mode compute_mode
# TYPE nvidia_smi_compute_mode gauge
nvidia_smi_compute_mode{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_count count
# TYPE nvidia_smi_count gauge
nvidia_smi_count{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 1
# HELP nvidia_smi_display_active display_active
# TYPE nvidia_smi_display_active gauge
nvidia_smi_display_active{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_display_mode display_mode
# TYPE nvidia_smi_display_mode gauge
nvidia_smi_display_mode{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 1
# HELP nvidia_smi_driver_version driver_version
# TYPE nvidia_smi_driver_version gauge
nvidia_smi_driver_version{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 471.11
# HELP nvidia_smi_encoder_stats_average_fps encoder.stats.averageFps
# TYPE nvidia_smi_encoder_stats_average_fps gauge
nvidia_smi_encoder_stats_average_fps{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_encoder_stats_average_latency encoder.stats.averageLatency
# TYPE nvidia_smi_encoder_stats_average_latency gauge
nvidia_smi_encoder_stats_average_latency{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_encoder_stats_session_count encoder.stats.sessionCount
# TYPE nvidia_smi_encoder_stats_session_count gauge
nvidia_smi_encoder_stats_session_count{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_enforced_power_limit_watts enforced.power.limit [W]
# TYPE nvidia_smi_enforced_power_limit_watts gauge
nvidia_smi_enforced_power_limit_watts{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 250
# HELP nvidia_smi_fan_speed_ratio fan.speed [%]
# TYPE nvidia_smi_fan_speed_ratio gauge
nvidia_smi_fan_speed_ratio{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0.38
# HELP nvidia_smi_gpu_info A metric with a constant '1' value labeled by gpu uuid, name, driver_model_current, driver_model_pending, vbios_version, driver_version.
# TYPE nvidia_smi_gpu_info gauge
nvidia_smi_gpu_info{driver_model_current="WDDM",driver_model_pending="WDDM",driver_version="471.11",name="NVIDIA GeForce RTX 2080 SUPER",uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa",vbios_version="90.04.7a.40.73"} 1
# HELP nvidia_smi_index index
# TYPE nvidia_smi_index gauge
nvidia_smi_index{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_inforom_oem inforom.oem
# TYPE nvidia_smi_inforom_oem gauge
nvidia_smi_inforom_oem{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 1.1
# HELP nvidia_smi_memory_free_bytes memory.free [MiB]
# TYPE nvidia_smi_memory_free_bytes gauge
nvidia_smi_memory_free_bytes{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 7.883194368e+09
# HELP nvidia_smi_memory_total_bytes memory.total [MiB]
# TYPE nvidia_smi_memory_total_bytes gauge
nvidia_smi_memory_total_bytes{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 8.589934592e+09
# HELP nvidia_smi_memory_used_bytes memory.used [MiB]
# TYPE nvidia_smi_memory_used_bytes gauge
nvidia_smi_memory_used_bytes{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 7.06740224e+08
# HELP nvidia_smi_name name
# TYPE nvidia_smi_name gauge
nvidia_smi_name{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 2080
# HELP nvidia_smi_pci_bus pci.bus
# TYPE nvidia_smi_pci_bus gauge
nvidia_smi_pci_bus{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 12
# HELP nvidia_smi_pci_device pci.device
# TYPE nvidia_smi_pci_device gauge
nvidia_smi_pci_device{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_pci_device_id pci.device_id
# TYPE nvidia_smi_pci_device_id gauge
nvidia_smi_pci_device_id{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 7809
# HELP nvidia_smi_pci_domain pci.domain
# TYPE nvidia_smi_pci_domain gauge
nvidia_smi_pci_domain{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_pci_sub_device_id pci.sub_device_id
# TYPE nvidia_smi_pci_sub_device_id gauge
nvidia_smi_pci_sub_device_id{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 1.074074712e+09
# HELP nvidia_smi_pcie_link_gen_current pcie.link.gen.current
# TYPE nvidia_smi_pcie_link_gen_current gauge
nvidia_smi_pcie_link_gen_current{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 3
# HELP nvidia_smi_pcie_link_gen_max pcie.link.gen.max
# TYPE nvidia_smi_pcie_link_gen_max gauge
nvidia_smi_pcie_link_gen_max{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 3
# HELP nvidia_smi_pcie_link_width_current pcie.link.width.current
# TYPE nvidia_smi_pcie_link_width_current gauge
nvidia_smi_pcie_link_width_current{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 16
# HELP nvidia_smi_pcie_link_width_max pcie.link.width.max
# TYPE nvidia_smi_pcie_link_width_max gauge
nvidia_smi_pcie_link_width_max{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 16
# HELP nvidia_smi_power_default_limit_watts power.default_limit [W]
# TYPE nvidia_smi_power_default_limit_watts gauge
nvidia_smi_power_default_limit_watts{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 250
# HELP nvidia_smi_power_draw_watts power.draw [W]
# TYPE nvidia_smi_power_draw_watts gauge
nvidia_smi_power_draw_watts{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 28.07
# HELP nvidia_smi_power_limit_watts power.limit [W]
# TYPE nvidia_smi_power_limit_watts gauge
nvidia_smi_power_limit_watts{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 250
# HELP nvidia_smi_power_management power.management
# TYPE nvidia_smi_power_management gauge
nvidia_smi_power_management{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 1
# HELP nvidia_smi_power_max_limit_watts power.max_limit [W]
# TYPE nvidia_smi_power_max_limit_watts gauge
nvidia_smi_power_max_limit_watts{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 350
# HELP nvidia_smi_power_min_limit_watts power.min_limit [W]
# TYPE nvidia_smi_power_min_limit_watts gauge
nvidia_smi_power_min_limit_watts{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 105
# HELP nvidia_smi_pstate pstate
# TYPE nvidia_smi_pstate gauge
nvidia_smi_pstate{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 8
# HELP nvidia_smi_temperature_gpu temperature.gpu
# TYPE nvidia_smi_temperature_gpu gauge
nvidia_smi_temperature_gpu{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 34
# HELP nvidia_smi_utilization_gpu_ratio utilization.gpu [%]
# TYPE nvidia_smi_utilization_gpu_ratio gauge
nvidia_smi_utilization_gpu_ratio{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP nvidia_smi_utilization_memory_ratio utilization.memory [%]
# TYPE nvidia_smi_utilization_memory_ratio gauge
nvidia_smi_utilization_memory_ratio{uuid="df6e7a7c-7314-46f8-abc4-b88b36dcf3aa"} 0
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

From: https://www.cnblogs.com/c-moon/p/17218977.html
