Adding hardware-level monitoring to Prometheus with ipmi_exporter

Date: 2023-06-26 09:33:24
Tags: exporter, rules, ipmi, labels, job, prometheus, yml

Monitoring hardware with Prometheus

Install ipmitool and FreeIPMI, then load the IPMI kernel modules

yum install ipmitool freeipmi  -y
modprobe ipmi_msghandler
modprobe ipmi_devintf
modprobe ipmi_poweroff
modprobe ipmi_si
modprobe ipmi_watchdog
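
Before going further it can help to confirm that the modules actually loaded. The following is a minimal sketch, not part of the original steps: the helper name missing_ipmi_modules is hypothetical, and the list of three modules is an assumption about which ones are needed for /dev/ipmi0 to appear. Modules compiled into the kernel do not show up in lsmod, so a reported "missing" module is a hint rather than proof.

```shell
#!/bin/sh
# Sketch: report which IPMI modules are absent from lsmod-style input on stdin.
# The module list is an assumption (the core, device, and system-interface drivers).
missing_ipmi_modules() {
    # Keep the first column (module name) and skip the lsmod header line.
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in ipmi_msghandler ipmi_devintf ipmi_si; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
}

# On the monitored host:
#   lsmod | missing_ipmi_modules   # no output means all three are loaded
#   ls -l /dev/ipmi0               # device node used for local IPMI access
```

An empty result together with an existing /dev/ipmi0 suggests the local IPMI interface is ready.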

Download the ipmi_exporter release tarball (a prebuilt binary, not source code)

wget https://github.com/soundcloud/ipmi_exporter/releases/download/v1.0.0/ipmi_exporter-v1.0.0.linux-amd64.tar.gz  
tar -xf ipmi_exporter-v1.0.0.linux-amd64.tar.gz   -C /opt/
cd /opt/ipmi_exporter-v1.0.0.linux-amd64/

Add a configuration file

cat ipmi_remote.yml
modules:
  10.193.x.x:                 # BMC (remote management card) IP address
    user: "root"              # BMC user
    pass: "xxxxxxxxxxxxx"     # BMC password
    # Available collectors are bmc, ipmi, chassis, and dcmi
    collectors:
    - bmc
    - ipmi
    - dcmi
    - chassis
    # Got any sensors you don't care about? Add them here.
    exclude_sensor_ids:
    - 2
    - 29
    - 32

Start ipmi_exporter

./ipmi_exporter --config.file=/opt/ipmi_exporter-v1.0.0.linux-amd64/ipmi_remote.yml --web.listen-address=:19293 &
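
Once the exporter is running, a quick look at its output shows whether each collector works. The sketch below assumes the exporter exposes an ipmi_up metric with a collector label (0 meaning that collector failed); the helper name failed_collectors is hypothetical.

```shell
#!/bin/sh
# Sketch: list collectors whose ipmi_up sample is 0 in Prometheus
# text-format input read from stdin.
failed_collectors() {
    awk '/^ipmi_up[{]/ && $NF == 0 {
        # Extract the value of the collector="..." label.
        if (match($0, /collector="[^"]*"/))
            print substr($0, RSTART + 11, RLENGTH - 12)
    }'
}

# On the exporter host (port taken from the start command above):
#   curl -s http://localhost:19293/metrics | failed_collectors
```

No output means every collector that reported an ipmi_up sample succeeded.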

Add the job configuration on the Prometheus server

# Add the ipmi_exporter alerting rule files
rule_files:
  - "rules/Memory_hardware.yml"
  - "rules/power.yml"
  - "rules/fan.yml"
  - "rules/processor.yml"
  - "rules/harddisk.yml"

# Add the job to the main configuration file
# cat /usr/local/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'ipmi_exporter'
    file_sd_configs:
    - refresh_interval: 5s
      files:
      - ./conf.d/ipmi_exporter.json
# cat /usr/local/prometheus/conf.d/ipmi_exporter.json
[
  {
    "targets": ["10.65.x.x:19293"],
    "labels": {
      "hostname": "lgy-storage-glusterxxx"
    }
  }
]
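
Because the job uses file-based service discovery, adding a host only means writing a new target file in the same shape. The sketch below generates one; the function name file_sd_entry, the address 192.0.2.10:19293, and the hostname example-host are placeholders chosen for illustration, not values from this setup.

```shell
#!/bin/sh
# Sketch: emit a file_sd target list as JSON.
#   $1 = exporter address (host:port), $2 = value for the "hostname" label
file_sd_entry() {
    printf '[\n  {\n    "targets": ["%s"],\n    "labels": {\n      "hostname": "%s"\n    }\n  }\n]\n' "$1" "$2"
}

# Example with placeholder values:
#   file_sd_entry 192.0.2.10:19293 example-host > /usr/local/prometheus/conf.d/ipmi_exporter.json
# The main configuration can then be checked with promtool:
#   promtool check config /usr/local/prometheus/prometheus.yml
```

Prometheus re-reads the listed files on its own, so no restart should be needed after the JSON file changes.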

Add the rule files

# cd /usr/local/prometheus/rules
# cat Memory_hardware.yml  (memory module monitoring)
groups:
- name: Memory_hardware
  rules:
  - alert: Memory_hardware error
    expr: ipmi_sensor_state{type="Memory"} == 1
    for: 3m
    labels:
      user: caizh
    annotations:
      summary: "Instance {{ $labels.instance }} memory hardware warning"
      description: "{{ $labels.instance }} of job {{ $labels.job }}: memory hardware warning, current state [{{ $value }}]."



# cat power.yml  (server power supply monitoring)
groups:
- name: power status
  rules:
  - alert: power bad
    expr: ipmi_sensor_state{name="Status",type="Power Supply"} == 1
    for: 3m
    labels:
      user: caizh
    annotations:
      summary: "Instance {{ $labels.instance }} power supply failed"
      description: "{{ $labels.instance }} of job {{ $labels.job }}: power supply failed, current state [{{ $value }}]."


# cat fan.yml  (server fan monitoring)
groups:
- name: fan status
  rules:
  - alert: speed fan bad
    expr: ipmi_fan_speed_state == 1
    for: 3m
    labels:
      user: caizh
    annotations:
      summary: "Instance {{ $labels.instance }} fan failed"
      description: "{{ $labels.instance }} of job {{ $labels.job }}: fan failed, current state [{{ $value }}]."


# cat processor.yml  (server processor monitoring)
groups:
- name: Processor
  rules:
  - alert: Processor hardware error
    expr: ipmi_sensor_state{name="Status",type="Processor"} == 1
    for: 3m
    labels:
      user: caizh
    annotations:
      summary: "Instance {{ $labels.instance }} processor hardware warning"


# cat harddisk.yml  (hard disk monitoring, mainly of RAID groups; the system disks and data disks are in separate RAID groups, so two entries are reported)
groups:
- name: harddisk
  rules:
  - alert: hard disk bad
    expr: ipmi_sensor_state{type="Drive Slot"} == 1
    for: 3m
    labels:
      user: caizh
    annotations:
      summary: "Instance {{ $labels.instance }} hard disk failed"
      description: "{{ $labels.instance }} of job {{ $labels.job }}: hard disk failed, current state [{{ $value }}]."
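
All of the rules above fire when a state metric equals 1. As far as I understand the ipmi_exporter documentation, its state metrics use 0 for nominal, 1 for warning, and 2 for critical, so a comparison with == 1 would not match a sensor that reports critical; > 0 is one possible alternative. The sketch below only spells out that assumed mapping for reading the [{{ $value }}] field in a notification; the function name ipmi_state_name is hypothetical.

```shell
#!/bin/sh
# Sketch: translate an ipmi_exporter state value into a readable name.
# The 0/1/2 mapping is an assumption based on the exporter's documentation.
ipmi_state_name() {
    case "$1" in
        0) echo nominal ;;
        1) echo warning ;;
        2) echo critical ;;
        *) echo "n/a" ;;   # NaN or anything unexpected
    esac
}

# Example:
#   ipmi_state_name 1
# The rule files themselves can be checked before reloading Prometheus:
#   promtool check rules /usr/local/prometheus/rules/*.yml
```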
 

From: https://www.cnblogs.com/cheyunhua/p/17504513.html
