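Prometheus: the core is a single standalone binary. It uses a pull model, ships with a built-in time series database (TSDB), provides a powerful query language (PromQL), supports visualization, and is open.
It uses a dimensional storage model and is comparable to an OLAP system.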
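1. Storage and compute layer
> Prometheus Server: contains the storage engine and the compute engine
> Retrieval: the data-fetching component; it actively pulls data from the Pushgateway or from Exporters
> Service discovery: dynamically discovers the targets to monitor
> TSDB: the core of data storage and querying
> HTTP server: exposes the HTTP service to the outside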
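The HTTP server listed above is what clients talk to when they run PromQL. As a minimal sketch (not part of the original setup), the snippet below builds an instant-query URL for the `/api/v1/query` endpoint and flattens the JSON response. The address in `PROMETHEUS_URL` is an assumption borrowed from the `prometheus.yml` used later in this post, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable at this address (taken from the
# prometheus.yml used later in this post); adjust for your environment.
PROMETHEUS_URL = "http://192.168.188.2:9090"


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_vector(payload):
    """Turn an instant-vector response into {instance: float value}."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error"))
    result = {}
    for sample in payload["data"]["result"]:
        # Each sample looks like:
        # {"metric": {labels...}, "value": [timestamp, "value as string"]}
        instance = sample["metric"].get("instance", "")
        result[instance] = float(sample["value"][1])
    return result


def query(promql, base_url=PROMETHEUS_URL):
    """Run an instant query against a live server (needs network access)."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_instant_vector(json.load(resp))
```

Against a running server, `query("up")` would return one entry per scraped instance, with `1.0` for targets that are up and `0.0` for targets that are down.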
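2. Collection layer
The collection layer handles two kinds of jobs: short-lived jobs and long-lived jobs.
> Short-lived jobs: push their metrics to the Pushgateway directly through its API when they exit
> Long-lived jobs: the Retrieval component pulls data directly from the job or from an Exporter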
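To make the short-lived job path concrete, here is a rough sketch of a batch job pushing its final metrics to a Pushgateway before exiting. The post itself does not start a Pushgateway, so `PUSHGATEWAY_URL`, the metric names, and the helper functions are all illustrative assumptions; only the `/metrics/job/<job>/instance/<instance>` URL shape and the text exposition body follow the Pushgateway's documented API.

```python
import urllib.parse
import urllib.request

# Assumption: a Pushgateway listens here. The post does not start one,
# so this address is purely illustrative (9091 is its default port).
PUSHGATEWAY_URL = "http://192.168.188.2:9091"


def render_metrics(metrics):
    """Render {name: value} as Prometheus text exposition format (gauges)."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append("# TYPE %s gauge" % name)
        lines.append("%s %s" % (name, value))
    # The request body must end with a newline.
    return "\n".join(lines) + "\n"


def push_url(base_url, job, instance):
    """Build the grouping-key URL: /metrics/job/<job>/instance/<instance>.

    Label values containing "/" need the Pushgateway's base64 variant,
    which this sketch does not handle.
    """
    return "%s/metrics/job/%s/instance/%s" % (
        base_url.rstrip("/"),
        urllib.parse.quote(job, safe=""),
        urllib.parse.quote(instance, safe=""),
    )


def push(metrics, job, instance, base_url=PUSHGATEWAY_URL):
    """PUT the metrics, replacing earlier ones in the same group (needs network)."""
    req = urllib.request.Request(
        push_url(base_url, job, instance),
        data=render_metrics(metrics).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

A job would call something like `push({"batch_duration_seconds": 12.5}, "nightly_backup", "db01")` as its last step; Prometheus then scrapes the Pushgateway like any other target.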
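3. Application layer
The application layer has two main parts: AlertManager and data visualization.
> AlertManager: integrates with PagerDuty (a paid monitoring and alerting system) and can notify by SMS, phone call, or email
> Data visualization: the Prometheus built-in Web UI, Grafana, and other clients built on the API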
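For receivers that are plain webhooks (such as the DingTalk bridge started later in this post), AlertManager sends a JSON document with an `alerts` list, where each alert carries `status`, `labels`, and `annotations`. The sketch below shows one way a custom receiver might turn that payload into readable lines; the function name and the output format are assumptions for illustration, not something the original setup contains.

```python
def summarize_alerts(payload):
    """Return one human-readable line per alert in a webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append("[%s] %s (%s): %s" % (
            alert.get("status", "unknown").upper(),   # "firing" or "resolved"
            labels.get("alertname", "?"),             # set from the rule's alert name
            labels.get("severity", "n/a"),            # custom label from the rule file
            annotations.get("description", "").strip(),
        ))
    return lines
```

Missing fields fall back to placeholders rather than raising, since rule authors decide which labels and annotations exist.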
I. Hands-on: installing Prometheus and Grafana with Docker
1. Common environment setup
Install Docker, then disable the firewall and SELinux.
2. Pull the required images
docker pull prom/prometheus
docker pull prom/node-exporter
docker pull prom/alertmanager
docker pull grafana/grafana
3. Start the components
Start prometheus-webhook-dingtalk
docker run -d -p 8060:8060 -v /data/prom/config.yml:/etc/prometheus-webhook-dingtalk/config.yml --name alertdingtalk timonwong/prometheus-webhook-dingtalk
alertmanager.yml
Start alertmanager (currently stopped)
docker run -d -p 9093:9093 -p 9094:9094 -v /data/prom/alertmanager.yml:/etc/alertmanager/alertmanager.yml --name alertmanager prom/alertmanager
prometheus.yml
global:
  scrape_interval: 15s      # Scrape targets every 15 seconds. The default is every 1 minute.
  evaluation_interval: 15s  # Evaluate rules every 15 seconds. The default is every 1 minute.
alerting:                   # Address of the Alertmanager component
  alertmanagers:
    - static_configs:
        - targets: ['192.168.188.2:9093']
rule_files:                 # Alerting rule files
  - "*rules.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.188.2:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.188.3:9100']
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['192.168.188.2:9093']
alert-rules.yml
groups:
  - name: example                # rule group
    rules:
      - alert: InstanceDown      # alert name
        expr: up == 0            # PromQL expression that triggers the alert
        for: 1m                  # must hold for one minute
        labels:                  # labels carrying the alert level and host
          name: instance
          severity: Critical
        annotations:
          summary: " {{ $labels.appname }}"             # alert summary, taken from the appname label
          description: " Service has stopped running "  # alert message
          value: "{{ $value }}%"                        # current value
  - name: Host
    rules:
      - alert: HostMemory Usage
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
        for: 1m
        labels:
          name: Memory
          severity: Warning
        annotations:
          summary: " {{ $labels.appname }} "
          description: "Host memory usage is above 80%."
          value: "{{ $value }}"
      - alert: HostCPU Usage
        expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance,appname) > 0.65
        for: 1m
        labels:
          name: CPU
          severity: Warning
        annotations:
          summary: " {{ $labels.appname }} "
          description: "Host CPU usage is above 65%."
          value: "{{ $value }}"
      - alert: HostLoad
        expr: node_load5 > 4
        for: 1m
        labels:
          name: Load
          severity: Warning
        annotations:
          summary: "{{ $labels.appname }} "
          description: " Host 5-minute load is above 4."
          value: "{{ $value }}"
      - alert: HostFilesystem Usage
        expr: 1-(node_filesystem_free_bytes / node_filesystem_size_bytes) > 0.8
        for: 1m
        labels:
          name: Disk
          severity: Warning
        annotations:
          summary: " {{ $labels.appname }} "
          description: " Host partition [ {{ $labels.mountpoint }} ] usage is above 80%."
          value: "{{ $value }}%"
      - alert: HostDiskio
        # Note: prometheus.yml above defines no job named "Host"; adjust this matcher
        # (for example to the 'node' job) or the rule will never match.
        expr: irate(node_disk_writes_completed_total{job=~"Host"}[1m]) > 10
        for: 1m
        labels:
          name: Diskio
          severity: Warning
        annotations:
          summary: " {{ $labels.appname }} "
          description: " Host disk [{{ $labels.device }}] has a high write IO load over 1 minute."
          value: "{{ $value }}iops"
      - alert: Network_receive
        # bytes per second divided by 1048576 gives MiB/s
        expr: irate(node_network_receive_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*|virbr.*|ovs-system"}[5m]) / 1048576 > 3
        for: 1m
        labels:
          name: Network_receive
          severity: Warning
        annotations:
          summary: " {{ $labels.appname }} "
          description: " Host NIC [{{ $labels.device }}] receive traffic over 5 minutes is above 3 MiB/s."
          value: "{{ $value }} MiB/s"
      - alert: Network_transmit
        expr: irate(node_network_transmit_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*|virbr.*|ovs-system"}[5m]) / 1048576 > 3
        for: 1m
        labels:
          name: Network_transmit
          severity: Warning
        annotations:
          summary: " {{ $labels.appname }} "
          description: " Host NIC [{{ $labels.device }}] transmit traffic over 5 minutes is above 3 MiB/s."
          value: "{{ $value }} MiB/s"
  - name: Container
    rules:
      - alert: ContainerCPU Usage
        expr: (sum by(name,instance) (rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100) > 60
        for: 1m
        labels:
          name: CPU
          severity: Warning
        annotations:
          summary: "{{ $labels.name }} "
          description: " Container CPU usage is above 60%."
          value: "{{ $value }}%"
      - alert: ContainerMem Usage
        # bytes divided by 1048576 gives MiB, so 1024 here means 1 GiB
        expr: container_memory_usage_bytes{name=~".+"} / 1048576 > 1024
        for: 1m
        labels:
          name: Memory
          severity: Warning
        annotations:
          summary: "{{ $labels.name }} "
          description: " Container memory usage is above 1GB."
          value: "{{ $value }} MiB"
  - name: node_usage_record_rules
    interval: 2m
    rules:
      - record: cpu:usage:rate1m
        expr: (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) by (job,instance)) * 100
      - record: mem:usage:rate1m
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
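The threshold expressions in the rule file are plain arithmetic over metric values, so it can help to work one through by hand. The sketch below mirrors the HostMemory and HostFilesystem expressions in Python with made-up sample numbers (the function names and the inputs are illustrative assumptions, not measurements).

```python
def host_memory_used_percent(total, free, buffers, cached):
    """Mirror the HostMemory rule: (total - (free + buffers + cached)) / total * 100."""
    return (total - (free + buffers + cached)) / total * 100


def filesystem_used_ratio(free, size):
    """Mirror the HostFilesystem rule: 1 - free / size."""
    return 1 - free / size


GIB = 1024 ** 3

# A 16 GiB host with 1 GiB free, 1 GiB buffers, and 2 GiB cached.
mem = host_memory_used_percent(16 * GIB, 1 * GIB, 1 * GIB, 2 * GIB)
print(mem, mem > 80)      # 75.0 False -> below the 80% threshold, no alert

# An 80 GiB partition with 10 GiB free.
disk = filesystem_used_ratio(10 * GIB, 80 * GIB)
print(disk, disk > 0.8)   # 0.875 True -> above the 0.8 threshold, alert pending
```

Note that the memory rule works in percent (0 to 100) while the filesystem rule works in a ratio (0 to 1), which is why their thresholds are `80` and `0.8` respectively. In both cases the condition must also hold for the rule's `for: 1m` duration before the alert fires.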
Start Prometheus
docker run -d -p 9090:9090 \
-v /data/prom/prometheus.yml:/etc/prometheus/prometheus.yml \
-v /data/prom/alert-rules.yml:/etc/prometheus/alert-rules.yml \
-v /data/prom/data:/prometheus --name prometheus prom/prometheus:latest
Start Grafana
docker run -d -p 3000:3000 -v /data/prom/grafana:/var/lib/grafana --name=grafana grafana/grafana:latest
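Start node-exporter
# node-exporter needs to see the real host's hardware information, so running it in Docker is not recommended; the binary install further down is preferred.
# Minimal container (it only sees the container's own view of the system):
docker run -d -p 9100:9100 --name node-exporter prom/node-exporter:latest
# Alternative with host networking and the host filesystems mounted read-only. Run one or the other, since both use the name node-exporter. -p is dropped because it has no effect with --net=host, and the --path.* flags point the exporter at the mounted host paths:
docker run -d --net=host -v "/proc:/host/proc:ro" -v "/sys:/host/sys:ro" -v "/:/rootfs:ro" --name node-exporter prom/node-exporter:latest --path.procfs=/host/proc --path.sysfs=/host/sys --path.rootfs=/rootfs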
Client download page: https://github.com/prometheus/node_exporter/releases
Pick the linux-amd64 build, then download and extract it:
# Download
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
# Extract
tar -zxvf node_exporter-1.5.0.linux-amd64.tar.gz
# Rename
mv node_exporter-1.5.0.linux-amd64 node_exporter
How to start it (run from inside the node_exporter directory):
# Without keeping logs
nohup ./node_exporter >/dev/null 2>&1 &
# With logs saved to /var/log/node_exporter.log
nohup ./node_exporter >/var/log/node_exporter.log 2>&1 &
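Once node_exporter is running, it serves plain-text metrics on port 9100 under `/metrics`, which is what Prometheus scrapes. As a quick sanity check, a small parser like the sketch below can read that output. The `NODE_EXPORTER_URL` value and both helper names are assumptions for illustration, and the parser only covers the simple `name{labels} value` lines that node_exporter emits (no timestamps).

```python
import urllib.request

# Assumption: node_exporter runs on this host with its default port.
NODE_EXPORTER_URL = "http://localhost:9100/metrics"


def parse_metrics(text):
    """Parse 'name{labels} value' lines; skip comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# HELP" and "# TYPE" lines carry metadata only
        # Without timestamps, the value is the last space-separated field.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url=NODE_EXPORTER_URL):
    """Download and parse the metrics page (needs a running exporter)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With the exporter up, `fetch_metrics().get("node_load5")` should return the same value that the `HostLoad` rule compares against its threshold.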
On the monitored host: install the Node_Exporter client
# Download
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
# Extract
tar -zxvf node_exporter-1.5.0.linux-amd64.tar.gz
# Rename
mv node_exporter-1.5.0.linux-amd64 node_exporter
Reference link:
https://it.cha138.com/mysql/show-99068.html
From: https://www.cnblogs.com/xq0422/p/17111080.html