Building a Prometheus + Grafana + Alertmanager monitoring platform on a Linux 7 system

Date: 2022-10-26 16:02:22
Tags: node opt Alertmanager linux7 -- Grafana prometheus alertmanager exporter

I. Environment preparation

1. Operating system

CentOS 7.9

2. Download the packages

Prometheus components: https://prometheus.io/download/

Grafana (official site): https://grafana.com/grafana/download

alertmanager-0.23.0.linux-amd64.tar.gz            consul_exporter-0.8.0.linux-amd64.tar.gz

node_exporter-1.3.1.linux-amd64.tar.gz            prometheus-2.33.5.linux-amd64.tar.gz

II. Installing and configuring Prometheus

tar zxf prometheus-2.33.5.linux-amd64.tar.gz -C /opt/
ln -sv /opt/prometheus-2.33.5.linux-amd64/ /opt/prometheus
groupadd prometheus
useradd -g prometheus -m -d /opt/prometheus/ -s /sbin/nologin prometheus
mkdir /opt/prometheus/data
chown -R prometheus:prometheus /opt/prometheus/*
cd /opt/prometheus
./promtool check config prometheus.yml # validate the configuration syntax
./prometheus --config.file=prometheus.yml # test run in the foreground; stop it with Ctrl+C

# Configure start at boot
touch /usr/lib/systemd/system/prometheus.service
chown prometheus:prometheus /usr/lib/systemd/system/prometheus.service

vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
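# --storage.tsdb.path is optional; by default data is stored in ./data under the working directory
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --web.enable-lifecycle --storage.tsdb.path=/opt/prometheus/data --storage.tsdb.retention.time=60d
Restart=on-failure

[Install]
WantedBy=multi-user.target

# Startup flags
--config.file -- path to the Prometheus configuration file
--web.enable-lifecycle -- enables the HTTP /-/reload and /-/quit endpoints, so configuration changes can be hot-reloaded
--storage.tsdb.path -- directory where the monitoring data is stored
--storage.tsdb.retention.time -- how long data is kept (replaces the deprecated --storage.tsdb.retention)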


systemctl daemon-reload
systemctl enable prometheus.service
systemctl restart prometheus.service
systemctl status prometheus.service


curl -X POST http://localhost:9090/-/reload # hot-reload the configuration after changing it
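Validating the file before asking for a reload surfaces syntax errors early. The following is a minimal sketch, assuming the install path /opt/prometheus and the listen address localhost:9090 used in this article; the function name safe_reload and the PROM_DIR, PROM_URL and PROMTOOL variables are hypothetical helpers, not part of Prometheus.

```shell
#!/bin/sh
# Hypothetical helper: validate prometheus.yml, then request a hot reload.
# PROM_DIR and PROM_URL default to the values used in this article.
PROM_DIR="${PROM_DIR:-/opt/prometheus}"
PROM_URL="${PROM_URL:-http://localhost:9090}"
PROMTOOL="${PROMTOOL:-$PROM_DIR/promtool}"

safe_reload() {
    # promtool exits non-zero when the configuration is invalid.
    if ! "$PROMTOOL" check config "$PROM_DIR/prometheus.yml"; then
        echo "config check failed; reload skipped" >&2
        return 1
    fi
    # /-/reload only answers when Prometheus runs with --web.enable-lifecycle.
    curl -fsS -X POST "$PROM_URL/-/reload" && echo "reload requested"
}
```

After editing prometheus.yml, source the script and call safe_reload instead of running the curl command directly.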

III. Installing and configuring node_exporter on the monitored hosts

1. Install and configure node_exporter

tar -zxf node_exporter-1.3.1.linux-amd64.tar.gz -C /usr/local/
ln -sv /usr/local/node_exporter-1.3.1.linux-amd64/ /usr/local/node_exporter
groupadd prometheus
useradd -g prometheus -m -d /usr/local/node_exporter/ -s /sbin/nologin prometheus

# Configure node_exporter to start at boot
touch /usr/lib/systemd/system/node_exporter.service
chown prometheus:prometheus /usr/lib/systemd/system/node_exporter.service
chown -R prometheus:prometheus /usr/local/node_exporter

vim /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target


systemctl daemon-reload
systemctl enable node_exporter.service
systemctl start node_exporter.service


Visit: http://192.168.142.132:9100/metrics
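The /metrics page is plain text with one sample per line, so it can also be checked from the command line. A minimal sketch, assuming the node_exporter address from this article; metric_value is a hypothetical helper and only handles metrics without labels, such as node_load1.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a label-less metric read from stdin.
# HELP/TYPE comment lines never match because their first field is '#'.
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Example call (requires a running node_exporter at this address):
# curl -s http://192.168.142.132:9100/metrics | metric_value node_load1
```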

2. Add node_exporter to prometheus.yml

vim /opt/prometheus/prometheus.yml
# my global config

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'Linux'
    static_configs:
      - targets: ['192.168.142.132:9100','192.168.142.134:9100']
        labels:
          group: 'client-node-exporter'



curl -X POST http://localhost:9090/-/reload # hot-reload the configuration
Targets page: http://192.168.142.132:9090/targets
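The information shown on the targets page is also available from the HTTP API at /api/v1/targets. A rough sketch, assuming the API returns compact JSON without spaces around the colon; count_down_targets is a hypothetical helper, and a JSON-aware tool such as jq would be more robust.

```shell
#!/bin/sh
# Hypothetical helper: count targets reported as down in the JSON read from
# stdin. This is a plain text match on "health":"down", not a JSON parser.
count_down_targets() {
    grep -o '"health":"down"' | wc -l | tr -d ' '
}

# Example call (requires a running Prometheus at this address):
# curl -s http://192.168.142.132:9090/api/v1/targets | count_down_targets
```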


IV. Grafana installation and web configuration

1. Install Grafana

wget https://dl.grafana.com/oss/release/grafana-8.4.3-1.x86_64.rpm
sudo yum install grafana-8.4.3-1.x86_64.rpm
systemctl daemon-reload
systemctl enable grafana-server.service
systemctl start grafana-server.service

Visit: http://192.168.142.132:3000/

2. Log in to the web page and add a data source (the default username/password is admin/admin)
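The data source can also be created without the web page, through Grafana's HTTP API (POST /api/datasources). A minimal sketch, assuming the Grafana and Prometheus addresses from this article and that the admin password is still admin; datasource_payload and add_datasource are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON body for Grafana's POST /api/datasources.
# $1 = data source name, $2 = Prometheus URL.
datasource_payload() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1" "$2"
}

# Hypothetical helper: create the data source using the default admin/admin login.
add_datasource() {
    datasource_payload "$1" "$2" |
        curl -fsS -u admin:admin -H 'Content-Type: application/json' \
            -X POST -d @- "http://192.168.142.132:3000/api/datasources"
}

# Example call (requires a running Grafana):
# add_datasource Prometheus http://192.168.142.132:9090
```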


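3. Import a dashboard

On the Grafana site https://grafana.com/grafana/dashboards/ find a dashboard you like and import it into your own Grafana, either by downloading its JSON file or by copying its ID (in an isolated network without Internet access, upload the JSON file). This guide imports the dashboard with ID 9276.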


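V. Installing and configuring the Alertmanager alerting component

For enabling and configuring the mailbox and WeCom (企业微信), see the article "zabbix5.0自定义web监控和邮箱告警,企业微信告警" (Zabbix 5.0 custom web monitoring with email and WeCom alerts).

1. Install and configure Alertmanager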

tar zxf alertmanager-0.23.0.linux-amd64.tar.gz  
mv alertmanager-0.23.0.linux-amd64 /opt/prometheus/alertmanager

vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/opt/prometheus/alertmanager/alertmanager --config.file=/opt/prometheus/alertmanager/alertmanager.yml --storage.path=/opt/prometheus/alertmanager/data
Restart=on-failure
[Install]
WantedBy=multi-user.target


vim /opt/prometheus/prometheus.yml
Find the Alertmanager-related settings and change them as follows:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.142.132:9093

rule_files:
  # - "first_rules.yml"
  - "rules.yml"


vim /opt/prometheus/rules.yml
groups:
  - name: hostStatsAlert
    rules:
      - alert: NodeDown
        expr: up == 0
        for: 1m
        labels:
          severity: "critical" # lowercase, to match the inhibit_rules in alertmanager.yml
        annotations:
          summary: "Instance {{$labels.instance}} down"
          description: "{{$labels.instance}} of job {{$labels.job}} has been down for more than 1 minute."
          value: "{{ $value }}" # read by the notification templates as the "value" annotation

      - alert: NodeCPUUsage
        expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
        for: 1m
        labels:
          severity: "warning"
        annotations:
          summary: "Instance {{ $labels.instance }} CPU usage high"
          description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
          value: "{{ $value }}"

      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
        for: 1m
        labels:
          severity: "warning"
        annotations:
          summary: "Instance {{ $labels.instance }} MEM usage high"
          description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
          value: "{{ $value }}"

      - alert: filesystemUsageAlert
        expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype=~"ext4|xfs"} * 100) / node_filesystem_size_bytes {mountpoint="/",fstype=~"ext4|xfs"}) > 85
        for: 1m
        labels:
          severity: "warning"
        annotations:
          summary: "Instance {{ $labels.instance }} root DISK usage high"
          description: "{{ $labels.instance }} root DISK usage above 85% (current value: {{ $value }})"
          value: "{{ $value }}"
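
The filesystemUsageAlert expression computes the used percentage as 100 - avail * 100 / size. Here is a worked sketch of the same arithmetic in shell, using made-up byte counts; usage_percent is a hypothetical helper, not a Prometheus tool.

```shell
#!/bin/sh
# Hypothetical helper mirroring the rule's arithmetic:
#   used % = 100 - (avail_bytes * 100 / size_bytes)
# $1 = available bytes, $2 = total size in bytes.
usage_percent() {
    awk -v avail="$1" -v size="$2" 'BEGIN { printf "%.1f\n", 100 - (avail * 100) / size }'
}

# Made-up example: 5 GiB available on a 50 GiB root filesystem.
usage_percent 5368709120 53687091200   # prints 90.0, above the rule's threshold of 85
```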


vim /opt/prometheus/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:465'
  smtp_from: '******@163.com'
  smtp_auth_username: '******@163.com'
  smtp_auth_password: 'VTKQYELFHUNAPLYC' # authorization code issued by the mail provider
  smtp_require_tls: false

templates:
  - '/opt/prometheus/alertmanager/template/*.tmpl'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: 'mail'

receivers:
  - name: 'mail'
    email_configs:
      - to: '*********@163.com' # your own mailbox
        html: '{{ template "test.html" . }}' # use the email template defined in testmail.tmpl
    wechat_configs: # WeCom (企业微信) alert settings
      - send_resolved: true
        to_party: '2' # ID of the receiving department
        agent_id: '1000002' # WeCom -> custom app -> AgentId
        corp_id: 'wwa073243b1766e26b' # company info (My Company -> CorpId, at the bottom)
        api_secret: '***************' # WeCom -> custom app -> Secret
        message: '{{ template "test_wechat.html" . }}' # message template to use

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']


vim /opt/prometheus/alertmanager/template/testmail.tmpl
{{ define "test.html" }}
<table border="1">
<tr>
<td>Alert</td>
<td>Instance</td>
<td>Alert value</td>
<td>Start time</td>
</tr>
{{ range $i, $alert := .Alerts }}
<tr>
<td>{{ index $alert.Labels "alertname" }}</td>
<td>{{ index $alert.Labels "instance" }}</td>
<td>{{ index $alert.Annotations "value" }}</td>
<td>{{ $alert.StartsAt }}</td>
</tr>
{{ end }}
</table>
{{ end }}

vim /opt/prometheus/alertmanager/template/testwechat.tmpl
{{ define "test_wechat.html" }}
{{ range $i, $alert := .Alerts.Firing }}
[Alert]: {{ index $alert.Labels "alertname" }}
[Instance]: {{ index $alert.Labels "instance" }}
[Alert value]: {{ index $alert.Annotations "value" }}
[Start time]: {{ $alert.StartsAt }}
{{ end }}
{{ end }}
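
Before starting the service, the rule file and the Alertmanager configuration can be checked with promtool and with amtool, which ships in the Alertmanager tarball. A minimal sketch, assuming the paths used in this article; preflight and the PROMTOOL and AMTOOL variables are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: validate the rule file and the Alertmanager config
# before starting or reloading anything. Paths follow this article's layout.
PROMTOOL="${PROMTOOL:-/opt/prometheus/promtool}"
AMTOOL="${AMTOOL:-/opt/prometheus/alertmanager/amtool}"

preflight() {
    # Both tools exit non-zero when the file they check is invalid.
    "$PROMTOOL" check rules /opt/prometheus/rules.yml || return 1
    "$AMTOOL" check-config /opt/prometheus/alertmanager/alertmanager.yml || return 1
    echo "rules and alertmanager config look valid"
}
```

Note that these checks cover syntax only; they do not send a notification, so a wrong mailbox or secret still has to be caught by a real test alert.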


2. Start the Alertmanager service

chown -R prometheus:prometheus /usr/lib/systemd/system/alertmanager.service
chown -R prometheus:prometheus /opt/prometheus/*
curl -X POST http://localhost:9090/-/reload
systemctl daemon-reload
systemctl enable alertmanager
systemctl start alertmanager

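3. Test email alerting

Disk usage above 90% triggers an alert (the filesystemUsageAlert rule fires once root disk usage exceeds 85%).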
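
Notifications can also be exercised without filling a disk, by posting a hand-made alert to Alertmanager's v2 API (POST /api/v2/alerts). A minimal sketch, assuming the Alertmanager address from this article; test_alert_payload, send_test_alert and the label values are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a one-alert JSON array for POST /api/v2/alerts.
# $1 = alertname, $2 = instance label.
test_alert_payload() {
    printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"warning"},"annotations":{"summary":"manual test alert"}}]\n' "$1" "$2"
}

# Hypothetical helper: post the test alert to the Alertmanager from this article.
send_test_alert() {
    test_alert_payload "$1" "$2" |
        curl -fsS -H 'Content-Type: application/json' \
            -X POST -d @- "http://192.168.142.132:9093/api/v2/alerts"
}

# Example call (requires a running Alertmanager):
# send_test_alert ManualTest test-host
```

With the route settings shown earlier, the notification should go out after the group_wait period of 30s.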

From: https://blog.51cto.com/qidian510/5798007
