This post covers the Prometheus alerting rule configuration. Actually delivering alerts requires Alertmanager, which is covered in the next article.
Rule reference
https://samber.github.io/awesome-prometheus-alerts/rules
Rule file contents
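groups:
  - name: exceptionRule
    rules:
      - alert: exceptionAlert
        # Fires while the value stays below 10 for 1m; flip to '>' if the
        # intent is to alert on a high error count.
        expr: application_exception{application="userDemo"} < 10
        for: 1m
        labels:
          severity: warning
          team: frontend
        annotations:
          summary: "Server is reporting errors frequently"
          description: "Error frequency has reached (current value: {{ $value }}%)"
  - name: ckExceptionRule
    rules:
      - alert: ckExceptionAlert
        # Per-business_name increase of the request counter over the last 5 minutes
        expr: sum(increase(bbc_request_timer_ID_seconds_count{}[5m])) by (business_name) > 10
        for: 2m
        labels:
          severity: warning
          app: "gateway"
        annotations:
          summary: "test system: service exceptions in the last 5 minutes"
          description: "Error frequency has reached (current value: {{ $value }})"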
Validate the rule files
./promtool check rules first_rules.yml
./promtool check rules jvm-exporter.yml
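When there are several rule files, the two checks above can be wrapped in a small loop. The following is only a sketch: the function name check_rules and the PROMTOOL override variable are made up for illustration, and it assumes promtool sits at ./promtool as in the commands above.

```shell
#!/bin/sh
# PROMTOOL is an assumed override hook; it defaults to the ./promtool path used above.
PROMTOOL="${PROMTOOL:-./promtool}"

# check_rules FILE...  (hypothetical helper)
# Runs "promtool check rules" on every file and returns non-zero
# if at least one of them fails validation.
check_rules() {
    rc=0
    for f in "$@"; do
        # promtool exits non-zero when a rule file is invalid
        if ! "$PROMTOOL" check rules "$f"; then
            echo "invalid rule file: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: check_rules first_rules.yml jvm-exporter.yml && echo "all rule files OK"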
Stop
pkill -TERM -x prometheus
Start
nohup ./prometheus --config.file=./prometheus.yml --web.enable-lifecycle --storage.tsdb.retention.time=20d --web.external-url=http://8.219.198.22:9090 > server_prometheus.log 2>&1 &
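Because the server is started in the background with nohup, it can be handy to wait until it reports ready before doing anything else. The sketch below polls the /-/ready management endpoint. The function name wait_ready and the PROM_URL variable are assumptions for illustration, and it assumes Prometheus is reachable on localhost:9090.

```shell
#!/bin/sh
# PROM_URL is an assumed variable; adjust it to where Prometheus listens.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# wait_ready [TRIES]  (hypothetical helper)
# Polls /-/ready once per second, up to TRIES times (default 30).
wait_ready() {
    tries="${1:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -w '%{http_code}' prints only the HTTP status code
        code=$(curl -s -o /dev/null -w '%{http_code}' "$PROM_URL/-/ready")
        if [ "$code" = "200" ]; then
            echo "prometheus is ready"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "prometheus not ready after $tries attempts" >&2
    return 1
}
```

Usage: wait_ready 60 || tail server_prometheus.log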
Reload the configuration (no restart; requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
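The plain curl call prints nothing useful when the reload is rejected, so a thin wrapper that checks the HTTP status can help. This is a sketch under assumptions: reload_prometheus and PROM_URL are illustrative names, and the server is assumed to have been started with --web.enable-lifecycle as in the start command.

```shell
#!/bin/sh
# PROM_URL is an assumed variable; adjust it to where Prometheus listens.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# reload_prometheus  (hypothetical helper)
# POSTs to /-/reload and treats any status other than 200 as a failure.
reload_prometheus() {
    code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$PROM_URL/-/reload")
    if [ "$code" = "200" ]; then
        echo "reload ok"
    else
        echo "reload failed (HTTP $code)" >&2
        return 1
    fi
}
```

Usage: ./promtool check rules first_rules.yml && reload_prometheus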