Sending Prometheus alerts to WeCom (Enterprise WeChat) through Alertmanager

Published: 2024-01-13 09:45:39
Tags: alertmanager, alert, prometheus, alerting, WeCom, CPU

###Alertmanager directory
[root@test /data/software/alertmanager]# ll
total 62512
-rwxr-xr-x 1 3434 3434 35410965 Aug 24 19:12 alertmanager
-rw-r--r-- 1 3434 3434      727 Nov 30 14:33 alertmanager.yml
-rwxr-xr-x 1 3434 3434 28566971 Aug 24 19:13 amtool
-rw-r--r-- 1 3434 3434    11357 Aug 24 19:14 LICENSE
-rw-r--r-- 1 3434 3434      457 Aug 24 19:14 NOTICE
-rw-r--r-- 1 root root     1305 Nov 30 17:35 wechat.tmpl

###Alertmanager configuration file
[root@test /data/software/alertmanager]# cat alertmanager.yml 
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'wechat'
#  receiver: 'web.hook'
#receivers:
#  - name: 'web.hook'
#    webhook_configs:
#      - url: 'http://127.0.0.1:8080/adapter/wx'
#        send_resolved: false
templates:
  - '/data/software/alertmanager/*.tmpl'
receivers:
- name: 'wechat'
  wechat_configs:
  - api_secret: 'WeCom app secret'
    corp_id: 'WeCom corp ID'
    agent_id: 'WeCom app agent ID'
    #to_party: '1'  # department ID in WeCom
    to_user: 'ID of the user to notify'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
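
Before starting Alertmanager, it can help to confirm that WeCom accepts the `corp_id` and `api_secret` pair. The sketch below is not part of the original setup. It assumes the public WeCom `gettoken` endpoint on `qyapi.weixin.qq.com`, and the credentials shown are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumption: WeCom's public token endpoint on the default API host.
TOKEN_ENDPOINT = "https://qyapi.weixin.qq.com/cgi-bin/gettoken"


def build_token_url(corp_id: str, api_secret: str) -> str:
    """Return the gettoken URL for the given corp ID and app secret."""
    query = urllib.parse.urlencode({"corpid": corp_id, "corpsecret": api_secret})
    return f"{TOKEN_ENDPOINT}?{query}"


def check_credentials(corp_id: str, api_secret: str, timeout: float = 5.0) -> bool:
    """Request an access token; errcode 0 means the credentials were accepted."""
    url = build_token_url(corp_id, api_secret)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    return body.get("errcode") == 0


if __name__ == "__main__":
    # Hypothetical placeholders; replace with the values from alertmanager.yml.
    print(build_token_url("your-corp-id", "your-app-secret"))
```

Calling `check_credentials(...)` with the real values needs outbound access to the WeCom API from the Alertmanager host.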

###Notification message template
[root@test /data/software/alertmanager]# cat wechat.tmpl
{{ define "wechat.default.message" }}
{{- /* Renders only the first firing and the first resolved alert of each notification group. */ -}}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts.Firing -}}
{{- if eq $index 0 }}
========= Monitoring alert =========
Status: {{ $alert.Status }}
Severity: {{ $alert.Labels.severity }}
Alert name: {{ $alert.Labels.alertname }}
Affected host: {{ $alert.Labels.instance }}
Summary: {{ $alert.Annotations.summary }}
Details: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
Trigger value: {{ $alert.Annotations.value }}
Start time: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
========= = end =  =========
{{- end }}
{{- end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts.Resolved -}}
{{- if eq $index 0 }}
========= Alert resolved =========
Status: {{ $alert.Status }}
Alert name: {{ $alert.Labels.alertname }}
Summary: {{ $alert.Annotations.summary }}
Details: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
Start time: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
Resolved at: {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
{{- if gt (len $alert.Labels.instance) 0 }}
Instance: {{ $alert.Labels.instance }}
{{- end }}
========= = end =  =========
{{- end }}
{{- end }}
{{- end }}
{{- end }}
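
The `.Add 28800e9` call in the template shifts each timestamp by 28800e9 nanoseconds, that is 8 hours, so that times which are typically in UTC are shown in China Standard Time (UTC+8). As an illustration only, here is the same arithmetic in Python. The sample timestamp is made up.

```python
from datetime import datetime, timedelta, timezone

# 28800e9 nanoseconds: the duration literal passed to .Add in the template.
OFFSET_NS = 28800e9


def to_display_time(starts_at_utc: datetime) -> str:
    """Shift a UTC timestamp by the template's offset and format it
    like Go's "2006-01-02 15:04:05" layout."""
    shifted = starts_at_utc + timedelta(seconds=OFFSET_NS / 1e9)
    return shifted.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    sample = datetime(2024, 1, 13, 1, 45, 39, tzinfo=timezone.utc)  # made-up value
    print(to_display_time(sample))  # 2024-01-13 09:45:39
```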

###Prometheus configuration file
[root@test /data/software/prometheus]# cat prometheus.yml 
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093    ### address and port of Alertmanager

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/data/software/prometheus/rules/*.yml"    ### directory that holds the alert rules
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus-server"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["prometheus.com:9090"]
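
To check the path from Alertmanager to WeCom without waiting for a real rule to fire, a hand-made alert can be posted to Alertmanager's v2 API on the address configured above (`localhost:9093`). This is a minimal sketch and not part of the original article. The alert name `TestAlert` and the label values are hypothetical.

```python
import json
import urllib.request

# Assumption: Alertmanager listens on the address used in prometheus.yml.
ALERTMANAGER_URL = "http://localhost:9093/api/v2/alerts"


def build_test_alert(instance: str, severity: str = "warning") -> list:
    """Build a one-element alert list in the shape the v2 API accepts."""
    return [{
        "labels": {
            "alertname": "TestAlert",  # hypothetical name, for testing only
            "instance": instance,
            "severity": severity,
        },
        "annotations": {
            "summary": "Manual test alert",
            "description": "Sent by hand to verify the WeCom receiver.",
        },
    }]


def send_alert(alerts: list, url: str = ALERTMANAGER_URL, timeout: float = 5.0) -> int:
    """POST the alerts as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Print the payload only; call send_alert(...) against a running Alertmanager.
    print(json.dumps(build_test_alert("test-host:9100"), indent=2))
```

With the route shown earlier, a message sent through `send_alert(build_test_alert("test-host:9100"))` should reach WeCom once the `group_wait` of 10s has passed.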

###CPU alert rules
[root@test /data/software/prometheus/rules]# cat cpu_over.yml 
# node_cpu_seconds_total is exported by node_exporter, which Prometheus must scrape for these rules to fire.
groups:
- name: CPU alert rule 50
  rules:
  - alert: CPUUsageAlert
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]) )) * 100 > 50
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "CPU usage is spiking. Take note!"
      description: "CPU usage is above 50% (current value: {{ $value }}%)"

- name: CPU alert rule 70
  rules:
  - alert: CPUUsageAlert
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]) )) * 100 > 70
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "CPU usage is spiking. Needs attention!"
      description: "CPU usage is above 70% (current value: {{ $value }}%)"

- name: CPU alert rule 90
  rules:
  - alert: CPUUsageAlert
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]) )) * 100 > 90
    for: 1m
    labels:
      severity: emergency
    annotations:
      summary: "CPU usage is spiking. Severe, act immediately!"
      description: "CPU usage is above 90% (current value: {{ $value }}%)"
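
Because the three rules share one alert name and differ only in threshold and `severity`, a host above 90% fires all three at once. The sketch below is an illustration and not something Prometheus or Alertmanager runs. It mirrors the thresholds and the single inhibit rule from `alertmanager.yml` (critical suppresses warning for the same `alertname` and `instance`) to show which severities would still notify. It ignores the `for: 1m` delay.

```python
# Thresholds copied from cpu_over.yml: (severity, usage must exceed this percent).
THRESHOLDS = [("warning", 50), ("critical", 70), ("emergency", 90)]

# The single inhibit rule from alertmanager.yml: a firing source severity
# mutes the target severity for the same alertname and instance.
INHIBIT = {"critical": "warning"}


def cpu_usage_percent(idle_fraction: float) -> float:
    """Mirror the rule expression: 100 - idle * 100."""
    return 100 - idle_fraction * 100


def firing_severities(usage: float) -> list:
    """Severities whose rule expression is true for this usage."""
    return [sev for sev, limit in THRESHOLDS if usage > limit]


def notified_severities(usage: float) -> list:
    """Firing severities that are not muted by the inhibit rule."""
    firing = firing_severities(usage)
    muted = {target for source, target in INHIBIT.items() if source in firing}
    return [sev for sev in firing if sev not in muted]


if __name__ == "__main__":
    for idle in (0.40, 0.25, 0.05):
        usage = cpu_usage_percent(idle)
        print(f"{usage:.0f}% firing={firing_severities(usage)} "
              f"notified={notified_severities(usage)}")
```

Under these assumptions a host at 95% still produces both a critical and an emergency notification, since no inhibit rule links those two levels. A second inhibit rule with `emergency` as the source would be one way to quiet the lower levels.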


For more alert rule examples, see:
https://www.modb.pro/db/335991
https://blog.csdn.net/agonie201218/article/details/126243110
https://cloud.tencent.com/developer/article/2216582?areaSource=102001.2&traceId=pWGCiwZuYxp0FamqoV8-w

From: https://www.cnblogs.com/world-of-yuan/p/17958825
