
Handling Prometheus alerts with Alertmanager

Date: 2023-01-20 11:22:05
Tags: Alertmanager, group, default, SMTP, smtp, prometheus, api, alerting

1. Introduction to Prometheus alerting

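  Prometheus's alerting architecture is split into two independent parts: alert rules evaluated by Prometheus, and notification handling in Alertmanager. Alert rules (AlertRule) are defined in Prometheus, which evaluates them periodically; when a rule's trigger condition is met, Prometheus sends the alert to Alertmanager.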
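
How often the rules are evaluated is set in prometheus.yml. A minimal sketch, assuming 15s intervals (the values are illustrative and not taken from the original setup):

```yaml
# prometheus.yml (fragment); the 15s values are illustrative assumptions
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often alerting rules are evaluated
```

A rule group can override the global value with its own interval field if some rules need to be evaluated more or less often.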

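Alertmanager features: grouping, inhibition, silences, and so on.

Grouping: merges many individual alerts into a single notification. For example, when a system outage triggers a large number of alerts at once, grouping can combine them into one notification.

Inhibition: while a given alert is firing, notifications for other alerts caused by it can be suppressed (configured in the Alertmanager configuration file).

Silences: alerts can be muted quickly by matching on their labels; Alertmanager sends no notifications for silenced alerts (configured in the Alertmanager web UI).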
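
As a rough illustration of inhibition, the fragment below mutes warning-level alerts for a host while a host-down alert is firing for that same host. The alert name NodeDown is hypothetical; the syntax follows the source_match/target_match style used in the cases later in this post.

```yaml
# alertmanager.yml (fragment); NodeDown is a hypothetical alert name
inhibit_rules:
  - source_match:
      alertname: 'NodeDown'   # the alert that does the muting
    target_match:
      severity: 'warning'     # the alerts that get muted
    equal: ['instance']       # only when both carry the same instance label value
```

The equal list is what ties the two alerts together: without it, a single NodeDown alert would mute warnings for every host.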

 

 

2. Defining alert rules

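A group can contain multiple alert rules. The main parts of an alert rule are:

alert: the name of the alert rule

expr: the PromQL expression for the trigger condition, evaluated to check whether any time series satisfies it

for: the wait time; the alert fires only after the condition has held continuously for this long. While waiting, a newly triggered alert is in the pending state

labels: custom labels; an additional set of labels to attach to the alert (used together with the Alertmanager configuration, e.g. exact or regex label matching, to notify different people)

annotations: a set of additional information, such as descriptive text about the alert, sent along with the alert to Alertmanager

Example: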

memory.yml

groups:
- name: memory-alert-rules
  rules:
  - alert: MemoryUsageAlert
    # Percentage of memory in use; fires above 40%
    expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes))) * 100 > 40
    for: 10s
    labels:
      severity: warning
      team: frontend
    annotations:
      summary: "Available memory on the server is low."
      description: "Memory usage has exceeded 40% (current value: {{ $value }}%)"
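
Annotations are templated, so besides {{ $value }} they can reference the labels of the series that triggered the alert through {{ $labels.<name> }}. A sketch of a second rule file, assuming node_exporter's filesystem metrics are being scraped; the group name, alert name, threshold, and duration are illustrative:

```yaml
groups:
- name: disk-alert-rules
  rules:
  - alert: DiskUsageHigh
    # Percentage of used space per filesystem; fires above 80%
    expr: (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: 'Disk usage is high on {{ $labels.instance }}'
      # printf trims the value to one decimal place
      description: '{{ $labels.mountpoint }} is {{ printf "%.1f" $value }}% full'
```

Putting the instance and mount point into the text means a grouped notification still shows which host and filesystem each alert refers to.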

 

 

Edit the Prometheus configuration file prometheus.yml and add the Alertmanager configuration:

# Connect Prometheus to Alertmanager
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093

# Rule files to load
rule_files:
  - rules/*.yml

 

3. Alertmanager configuration overview

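global: global configuration, defining common global parameters such as SMTP

templates: templates used for alert notifications, such as HTML or email templates

route: alert routing; matches on labels to decide how each alert should be handled

receivers: notification recipients, such as WeChat, DingTalk, or a webhook

inhibit_rules: inhibition rules; well-chosen rules cut down on noisy alerts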

 

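resolve_timeout: how long Alertmanager waits without receiving an update for an alert before marking it as resolved. As the official reference further down notes, this has no effect on alerts from Prometheus, which always include EndsAt

group_by: the grouping rule, given as a list of label names. Alerts that share the same values for these labels are merged into a single notification sent to the receiver

group_wait: how long to wait before sending the first notification for a new group. Alerts that arrive for the group during this wait are merged into that one notification (typically seconds)

group_interval: how long to wait before sending a notification about new alerts in a group that has already been notified

repeat_interval: how long to wait before re-sending a notification for an alert that was already sent successfully and is still firing (typically hours)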
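
The interaction of the three timers is easier to see on a timeline. A sketch under assumed values (30s, 5m, 4h; the receiver name is hypothetical), with approximate times in the comments:

```yaml
# alertmanager.yml (route fragment); values chosen for illustration only
route:
  receiver: 'default'       # hypothetical receiver name
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
# Rough timeline for one group:
#   00:00:00   first alert of the group arrives; Alertmanager waits group_wait
#   00:00:30   first notification is sent
#   00:02:00   a second alert joins the same group
#   00:05:30   next notification (group_interval after the previous one) includes both alerts
#   ~04:05:30  nothing has changed and the alerts still fire; the notification is repeated
```

Repeats are only checked when the group is next processed, so the actual gap is roughly repeat_interval rounded up to a group_interval boundary.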

Full configuration reference from the official documentation:

global:
  # The default SMTP From header field.
  [ smtp_from: <tmpl_string> ]
  # The default SMTP smarthost used for sending emails, including port number.
  # Port number usually is 25, or 587 for SMTP over TLS (sometimes referred to as STARTTLS).
  # Example: smtp.example.org:587
  [ smtp_smarthost: <string> ]
  # The default hostname to identify to the SMTP server.
  [ smtp_hello: <string> | default = "localhost" ]
  # SMTP Auth using CRAM-MD5, LOGIN and PLAIN. If empty, Alertmanager doesn't authenticate to the SMTP server.
  [ smtp_auth_username: <string> ]
  # SMTP Auth using LOGIN and PLAIN.
  [ smtp_auth_password: <secret> ]
  # SMTP Auth using LOGIN and PLAIN.
  [ smtp_auth_password_file: <string> ]
  # SMTP Auth using PLAIN.
  [ smtp_auth_identity: <string> ]
  # SMTP Auth using CRAM-MD5.
  [ smtp_auth_secret: <secret> ]
  # The default SMTP TLS requirement.
  # Note that Go does not support unencrypted connections to remote SMTP endpoints.
  [ smtp_require_tls: <bool> | default = true ]

  # The API URL to use for Slack notifications.
  [ slack_api_url: <secret> ]
  [ slack_api_url_file: <filepath> ]
  [ victorops_api_key: <secret> ]
  [ victorops_api_key_file: <filepath> ]
  [ victorops_api_url: <string> | default = "https://alert.victorops.com/integrations/generic/20131114/alert/" ]
  [ pagerduty_url: <string> | default = "https://events.pagerduty.com/v2/enqueue" ]
  [ opsgenie_api_key: <secret> ]
  [ opsgenie_api_key_file: <filepath> ]
  [ opsgenie_api_url: <string> | default = "https://api.opsgenie.com/" ]
  [ wechat_api_url: <string> | default = "https://qyapi.weixin.qq.com/cgi-bin/" ]
  [ wechat_api_secret: <secret> ]
  [ wechat_api_corp_id: <string> ]
  [ telegram_api_url: <string> | default = "https://api.telegram.org" ]
  [ webex_api_url: <string> | default = "https://webexapis.com/v1/messages" ]
  # The default HTTP client configuration
  [ http_config: <http_config> ]

  # ResolveTimeout is the default value used by alertmanager if the alert does
  # not include EndsAt, after this time passes it can declare the alert as resolved if it has not been updated.
  # This has no impact on alerts from Prometheus, as they always include EndsAt.
  [ resolve_timeout: <duration> | default = 5m ]

# Files from which custom notification template definitions are read.
# The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'.
templates:
  [ - <filepath> ... ]

# The root node of the routing tree.
route: <route>

# A list of notification receivers.
receivers:
  - <receiver> ...

# A list of inhibition rules.
inhibit_rules:
  [ - <inhibit_rule> ... ]

# DEPRECATED: use time_intervals below.
# A list of mute time intervals for muting routes.
mute_time_intervals:
  [ - <mute_time_interval> ... ]

# A list of time intervals for muting/activating routes.
time_intervals:
  [ - <time_interval> ... ]

Case 1 (all alert notifications go to a single person or group):

route:
  group_by: ['alertname']   
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://11.0.1.1:5000/send'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Case 2 (different recipients or notification methods depending on the alert's category):

global:
  resolve_timeout: 1m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'gdfawprxfuonbfcf'
  smtp_hello: '@qq.com'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 20s
  repeat_interval: 5h
  receiver: 'default'
  routes:
  - receiver: "web.hook"  # webhook notification
    group_wait: 10s
    match_re:
      service: test

  - receiver: "mails"  # email notification
    group_by: [product, environment]
    match:
      team: frontend
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://11.0.1.1:5000/send'
- name: "mails"
  email_configs:
  - to: '[email protected]'
    send_resolved: true # also notify when an alert is resolved
- name: "default"
  webhook_configs:
  - url: 'http://11.0.1.1:5000/senddef'

inhibit_rules: # inhibition rules
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']  
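
Since Alertmanager v0.22, match, match_re, source_match and target_match are marked deprecated in favour of matchers, source_matchers and target_matchers. The old keys still work in the v0.25.0 release used in section 4. A sketch of how the routing and inhibition parts of Case 2 might look in the newer syntax (the global and receivers sections stay unchanged):

```yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 20s
  repeat_interval: 5h
  receiver: 'default'
  routes:
  - receiver: 'web.hook'
    group_wait: 10s
    matchers:
    - service =~ "test"     # regex match, replaces match_re
  - receiver: 'mails'
    group_by: [product, environment]
    matchers:
    - team = "frontend"     # exact match, replaces match

inhibit_rules:
- source_matchers:
  - severity = "critical"
  target_matchers:
  - severity = "warning"
  equal: ['alertname', 'dev', 'instance']
```

With this rule, a warning alert is muted only while a critical alert with the same alertname, dev and instance values is firing.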

 

 

4. Deploying Alertmanager

Method 1: binary deployment

Download: https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz

Start the service: ./alertmanager --config.file=alertmanager.yml  --cluster.advertise-address=0.0.0.0:9093

 

 

Notes:

To check whether the Alertmanager-related configuration in Prometheus has taken effect, open: http://11.0.1.141:9099/config

From: https://www.cnblogs.com/aroin/p/17061207.html
