首页 > 其他分享 >prometheus监控告警多个es集群

prometheus监控告警多个es集群

时间:2024-02-05 16:23:31浏览次数:28  
标签:-- labels instance prometheus elasticsearch Elasticsearch 告警 es

exporter安装

      分别在两个集群中的任一节点安装elasticsearch_exporter

      节点1安装

         nohup ./elasticsearch_exporter --es.all --es.indices --es.cluster_settings --es.indices_settings --es.shards --es.snapshots --es.timeout=10s --web.listen-address=":9114" --web.telemetry-path="/metrics" --es.ssl-skip-verify --es.uri="https://elastic:[email protected]:9200" > /dev/null 2>&1 &

     节点2安装

         nohup ./elasticsearch_exporter --es.all --es.indices --es.cluster_settings --es.indices_settings --es.shards --es.snapshots --es.timeout=10s --web.listen-address=":9114" --web.telemetry-path="/metrics" --es.ssl-skip-verify --es.uri="https://elastic:[email protected]:9200" > /dev/null 2>&1 &

 

prometheus配置job

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["10.30.92.71:9090"]
  - job_name: "alertmanager"
    static_configs:
      - targets: ["10.30.92.71:9095"]
  - job_name: "elasticsearch_asset"
    static_configs:
      - targets: ["10.32.3.2:9114"]
  - job_name: "elasticsearch_sa"
    static_configs:
      - targets: ["10.32.3.18:9114"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["10.32.3.2:9100","10.32.3.3:9100","10.32.3.5:9100"]
prometheus.yml

 

prometheus配置告警规则

groups:

- name: PrometheusCommunityElasticsearchExporter

  rules:

    - alert: ElasticsearchHeapUsageTooHigh
      expr: '(elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100 > 90'
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: Elasticsearch Heap Usage Too High (instance {{ $labels.instance }})
        description: "The heap usage is over 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchHeapUsageWarning
      expr: '(elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100 > 80'
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Elasticsearch Heap Usage warning (instance {{ $labels.instance }})
        description: "The heap usage is over 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchDiskOutOfSpace
      expr: 'elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 < 10'
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Elasticsearch disk out of space (instance {{ $labels.instance }})
        description: "The disk usage is over 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchDiskSpaceLow
      expr: 'elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 < 20'
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Elasticsearch disk space low (instance {{ $labels.instance }})
        description: "The disk usage is over 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchClusterRed
      expr: 'elasticsearch_cluster_health_status{color="red"} == 1'
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Elasticsearch Cluster Red (instance {{ $labels.instance }})
        description: "Elastic Cluster Red status\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchClusterYellow
      expr: 'elasticsearch_cluster_health_status{color="yellow"} == 1'
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: Elasticsearch Cluster Yellow (instance {{ $labels.instance }})
        description: "Elastic Cluster Yellow status\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchHealthyNodes
      expr: 'elasticsearch_cluster_health_number_of_nodes < 3'
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Elasticsearch Healthy Nodes (instance {{ $labels.instance }})
        description: "Missing node in Elasticsearch cluster\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchHealthyDataNodes
      expr: 'elasticsearch_cluster_health_number_of_data_nodes < 3'
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Elasticsearch Healthy Data Nodes (instance {{ $labels.instance }})
        description: "Missing data node in Elasticsearch cluster\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchRelocatingShards
      expr: 'elasticsearch_cluster_health_relocating_shards > 0'
      for: 0m
      labels:
        severity: info
      annotations:
        summary: Elasticsearch relocating shards (instance {{ $labels.instance }})
        description: "Elasticsearch is relocating shards\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchRelocatingShardsTooLong
      expr: 'elasticsearch_cluster_health_relocating_shards > 0'
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: Elasticsearch relocating shards too long (instance {{ $labels.instance }})
        description: "Elasticsearch has been relocating shards for 15min\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchInitializingShards
      expr: 'elasticsearch_cluster_health_initializing_shards > 0'
      for: 0m
      labels:
        severity: info
      annotations:
        summary: Elasticsearch initializing shards (instance {{ $labels.instance }})
        description: "Elasticsearch is initializing shards\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchInitializingShardsTooLong
      expr: 'elasticsearch_cluster_health_initializing_shards > 0'
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: Elasticsearch initializing shards too long (instance {{ $labels.instance }})
        description: "Elasticsearch has been initializing shards for 15 min\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchUnassignedShards
      expr: 'elasticsearch_cluster_health_unassigned_shards > 0'
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Elasticsearch unassigned shards (instance {{ $labels.instance }})
        description: "Elasticsearch has unassigned shards\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchPendingTasks
      expr: 'elasticsearch_cluster_health_number_of_pending_tasks > 0'
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: Elasticsearch pending tasks (instance {{ $labels.instance }})
        description: "Elasticsearch has pending tasks. Cluster works slowly.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: ElasticsearchNoNewDocuments
      expr: 'increase(elasticsearch_indices_indexing_index_total{es_data_node="true"}[10m]) < 1'
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: Elasticsearch no new documents (instance {{ $labels.instance }})
        description: "No new documents for 10 min!\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
rules.yml

     监控和告警的时候是根据instance来抓取的 所以和es的集群名字没有太大的关系

     

      

 grafana监控多集群数据

    

    

 

标签:--,labels,instance,prometheus,elasticsearch,Elasticsearch,告警,es
From: https://www.cnblogs.com/yxh168/p/18008356

相关文章

  • Docker:Failed to copy files, no space left on device
    主页个人微信公众号:密码应用技术实战个人博客园首页:https://www.cnblogs.com/informatics/问题描述在Mac上进行docker构建时,偶尔会遇到以下问题Failedtocopyfiles:userspacecopyfailed:write/var/lib/docker/volumes/xxx/_data/xxx.dbf:nospaceleftondevice......
  • Kubernetes 服务类型详解
    Kubernetes服务类型详解如今,Kubernetes已成为管理和扩展云原生应用程序的强大工具。组织需要利用高度可扩展且始终可用的功能来保持零停机时间,快速部署他们的软件。随着越来越多的应用程序被容器化和部署,容器的管理也变得越来越复杂。因此,软件的扩展成为一个问题,而这正是Kuber......
  • .NET 7 MAUI 使用基于 REST 的 Web 服务过程中本地开发的问题
    .NET7MAUI使用基于REST的Web服务过程中本地开发的问题微软文档:https://learn.microsoft.com/zh-cn/dotnet/maui/data-cloud/rest?view=net-maui-7.0错误代码Java.Security.Cert.CertificateException:'TheremotecertificatewasrejectedbytheprovidedRemoteCert......
  • Unity打包Android报错:Target Android SDK not installed Android SDK does not includ
    1.需要查看当前unity版本中安装的SDKVersion2.找到对应路径下的文件,打开build-tools文件,其中就是对应的SDKVersion4.修改Unity中对应的配置 ......
  • [ARC162D] Smallest Vertices 题解
    题目链接点击打开链接题目解法这种树的形态计数题首先可以想到\(prufer\)序列计数,如果没有任何限制,那么方案数为\(\prod\frac{(n-2)!}{deg_i}\),其中\(deg_1=d_1-1,deg_i=d_i(2\lei\len)\)对于每个点分开求贡献考虑这个式子只和点的个数和子树内的\(\sumdeg\)有关......
  • WebSocket 协议 message, ping , Pong, 消息
    以前一直不明白,WebSocket 已经有了message回调函数,可以接收任何的消息,按理说,ping和pong也只是  message 众多消息类型中的两个消息特里,直到看到 <<WebSocket协议 >>的定义,才明白,为什么了 一、数据帧(DataFraming)WebSocket协议中,数据是通过数据帧来传递的,协议......
  • 在我的VS2019中重新配置2017项目生成的google test 项目
    原来的项目是其他版本的VS配置的,自己下载下来时候,本机也没有装GoogleTest所以用不起。如果重建项目在一个个引入工程代码太麻烦(文件多),所以我就想着有没有什么办法快捷配置,不用重建工程以下是我的一个配置方法,供大家交流学习:1.首先你本机要安装上GoogleTest,安装方法自查;2.如......
  • CommonJS、AMD、CMD、ES Module
    依赖前置和依赖就近RequireJS采用依赖前置,举个例子就是炒菜前一次性把所有食材洗好,切好,根据内部逻辑把有些酱料拌好,最后开始炒菜,前面任何一个步骤出现问题都能较早发现错误;SeaJS的依赖就近就是要炒青椒了去切青椒要炒肉了去切肉cmd:例如seajsamd:例如requirejscommonjs模块规范nod......
  • (python)做题记录||2024.2.4||题目是codewars的【 All Balanced Parentheses】
    题目链接:https://www.codewars.com/kata/5426d7a2c2c7784365000783/python我的解决方案:defbalanced_parens(n):#Yourcodehere!used_l=[Falseforiinrange(n)]used_r=[Falseforiinrange(n)]answers=[]defprocess(answer):iflen(a......
  • 无涯教程-toLocaleDateString()函数
    JavascriptdatetoLocaleDateString()方法根据本地时间格式,把Date对象的日期部分转换为字符串。toLocaleDateString()-语法Date.toLocaleString()toLocaleDateString()-返回值使用操作系统的区域设置约定返回"Date"部分。toLocaleDateString()-示例vardt=......