1. Monitoring Redis
1.1 Installation
1.1.1 Binary installation
Follow the same binary installation method used for nginx.
redis_exporter download: https://github.com/oliver006/redis_exporter/releases
systemd service:
cat > /etc/systemd/system/redis_exporter.service <<"EOF"
[Unit]
Description=Prometheus Redis Exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/opt/prometheus/redis_exporter/redis_exporter \
  -redis.addr localhost:6379 \
  -redis.password 123456

[Install]
WantedBy=multi-user.target
EOF
# Load and start the service
systemctl daemon-reload && systemctl enable --now redis_exporter
1.1.2 Docker installation
# Run directly with docker
docker run -d --restart=always --name redis_exporter -p 9121:9121 oliver006/redis_exporter --redis.addr redis://192.168.10.100:6379 --redis.password '123456'

# docker-compose
cat > docker-compose.yaml <<EOF
version: '3.3'
services:
  redis_exporter:
    image: oliver006/redis_exporter
    container_name: redis_exporter
    restart: always
    environment:
      REDIS_ADDR: "192.168.10.100:6379"
      REDIS_PASSWORD: "123456"
    ports:
      - "9121:9121"
EOF
# Start
docker-compose up -d
# Metrics URL
http://192.168.10.100:9121/metrics
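To confirm the exporter is serving data without opening a browser, the metrics page can also be read from a script. The Python sketch below is an illustration only: `parse_metrics` and `fetch_metrics` are hypothetical helpers, and the parser handles just simple `name{labels} value` lines (it assumes label values contain no `}`), not the full Prometheus text exposition format.

```python
import re
import urllib.request

# Matches simple sample lines: metric name, optional {labels}, then the value.
# Simplification: assumes label values contain no "}" and ignores any trailing timestamp.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Return {(metric_name, raw_label_string): value} for every sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m:
            name, labels, value = m.group(1), m.group(2) or "", m.group(3)
            samples[(name, labels)] = float(value)  # float() also accepts "NaN" and "+Inf"
    return samples


def fetch_metrics(url, timeout=5.0):
    """Download an exporter's metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_metrics(fetch_metrics("http://192.168.10.100:9121/metrics"))` should include a `redis_up` sample with value 1.0 when the exporter can reach Redis.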
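1.2 Prometheus configuration
# Configure Prometheus to scrape (pull) the samples exposed by redis_exporter
cd /data/docker-prometheus
# Append the following under scrape_configs (this only works if scrape_configs is the last section of prometheus.yml):
cat >> prometheus/prometheus.yml << "EOF"
  - job_name: 'redis_exporter'
    static_configs:
      - targets: ['192.168.10.100:9121']
        labels:
          instance: test-server
EOF
# Reload (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload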
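After the reload, the Prometheus HTTP API can confirm that the new target is actually being scraped: an instant query for `up{job="redis_exporter"}` returns 1 for each healthy target. A minimal Python sketch, assuming Prometheus is reachable at http://localhost:9090 as in the reload command; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus, promql):
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return prometheus.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def targets_up(response_text):
    """Map each series' instance label to True (up == 1) or False from a query response."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    result = {}
    for series in body["data"]["result"]:
        # A vector sample looks like {"metric": {labels}, "value": [timestamp, "1"]};
        # the sample value is encoded as a string.
        result[series["metric"].get("instance", "")] = series["value"][1] == "1"
    return result


def query(prometheus, promql, timeout=5.0):
    """Run an instant query and return the raw JSON text."""
    with urllib.request.urlopen(build_query_url(prometheus, promql), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage: `targets_up(query("http://localhost:9090", 'up{job="redis_exporter"}'))` returns a dict keyed by the `instance` label.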
1.3 Grafana dashboard
https://grafana.com/grafana/dashboards/11835-redis-dashboard-for-prometheus-redis-exporter-helm-stable-redis-ha/
1.4 Common metrics
redis_up  # whether the server is up
redis_uptime_in_seconds  # uptime, in seconds
rate(redis_cpu_sys_seconds_total[1m]) + rate(redis_cpu_user_seconds_total[1m])  # CPU cores in use
redis_memory_used_bytes  # memory used
redis_memory_max_bytes  # configured maxmemory; 0 if no limit is set
increase(redis_net_input_bytes_total[1m])  # bytes received over the last minute
increase(redis_net_output_bytes_total[1m])  # bytes sent over the last minute
redis_connected_clients  # number of connected clients
redis_connected_clients / redis_config_maxclients  # connection usage ratio
redis_rejected_connections_total  # rejected client connections (counter)
redis_connected_slaves  # number of connected replicas (slaves)
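Several of these expressions wrap counters in `rate()`. The Python sketch below approximates what that computes from raw samples, under simplifying assumptions: samples are (timestamp, value) pairs oldest first, a drop in value is treated as a counter reset, and unlike PromQL's `rate()` it does not extrapolate to the edges of the window. The helper names are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter over the samples' time span.

    samples: list of (unix_timestamp_seconds, counter_value), oldest first.
    Simplified: unlike PromQL rate(), no extrapolation to the window edges.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. Redis restarted): count from zero.
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


def cpu_cores(sys_samples, user_samples):
    """Approximates rate(redis_cpu_sys_seconds_total) + rate(redis_cpu_user_seconds_total)."""
    # CPU seconds consumed per wall-clock second equals the number of cores kept busy.
    return simple_rate(sys_samples) + simple_rate(user_samples)


def connection_usage(connected_clients, maxclients):
    """Same ratio as redis_connected_clients / redis_config_maxclients."""
    return connected_clients / maxclients
```

With 3 CPU-seconds of system time and 6 of user time accumulated over 60 seconds, `cpu_cores` gives about 0.15, i.e. Redis kept roughly 15% of one core busy.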
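1.5 Alert rule (trigger) configuration
Split the alert rules into separate files per service so that no single rule list grows too long.
cd /data/docker-prometheus
mkdir -p prometheus/rules
vim prometheus/prometheus.yml
# Alerting (trigger) configuration
rule_files:
  - "alert.yml"
  - "rules/*.yml"
Redis triggers (alert rules):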
cat > prometheus/rules/redis.yml <<"EOF"
groups:
- name: redis
  rules:
  - alert: RedisDown
    expr: redis_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: 'Redis down, instance: {{ $labels.instance }}'
      description: "Redis instance is down"
  - alert: RedisMissingBackup
    expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis backup missing, instance: {{ $labels.instance }}"
      description: "Redis has not been backed up in 24 hours"
  - alert: RedisOutOfConfiguredMaxmemory
    # redis_memory_max_bytes is 0 when maxmemory is unset; skip those instances (x / 0 = +Inf would always fire)
    expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90 and redis_memory_max_bytes > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Redis exceeds its configured maxmemory, instance: {{ $labels.instance }}"
      description: "Redis memory usage is above 90% of the configured maxmemory"
  - alert: RedisTooManyConnections
    expr: redis_connected_clients > 100
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Too many Redis connections, instance: {{ $labels.instance }}"
      description: "Current Redis connections: {{ $value }}"
  - alert: RedisNotEnoughConnections
    expr: redis_connected_clients < 1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Redis has too few connections, instance: {{ $labels.instance }}"
      description: "Current Redis connections: {{ $value }}"
  - alert: RedisRejectedConnections
    expr: increase(redis_rejected_connections_total[1m]) > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis rejected connections, instance: {{ $labels.instance }}"
      description: "Some connections to Redis were rejected: {{ $value }}"
EOF
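Two of the Redis rules hinge on conditions that are easy to get wrong. Because `redis_memory_max_bytes` is 0 when no maxmemory is set, a bare `used / max * 100 > 90` evaluates to +Inf (PromQL float division by zero) and would always fire; and the backup rule compares the age of the last RDB save against 24 hours. This Python sketch mirrors both checks to make the arithmetic visible; the function names are hypothetical and this is not how Prometheus itself evaluates rules.

```python
import math

DAY_SECONDS = 60 * 60 * 24


def memory_usage_percent(used_bytes, max_bytes):
    """used / max * 100 with PromQL-style float semantics for a zero divisor."""
    if max_bytes == 0:
        # PromQL float division: x / 0 = +Inf for x > 0, and 0 / 0 = NaN.
        return math.nan if used_bytes == 0 else math.inf
    return used_bytes / max_bytes * 100


def out_of_maxmemory(used_bytes, max_bytes, threshold=90.0, guard=True):
    """Mirror of the RedisOutOfConfiguredMaxmemory condition."""
    if guard and max_bytes <= 0:
        return False  # maxmemory not configured: nothing to compare against
    return memory_usage_percent(used_bytes, max_bytes) > threshold


def missing_backup(now, last_save, max_age=DAY_SECONDS):
    """Mirror of time() - redis_rdb_last_save_timestamp_seconds > 86400."""
    return now - last_save > max_age
```

Calling `out_of_maxmemory(1000, 0, guard=False)` returns True even though no limit exists, which is exactly the false positive the guard avoids.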
Check the configuration, then reload it:
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
curl -X POST http://localhost:9090/-/reload
Verify:
http://192.168.10.14:9090/rules#redis
http://192.168.10.14:9090/alerts?search=
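The alert list shown on the /alerts page is also available as JSON from the Prometheus `/api/v1/alerts` endpoint, which is convenient for scripted checks after a reload. A minimal Python sketch, assuming Prometheus at http://192.168.10.14:9090 as in the URLs above; `list_alerts` and `fetch_alerts` are hypothetical helpers.

```python
import json
import urllib.request


def list_alerts(response_text, state=None):
    """Return (alertname, instance, state) tuples, optionally filtered by state."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError(f"request failed: {body.get('error')}")
    out = []
    for alert in body["data"]["alerts"]:
        # state is "pending" or "firing" for active alerts
        if state is None or alert["state"] == state:
            labels = alert["labels"]
            out.append((labels.get("alertname", ""), labels.get("instance", ""), alert["state"]))
    return out


def fetch_alerts(prometheus="http://192.168.10.14:9090", timeout=5.0):
    """Download the active alerts as raw JSON text."""
    with urllib.request.urlopen(prometheus.rstrip("/") + "/api/v1/alerts", timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `list_alerts(fetch_alerts(), state="firing")` lists only the alerts that are currently firing.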
2. Monitoring RabbitMQ
2.1 Installing RabbitMQ
Binary installation (of rabbitmq_exporter)
rabbitmq_exporter download: https://github.com/kbudde/rabbitmq_exporter/releases
systemd service:
cat > /etc/systemd/system/rabbitmq_exporter.service <<"EOF"
[Unit]
Description=prometheus rabbitmq exporter
After=network.target

[Service]
Environment=RABBIT_USER=guest
Environment=RABBIT_PASSWORD=guest
Environment=RABBIT_URL=http://localhost:15672
Environment=OUTPUT_FORMAT=JSON
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/opt/prometheus/rabbitmq_exporter/rabbitmq_exporter

[Install]
WantedBy=multi-user.target
EOF
# Load and start the service
systemctl daemon-reload && systemctl enable --now rabbitmq_exporter
docker-compose installation (RabbitMQ)
mkdir -p /data/rabbitmq && cd /data/rabbitmq
cat > docker-compose.yaml <<"EOF"
version: '3'
services:
  rabbitmq:
    image: rabbitmq:3.7.15-management
    container_name: rabbitmq
    restart: always
    volumes:
      - /data/rabbitmq/data:/var/lib/rabbitmq
      - /data/rabbitmq/log:/var/log/rabbitmq
    ports:
      - "5672:5672"
      - "15672:15672"
EOF
# Start
docker-compose up -d
2.2 Installing rabbitmq_exporter
Docker installation
# Run directly with docker
docker run -d --restart=always -p 9419:9419 --name rabbitmq_exporter -e RABBIT_URL=http://192.168.10.100:15672 -e RABBIT_USER=guest -e RABBIT_PASSWORD=guest kbudde/rabbitmq-exporter

# docker-compose (use a separate directory so this does not overwrite RabbitMQ's docker-compose.yaml)
cat > docker-compose.yaml <<EOF
version: '3.3'
services:
  rabbitmq_exporter:
    image: kbudde/rabbitmq-exporter
    container_name: rabbitmq_exporter
    restart: always
    environment:
      RABBIT_URL: "http://192.168.10.100:15672"
      RABBIT_USER: "guest"
      RABBIT_PASSWORD: "guest"
      PUBLISH_PORT: "9419"
      OUTPUT_FORMAT: "JSON"
    ports:
      - "9419:9419"
EOF
# Start
docker-compose up -d
# Parameters
Environment variable | Default | Description
RABBIT_URL | - | URL of the RabbitMQ management plugin (must start with http:// or https://)
RABBIT_USER | guest | Username for the RabbitMQ management plugin
RABBIT_PASSWORD | guest | Password for the RabbitMQ management plugin
OUTPUT_FORMAT | JSON | Log output format
PUBLISH_PORT | 9419 | Port the exporter listens on
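A malformed RABBIT_URL or port is an easy way to end up with an exporter that starts but reports nothing, so it can help to validate these variables before starting the container. The sketch below is a hypothetical Python helper, not part of rabbitmq_exporter: it applies the defaults listed in the table and treats RABBIT_URL as required, since the table gives no default for it.

```python
# Defaults as listed in the parameter table; RABBIT_URL has no default there.
TABLE_DEFAULTS = {
    "RABBIT_USER": "guest",
    "RABBIT_PASSWORD": "guest",
    "OUTPUT_FORMAT": "JSON",
    "PUBLISH_PORT": "9419",
}
KNOWN_KEYS = set(TABLE_DEFAULTS) | {"RABBIT_URL"}


def load_exporter_env(env):
    """Merge env with the table defaults and validate (hypothetical helper)."""
    cfg = dict(TABLE_DEFAULTS)
    # Keep only the variables from the table; ignore everything else in the environment.
    cfg.update({k: v for k, v in env.items() if k in KNOWN_KEYS})
    url = cfg.get("RABBIT_URL", "")
    if not url.startswith(("http://", "https://")):
        raise ValueError("RABBIT_URL must start with http:// or https://")
    port = int(cfg["PUBLISH_PORT"])  # raises ValueError if not a number
    if not 0 < port < 65536:
        raise ValueError(f"PUBLISH_PORT out of range: {port}")
    cfg["PUBLISH_PORT"] = port
    return cfg
```

Usage: `load_exporter_env(os.environ)` after exporting the variables, or pass a dict built from the compose file's `environment:` section.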
Metrics URL: http://192.168.10.100:9419/metrics
2.3 Prometheus configuration
Configure Prometheus to scrape (pull) the samples exposed by rabbitmq_exporter:
cd /data/docker-prometheus
# Append the following under scrape_configs (assumes scrape_configs is the last section of prometheus.yml):
cat >> prometheus/prometheus.yml << "EOF"
  - job_name: 'rabbitmq_exporter'
    static_configs:
      - targets: ['192.168.10.100:9419']
        labels:
          instance: test-server
EOF
# Reload the configuration
curl -X POST http://localhost:9090/-/reload
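2.4 Common metrics
rabbitmq_queue_messages_unacknowledged_global  total unacknowledged messages across all queues (delivered to consumers but not yet acked)
rabbitmq_node_disk_free_limit  free disk space limit; RabbitMQ raises its disk alarm when free space drops below it
rabbitmq_node_disk_free  free disk space
rabbitmq_node_mem_used  memory used
rabbitmq_node_mem_limit  memory limit (high watermark)
rabbitmq_sockets_used  number of sockets in use
rabbitmq_sockets_available  total number of sockets available
rabbitmq_fd_used  number of file descriptors in use
rabbitmq_fd_available  total number of file descriptors available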
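These gauges are usually read as ratios. The Python sketch below shows the arithmetic for one node, assuming the values have already been scraped into a dict keyed by metric name (the function name is hypothetical). Disk space works the other way round from the rest: RabbitMQ raises its disk alarm when free space falls below `rabbitmq_node_disk_free_limit`, so the warning sign is free/limit dropping toward 1.

```python
def rabbitmq_usage(m):
    """Derive usage ratios for one node from a {metric_name: value} dict."""
    return {
        # Percent of the memory high watermark in use.
        "mem_used_pct": m["rabbitmq_node_mem_used"] / m["rabbitmq_node_mem_limit"] * 100,
        # Percent of file descriptors / sockets in use (the ratios used by the alert rules).
        "fd_used_pct": m["rabbitmq_fd_used"] / m["rabbitmq_fd_available"] * 100,
        "sockets_used_pct": m["rabbitmq_sockets_used"] / m["rabbitmq_sockets_available"] * 100,
        # How many times free disk space exceeds the limit; below 1 means the disk alarm is on.
        "disk_free_over_limit": m["rabbitmq_node_disk_free"] / m["rabbitmq_node_disk_free_limit"],
    }
```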
2.5 RabbitMQ alert rules
cat > prometheus/rules/rabbitmq.yml <<"EOF"
groups:
- name: Rabbitmq
  rules:
  - alert: RabbitMQDown
    expr: rabbitmq_up != 1
    labels:
      severity: High
    annotations:
      summary: "RabbitMQ down, instance: {{ $labels.instance }}"
      description: "rabbitmq_exporter cannot connect to RabbitMQ!"
  - alert: RabbitMQUnacknowledgedMessages
    expr: rabbitmq_queue_messages_unacknowledged_global > 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ has unacknowledged messages, instance: {{ $labels.instance }}"
      description: 'RabbitMQ unacknowledged messages > 0, current value: {{ $value }}'
  - alert: RabbitMQDiskSpaceLow
    expr: rabbitmq_node_disk_free_alarm != 0
    #expr: rabbitmq_node_disk_free_limit / rabbitmq_node_disk_free * 100 > 90
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ is low on free disk space, instance: {{ $labels.instance }}"
      description: "RabbitMQ is low on free disk space, please check"
  - alert: RabbitMQMemoryLow
    expr: rabbitmq_node_mem_alarm != 0
    #expr: rabbitmq_node_mem_used / rabbitmq_node_mem_limit * 100 > 90
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ is low on available memory, instance: {{ $labels.instance }}"
      description: "RabbitMQ is low on available memory, please check"
  - alert: RabbitMQSocketUsageHigh
    expr: rabbitmq_sockets_used / rabbitmq_sockets_available * 100 > 60
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ socket usage is high, instance: {{ $labels.instance }}"
      description: 'RabbitMQ socket usage > 60%, current value: {{ $value }}'
  - alert: RabbitMQFileDescriptorUsageHigh
    expr: rabbitmq_fd_used / rabbitmq_fd_available * 100 > 60
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ file descriptor usage is high, instance: {{ $labels.instance }}"
      description: 'RabbitMQ file descriptor usage > 60%, current value: {{ $value }}'
EOF
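The rules above mix `for: 0m`, `for: 1m` and no `for:` at all. Roughly, an alert whose expression is true stays pending until it has been true for the `for:` duration and then becomes firing; with no `for:` (or `0m`) it fires on the first evaluation. The Python sketch below simulates that progression under simplifying assumptions (a single series, evaluation times given explicitly, no retention of resolved alerts); `alert_states` is a hypothetical name and this is an illustration, not the Prometheus implementation.

```python
def alert_states(evaluations, for_seconds):
    """Return the alert state after each evaluation.

    evaluations: list of (timestamp_seconds, condition_true) in time order.
    States: "inactive" (expression false), "pending", or "firing".
    """
    states = []
    active_since = None  # when the condition most recently became true
    for ts, condition in evaluations:
        if not condition:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        # Fire once the condition has held for at least the for: duration.
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states
```

With a 15-second evaluation interval and `for: 1m`, a condition that first becomes true at t=0 is pending at 0, 15, 30 and 45 seconds and fires at 60 seconds.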
# Check the configuration
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
# Reload the configuration
curl -X POST http://localhost:9090/-/reload
# View
http://192.168.10.14:9090/rules
http://192.168.10.14:9090/alerts?search=
2.6 Grafana dashboard
Use Grafana to display the data Prometheus collects from rabbitmq_exporter.
Dashboard ID: 4279 (import it in Grafana by this ID)
From: https://www.cnblogs.com/yangmeichong/p/18156069