I. Installing Prometheus from the official tarball and configuring startup
Prometheus downloads: https://prometheus.io/download/
Exporters and integrations reference: http://www.coderdocument.com/docs/prometheus/v2.14/instrumenting/exporters_and_integrations.html
1. Lab environment
IP            Role                    OS
172.16.11.7   Prometheus server       CentOS 7
172.16.11.8   node_exporter client    CentOS 7
2. Download Prometheus
[root@prometheus ~]# cd /usr/local/
[root@prometheus local]# wget https://github.com/prometheus/prometheus/releases/download/v2.25.0/prometheus-2.25.0.linux-amd64.tar.gz
[root@prometheus local]# tar xf prometheus-2.25.0.linux-amd64.tar.gz
[root@prometheus local]# mv prometheus-2.25.0.linux-amd64/ prometheus
Check the version:
[root@prometheus prometheus]# ./prometheus --version
View the help text:
[root@prometheus prometheus]# ./prometheus --help
3. prometheus.yml configuration explained
cat /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label to every sample scraped from this job; an instance label
  # with the target's host:port is added to each sample as well.
  - job_name: 'prometheus'
    # Override the global setting for this job: scrape every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
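After editing, the file can be validated before (re)starting the server. A quick check using promtool, which ships in the same release tarball:

[root@prometheus prometheus]# ./promtool check config prometheus.yml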
4. Start the service
# Start the service in the foreground
[root@prometheus prometheus]# ./prometheus --config.file=prometheus.yml
# Specify the configuration file
--config.file="prometheus.yml"
# Listen address and port (default 0.0.0.0:9090); the port can be changed here
--web.listen-address="0.0.0.0:9090"
# Maximum number of simultaneous connections
--web.max-connections=512
# Directory for TSDB data storage; defaults to data/ under the current directory
--storage.tsdb.path="data/"
# How long Prometheus keeps data; the default is 15 days
# (newer releases use --storage.tsdb.retention.time instead)
--storage.tsdb.retention=15d
# Enable hot reloading of the configuration without a restart: curl -X POST http://172.16.11.7:9090/-/reload
--web.enable-lifecycle
# Path to a web config file that can enable TLS or authentication
--web.config.file=""
For more startup options see ./prometheus --help
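Putting several of these options together, a typical invocation might look like the following (values are examples; adjust paths and retention to your environment):

./prometheus --config.file=prometheus.yml \
  --web.listen-address="0.0.0.0:9090" \
  --storage.tsdb.path="data/" \
  --storage.tsdb.retention=15d \
  --web.enable-lifecycle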
5. Open the web UI: http://172.16.11.7:9090
6. View the exposed metrics
Visit http://172.16.11.7:9090/metrics
7. Configure Prometheus as a systemd service
Change into the systemd unit directory:
cd /usr/lib/systemd/system
Create the unit file: vim prometheus.service
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --web.listen-address=:9090
[Install]
WantedBy=multi-user.target
Reload systemd so it picks up the new unit file:
systemctl daemon-reload
Start the service:
systemctl start prometheus
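Optionally, enable the service at boot and confirm it is running (standard systemd commands, assuming the unit file above):

systemctl enable prometheus
systemctl status prometheus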
II. Client side: set up monitoring of the Linux host and related services
Performed on 172.16.11.8.
1. Install node_exporter
node_exporter is the usual exporter for monitoring Linux hosts.
cd /usr/local/
wget https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
tar xf node_exporter-1.1.2.linux-amd64.tar.gz
mv node_exporter-1.1.2.linux-amd64/ node_exporter
2. Start node_exporter and add it as a service
(1) Start it directly
cd /usr/local/node_exporter && ./node_exporter &
# After starting, it listens on port 9100
(2) Run it as a systemd service
vim /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
After=network.target
[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
Here we go with option (2) and run it as a service:
systemctl daemon-reload
systemctl start node_exporter
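To confirm the exporter is up and serving data, you can curl its metrics endpoint (an illustrative check):

curl -s http://172.16.11.8:9100/metrics | head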
III. Server side: add the new target to the configuration file
Performed on 172.16.11.7.
cd /usr/local/prometheus
vim prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['172.16.11.7:9090']
  - job_name: 'linux'
    static_configs:
      - targets: ['172.16.11.8:9100']
Restart Prometheus:
[root@prometheus ~]# systemctl restart prometheus.service
After the restart, refresh the web UI and the new target should appear under Status -> Targets.
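If Prometheus was started with --web.enable-lifecycle, the configuration can also be hot-reloaded instead of restarted, and the built-in up metric is a quick way to confirm the target is being scraped. Illustrative commands:

# hot reload (only works when --web.enable-lifecycle is set)
curl -X POST http://172.16.11.7:9090/-/reload
# up == 1 means the last scrape of the target succeeded
curl -s 'http://172.16.11.7:9090/api/v1/query?query=up'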
IV. Monitoring MySQL (mysqld_exporter)
Performed on 172.16.11.8.
1. Download and configure
cd /usr/local
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz
tar xf mysqld_exporter-0.12.1.linux-amd64.tar.gz -C /usr/local/
mv mysqld_exporter-0.12.1.linux-amd64 mysqld_exporter
cd /usr/local/mysqld_exporter && vim .my.cnf
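The .my.cnf file holds the credentials mysqld_exporter uses to connect to MySQL; the article does not show its contents, so the following is only a minimal sketch with placeholder credentials, together with the privileges commonly granted to a dedicated exporter account:

# /usr/local/mysqld_exporter/.my.cnf  (placeholder credentials -- create your own account)
[client]
user=exporter
password=ExporterPassword123
host=127.0.0.1
port=3306

-- run in MySQL to create the account (adjust user/host/password to your setup)
CREATE USER 'exporter'@'127.0.0.1' IDENTIFIED BY 'ExporterPassword123';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'127.0.0.1';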
2. Start mysqld_exporter
cd /usr/local/mysqld_exporter
./mysqld_exporter --config.my-cnf="/usr/local/mysqld_exporter/.my.cnf" &
Once started it listens on port 9104.
3. Add the scrape target to the server configuration and restart
Performed on the Prometheus server, 172.16.11.7.
cd /usr/local/prometheus
vim prometheus.yml
Append the following job under scrape_configs:
  - job_name: 'mysql'
    static_configs:
      - targets: ['172.16.11.8:9104']
Restart Prometheus:
systemctl restart prometheus.service
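A quick sanity check that the exporter can reach MySQL: mysqld_exporter exposes a mysql_up metric that is 1 when the connection succeeds. Illustrative:

curl -s http://172.16.11.8:9104/metrics | grep '^mysql_up'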
V. Monitoring other system services on the node
Performed on 172.16.11.8.
To monitor systemd services on the node, node_exporter has to be started with the systemd collector enabled and a unit whitelist:
--collector.systemd.unit-whitelist=".+"                           regex matching all systemd units
--collector.systemd.unit-whitelist="(docker|sshd|nginx).service"  whitelist only these units
(Depending on the node_exporter version, this flag may be named --collector.systemd.unit-include instead; check ./node_exporter --help.)
# Monitor the docker, nginx and sshd services on the client
vi /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|sshd|nginx).service
[Install]
WantedBy=multi-user.target
Reload systemd and restart node_exporter:
systemctl daemon-reload
systemctl restart node_exporter
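To verify the systemd collector is working, node_systemd_unit_state series should now appear on the metrics endpoint, one set per whitelisted unit. Illustrative check:

curl -s http://172.16.11.8:9100/metrics | grep node_systemd_unit_state | head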
VI. Displaying Prometheus data in Grafana
1. Quick download and installation of Grafana
Performed on 172.16.11.7.
# Grafana 7.4.3 RPM from the Tsinghua mirror (a newer release can be fetched from grafana.com instead,
# e.g. https://dl.grafana.com/enterprise/release/grafana-enterprise-9.3.2-1.x86_64.rpm)
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm/grafana-7.4.3-1.x86_64.rpm
yum install -y initscripts fontconfig
yum install -y grafana-7.4.3-1.x86_64.rpm
systemctl start grafana-server.service
systemctl status grafana-server.service
After it starts, open ip:3000 in a browser (here http://172.16.11.7:3000).
The initial username and password are both admin.
If you later hit the error Error 1146: Table 'my2.status' doesn't exist from the MySQL dashboards, import the following SQL, which creates the my2 schema and the events that populate it:
create database IF NOT EXISTS my2;
use my2;
CREATE TABLE IF NOT EXISTS status ( VARIABLE_NAME varchar(64) CHARACTER SET utf8 NOT NULL DEFAULT '', VARIABLE_VALUE varchar(1024) CHARACTER SET utf8 DEFAULT NULL, TIMEST timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS current ( VARIABLE_NAME varchar(64) CHARACTER SET utf8 NOT NULL DEFAULT '', VARIABLE_VALUE varchar(1024) CHARACTER SET utf8 DEFAULT NULL ) ENGINE=InnoDB;
ALTER TABLE status ADD unique KEY idx01 (VARIABLE_NAME,timest);
-- delete from my2.status where VARIABLE_NAME like 'PROCESSES_HOSTS.%';
-- update my2.status set variable_value=0, timest=timest where VARIABLE_NAME like '%-d' and variable_value<0;
ALTER TABLE current ADD unique KEY idx02 (VARIABLE_NAME);

DROP PROCEDURE IF EXISTS collect_stats;
DELIMITER // ;
CREATE PROCEDURE collect_stats()
BEGIN
DECLARE a datetime;
DECLARE v varchar(10);
set sql_log_bin = 0;
set a=now();
select substr(version(),1,3) into v;
if v='5.7' OR v='8.0' then
insert into my2.status(variable_name,variable_value,timest) select upper(variable_name),variable_value, a from performance_schema.global_status where variable_value REGEXP '^-*[[:digit:]]+(\.[[:digit:]]+)?$' and variable_name not like 'Performance_schema_%' and variable_name not like 'SSL_%';
insert into my2.status(variable_name,variable_value,timest) SELECT 'replication_worker_time', coalesce(max(PROCESSLIST_TIME), 0.1), a FROM performance_schema.threads WHERE (NAME = 'thread/sql/slave_worker' AND (PROCESSLIST_STATE IS NULL OR PROCESSLIST_STATE != 'Waiting for an event from Coordinator')) OR NAME = 'thread/sql/slave_sql';
-- *** Comment the following 4 lines with 8.0 ***
else insert into my2.status(variable_name,variable_value,timest) select variable_name,variable_value,a from information_schema.global_status;
end if;
insert into my2.status(variable_name,variable_value,timest) select concat('PROCESSES.',user),count(*),a from information_schema.processlist group by user;
insert into my2.status(variable_name,variable_value,timest) select concat('PROCESSES_HOSTS.',SUBSTRING_INDEX(host,':',1)),count(*),a from information_schema.processlist group by concat('PROCESSES_HOSTS.',SUBSTRING_INDEX(host,':',1));
insert into my2.status(variable_name,variable_value,timest) select concat('PROCESSES_COMMAND.',command),count(*),a from information_schema.processlist group by concat('PROCESSES_COMMAND.',command);
insert into my2.status(variable_name,variable_value,timest) select substr(concat('PROCESSES_STATE.',state),1,64),count(*),a from information_schema.processlist group by substr(concat('PROCESSES_STATE.',state),1,64);
if v='5.6' OR v='5.7' OR v='8.0' OR v='10.' then insert into my2.status(variable_name,variable_value,timest) SELECT 'SUM_TIMER_WAIT', sum(sum_timer_wait*1.0), a FROM performance_schema.events_statements_summary_global_by_event_name;
end if;
-- Delta values
if v='5.7' OR v='8.0' then
insert into my2.status(variable_name,variable_value,timest) select concat(upper(s.variable_name),'-d'), greatest(s.variable_value-c.variable_value,0), a from performance_schema.global_status s, my2.current c where s.variable_name=c.variable_name;
insert into my2.status(variable_name,variable_value,timest) SELECT concat('COM_',upper(substr(s.EVENT_NAME,15,58)), '-d'), greatest(s.COUNT_STAR-c.variable_value,0), a FROM performance_schema.events_statements_summary_global_by_event_name s, my2.current c WHERE s.EVENT_NAME LIKE 'statement/sql/%' AND s.EVENT_NAME = c.variable_name;
insert into my2.status(variable_name,variable_value,timest) SELECT 'SUM_TIMER_WAIT-d', sum(sum_timer_wait*1.0)-c.variable_value, a FROM performance_schema.events_statements_summary_global_by_event_name, my2.current c WHERE c.variable_name='SUM_TIMER_WAIT';
insert into my2.status(variable_name, variable_value, timest) select 'replication_connection_status',if(SERVICE_STATE='ON', 1, 0),a from performance_schema.replication_connection_status;
insert into my2.status(variable_name, variable_value, timest) select 'replication_applier_status',if(SERVICE_STATE='ON', 1, 0),a from performance_schema.replication_applier_status;
delete from my2.current;
insert into my2.current(variable_name,variable_value) select upper(variable_name),variable_value+0 from performance_schema.global_status where variable_value REGEXP '^-*[[:digit:]]+(\.[[:digit:]]+)?$' and variable_name not like 'Performance_schema_%' and variable_name not like 'SSL_%';
insert into my2.current(variable_name,variable_value) SELECT substr(EVENT_NAME,1,40), COUNT_STAR FROM performance_schema.events_statements_summary_global_by_event_name WHERE EVENT_NAME LIKE 'statement/sql/%';
insert into my2.current(variable_name,variable_value) SELECT 'SUM_TIMER_WAIT', sum(sum_timer_wait*1.0) FROM performance_schema.events_statements_summary_global_by_event_name;
insert into my2.current(variable_name,variable_value) select concat('PROCESSES_COMMAND.',command),count(*) from information_schema.processlist group by concat('PROCESSES_COMMAND.',command);
insert into my2.current(variable_name,variable_value) select upper(variable_name),variable_value from performance_schema.global_variables where variable_name in ('max_connections', 'innodb_buffer_pool_size', 'query_cache_size', 'innodb_log_buffer_size', 'key_buffer_size', 'table_open_cache');
else
insert into my2.status(variable_name,variable_value,timest) select concat(upper(s.variable_name),'-d'), greatest(s.variable_value-c.variable_value,0), a from information_schema.global_status s, my2.current c where s.variable_name=c.variable_name;
delete from my2.current;
insert into my2.current(variable_name,variable_value) select upper(variable_name),variable_value+0 from information_schema.global_status where variable_value REGEXP '^-*[[:digit:]]+(\.[[:digit:]]+)?$' and variable_name not like 'Performance_schema_%' and variable_name not like 'SSL_%';
insert into my2.current(variable_name,variable_value) select upper(variable_name),variable_value from information_schema.global_variables where variable_name in ('max_connections', 'innodb_buffer_pool_size', 'query_cache_size', 'innodb_log_buffer_size', 'key_buffer_size', 'table_open_cache');
end if;
set sql_log_bin = 1;
END //
DELIMITER ; //

-- Collect daily statistics on space usage and delete old statistics (older than 62 days, 1 year for DB size)
DROP PROCEDURE IF EXISTS collect_daily_stats;
DELIMITER // ;
CREATE PROCEDURE collect_daily_stats()
BEGIN
DECLARE a datetime;
set sql_log_bin = 0;
set a=now();
insert into my2.status(variable_name,variable_value,timest) select concat('SIZEDB.',table_schema), sum(data_length+index_length), a from information_schema.tables group by table_schema;
insert into my2.status(variable_name,variable_value,timest) select 'SIZEDB.TOTAL', sum(data_length+index_length), a from information_schema.tables;
delete from my2.status where timest < date_sub(now(), INTERVAL 62 DAY) and variable_name <>'SIZEDB.TOTAL';
delete from my2.status where timest < date_sub(now(), INTERVAL 365 DAY);
set sql_log_bin = 1;
END //
DELIMITER ; //

-- The event scheduler must also be activated in the my.cnf (event_scheduler=1)
set global event_scheduler=1;
set sql_log_bin = 0;
DROP EVENT IF EXISTS collect_stats;
CREATE EVENT collect_stats ON SCHEDULE EVERY 10 Minute DO call collect_stats();
DROP EVENT IF EXISTS collect_daily_stats;
CREATE EVENT collect_daily_stats ON SCHEDULE EVERY 1 DAY DO call collect_daily_stats();
set sql_log_bin = 1;
2. Add the Prometheus data source
Configuration -> Data Sources -> Add data source -> Prometheus (URL: http://172.16.11.7:9090)
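The same data source can also be provisioned from a file instead of the UI. A minimal sketch, assuming Grafana's default provisioning directory (path and file name are an example):

# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://172.16.11.7:9090
    isDefault: true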
3. Add a dashboard showing basic Linux metrics
Create -> Import
4. Import dashboard template 8919
5. Select the data source
Click Import
6. View the dashboard
Dashboards -> Manage
VII. Displaying MySQL data
1. Set up the data source
2. Import a ready-made dashboard and point it at the MySQL data source created above
Ready-made dashboards can be downloaded from the Grafana website.
Click through to the MySQL data source section; here I pick the first dashboard listed.
The ID 7991 is copied from the website above; paste it, click Load, then select the MySQL data source.
VIII. Monitoring Redis (redis_exporter)
1. Install redis_exporter
Performed on 172.16.11.8.
cd /usr/local
wget https://github.com/oliver006/redis_exporter/releases/download/v0.15.0/redis_exporter-v0.15.0.linux-amd64.tar.gz
tar -xvf redis_exporter-v0.15.0.linux-amd64.tar.gz
2. Start redis_exporter
Performed on 172.16.11.8.
redis_exporter listens on port 9121 by default.
cd /usr/local
./redis_exporter -redis.addr redis://172.16.11.8:6379 &
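Once it is running, the exporter's own metrics report redis_up 1 when it can reach Redis. An illustrative check:

curl -s http://172.16.11.8:9121/metrics | grep '^redis_up'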
3. Add the Redis target to the Prometheus configuration and restart
Performed on 172.16.11.7.
vim /usr/local/prometheus/prometheus.yml
Append another job under scrape_configs:
  - job_name: 'Redis'
    static_configs:
      - targets: ['172.16.11.8:9121']
systemctl restart prometheus
————————————————
Copyright notice: this is an original article by CSDN blogger 兴乐安宁, licensed under CC 4.0 BY-SA. Please include the original link and this notice when reposting.
Original article: https://blog.csdn.net/weixin_42324463/article/details/128006734