Prometheus
https://prometheus.io/
From metrics to insight
Power your metrics and alerting with the leading open-source monitoring solution.
Architecture
https://juejin.cn/post/7201757033321267258
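- Prometheus Server: collects and stores time series data.
- Client Library: client libraries instrument application code. When Prometheus scrapes an instance's HTTP endpoint, the client library returns the current state of all tracked metrics to the Prometheus server.
- Exporters: Prometheus supports many exporters. An exporter collects metrics and exposes them for the Prometheus server to scrape. Any program that provides monitoring data to the Prometheus server can be called an exporter.
- Alertmanager: after receiving alerts from the Prometheus server, it deduplicates and groups them and routes them to the appropriate receivers. Common receivers include email, WeChat, DingTalk and Slack.
- Grafana: monitoring dashboards that visualize the monitoring data.
- Pushgateway: target hosts can push data to the Pushgateway, and the Prometheus server then pulls that data from the Pushgateway in one place.

The architecture diagram in the linked article shows the main parts of the Prometheus ecosystem: the Prometheus server, exporters, the Pushgateway, Alertmanager, Grafana and the web UI. The Prometheus server itself has three parts: Retrieval, Storage and PromQL.
- Retrieval: scrapes metrics from active targets.
- Storage: writes the collected data to disk.
- PromQL: the query language module provided by Prometheus.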
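The following minimal Python sketch illustrates the deduplicate-then-group step that Alertmanager performs. It rests on several simplifying assumptions. Alerts are plain label dictionaries. Two alerts with identical label sets count as duplicates. Grouping uses a group_by list of label names. The function dedupe_and_group and the sample alerts are hypothetical. The real Alertmanager also handles timing (group_wait, repeat_interval), silences, inhibition and a routing tree, none of which is modelled here.

```python
from collections import defaultdict


def dedupe_and_group(alerts, group_by):
    """Hypothetical, simplified model: drop duplicate alerts, then group by label values.

    alerts: list of label dicts; group_by: label names that form the group key.
    """
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        fingerprint = tuple(sorted(labels.items()))  # identical label sets are duplicates
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "InstanceDown", "job": "node", "instance": "host1:9100"},
    {"alertname": "InstanceDown", "job": "node", "instance": "host2:9100"},
    {"alertname": "InstanceDown", "job": "node", "instance": "host1:9100"},  # duplicate
    {"alertname": "HighLoad", "job": "node", "instance": "host1:9100"},
]
groups = dedupe_and_group(alerts, group_by=["alertname"])
for key, members in groups.items():
    print(key, len(members))  # one notification per group instead of per alert
```

Grouping by alertname means that a single notification can summarise both InstanceDown targets. This is the noise reduction the Alertmanager description refers to.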
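Prometheus workflow
1) The Prometheus server periodically pulls metrics from active (up) targets. Targets are defined either as static jobs or through service discovery, and in both cases pull is the default collection mode. Data can also be pushed to the Pushgateway, from which the Prometheus server then collects it. In addition, some components ship with their own exporters that expose their data;
2) The Prometheus server saves the collected metrics to local disk or to a database;
3) The collected metrics are stored as time series. Once alerting rules are configured, alerts that fire are sent to Alertmanager;
4) Alertmanager sends the alerts to its configured receivers, such as email, WeChat or DingTalk;
5) The web UI built into Prometheus supports the PromQL query language for querying monitoring data;
6) Grafana can use Prometheus as a data source and display the monitoring data as graphs.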
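The pull model in step 1 can be sketched with the Python standard library alone. A toy HTTP target serves metrics in the text exposition format at /metrics. A scrape function then fetches that page, roughly as the Retrieval component would. This is an illustration under assumptions: there is a single target, and the sketch has no scrape interval, retries or storage. MetricsHandler and scrape are hypothetical names, not part of Prometheus.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A toy scrape target. The metric text reuses the Node Exporter example in this
# note; a real exporter would compute these values.
METRICS_TEXT = (
    "# HELP node_load1 1m load average.\n"
    "# TYPE node_load1 gauge\n"
    "node_load1 3.0703125\n"
)


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_TEXT.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this demo


def scrape(url, timeout=5.0):
    """One pull: fetch the target's metrics page over HTTP and return it as text."""
    # Bypass any proxy environment variables so the local request goes direct.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Start the target on a free local port (port 0), scrape it once, then stop it.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
text = scrape(f"http://127.0.0.1:{server.server_address[1]}/metrics")
server.shutdown()
server.server_close()
print(text)
```

The target only answers requests and never initiates contact. This is the key difference from the Pushgateway path, where the host sends data to the gateway first.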
Understanding time series
https://www.prometheus.wang/promql/what-is-prometheus-metrics-and-labels.html
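In Section 1.2 of the linked guide, Prometheus collects sample data for every metric on the current host through the HTTP service exposed by Node Exporter. For example: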
# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 362812.7890625
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 3.0703125
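Each line that does not start with # is one sample collected by Node Exporter. node_cpu and node_load1 are the metric names. The labels in curly braces describe characteristics and dimensions of the sample. The floating-point number is the sample's value.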
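The sample-line structure just described (name, optional labels in braces, value) can be made concrete with a small parser. This is a deliberately simplified sketch. It assumes no escaped quotes or commas inside label values, no trailing timestamps and no exemplars. For real use, prefer an established parser such as text_string_to_metric_families from the prometheus_client.parser module. parse_samples is a hypothetical helper.

```python
import re

# name, optional {labels}, whitespace, value  (simplified: no timestamps)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return (metric_name, labels_dict, float_value) for every non-comment line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE lines carry metadata, not samples
        m = SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line!r}")
        name, label_str, value = m.groups()
        labels = dict(LABEL_RE.findall(label_str or ""))
        samples.append((name, labels, float(value)))
    return samples


exposition = """\
# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 362812.7890625
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 3.0703125
"""
for name, labels, value in parse_samples(exposition):
    print(name, labels, value)
```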
Samples
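Prometheus stores all collected samples as time series in an in-memory database and periodically persists them to disk. A time series holds (timestamp, value) pairs in time order, which we call a vector. Each time series is identified by a metric name and a set of labels (labelset). As shown below, the time series can be pictured as a matrix of numbers with time along the horizontal axis: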
  ^
  │ . . . . . . . . . . . . . . . . . . . node_cpu{cpu="cpu0",mode="idle"}
  │ . . . . . . . . . . . . . . . . . . . node_cpu{cpu="cpu0",mode="system"}
  │ . . . . . . . . . . . . . . . . . .   node_load1{}
  │ . . . . . . . . . . . . . . . . . .
  v
    <------------------ time ---------------->
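Each point in a time series is called a sample. A sample consists of three parts:
- Metric: the metric name and the labelset describing the sample's characteristics;
- Timestamp: a timestamp with millisecond precision;
- Value: a float64 number holding the sample's value.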
<--------------- metric ---------------------><-timestamp -><-value->
http_request_total{status="200", method="GET"}@1434417560938 => 94355
http_request_total{status="200", method="GET"}@1434417561287 => 94334
http_request_total{status="404", method="GET"}@1434417560938 => 38473
http_request_total{status="404", method="GET"}@1434417561287 => 38544
http_request_total{status="200", method="POST"}@1434417560938 => 4748
http_request_total{status="200", method="POST"}@1434417561287 => 4785
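The six samples above belong to only three time series, because a series is identified by its metric name plus its full labelset. The following sketch groups them that way. Sample and series_key are hypothetical names used for illustration. Sorting the label pairs makes the key independent of label order.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    metric_name: str
    labels: dict        # labelset describing the sample
    timestamp_ms: int   # millisecond-precision timestamp
    value: float        # float64 sample value


def series_key(name, labels):
    """A series is identified by metric name + full label set (order-independent)."""
    return (name, tuple(sorted(labels.items())))


# The six samples listed above.
samples = [
    Sample("http_request_total", {"status": "200", "method": "GET"}, 1434417560938, 94355.0),
    Sample("http_request_total", {"status": "200", "method": "GET"}, 1434417561287, 94334.0),
    Sample("http_request_total", {"status": "404", "method": "GET"}, 1434417560938, 38473.0),
    Sample("http_request_total", {"status": "404", "method": "GET"}, 1434417561287, 38544.0),
    Sample("http_request_total", {"status": "200", "method": "POST"}, 1434417560938, 4748.0),
    Sample("http_request_total", {"status": "200", "method": "POST"}, 1434417561287, 4785.0),
]

series = defaultdict(list)
for s in samples:
    series[series_key(s.metric_name, s.labels)].append((s.timestamp_ms, s.value))

for key, points in series.items():
    print(key, points)
```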
Label and relabel
https://grafana.com/blog/2022/03/21/how-relabeling-in-prometheus-works/#available-actions
Prometheus labels
Labels are sets of key-value pairs that allow us to characterize and organize what’s actually being measured in a Prometheus metric.
For example, when measuring HTTP latency, we might use labels to record the HTTP method and status returned, which endpoint was called, and which server was responsible for the request.
Each unique combination of key-value label pairs is stored as a new time series in Prometheus. Labels are therefore crucial for understanding the data's cardinality. Unbounded sets of values, such as user IDs or request IDs, should be avoided as labels.
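Cardinality grows multiplicatively, as this back-of-the-envelope calculation shows. In the worst case, one metric produces one series for every combination of its label values. The label values below are hypothetical examples for an HTTP latency metric, and max_series is an illustrative helper.

```python
import math


def max_series(*label_values):
    """Worst-case series count for one metric: one series per combination of label values."""
    return math.prod(len(set(values)) for values in label_values)


methods = ["GET", "POST"]
statuses = ["200", "404", "500"]
endpoints = ["/api/comments", "/api/users"]
servers = ["web01", "web02"]
print(max_series(methods, statuses, endpoints, servers))  # 24

# Adding a single unbounded-style label (10,000 distinct users) multiplies that by 10,000.
user_ids = [f"user{i}" for i in range(10_000)]
print(max_series(methods, statuses, endpoints, servers, user_ids))  # 240000
```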
Internal labels
But what about metrics with no labels? Prometheus also provides some internal labels for us. These begin with two underscores and are removed after all relabeling steps are applied; that means they will not be available unless we explicitly configure them to.
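The removal rule just described can be sketched for target labels. Once relabeling finishes, labels that begin with two underscores are dropped. Prometheus also fills in instance from __address__ if relabeling did not set it. The address and the Kubernetes pod name below are hypothetical, and finalize_target_labels is an illustrative helper, not a Prometheus API. The sketch does not cover __name__ on scraped samples, which is the metric name itself.

```python
def finalize_target_labels(labels):
    """Sketch of the end of target relabeling: keep only user-visible labels."""
    final = dict(labels)
    # Prometheus defaults instance to __address__ when relabeling did not set it.
    if "instance" not in final and "__address__" in final:
        final["instance"] = final["__address__"]
    # Labels starting with two underscores are removed after relabeling.
    return {name: value for name, value in final.items() if not name.startswith("__")}


discovered = {
    "__address__": "10.0.0.5:9100",
    "__scheme__": "http",
    "__metrics_path__": "/metrics",
    "__meta_kubernetes_pod_name": "node-exporter-abc12",  # hypothetical SD metadata
    "job": "node",
}
print(finalize_target_labels(discovered))  # {'job': 'node', 'instance': '10.0.0.5:9100'}
```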
Some of these special labels available to us are:
- __name__: the scraped metric's name
- __address__: host:port of the scrape target
- __scheme__: URI scheme of the scrape target
- __metrics_path__: metrics endpoint of the scrape target
- __param_<name>: the value of the first URL parameter called <name> passed to the target
- __scrape_interval__: the target's scrape interval (experimental)
- __scrape_timeout__: the target's timeout (experimental)
- __meta_: special labels set by the service discovery mechanism
- __tmp: special prefix used to temporarily store label values before discarding them

So now that we understand what the input is for the various relabel_config rules, how do we create one? And what can they actually be used for?
The base <relabel_config> block
A <relabel_config> consists of seven fields. These are:
- source_labels
- separator (default = ;)
- target_label
- regex (default = (.*))
- modulus
- replacement (default = $1)
- action (default = replace)
A Prometheus configuration may contain an array of relabeling steps; they are applied to the label set in the order they’re defined in. Omitted fields take on their default value, so these steps will usually be shorter.
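The defaults and the ordered application of steps can be modelled in a short sketch of the replace action. Several behaviours are built in as assumptions. Omitted fields take the defaults listed above. Source label values are joined with the separator, and a missing label contributes an empty string. The regex must match the whole joined value. On a match, the replacement is written to target_label, and an empty result removes that label. Only $1 and ${1} style references are translated to Python syntax; named references and the other actions (keep, drop, hashmod, labelmap and so on) are not modelled. apply_step and relabel are hypothetical helpers.

```python
import re

# Defaults for omitted fields, as listed above (modulus only matters for hashmod).
DEFAULTS = {"separator": ";", "regex": "(.*)", "replacement": "$1", "action": "replace"}


def apply_step(labels, step):
    """Apply one relabeling step; this sketch only implements action=replace."""
    cfg = {**DEFAULTS, **step}
    if cfg["action"] != "replace":
        raise NotImplementedError("only the 'replace' action is modelled here")
    # Join source label values; a missing label contributes an empty string.
    value = cfg["separator"].join(labels.get(n, "") for n in cfg.get("source_labels", []))
    match = re.fullmatch(cfg["regex"], value)  # relabel regexes are fully anchored
    if match is None:
        return dict(labels)  # no match: the label set is left unchanged
    # Translate $1 / ${1} references into Python's \g<1> syntax.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", cfg["replacement"])
    result = match.expand(template)
    out = dict(labels)
    if result:
        out[cfg["target_label"]] = result
    else:
        out.pop(cfg["target_label"], None)  # an empty result removes the target label
    return out


def relabel(labels, steps):
    """Run the steps in the order they are defined."""
    for step in steps:
        labels = apply_step(labels, step)
    return labels


labels = {"__address__": "10.0.0.5:9100", "job": "node"}
steps = [
    # Hypothetical step: copy the host part of __address__ into a new "host" label.
    {"source_labels": ["__address__"], "regex": r"(.*):\d+", "target_label": "host"},
    # Every other field defaulted: copy job's value into "team" unchanged.
    {"source_labels": ["job"], "target_label": "team"},
]
print(relabel(labels, steps))
```

The second step shows why steps written against the defaults stay short. It names only source_labels and target_label.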
source_labels and separator
Let’s start off with source_labels. It expects an array of one or more label names, which are used to select the respective label values. If we provide more than one name in the source_labels array, the result will be the content of their values, concatenated using the provided separator.
As an example, consider the following two metrics:
my_custom_counter_total{server="webserver01",subsystem="kata"} 192 1644075044000
my_custom_counter_total{server="sqldatabase",subsystem="kata"} 147 1644075044000
The following relabel_config:
source_labels: [subsystem, server]
separator: "@"
would extract these values:
kata@webserver01
kata@sqldatabase
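The concatenation step can be checked in a few lines. The values of source_labels are taken in the listed order and joined with the separator. The last call shows the default ";" separator. join_source_labels is a hypothetical helper, and a missing label is treated as an empty string.

```python
def join_source_labels(labels, source_labels, separator=";"):
    """Concatenate the values of source_labels, in order, joined by separator."""
    return separator.join(labels.get(name, "") for name in source_labels)


metrics = [
    {"server": "webserver01", "subsystem": "kata"},
    {"server": "sqldatabase", "subsystem": "kata"},
]
for labels in metrics:
    print(join_source_labels(labels, ["subsystem", "server"], separator="@"))
# kata@webserver01
# kata@sqldatabase

print(join_source_labels(metrics[0], ["subsystem", "server"]))  # kata;webserver01
```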
PromQL
https://prometheus.io/docs/prometheus/latest/querying/examples/
Simple time series selection
Return all time series with the metric http_requests_total:
http_requests_total
Return all time series with the metric http_requests_total and the given job and handler labels:
http_requests_total{job="apiserver", handler="/api/comments"}
Return a whole range of time (in this case 5 minutes up to the query time) for the same vector, making it a range vector:
http_requests_total{job="apiserver", handler="/api/comments"}[5m]
Note that an expression resulting in a range vector cannot be graphed directly, but it can be viewed in the tabular ("Console") view of the expression browser.
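The [5m] range selector can be pictured as a filter over one series' samples that keeps those inside the last five minutes before the evaluation time. This sketch assumes a left-open window (t - 5m, t]. That matches how Prometheus 3.x defines range selectors; older 2.x releases also kept a sample lying exactly on the left boundary. The series data and select_range are hypothetical.

```python
def select_range(points, eval_time_ms, range_ms):
    """Keep the (timestamp_ms, value) points inside (eval_time - range, eval_time]."""
    start = eval_time_ms - range_ms
    return [(ts, v) for ts, v in points if start < ts <= eval_time_ms]


# Hypothetical series: one sample per minute for 10 minutes, starting at t=0.
points = [(i * 60_000, float(i)) for i in range(11)]
five_minutes_ms = 5 * 60 * 1000
window = select_range(points, eval_time_ms=600_000, range_ms=five_minutes_ms)
print(window)  # [(360000, 6.0), (420000, 7.0), (480000, 8.0), (540000, 9.0), (600000, 10.0)]
```

The result is a list of points rather than a single value per series. This is why a range vector has no direct graph: functions such as rate() turn it back into one value per series.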
Monitoring Linux host metrics with the Node Exporter
https://prometheus.io/docs/guides/node-exporter/
Simple Demo
https://github.com/fanqingsong/docker-prometheus
Prometheus Monitoring
This repository contains minimal Prometheus Server, NodeExporter, BlackBoxExporter, AlertManager and Grafana implementation for monitoring various services. You can use this repository to monitor a bare-metal Linux instance or to monitor Apache, NGINX or other HTTP based services using Prometheus.
Monitoring a Bare-Metal Linux Server
To monitor a stand-alone Linux server, check out tag v1.0 of the repository, which contains all the configuration for monitoring a stand-alone Linux server. Just run
docker-compose up -d
and you're good to go. (With tag v1.0, you have to map alerts manually.)
Monitoring HTTP-based Web Services
The v1.1 tag of the repository monitors two HTTP-based web services by default: an Apache httpd server and an NGINX server, both running in Docker containers. If either or both of them go down, Prometheus fires alerts, which are sent as emails as specified in the config.yml file in the AlertManager folder.
https://github.com/prometheus/blackbox_exporter
Checking the results
Visiting http://localhost:9115/probe?target=google.com&module=http_2xx will return metrics for an HTTP probe against google.com. The probe_success metric indicates whether the probe succeeded. Adding a debug=true parameter will return debug information for that probe.
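The probe URL shown above can be built programmatically, which helps when probing many targets from a script. This sketch only constructs the URL and sends nothing over the network. The localhost:9115 base address and the http_2xx module name come from the text above, and probe_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def probe_url(target, module="http_2xx", debug=False, base="http://localhost:9115"):
    """Build a blackbox_exporter /probe URL for one target."""
    params = {"target": target, "module": module}
    if debug:
        params["debug"] = "true"  # ask the exporter for debug output instead of metrics
    return f"{base}/probe?{urlencode(params)}"


print(probe_url("google.com"))
# http://localhost:9115/probe?target=google.com&module=http_2xx
print(probe_url("google.com", debug=True))
# http://localhost:9115/probe?target=google.com&module=http_2xx&debug=true
```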
https://www.cnblogs.com/cyleon/p/12876897.html
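- HTTP probes: define request headers; check the HTTP status, HTTP response headers and HTTP body content
- TCP probes: check that service component ports are listening; define and check application-layer protocols
- ICMP probes: host liveness checks
- POST probes: API connectivity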
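For the TCP case, the core idea of a connect check fits in a few lines. The sketch opens a TCP connection with a timeout and reports success plus elapsed time. It is an illustration only, not blackbox_exporter's implementation. tcp_probe is a hypothetical helper whose two return values merely mirror the exporter's probe_success and probe_duration_seconds metrics.

```python
import socket
import time


def tcp_probe(host, port, timeout=2.0):
    """Try to open a TCP connection; return (success as 1/0, elapsed seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            success = 1
    except OSError:  # refused, unreachable, timed out, name resolution failure, ...
        success = 0
    return success, time.monotonic() - start


# Demo against a throwaway local listener on a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]
print(tcp_probe("127.0.0.1", port))  # success is 1 while the port is listening
listener.close()
print(tcp_probe("127.0.0.1", port))  # success is 0 once nothing listens there
```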
https://github.com/prometheus/node_exporter
If you are new to Prometheus and node_exporter, there is a simple step-by-step guide. The node_exporter listens on HTTP port 9100 by default. See the --help output for more options.
Custom exporters
https://github.com/prometheus/client_python#counter
from prometheus_client import start_http_server, Summary
import random
import time

# Create a metric to track time spent and requests made.
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')

# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
    """A dummy function that takes some time."""
    time.sleep(t)

if __name__ == '__main__':
    # Start up the server to expose the metrics.
    start_http_server(8000)
    # Generate some requests.
    while True:
        process_request(random.random())
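To see what the decorator in the example records, here is a standard-library stand-in. In client_python, a Summary exposes a _count series and a _sum series (plus _created) and computes no quantiles. The stand-in keeps just the count and the sum, updating them each time the decorated function returns or raises. MiniSummary is a hypothetical class written for illustration. It is not the prometheus_client API and serves no HTTP endpoint.

```python
import functools
import threading
import time


class MiniSummary:
    """Hypothetical toy Summary: tracks observation count and sum only."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.sum = 0.0
        self._lock = threading.Lock()  # observations may come from several threads

    def observe(self, value):
        with self._lock:
            self.count += 1
            self.sum += value

    def time(self):
        """Decorator factory: record each call's duration in seconds."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()  # refers to the time module, not this method
                try:
                    return func(*args, **kwargs)
                finally:
                    self.observe(time.perf_counter() - start)
            return wrapper
        return decorator

    def exposition(self):
        return f"{self.name}_count {float(self.count)}\n{self.name}_sum {self.sum}\n"


REQUEST_TIME = MiniSummary("request_processing_seconds")


@REQUEST_TIME.time()
def process_request(t):
    time.sleep(t)


for _ in range(3):
    process_request(0.01)
print(REQUEST_TIME.exposition())
```

Dividing the _sum series by the _count series gives the average request duration. In PromQL this is usually done on rates of the two series.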
From: https://www.cnblogs.com/lightsong/p/17514175.html