Windows Azure Platform article series index
I. Current situation and requirements
1. The customer team uses the Prometheus CloudWatch Exporter to integrate AWS monitoring metrics with Prometheus:
https://github.com/prometheus/cloudwatch_exporter
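The customer team would like Microsoft Azure to offer a similar exporter, so that Azure monitoring metrics (virtual machines, Redis PaaS, MySQL PaaS, and so on) can also be integrated with Prometheus.
II. Notes
Microsoft does not currently provide an official exporter, but there is a third-party open-source solution: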
https://github.com/webdevops/azure-metrics-exporter
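III. Technical implementation
The solution is built on the Azure SDK for Go and implements an Azure Monitor metrics exporter.
IV. Key implementation steps
1. Create and use an Azure subscription. Steps omitted.
2. Create a Service Principal and grant it the Reader role on the subscription. Detailed steps omitted.
3. Provision an Azure virtual machine. We use CentOS 7.9 as the example here. Detailed steps omitted.
4. Set the environment variables by editing ~/.bashrc: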
vi ~/.bashrc
5. Add the Service Principal details:
#App ID
export AZURE_CLIENT_ID=XXXXXXXX
#Tenant ID
export AZURE_TENANT_ID=XXXXXXXX
#App Key
export AZURE_CLIENT_SECRET=XXXXXXXX
6. Apply the environment variables:
source ~/.bashrc
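Before starting the exporter, it can help to confirm that the three variables are visible to new processes. The following is a minimal sketch, not part of the exporter: the helper name `missing_azure_env` is hypothetical, and it only checks that the variables are non-empty, not that the credentials are valid.

```python
import os

# The three variables exported in ~/.bashrc in the steps above.
REQUIRED_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_azure_env(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_env(os.environ)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Service Principal variables are set.")
```

Run it from the same shell in which `source ~/.bashrc` was executed, since that is the environment the exporter will inherit.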
V. Install Prometheus
1. I use Prometheus 2.50.1 here. Installation steps omitted.
2. Download and run the Azure Monitor Metric Exporter project. The release files are at: https://github.com/webdevops/azure-metrics-exporter/releases
3. We start by downloading version 24.2.0: https://github.com/webdevops/azure-metrics-exporter/releases/download/24.2.0/azure-metrics-exporter.linux.amd64
4. After the download finishes, run it:
nohup ./azure-metrics-exporter.linux.amd64 &
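VI. Configure the Prometheus yml file for Azure Storage
Edit prometheus.yml and add the content below.
1. job_name sets the name of the job.
2. Line 7 of the snippet below is my subscription ID. The PE team should replace it with their own.
3. Line 11 of the snippet below is the metric name. Here we query BlobCapacity, the storage capacity.
4. Note the port number on line 21 of the snippet below: 8080.
For the available metrics, see the documentation: https://github.com/webdevops/azure-metrics-exporter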
  - job_name: azure-metrics-storageaccount-connections
    scrape_interval: 1m
    metrics_path: /probe/metrics/list
    params:
      name: ["my_own_metric_name"]
      subscription:
        - 166157a8-9ce9-400b-91c7-1d42482b83d6
      resourceType: ["Microsoft.Storage/storageAccounts"]
      metricNamespace: ["Microsoft.Storage/storageAccounts/blobServices"]
      metric:
        - BlobCapacity
      interval: ["PT1H"]
      timespan: ["PT1H"]
      aggregation:
        - average
        - count
      # by blobtype (dimension support)
      # metricFilter: ["BlobType eq '*'"]
      metricTop: ["10"]
    static_configs:
      - targets: ["localhost:8080"]
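Prometheus sends the entries under `params` as URL query parameters on every scrape, so the same request can be reproduced by hand when debugging. The sketch below rebuilds that URL from the storage job; the helper name `build_probe_url` is hypothetical, and the exact encoding Prometheus emits may differ in parameter order.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Same values as the params section of the storage job above.
STORAGE_PARAMS = {
    "name": ["my_own_metric_name"],
    "subscription": ["166157a8-9ce9-400b-91c7-1d42482b83d6"],
    "resourceType": ["Microsoft.Storage/storageAccounts"],
    "metricNamespace": ["Microsoft.Storage/storageAccounts/blobServices"],
    "metric": ["BlobCapacity"],
    "interval": ["PT1H"],
    "timespan": ["PT1H"],
    "aggregation": ["average", "count"],
    "metricTop": ["10"],
}


def build_probe_url(target, metrics_path, params):
    """Join target, path and params into one URL; list values repeat the key."""
    query = urlencode(params, doseq=True)
    return f"http://{target}{metrics_path}?{query}"


if __name__ == "__main__":
    url = build_probe_url("localhost:8080", "/probe/metrics/list", STORAGE_PARAMS)
    print(url)
    # Decode the query string again to show which values each key carries.
    print(parse_qs(urlsplit(url).query))
```

The printed URL can be opened in a browser or fetched with curl on the exporter host to see the same output Prometheus would scrape.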
VII. Configure the Prometheus yml file for Azure MySQL Flexible Server
1. The following is the Prometheus configuration for Azure MySQL Flexible Server.
2. For the available metrics, see: https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-dbformysql-flexibleservers-metrics
  - job_name: azure-metrics-databases
    scrape_interval: 1m
    metrics_path: /probe/metrics/list
    params:
      name: ["azure-database"]
      subscription:
        - 166157a8-9ce9-400b-91c7-1d42482b83d6
      #filter: ["resourceType eq 'Microsoft.DBforMySQL/servers'"]
      resourceType: ["Microsoft.DBforMySQL/flexibleServers"]
      #metricNamespace: ["Microsoft.DBforMySQL/flexibleServers"]
      metric:
        - cpu_percent
        - memory_percent
      interval: ["PT1M"]
      timespan: ["2024-08-09T07:00:00Z/2024-08-09T08:00:00Z"]
      aggregation:
        - average
        #- count
      # by blobtype (dimension support)
      # metricFilter: ["BlobType eq '*'"]
      metricTop: ["10"]
    static_configs:
      - targets: ["localhost:8080"]
VIII. Azure PostgreSQL Flexible Server
1. The following is the Prometheus configuration for Azure PGSQL Flexible Server.
2. Two metrics are monitored: CPU utilization and memory utilization.
3. For the available metrics, see: https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-dbforpostgresql-flexibleservers-metrics
  - job_name: azure-metrics-pgsql
    scrape_interval: 1m
    metrics_path: /probe/metrics/list
    params:
      name: ["azure-pgsql"]
      subscription:
        - 166157a8-9ce9-400b-91c7-1d42482b83d6
      resourceType: ["Microsoft.DBforPostgreSQL/flexibleServers"]
      metric:
        - cpu_percent
        - memory_percent
      interval: ["PT1M"]
      # P7D means the last 7 days
      timespan: ["P7D"]
      aggregation:
        - average
        #- count
      metricTop: ["20"]
    static_configs:
      - targets: ["localhost:8080"]
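The `interval` and `timespan` values used in these jobs (PT1M, PT1H, P7D) are ISO 8601 durations. To make the notation concrete, here is a minimal sketch that converts such strings to Python `timedelta` objects. The function `parse_duration` is hypothetical and handles only days, hours, minutes and seconds; it is not how the exporter or Azure Monitor parses these values.

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations: P[nD][T[nH][nM][nS]]
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Convert a duration such as 'P7D' or 'PT1M' into a timedelta."""
    match = _DURATION.match(text)
    parts = {}
    if match:
        # Keep only the components that were actually present in the string.
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


if __name__ == "__main__":
    for value in ("PT1M", "PT1H", "P7D"):
        print(value, "=", parse_duration(value))
```

For example, `parse_duration("P7D") // parse_duration("PT1M")` evaluates to 10080, the number of one-minute intervals in a seven-day window. This is plain arithmetic and says nothing about how many data points the exporter actually returns.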
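IX. Filter by PGSQL Flexible Server name equal to a value
1. The following is the Prometheus configuration for Azure PGSQL Flexible Server.
2. The resource type shown is PGSQL Flexible Server.
3. The filter keeps only the PGSQL server whose name equals leipgsql01.
4. This uses the OData operator eq (equal), which matches an exact value.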
  - job_name: azure-metrics-pgsql
    scrape_interval: 1m
    metrics_path: /probe/metrics/list
    params:
      name: ["azure-pgsql"]
      subscription:
        - 166157a8-9ce9-400b-91c7-1d42482b83d6
      filter: ["resourceType eq 'Microsoft.DBforPostgreSQL/flexibleServers' and name eq 'leipgsql01'"]
      #filter: ["resourceName eq 'leipgsql01'"]
      #resourceType: ["Microsoft.DBforPostgreSQL/flexibleServers"]
      #metricNamespace: ["Microsoft.DBforMySQL/flexibleServers"]
      metric:
        - cpu_percent
        - memory_percent
      interval: ["PT1M"]
      #timespan: ["2024-12-26T03:00:00Z/2024-12-28T08:00:00Z"]
      timespan: ["P7D"]
      aggregation:
        - average
        #- count
      # by blobtype (dimension support)
      # metricFilter: ["BlobType eq '*'"]
      metricTop: ["20"]
    static_configs:
      - targets: ["localhost:8080"]
- Result: only the cpu_percent and memory_percent metrics of the server named leipgsql01 are shown:
# HELP azure_pgsql Azure monitor insight metric
# TYPE azure_pgsql gauge
azure_pgsql{aggregation="average",interval="PT1M",metric="cpu_percent",resourceGroup="sig-rg",resourceID="/subscriptions/166157a8-9ce9-400b-91c7-1d42482b83d6/resourcegroups/sig-rg/providers/microsoft.dbforpostgresql/flexibleservers/leipgsql01",resourceName="leipgsql01",subscriptionID="166157a8-9ce9-400b-91c7-1d42482b83d6",subscriptionName="leizhang-non-prod",tag_owner="",timespan="P7D",unit="Percent"} 10.5
azure_pgsql{aggregation="average",interval="PT1M",metric="memory_percent",resourceGroup="sig-rg",resourceID="/subscriptions/166157a8-9ce9-400b-91c7-1d42482b83d6/resourcegroups/sig-rg/providers/microsoft.dbforpostgresql/flexibleservers/leipgsql01",resourceName="leipgsql01",subscriptionID="166157a8-9ce9-400b-91c7-1d42482b83d6",subscriptionName="leizhang-non-prod",tag_owner="",timespan="P7D",unit="Percent"} 65.5
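The exporter returns the Prometheus text exposition format: comment lines starting with `#`, then one sample per line in the form `name{labels} value`. As a rough illustration of how to read it outside Prometheus, the sketch below extracts the labels and value from each sample. The function `parse_samples` is hypothetical, and the example text is the output above with some labels dropped for brevity. It does not unescape label values and skips samples without labels, so treat it as a debugging aid rather than a full parser.

```python
import re

# name{label="value",...} value
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# One label pair; the value may contain backslash-escaped characters.
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Abbreviated copy of the exporter output shown above (fewer labels).
EXAMPLE = """# HELP azure_pgsql Azure monitor insight metric
# TYPE azure_pgsql gauge
azure_pgsql{aggregation="average",interval="PT1M",metric="cpu_percent",resourceName="leipgsql01",timespan="P7D",unit="Percent"} 10.5
azure_pgsql{aggregation="average",interval="PT1M",metric="memory_percent",resourceName="leipgsql01",timespan="P7D",unit="Percent"} 65.5
"""


def parse_samples(text):
    """Return (metric_name, labels_dict, value) for each labelled sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # samples without labels are ignored in this sketch
        labels = dict(_LABEL.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_samples(EXAMPLE):
        print(labels["resourceName"], labels["metric"], value, labels["unit"])
```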
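X. Filter by PGSQL Flexible Server name not equal to a value
1. The following is the Prometheus configuration for Azure PGSQL Flexible Server.
2. The resource type shown is PGSQL Flexible Server.
3. The filter keeps only PGSQL servers whose name is not equal to leipgsql01.
4. This uses the OData operator ne (not equal), which excludes an exact value.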
  - job_name: azure-metrics-pgsql
    scrape_interval: 1m
    metrics_path: /probe/metrics/list
    params:
      name: ["azure-pgsql"]
      subscription:
        - 166157a8-9ce9-400b-91c7-1d42482b83d6
      filter: ["resourceType eq 'Microsoft.DBforPostgreSQL/flexibleServers' and name ne 'leipgsql01'"]
      #filter: ["resourceName eq 'leipgsql01'"]
      #resourceType: ["Microsoft.DBforPostgreSQL/flexibleServers"]
      #metricNamespace: ["Microsoft.DBforMySQL/flexibleServers"]
      metric:
        - cpu_percent
        - memory_percent
      interval: ["PT1M"]
      #timespan: ["2024-12-26T03:00:00Z/2024-12-28T08:00:00Z"]
      timespan: ["P7D"]
      aggregation:
        - average
        #- count
      # by blobtype (dimension support)
      # metricFilter: ["BlobType eq '*'"]
      metricTop: ["20"]
    static_configs:
      - targets: ["localhost:8080"]
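XI. Filter by PGSQL Flexible Server name containing a value
1. The following is the Prometheus configuration for Azure PGSQL Flexible Server.
2. The resource type shown is PGSQL Flexible Server.
3. The filter keeps only PGSQL servers whose name contains lei.
4. This uses the OData function substringof.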
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: azure-metrics-pgsql
    scrape_interval: 1m
    metrics_path: /probe/metrics/list
    params:
      name: ["azure-pgsql"]
      subscription:
        - 166157a8-9ce9-400b-91c7-1d42482b83d6
      filter: ["resourceType eq 'Microsoft.DBforPostgreSQL/flexibleServers' and substringof('lei',name)"]
      #filter: ["resourceName eq 'leipgsql01'"]
      #resourceType: ["Microsoft.DBforPostgreSQL/flexibleServers"]
      #metricNamespace: ["Microsoft.DBforMySQL/flexibleServers"]
      metric:
        - cpu_percent
        - memory_percent
      interval: ["PT1M"]
      #timespan: ["2024-12-26T03:00:00Z/2024-12-28T08:00:00Z"]
      timespan: ["P7D"]
      aggregation:
        - average
        #- count
      # by blobtype (dimension support)
      # metricFilter: ["BlobType eq '*'"]
      metricTop: ["20"]
    static_configs:
      - targets: ["localhost:8080"]
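The three `filter` strings in sections IX to XI differ only in the name clause. The sketch below shows that pattern by generating each of them from a resource type, an operator and a value. The helper `name_filter` is hypothetical and not part of the exporter. It covers only the three forms used in this article, and the doubling of single quotes follows the general OData convention for string literals, which I have not verified against Azure for resource names.

```python
PGSQL_TYPE = "Microsoft.DBforPostgreSQL/flexibleServers"


def _quote(value):
    """Wrap value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def name_filter(resource_type, operator, value):
    """Build 'resourceType eq ... and <name clause>' for eq, ne or substringof."""
    type_clause = f"resourceType eq {_quote(resource_type)}"
    if operator in ("eq", "ne"):
        name_clause = f"name {operator} {_quote(value)}"
    elif operator == "substringof":
        # substringof takes the search text first, then the field name.
        name_clause = f"substringof({_quote(value)},name)"
    else:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"{type_clause} and {name_clause}"


if __name__ == "__main__":
    print(name_filter(PGSQL_TYPE, "eq", "leipgsql01"))
    print(name_filter(PGSQL_TYPE, "ne", "leipgsql01"))
    print(name_filter(PGSQL_TYPE, "substringof", "lei"))
```

Each printed line can be pasted into the `filter` entry of the corresponding job.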
XII. Start Prometheus
1. Run the command:
./prometheus --config.file=prometheus.yml
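2. The default Prometheus port is 9090.
3. Open a browser and go to http://ip:9090, then click Status, Targets.
4. The azure-metrics-storageaccount-connections target listed there is the job we configured earlier.
5. Opening the exporter endpoint of that target shows the names of all storage accounts in my subscription together with their capacity: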
# HELP my_own_metric_name Azure monitor insight metric
# TYPE my_own_metric_name gauge
my_own_metric_name{aggregation="average",interval="PT1H",metric="BlobCapacity",resourceGroup="cdn-rg",resourceID="/subscriptions/166157a8-9ce9-400b-91c7-1d42482b83d6/resourcegroups/cdn-rg/providers/microsoft.storage/storageaccounts/leicdnoriginalstorage",resourceName="leicdnoriginalstorage",subscriptionID="166157a8-9ce9-400b-91c7-1d42482b83d6",subscriptionName="leizhang-non-prod",tag_owner="",timespan="PT1H",unit="Bytes"} 33835
my_own_metric_name{aggregation="average",interval="PT1H",metric="BlobCapacity",resourceGroup="cloud-shell-storage-southeastasia",resourceID="/subscriptions/166157a8-9ce9-400b-91c7-1d42482b83d6/resourcegroups/cloud-shell-storage-southeastasia/providers/microsoft.storage/storageaccounts/cs110032002647d220b",resourceName="cs110032002647d220b",subscriptionID="166157a8-9ce9-400b-91c7-1d42482b83d6",subscriptionName="leizhang-non-prod",tag_owner="",timespan="PT1H",unit="Bytes"} 0
my_own_metric_name{aggregation="average",interval="PT1H",metric="BlobCapacity",resourceGroup="fw-hybrid-test",resourceID="/subscriptions/166157a8-9ce9-400b-91c7-1d42482b83d6/resourcegroups/fw-hybrid-test/providers/microsoft.storage/storageaccounts/niostoragetest01",resourceName="niostoragetest01",subscriptionID="166157a8-9ce9-400b-91c7-1d42482b83d6",subscriptionName="leizhang-non-prod",tag_owner="",timespan="PT1H",unit="Bytes"} 0
my_own_metric_name{aggregation="average",interval="PT1H",metric="BlobCapacity",resourceGroup="lab-rg",resourceID="/subscriptions/166157a8-9ce9-400b-91c7-1d42482b83d6/resourcegroups/lab-rg/providers/microsoft.storage/storageaccounts/leiadls",resourceName="leiadls",subscriptionID="166157a8-9ce9-400b-91c7-1d42482b83d6",subscriptionName="leizhang-non-prod",tag_owner="",timespan="PT1H",unit="Bytes"} 1104
my_own_metric_name{aggregation="average",interval="PT1H",metric="BlobCapacity",resourceGroup="lab-rg",resourceID="/subscriptions/166157a8-9ce9-400b-91c7-1d42482b83d6/resourcegroups/lab-rg/providers/microsoft.storage/storageaccounts/leilabstorage01",resourceName="leilabstorage01",subscriptionID="166157a8-9ce9-400b-91c7-1d42482b83d6",subscriptionName="leizhang-non-prod",tag_owner="",timespan="PT1H",unit="Bytes"} 34272
6. We can also start the exporter with the development web UI enabled:
nohup ./azure-metrics-exporter.linux.amd64 --development.webui &
7. With this flag the Azure metrics exporter provides a web UI for running queries. In my environment, for example, I open http://20.52.9.41:8080/query and debug queries there.
From: https://www.cnblogs.com/threestone/p/18636566