AnalysisTemplate CRD
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args: # Template arguments, referenced inside the template as "{{args.NAME}}"; values can be supplied when the template is invoked
  - name: <string>
    value: <string>
    valueFrom:
      secretKeyRef:
        name: <string>
        key: <string>
  metrics: # Required; the metrics used to analyze the result of the delivery
  - name: <string> # Required; metric name
    initialDelay: 5m # How long to wait before taking the first measurement of this metric
    interval: 5m # Interval between measurements when more than one is taken
    consecutiveErrorLimit: <int-or-string> # Maximum number of consecutive measurement errors tolerated
    count: <int-or-string> # Total number of measurements to take
    failureCondition: result[0] < 0.95 # Expression under which a measurement counts as "failed"
    # NOTE: prometheus queries return results in the form of a vector.
    # So it is common to access the index 0 of the returned array to obtain the value
    successCondition: result[0] >= 0.95 # Expression under which a measurement counts as "successful"
    failureLimit: 3 # Maximum number of failed measurements allowed
    provider: # Metric provider; supported: web, wavefront, skywalking, prometheus, plugin, newRelic, kayenta, job, influxdb, graphite, datadog, cloudWatch
      prometheus:
        # Address of the Prometheus service
        address: http://prometheus.example.com:9090
        # Query (PromQL) sent to the Prometheus service
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
  dryRun: # Metrics that run in dryRun mode; their results do not affect the final analysis outcome
  - metricName: <string> # Metric name
  measurementRetention: # How many measurement results to keep per metric; metrics in dryRun mode can be listed here too
  - metricName: <string> # Metric name
    limit: <integer> # Number of results to keep
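The `valueFrom.secretKeyRef` form of an argument can be illustrated with a minimal sketch that passes an API token from a Secret into a `web` provider header. The template name, Secret name `token-secret`, key `apiToken`, endpoint URL, and the response shape assumed by `jsonPath` and `successCondition` are all hypothetical, chosen only for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: args-from-secret            # hypothetical template name
spec:
  args:
  - name: api-token
    valueFrom:
      secretKeyRef:
        name: token-secret          # assumed Secret in the same namespace
        key: apiToken               # assumed key inside that Secret
  metrics:
  - name: webmetric
    # Assumes the endpoint returns JSON like {"data": {"ok": true, "successPercent": 0.97}}
    successCondition: result.ok && result.successPercent >= 0.90
    provider:
      web:
        url: http://my-server.example.com/api/v1/measurement   # placeholder endpoint
        headers:
        - key: Authorization
          value: "Bearer {{ args.api-token }}"                 # resolved from the Secret above
        jsonPath: "{$.data}"
```

Keeping the token in a Secret means it does not have to appear in the template or in the Rollout that invokes it.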
ClusterAnalysisTemplate CRD
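A Rollout can reference a cluster-scoped AnalysisTemplate, called a ClusterAnalysisTemplate. This is useful when you want to share one AnalysisTemplate across multiple Rollouts in different namespaces without duplicating the same template in every namespace. Set the field clusterScope: true on the template reference to point at a ClusterAnalysisTemplate instead of an AnalysisTemplate.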
apiVersion: argoproj.io/v1alpha1
kind: ClusterAnalysisTemplate
metadata:
  name: success-rate
spec:
  args: # Template arguments, referenced inside the template as "{{args.NAME}}"; values can be supplied when the template is invoked
  - name: <string>
    value: <string>
    valueFrom:
      secretKeyRef:
        name: <string>
        key: <string>
  metrics: # Required; the metrics used to analyze the result of the delivery
  - name: <string> # Required; metric name
    interval: 5m # Interval between measurements when more than one is taken
    # NOTE: prometheus queries return results in the form of a vector.
    # So it is common to access the index 0 of the returned array to obtain the value
    successCondition: result[0] >= 0.95 # Expression under which a measurement counts as "successful"
    failureLimit: 3 # Maximum number of failed measurements allowed
    provider: # Metric provider; supported: web, wavefront, skywalking, prometheus, plugin, newRelic, kayenta, job, influxdb, graphite, datadog, cloudWatch
      prometheus:
        # Address of the Prometheus service
        address: http://prometheus.example.com:9090
        # Query (PromQL) sent to the Prometheus service
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
  dryRun: # Metrics that run in dryRun mode; their results do not affect the final analysis outcome
  - metricName: <string> # Metric name
  measurementRetention: # How many measurement results to keep per metric; metrics in dryRun mode can be listed here too
  - metricName: <string> # Metric name
    limit: <integer> # Number of results to keep
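As a minimal sketch of how a Rollout in an arbitrary namespace might point at the cluster-scoped template above, the fragment below sets clusterScope: true on the template reference. The namespace `team-a` and the service host name are assumptions made for illustration; the remaining Rollout fields are omitted.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: guestbook
  namespace: team-a                 # hypothetical namespace
spec:
  # (replicas, selector, and pod template omitted from this sketch)
  strategy:
    canary:
      analysis:
        templates:
        - templateName: success-rate
          clusterScope: true        # look up a ClusterAnalysisTemplate, not a namespaced AnalysisTemplate
        args:
        - name: service-name
          value: guestbook-svc.team-a.svc.cluster.local   # assumed service host
      steps:
      - setWeight: 20
      - pause: {duration: 10m}
```

Without clusterScope: true, the same templateName would be looked up as an AnalysisTemplate in the Rollout's own namespace.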
AnalysisRun CRD
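The configuration format is largely the same as that of AnalysisTemplate; the difference is that an AnalysisRun invokes and instantiates an analysis template.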
apiVersion: argoproj.io/v1alpha1
kind: AnalysisRun
metadata:
  name: success-rate
spec:
  args: # Template arguments, referenced inside the template as "{{args.NAME}}"; values can be supplied when the template is invoked
  - name: <string>
    value: <string>
    valueFrom:
      secretKeyRef:
        name: <string>
        key: <string>
  metrics: # Required; the metrics used to analyze the result of the delivery
  - name: <string> # Required; metric name
    interval: 5m # Interval between measurements when more than one is taken
    # NOTE: prometheus queries return results in the form of a vector.
    # So it is common to access the index 0 of the returned array to obtain the value
    successCondition: result[0] >= 0.95 # Expression under which a measurement counts as "successful"
    failureLimit: 3 # Maximum number of failed measurements allowed
    provider: # Metric provider; supported: web, wavefront, skywalking, prometheus, plugin, newRelic, kayenta, job, influxdb, graphite, datadog, cloudWatch
      prometheus:
        # Address of the Prometheus service
        address: http://prometheus.example.com:9090
        # Query (PromQL) sent to the Prometheus service
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
  dryRun: # Metrics that run in dryRun mode; their results do not affect the final analysis outcome
  - metricName: <string> # Metric name
  measurementRetention: # How many measurement results to keep per metric; metrics in dryRun mode can be listed here too
  - metricName: <string> # Metric name
    limit: <integer> # Number of results to keep
  terminate: <boolean> # Set to true to stop the run before it has finished
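To make the "instantiated template" idea concrete, here is a sketch of what a fully resolved AnalysisRun might look like once the service-name argument has been given a value. AnalysisRuns are normally created by the Rollout controller; the object name and the argument value below are assumptions for illustration only.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisRun
metadata:
  name: success-rate-guestbook      # hypothetical name
spec:
  args:
  - name: service-name
    value: guestbook-svc.default.svc.cluster.local   # every argument carries a concrete value here
  metrics:
  - name: success-rate
    interval: 5m
    count: 4                        # take four measurements, then finish
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
```

Unlike a template, the run has no unresolved arguments, so it can be evaluated as it stands.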
AnalysisTemplate example
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: guestbook
spec:
  ...
  strategy:
    canary:
      analysis:
        templates:
        - templateName: success-rate
          # To reference a ClusterAnalysisTemplate instead:
          # clusterScope: true
        startingStep: 2 # delay starting analysis run until setWeight: 40%
        args:
        - name: service-name
          value: guestbook-svc.default.svc.cluster.local
      steps:
      - setWeight: 20
      - pause: {duration: 10m}
      - setWeight: 40
      - pause: {duration: 10m}
      - setWeight: 60
      - pause: {duration: 10m}
      - setWeight: 80
      - pause: {duration: 10m}
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 5m
    successCondition: result[0] >= 0.95
    failureLimit: 3
    count: 4
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate
spec:
  args:
  - name: service-name
  measurementRetention:
  - metricName: total-5xx-errors
    limit: 20
  dryRun:
  - metricName: total-5xx-errors
  metrics:
  - name: total-5xx-errors
    interval: 5m
    failureCondition: result[0] >= 10
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code=~"5.*"}[5m]
          ))
  - name: total-4xx-errors
    interval: 5m
    failureCondition: result[0] >= 10
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code=~"4.*"}[5m]
          ))
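The Rollout in this example references only success-rate. As a sketch, assuming both templates live in the Rollout's namespace, the analysis section could list error-rate as well so that the metrics of both templates are evaluated together. The fragment shows only the strategy portion of the Rollout spec.

```yaml
# Fragment of a Rollout spec (other fields omitted); a sketch, not a complete manifest
strategy:
  canary:
    analysis:
      templates:
      - templateName: success-rate
      - templateName: error-rate    # adds total-5xx-errors (dryRun) and total-4xx-errors
      args:
      - name: service-name          # both templates declare this argument
        value: guestbook-svc.default.svc.cluster.local
```

Because total-5xx-errors is listed under dryRun in error-rate, its measurements would be recorded without affecting the outcome, while total-4xx-errors and success-rate would still count toward it.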
References
https://argoproj.github.io/argo-rollouts/features/analysis/