ArgoWorkflow Tutorial (2) --- Building Pipelines Quickly: Workflow & Template Concepts

Tags: name Workflow --- WorkflowTemplate steps template Template hello

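In the previous article we deployed ArgoWorkflow and built a simple pipeline as a demo. This article examines the pipeline-related concepts in ArgoWorkflow, since using ArgoWorkflow well requires understanding them first.

This article addresses the following questions:

1) How is a pipeline created, and what do the fields in a Workflow mean?
2) How is a WorkflowTemplate (pipeline template) used?
3) How do Workflow, WorkflowTemplate, and template reference one another?
4) What are the best practices for ArgoWorkflow pipelines?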

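1. Basic Concepts

ArgoWorkflow involves the following concepts:

Workflow: a pipeline; the pipeline instance that actually runs, similar to a PipelineRun in Tekton
WorkflowTemplate: a pipeline template from which pipelines can be created, similar to a Pipeline in Tekton
- ClusterWorkflowTemplate: a cluster-scoped pipeline template; it relates to WorkflowTemplate the way ClusterRole relates to Role in K8s
templates: the smallest building block of a Workflow or a WorkflowTemplate/ClusterWorkflowTemplate. A pipeline is made up of multiple templates, each of which can be thought of as one step in the pipeline.

For now, WorkflowTemplate and ClusterWorkflowTemplate are referred to collectively as Template.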

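The relationship among Workflow, Template (capitalized), and template (lowercase) is as follows:

The relationship among the three is fairly complicated. The official documentation also notes that the naming here is confusing for historical reasons.

Personally, I find the following view easier to understand:

template (lowercase): the basic building block of a Template (capitalized); think of it as a step in the pipeline
Template (capitalized): a complete pipeline, usually made up of multiple templates (lowercase)
Workflow: the pipeline instance that actually runs, usually created directly from a Template. It resembles a pipeline run record: each record is one Workflow

With the basic concepts sorted out, let's look at each object in detail.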

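2. Workflow

Workflow is the most important resource in Argo and serves two important functions:

1) Defining the workflow
2) Storing the workflow's state

Let's first see what a Workflow looks like. Here is a simple example: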

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello           # We reference our first "template" here
  templates:
  - name: hello               # The first "template" in this Workflow, it is referenced by "entrypoint"
    steps:                    # The type of this "template" is "steps"
    - - name: hello
        template: whalesay    # We reference our second "template" here
        arguments:
          parameters: [{name: message, value: "hello"}]

  - name: whalesay             # The second "template" in this Workflow, it is referenced by "hello"
    inputs:
      parameters:
      - name: message
    container:                # The type of this "template" is "container"
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

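The core of a Workflow object falls into three parts:

templates: the list of templates, which defines every step in the pipeline and the order in which the steps run.
entrypoint: the pipeline's entry point, similar to the main function in code. It usually references a template invocator.
parameters: the parameters used in the pipeline, of two kinds: global parameters in the arguments block and local parameters in inputs blocks

entrypoint

A Workflow must specify an entrypoint. The entrypoint is where execution starts, similar to the main function of a program.

templates

ArgoWorkflow supports six core template types (newer releases add a few more, such as http and containerSet). Let's go through the six one by one.

container

This matches the Kubernetes container spec. A template of this type starts a container, and the user controls what it does by specifying image, command, args, and so on.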

  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]

script

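script is really a wrapper around container. Its spec is the same as container's, plus a source field that holds a script. The script's standard output is recorded in {{tasks.<NAME>.outputs.result}} or {{steps.<NAME>.outputs.result}}.

script can be seen as a simpler way to configure a container that runs a script.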

  - name: gen-random-int
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        import random
        i = random.randint(1, 100)
        print(i)
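
To show where that result ends up, here is a minimal sketch of a steps template that feeds the script's standard output into a later step. The names rand-number-example and print-message are hypothetical and not part of the original example; gen-random-int is the template defined above.

```yaml
  # Hypothetical invocator: runs gen-random-int, then prints what it wrote to stdout.
  - name: rand-number-example
    steps:
    - - name: generate
        template: gen-random-int
    - - name: print
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "{{steps.generate.outputs.result}}"   # stdout of the generate step

  # Hypothetical helper template that echoes its input.
  - name: print-message
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo result was: {{inputs.parameters.message}}"]
```

In a dag template the same value would be read through {{tasks.generate.outputs.result}} instead.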

resource

A resource template operates on resources in the cluster. The action field selects the operation; supported values are get, create, apply, delete, replace, and patch.

  - name: k8s-owner-reference
    resource:
      action: create
      manifest: |
        apiVersion: v1
        kind: ConfigMap
        metadata:
          generateName: owned-eg-
        data:
          some: value
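
The template above is named k8s-owner-reference, yet the snippet itself does not set an owner reference. As a sketch, assuming the created ConfigMap should be cleaned up together with the Workflow, the setOwnerReference field of the resource template can be added:

```yaml
  - name: k8s-owner-reference
    resource:
      action: create
      setOwnerReference: true   # record the Workflow as the owner of the created resource
      manifest: |
        apiVersion: v1
        kind: ConfigMap
        metadata:
          generateName: owned-eg-
        data:
          some: value
```

With the owner reference in place, Kubernetes garbage collection removes the ConfigMap once the owning Workflow is deleted.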

suspend

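A suspend template is simple: it pauses pipeline execution.

By default it blocks until the user resumes the Workflow manually with the argo resume command. Alternatively, the duration field sets a pause length, after which execution resumes automatically.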

  - name: delay
    suspend:
      duration: "20s"
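
For the manual case, a sketch of a suspend template without duration is shown below. The template name approve is an assumption chosen for illustration.

```yaml
  - name: approve
    suspend: {}    # no duration: blocks until the Workflow is resumed manually
```

A step that references this template holds the pipeline until someone runs argo resume followed by the Workflow name, which makes it usable as a simple approval gate.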

steps

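Steps handles the relationships between templates, which covers two things:

1) Which tasks need to run
2) In what order those tasks run

Consider the following example: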

  - name: hello-hello-hello
    steps:
    - - name: step1
        template: prepare-data
    - - name: step2a
        template: run-data-first-half
      - name: step2b
        template: run-data-second-half

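Which tasks need to run?

This steps template runs three steps: step1, step2a, and step2b.

In what order do these tasks run?

The order in which elements appear in steps is the order in which the tasks execute. Here step1 runs first, and then step2a and step2b run in parallel.

Note: look closely at the YAML and you will see that step2a and step2b sit in the same element. steps is a two-dimensional array, defined as follows: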

type Template struct {
    Steps []ParallelSteps `json:"steps,omitempty" protobuf:"bytes,11,opt,name=steps"`
}
type ParallelSteps struct {
    Steps []WorkflowStep `json:"-" protobuf:"bytes,1,rep,name=steps"`
}

Reduced to step names and written as JSON, it looks like this:

{
    "steps": [
        ["step1"],
        ["step2a", "step2b"]
    ]
}

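This should make things clearer: the order is visible at a glance.

dag

A dag template serves the same purpose as steps.

DAG here is the usual Directed Acyclic Graph.

DAG and Steps differ in how the order of tasks is defined:

Steps uses the order of definition as the order in which templates execute
DAG lets you declare dependencies between tasks, and argo works out the final execution order from those dependencies

Consider the following example: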

  - name: diamond
    dag:
      tasks:
      - name: A
        template: echo
      - name: B
        dependencies: [A]
        template: echo
      - name: C
        dependencies: [A]
        template: echo
      - name: D
        dependencies: [B, C]
        template: echo

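DAG adds a dependencies field, which lists the tasks the current task depends on.

Which tasks need to run?

This dag template runs four tasks: A, B, C, and D.

In what order do these tasks run?

This is less direct than with Steps: the order has to be worked out from dependencies.

A has no dependencies, so it runs first. B and C each depend only on A, so they run concurrently after A. D depends on B and C, so it runs only after both have finished.

Written in the same simplified JSON form, the resulting order is: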

{
    "steps": [
        ["A"],
        ["B", "C"],
        ["D"]
    ]
}
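
For comparison, the same diamond can be written by hand as a steps template. This is a sketch: the name diamond-as-steps is hypothetical, and it assumes the echo template referenced by the dag example is defined elsewhere in the Workflow.

```yaml
  - name: diamond-as-steps
    steps:
    - - name: A              # first group: A runs alone
        template: echo
    - - name: B              # second group: B and C run in parallel
        template: echo
      - name: C
        template: echo
    - - name: D              # third group: D runs after B and C finish
        template: echo
```

Each outer list item is one group that runs after the previous group completes, which mirrors the three rows of the JSON above.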

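P.S. By comparison, steps is more direct, and the task order is visible at a glance. If the order of every task in the Workflow is already clear, steps is recommended. If things are complicated and only the dependencies between tasks are known, use DAG and let ArgoWorkflow compute the order.

template definitions & template invocators

You may have noticed that steps and dag templates differ from the other four: each can reference multiple templates.

The previous sections covered the six template types in ArgoWorkflow. By function, these six fall into two groups: template definitions and template invocators.

template definitions: templates of this kind define what a specific step executes. The whalesay template in the example is one
- Includes the container, script, resource, and suspend types, among others
template invocators: templates of this kind combine other template definitions and define things such as the execution order between steps. The hello template in the example is one.
- entrypoint usually points to a template of this kind
- Includes two types, dag and steps. The hello template in the example is a steps template.

A small gripe: template gets confusing here. Splitting template definitions and template invocators into two separate objects would be clearer.

With the template categories in mind, the earlier Workflow example becomes easier to read: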

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello           # We reference our first "template" here
  templates:
  - name: hello               # The first "template" in this Workflow, it is referenced by "entrypoint"
    steps:                    # The type of this "template" is "steps"
    - - name: hello
        template: whalesay    # We reference our second "template" here
        arguments:
          parameters: [{name: message, value: "hello"}]

  - name: whalesay             # The second "template" in this Workflow, it is referenced by "hello"
    inputs:
      parameters:
      - name: message
    container:                # The type of this "template" is "container"
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

1) First, whalesay is a container template, so it is a template definition
2) Next, hello is a steps template, so it is a template invocator
- In this invocator, the steps field defines a step named hello, and that step references the whalesay template
3) entrypoint points to hello, the template invocator

Next comes another important part of a Workflow: entrypoint.

entrypoint

The entrypoint is where execution starts, similar to the main function of a program. Every Workflow must specify one.

Note: only the template named by entrypoint is run directly. For that reason, entrypoint usually points to a Steps or DAG template, that is, a template invocator, whose steps or tasks then pull in the other tasks.

So not every template written in a Workflow is necessarily executed.

Consider the following example:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello           # We reference our first "template" here
  templates:
  - name: hello               # The first "template" in this Workflow, it is referenced by "entrypoint"
    steps:                    # The type of this "template" is "steps"
    - - name: hello
        template: whalesay    # We reference our second "template" here
        arguments:
          parameters: [{name: message, value: "hello"}]
  - name: whalesay             # The second "template" in this Workflow, it is referenced by "hello"
    inputs:
      parameters:
      - name: message
    container:                # The type of this "template" is "container"
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

entrypoint points to hello, a steps template and therefore a template invocator. The steps of the hello template reference the whalesay template, which is a container template and therefore a template definition. That is the task that ultimately runs.

Of course, entrypoint can also point to a template definition, but then only one task runs, like this:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay    
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello"]

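At this point the Workflow object should be mostly clear, apart from parameters. After a short demo, we turn to that last part, parameters.

Demo

Here is a slightly more complex Workflow to check whether Workflow really makes sense now.

The Workflow below contains four tasks:

1) First, print hello
2) Then run a Python script that generates a random number
3) Sleep for 20s
4) Create a ConfigMap

Both a steps version and a dag version are given for comparison. The dag version is commented out; to try it, uncomment it and set entrypoint to diamond.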

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello
  templates:
  - name: hello              
    steps:          
    - - name: hello
        template: whalesay
        arguments:
          parameters: [{name: message, value: "hello"}]
    - - name: runscript
        template: gen-random-int
    - - name: sleep
        template: delay
    - - name: create-cm
        template: k8s-owner-reference
  # - name: diamond
  #   dag:
  #     tasks:
  #     - name: hello
  #       template: whalesay
  #       arguments:
  #         parameters: [{name: message, value: "hello"}]
  #     - name: runscript
  #       dependencies: [hello]
  #       template: gen-random-int
  #     - name: sleep
  #       template: delay
  #       dependencies: [runscript]
  #     - name: create-cm
  #       template: k8s-owner-reference
  #       dependencies: [sleep]
  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]
  - name: gen-random-int
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        import random
        i = random.randint(1, 100)
        print(i)
  - name: k8s-owner-reference
    resource:
      action: create
      manifest: |
        apiVersion: v1
        kind: ConfigMap
        metadata:
          generateName: owned-eg-
        data:
          host: lixueduan.com
          wx: 探索云原生
  - name: delay
    suspend:
      duration: "20s"

parameters

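Parameters in a Workflow fall into two kinds:

Formal parameters: a template (template definition) declares the parameters it needs in its inputs field, optionally with default values
Actual arguments: a template (template invocator) assigns values to those parameters in its arguments field, overriding the defaults in inputs

This is only my personal understanding.

inputs: formal parameters

A template declares formal parameters in the spec.templates[*].inputs field and references them inside the template with the {{inputs.parameters.$name}} syntax.

The example below declares that the template needs a parameter named message, so a caller of the template knows which parameters to pass.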

  templates:
    - name: whalesay-template
      inputs:
        parameters:
          - name: message
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]

A default value can also be specified:

  templates:
    - name: whalesay-template
      inputs:
        parameters:
          - name: message
            value: "default message"
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]

Note: if no default value is specified, the parameter must be supplied when the template is called. With a default value, it can be omitted.
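
As a side note, the parameter spec also has a dedicated default field. The sketch below is a variant of the example above that uses it; it is an alternative spelling shown for illustration, not what the original example uses.

```yaml
  templates:
    - name: whalesay-template
      inputs:
        parameters:
          - name: message
            default: "default message"   # used only when the caller passes no value
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]
```

Using default makes the intent explicit: the value is a fallback, and any argument supplied by the caller replaces it.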

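arguments: actual arguments

spec.arguments defines the actual arguments to pass in. These parameters are available in every template of the current Workflow and are referenced with the {{workflow.parameters.$name}} syntax.

For example, the following defines a parameter named message with the value hello world.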

  arguments:
    parameters:
      - name: message
        value: hello world
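
To put the fragment in context, here is a minimal sketch of a complete Workflow that declares message under spec.arguments and reads it directly inside a template. The generateName prefix global-parameters- is a hypothetical choice.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: global-parameters-
spec:
  entrypoint: whalesay
  arguments:
    parameters:
      - name: message
        value: hello world          # global value, visible to every template
  templates:
    - name: whalesay
      container:
        image: docker/whalesay
        command: [cowsay]
        # No inputs block needed: the global parameter is referenced directly.
        args: ["{{workflow.parameters.message}}"]
```

The value can be overridden at submission time with the -p flag of argo submit, for example -p message=goodbye.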

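Parameter reuse

Besides being set in steps/dag, arguments can also be set directly on the Workflow and then referenced from steps/dag with the {{workflow.parameters.$name}} syntax. This allows parameter reuse: define a value once on the Workflow and reference it many times in steps/dag. In the example below, workflow-param-1 is declared without a value, so a value has to be supplied when the Workflow is submitted.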

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: example-
spec:
  entrypoint: main
  arguments:
    parameters:
    - name: workflow-param-1
  templates:
  - name: main
    dag:
      tasks:
      - name: step-A 
        template: step-template-A
        arguments:
          parameters:
          - name: template-param-1
            value: "{{workflow.parameters.workflow-param-1}}"

  - name: step-template-A
    inputs:
      parameters:
        - name: template-param-1
    script:
      image: alpine
      command: [/bin/sh]
      source: |
          echo "{{inputs.parameters.template-param-1}}"

Demo

The following demo shows how parameters are passed:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello           # We reference our first "template" here
  templates:
  - name: hello               # The first "template" in this Workflow, it is referenced by "entrypoint"
    steps:                    # The type of this "template" is "steps"
    - - name: hello
        template: whalesay    # We reference our second "template" here
        arguments:
          parameters: [{name: message, value: "hello"}]

  - name: whalesay             # The second "template" in this Workflow, it is referenced by "hello"
    inputs:
      parameters:
      - name: message
    container:                # The type of this "template" is "container"
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

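In the example above, the whalesay template declares that it needs a parameter named message, and the steps template sets message to hello through arguments when it references whalesay. The final output is therefore hello.

3. WorkflowTemplate

From the official documentation: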

A WorkflowTemplate is a definition of a Workflow that lives in your cluster.

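In other words, a WorkflowTemplate is the definition of a Workflow. It describes the pipeline in detail, including which tasks it contains and the order in which they run.

As described earlier, a Workflow object can be created directly to run a pipeline, but that approach has some problems:

1) With many templates, the Workflow object becomes very large and tedious to modify
2) Templates cannot be shared: every Workflow that needs a template has to write it out again, so the same template ends up duplicated across Workflow YAML files.

Hence the best practice for Workflow and WorkflowTemplate: keep the templates in a WorkflowTemplate, and have the Workflow only reference the Template and supply parameters.

By scope, workflow templates in ArgoWorkflow come in two kinds, WorkflowTemplate and ClusterWorkflowTemplate.

WorkflowTemplate: namespace-scoped; can only be referenced from within the same namespace
ClusterWorkflowTemplate: cluster-scoped; can be referenced from any namespace

WorkflowTemplate

Here is a simple WorkflowTemplate: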

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-template-submittable
  namespace: default
spec:
  entrypoint: whalesay-template
  arguments:
    parameters:
      - name: message
        value: tpl-argument-default
  templates:
    - name: whalesay-template
      inputs:
        parameters:
          - name: message
            value: tpl-input-default
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]

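As you can see, a WorkflowTemplate takes exactly the same fields as a Workflow, so they are not repeated here.

Converting one into the other mostly comes down to changing kind from Workflow to WorkflowTemplate. Because a WorkflowTemplate is referenced by name, it also needs a fixed metadata.name instead of generateName.

workflowMetadata

workflowMetadata is a field found only in Templates. It mainly holds metadata, and every Workflow later created from the Template automatically carries that metadata.

With this information, a Workflow can be traced back to the Template it was created from.

Usage looks like this, with workflowMetadata specifying one label: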

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-template-submittable
spec:
  workflowMetadata:
    labels:
      example-label: example-value

Every Workflow object created from this Template then carries the label:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  annotations:
    workflows.argoproj.io/pod-name-format: v2
  creationTimestamp: "2023-10-27T06:26:13Z"
  generateName: workflow-template-hello-world-
  generation: 2
  labels:
    example-label: example-value
  name: workflow-template-hello-world-5w7ss
  namespace: default

ClusterWorkflowTemplate

Similar to WorkflowTemplate. The relationship is like that between Role and ClusterRole in k8s: only the scope differs.

All fields are the same as for WorkflowTemplate; just change kind in the YAML to ClusterWorkflowTemplate.
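
As a sketch of that conversion, the whalesay template from the earlier WorkflowTemplate could be published cluster-wide as follows. The resource name cluster-workflow-template-whalesay is hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ClusterWorkflowTemplate
metadata:
  name: cluster-workflow-template-whalesay   # no namespace: the resource is cluster-scoped
spec:
  templates:
    - name: whalesay-template
      inputs:
        parameters:
          - name: message
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]
```

Apart from kind and the missing namespace, the body is the same as it would be in a WorkflowTemplate.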

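4. TemplateRef

Once a WorkflowTemplate exists, a Workflow can reference its templates directly through TemplateRef, which keeps the Workflow object clean.

There are two ways to reference a WorkflowTemplate:

1) workflowTemplateRef: references the entire WorkflowTemplate; the Workflow only needs to supply the global parameters
2) templateRef: references a single template; the Workflow can still define other templates, entrypoint, and so on.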
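
The second form, templateRef, can be sketched as follows. The example assumes the workflow-template-submittable WorkflowTemplate shown earlier already exists in the same namespace; the step name call-whalesay-template and the generateName prefix are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: workflow-template-hello-world-
spec:
  entrypoint: whalesay
  templates:
    - name: whalesay                  # local invocator defined in this Workflow
      steps:
        - - name: call-whalesay-template
            templateRef:
              name: workflow-template-submittable   # the WorkflowTemplate to look in
              template: whalesay-template           # the template inside it
              # clusterScope: true   # needed when name refers to a ClusterWorkflowTemplate
            arguments:
              parameters:
                - name: message
                  value: "hello world"
```

Only the single referenced template is borrowed; the entrypoint and the surrounding steps still live in the Workflow itself.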

workflowTemplateRef

A WorkflowTemplate can be referenced directly through the workflowTemplateRef field.
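
A minimal sketch of such a Workflow, assuming the workflow-template-submittable WorkflowTemplate from section 3 exists in the same namespace (the generateName prefix and the argument value are hypothetical):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: workflow-template-hello-world-
spec:
  arguments:
    parameters:
      - name: message
        value: "from workflow"        # replaces the default set in the WorkflowTemplate's arguments
  workflowTemplateRef:
    name: workflow-template-submittable
    # clusterScope: true   # needed when name refers to a ClusterWorkflowTemplate
```

The Workflow carries no templates or entrypoint of its own; both come from the referenced WorkflowTemplate.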

Note
From: https://www.cnblogs.com/KubeExplorer/p/18369259
