Run a tfx pipeline using kubeflow pipeline

Date: 2024-02-02 15:22:40
Tags: pipeline, Run, image, namespace, tfx, io, using, docker

1. What role does Kubeflow Pipelines play for a TFX pipeline?

Kubeflow Pipelines is an orchestrator for TFX pipelines that runs them on a Kubernetes cluster.

LocalDagRunner is an orchestrator for TFX pipelines that runs them locally.


# run a TFX pipeline using LocalDagRunner
# (tfx.v1 API; _create_schema_pipeline and the constants are defined elsewhere in this project)
from tfx import v1 as tfx

tfx.orchestration.LocalDagRunner().run(
    _create_schema_pipeline(
        pipeline_name=SCHEMA_PIPELINE_NAME,
        pipeline_root=SCHEMA_PIPELINE_ROOT,
        data_root=DATA_ROOT,
        schema_path=SCHEMA_PATH,
        metadata_path=SCHEMA_METADATA_PATH,
        module_file=_trainer_module_file,
        serving_model_dir=SERVING_MODEL_DIR,
    )
)


# run a TFX pipeline using KubeflowDagRunner
tfx.orchestration.experimental.KubeflowDagRunner().run(
    _create_schema_pipeline(
        pipeline_name=SCHEMA_PIPELINE_NAME,
        pipeline_root=SCHEMA_PIPELINE_ROOT,
        data_root=DATA_ROOT,
        schema_path=SCHEMA_PATH,
        metadata_path=SCHEMA_METADATA_PATH,
        module_file=_trainer_module_file,
        serving_model_dir=SERVING_MODEL_DIR,
    )
)

2. Steps for running a TFX pipeline with Kubeflow Pipelines

2.1 Generate pipeline.yaml (the Kubeflow Pipelines definition file) with KubeflowDagRunner:

tfx.orchestration.experimental.KubeflowDagRunner().run(
    _create_schema_pipeline(
        pipeline_name=SCHEMA_PIPELINE_NAME,
        pipeline_root=SCHEMA_PIPELINE_ROOT,
        data_root=DATA_ROOT,
        schema_path=SCHEMA_PATH,
        metadata_path=SCHEMA_METADATA_PATH,
        module_file=_trainer_module_file,
        serving_model_dir=SERVING_MODEL_DIR,
    )
)
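If I read the TFX source correctly, `KubeflowDagRunner().run()` does not submit anything to the cluster; it writes an archive named `<pipeline_name>.tar.gz` to the current directory, with the workflow stored inside as `pipeline.yaml`. The minimal sketch below pulls that file out so it can be edited in 2.2. The archive and member names are assumptions (check with `tar tzf <archive>`), and `extract_pipeline_yaml` is a hypothetical helper, not part of TFX.

```python
import tarfile
from pathlib import Path


def extract_pipeline_yaml(archive: str, out_dir: str = ".") -> Path:
    """Copy pipeline.yaml out of the archive written by KubeflowDagRunner.

    Assumes a gzipped tar with a member named 'pipeline.yaml'
    (an assumption, verify it for your TFX version).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        member = tar.getmember("pipeline.yaml")  # KeyError if the member is missing
        data = tar.extractfile(member).read()
    target = out / "pipeline.yaml"
    target.write_bytes(data)
    return target
```

For example, `extract_pipeline_yaml(SCHEMA_PIPELINE_NAME + ".tar.gz")` would write `./pipeline.yaml`, ready for the image change in 2.2.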

2.2 Change the image registry in pipeline.yaml, because the default registry (Docker Hub, docker.io) is not accessible from China.

# in file pipeline.yaml

# Original image written by tfx.orchestration.experimental.KubeflowDagRunner().run().
# It resolves to hub.docker.com/tensorflow/tfx:1.14.0, which is not accessible from China.
#image: tensorflow/tfx:1.14.0

# Replacement image.
# docker.nju.edu.cn does not have tfx:1.14.0 yet: it is the latest version and
# the mirror has not pulled it yet, so use tfx:1.13.0 instead.
image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
imagePullPolicy: Never
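Editing pipeline.yaml by hand is error-prone when several components share the image. Below is a small sketch that does the same edit with Python's `re` module, assuming every container line looks like `image: tensorflow/tfx:1.14.0` and the file has no `imagePullPolicy` keys yet; `rewrite_tfx_image` is a hypothetical name. It works on plain text, so it needs no YAML library and leaves the rest of the file untouched.

```python
import re

# An "image:" line of a container spec, e.g. "      image: tensorflow/tfx:1.14.0".
# Only spaces/tabs count as indentation, so a match never spans several lines.
IMAGE_LINE = re.compile(
    r"^(?P<indent>[ \t]*)image:[ \t]*(?P<ref>\S+)[ \t]*$", re.MULTILINE
)


def rewrite_tfx_image(yaml_text: str, old_ref: str, new_ref: str,
                      pull_policy: str = "Never") -> str:
    """Point every container using old_ref at new_ref and set its pull policy.

    Assumes there are no imagePullPolicy lines yet; otherwise the result
    would contain duplicate keys.
    """
    def repl(m: re.Match) -> str:
        if m.group("ref").strip("'\"") != old_ref:
            return m.group(0)  # other images are left unchanged
        indent = m.group("indent")
        return f"{indent}image: {new_ref}\n{indent}imagePullPolicy: {pull_policy}"

    return IMAGE_LINE.sub(repl, yaml_text)
```

Usage: read pipeline.yaml, call `rewrite_tfx_image(text, "tensorflow/tfx:1.14.0", "docker.nju.edu.cn/tensorflow/tfx:1.13.0")`, and write the result back.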

Attention

  1. The image tensorflow/tfx:1.13.0 is about 30 GB, and its compressed layers (gzip blobs) are about 9 GB, so it is better to pull (download) it beforehand. imagePullPolicy: Never means the image is never pulled when the container starts; the other options are Always and IfNotPresent.
  2. imagePullPolicy: Never requires the image to already exist on every node the pod may be scheduled to (i.e. assigned to in the Kubernetes cluster); otherwise the pod fails with an error like:
    'Warning ErrImageNeverPull 2m36s (x10 over 4m16s) kubelet Container image "docker.nju.edu.cn/tensorflow/tfx:1.13.0" is not present with pull policy of Never'
    Alternatively, give the pod a node affinity that pins it to the node that has the image:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - maye-inspiron-5547
  3. containerd groups images into namespaces. The default namespace is "default", but the images used by containers in a Kubernetes cluster are in namespace "k8s.io".
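For Pods, the node-affinity rule sits under `affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution`. The Python sketch below builds that structure for one hostname, which makes the nesting easy to check; `pin_to_node` and `affinity_line` are hypothetical helpers, and where exactly the line goes in pipeline.yaml (Argo template or Pod spec) is up to you.

```python
import json


def pin_to_node(hostname: str) -> dict:
    """Pod 'affinity' block that only allows scheduling on `hostname`.

    Uses the well-known node label kubernetes.io/hostname; the value must match
    the node's label exactly (see `kubectl get nodes --show-labels`).
    """
    return {
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {"matchExpressions": [{
                            "key": "kubernetes.io/hostname",
                            "operator": "In",
                            "values": [hostname],
                        }]}
                    ]
                }
            }
        }
    }


def affinity_line(hostname: str) -> str:
    # A flow-style (JSON) mapping is valid YAML, so this single line can sit
    # next to the other keys of the spec.
    return "affinity: " + json.dumps(pin_to_node(hostname)["affinity"])
```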

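crictl is the CLI for the Kubernetes Container Runtime Interface (CRI).
crictl only sees one image namespace, k8s.io, which is therefore also the namespace it pulls images into by default.
If you pull an image with ctr without placing it in the k8s.io namespace, crictl cannot see that local image.
ctr is the command-line tool shipped with containerd. Three namespaces are commonly seen: default, k8s.io and moby; ctr uses default unless told otherwise.
nerdctl is a Docker-compatible CLI for containerd.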

ctr image ls lists the images in namespace "default"; use ctr -n k8s.io image ls to list the images Kubernetes uses.

Pull an image into the k8s.io namespace:

nerdctl pull nginx:latest --namespace k8s.io

List the images in the k8s.io namespace:

sudo nerdctl images --namespace k8s.io

Attention

Running nerdctl image list --namespace k8s.io without sudo shows no images.

This is because nerdctl then runs rootless; accessing the k8s.io image namespace requires root privileges, hence the sudo above.

Copy an image from one namespace to another:

ctr -n default image export my-image.tar my-image
ctr -n k8s.io image import my-image.tar

# or,
nerdctl --namespace default save -o my-image.tar my-image
nerdctl --namespace k8s.io load -i my-image.tar
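When this has to be repeated for several images, a small wrapper helps. This sketch only builds (and, if asked, runs) the same two ctr commands as above; it assumes `ctr` is on PATH and prefixes `sudo` when not running as root because of the permission issue noted earlier. `copy_image_commands` and `run` are hypothetical names, and the image argument should be the full name shown by `ctr -n default image ls`.

```python
import os
import shlex
import subprocess
from typing import List


def copy_image_commands(image: str, tar_path: str = "my-image.tar",
                        src_ns: str = "default",
                        dst_ns: str = "k8s.io") -> List[List[str]]:
    """Build the ctr export/import commands that copy `image` between namespaces."""
    prefix = [] if os.geteuid() == 0 else ["sudo"]  # k8s.io needs root access
    return [
        prefix + ["ctr", "-n", src_ns, "image", "export", tar_path, image],
        prefix + ["ctr", "-n", dst_ns, "image", "import", tar_path],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command
```

For example, `run(copy_image_commands("docker.nju.edu.cn/tensorflow/tfx:1.13.0"))` only prints the commands; pass `dry_run=False` to execute them.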

Attention
If the needed image is not in namespace "k8s.io", Kubernetes cannot see it. Therefore, when Kubernetes creates the pod with imagePullPolicy set to Never, it cannot find the image and reports ErrImageNeverPull.

2.3 Error & Solution
[ERROR: Failed to pull image]

(base) maye@maye-Inspiron-5547:~$ kubectl describe pod detect-anomolies-on-wafer-tfdv-schema-ldvtw-1952722848 -n kubeflow
Events:
Type Reason Age From Message


Warning Failed 52m (x4 over 92m) kubelet Error: ImagePullBackOff
Warning Failed 13m (x9 over 92m) kubelet Error: ErrImagePull
Warning Failed 4m42s (x9 over 92m) kubelet Failed to pull image "tensorflow/tfx:1.14.0": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/tensorflow/tfx:1.14.0": failed to copy: httpReadSeeker: failed open: unexpected status code https://pft7f97f.mirror.aliyuncs.com/v2/tensorflow/tfx/blobs/sha256:f2cce533751060f702397991bc7f0acf6d691c898fe1c7cc25b3ece25a409879?ns=docker.io: 500 Internal Server Error - Server message: unknown: unknown error
Normal BackOff 4m17s (x13 over 92m) kubelet Back-off pulling image "tensorflow/tfx:1.14.0"
(base) maye@maye-Inspiron-5547:~$
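When many pods fail, it can help to pull the relevant lines out of `kubectl describe pod` output automatically. A minimal sketch, assuming the event text looks like the log above; `diagnose_image_events` is a hypothetical helper that only looks for the three image-related reasons listed in it.

```python
import re

# e.g. 'Failed to pull image "tensorflow/tfx:1.14.0": rpc error: ...'
PULL_FAILED = re.compile(r'Failed to pull image "(?P<image>[^"]+)"')
# Event reasons that point at an image problem rather than the workload itself.
IMAGE_REASONS = ("ErrImagePull", "ImagePullBackOff", "ErrImageNeverPull")


def diagnose_image_events(describe_output: str) -> dict:
    """Summarise image-related failures found in `kubectl describe pod` text."""
    images = sorted(set(PULL_FAILED.findall(describe_output)))
    reasons = [r for r in IMAGE_REASONS if r in describe_output]
    return {"images": images, "reasons": reasons}
```

The output of `kubectl describe pod <pod> -n kubeflow` can be captured (e.g. with `subprocess.run(..., capture_output=True, text=True)`) and passed in as a string.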

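[Solution]
This happens because docker.io is not accessible from China (in the log above, even the configured Aliyun mirror returned 500 Internal Server Error). Pull the image through a mirror site such as docker.nju.edu.cn instead: in pipeline.yaml, replace "tensorflow/tfx:1.14.0" with "docker.nju.edu.cn/tensorflow/tfx:1.14.0" (or use the 1.13.0 tag from 2.2 if the mirror does not have 1.14.0 yet).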
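The log shows that `tensorflow/tfx:1.14.0` is expanded to `docker.io/tensorflow/tfx:1.14.0`: a name without a registry host means Docker Hub, and single-name images such as `nginx` live under `library/`. As far as I know, that `library/` part is only implied for docker.io, so it has to be written out when the name is pointed at a mirror. A sketch of this rewrite under those naming rules; `to_mirror` is a hypothetical helper and `docker.nju.edu.cn` is simply the default taken from the text.

```python
def to_mirror(ref: str, mirror: str = "docker.nju.edu.cn") -> str:
    """Rewrite a Docker Hub image reference so it is pulled through `mirror`.

    A first path component containing '.' or ':' (or equal to 'localhost') is
    a registry host; anything else is Docker Hub. References to other
    registries are returned unchanged.
    """
    name, sep, digest = ref.partition("@")  # keep an optional @sha256:... digest
    first, slash, rest = name.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        if first not in ("docker.io", "index.docker.io"):
            return ref  # already points at another registry
        name = rest  # strip the explicit docker.io host
    if "/" not in name.split(":")[0]:
        name = "library/" + name  # official images, e.g. nginx -> library/nginx
    return f"{mirror}/{name}{sep}{digest}"
```

Combined with the text edit of pipeline.yaml, this gives the mirror reference for any Docker Hub image, e.g. `nginx:latest` becomes `docker.nju.edu.cn/library/nginx:latest`.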

Note:

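  1. Mirror sites of hub.docker.com usable in China [1]

    DaoCloud mirror
    Mirror URL: https://docker.m.daocloud.io
    Supports: Docker Hub, GCR, K8S, GHCR, Quay, NVCR, etc.
    Free for public use: yes

    NetEase Cloud
    Mirror URL: https://hub-mirror.c.163.com
    Supports: Docker Hub
    Free for public use: yes

    Docker image proxy
    Mirror URL: https://dockerproxy.com
    Supports: Docker Hub, GCR, K8S, GHCR
    Free for public use: yes

    Baidu Cloud
    Mirror URL: https://mirror.baidubce.com
    Supports: Docker Hub
    Free for public use: yes

    Nanjing University mirror
    Mirror URL: https://docker.nju.edu.cn
    Supports: Docker Hub, GCR, GHCR, Quay, NVCR, etc.
    Free for public use: yes

    Shanghai Jiao Tong University mirror
    Mirror URL: https://docker.mirrors.sjtug.sjtu.edu.cn/
    Supports: Docker Hub, GCR, etc.
    Restrictions: none

    Alibaba Cloud (Aliyun)
    Mirror URL: https://<your_code>.mirror.aliyuncs.com
    Supports: Docker Hub
    Restrictions: you must log in to an account to get your CODE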

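  2. Watch a failing Linux process that runs in the background. With -e trace=none, strace prints no system calls, only the signals the process receives and how it exits:
strace -e trace=none -p <PID>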
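Instead of renaming every image, the mirror URLs in note 1 can also be configured in containerd itself, so that docker.io pulls go through them. A sketch that writes such a config, assuming containerd's CRI registry `config_path` is set to `/etc/containerd/certs.d` (the per-registry `hosts.toml` layout described in containerd's registry host documentation); `docker_io_hosts_toml` and `write_hosts_toml` are hypothetical helpers, and writing under /etc needs root.

```python
from pathlib import Path
from typing import List


def docker_io_hosts_toml(mirrors: List[str]) -> str:
    """Render a hosts.toml that tries `mirrors` in order before Docker Hub."""
    lines = ['server = "https://registry-1.docker.io"', ""]
    for url in mirrors:
        lines.append(f'[host."{url.rstrip("/")}"]')
        lines.append('  capabilities = ["pull", "resolve"]')  # mirrors: pull only
        lines.append("")
    return "\n".join(lines)


def write_hosts_toml(mirrors: List[str],
                     root: str = "/etc/containerd/certs.d") -> Path:
    """Write <root>/docker.io/hosts.toml and return its path."""
    target = Path(root) / "docker.io" / "hosts.toml"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(docker_io_hosts_toml(mirrors))
    return target
```

For example, `write_hosts_toml(["https://docker.nju.edu.cn", "https://docker.m.daocloud.io"])` lists two mirrors; containerd falls back to `server` when none of them can serve an image.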

Reference:

[1] https://zhuanlan.zhihu.com/p/642560164

From: https://www.cnblogs.com/zhenxia-jiuyou/p/18003167
