Kubernetes and dockershim

Tags: Container Kubernetes containerd Docker CRI dockershim

What is dockershim?

https://v1-27.docs.kubernetes.io/zh-cn/blog/2022/05/03/dockershim-historical-context/

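In the early days of Kubernetes, Docker Engine was the only supported container runtime. There were few alternatives at the time, and Docker was the dominant tool for working with containers, so this was not a controversial choice. Later, Kubernetes started adding support for more container runtimes, such as rkt and hypernetes. Different users have different preferences, and some scenarios call for the runtime that fits them best, so Kubernetes needed a way to support many container runtimes: the Container Runtime Interface (CRI).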

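CRI enables this flexibility. Introducing CRI was great for both the project and its users, but it did create a problem: Docker Engine's use as a container runtime predates CRI, and Docker Engine is not CRI-compatible. To solve this, a small software shim (dockershim) was added to the kubelet component specifically to bridge the gap between Docker Engine and CRI, so that cluster operators could keep using Docker Engine as their cluster's container runtime.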

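The shim was never intended to be a permanent solution in Kubernetes, and its presence added a lot of unnecessary complexity to the kubelet itself. Because of the shim, some Docker integrations were implemented inconsistently, which increased the maintainers' burden; in addition, maintaining code for one specific vendor ran counter to the Kubernetes project's open-source philosophy. KEP-2221 was therefore introduced, proposing the removal of dockershim, and the deprecation was formally announced with the release of Kubernetes v1.20.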

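Unfortunately, the deprecation announcement caused some panic in the community. In response to these concerns, Docker and Mirantis jointly decided to keep supporting the dockershim code in the form of cri-dockerd, so you can continue to use Docker Engine as your container runtime if you need to. For users who want to try other runtimes (such as containerd or CRI-O), migration documentation has been written.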

dockershim end of life

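With the 1.20 release, the Kubernetes project announced that dockershim was deprecated and would be removed from the kubelet, and stated that it would be removed completely in v1.24. Until 1.24, dockershim remained built into Kubernetes so that Docker could still be used as the container runtime.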

https://v1-27.docs.kubernetes.io/zh-cn/blog/2022/01/07/kubernetes-is-moving-on-from-dockershim/#deprecation-timeline

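We formally announced the deprecation of dockershim in December 2020. The goal was to remove dockershim completely in Kubernetes 1.24, in April 2022. This timeline is consistent with our deprecation policy, which states that deprecated behaviors must keep functioning for at least 1 year after their deprecation is announced.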

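Kubernetes 1.23 releases, which include dockershim, will be supported by the Kubernetes project for one more year. For vendors offering managed Kubernetes, vendor support may last longer, but that is up to the companies themselves. In any case, we believe all cluster operators have time to migrate. If you have more questions about the dockershim removal, see the Dockershim Deprecation FAQ.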

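In the survey "Are you ready for Dockershim removal", we asked whether you were ready for the dockershim migration. We received more than 600 responses. Thank you to everyone who took the time to fill it out.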

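The results show that we still have a lot of work to do to help you migrate smoothly. Other container runtimes exist and have been widely adopted. However, many users told us they still rely on dockershim and sometimes need to rework their dependencies, some of which are outside their control. Based on the feedback collected, we have taken several steps to help.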

dockershim architecture

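A container runtime is the software that runs the containers that make up a Kubernetes Pod. Kubernetes is responsible for orchestrating and scheduling Pods; on each node, the kubelet uses the Container Runtime Interface as an abstraction, so you can use any compatible container runtime.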

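In its early versions Kubernetes supported only Docker; it was later improved to support multiple container runtimes. CRI was introduced to meet this need for flexibility, and the kubelet began to support CRI. Because Docker existed before CRI, Kubernetes used an adapter component, dockershim, which lets the kubelet interact with Docker as if Docker Engine were a CRI-compatible runtime.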

The figures below show how the two container runtimes interact with CRI (containerd shown in its early 1.0 version):

[Figure: Docker]
[Figure: containerd]

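Unlike Docker, containerd supports the Kubernetes CRI: the kubelet talks to containerd directly over gRPC to run containers, and those containers are invisible to Docker. As a result, the Docker tools and polished UIs you may have used to inspect these containers no longer work.
You can no longer use docker ps or docker inspect to get container information. Since you cannot list the containers, you also cannot fetch their logs, stop them, or run commands in them with docker exec.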
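On such a node, the CRI command-line tool crictl (from the cri-tools project) takes over the debugging role docker used to play, and many of its subcommands share docker's names. The helper below is a hypothetical sketch for illustration only: it translates a handful of common docker commands into their crictl counterparts and passes the remaining arguments through unchanged. That pass-through is an assumption, since flags do not always match between the two tools, and container IDs seen by docker are not visible to crictl anyway.

```python
# Hypothetical helper: rough docker -> crictl translation for debugging.
# Most subcommands share a name; the table keeps the mapping explicit.
DOCKER_TO_CRICTL = {
    "ps": "ps",            # list containers (crictl pods lists sandboxes)
    "images": "images",    # list images
    "inspect": "inspect",  # containers; crictl uses inspecti for images
    "logs": "logs",
    "exec": "exec",
    "stop": "stop",
    "rm": "rm",
    "rmi": "rmi",
    "pull": "pull",
    "stats": "stats",
}


def to_crictl(docker_cmd):
    """Translate 'docker <sub> [args...]' into 'crictl <sub> [args...]'.

    Remaining arguments are passed through unchanged (an assumption:
    flags do not always carry over between the two tools).
    """
    parts = docker_cmd.split()
    if len(parts) < 2 or parts[0] != "docker":
        raise ValueError(f"not a docker command: {docker_cmd!r}")
    sub, rest = parts[1], parts[2:]
    if sub not in DOCKER_TO_CRICTL:
        raise KeyError(f"no direct crictl equivalent for 'docker {sub}'")
    return " ".join(["crictl", DOCKER_TO_CRICTL[sub], *rest])
```

For example, to_crictl("docker logs -f <id>") yields "crictl logs -f <id>"; commands with no CRI counterpart, such as docker build, raise an error instead of guessing.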

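Architecture improvements in containerd 1.1: the CRI integration became a built-in plugin of containerd, so the kubelet talks to containerd directly instead of going through the separate cri-containerd daemon used with containerd 1.0.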

Improved performance

https://v1-27.docs.kubernetes.io/blog/2018/05/24/kubernetes-containerd-integration-goes-ga/

Pod startup latency: compared with Docker plus dockershim, containerd as the container runtime started a batch of 105 Pods in only 75 s.

CPU & MEM
- At the steady state, with 105 pods, the containerd 1.1 integration consumes less CPU and memory overall compared to Docker 18.03 CE integration with dockershim. The results vary with the number of pods running on the node, 105 is chosen because it is the current default for the maximum number of user pods per node.
- As shown in the figures below, compared to Docker 18.03 CE integration with dockershim, the containerd 1.1 integration has 30.89% lower kubelet cpu usage, 68.13% lower container runtime cpu usage, 11.30% lower kubelet resident set size (RSS) memory usage, 12.78% lower container runtime RSS memory usage.

Impact of the dockershim removal

https://v1-27.docs.kubernetes.io/zh-cn/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-removal-affects-you/#find-docker-dependencies

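Check whether applications deployed on the platform depend on Docker

Switching to another container runtime means the docker CLI will no longer work, or may return unexpected results.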

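Make sure no privileged Pods run Docker commands (such as docker ps), restart the Docker service (such as systemctl restart docker.service), or modify Docker configuration files such as /etc/docker/daemon.json.
Check the registry mirror settings in Docker configuration files (such as /etc/docker/daemon.json). These settings usually have to be configured again for the new container runtime.
Make sure that scripts and applications running on nodes outside the Kubernetes infrastructure do not run Docker commands. Possible cases include:
- SSH sessions to nodes for troubleshooting;
- node startup scripts;
- monitoring and security agents installed directly on nodes.
Check third-party tools that perform the privileged operations above. For details, see Migrating telemetry and security agents from dockershim.
Make sure there are no indirect dependencies on dockershim. This is an edge case and unlikely to affect your applications. Some tools may be configured to react to Docker-specific behavior, for example raising alerts on specific metrics or searching for specific log messages as part of troubleshooting instructions. If you have tools configured this way, test their behavior on a test cluster before migrating.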
Use a tool to detect workloads that depend on docker.sock: https://github.com/aws-containers/kubectl-detector-for-docker-socket
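A rough idea of what such a detector looks for: Pods that mount the Docker socket from the host through a hostPath volume. The sketch below is a simplified, hypothetical stand-in, not the linked tool's code. It scans the parsed JSON of kubectl get pods -A -o json and flags hostPath volumes pointing at the socket or at a directory that contains it; it only looks at running Pods, not at workload templates or manifest files.

```python
# Simplified, hypothetical detector: flags Pods whose hostPath volumes
# expose the Docker daemon socket. Input is the parsed JSON output of
# `kubectl get pods -A -o json`.
DOCKER_SOCKET = "/var/run/docker.sock"
# Paths that expose the socket too: /run/docker.sock (/var/run is normally
# a symlink to /run) and the directories that contain it.
EXPOSING_PATHS = {DOCKER_SOCKET, "/run/docker.sock", "/var/run", "/run"}


def pods_mounting_docker_socket(pod_list):
    """Return 'namespace/name' for every Pod with a suspicious hostPath."""
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        volumes = pod.get("spec", {}).get("volumes") or []
        for vol in volumes:
            host_path = (vol.get("hostPath") or {}).get("path", "")
            if host_path.rstrip("/") in EXPOSING_PATHS:
                hits.append(f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}")
                break  # one hit per Pod is enough
    return hits
```

To use it, pipe kubectl get pods -A -o json into a script that calls pods_mounting_docker_socket(json.load(sys.stdin)) and prints the result.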

Migrating from dockershim

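Since Kubernetes 1.20 officially announced that dockershim would be removed, many questions have followed; the sections below address them one by one, based on the official documentation.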

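dockershim was officially removed from the code base in v1.24. The official documentation covers the other container runtimes you can use instead: containerd, CRI-O, Docker Engine (via cri-dockerd), and Mirantis Container Runtime (for details, see https://v1-27.docs.kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/).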

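If you are running a version earlier than 1.24, upgrading to 1.24 requires a migration. What you need to do depends on which container runtime your production environment uses; for the detailed procedures, see the official documentation: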

Migrating Docker Engine nodes from dockershim to cri-dockerd

https://v1-27.docs.kubernetes.io/zh-cn/docs/tasks/administer-cluster/migrating-from-dockershim/migrate-dockershim-dockerd/

The following describes the migration from dockershim to cri-dockerd.

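If your Kubernetes platform currently uses Docker Engine and you want to keep Docker Engine as the cluster's container runtime, you need to migrate to cri-dockerd when upgrading to v1.24 or later (dockershim was removed in those versions).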

What is cri-dockerd?

This adapter provides a shim for Docker Engine that lets you control Docker via the Kubernetes Container Runtime Interface.

In other words, according to the official description, it was developed so that Kubernetes 1.24 and later can keep using Docker Engine.

Migration steps

Install cri-dockerd;
Cordon and drain the node;
- kubectl cordon <NODE_NAME>
- kubectl drain <NODE_NAME> --ignore-daemonsets
Configure the kubelet to use cri-dockerd;
- On each affected node, open /var/lib/kubelet/kubeadm-flags.env and set the --container-runtime-endpoint flag to unix:///var/run/cri-dockerd.sock.
- Edit the node object to update its annotations: KUBECONFIG=/path/to/admin.conf kubectl edit no <NODE_NAME>
- Change the kubeadm.alpha.kubernetes.io/cri-socket annotation from /var/run/dockershim.sock to unix:///var/run/cri-dockerd.sock;
Restart the kubelet (systemctl restart kubelet);
Verify that the node is healthy, then uncordon it: kubectl uncordon <NODE_NAME>.
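The kubeadm-flags.env edit in these steps can be scripted. The sketch below is a minimal illustration, assuming the usual kubeadm layout where all kubelet flags sit in one double-quoted KUBELET_KUBEADM_ARGS variable; the function name set_runtime_endpoint is hypothetical. It removes any existing --container-runtime-endpoint (in either the --flag=value or the --flag value form) and appends the new one, so the same function serves both the cri-dockerd and the containerd migrations.

```python
def set_runtime_endpoint(env_text, endpoint):
    """Return env_text with --container-runtime-endpoint set to endpoint
    inside KUBELET_KUBEADM_ARGS (the variable is added if missing)."""
    lines = env_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if not line.startswith("KUBELET_KUBEADM_ARGS="):
            continue  # leave comments and other variables untouched
        found = True
        value = line.split("=", 1)[1].strip().strip('"')
        kept = []
        skip_next = False
        for tok in value.split():
            if skip_next:  # value of a space-separated endpoint flag
                skip_next = False
                continue
            if tok == "--container-runtime-endpoint":
                skip_next = True
                continue
            if tok.startswith("--container-runtime-endpoint="):
                continue
            kept.append(tok)
        kept.append(f"--container-runtime-endpoint={endpoint}")
        lines[i] = 'KUBELET_KUBEADM_ARGS="' + " ".join(kept) + '"'
    if not found:
        lines.append(f'KUBELET_KUBEADM_ARGS="--container-runtime-endpoint={endpoint}"')
    return "\n".join(lines) + "\n"
```

The function only transforms text: back up the file before writing the result, and restart the kubelet afterwards as described in the steps.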

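Post-migration verification

The kubelet communicates with the container runtime over a Unix socket using the CRI protocol, which is built on the gRPC framework: the kubelet is the client and the container runtime is the server.

To check which runtime the kubelet is using: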

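Print the kubelet process's command line: tr '\0' ' ' < /proc/"$(pgrep kubelet)"/cmdline
In the output, look for the --container-runtime and --container-runtime-endpoint flags. If your nodes run Kubernetes v1.23 or earlier and these flags are not present, or the --container-runtime flag is not set to remote, you are using Docker Engine through the dockershim socket. If --container-runtime-endpoint is present, the socket name tells you which runtime is in use (for example, unix:///run/containerd/containerd.sock is containerd). Note that the --container-runtime command-line flag is no longer available in Kubernetes v1.27 and later.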
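The same check can be done programmatically. A minimal sketch, assuming you can read /proc/<pid>/cmdline (NUL-separated arguments) for the kubelet process; parse_flags and detect_runtime are hypothetical helpers. It applies the v1.23-and-earlier rule from the text (no endpoint flag, or a --container-runtime other than remote, means dockershim) and otherwise names the runtime from the socket file name. The CRI-O socket name is an assumption based on its usual default, and newer kubelets can also take the endpoint from the kubelet configuration file instead of a flag, which this sketch does not read.

```python
import os

# Socket file name -> runtime. crio.sock is an assumed CRI-O default.
SOCKET_NAMES = {
    "dockershim.sock": "dockershim (Docker Engine)",
    "cri-dockerd.sock": "cri-dockerd (Docker Engine)",
    "containerd.sock": "containerd",
    "crio.sock": "CRI-O",
}


def parse_flags(cmdline):
    """Parse NUL-separated argv bytes into {flag_name: value}."""
    args = [a.decode() for a in cmdline.split(b"\0") if a][1:]  # drop argv[0]
    flags = {}
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("--"):
            if "=" in arg:                      # --flag=value
                key, value = arg[2:].split("=", 1)
            elif i + 1 < len(args) and not args[i + 1].startswith("--"):
                key, value = arg[2:], args[i + 1]  # --flag value
                i += 1
            else:                               # bare boolean flag
                key, value = arg[2:], "true"
            flags[key] = value
        i += 1
    return flags


def detect_runtime(cmdline):
    flags = parse_flags(cmdline)
    runtime = flags.get("container-runtime")
    endpoint = flags.get("container-runtime-endpoint")
    # Rule for kubelet v1.23 and earlier: no endpoint flag, or a
    # --container-runtime other than "remote", means dockershim.
    if (runtime is not None and runtime != "remote") or endpoint is None:
        return "dockershim (Docker Engine)"
    return SOCKET_NAMES.get(os.path.basename(endpoint), f"unknown ({endpoint})")
```

On a node, detect_runtime(open(f"/proc/{pid}/cmdline", "rb").read()) with the kubelet's PID gives the same answer as reading the tr output by eye.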

Changing the container runtime on a node from Docker Engine to containerd

https://v1-27.docs.kubernetes.io/zh-cn/docs/tasks/administer-cluster/migrating-from-dockershim/change-runtime-containerd/

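Migration steps

Cordon and drain the node first, as in the cri-dockerd migration above;
Install containerd (see the containerd installation documentation);
Configure containerd: run sudo mkdir -p /etc/containerd, then containerd config default | sudo tee /etc/containerd/config.toml;
Restart containerd: sudo systemctl restart containerd;
Configure the kubelet to use containerd as its container runtime: edit /var/lib/kubelet/kubeadm-flags.env and add the containerd endpoint to the flags: --container-runtime-endpoint=unix:///run/containerd/containerd.sock;
Update the node annotation: run kubectl edit no <node-name> and change the value of kubeadm.alpha.kubernetes.io/cri-socket from /var/run/dockershim.sock to the CRI socket path of your choice (for example, unix:///run/containerd/containerd.sock);
Restart the kubelet;
Verify the node as described above, then uncordon it with kubectl uncordon <node-name>.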
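The annotation change can also be made non-interactively instead of with kubectl edit. A minimal sketch, assuming kubectl is available on the machine running it: cri_socket_patch and kubectl_patch_command are hypothetical helpers that build a JSON merge patch touching only the kubeadm.alpha.kubernetes.io/cri-socket annotation, plus the matching kubectl patch argument list. kubectl annotate node <node-name> --overwrite kubeadm.alpha.kubernetes.io/cri-socket=<endpoint> is an equivalent one-liner.

```python
import json

CRI_SOCKET_ANNOTATION = "kubeadm.alpha.kubernetes.io/cri-socket"


def cri_socket_patch(endpoint):
    """JSON merge patch that sets only the kubeadm cri-socket annotation."""
    patch = {"metadata": {"annotations": {CRI_SOCKET_ANNOTATION: endpoint}}}
    return json.dumps(patch, separators=(",", ":"))


def kubectl_patch_command(node, endpoint):
    """argv for `kubectl patch node <node> --type merge -p <patch>`."""
    return ["kubectl", "patch", "node", node,
            "--type", "merge", "-p", cri_socket_patch(endpoint)]
```

Passing the list to subprocess.run(..., check=True) avoids shell quoting problems with the JSON; a merge patch leaves the node's other annotations untouched.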

What about Docker Engine?

https://v1-27.docs.kubernetes.io/blog/2018/05/24/kubernetes-containerd-integration-goes-ga/#what-about-docker-engine

The original text from the official blog:

Docker Engine is built on top of containerd. The next release of Docker Community Edition (Docker CE) will use containerd version 1.1. Of course, it will have the CRI plugin built-in and enabled by default. This means users will have the option to continue using Docker Engine for other purposes typical for Docker users, while also being able to configure Kubernetes to use the underlying containerd that came with and is simultaneously being used by Docker Engine on the same node. See the architecture figure below showing the same containerd being used by Docker Engine and Kubelet:

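In short: Docker Engine is built on containerd, and according to that 2018 announcement, new Docker CE releases ship with the CRI plugin built in and enabled by default. What does that mean? Kubernetes can be configured to use the containerd that comes with Docker CE as its container runtime, while Docker CE keeps working for other purposes on the same node.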

[Figure: architecture in which Docker Engine and the kubelet share the same containerd]

Since containerd is being used by both Kubelet and Docker Engine, this means users who choose the containerd integration will not just get new Kubernetes features, performance, and stability improvements, they will also have the option of keeping Docker Engine around for other use cases.

A containerd namespace mechanism is employed to guarantee that Kubelet and Docker Engine won’t see or have access to containers and images created by each other. This makes sure they won’t interfere with each other. This also means that:

Users won’t see Kubernetes created containers with the docker ps command. Please use crictl ps instead. And vice versa, users won’t see Docker CLI created containers in Kubernetes or with crictl ps command. The crictl create and crictl runp commands are only for troubleshooting. Manually starting pod or container with crictl on production nodes is not recommended.
Users won’t see Kubernetes pulled images with the docker images command. Please use the crictl images command instead. And vice versa, Kubernetes won’t see images created by docker pull, docker load or docker build commands. Please use the crictl pull command instead, and ctr cri load if you have to load an image.

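In short, the two coexist without interfering with each other: containers created by Kubernetes are not visible to docker ps, and conversely, containers created with docker commands are neither visible to crictl nor managed by Kubernetes.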
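This isolation comes from containerd namespaces: containerd's CRI plugin keeps the kubelet's containers and images in the k8s.io namespace, while Docker Engine uses the moby namespace. The toy model below (a hypothetical class, not the containerd API) shows why a listing in one namespace never sees the other's objects. On a real node, ctr -n k8s.io containers ls and ctr -n moby containers ls show the two separate views.

```python
from collections import defaultdict

K8S_NAMESPACE = "k8s.io"   # used by containerd's CRI plugin (the kubelet's view)
DOCKER_NAMESPACE = "moby"  # used by Docker Engine


class ToyContainerd:
    """Toy model of containerd namespaces; not the real containerd API."""

    def __init__(self):
        self._containers = defaultdict(set)  # namespace -> container IDs

    def create(self, namespace, container_id):
        self._containers[namespace].add(container_id)

    def list_ids(self, namespace):
        # A client only ever sees the namespace it asked for.
        return sorted(self._containers.get(namespace, ()))
```

Creating "nginx" in K8S_NAMESPACE and "dev-shell" in DOCKER_NAMESPACE, list_ids(K8S_NAMESPACE) plays the role of crictl ps and returns only ["nginx"], while list_ids(DOCKER_NAMESPACE) plays docker ps and returns only ["dev-shell"].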

What is OCI?

https://v1-27.docs.kubernetes.io/zh-cn/blog/2022/02/17/dockershim-faq/#people-keep-referencing-oci-what-is-that

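OCI stands for Open Container Initiative, which standardizes many of the interfaces between container tools and the underlying technologies. It maintains specifications for packaging container images (OCI image) and for running containers (OCI runtime). It also maintains an actual implementation of the runtime-spec in the form of runc, which is the default underlying runtime for both containerd and CRI-O. CRI builds on these low-level specifications to provide an end-to-end standard for managing containers.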
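To make "OCI runtime" concrete: a runtime such as runc consumes a bundle, which is a directory holding a root filesystem plus a config.json that follows the OCI runtime-spec. The sketch below builds a heavily trimmed config.json as a Python dict; minimal_runtime_config is a hypothetical helper and the values are illustrative. runc spec generates a complete default file with mounts, capabilities, and more.

```python
import json


def minimal_runtime_config(args, hostname="demo", rootfs="rootfs"):
    """Trimmed OCI runtime-spec config.json as a dict (illustrative only)."""
    if not args:
        raise ValueError("process.args must name a command to run")
    return {
        "ociVersion": "1.0.2",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": list(args),  # argv of the container's first process
            "env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",  # the spec requires an absolute path
        },
        # Root filesystem, relative to the bundle directory.
        "root": {"path": rootfs, "readonly": True},
        "hostname": hostname,
        "linux": {
            # Each entry asks the runtime to create a new namespace of that type.
            "namespaces": [{"type": t} for t in ("pid", "network", "ipc", "uts", "mount")],
        },
    }
```

Writing it out with json.dump(minimal_runtime_config(["sh"]), f, indent=2) into <bundle>/config.json shows the shape of the file that containerd and CRI-O ultimately hand to runc.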

What is CRI?

https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/

Kubernetes 1.5 introduced CRI (Container Runtime Interface), a plugin interface that lets the kubelet use a wide variety of container runtimes without recompiling. CRI consists of a protocol buffers and gRPC API, plus libraries.

CRI Overview

The kubelet communicates with the container runtime (or a CRI shim for the runtime) over a Unix socket using the gRPC framework: the kubelet acts as the gRPC client and the CRI shim acts as the gRPC server.
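On the node, the server side of this connection is simply a Unix socket file at a well-known path. The sketch below uses a hypothetical helper, find_cri_sockets, that lists which well-known endpoints exist under a given root directory, roughly the way crictl falls back to trying several default endpoints when none is configured. The first three paths appear in this article; the CRI-O path is an assumption based on its usual default.

```python
import os

# Well-known CRI endpoints. The first three appear in this article; the
# CRI-O path is an assumption based on its usual default.
CANDIDATE_SOCKETS = [
    "/run/containerd/containerd.sock",
    "/var/run/cri-dockerd.sock",
    "/var/run/dockershim.sock",
    "/run/crio/crio.sock",
]


def find_cri_sockets(root="/"):
    """Return unix:// endpoints whose socket path exists under root.

    Existence only: this does not check that the path is a socket or
    that a gRPC server is actually listening on it.
    """
    found = []
    for path in CANDIDATE_SOCKETS:
        if os.path.exists(os.path.join(root, path.lstrip("/"))):
            found.append("unix://" + path)
    return found
```

The root parameter exists only so the function can be pointed at a test directory; on a node, find_cri_sockets() checks the real paths.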

The protocol buffers API includes two services:

RuntimeService
ImageService

RuntimeService

RuntimeService is mainly responsible for Pod and container lifecycle management. Common RPCs include:

service RuntimeService {

    // Sandbox operations.

    rpc RunPodSandbox(RunPodSandboxRequest) returns (RunPodSandboxResponse) {}  
    rpc StopPodSandbox(StopPodSandboxRequest) returns (StopPodSandboxResponse) {}  
    rpc RemovePodSandbox(RemovePodSandboxRequest) returns (RemovePodSandboxResponse) {}  
    rpc PodSandboxStatus(PodSandboxStatusRequest) returns (PodSandboxStatusResponse) {}  
    rpc ListPodSandbox(ListPodSandboxRequest) returns (ListPodSandboxResponse) {}  

    // Container operations.  
    rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}  
    rpc StartContainer(StartContainerRequest) returns (StartContainerResponse) {}  
    rpc StopContainer(StopContainerRequest) returns (StopContainerResponse) {}  
    rpc RemoveContainer(RemoveContainerRequest) returns (RemoveContainerResponse) {}  
    rpc ListContainers(ListContainersRequest) returns (ListContainersResponse) {}  
    rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}

    ...  
}

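A Pod is a group of application containers running in an isolated environment with resource constraints. In CRI, this environment is called the PodSandbox; for runtimes based on Linux namespaces, such as Docker or containerd, it is typically backed by the Pod's pause container.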

Before starting a pod, kubelet calls RuntimeService.RunPodSandbox to create the environment. This includes setting up networking for a pod (e.g., allocating an IP). Once the PodSandbox is active, individual containers can be created/started/stopped/removed independently. To delete the pod, kubelet will stop and remove containers before stopping and removing the PodSandbox.
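// In short: the kubelet first creates the PodSandbox with RunPodSandbox (which sets up the Pod's network and allocates its IP); once the sandbox is ready, containers can be created, started, stopped, and removed individually; when the Pod is deleted, its application containers are stopped and removed before the PodSandbox is stopped and removed.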
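The ordering described above can be made concrete with a toy model. FakeRuntime is a hypothetical in-memory stand-in whose method names mirror the RuntimeService RPCs in snake_case; it is not a real CRI client, and the 10.244.0.x addresses are made up. To make the kubelet's ordering visible it raises errors when steps are skipped, whereas the CRI API asks real runtimes to forcibly stop and remove any remaining containers when a sandbox is stopped or removed.

```python
class FakeRuntime:
    """Hypothetical in-memory stand-in for a CRI RuntimeService."""

    def __init__(self):
        self.sandboxes = {}   # sandbox_id -> {"state": ..., "ip": ...}
        self.containers = {}  # container_id -> {"sandbox": ..., "state": ...}
        self._next_ip = 2

    def run_pod_sandbox(self, sandbox_id):
        ip = f"10.244.0.{self._next_ip}"  # pretend network setup / IP allocation
        self._next_ip += 1
        self.sandboxes[sandbox_id] = {"state": "READY", "ip": ip}
        return ip

    def create_container(self, sandbox_id, container_id):
        if self.sandboxes.get(sandbox_id, {}).get("state") != "READY":
            raise RuntimeError("sandbox must be ready before containers are created")
        self.containers[container_id] = {"sandbox": sandbox_id, "state": "CREATED"}

    def start_container(self, container_id):
        self.containers[container_id]["state"] = "RUNNING"

    def stop_container(self, container_id):
        self.containers[container_id]["state"] = "EXITED"

    def remove_container(self, container_id):
        if self.containers[container_id]["state"] == "RUNNING":
            raise RuntimeError("stop the container before removing it")
        del self.containers[container_id]

    def stop_pod_sandbox(self, sandbox_id):
        self.sandboxes[sandbox_id]["state"] = "NOTREADY"

    def remove_pod_sandbox(self, sandbox_id):
        if any(c["sandbox"] == sandbox_id for c in self.containers.values()):
            raise RuntimeError("remove the Pod's containers before the sandbox")
        del self.sandboxes[sandbox_id]


def delete_pod(rt, sandbox_id):
    """Kubelet-style teardown: containers first, then the sandbox."""
    for cid in [c for c, v in rt.containers.items() if v["sandbox"] == sandbox_id]:
        rt.stop_container(cid)
        rt.remove_container(cid)
    rt.stop_pod_sandbox(sandbox_id)
    rt.remove_pod_sandbox(sandbox_id)
```

A Pod start is then run_pod_sandbox, create_container, start_container for each container, and a Pod deletion is a single delete_pod call.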

Kubelet is responsible for managing the lifecycles of the containers through the RPCs, exercising the container lifecycles hooks and liveness/readiness checks, while adhering to the restart policy of the pod.
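// In short: the kubelet manages the container lifecycle (create, start, stop, remove) through these RPCs, runs lifecycle hooks and liveness/readiness probes, and applies the Pod's restart policy.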
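The restart policy the kubelet applies can be summarized in a few lines. A minimal sketch with hypothetical function names: should_restart encodes the three restartPolicy values (Always, OnFailure, Never), and backoff_seconds encodes the exponential back-off (10 s, 20 s, 40 s, … capped at five minutes) described in the Kubernetes Pod lifecycle documentation at the time of writing; newer releases may tune these numbers.

```python
def should_restart(restart_policy, exit_code):
    """Would the kubelet restart a container that exited with exit_code?"""
    if restart_policy == "Always":
        return True
    if restart_policy == "OnFailure":
        return exit_code != 0  # only non-zero exits count as failures
    if restart_policy == "Never":
        return False
    raise ValueError(f"unknown restartPolicy: {restart_policy!r}")


def backoff_seconds(restart_count, base=10, cap=300):
    """Delay before the next restart: 10 s, 20 s, 40 s, ... capped at 300 s."""
    return min(base * 2 ** restart_count, cap)
```

The back-off is why a repeatedly failing container shows CrashLoopBackOff: the restarts keep happening, but with growing gaps.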

ImageService

ImageService handles pulling, removing, and inspecting images over gRPC:

// ImageService defines the public APIs for managing images.
service ImageService {
    // ListImages lists existing images.
    rpc ListImages(ListImagesRequest) returns (ListImagesResponse) {}
    // ImageStatus returns the status of the image. If the image is not
    // present, returns a response with ImageStatusResponse.Image set to
    // nil.
    rpc ImageStatus(ImageStatusRequest) returns (ImageStatusResponse) {}
    // PullImage pulls an image with authentication config.
    rpc PullImage(PullImageRequest) returns (PullImageResponse) {}
    // RemoveImage removes the image.
    // This call is idempotent, and must not return an error if the image has
    // already been removed.
    rpc RemoveImage(RemoveImageRequest) returns (RemoveImageResponse) {}
    // ImageFSInfo returns information of the filesystem that is used to store images.
    rpc ImageFsInfo(ImageFsInfoRequest) returns (ImageFsInfoResponse) {}
}
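Two behaviors spelled out in the comments above are easy to miss: ImageStatus returns an empty image for a missing image instead of an error, and RemoveImage is idempotent. FakeImageService is a hypothetical in-memory sketch (not a real CRI client) that mirrors the five RPCs and those two rules; pulled images get a made-up size.

```python
class FakeImageService:
    """Hypothetical in-memory ImageService; not a real CRI client."""

    def __init__(self):
        self._images = {}  # image reference -> size in bytes (made up)

    def list_images(self):
        return sorted(self._images)

    def image_status(self, ref):
        # Missing image: an empty result, not an error (Image set to nil).
        if ref not in self._images:
            return None
        return {"ref": ref, "size": self._images[ref]}

    def pull_image(self, ref, auth=None):
        # auth stands in for the request's auth config; ignored here.
        self._images.setdefault(ref, 1024 * 1024)
        return ref  # PullImageResponse carries the image reference

    def remove_image(self, ref):
        # Idempotent: removing an absent image is not an error.
        self._images.pop(ref, None)

    def image_fs_info(self):
        return {"used_bytes": sum(self._images.values())}
```

Idempotent removal matters because the kubelet's image garbage collection may retry a removal that already succeeded.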

From: https://www.cnblogs.com/apink/p/18379036