TiDB增强型 StatefulSet 控制器--Advanced StatefulSet

标签：StatefulSet https kubernetes -- TiDB ## io tidb operator

K8S内置的 StatefulSet 为 Pods 分配连续的序号。比如 3 个副本时，Pods 分别为 pod-0, pod-1, pod-2。扩缩容时，必须在尾部增加或删除 Pods。比如扩容到 4 个副本时，会新增 pod-3。缩容到 2 副本时，会删除 pod-2。

在使用本地存储时，Pods 与 Nodes 存储资源绑定，无法自由调度。若希望删除掉中间某个 Pod ，以便维护其所在的 Node 但并没有其他 Node 可以迁移时，或者某个 Pod 故障想直接删除，另起一个序号不一样的 Pod 时，无法通过内置 StatefulSet 实现。

增强型 StatefulSet 控制器基于内置 StatefulSet 实现，新增了自由控制 Pods 序号的功能。本文介绍如何在 TiDB Operator 中使用。

开启

载入 Advanced StatefulSet 的CRD文件：

Kubernetes 1.16 之前版本：

kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.9/manifests/advanced-statefulset-crd.v1beta1.yaml

Kubernetes 1.16 及之后版本:

kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.9/manifests/advanced-statefulset-crd.v1.yaml

具体执行如下：

[root@k8s-master tidb]# kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.9/manifests/advanced-statefulset-crd.v1beta1.yaml
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/statefulsets.apps.pingcap.com configured

在 TiDB Operator chart 的 values.yaml 中启用 AdvancedStatefulSet 特性：

##修改1
features:
- AdvancedStatefulSet=true
advancedStatefulset:
  create: true
  
##修改2：
advancedStatefulset:
  create: true
  image: pingcap/advanced-statefulset:v0.4.0
  imagePullPolicy: IfNotPresent
  serviceAccount: advanced-statefulset-controller
  logLevel: 4
  replicas: 1
  resources:
    limits:
      cpu: 500m
      memory: 300Mi
    requests:
      cpu: 200m
      memory: 50Mi

具体实例操作如下：

生成相关的配置文件

[root@k8s-master tidb]# helm inspect values pingcap/tidb-operator --version=v1.3.9 > /home/tidb/tidb-operator/values-tidb-operator.yaml
[root@k8s-master tidb]# vim tidb-operator/values-tidb-operator.yaml
[root@k8s-master tidb-operator]# cat  values-tidb-operator.yaml
# Default values for tidb-operator

# clusterScoped is whether tidb-operator should manage kubernetes cluster wide tidb clusters
# Also see rbac.create, controllerManager.serviceAccount, scheduler.create and controllerManager.clusterPermissions.
clusterScoped: true

# Also see clusterScoped and controllerManager.serviceAccount
rbac:
  create: true

# timezone is the default system timzone
timezone: UTC

# operatorImage is TiDB Operator image
operatorImage: pingcap/tidb-operator:v1.3.9
imagePullPolicy: IfNotPresent
# imagePullSecrets: []

# tidbBackupManagerImage is tidb backup manager image
tidbBackupManagerImage: pingcap/tidb-backup-manager:v1.3.9

#
# Enable or disable tidb-operator features:
#
#   StableScheduling (default: true)
#     Enable stable scheduling of tidb servers.
#
#   AdvancedStatefulSet (default: false)
#     If enabled, tidb-operator will use AdvancedStatefulSet to manage pods
#     instead of Kubernetes StatefulSet.
#     It's ok to turn it on if this feature is not enabled. However it's not ok
#     to turn it off when the tidb-operator already uses AdvancedStatefulSet to
#     manage pods. This is in alpha phase.
#
features:
- AdvancedStatefulSet=true
advancedStatefulset:
  create: true
# - AdvancedStatefulSet=false
# - StableScheduling=true
# - AutoScaling=false

appendReleaseSuffix: false

controllerManager:
  create: true
  # With rbac.create=false, the user is responsible for creating this account
  # With rbac.create=true, this service account will be created
  # Also see rbac.create and clusterScoped
  serviceAccount: tidb-controller-manager

  # clusterPermissions are some cluster scoped permissions that will be used even if `clusterScoped: false`.
  # the default value of these fields is `true`. if you want them to be `false`, you MUST set them to `false` explicitly.
  clusterPermissions:
    nodes: true
    persistentvolumes: true
    storageclasses: true

  logLevel: 2
  replicas: 1
  resources:
    requests:
      cpu: 80m
      memory: 50Mi
#  # REF: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
#  priorityClassName: system-cluster-critical
#

  # REF: https://pkg.go.dev/k8s.io/client-go/tools/leaderelection#LeaderElectionConfig
  ## leaderLeaseDuration is the duration that non-leader candidates will wait to force acquire leadership
  # leaderLeaseDuration: 15s
  ## leaderRenewDeadline is the duration that the acting master will retry refreshing leadership before giving up
  # leaderRenewDeadline: 10s
  ## leaderRetryPeriod is the duration the LeaderElector clients should wait between tries of actions
  # leaderRetryPeriod: 2s

  ## number of workers that are allowed to sync concurrently. default 5
  # workers: 5

  # autoFailover is whether tidb-operator should auto failover when failure occurs
  autoFailover: true
  # pd failover period default(5m)
  pdFailoverPeriod: 5m
  # tikv failover period default(5m)
  tikvFailoverPeriod: 5m
  # tidb failover period default(5m)
  tidbFailoverPeriod: 5m
  # tiflash failover period default(5m)
  tiflashFailoverPeriod: 5m
  # dm-master failover period default(5m)
  dmMasterFailoverPeriod: 5m
  # dm-worker failover period default(5m)
  dmWorkerFailoverPeriod: 5m
  ## affinity defines pod scheduling rules,affinity default settings is empty.
  ## please read the affinity document before set your scheduling rule:
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}
  ## nodeSelector ensure pods only assigning to nodes which have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}
  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb-operator
  #   effect: "NoSchedule"
  ## Selector (label query) to filter on, make sure that this controller manager only manages the custom resources that match the labels
  ## refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#equality-based-requirement
  selector: []
  # - canary-release=v1
  # - k1==v1
  # - k2!=v2

  # SecurityContext is security config of this component, it will set template.spec.securityContext
  # Refer to https://kubernetes.io/docs/tasks/configure-pod-container/security-context
  securityContext: {}
  # runAsUser: 1000
  # runAsGroup: 2000
  # fsGroup: 2000
  # PodAnnotations will set template.metadata.annotations
  # Refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
  podAnnotations: {}

scheduler:
  create: true
  # With rbac.create=false, the user is responsible for creating this account
  # With rbac.create=true, this service account will be created
  # Also see rbac.create and clusterScoped
  serviceAccount: tidb-scheduler
  logLevel: 2
  replicas: 1
  schedulerName: tidb-scheduler
  resources:
    limits:
      cpu: 250m
      memory: 150Mi
    requests:
      cpu: 80m
      memory: 50Mi
  kubeSchedulerImageName: k8s.gcr.io/kube-scheduler
  # This will default to matching your kubernetes version
  # kubeSchedulerImageTag:
  ## affinity defines pod scheduling rules,affinity default settings is empty.
  ## please read the affinity document before set your scheduling rule:
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}
  ## nodeSelector ensure pods only assigning to nodes which have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}
  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb-operator
  #   effect: "NoSchedule"
  #
  # SecurityContext is security config of this component, it will set template.spec.securityContext
  # Refer to https://kubernetes.io/docs/tasks/configure-pod-container/security-context
  securityContext: {}
  # runAsUser: 1000
  # runAsGroup: 2000
  # fsGroup: 2000
  # PodAnnotations will set template.metadata.annotations
  # Refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
  podAnnotations: {}

  # additional annotations for the configmap, mainly to prevent spinnaker versioning the cm
  configmapAnnotations: {}

# When AdvancedStatefulSet feature is enabled, you must install
# AdvancedStatefulSet controller.
# Note that AdvancedStatefulSet CRD must be installed manually via the following
# command:
#   kubectl apply -f manifests/advanced-statefulset-crd.v1beta1.yaml # k8s version < 1.16.0
#   kubectl apply -f manifests/advanced-statefulset-crd.v1.yaml      # k8s version >= 1.16.0
advancedStatefulset:
  create: true
  image: pingcap/advanced-statefulset:v0.4.0
  imagePullPolicy: IfNotPresent
  serviceAccount: advanced-statefulset-controller
  logLevel: 4
  replicas: 1
  resources:
    limits:
      cpu: 500m
      memory: 300Mi
    requests:
      cpu: 200m
      memory: 50Mi
  ## affinity defines pod scheduling rules,affinity default settings is empty.
  ## please read the affinity document before set your scheduling rule:
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}
  ## nodeSelector ensure pods only assigning to nodes which have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}
  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb-operator
  #   effect: "NoSchedule"
  #
  # SecurityContext is security config of this component, it will set template.spec.securityContext
  # Refer to https://kubernetes.io/docs/tasks/configure-pod-container/security-context
  securityContext: {}
  # runAsUser: 1000
  # runAsGroup: 2000
  # fsGroup: 2000

admissionWebhook:
  create: false
  replicas: 1
  serviceAccount: tidb-admission-webhook
  logLevel: 2
  rbac:
    create: true
  ## validation webhook would check the given request for the specific resource and operation
  validation:
    ## statefulsets hook would check requests for updating tidbcluster's statefulsets
    ## If enabled it, the statefulsets of tidbcluseter would update in partition by tidbcluster's annotation
    statefulSets: false
    ## validating hook validates the correctness of the resources under pingcap.com group
    pingcapResources: false
  ## mutation webhook would mutate the given request for the specific resource and operation
  mutation:
    ## defaulting hook set default values for the the resources under pingcap.com group
    pingcapResources: true
  ## failurePolicy are applied to ValidatingWebhookConfiguration which affect tidb-admission-webhook
  ## refer to https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy
  failurePolicy:
    ## the validation webhook would check the request of the given resources.
    ## If the kubernetes api-server version >= 1.15.0, we recommend the failurePolicy as Fail, otherwise, as Ignore.
    validation: Ignore
    ## the mutation webhook would mutate the request of the given resources.
    ## If the kubernetes api-server version >= 1.15.0, we recommend the failurePolicy as Fail, otherwise, as Ignore.
    mutation: Ignore
  ## tidb-admission-webhook deployed as kubernetes apiservice server
  ## refer to https://github.com/openshift/generic-admission-server
  apiservice:
    ## apiservice config
    ## refer to https://kubernetes.io/docs/tasks/access-kubernetes-api/configure-aggregation-layer/#contacting-the-extension-apiserver
    insecureSkipTLSVerify: true
    ## The Secret includes the TLS ca, cert and key for the `tidb-admission-webook.<Release Namespace>.svc` Service.
    ## If insecureSkipTLSVerify is true, this would be ignored.
    ## You can create the tls secret by:
    ## kubectl create secret generic <secret-name> --namespace=<release-namespace> --from-file=tls.crt=<path-to-cert> --from-file=tls.key=<path-to-key> --from-file=ca.crt=<path-to-ca>
    tlsSecret: ""
    ## The caBundle for the webhook apiservice, you could get it by the secret you created previously:
    ## kubectl get secret <secret-name> --namespace=<release-namespace> -o=jsonpath='{.data.ca\.crt}'
    caBundle: ""
  ## certProvider indicate the key and cert for the webhook configuration to communicate with `kubernetes.default` service.
  ## If your kube-apiserver's version >= 1.13.0, you can leave cabundle empty and the kube-apiserver
  ## would trust the roots on the apiserver.
  ## refer to https://github.com/kubernetes/api/blob/master/admissionregistration/v1/types.go#L529
  ## or you can get the cabundle by:
  ## kubectl get configmap -n kube-system extension-apiserver-authentication -o=jsonpath='{.data.client-ca-file}' | base64 | tr -d '\n'
  cabundle: ""
  # SecurityContext is security config of this component, it will set template.spec.securityContext
  # Refer to https://kubernetes.io/docs/tasks/configure-pod-container/security-context
  securityContext: {}
  # runAsUser: 1000
  # runAsGroup: 2000
  # fsGroup: 2000
  ## nodeSelector ensures that pods are only scheduled to nodes that have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}
  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb-operator
  #   effect: "NoSchedule"
  #

更新相关的tidb operator配置

[root@k8s-master tidb]# helm upgrade tidb-operator pingcap/tidb-operator --namespace=tidb-admin --version=v1.3.9 -f ./tidb-operator/values-tidb-operator.yaml && kubectl get po -n tidb-admin -l app.kubernetes.io/name=tidb-operator
Release "tidb-operator" has been upgraded. Happy Helming!
NAME: tidb-operator
LAST DEPLOYED: Fri Dec  9 16:31:42 2022
NAMESPACE: tidb-admin
STATUS: deployed
REVISION: 7
TEST SUITE: None
NOTES:
Make sure tidb-operator components are running:

    kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator
NAME                                               READY   STATUS    RESTARTS   AGE
advanced-statefulset-controller-67885c5dd9-lfmbl   1/1     Running   0          27m
tidb-controller-manager-d5fc64f85-nlv4k            1/1     Running   1          34m
tidb-scheduler-566f48d4bd-82rrd                    2/2     Running   0          34m
[root@k8s-master tidb]#     kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator
NAME                                               READY   STATUS    RESTARTS   AGE
advanced-statefulset-controller-67885c5dd9-lfmbl   1/1     Running   0          28m
tidb-controller-manager-d5fc64f85-nlv4k            1/1     Running   1          34m
tidb-scheduler-566f48d4bd-82rrd                    2/2     Running   0          34m

TiDB Operator 通过开启 AdvancedStatefulSet 特性，会将当前 StatefulSet 对象转换成 AdvancedStatefulSet 对象。但是，TiDB Operator 不支持在关闭 AdvancedStatefulSet 特性后，自动从 AdvancedStatefulSet 转换为 Kubernetes 内置的 StatefulSet 对象。

TiDB增强型 StatefulSet 控制器--Advanced StatefulSet_sed

使用

查看AdvancedStatefulSet 对象

AdvancedStatefulSet 数据格式与 StatefulSet 完全一致，但以 CRD 方式实现，别名为 asts ，可通过以下方法查看命名空间下的对象。

[root@k8s-master tidb]# kubectl get asts -ntidb
NAME                                DESIRED   CURRENT   AGE
lqb-pd                              1         1         38m
lqb-tidb                            1         1         38m
lqb-tikv                            1         1         38m
tidbmonitor-monitor                 1                   38m
tidbngmonitoring-yz-ng-monitoring   1                   38m
yz-pd                               1         1         38m
yz-tidb                             2         2         38m
yz-tiflash                          2         2         38m
yz-tikv                             3         3         38m
[root@k8s-master tidb]# kubectl get sts -ntidb
NAME                                READY   AGE
tidbmonitor-monitor                 1/1     46h
tidbngmonitoring-yz-ng-monitoring   1/1     2d

对集群指定的pod进行缩容

使用AdvancedStatefulSet在对TiDBCluster进行缩容时，除了减少副本数，可同时通过配置annotations（）对指定的PD、TiKV、TiDB组件下任意一个pod的编号进行缩容

metadata: annotations: tikv.tidb.pingcap.com/delete-slots: '[1]'

如果想要缩容需同时配置修改replicas和annotations两个配置文件

下边已具体缩容tikv1和pd1为例

编辑所要缩容的集群

TiDB增强型 StatefulSet 控制器--Advanced StatefulSet_github_02

TiDB增强型 StatefulSet 控制器--Advanced StatefulSet_github_03

查看执行的结果：

TiDB增强型 StatefulSet 控制器--Advanced StatefulSet_github_04

总结

支持的annoations为：

pd.tidb.pingcap.com/delete-slots：指定 PD 组件需要删除的 Pod 序号。

tidb.tidb.pingcap.com/delete-slots：指定 TiDB 组件需要删除的 Pod 序号。

tikv.tidb.pingcap.com/delete-slots：指定 TiKV 组件需要删除的 Pod 序号。

其中 Annotation 值为 JSON 的整数数组，比如 [0], [0,1], [1,3] 等。

对集群指定位置进行扩容

如果扩容的话需要对前面进行反向操作即修改replicas和把annoations修改的数组进行删除即可( delete-slots annotations 可留空，也可完全删除)。

编辑要扩容的集群

TiDB增强型 StatefulSet 控制器--Advanced StatefulSet_Pod_05

TiDB增强型 StatefulSet 控制器--Advanced StatefulSet_sed_06

查看执行的结果

TiDB增强型 StatefulSet 控制器--Advanced StatefulSet_Pod_07

TiDB增强型 StatefulSet 控制器--Advanced StatefulSet - lqbyz 的专栏 - https://tidb.net/blog/bac3e058?shareId=1deec6ea

标签：StatefulSet,https,kubernetes,--,TiDB,##,io,tidb,operator
From： https://blog.51cto.com/liqingbiao/6045986