Kafka Design: Quotas

Tags: Kafka, quotas, clients, users

1. Introduction

2. Quotas

3. Why are quotas necessary?

4. Client groups

5. Quota Configuration

6. Network Bandwidth Quotas

7. Request Rate Quotas

8. Enforcement

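1. Introduction

Kafka quotas throttle produce and fetch requests to control the broker resources that clients consume. They let administrators cap the network throughput available to individual producer and consumer applications.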

2. Quotas

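Original text: Kafka cluster has the ability to enforce quotas on requests to control the broker resources used by clients. Two types of client quotas can be enforced by Kafka brokers for each group of clients sharing a quota:

Network bandwidth quotas define byte-rate thresholds (since 0.9)
Request rate quotas define CPU utilization thresholds as a percentage of network and I/O threads (since 0.11)

Since version 0.9, a Kafka cluster has been able to apply byte-rate quotas to both producers and consumers, with a threshold set for each client group.

More generally, quotas on requests limit how much of a broker's resources clients can use. Every group of clients that shares a quota can be subject to two types of quota:

Network bandwidth quotas, which set a byte-rate threshold (since version 0.9)
Request rate quotas, which set a CPU utilization threshold, expressed as a percentage of network and I/O thread time (since version 0.11)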

3. Why are quotas necessary?

Original text: It is possible for producers and consumers to produce/consume very high volumes of data or generate requests at a very high rate and thus monopolize broker resources, cause network saturation and generally DOS other clients and the brokers themselves. Having quotas protects against these issues and is all the more important in large multi-tenant clusters where a small set of badly behaved clients can degrade user experience for the well behaved ones. In fact, when running Kafka as a service this even makes it possible to enforce API limits according to an agreed upon contract.

A producer or consumer that moves very large volumes of data, or issues requests at a very high rate, can monopolize broker resources, saturate the network, and effectively mount a denial of service against other clients and the brokers themselves. Quotas guard against this. They matter most in large multi-tenant clusters, where a handful of badly behaved clients can degrade the experience of all the well-behaved ones. When Kafka is run as a service, quotas also make it possible to enforce API limits that match an agreed contract.

4. Client groups

Original text: The identity of Kafka clients is the user principal which represents an authenticated user in a secure cluster. In a cluster that supports unauthenticated clients, user principal is a grouping of unauthenticated users chosen by the broker using a configurable PrincipalBuilder. Client-id is a logical grouping of clients with a meaningful name chosen by the client application. The tuple (user, client-id) defines a secure logical group of clients that share both user principal and client-id.

In a secure cluster, a Kafka client is identified by the user principal of its authenticated user. In a cluster that accepts unauthenticated clients, the user principal is instead a grouping of unauthenticated users that the broker chooses through a configurable PrincipalBuilder. The client-id is a logical grouping of clients under a meaningful name chosen by the client application. The tuple (user, client-id) defines a secure logical group of clients that share both the user principal and the client-id.

Original text: Quotas can be applied to (user, client-id), user or client-id groups. For a given connection, the most specific quota matching the connection is applied. All connections of a quota group share the quota configured for the group. For example, if (user="test-user", client-id="test-client") has a produce quota of 10MB/sec, this is shared across all producer instances of user "test-user" with the client-id "test-client".

A quota can target a (user, client-id) group, a user group, or a client-id group. Each connection is governed by the most specific quota that matches it, and all connections in a quota group share that group's quota. For example, a produce quota of 10MB/sec on (user="test-user", client-id="test-client") is not granted to each producer separately: it is a single budget shared by every producer instance that runs as "test-user" with the client-id "test-client".

5. Quota Configuration

Original text: Quota configuration may be defined for (user, client-id), user and client-id groups. It is possible to override the default quota at any of the quota levels that needs a higher (or even lower) quota. The mechanism is similar to the per-topic log config overrides. User and (user, client-id) quota overrides are written to ZooKeeper under /config/users and client-id quota overrides are written under /config/clients. These overrides are read by all brokers and are effective immediately. This lets us change quotas without having to do a rolling restart of the entire cluster. See here for details. Default quotas for each group may also be updated dynamically using the same mechanism.

Quota configuration can be defined for (user, client-id), user, and client-id groups. The default quota can be overridden at whichever level needs a higher or lower value, using a mechanism similar to per-topic log config overrides. Overrides for user and (user, client-id) are written to ZooKeeper under /config/users, and overrides for client-id under /config/clients. All brokers read these overrides and apply them immediately, so quotas can be changed without a rolling restart of the cluster. The default quota of each group can be updated dynamically in the same way.
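As a rough illustration of setting such an override, the sketch below only assembles the argument list for Kafka's kafka-configs.sh tool; it does not run anything. The script path and the broker address are assumptions, and build_quota_command is a hypothetical helper name. Older releases that store quotas in ZooKeeper take --zookeeper with a ZooKeeper address in place of --bootstrap-server.

```python
def build_quota_command(user=None, client_id=None, **quotas):
    """Return the argv list for a kafka-configs.sh call that sets quota overrides.

    Quota keys such as producer_byte_rate, consumer_byte_rate and
    request_percentage are passed as keyword arguments.
    """
    if user is None and client_id is None:
        raise ValueError("need a user, a client-id, or both")
    if not quotas:
        raise ValueError("need at least one quota setting")

    settings = ",".join(f"{key}={value}" for key, value in sorted(quotas.items()))
    argv = [
        "bin/kafka-configs.sh",                   # assumed path inside the Kafka install
        "--bootstrap-server", "localhost:9092",   # assumed broker address
        "--alter",
        "--add-config", settings,
    ]
    # The entity flags select the quota level: user, client-id, or both.
    if user is not None:
        argv += ["--entity-type", "users", "--entity-name", user]
    if client_id is not None:
        argv += ["--entity-type", "clients", "--entity-name", client_id]
    return argv


# A 10 MB/sec produce quota for the (test-user, test-client) group.
cmd = build_quota_command("test-user", "test-client",
                          producer_byte_rate=10 * 1024 * 1024)
print(" ".join(cmd))
```

Passing only user or only client_id yields a user-level or client-id-level override instead of a (user, client-id) one.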

Original text: The order of precedence for quota configuration is:
/config/users/<user>/clients/<client-id>
/config/users/<user>/clients/<default>
/config/users/<user>
/config/users/<default>/clients/<client-id>
/config/users/<default>/clients/<default>
/config/users/<default>
/config/clients/<client-id>
/config/clients/<default>

The list runs from the most specific match to the least specific: the first path in this order that has a quota configured determines the quota applied to the connection.
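The lookup implied by this order can be sketched as a first-match search. This is a minimal illustration under stated assumptions, not the broker's implementation: the overrides are modelled as a plain dictionary keyed by config path, and candidate_paths and resolve_quota are hypothetical names.

```python
DEFAULT = "<default>"


def candidate_paths(user, client_id):
    """List the config paths for a connection, most specific first."""
    return [
        f"/config/users/{user}/clients/{client_id}",
        f"/config/users/{user}/clients/{DEFAULT}",
        f"/config/users/{user}",
        f"/config/users/{DEFAULT}/clients/{client_id}",
        f"/config/users/{DEFAULT}/clients/{DEFAULT}",
        f"/config/users/{DEFAULT}",
        f"/config/clients/{client_id}",
        f"/config/clients/{DEFAULT}",
    ]


def resolve_quota(overrides, user, client_id):
    """Return (path, quota) for the first configured path, or None if none match."""
    for path in candidate_paths(user, client_id):
        if path in overrides:
            return path, overrides[path]
    return None


# Hypothetical overrides, keyed by config path.
overrides = {
    "/config/users/test-user": {"producer_byte_rate": 10485760},
    "/config/clients/test-client": {"producer_byte_rate": 1048576},
    "/config/clients/<default>": {"producer_byte_rate": 524288},
}

# The user-level entry outranks the client-id entry for the same connection.
print(resolve_quota(overrides, "test-user", "test-client"))
# A connection with no specific entry falls through to the client-id default.
print(resolve_quota(overrides, "other-user", "other-client"))
```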

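Default network bandwidth quotas for client-id groups can also be set in the broker configuration (quota.producer.default, quota.consumer.default), but these properties are deprecated and will be removed in a later release. The default quota for client-id can instead be set in ZooKeeper, in the same way as the other quota overrides and defaults.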

6. Network Bandwidth Quotas

Original text: Network bandwidth quotas are defined as the byte rate threshold for each group of clients sharing a quota. By default, each unique client group receives a fixed quota in bytes/sec as configured by the cluster. This quota is defined on a per-broker basis. Each group of clients can publish/fetch a maximum of X bytes/sec per broker before clients are throttled.

A network bandwidth quota is a byte-rate threshold for each group of clients that shares a quota. By default, every distinct client group receives the fixed quota, in bytes/sec, that the cluster configures. The quota is defined per broker: a group can publish or fetch at most X bytes/sec on each broker before its clients are throttled.
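Because the threshold is checked on each broker separately, a group whose traffic is skewed toward one broker can be throttled there while using little of its budget elsewhere. The sketch below illustrates that consequence with made-up rates; throttled_brokers is a hypothetical helper, not a Kafka API.

```python
MB = 1024 * 1024


def throttled_brokers(quota_bytes_per_sec, rate_by_broker):
    """Return the ids of brokers on which the group exceeds its per-broker quota."""
    return sorted(
        broker for broker, rate in rate_by_broker.items()
        if rate > quota_bytes_per_sec
    )


# Hypothetical produce rates of one client group on a three-broker cluster.
rates = {0: 12 * MB, 1: 4 * MB, 2: 2 * MB}

# With a 10 MB/sec quota the group is throttled on broker 0 only, even though
# its total rate (18 MB/sec) is well below 3 brokers x 10 MB/sec.
print(throttled_brokers(10 * MB, rates))
print(sum(rates.values()) / MB)
```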

7. Request Rate Quotas

Original text: Request rate quotas are defined as the percentage of time a client can utilize on request handler I/O threads and network threads of each broker within a quota window. A quota of n% represents n% of one thread, so the quota is out of a total capacity of ((num.io.threads + num.network.threads) * 100)%. Each group of clients may use a total percentage of upto n% across all I/O and network threads in a quota window before being throttled. Since the number of threads allocated for I/O and network threads are typically based on the number of cores available on the broker host, request rate quotas represent the total percentage of CPU that may be used by each group of clients sharing the quota.

A request rate quota is the percentage of time that a client may occupy the request handler I/O threads and network threads of each broker within a quota window. A quota of n% means n% of one thread, so the quota is measured against a total capacity of ((num.io.threads + num.network.threads) * 100)%. Within a quota window, each client group may use up to n% in total across all I/O and network threads before it is throttled. Because the number of I/O and network threads is usually chosen from the number of CPU cores on the broker host, a request rate quota in effect expresses the total share of CPU that each group of clients sharing the quota may use.
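The capacity formula above can be worked through with concrete numbers. In this sketch the thread counts of 8 I/O threads and 3 network threads are assumptions taken from Kafka's commonly documented defaults; check the broker's own num.io.threads and num.network.threads before relying on them. The function names are hypothetical.

```python
def total_request_capacity(num_io_threads=8, num_network_threads=3):
    """Total request capacity in percent: each thread contributes 100%.

    The default arguments are assumed values, not read from a broker.
    """
    return (num_io_threads + num_network_threads) * 100


def share_of_capacity(quota_percent, num_io_threads=8, num_network_threads=3):
    """Fraction of the broker's total thread time that a quota of n% allows."""
    return quota_percent / total_request_capacity(num_io_threads, num_network_threads)


capacity = total_request_capacity()
print(capacity)                            # (8 + 3) * 100

# A quota of 200% is worth two full threads out of eleven.
print(round(share_of_capacity(200), 4))
```

Read this way, a quota above 100% is not an error: it simply grants more than one thread's worth of time per window.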

8. Enforcement

Original text: By default, each unique client group receives a fixed quota as configured by the cluster. This quota is defined on a per-broker basis. Each client can utilize this quota per broker before it gets throttled. We decided that defining these quotas per broker is much better than having a fixed cluster wide bandwidth per client because that would require a mechanism to share client quota usage among all the brokers. This can be harder to get right than the quota implementation itself!

By default, every distinct client group receives a fixed quota configured by the cluster. The quota is defined per broker, and each client can use up to that amount on each broker before it is throttled. The Kafka designers chose per-broker quotas over a fixed cluster-wide bandwidth per client because a cluster-wide limit would require a mechanism for sharing each client's quota usage among all brokers, which can be harder to get right than the quota implementation itself.

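Original text: How does a broker react when it detects a quota violation? In our solution, the broker first computes the amount of delay needed to bring the violating client under its quota and returns a response with the delay immediately. In case of a fetch request, the response will not contain any data. Then, the broker mutes the channel to the client, not to process requests from the client anymore, until the delay is over. Upon receiving a response with a non-zero delay duration, the Kafka client will also refrain from sending further requests to the broker during the delay. Therefore, requests from a throttled client are effectively blocked from both sides. Even with older client implementations that do not respect the delay response from the broker, the back pressure applied by the broker via muting its socket channel can still handle the throttling of badly behaving clients. Those clients who sent further requests to the throttled channel will receive responses only after the delay is over.

How does a broker react when it detects a quota violation? It does not return an error. It computes the delay needed to bring the violating client back under its quota and immediately returns a response that carries that delay; for a fetch request, the response contains no data. The broker then mutes the channel to the client and processes no further requests from it until the delay has elapsed. A client that receives a non-zero delay also refrains from sending further requests during that period, so a throttled client is blocked from both sides. Older clients that ignore the delay are still throttled by the back pressure of the muted socket channel: any further requests they send are answered only after the delay is over. Clients therefore do not need special backoff or retry logic, which is tricky to get right; retrying without backoff would in fact aggravate the very problem that quotas are meant to solve. (Earlier versions of the Kafka documentation described a simpler scheme in which the broker held back the response itself for the computed time, which kept the violation transparent to clients apart from client-side metrics.)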
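The text does not give the formula for the delay, but one natural way to derive it follows from the stated goal. If a client was observed at rate O over a window of length W against a target rate T, adding a delay X such that O * W / (W + X) = T brings the average back to T, which gives X = W * (O - T) / T. The sketch below implements that derivation; treat it as an assumption about how such a delay could be computed rather than as the broker's exact code.

```python
def throttle_delay_ms(observed_rate, target_rate, window_ms):
    """Delay that stretches the window until the average rate equals the target.

    Solves observed_rate * window_ms / (window_ms + delay) == target_rate
    for delay. Returns 0.0 when the client is within its quota.
    """
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    if observed_rate <= target_rate:
        return 0.0
    return window_ms * (observed_rate - target_rate) / target_rate


# A client observed at 15 MB/sec over 10 seconds against a 10 MB/sec quota:
# 150 MB were sent, which takes 15 seconds at the target rate, so the
# client has to wait 5 more seconds.
delay = throttle_delay_ms(observed_rate=15, target_rate=10, window_ms=10_000)
print(delay)
```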

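Original text: Byte-rate and thread utilization are measured over multiple small windows (e.g. 30 windows of 1 second each) in order to detect and correct quota violations quickly. Typically, having large measurement windows (for e.g. 10 windows of 30 seconds each) leads to large bursts of traffic followed by long delays which is not great in terms of user experience.

Byte rate and thread utilization are measured over many small windows (for example, 30 windows of 1 second each) so that quota violations are detected and corrected quickly. Large measurement windows (for example, 10 windows of 30 seconds each) tend to allow large bursts of traffic followed by long delays, which makes for a poor user experience.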
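The effect of window size can be seen in a small simulation. The class below is a simplified sketch of a windowed rate measurement, not Kafka's metrics code: it takes timestamps as arguments instead of reading a clock, and it always divides by the full span of all windows. WindowedRate and its parameters are hypothetical names.

```python
from collections import deque


class WindowedRate:
    """Average byte rate over the last num_windows windows of window_seconds each."""

    def __init__(self, num_windows=30, window_seconds=1.0):
        self.num_windows = num_windows
        self.window_seconds = window_seconds
        self.samples = deque()  # (window_index, byte_count), oldest first

    def record(self, now, nbytes):
        """Add nbytes to the window that contains the timestamp now (seconds)."""
        index = int(now // self.window_seconds)
        if self.samples and self.samples[-1][0] == index:
            self.samples[-1] = (index, self.samples[-1][1] + nbytes)
        else:
            self.samples.append((index, nbytes))
        self._expire(index)

    def _expire(self, current_index):
        """Drop windows that have slid out of the measurement span."""
        oldest_allowed = current_index - self.num_windows + 1
        while self.samples and self.samples[0][0] < oldest_allowed:
            self.samples.popleft()

    def rate(self, now):
        """Bytes per second averaged over the whole measurement span."""
        self._expire(int(now // self.window_seconds))
        total = sum(count for _, count in self.samples)
        return total / (self.num_windows * self.window_seconds)


small = WindowedRate(num_windows=30, window_seconds=1.0)   # 30 second span
large = WindowedRate(num_windows=10, window_seconds=30.0)  # 300 second span

# The same 3000-byte burst at t=0 is recorded by both trackers.
small.record(0.0, 3000)
large.record(0.0, 3000)

# At t=31 the small-window tracker has already forgotten the burst, while
# the large-window tracker keeps charging the client for it.
print(small.rate(31.0), large.rate(31.0))
```

With the larger span, a single burst keeps counting against the client for minutes, which is one way to read the long delays the text warns about.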

Source: https://blog.csdn.net/mrluo735/article/details/136233164
