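Problem description: every topic has multiple partitions and multiple replicas. After all the broker nodes have been shut down, suppose one node cannot be restarted for some reason. Even once all the other nodes are back up, a topic (or some of its partitions) can remain unavailable because no leader can be elected.
1. Create the topic
topic: topic-demo02, 3 partitions, 3 replicas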
[xuhaixing@xhx151 cluster]$ kafka-topics.sh --zookeeper 192.168.94.151:2181/kafkaCluster --create --topic topic-demo02 --replication-factor 3 --partitions 3
Created topic topic-demo02.
[xuhaixing@xhx151 cluster]$ kafka-topics.sh --zookeeper 192.168.94.151:2181/kafkaCluster --describe --topic topic-demo02
Topic: topic-demo02 PartitionCount: 3 ReplicationFactor: 3 Configs:
Topic: topic-demo02 Partition: 0 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: topic-demo02 Partition: 1 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: topic-demo02 Partition: 2 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
2. Stop all three brokers
[xuhaixing@xhx151 cluster]$ kafka-topics.sh --zookeeper 192.168.94.151:2181/kafkaCluster --describe --topic topic-demo02
Topic: topic-demo02 PartitionCount: 3 ReplicationFactor: 3 Configs:
Topic: topic-demo02 Partition: 0 Leader: 1 Replicas: 2,3,1 Isr: 1
Topic: topic-demo02 Partition: 1 Leader: 1 Replicas: 3,1,2 Isr: 1
Topic: topic-demo02 Partition: 2 Leader: 1 Replicas: 1,2,3 Isr: 1
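At this point the Leader and the ISR are not empty; both are 1. The state stored in ZooKeeper is maintained by the Kafka brokers, so once the last broker (broker 1 here) went down, no broker was left to update it, and it stays as it was.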
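The way the ISR shrinks to a single broker can be replayed with a small model. This is only a sketch under stated assumptions, not Kafka's controller code: `stop_brokers` is a hypothetical helper, the stop order 2, 3, 1 is assumed (the output above only implies that broker 1 went down last), and a new leader is taken as the first replica in assignment order that is still in the ISR.

```python
def stop_brokers(assignment, stop_order):
    """Replay a shutdown order; return {partition: (leader, isr)} as last written."""
    alive = {b for replicas in assignment.values() for b in replicas}
    # Initially the first replica leads and every replica is in the ISR.
    state = {p: (replicas[0], list(replicas)) for p, replicas in assignment.items()}
    for broker in stop_order:
        alive.discard(broker)
        if not alive:
            break  # the last broker is gone: nobody is left to update the metadata
        for p, replicas in assignment.items():
            leader, isr = state[p]
            if broker not in isr:
                continue
            if isr == [broker]:
                # Last ISR member stopped while other brokers run:
                # the leader is cleared, the ISR is kept.
                state[p] = (None, isr)
                continue
            isr = [b for b in isr if b != broker]
            if leader == broker:
                leader = next(b for b in replicas if b in isr)
            state[p] = (leader, isr)
    return state


# Replica assignment of topic-demo02, taken from the describe output.
topic_demo02 = {0: [2, 3, 1], 1: [3, 1, 2], 2: [1, 2, 3]}
for partition, (leader, isr) in stop_brokers(topic_demo02, [2, 3, 1]).items():
    print(partition, leader, isr)
# 0 1 [1]
# 1 1 [1]
# 2 1 [1]
```

The final stop changes nothing in the model, which is why the stored Leader and ISR still name broker 1.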
3. Start the nodes with broker.id 2 and 3
[xuhaixing@xhx151 cluster]$ kafka-topics.sh --zookeeper 192.168.94.151:2181/kafkaCluster --describe --topic topic-demo02
Topic: topic-demo02 PartitionCount: 3 ReplicationFactor: 3 Configs: unclean.leader.election.enable=false
Topic: topic-demo02 Partition: 0 Leader: none Replicas: 2,3,1 Isr: 1
Topic: topic-demo02 Partition: 1 Leader: none Replicas: 3,1,2 Isr: 1
Topic: topic-demo02 Partition: 2 Leader: none Replicas: 1,2,3 Isr: 1
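The Leader is now none, and the ISR still contains only the broker that has not been started. Two of the three nodes have been recovered, yet the topic is still unavailable.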
Reproducing partial partition unavailability
1. Create the following topic
[xuhaixing@xhx151 cluster]$ kafka-topics.sh --zookeeper 192.168.94.151:2181/kafkaCluster --describe --topic topic-demo03
Topic: topic-demo03 PartitionCount: 3 ReplicationFactor: 2 Configs:
Topic: topic-demo03 Partition: 0 Leader: 3 Replicas: 3,1 Isr: 3,1
Topic: topic-demo03 Partition: 1 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: topic-demo03 Partition: 2 Leader: 2 Replicas: 2,3 Isr: 2,3
2. Stop all 3 broker nodes
3. Start the node with broker.id=2
[xuhaixing@xhx151 bin]$ kafka-topics.sh --zookeeper 192.168.94.151:2181/kafkaCluster --describe --topic topic-demo03
Topic: topic-demo03 PartitionCount: 3 ReplicationFactor: 2 Configs:
Topic: topic-demo03 Partition: 0 Leader: none Replicas: 3,1 Isr: 3
Topic: topic-demo03 Partition: 1 Leader: 2 Replicas: 1,2 Isr: 2
Topic: topic-demo03 Partition: 2 Leader: 2 Replicas: 2,3 Isr: 2
For partition 0, the Leader has become none and the ISR has become 3.
4. Start the node with broker.id=1
Check the status again
[xuhaixing@xhx151 bin]$ kafka-topics.sh --zookeeper 192.168.94.151:2181/kafkaCluster --describe --topic topic-demo03
Topic: topic-demo03 PartitionCount: 3 ReplicationFactor: 2 Configs:
Topic: topic-demo03 Partition: 0 Leader: none Replicas: 3,1 Isr: 3
Topic: topic-demo03 Partition: 1 Leader: 1 Replicas: 1,2 Isr: 2,1
Topic: topic-demo03 Partition: 2 Leader: 2 Replicas: 2,3 Isr: 2
Partition 0 is still unavailable
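Which partitions stay offline for a given set of running brokers can be checked with a few lines of code. The sketch below assumes the default (clean) election, where a partition needs at least one live ISR member; `offline_partitions` is a hypothetical helper, and the ISR values are the ones stored for topic-demo03 after the full shutdown, as shown in the describe output.

```python
def offline_partitions(isr_by_partition, live_brokers):
    """Return the partitions that have no live ISR member."""
    live = set(live_brokers)
    return [p for p, isr in sorted(isr_by_partition.items()) if not live & set(isr)]


# ISR per partition of topic-demo03 as left behind by the full shutdown.
stored_isr = {0: [3], 1: [2], 2: [2]}

print(offline_partitions(stored_isr, {2}))        # [0]
print(offline_partitions(stored_isr, {1, 2}))     # [0]
print(offline_partitions(stored_isr, {1, 2, 3}))  # []
```

Starting broker 1 does not help partition 0, because broker 1 is one of its replicas but is not in its stored ISR.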
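Root cause
This comes down to a broker-side configuration option, unclean.leader.election.enable, which determines whether a leader may be elected from replicas outside the ISR. Replicas outside the ISR may lag far behind, and electing one of them can lose data, so the option defaults to false.
Therefore, if all Kafka brokers are stopped and the broker that was the last remaining ISR member of a partition (its last leader) does not come back, that partition stays unavailable no matter how many of its other replicas are started.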
From the official documentation:
unclean.leader.election.enable
Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss.
Type: boolean
Default: false
Valid Values:
Server Default Property: unclean.leader.election.enable
Importance: medium
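The effect of the flag on leader election can be sketched as follows. This is a simplified model, not Kafka's actual implementation: `elect_leader` is a hypothetical function, and it assumes the leader is the first replica in assignment order that qualifies. The values are those of partition 0 of topic-demo03 (replicas 3,1 and ISR 3) with brokers 1 and 2 running.

```python
def elect_leader(replicas, isr, live_brokers, unclean=False):
    """Pick a leader for an offline partition, or None if no replica is eligible."""
    # Clean election: only a live replica that is still in the ISR may lead.
    for broker in replicas:
        if broker in isr and broker in live_brokers:
            return broker
    # Unclean election (last resort): any live replica may lead, possibly losing data.
    if unclean:
        for broker in replicas:
            if broker in live_brokers:
                return broker
    return None


print(elect_leader([3, 1], [3], {1, 2}))                # None
print(elect_leader([3, 1], [3], {1, 2}, unclean=True))  # 1
print(elect_leader([3, 1], [3], {1, 2, 3}))             # 3
```

With the default setting the partition waits for broker 3. With unclean election enabled, broker 1 takes over, and any records that only broker 3 had are lost.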
Solutions
1. Set unclean.leader.election.enable=true in the configuration file of every broker, then restart.
2. Change the broker configuration dynamically; no restart is needed
[xuhaixing@xhx151 cluster]$ kafka-configs.sh --bootstrap-server 192.168.94.151:9094 --entity-type brokers --entity-name 2 --add-config unclean.leader.election.enable=true --alter
Completed updating config for broker 2.
The setting can then be removed again with --delete-config:
[xuhaixing@xhx151 cluster]$ kafka-configs.sh --bootstrap-server 192.168.94.151:9094 --entity-type brokers --entity-name 2 --delete-config unclean.leader.election.enable --alter
Completed updating config for broker 2.
3. Change the topic-level configuration
[xuhaixing@xhx151 cluster]$ kafka-configs.sh --bootstrap-server 192.168.94.151:9093 --entity-type topics --entity-name topic-demo02 --add-config unclean.leader.election.enable=true --alter
Completed updating config for
The election succeeds
[xuhaixing@xhx151 bin]$ kafka-topics.sh --zookeeper 192.168.94.151:2181/kafkaCluster --describe --topic topic-demo02
Topic: topic-demo02 PartitionCount: 3 ReplicationFactor: 3 Configs: unclean.leader.election.enable=true
Topic: topic-demo02 Partition: 0 Leader: 1 Replicas: 2,3,1 Isr: 1
Topic: topic-demo02 Partition: 1 Leader: 1 Replicas: 3,1,2 Isr: 1
Topic: topic-demo02 Partition: 2 Leader: 1 Replicas: 1,2,3 Isr: 1
Then remove the setting again
[xuhaixing@xhx151 cluster]$ kafka-configs.sh --bootstrap-server 192.168.94.151:9093 --entity-type topics --entity-name topic-demo02 --delete-config unclean.leader.election.enable --alter
Completed updating config for
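For scripting the "enable, let the election happen, then revert" sequence, the two kafka-configs.sh invocations can be generated from one place. This is a minimal sketch: `kafka_configs_cmd` is a hypothetical helper that only assembles the argument list, using the same flags, address, and topic name as the commands above. It does not run anything.

```python
import shlex

CONFIG_KEY = "unclean.leader.election.enable"


def kafka_configs_cmd(bootstrap, entity_type, entity_name, enable):
    """Build the kafka-configs.sh arguments that set or remove the override."""
    change = ["--add-config", f"{CONFIG_KEY}=true"] if enable else ["--delete-config", CONFIG_KEY]
    return ["kafka-configs.sh", "--bootstrap-server", bootstrap,
            "--entity-type", entity_type, "--entity-name", entity_name,
            *change, "--alter"]


enable_cmd = kafka_configs_cmd("192.168.94.151:9093", "topics", "topic-demo02", True)
revert_cmd = kafka_configs_cmd("192.168.94.151:9093", "topics", "topic-demo02", False)
print(shlex.join(enable_cmd))
print(shlex.join(revert_cmd))
```

Each list can be passed to subprocess.run. Reverting afterwards restores the default protection against electing an out-of-sync replica.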