Kafka事务未关闭导致消费者无法消费消息。
背景
最近遇到一个问题:有一个公用topic,很多应用都读写这个topic。从某个时间点开始,所有消费该topic的消费者(read_committed级别)都拉不到消息了。随机看了一些应用的日志,未发现生产者报错,仍然能正常发消息并提交事务,消费者也未报错。打开运维工具,用read_uncommitted
级别的消费者查看该topic里面的消息,发现LSO(last stable offset)一直卡在某个offset,LEO(log end offset)仍在持续增加。
原因
-
生产者开启了事务,由于程序bug,在异常退出的情况下,producer的commit、abort都没执行到,后果就是事务未关闭。
-
事务除了手动commit、abort两种方式进行主动关闭外,还可以通过超时被动关闭。
broker 端事务超时时间配置:transaction.max.timeout.ms,默认15分钟
producer 端事务超时时间配置:transaction.timeout.ms,默认1分钟
-
但是我们将事务超时时间设置得非常大,5小时。未关闭的事务将会等待5小时才能被动关闭。这5小时内,后续的消息都无法被消费。注意,虽然无法消费,但是不同事务Id的其他生产者却能继续往该topic提交事务。但是这些已提交的消息是不能被消费的,这是通过LastStableOffset(LSO)进行控制的。LSO 的定义是,
smallest offset of any open transaction
,已开启的事务里面offset最小的那一个值,可消费的offset是小于LSO的。 -
为什么后续已提交事务的消息也无法被消费呢?因为Kafka要保证partition内的消息是有序的。当前序事务未结束,如果先消费了后续消息,此时前序事务即使提交也无意义了,因为消费者不能将offset倒回去消费它。如果倒回去消费,顺序就乱了。所以,前序事务必须结束,不管是主动commit/abort,还是超时被动abort,总之必须结束,后续消息才能被消费。
复现方案及延申测试
创建2个事务生产者tx-producer-1、tx-producer-2,创建1个非事务的生产者nontx-producer,创建1个隔离级别(isolation.level
)为read_committed的rc-consumer,1个隔离级别为read_uncommitted
的runc-consumer。
- nontx-producer每秒发一个消息
- tx-producer-1每秒提交一个事务,一个事务里面发送2条消息
- tx-producer-2每秒提交一个事务,一个事务里面发送2条消息;但是,在做第三个事务时,发了2条消息后,不提交,长时间sleep
- rc-comsumer和runc-consumer持续消费,打印消费的消息
每个生产者、消费者在自己的线程运行,互不影响。使用Kafka_2.13-3.3.1版本。
kafka使用默认配置。
producer.close会立即关闭事务吗-会
结论:会
代码:
for (int i = 1000; i < 2000; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-2]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-2]sent tx msg-" + (i + 1));
// 挂起事务
if (i == 1004) {
log.warn("tx-producer-2 hanging tx");
TimeUnit.SECONDS.sleep(5);
log.warn("tx-producer-2 close");
producer.close();
return;
}
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
log.warn("tx-producer-2 abort tx");
}
}
操作:在IDEA运行应用即可
应用日志:
2022-11-17 09:56:39.757 WARN 2288 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : tx-producer-2 close
2022-11-17 09:56:39.757 INFO 2288 --- [pool-5-thread-1] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
2022-11-17 09:56:39.757 DEBUG 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Beginning shutdown of Kafka producer I/O thread, sending remaining records.
2022-11-17 09:56:39.757 INFO 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Aborting incomplete transaction due to shutdown
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Transition from state IN_TRANSACTION to ABORTING_TRANSACTION
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Enqueuing transactional request EndTxnRequestData(transactionalId='tx-producer-2', producerId=2, producerEpoch=0, committed=false)
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Sending transactional request EndTxnRequestData(transactionalId='tx-producer-2', producerId=2, producerEpoch=0, committed=false) to node ship:9092 (id: 0 rack: null) with correlation ID 40
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] org.apache.kafka.clients.NetworkClient : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Sending END_TXN request with header RequestHeader(apiKey=END_TXN, apiVersion=3, clientId=producer-tx-producer-2, correlationId=40) and timeout 30000 to node 0: EndTxnRequestData(transactionalId='tx-producer-2', producerId=2, producerEpoch=0, committed=false)
2022-11-17 09:56:39.761 DEBUG 2288 --- [r-tx-producer-2] org.apache.kafka.clients.NetworkClient : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Received END_TXN response from node 0 for request with header RequestHeader(apiKey=END_TXN, apiVersion=3, clientId=producer-tx-producer-2, correlationId=40): EndTxnResponseData(throttleTimeMs=0, errorCode=0)
2022-11-17 09:56:39.761 DEBUG 2288 --- [r-tx-producer-2] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Transition from state ABORTING_TRANSACTION to READY
2022-11-17 09:56:39.762 DEBUG 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Shutdown of Kafka producer I/O thread has completed.
2022-11-17 09:56:39.762 INFO 2288 --- [pool-5-thread-1] org.apache.kafka.common.metrics.Metrics : Metrics scheduler closed
2022-11-17 09:56:39.762 INFO 2288 --- [pool-5-thread-1] org.apache.kafka.common.metrics.Metrics : Closing reporter org.apache.kafka.common.metrics.JmxReporter
2022-11-17 09:56:39.762 INFO 2288 --- [pool-5-thread-1] org.apache.kafka.common.metrics.Metrics : Metrics reporters closed
2022-11-17 09:56:39.763 INFO 2288 --- [pool-5-thread-1] o.a.kafka.common.utils.AppInfoParser : App info kafka.producer for producer-tx-producer-2 unregistered
“Aborting incomplete transaction due to shutdown”,可见在producer.close()会关闭未完成的事务。
kafka server.log:
只有初始化事务的日志,没有结束事务的日志
[2022-11-17 09:15:03,988] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 09:15:03,995] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-1 with producerId 1 and producer epoch 0 on partition __transaction_state-1 (kafka.coordinator.transaction.TransactionCoordinator)
应用进程退出会立即关闭事务吗-不会
结论:不会
代码:
for (int i = 1000; i < 2000; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-2]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-2]sent tx msg-" + (i + 1));
// 挂起事务
if (i == 1004) {
log.warn("tx-producer-2 hanging tx");
TimeUnit.MINUTES.sleep(60); // 长时间sleep
}
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
log.warn("tx-producer-2 abort tx");
}
}
操作:在IDEA里面运行应用,在sleep过程中点击停止
应用日志:
Started MateApp in 1.702 seconds (JVM running for 2.123)
2022-11-16 21:05:58.582 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : runc pool...
2022-11-16 21:05:58.582 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : rc pool...
2022-11-16 21:05:58.609 INFO 14572 --- [pool-3-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : [nontx-producer]sent msg-0
2022-11-16 21:05:59.104 INFO 14572 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : [tx-producer-2]sent tx msg-1000
2022-11-16 21:06:00.192 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[rc] received: msg-1000
2022-11-16 21:06:00.192 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[rc] received: msg-1001
...
...
2022-11-16 21:06:02.160 WARN 14572 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : tx-producer-2 hanging tx
2022-11-16 21:06:02.254 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : runc pool...
2022-11-16 21:06:02.256 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[rc] received: msg-2004
2022-11-16 21:06:02.256 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[rc] received: msg-2005
2022-11-16 21:06:02.256 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : rc pool...
2022-11-16 21:06:02.290 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[read_uncommitted] received: msg-2006
2022-11-16 21:06:02.290 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[read_uncommitted] received: msg-2007
...
...
2022-11-16 21:06:08.265 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[read_uncommitted] received: msg-2018
2022-11-16 21:06:08.265 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[read_uncommitted] received: msg-2019
2022-11-16 21:06:08.265 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : runc pool...
2022-11-16 21:06:08.292 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : rc pool...
Process finished with exit code 130
在事务挂起后,consumer[rc]就再也消费不到消息了,只有consumer[read_uncommitted]还能消费到。
kafka server.log:
[2022-11-16 21:06:00,359] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
...
[2022-11-16 21:07:03,861] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
可见,tx-producer-2的事务在init1分钟后自动rollback。应用在21:06:13停止,事务在21:07:03回滚。应用停止时事务并未立即回滚。
System.exit会立即关闭事务吗-不会
结论:不会
代码:
for (int i = 1000; i < 2000; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-2]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-2]sent tx msg-" + (i + 1));
// 挂起事务
if (i == 1004) {
log.warn("tx-producer-2 system exit");
System.exit(-1);
}
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
log.warn("tx-producer-2 abort tx");
}
}
应用日志:
...
2022-11-17 10:18:49.195 WARN 10736 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : tx-producer-2 system exit
...
Process finished with exit code -1
kafka server.log
[2022-11-17 10:18:46,106] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-1 with producerId 1 and producer epoch 0 on partition __transaction_state-1 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:18:46,176] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:19:52,492] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-1 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:19:52,496] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
看起来应用System.exit(-1) 3秒后事务就回滚了。再做一次
# 应用日志
2022-11-17 10:27:00.366 WARN 20524 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : tx-producer-2 system exit
# kafka server 日志
[2022-11-17 10:26:53,406] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-1 with producerId 2 and producer epoch 0 on partition __transaction_state-1 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:26:53,406] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 1 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:28:02,428] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-1 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:28:02,432] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
这次相差一分钟。
Thread.stop会立即关闭事务吗-不会
结论:不会
代码:
for (int i = 1000; i < 2000; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-2]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-2]sent tx msg-" + (i + 1));
// 挂起事务
if (i == 1004) {
log.info("tx-producer-1 stop");
Thread.currentThread().stop();
}
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
log.warn("tx-producer-2 abort tx");
}
}
应用日志:
2022-11-17 10:45:22.239 INFO 7504 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : tx-producer-1 stop
kafka server.log:
[2022-11-17 10:45:15,340] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:46:22,035] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
生产者线程stop后一分钟,broker回滚事务。
总结
transaction.timeout.ms
(producer)、transaction.max.timeout.ms
(broker)、request.timeout.ms
(broker)三个参数尤其重要,不能配得太大。- 异常处理要完备,确保commit、abort、close方法能被执行到
如何检测挂起的事务
这个不消费的问题我排查了很久,知道原因后自然而然想到,有没有手段可以检测到未关闭的事务呢?KIP-664 正在讨论这个问题。
在这个特性发布之前,我们可以用一些其他方式勉强实现这个功能。创建一个read_uncommitted
的消费者,消费同样的topic partition,然后比较end offset。如果read_uncommitted
的end offset 逐渐增加,read_committed
的end offset 长时间未增加,则说明位于end offset + 1
的事务未提交。read_uncommitted
的end offset+1
就是 LSO。
如何关闭挂起的事务
一般来讲,如果应用异常退出,未提交的事务就只有一个结局了:就是回滚,要么由producer发起,要么超时时间到了由broker自动执行。
有时应用崩了,一时半会儿也无法运行起来,它开启的事务无法主动关闭,只能等超时。如果超时时间很久,对其他应用影响很大。此时,可以使用上一节描述的方法,将挂起的事务检测出来。LSO这个offset上的消息,就是第一个未关闭事务的第一条消息。查看该消息的内容可以知道是哪个应用发送了这条消息,进而得出发送该消息的应用的事务Id。然后使用该Id创建一个producer,再执行producer.initTransactions()
,之前的“旧”事务就会被关闭。
Reference
- KIP-664: Provide tooling to detect and abort hanging transactions - Apache Kafka - Apache Software Foundation
- [KAFKA-12671] Out of order processing with a transactional producer can lead to a stuck LastStableOffset - ASF JIRA (apache.org)
- [KAFKA-5880] Transactional producer and read committed consumer causes consumer to stuck - ASF JIRA (apache.org)
- Is it possible to force abort a Kafka transaction? - Stack Overflow
附:测试代码
package cn.whu.wy.kafkamate.service;
import cn.whu.wy.kafkamate.bean.TopicInfo;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
/**
* kafka事务测试。测试未关闭的事务,对消费者的影响
*
* @author WangYong
* Date 2022/11/15
* Time 22:05
*/
@Service
@Slf4j
public class KafkaTxTest implements InitializingBean {
private final String bootstrapServers = "192.168.136.128:9092";
@Autowired
private TopicManager topicManager;
@Override
public void afterPropertiesSet() throws Exception {
final String testTopic = "test-tx";
TopicInfo topicInfo = new TopicInfo(testTopic, 1, (short) 1);
topicManager.createTopics(Set.of(topicInfo));
Executors.newSingleThreadExecutor().execute(() -> {
try (Consumer<String, String> consumer = genRcConsumer("rc-consumer")) {
consumer.subscribe(Collections.singleton(testTopic));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
records.forEach(r -> log.info("consumer[rc] received: {}", r.value()));
log.info("rc pool...");
}
}
});
Executors.newSingleThreadExecutor().execute(() -> {
try (Consumer<String, String> consumer = genRuncConsumer("runc-consumer")) {
consumer.subscribe(Collections.singleton("test-tx"));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
records.forEach(r -> log.info("consumer[read_uncommitted] received: {}", r.value()));
log.info("runc pool...");
}
}
});
// 每秒发1个消息,10条消息
Executors.newSingleThreadExecutor().execute(() -> {
try (Producer<String, String> producer = genProducer()) {
for (int i = 0; i < 10; i++) {
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
TimeUnit.SECONDS.sleep(1);
log.info("[nontx-producer]sent msg-" + i);
}
} catch (Exception e) {
throw new RuntimeException(e);
}
});
// 每秒一个事务,每个事务发2条消息,10个事务,20条消息
Executors.newSingleThreadExecutor().execute(() -> {
Producer<String, String> producer = genTxProducer("tx-producer-1");
producer.initTransactions();
for (int i = 2000; i < 2020; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-1]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-1]sent tx msg-" + (i + 1));
TimeUnit.SECONDS.sleep(1);
producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
// We can't recover from these exceptions, so our only option is to close the producer and exit.
producer.close();
log.error("tx-producer-1 closed", e);
} catch (KafkaException e) {
// For all other exceptions, just abort the transaction and try again.
producer.abortTransaction();
log.warn("tx-producer-1 abort tx");
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
});
// 每秒1个事务,每个事务发2条消息
// 在2个事务后sleep。即,commit了4条消息,未commit的有2条
Executors.newSingleThreadExecutor().execute(() -> {
Producer<String, String> producer = genTxProducer("tx-producer-2");
producer.initTransactions();
for (int i = 1000; i < 2000; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-2]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-2]sent tx msg-" + (i + 1));
TimeUnit.SECONDS.sleep(1);
// 挂起事务
if (i == 1004) {
// log.warn("tx-producer-2 hanging tx");
TimeUnit.SECONDS.sleep(5);
// log.warn("tx-producer-2 system exit");
// System.exit(-1);
// TimeUnit.MINUTES.sleep(60);
log.info("tx-producer-1 stop");
Thread.currentThread().stop();
}
producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
// We can't recover from these exceptions, so our only option is to close the producer and exit.
producer.close();
log.error("tx-producer-2 closed", e);
} catch (KafkaException e) {
// For all other exceptions, just abort the transaction and try again.
producer.abortTransaction();
log.warn("tx-producer-2 abort tx");
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
log.info("thread tx-producer-2 exit");
});
// 10 秒后,使用挂起线程的事务id,创建一个producer,强制结束之前挂起的事务
// Executors.newSingleThreadExecutor().execute(() -> {
// try {
// TimeUnit.SECONDS.sleep(10);
// } catch (InterruptedException e) {
// throw new RuntimeException(e);
// }
// Producer<String, String> producer = genTxProducer("tx-producer-1");
// log.info("tx-producer-1 init again");
// producer.initTransactions();
// });
}
Producer<String, String> genTxProducer(String txId) {
Properties props = new Properties();
props.put("bootstrap.servers", bootstrapServers);
props.put("linger.ms", 1);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("transactional.id", txId);
// props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 10000); // 10 min
return new KafkaProducer<>(props);
}
Producer<String, String> genProducer() {
Properties props = new Properties();
props.put("bootstrap.servers", bootstrapServers);
props.put("linger.ms", 1);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
return new KafkaProducer<>(props);
}
Consumer<String, String> genRuncConsumer(String gid) {
Properties props = new Properties();
props.setProperty("bootstrap.servers", bootstrapServers);
props.setProperty("group.id", gid);
props.setProperty("enable.auto.commit", "true");
props.setProperty("auto.commit.interval.ms", "1000");
props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
return new KafkaConsumer<>(props);
}
Consumer<String, String> genRcConsumer(String gid) {
Properties props = new Properties();
props.setProperty("bootstrap.servers", bootstrapServers);
props.setProperty("group.id", gid);
props.setProperty("enable.auto.commit", "true");
props.setProperty("auto.commit.interval.ms", "1000");
props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.setProperty(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
return new KafkaConsumer<>(props);
}
}
标签:11,transaction,producer,tx,Kafka,stuck,2022,kafka
From: https://www.cnblogs.com/duanguyuan/p/16911205.html