Confluent Kafka

Date: 2024-10-23 14:00:42
Tags: producer, consumer, confluent, broker, kafka, data, schema

1. Apache Kafka fundamentals

Producer -> Kafka <- Consumer

data producer:

 

data consumer:

 

Producers and consumers are decoupled: a producer sends its data (logs, files, or other data) to Kafka rather than directly to a consumer, so both the producer side and the consumer side can scale independently.

Kafka architecture

 

ZooKeeper:
cluster management, failure detection and recovery, storage of ACLs and secrets
If the controller broker goes down, a new controller is elected through ZooKeeper, and the controller then elects new leaders for the affected partitions.

 

broker:
the worker node of a Kafka cluster; each broker has its own disk, and topics are distributed across the brokers.
    topic:
        every record a producer sends to Kafka must target a topic.
    partition:
        partitions are distributed and replicated across the broker nodes, so that data is not lost if one broker goes down; each partition can be read by multiple consumers belonging to different applications.
    segment:
        the file on disk in which a partition's data is stored.
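
As a rough illustration of why spreading partition replicas across brokers protects the data, the sketch below places replicas round-robin and then removes one broker. The function names, broker IDs and replication factor are assumptions made up for this example; Kafka's real placement logic is more involved (it also considers racks, for instance).

```python
def assign_replicas(num_partitions, broker_ids, replication_factor):
    """Toy round-robin placement: partition -> list of brokers holding a replica.

    The first broker in each list plays the role of the partition leader.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            broker_ids[(partition + offset) % len(broker_ids)]
            for offset in range(replication_factor)
        ]
    return assignment


def surviving_replicas(assignment, failed_broker):
    """Return the replicas that remain for each partition after one broker fails."""
    return {
        partition: [b for b in replicas if b != failed_broker]
        for partition, replicas in assignment.items()
    }


layout = assign_replicas(num_partitions=3, broker_ids=[1, 2, 3], replication_factor=2)
after_failure = surviving_replicas(layout, failed_broker=1)

for partition, replicas in after_failure.items():
    print(f"partition {partition}: remaining replicas on brokers {replicas}")
```

With a replication factor of 2, every partition still has at least one replica after a single broker failure, which is the point the partition note makes.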

 

 

 

 

 

topic data structure:

 

 

2. Kafka workflow

producer:

 

consumer:

 

 

 

 

leader: producers write data to the partition leader, and consumers read data from the partition leader.
follower: replicates the leader by fetching its records and appending them to its own log.

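three acknowledgment modes:
A (acks=0): the producer does not wait for any ack; lowest latency on the producer side, but data may be lost
B (acks=1): the producer waits for the leader's ack
C (acks=all): the producer waits until both the leader and its in-sync followers have acknowledged the record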

acks: the broker appends the data to its log and then responds to the producer.
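
The three modes correspond to the producer's `acks` setting (`0`, `1` and `all`). A minimal sketch of how that could look as client configuration follows; the helper name `producer_config` and the `localhost:9092` address are assumptions for illustration only.

```python
# Mapping from the mode labels used in these notes to the producer "acks" value.
ACKS_BY_MODE = {
    "A": 0,      # no acknowledgment: lowest latency, data may be lost
    "B": 1,      # wait for the partition leader only
    "C": "all",  # wait for the leader and its in-sync followers
}


def producer_config(mode, bootstrap_servers="localhost:9092"):
    """Build a producer configuration dict for one of the modes A, B or C."""
    if mode not in ACKS_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "bootstrap.servers": bootstrap_servers,  # assumed broker address
        "acks": ACKS_BY_MODE[mode],
    }


config = producer_config("C")
print(config)

# With the confluent-kafka package installed and a broker running,
# the dict could be handed to the client like this:
#   from confluent_kafka import Producer
#   producer = Producer(config)
```

Mode C trades latency for durability, so it is the usual choice when losing a record is not acceptable.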

 

troubleshooting:
1. Confluent Control Center
2. Kafka log files
3. SSL logging
4. authorizer debugging

security:
Traffic between Kafka brokers (broker -> broker) and between brokers and ZooKeeper (broker -> zookeeper) can be encrypted with SSL.

Serializers and Deserializers:
The serializer passes the unique ID of the schema as part of the Kafka message so that consumers can use the correct schema for deserialization. The ID can be in the message payload or in the message headers. The default location is the message payload.

Before a message is sent to the broker, the producer must know how to convert it into a byte array; that is the job of the serializer. Similarly, the consumer uses a deserializer to convert the byte array back into an object.
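
When the schema ID travels in the payload, the Confluent wire format puts a magic byte (0) and a 4-byte big-endian schema ID in front of the serialized data. The sketch below shows only that framing; the function names `frame` and `unframe` are made up for this example, and the payload bytes stand in for whatever the actual serializer (Avro, for instance) produces.

```python
import struct

MAGIC_BYTE = 0   # first byte of the Confluent wire format
HEADER_SIZE = 5  # 1 magic byte + 4-byte schema ID


def frame(schema_id, payload):
    """Prefix already-serialized bytes with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message):
    """Split a framed message back into (schema_id, payload)."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message too short to contain a schema ID header")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER_SIZE:]


framed = frame(42, b"serialized-record")
schema_id, payload = unframe(framed)
print(schema_id, payload)
```

The consumer only needs the first five bytes to know which schema to fetch before it touches the rest of the payload.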

schema:

 

What is a schema?
A schema defines the structure of the data format. The Kafka topic name can be independent of the schema name. Schema Registry defines a scope in which schemas can evolve, and that scope is the subject.

why Schema Registry:

  1. If a producer changes the schema while consumers are still consuming the data, the consumers will run into errors.

  2. To save disk space: if the whole message (schema and data) were persisted on disk, disk usage could roughly double.

schema evolution process:
Adding, changing, or removing fields is called schema evolution.
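
To make the idea concrete, here is a simplified model of reading an old record with a newer schema. It is only a sketch under assumptions: a schema is reduced to a dict of field name to default value, `REQUIRED` marks a field without a default, and the field names are invented. Real Avro resolution and Schema Registry compatibility checks cover many more cases.

```python
REQUIRED = object()  # marker for "this field has no default"


def read_with_schema(record, reader_fields):
    """Project a record onto the reader's schema.

    Fields the reader does not know are dropped; fields missing from the
    record are filled from the reader's defaults, or rejected if required.
    """
    result = {}
    for name, default in reader_fields.items():
        if name in record:
            result[name] = record[name]
        elif default is REQUIRED:
            raise KeyError(f"record has no value for required field {name!r}")
        else:
            result[name] = default
    return result


schema_v1 = {"id": REQUIRED, "name": REQUIRED}
schema_v2 = {"id": REQUIRED, "name": REQUIRED, "email": ""}  # adds a field with a default

old_record = {"id": 1, "name": "alice"}
new_record = {"id": 2, "name": "bob", "email": "bob@example.com"}

upgraded = read_with_schema(old_record, schema_v2)    # new reader, old data
downgraded = read_with_schema(new_record, schema_v1)  # old reader, new data
print(upgraded, downgraded)
```

Adding a field with a default keeps both directions working here, whereas adding a required field would make `read_with_schema(old_record, ...)` fail.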

 

 

 

 

 

 

producer:
sends the data together with the schema ID to the Kafka broker
sends the schema definition to Schema Registry; if the schema is not already present, Schema Registry checks that it is compatible and registers it, and the producer caches the schema ID locally.

consumer:
consumes the data from the Kafka broker and reads the schema ID; if the schema ID is unknown, it asks Schema Registry for the full schema definition.
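
The consumer-side lookup can be sketched as a small cache keyed by schema ID. This is an illustration, not the Confluent client's implementation: the class name `SchemaCache` is invented, and the in-memory dict `FAKE_REGISTRY` stands in for a request to Schema Registry (such as `GET /schemas/ids/{id}`).

```python
class SchemaCache:
    """Remember schema definitions by ID so each one is fetched only once."""

    def __init__(self, fetch_schema):
        self._fetch_schema = fetch_schema  # callable: schema_id -> schema definition
        self._by_id = {}
        self.registry_calls = 0            # counts lookups that missed the cache

    def get(self, schema_id):
        if schema_id not in self._by_id:
            # Unknown schema ID: ask the registry and keep the answer.
            self.registry_calls += 1
            self._by_id[schema_id] = self._fetch_schema(schema_id)
        return self._by_id[schema_id]


# Stand-in for Schema Registry; a real client would issue an HTTP request here.
FAKE_REGISTRY = {7: '{"type": "record", "name": "User", "fields": []}'}

cache = SchemaCache(FAKE_REGISTRY.__getitem__)
first = cache.get(7)   # cache miss, goes to the registry
second = cache.get(7)  # cache hit, no registry call
print(cache.registry_calls)
```

Because schemas registered under an ID do not change, caching by ID is safe and keeps the registry off the hot path of message consumption.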

Confluent Platform

 

From: https://www.cnblogs.com/howtobuildjenkins/p/18496221
