About Pulsar Summit
Pulsar Summit is the annual flagship event of the Apache Pulsar community. It brings together Apache Pulsar project contributors and committers from around the world, along with CTOs/CIOs, developers, architects, and data scientists, as well as leading members of the messaging and stream-computing community. At the summit, attendees share hands-on experience, exchange ideas, learn about the Pulsar project and its community, and connect with one another.
Pulsar Summit Asia 2022 will be held online on November 19-20 and runs for two days. Day 1 is the Chinese-language track and Day 2 the English-language track, gathering top engineers to share Apache Pulsar best practices, use cases, technical deep dives, and operations stories, and to discuss hands-on lessons from putting Pulsar into production.
Day 2 is a nearly full-day English track, with speakers from North America, India, China, and beyond, spanning multiple time zones. In the spirit of openness, we opened the call for proposals to both Asia and the rest of the world from the very start of planning to keep the agenda diverse. In this post, we walk through the talks on the November 20 English track. Follow along so you don't miss anything.
The Day 2 (November 20) session details follow.
Day 2 - English Track
The talks on Day 2 of Pulsar Summit Asia 2022, an all-day English track, focus on technical deep dives into Apache Pulsar and real-world use cases at companies around the globe.
For the latest agenda, follow the Pulsar Summit website and upcoming posts on our WeChat official account.
Speakers and Session Details
Handling 100K Consumers with One Topic: Practices and Technical Details
Hongjie Zhai, Researcher at NTT Software Innovation Center
With the growth of smart factories and connected vehicles, large numbers of devices need to exchange messages to monitor and control systems. Apache Pulsar is one of the best solutions for keeping the messaging pipeline simple, real-time, and secure. However, because today's message brokers are designed primarily for cloud services, users hit performance problems when facing very large numbers of consumers. In this talk, we will share the practices and technical details of handling 100,000 consumers on a single Pulsar topic.
Awesome Pulsar in Yum China
Chunxiang Yan, Backend Engineer on the Architecture Team at Yum China
Yum China Holdings, Inc. is China's largest restaurant company and strives to be the world's most innovative pioneer in the restaurant industry. Yum China began researching next-generation message-queue middleware in 2019 and ultimately chose Pulsar as the standard messaging middleware for its technology middle platform. Pulsar is now in production in scenarios such as the business middle platform and application observability. This talk shares the story of adopting Apache Pulsar at Yum China.
Building Modern Data Streaming Apps
Timothy Spann, Developer Advocate at StreamNative
This talk distills seven years of personal experience in IoT, CDC, logging, and similar scenarios. In modern engineering, we often use Apache NiFi to orchestrate streaming data into Kafka/Pulsar, build streaming ETL with Spark, enrich events for machine learning and data enrichment with Pulsar Functions/Kafka Streams, and run continuous queries with Flink SQL. This is the FLiPN/FLaNK stack: https://www.flipn.app/.
Apache Pulsar Development 101 with Python
Timothy Spann, Developer Advocate at StreamNative
This talk covers real-time, cloud-native stream programming with Python: standing up a standalone Apache Pulsar cluster, then producing and consuming messages with several different Python libraries and clients, WebSockets, MQTT, and more. By the end, you will be able to build your own real-time streaming and messaging applications in Python.
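One of the client-library-free paths the abstract mentions is Pulsar's WebSocket API, where a producer sends JSON frames with a base64-encoded payload. The minimal sketch below only builds and decodes such a frame; the endpoint URL in the comment, the helper names, and the topic are illustrative assumptions, not official client code.

```python
import base64
import json


def encode_ws_frame(payload, properties=None):
    """Build the JSON frame Pulsar's WebSocket producer endpoint expects.

    A producer would open a connection to a URL of the v2 form (host and
    topic here are placeholders):
      ws://localhost:8080/ws/v2/producer/persistent/public/default/my-topic
    and send frames whose `payload` field is base64-encoded message bytes.
    """
    frame = {"payload": base64.b64encode(payload.encode("utf-8")).decode("ascii")}
    if properties:
        frame["properties"] = properties  # optional string key/value metadata
    return json.dumps(frame)


def decode_ws_frame(frame):
    """Invert encode_ws_frame: recover the original text payload."""
    return base64.b64decode(json.loads(frame)["payload"]).decode("utf-8")
```

With a WebSocket library such as `websockets`, sending `encode_ws_frame("hello")` over an open producer connection would publish the message, and the broker replies with an acknowledgment frame containing the message ID.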
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd.
David Kjerrumgaard, Developer Advocate at StreamNative, Apache Pulsar Committer, and author of "Pulsar in Action"
Starting with version 2.10, the Apache ZooKeeper dependency has been eliminated and replaced with a pluggable framework that lets you reduce the infrastructure footprint of Apache Pulsar by leveraging alternative metadata and coordination systems suited to your deployment environment. In this talk, I will walk you through the steps required to use an existing etcd service running inside Kubernetes as Pulsar's metadata store, thereby eliminating the need to run ZooKeeper entirely and leaving you with a ZooKeeper-less Pulsar.
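On the broker side, switching the metadata store comes down to configuration; a hedged sketch, assuming Pulsar 2.10+ and an in-cluster etcd service (the DNS name and port are placeholders for your environment):

```properties
# broker.conf - point the pluggable metadata store at etcd instead of ZooKeeper
metadataStoreUrl=etcd:http://etcd.kube-system.svc.cluster.local:2379
configurationMetadataStoreUrl=etcd:http://etcd.kube-system.svc.cluster.local:2379
```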
Introducing TableView: Pulsar's database table abstraction
David Kjerrumgaard, Developer Advocate at StreamNative, Apache Pulsar Committer, and author of "Pulsar in Action"
In many use cases, applications use Pulsar consumers or readers to fetch all the updates from a topic and construct a map holding the latest value of each key in the messages received.
The new TableView consumer supports this access pattern directly in the Pulsar client API itself and encapsulates the complexity of building such a local cache by hand. In this talk, we will demonstrate the new TableView consumer with a simple application and discuss best practices and patterns for using it.
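The "latest value per key" fold that TableView automates can be sketched in a few lines. This is the hand-rolled pattern the abstract describes, not the TableView API itself; the function names and the null-payload tombstone convention are illustrative.

```python
def apply_update(view, key, value):
    """Fold one message into a local materialized view: latest value per key wins.

    In the hand-rolled version, a Pulsar consumer/reader loop would call this
    for every message on the (ideally compacted) topic; TableView does the
    equivalent bookkeeping inside the client library.
    """
    if value is None:
        view.pop(key, None)  # null payload as tombstone: delete the key
    else:
        view[key] = value
    return view


def materialize(updates):
    """Replay a stream of (key, value) updates into the table's final state."""
    view = {}
    for key, value in updates:
        apply_update(view, key, value)
    return view
```

With the real API, the client's TableView builder replaces this consumer loop and exposes the resulting map through key lookups, while keeping it updated in the background.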
Turn Data in Pulsar to Real-time Charts and Alerts via Streaming SQL
Gang Tao, Timeplus CTO
Zhimin Liang, Senior Software Engineer at Timeplus
Do you know you can turn streaming data in Apache Pulsar or StreamNative Cloud into real-time charts or alerts, without writing a single line of code or setting up Flink or Presto? You can also easily transform data from one format to another, route data to different topics, or even join data across topics and apply tumble/hop/session windows. All of this in one streaming SQL query, with the free Timeplus Cloud service.
Jove is the co-founder of Timeplus and Gimi is the founding engineer. In this presentation, we will show various ways to integrate Pulsar with Timeplus, to make your streaming data immediately accessible, queryable, and actionable. Real-time data in Pulsar can be either pulled or pushed to Timeplus. After applying streaming transformation or analytics, the data or insights can be sent back to Pulsar, or any downstream applications or data stores, like Snowflake.
Streaming wars and How Apache Pulsar is acing the battle
Shivji Kumar Jha, Principal Engineer at Nutanix
Sachidananda Maharana, Engineer IV at Nutanix
This presentation will cover why we prefer Apache Pulsar over other streaming solutions. Given our streaming requirements of near-real-time action, scalability, high availability, disaster recovery, load balancing, low operating cost, multi-tenancy, and the flexibility to fit a variety of use cases, we have run Kafka, Kinesis, and NATS JetStream across different use cases. And we chose Apache Pulsar as our platform of choice for cloud-native messaging.
This talk presents the operational challenges we have faced running Pulsar for over four years and how Pulsar fit different use cases thanks to its multi-tenancy and configurability. We will also talk about how we overcame these challenges to stick with Pulsar and even moved applications from other messaging solutions to it. We will end with the challenges of, and lessons learned from, moving to Pulsar from Kafka and Kinesis. After this session, you will understand common messaging requirements, why you too should choose Apache Pulsar as your platform of choice, and how you can safely transition to Pulsar if you have been running other messaging solutions.
Taking Jakarta JMS to new generation Messaging Systems - Apache Pulsar
Enrico Olivelli, Apache BookKeeper and Apache Pulsar PMC Member
Mary Grygleski, Streaming Developer Advocate at DataStax
In this session, we will briefly describe Apache Pulsar and Jakarta JMS, and see how Apache Pulsar concepts map to the Jakarta Messaging specification. You will also see how to connect a Jakarta EE application to Pulsar simply by dropping a Resource Adapter into your application server, with essentially zero code changes.
Keeping on top of hybrid cloud usage with Pulsar
Shivji Kumar Jha, Principal Engineer at Nutanix
Tarun Annapareddy, Engineer III at Nutanix
This presentation will cover how we enforce controls on an application running on hybrid cloud infrastructure built from a combination of clouds, private and public alike. For instance, you could deploy your microservice in AWS but use BigTable as your data store.
Every cloud or on-premises infrastructure provider offers monitoring, alerting, metering, audit trails, and so on. In a hybrid-cloud setting, the IT team needs a single view of usage across cloud providers. Such a platform needs to source data from these utilities across infrastructure providers, parse it into a common format, and build an integrated data sink. Add the challenge of each data source evolving its data formats, volume, velocity, throughput, and latency, and you have a demanding task: understanding data from varied sources and presenting it in one view.
We will present an architecture that has been battle-tested in production for over five years. The components include Pulsar, Flink, PostgreSQL, Redis, Neo4j, and a rule/ML engine, to name a few.
After this presentation, you will know more about:
1. Combining infrastructure from multiple cloud and on-premises providers to build your application.
2. The need for a Lambda architecture.
3. How to stream ever-evolving, multi-schema data using Pulsar.
4. How to write custom rules on top of a stream-analytics framework to power your application.
Make querying from Pulsar easier: Introduce Flink Pulsar SQL Connector
Yufei Zhang, Software Engineer at StreamNative, Apache Flink Contributor
Building on the new Flink DataStream API Pulsar connector, the StreamNative team has implemented a Flink Table API connector for Pulsar along with PulsarCatalog, letting users interact with a Pulsar cluster easily through Flink SQL. In this talk, we will introduce the basic concepts of the Pulsar SQL connector with examples, describe the two modes in which PulsarCatalog uses Pulsar as a metadata store, and show how it makes querying from Pulsar easier.