arroyo 对于kafka 有着很不错的集成支持(目前版本可以说是优先支持的),使用原生kafka 是一个选择,但是部署以及管理感觉比较费事
以前简单介绍过redpanda,所以尝试下集成
环境准备
- docker-compose
包含了redpandadata console,connect,以及一个测试,对于arroyo 使用了与redpandadata 同一个网络
version: '3'
networks:
redpanda_network:
driver: bridge
volumes:
redpanda: null
services:
redpanda:
image: docker.redpanda.com/redpandadata/redpanda:v23.1.4
command:
- redpanda start
- --smp 1
- --overprovisioned
- --kafka-addr PLAINTEXT://0.0.0.0:29092,OUTSIDE://0.0.0.0:9092
- --advertise-kafka-addr PLAINTEXT://redpanda:29092,OUTSIDE://localhost:9092
- --pandaproxy-addr 0.0.0.0:8082
- --advertise-pandaproxy-addr localhost:8082
ports:
- 8081:8081
- 8082:8082
- 9092:9092
- 9644:9644
- 29092:29092
volumes:
- redpanda:/var/lib/redpanda/data
networks:
- redpanda_network
console:
image: docker.redpanda.com/redpandadata/console:v2.2.3
entrypoint: /bin/sh
command: -c "echo \"$$CONSOLE_CONFIG_FILE\" > /tmp/config.yml; /app/console"
environment:
CONFIG_FILEPATH: /tmp/config.yml
CONSOLE_CONFIG_FILE: |
kafka:
brokers: ["redpanda:29092"]
schemaRegistry:
enabled: true
urls: ["http://redpanda:8081"]
redpanda:
adminApi:
enabled: true
urls: ["http://redpanda:9644"]
connect:
enabled: true
clusters:
- name: local-connect-cluster
url: http://connect:8083
ports:
- 8080:8080
networks:
- redpanda_network
depends_on:
- redpanda
owl-shop:
image: quay.io/cloudhut/owl-shop:latest
networks:
- redpanda_network
#platform: 'linux/amd64'
environment:
- SHOP_KAFKA_BROKERS=redpanda:29092
- SHOP_KAFKA_TOPICREPLICATIONFACTOR=1
- SHOP_TRAFFIC_INTERVAL_RATE=1
- SHOP_TRAFFIC_INTERVAL_DURATION=0.1s
depends_on:
- redpanda
connect:
image: docker.redpanda.com/redpandadata/connectors:latest
hostname: connect
container_name: connect
networks:
- redpanda_network
#platform: 'linux/amd64'
depends_on:
- redpanda
ports:
- "8083:8083"
environment:
CONNECT_CONFIGURATION: |
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
group.id=connectors-cluster
offset.storage.topic=_internal_connectors_offsets
config.storage.topic=_internal_connectors_configs
status.storage.topic=_internal_connectors_status
config.storage.replication.factor=-1
offset.storage.replication.factor=-1
status.storage.replication.factor=-1
offset.flush.interval.ms=1000
producer.linger.ms=50
producer.batch.size=131072
CONNECT_BOOTSTRAP_SERVERS: redpanda:29092
CONNECT_GC_LOG_ENABLED: "false"
CONNECT_HEAP_OPTS: -Xms512M -Xmx512M
CONNECT_LOG_LEVEL: info
app:
image: ghcr.io/arroyosystems/arroyo-single:multi-arch
networks:
- redpanda_network
ports:
- "8000:8000"
- "8001:8001"
启动&测试
- 启动
docker-compose up -d
- 创建kafka topic
- 配置arroyo connections、source 以及sink
我们测试的时候因为部署了一个owl-shop 服务,可以直接将owl-shop 的topic 作为source,然后写入到demo topic中(sink)
connections 配置
kafka source 配置
json schema 定义
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Product",
"type": "object",
"properties": {
"version": { "type": "number" },
"id": { "type": "string" },
"customer": {
"type": "object",
"properties": { "id": { "type": "string" }, "type": { "type": "string" } }
},
"type": { "type": "string" },
"firstName": { "type": "string" },
"lastName": { "type": "string" },
"state": { "type": "string" },
"street": { "type": "string" },
"houseNumber": { "type": "string" },
"city": { "type": "string" },
"zip": { "type": "string" },
"latitude": { "type": "number" },
"longitude": { "type": "number" },
"phone": { "type": "string" },
"additionalAddressInfo": { "type": "string" },
"createdAt": { "type": "string" },
"revision": { "type": "number" }
}
}
kafka sink 创建
- 创建job (pipeline)
sql 查询
job 执行
redpanda console 效果
说明
arroyo+redpanda 的pipeline 还是一种不错的选择,至少redpanda 部署以及使用还是比较简单的,而且支持不少kafaka 周边的兼容,是一个不错的选择,
arroyo 测试目前来说支持的source 以及sink 不是很多(核心还是kafaka)但是基于sql 的处理模式真的很方便,相比kafaka 的stream 简单不少,值得尝试下
参考资料
https://github.com/rongfengliang/arroyo-redpanda-learning
https://github.com/ArroyoSystems/arroyo
https://github.com/redpanda-data/redpanda
https://redpanda.com/