Installing SeaTunnel 2.3.6 on Ubuntu
Directories and environment
- SeaTunnel 2.3.6
- Ubuntu 24.04 LTS
- sudo user: seatunnel
- Installation directory: /opt/apache-seatunnel-2.3.6
Environment variable
export SEATUNNEL_HOME=/opt/apache-seatunnel-2.3.6
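An export typed at the prompt only lasts for the current shell session. The following is a minimal sketch for persisting it, assuming the seatunnel user logs in with bash and that ~/.bashrc is the startup file in use (the original does not say where the variable is set). The helper name add_line_once is hypothetical and not part of SeaTunnel.

```shell
# add_line_once FILE LINE: append LINE to FILE unless an identical line already exists.
# (hypothetical helper, not part of SeaTunnel)
add_line_once() {
    file=$1
    line=$2
    touch "$file"
    # -x: match the whole line, -F: treat LINE as a fixed string, -q: no output
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (assumes ~/.bashrc is the startup file in use):
# add_line_once "$HOME/.bashrc" 'export SEATUNNEL_HOME=/opt/apache-seatunnel-2.3.6'
```

Running the helper twice leaves a single export line in the file, so it is safe to re-run.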
Downloading the software
Download the SeaTunnel binary package
Download page: https://seatunnel.apache.org/download/
- apache-seatunnel-2.3.6-bin.tar.gz
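Before extracting, it can be worth checking the archive against its published checksum. Apache projects normally publish a SHA-512 digest alongside each release archive; the sketch below assumes you have copied that hex digest from the download page by hand. The function name verify_sha512 is hypothetical.

```shell
# verify_sha512 FILE EXPECTED_HEX: succeed only if FILE's SHA-512 digest equals EXPECTED_HEX.
# (hypothetical helper; EXPECTED_HEX is assumed to be copied from the download page)
verify_sha512() {
    actual=$(sha512sum "$1" | cut -d ' ' -f 1)
    # compare case-insensitively, since published digests may be upper- or lower-case
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
}

# Example (replace <digest> with the published value):
# verify_sha512 apache-seatunnel-2.3.6-bin.tar.gz <digest> && echo "checksum OK"
```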
Extract the archive:
tar -xvf apache-seatunnel-2.3.6-bin.tar.gz
Result:
seatunnel@ubuntu24:/tmp$ ll
drwxr-xr-x 10 seatunnel seatunnel 4096 Nov 8 2023 apache-seatunnel-2.3.6/
Move the directory:
sudo mv apache-seatunnel-2.3.6 /opt/
Downloading connectors
Connector download configuration
Connector list:
File path: apache-seatunnel-2.3.6/config/plugin_config
Suggested initial connector list:
--connectors-v2--
connector-cdc-mysql
connector-fake
connector-console
--end--
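The suggested list can be applied by hand in an editor. As an alternative, here is a minimal sketch that keeps a backup of the shipped file and writes the three-connector list above. The function name write_minimal_plugin_config and the .bak suffix are assumptions made for illustration.

```shell
# write_minimal_plugin_config SEATUNNEL_DIR
# Backs up config/plugin_config (if present) and replaces it with the
# three-connector list above. (hypothetical helper, not part of SeaTunnel)
write_minimal_plugin_config() {
    cfg="$1/config/plugin_config"
    # keep the shipped full list as plugin_config.bak, but never overwrite an existing backup
    if [ -f "$cfg" ] && [ ! -f "$cfg.bak" ]; then
        cp "$cfg" "$cfg.bak"
    fi
    cat > "$cfg" <<'EOF'
--connectors-v2--
connector-cdc-mysql
connector-fake
connector-console
--end--
EOF
}

# Example:
# write_minimal_plugin_config /opt/apache-seatunnel-2.3.6
```

The JDBC sink used in the MySQL-CDC test later in this walkthrough presumably also requires connector-jdbc in this list; that is an assumption, not something the original states.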
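Default connector configuration file:
The default configuration file lists every supported connector plugin. Unless you need them all, there is no reason to download the full set.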
config/plugin_config
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# This mapping is used to resolve the Jar package name without version (or call artifactId)
#
# corresponding to the module in the user Config, helping SeaTunnel to load the correct Jar package.
# Don't modify the delimiter " -- ", just select the plugin you need
--connectors-v2--
connector-amazondynamodb
connector-assert
connector-cassandra
connector-cdc-mysql
connector-cdc-mongodb
connector-cdc-sqlserver
connector-cdc-postgres
connector-cdc-oracle
connector-clickhouse
connector-datahub
connector-dingtalk
connector-doris
connector-elasticsearch
connector-email
connector-file-ftp
connector-file-hadoop
connector-file-local
connector-file-oss
connector-file-jindo-oss
connector-file-s3
connector-file-sftp
connector-file-obs
connector-google-sheets
connector-google-firestore
connector-hive
connector-http-base
connector-http-feishu
connector-http-gitlab
connector-http-github
connector-http-jira
connector-http-klaviyo
connector-http-lemlist
connector-http-myhours
connector-http-notion
connector-http-onesignal
connector-http-wechat
connector-hudi
connector-iceberg
connector-influxdb
connector-iotdb
connector-jdbc
connector-kafka
connector-kudu
connector-maxcompute
connector-mongodb
connector-neo4j
connector-openmldb
connector-pulsar
connector-rabbitmq
connector-redis
connector-druid
connector-s3-redshift
connector-sentry
connector-slack
connector-socket
connector-starrocks
connector-tablestore
connector-selectdb-cloud
connector-hbase
connector-amazonsqs
connector-easysearch
connector-paimon
connector-rocketmq
connector-tdengine
connector-web3j
connector-milvus
Download the connector plugins
Change to the installation directory:
cd /opt/apache-seatunnel-2.3.6
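Start the download:
# Recommended
bash bin/install-plugin.sh
# Or:
./bin/install-plugin.sh
# Or, only where sh is not dash (on Ubuntu, sh is dash by default):
sh bin/install-plugin.sh
Note: make sure the script runs under bash. If the interpreter is dash, the script may fail.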
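Whether the sh form is safe depends on what /bin/sh resolves to on the machine. A small sketch for checking this, assuming GNU readlink (which supports -f) is available, as it is on Ubuntu. The function name sh_flavor is hypothetical.

```shell
# sh_flavor PATH: print the basename of the binary that PATH resolves to,
# for example "dash" or "bash". (hypothetical helper)
sh_flavor() {
    basename "$(readlink -f "$1")"
}

# Example:
# sh_flavor /bin/sh
```

If the result is dash, prefer the explicit bash invocation.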
Download location:
apache-seatunnel-2.3.6/connectors/
Note: in testing, SeaTunnel 2.3.4 and later download connectors to a different path than SeaTunnel 2.3.3 and earlier:
2.3.3 : apache-seatunnel-2.3.3/connectors/seatunnel
2.3.4 : apache-seatunnel-2.3.4/connectors/
2.3.6 : apache-seatunnel-2.3.6/connectors/
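A script that has to work across versions could pick the directory from the version number. The sketch below encodes only the observation above; applying it to versions other than 2.3.3, 2.3.4, and 2.3.6 is an extrapolation. It assumes a plain MAJOR.MINOR.PATCH version string, and the function name connector_dir is hypothetical.

```shell
# connector_dir VERSION: print the connector download directory, relative to the
# SeaTunnel home, following the observation above (2.3.3 and earlier vs 2.3.4 and later).
# Assumes VERSION has the form MAJOR.MINOR.PATCH with plain numeric parts.
connector_dir() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    # encode as a single integer so versions compare numerically
    n=$((major * 10000 + minor * 100 + patch))
    if [ "$n" -ge 20304 ]; then
        echo "connectors"
    else
        echo "connectors/seatunnel"
    fi
}

# Example:
# ls "/opt/apache-seatunnel-2.3.6/$(connector_dir 2.3.6)"
```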
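Speeding up connector downloads
With the default settings, the connector plugins are downloaded from the default Apache Maven repository, as the script output shows: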
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/seatunnel/connector-cdc-mysql/2.3.6/connector....
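This is slow.
After running install-plugin.sh for the first time, you can stop it with Ctrl+C. By then it has generated the default Maven wrapper configuration and the .m2 directory.
Configure the Maven mirror in: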
~/.m2/settings.xml
Create the file if it does not exist.
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
<pluginGroups></pluginGroups>
<proxies></proxies>
<servers>
</servers>
<mirrors>
<!-- Aliyun repository -->
<mirror>
<id>alimaven</id>
<mirrorOf>*</mirrorOf>
<name>aliyun maven</name>
<url>https://maven.aliyun.com/repository/central</url>
</mirror>
</mirrors>
<profiles>
</profiles>
</settings>
Then run the script again:
bash bin/install-plugin.sh
The output now shows the plugins being downloaded from the Aliyun repository.
Testing the SeaTunnel sample batch job
Run the sample job:
./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local
Log output of a successful run:
2024-08-12 08:54:05,670 INFO [o.a.s.e.c.j.ClientJobProxy ] [main] - Job (875301094702448641) end with state FINISHED
2024-08-12 08:54:05,707 INFO [s.c.s.s.c.ClientExecuteCommand] [main] -
***********************************************
Job Statistic Information
***********************************************
Start Time : 2024-08-12 08:54:03
End Time : 2024-08-12 08:54:05
Total Time(s) : 2
Total Read Count : 32
Total Write Count : 32
Total Failed Count : 0
***********************************************
2024-08-12 08:54:05,707 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel-664865] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTTING_DOWN
2024-08-12 08:54:05,713 INFO [c.h.i.s.t.TcpServerConnection ] [hz.main.IO.thread-in-1] - [localhost]:5801 [seatunnel-664865] [5.1] Connection[id=1, /127.0.0.1:5801->/127.0.0.1:50189, qualifier=null, endpoint=[127.0.0.1]:50189, remoteUuid=4584e8d2-6b2f-4a10-af64-892d2fa897cb, alive=false, connectionType=JVM, planeIndex=-1] closed. Reason: Connection closed by the other side
2024-08-12 08:54:05,714 INFO [.c.i.c.ClientConnectionManager] [main] - hz.client_1 [seatunnel-664865] [5.1] Removed connection to endpoint: [localhost]:5801:89ddf390-cb35-4347-ab51-c794b2c6a868, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/127.0.0.1:50189->localhost/127.0.0.1:5801}, remoteAddress=[localhost]:5801, lastReadTime=2024-08-12 08:54:05.701, lastWriteTime=2024-08-12 08:54:05.670, closedTime=2024-08-12 08:54:05.710, connected server version=5.1}
2024-08-12 08:54:05,714 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel-664865] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_DISCONNECTED
2024-08-12 08:54:05,718 INFO [c.h.c.i.ClientEndpointManager ] [hz.main.event-5] - [localhost]:5801 [seatunnel-664865] [5.1] Destroying ClientEndpoint{connection=Connection[id=1, /127.0.0.1:5801->/127.0.0.1:50189, qualifier=null, endpoint=[127.0.0.1]:50189, remoteUuid=4584e8d2-6b2f-4a10-af64-892d2fa897cb, alive=false, connectionType=JVM, planeIndex=-1], clientUuid=4584e8d2-6b2f-4a10-af64-892d2fa897cb, clientName=hz.client_1, authenticated=true, clientVersion=5.1, creationTime=1723452843171, latest clientAttributes=lastStatisticsCollectionTime=1723452843212,enterprise=false,clientType=JVM,clientVersion=5.1,clusterConnectionTimestamp=1723452843154,clientAddress=127.0.0.1,clientName=hz.client_1,credentials.principal=null,os.committedVirtualMemorySize=3176402944,os.freePhysicalMemorySize=3446554624,os.freeSwapSpaceSize=2147479552,os.maxFileDescriptorCount=1048576,os.openFileDescriptorCount=51,os.processCpuTime=4630000000,os.systemLoadAverage=0.240234375,os.totalPhysicalMemorySize=8317079552,os.totalSwapSpaceSize=2147479552,runtime.availableProcessors=2,runtime.freeMemory=277072344,runtime.maxMemory=477626368,runtime.totalMemory=330301440,runtime.uptime=3282,runtime.usedMemory=53229096, labels=[]}
2024-08-12 08:54:05,719 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel-664865] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTDOWN
2024-08-12 08:54:05,720 INFO [s.c.s.s.c.ClientExecuteCommand] [main] - Closed SeaTunnel client......
2024-08-12 08:54:05,720 INFO [c.h.c.LifecycleService ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] [localhost]:5801 is SHUTTING_DOWN
2024-08-12 08:54:05,724 INFO [c.h.i.p.i.MigrationManager ] [hz.main.cached.thread-11] - [localhost]:5801 [seatunnel-664865] [5.1] Shutdown request of Member [localhost]:5801 - 89ddf390-cb35-4347-ab51-c794b2c6a868 this master is handled
2024-08-12 08:54:05,729 INFO [c.h.i.i.Node ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] Shutting down connection manager...
2024-08-12 08:54:05,732 INFO [c.h.i.i.Node ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] Shutting down node engine...
2024-08-12 08:54:05,747 INFO [.c.c.DefaultClassLoaderService] [main] - close classloader service
2024-08-12 08:54:05,747 INFO [o.a.s.e.s.TaskExecutionService] [event-forwarder-0] - [localhost]:5801 [seatunnel-664865] [5.1] Event forward thread interrupted
2024-08-12 08:54:08,759 INFO [c.h.i.i.NodeExtension ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] Destroying node NodeExtension.
2024-08-12 08:54:08,760 INFO [c.h.i.i.Node ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] Hazelcast Shutdown is completed in 3037 ms.
2024-08-12 08:54:08,760 INFO [c.h.c.LifecycleService ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] [localhost]:5801 is SHUTDOWN
2024-08-12 08:54:08,760 INFO [s.c.s.s.c.ClientExecuteCommand] [main] - Closed HazelcastInstance ......
2024-08-12 08:54:08,761 INFO [s.c.s.s.c.ClientExecuteCommand] [main] - Closed metrics executor service ......
2024-08-12 08:54:09,726 INFO [s.c.s.s.c.ClientExecuteCommand] [Thread-26] - run shutdown hook because get close signal
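In a script, the same success check can be done by searching the captured output for the FINISHED line and reading the counters from the statistics block. This is a minimal sketch that assumes the job output was saved to a file (for example with tee) and that the log format matches the sample above. The function names job_finished and stat_value are hypothetical.

```shell
# job_finished LOGFILE: succeed if the log contains an "end with state FINISHED" line.
job_finished() {
    grep -q 'end with state FINISHED' "$1"
}

# stat_value LOGFILE LABEL: print the number after "LABEL :" in the statistics block.
stat_value() {
    grep "$2" "$1" | head -n 1 | sed 's/.*:[[:space:]]*//'
}

# Example (assumes the job output was captured to /tmp/batch.log):
# ./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local 2>&1 | tee /tmp/batch.log
# job_finished /tmp/batch.log && echo "read: $(stat_value /tmp/batch.log 'Total Read Count')"
```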
Testing MySQL-CDC to PostgreSQL
Create the test table
Connect to the MySQL database and create the table.
create table test.test_001(id int ,name varchar(100));
Edit the job configuration file
config/stream_mysql_postgresql.config
env {
job.mode = "STREAMING"
job.name = "streaming-mysql-pg"
}
source {
MySQL-CDC {
base-url = "jdbc:mysql://192.168.8.101:3306/test"
username = "root"
password = "123456"
table-names = ["test.test_001"]
}
}
sink {
jdbc {
url = "jdbc:postgresql://192.168.8.101:5432/postgres"
driver = "org.postgresql.Driver"
database = "postgres"
user = "postgres"
password = "postgres"
table = "test.test_001"
generate_sink_sql = true
}
}
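Note: PostgreSQL does not support referencing tables across databases. For example, when connected to the postgres database, you cannot insert directly into the table test.test.test_001.
The database named in the sink's JDBC connection string must therefore match the database option in the sink configuration.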
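This consistency rule is easy to check mechanically. The sketch below extracts the database name from a JDBC URL of the form used in this job so it can be compared with the database option. It assumes the simple jdbc:postgresql://host:port/database[?params] layout, and the function name jdbc_database is hypothetical.

```shell
# jdbc_database URL: print the database name from a URL of the form
# jdbc:postgresql://host:port/database[?params]. (hypothetical helper)
jdbc_database() {
    db=${1%%\?*}     # drop any "?key=value" parameters first
    db=${db##*/}     # then drop everything up to the last "/"
    printf '%s\n' "$db"
}

# Example, using the values from the sink block above:
# [ "$(jdbc_database 'jdbc:postgresql://192.168.8.101:5432/postgres')" = "postgres" ] \
#     && echo "url and database option agree"
```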
Download the database drivers
Download the MySQL and PostgreSQL drivers and add them to the lib directory.
For example:
mkdir -p ${SEATUNNEL_HOME}/plugins/jdbc/lib/
cp mysql-connector-j-8.2.0.jar ${SEATUNNEL_HOME}/plugins/jdbc/lib/
cp postgresql-42.7.2.jar ${SEATUNNEL_HOME}/plugins/jdbc/lib/
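Notes:
- According to plugins/README.md, if you use the Zeta Engine, put the JDBC drivers under $SEATUNNEL_HOME/lib/.
- In testing, drivers placed under $SEATUNNEL_HOME/lib/ are only picked up after the cluster is restarted, whereas plugins/jdbc/lib is loaded dynamically.
Start cluster mode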
./bin/seatunnel-cluster.sh -d
Start the job
bash bin/seatunnel.sh --config config/stream_mysql_postgresql.config
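Once the streaming job is running, rows written to the MySQL table should show up in PostgreSQL. A minimal sketch for producing a few test rows follows. It only prints SQL, and piping it into the mysql client assumes that client is installed and can reach the server. The function name gen_inserts and the name_N values are made up for illustration.

```shell
# gen_inserts N: print N INSERT statements for test.test_001 (the table created earlier).
# (hypothetical helper; the generated values are illustrative only)
gen_inserts() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf "insert into test.test_001 (id, name) values (%d, 'name_%d');\n" "$i" "$i"
        i=$((i + 1))
    done
}

# Example (assumes the mysql command-line client is available):
# gen_inserts 3 | mysql -h 192.168.8.101 -u root -p
```

Afterwards, querying the target table in PostgreSQL shows whether the rows arrived.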
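TODO:
- Found a bug. PostgreSQL uses a three-level hierarchy (database --> schema --> table), whereas MySQL uses a two-level hierarchy (database --> table).
Synchronizing test.test_table in MySQL to postgres.test.test_table in PostgreSQL causes the automatic table creation statement to fail.
Preconditions:
The schema postgres.test does not exist.
The table postgres.test.test_table does not exist.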
ERROR: database "postgres" already exists
From: https://www.cnblogs.com/nookvoice/p/18355020