JDBC Vertica Source Connector User Guide

Published: 2023-08-17 19:35:46

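Supported engines

  • Spark
  • Flink
  • SeaTunnel Zeta

Key features

  • Batch processing
  • Exactly-once processing
  • Column projection
  • Parallel processing
  • Support for user-defined splits
  • Support for a query SQL statement, which can also be used to achieve projection

Description

Reads data from an external data source through JDBC.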

Supported data source information

| Datasource | Supported versions | Driver | Url | Maven |
|---|---|---|---|---|
| Vertica | Different dependency versions have different driver classes. | com.vertica.jdbc.Driver | jdbc:vertica://localhost:5433/vertica | Download |

Database dependency

Download the driver jar listed under 'Maven' above and copy it into the '$SEATUNNEL_HOME/plugins/jdbc/lib/' working directory.
For example, for the Vertica data source: cp vertica-jdbc-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/

Data type mapping

| Vertica Data type | SeaTunnel Data type |
|---|---|
| BIT | BOOLEAN |
| TINYINT, TINYINT UNSIGNED, SMALLINT, SMALLINT UNSIGNED, MEDIUMINT, MEDIUMINT UNSIGNED, INT, INTEGER, YEAR | INT |
| INT UNSIGNED, INTEGER UNSIGNED, BIGINT | LONG |
| BIGINT UNSIGNED | DECIMAL(20,0) |
| DECIMAL(x,y) (column size < 38) | DECIMAL(x,y) |
| DECIMAL(x,y) (column size > 38) | DECIMAL(38,18) |
| DECIMAL UNSIGNED | DECIMAL(column size + 1, number of digits to the right of the decimal point) |
| FLOAT, FLOAT UNSIGNED | FLOAT |
| DOUBLE, DOUBLE UNSIGNED | DOUBLE |
| CHAR, VARCHAR, TINYTEXT, MEDIUMTEXT, TEXT, LONGTEXT, JSON | STRING |
| DATE | DATE |
| TIME | TIME |
| DATETIME, TIMESTAMP | TIMESTAMP |
| TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, BINARY, VARBINARY, BIT(n) | BYTES |
| GEOMETRY, UNKNOWN | Not supported yet |
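The three DECIMAL rows in the mapping above are easier to follow as a small function. The following is only a sketch of the rule as the table states it, not the connector's actual code. The function name `map_decimal` is hypothetical, and because the table does not say what happens when the column size is exactly 38, the sketch assumes such a column keeps its declared precision and scale.

```python
def map_decimal(precision: int, scale: int, unsigned: bool = False) -> str:
    """Return the SeaTunnel type name for a DECIMAL column, following the table.

    precision: the column size reported for the column
    scale: the number of digits to the right of the decimal point
    """
    if unsigned:
        # DECIMAL UNSIGNED: one extra digit of precision, same scale.
        return f"DECIMAL({precision + 1},{scale})"
    if precision > 38:
        # Columns wider than 38 digits fall back to a fixed type.
        return "DECIMAL(38,18)"
    # Assumption: a column size of exactly 38 is kept unchanged,
    # since the table only covers "< 38" and "> 38".
    return f"DECIMAL({precision},{scale})"


print(map_decimal(10, 2))
print(map_decimal(40, 5))
print(map_decimal(10, 2, unsigned=True))
```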

Source options

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| url | String | Yes | - | The URL of the JDBC connection, for example: jdbc:vertica://localhost:5433/vertica |
| driver | String | Yes | - | The JDBC class name used to connect to the remote data source. For Vertica the value is com.vertica.jdbc.Driver. |
| user | String | No | - | User name of the connection instance |
| password | String | No | - | Password of the connection instance |
| query | String | Yes | - | Query statement |
| connection_check_timeout_sec | Int | No | 30 | The time in seconds to wait for the database operation used to validate the connection to complete |
| partition_column | String | No | - | The column used to partition the read for parallelism. Only a numeric primary key column is supported, and only one column can be configured. |
| partition_lower_bound | Long | No | - | The minimum value of partition_column for the scan. If not set, SeaTunnel queries the database to get the minimum value. |
| partition_upper_bound | Long | No | - | The maximum value of partition_column for the scan. If not set, SeaTunnel queries the database to get the maximum value. |
| partition_num | Int | No | job parallelism | The number of partitions. Only positive integers are supported. The default value is the job parallelism. |
| fetch_size | Int | No | 0 | For queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number of database hits required to satisfy the selection criteria. Zero means use the JDBC default value. |
| common-options | | No | - | Source plugin common parameters. Refer to Source Common Options for details. |
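Tips

If partition_column is not set, the source runs with a single concurrency. If partition_column is set, the read is executed in parallel according to the parallelism of the job.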

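Task examples

Simple example:

This example reads 16 rows from the type_bin table in your test database with a single parallelism and queries all of its fields. You can also specify which fields to query. The final output is printed to the console.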

env {
  # You can set Flink configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    query = "select * from type_bin limit 16"
  }
}

transform {
  # For more information about how to configure SeaTunnel and for the full list of transform plugins,
  # see https://seatunnel.apache.org/docs/transform-v2/sql
}

sink {
  Console {}
}

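Parallel example:

Reads the queried table in parallel, using the shard field and shard count you configure. Use this form if you want to read the entire table.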

source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Define the query logic as required
    query = "select * from type_bin"
    # Field used to shard the parallel read
    partition_column = "id"
    # Number of shards
    partition_num = 10
  }
}
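When partition_lower_bound and partition_upper_bound are omitted, as in the configuration above, the options table says SeaTunnel queries the database for the minimum and maximum of partition_column. The Python sketch below illustrates that idea only. It is not the connector's implementation: it uses an in-memory SQLite database as a stand-in for Vertica, the helper name `discover_bounds` is hypothetical, and the exact SQL that SeaTunnel issues is an assumption.

```python
import sqlite3


def discover_bounds(conn, query, column):
    """Return (min, max) of `column` over the rows produced by `query`.

    The user query is wrapped as a subquery so that any filter it
    contains is respected when the bounds are computed.
    """
    sql = f"SELECT MIN({column}), MAX({column}) FROM ({query}) AS t"
    return conn.execute(sql).fetchone()


# Stand-in data: a type_bin table with ids 1..500 (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE type_bin (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO type_bin (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 501)],
)

lower, upper = discover_bounds(conn, "select * from type_bin", "id")
print(lower, upper)
```

Discovering the bounds costs an extra query against the source table, which is one reason to set them explicitly when they are already known.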

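Parallel boundary example:

Specifying upper and lower bounds for the query is more efficient: the data source is read only within the bounds you configure.
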
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Define the query logic as required
    query = "select * from type_bin"
    partition_column = "id"
    # Lower bound of the read
    partition_lower_bound = 1
    # Upper bound of the read
    partition_upper_bound = 500
    partition_num = 10
  }
}
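To make the effect of partition_lower_bound, partition_upper_bound and partition_num concrete, here is a minimal Python sketch of one way a numeric range can be divided into contiguous shards, each read by its own query. It is an illustration under stated assumptions rather than SeaTunnel's actual splitting algorithm: the function names `split_range` and `partition_queries` are hypothetical, and the even-split strategy and the generated SQL shape are assumptions.

```python
from typing import List, Tuple


def split_range(lower: int, upper: int, num: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lower, upper] into at most `num`
    contiguous, non-overlapping inclusive sub-ranges of near-equal size."""
    if num <= 0:
        raise ValueError("num must be a positive integer")
    if upper < lower:
        raise ValueError("upper must be >= lower")
    total = upper - lower + 1
    num = min(num, total)  # never create empty shards
    base, extra = divmod(total, num)
    ranges = []
    start = lower
    for i in range(num):
        size = base + (1 if i < extra else 0)  # spread the remainder
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def partition_queries(query: str, column: str,
                      ranges: List[Tuple[int, int]]) -> List[str]:
    """Wrap the user query once per shard with a bounding predicate."""
    return [
        f"SELECT * FROM ({query}) AS t WHERE {column} BETWEEN {lo} AND {hi}"
        for lo, hi in ranges
    ]


# Values taken from the boundary example: bounds 1..500, 10 shards.
ranges = split_range(1, 500, 10)
queries = partition_queries("select * from type_bin", "id", ranges)
for q in queries:
    print(q)
```

With these values each of the 10 shards covers 50 consecutive ids, so parallel readers never fetch the same row twice.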

This article is published with support from 白鲸开源.

From: https://blog.51cto.com/u_15459354/7126736
