https://hudi.apache.org/docs/configurations
Hudi configuration categories
- Spark Datasource Configs
Configs for the Spark Datasource.
- Flink Sql Configs
Configs for the Flink SQL source/sink connectors, e.g. index.type, write.tasks, write.operation, clean.policy, clean.retain_commits, clean.retain_hours, compaction.max_memory, hive_sync.db, hive_sync.table, hive_sync.metastore.uris, write.retry.times, write.task.max.size, etc.
- Write Client Configs
Configs that control the RDD-based HoodieWriteClient API used by Hudi.
- Metastore and Catalog Sync Configs
Configs for syncing metadata to external metastores and catalogs.
- Metrics Configs
Metrics reporting configs.
- Record Payload Config
Low-level customization configs, e.g. payload settings such as hoodie.compaction.payload.class.
- Kafka Connect Configs
Configs for writing Hudi tables with Kafka Connect as a sink connector.
- Amazon Web Services Configs
Amazon Web Services configs.
Spark Datasource Configs
- Read Options
Config | Required | Default | Description |
---|---|---|---|
as.of.instant | Y | N/A | Added in 0.9.0. The instant for time-travel queries; two value formats are supported: yyyyMMddHHmmss and yyyy-MM-dd HH:mm:ss. If unspecified, the latest snapshot is queried |
hoodie.file.index.enable | N | true | |
hoodie.schema.on.read.enable | N | false | |
hoodie.datasource.streaming.startOffset | N | earliest | |
hoodie.datasource.write.precombine.field | N | ts | |
hoodie.datasource.read.begin.instanttime | Y | N/A | |
hoodie.datasource.read.end.instanttime | Y | N/A | |
hoodie.datasource.read.paths | Y | N/A | |
hoodie.datasource.merge.type | N | payload_combine | |
hoodie.datasource.query.incremental.format | N | latest_state | |
hoodie.datasource.query.type | N | snapshot | |
hoodie.datasource.read.extract.partition.values.from.path | N | false | |
hoodie.datasource.read.file.index.listing.mode | N | lazy | |
hoodie.datasource.read.file.index.listing.partition-path-prefix.analysis.enabled | N | true | |
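As a usage sketch, the read options above can be assembled into option maps and handed to `spark.read`. The base path and instant values below are hypothetical placeholders, and the commented Spark calls assume a live SparkSession:

```python
# Sketch only: option maps built from the read-options table above.
# Paths and instant values are hypothetical placeholders.

# Time-travel read: query the table as of a specific instant.
time_travel_opts = {
    "as.of.instant": "20231001120000",  # yyyyMMddHHmmss; omit to read the latest snapshot
}

# Incremental read: pull only records committed inside an instant range.
incremental_opts = {
    "hoodie.datasource.query.type": "incremental",                 # default is "snapshot"
    "hoodie.datasource.read.begin.instanttime": "20231001000000",  # required for incremental
    "hoodie.datasource.read.end.instanttime": "20231002000000",
}

# With a SparkSession these would be used as:
#   spark.read.format("hudi").options(**time_travel_opts).load("/data/hudi/my_table")
#   spark.read.format("hudi").options(**incremental_opts).load("/data/hudi/my_table")
```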
- Write Options
Config | Required | Default | Description |
---|---|---|---|
hoodie.datasource.hive_sync.mode | Y | N/A | |
hoodie.datasource.write.partitionpath.field | Y | N/A | |
hoodie.datasource.write.precombine.field | N | ts | |
hoodie.datasource.write.recordkey.field | Y | N/A | |
hoodie.datasource.write.table.type | N | COPY_ON_WRITE | |
hoodie.sql.insert.mode | N | upsert | |
hoodie.sql.bulk.insert.enable | N | false | |
hoodie.datasource.write.table.name | Y | N/A | |
hoodie.datasource.write.operation | N | upsert | |
hoodie.datasource.write.payload.class | N | org.apache.hudi.common.model.OverwriteWithLatestAvroPayload | |
hoodie.datasource.write.partitionpath.urlencode | N | false | |
hoodie.datasource.hive_sync.partition_fields | N | N/A | |
hoodie.datasource.hive_sync.auto_create_database | N | true | Automatically create the database if it does not exist |
hoodie.datasource.hive_sync.database | N | default | |
hoodie.datasource.hive_sync.table | N | unknown | |
hoodie.datasource.hive_sync.username | N | hive | |
hoodie.datasource.hive_sync.password | N | hive | |
hoodie.datasource.hive_sync.enable | N | false | |
hoodie.datasource.hive_sync.ignore_exceptions | N | false | |
hoodie.datasource.hive_sync.use_jdbc | N | true | |
hoodie.datasource.hive_sync.jdbcurl | N | jdbc:hive2://localhost:10000 | Hive JDBC connection URL |
hoodie.datasource.hive_sync.metastore.uris | N | thrift://localhost:9083 | Hive metastore URIs |
hoodie.datasource.hive_sync.base_file_format | N | PARQUET | |
hoodie.datasource.hive_sync.support_timestamp | N | false | |
hoodie.datasource.meta.sync.enable | N | false | |
hoodie.clustering.inline | N | false | |
hoodie.datasource.write.partitions.to.delete | Y | N/A | Comma-separated list of partitions to delete; supports the * wildcard |
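Putting the write options together, here is a minimal sketch of a Spark DataSource upsert with Hive sync enabled. The table, record key, and partition field names are hypothetical, and the commented write call assumes a DataFrame `df` and a SparkSession:

```python
# Sketch only: write-side option map built from the write-options table above.
# Table/field names are hypothetical placeholders.
hudi_write_opts = {
    "hoodie.datasource.write.table.name": "orders",
    "hoodie.datasource.write.operation": "upsert",            # default operation
    "hoodie.datasource.write.recordkey.field": "order_id",    # required
    "hoodie.datasource.write.partitionpath.field": "dt",      # required
    "hoodie.datasource.write.precombine.field": "ts",         # larger ts wins on key collision
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.mode": "hms",
    "hoodie.datasource.hive_sync.metastore.uris": "thrift://localhost:9083",
    "hoodie.datasource.hive_sync.database": "default",
    "hoodie.datasource.hive_sync.table": "orders",
}

# With a DataFrame `df` this would be written as:
#   df.write.format("hudi").options(**hudi_write_opts).mode("append").save(base_path)
```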
- PreCommit Validator Options
Config | Required | Default | Description |
---|---|---|---|
hoodie.precommit.validators | N | | |
hoodie.precommit.validators.equality.sql.queries | N | | |
hoodie.precommit.validators.inequality.sql.queries | N | | |
hoodie.precommit.validators.single.value.sql.queries | N | | |
Flink Sql Configs
Config | Required | Default | Description |
---|---|---|---|
path | Y | N/A | Base path of the Hudi table; created if it does not exist, otherwise it must be a successfully initialized Hudi table |
read.end-commit | Y | N/A | |
read.start-commit | Y | N/A | |
read.tasks | Y | N/A | |
write.tasks | Y | N/A | |
write.partition.format | Y | N/A | Partition path format, only effective when write.datetime.partitioning is true. Two defaults: 1) yyyyMMddHH when the partition field type is TIMESTAMP(3) WITHOUT TIME ZONE, LONG, FLOAT, DOUBLE, or DECIMAL; 2) yyyyMMdd when the partition field type is DATE or INT |
write.bucket_assign.tasks | Y | N/A | |
archive.max_commits | N | 50 | |
archive.min_commits | N | 40 | |
cdc.enabled | N | false | |
changelog.enabled | N | false | |
clean.async.enabled | N | true | |
clean.policy | N | KEEP_LATEST_COMMITS | Cleaning policy; valid values: KEEP_LATEST_COMMITS, KEEP_LATEST_FILE_VERSIONS, KEEP_LATEST_BY_HOURS. Default is KEEP_LATEST_COMMITS |
clean.retain_commits | N | 30 | |
clean.retain_file_versions | N | 5 | |
clean.retain_hours | N | 24 | |
clustering.async.enabled | N | false | |
clustering.delta_commits | N | 4 | |
clustering.plan.partition.filter.mode | N | NONE | Valid values: NONE, RECENT_DAYS, SELECTED_PARTITIONS, DAY_ROLLING |
clustering.plan.strategy.class | N | org.apache.hudi.client.clustering.plan.strategy.FlinkSizeBasedClusteringPlanStrategy | |
clustering.tasks | Y | N/A | |
clustering.schedule.enabled | N | false | |
compaction.async.enabled | N | true | |
compaction.delta_commits | N | 5 | |
compaction.delta_seconds | N | 3600 | |
compaction.max_memory | N | 100 | |
compaction.schedule.enabled | N | true | |
compaction.target_io | N | 512000 | |
compaction.timeout.seconds | N | 1200 | |
compaction.trigger.strategy | N | num_commits | Valid values: num_commits, time_elapsed, num_or_time |
hive_sync.conf.dir | Y | N/A | |
hive_sync.table_properties | Y | N/A | |
hive_sync.assume_date_partitioning | N | false | Assume partition paths are in yyyy/mm/dd format |
hive_sync.auto_create_db | N | true | Automatically create the database if it does not exist |
hive_sync.db | N | default | |
hive_sync.table | N | unknown | |
hive_sync.table.strategy | N | ALL | |
hive_sync.enabled | N | false | |
hive_sync.file_format | N | PARQUET | |
hive_sync.jdbc_url | N | jdbc:hive2://localhost:10000 | |
hive_sync.metastore.uris | N | '' | Hive Metastore uris |
hive_sync.mode | N | HMS | |
hive_sync.partition_fields | N | '' | |
hive_sync.password | N | hive | |
hive_sync.support_timestamp | N | true | |
hive_sync.use_jdbc | N | true | |
hive_sync.username | N | hive | |
hoodie.bucket.index.hash.field | N | ||
hoodie.bucket.index.num.buckets | N | 4 | |
hoodie.datasource.merge.type | N | payload_combine | |
hoodie.datasource.query.type | N | snapshot | |
hoodie.datasource.write.hive_style_partitioning | N | false | |
hoodie.datasource.write.keygenerator.type | N | SIMPLE | |
hoodie.datasource.write.partitionpath.field | N | '' | |
hoodie.datasource.write.recordkey.field | N | uuid | |
hoodie.datasource.write.partitionpath.urlencode | N | false | |
hoodie.database.name | Y | N/A | |
hoodie.table.name | Y | N/A | |
hoodie.datasource.write.keygenerator.class | Y | N/A | |
index.bootstrap.enabled | N | false | |
index.global.enabled | N | true | |
index.partition.regex | N | * | |
index.state.ttl | N | 0.0 | |
index.type | N | FLINK_STATE | |
metadata.enabled | N | false | |
metadata.compaction.delta_commits | N | 10 | |
partition.default_name | N | HIVE_DEFAULT_PARTITION | |
payload.class | N | org.apache.hudi.common.model.EventTimeAvroPayload | |
precombine.field | N | ts | |
read.streaming.enabled | N | false | |
read.streaming.skip_compaction | N | false | |
read.streaming.skip_clustering | N | false | |
read.utc-timezone | N | true | |
record.merger.impls | N | org.apache.hudi.common.model.HoodieAvroRecordMerger | |
record.merger.strategy | N | eeb8d96f-b1e4-49fd-bbf8-28ac514178e5 | |
table.type | N | COPY_ON_WRITE | Table type; valid values: COPY_ON_WRITE or MERGE_ON_READ |
write.batch.size | N | 256.0 | |
write.commit.ack.timeout | N | -1 | |
write.ignore.failed | N | false | |
write.insert.cluster | N | false | |
write.log.max.size | N | 1024 | |
write.log_block.size | N | 128 | |
write.merge.max_memory | N | 100 | Unit: MB |
write.operation | N | upsert | |
write.precombine | N | false | |
write.parquet.block.size | N | 120 | |
write.rate.limit | N | 0 | |
write.retry.interval.ms | N | 2000 | |
write.retry.times | N | 3 | |
write.sort.memory | N | 128 | Unit: MB |
write.task.max.size | N | 1024.0 | Unit: MB |
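To show how these options are wired together, here is a sketch of a Flink SQL DDL for a streaming MERGE_ON_READ table, held in a Python string for illustration. The table name, columns, and path are hypothetical placeholders:

```python
# Sketch only: Flink SQL DDL combining options from the table above.
# Table name, schema, and path are hypothetical placeholders.
flink_ddl = """
CREATE TABLE hudi_orders (
  uuid   STRING PRIMARY KEY NOT ENFORCED,  -- record key (default key field is uuid)
  amount DOUBLE,
  ts     TIMESTAMP(3),                     -- precombine field (default is ts)
  dt     STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_orders',   -- required: table base path
  'table.type' = 'MERGE_ON_READ',       -- COPY_ON_WRITE (default) or MERGE_ON_READ
  'write.operation' = 'upsert',         -- default
  'write.tasks' = '4',
  'compaction.delta_commits' = '5',     -- default compaction trigger
  'read.streaming.enabled' = 'true'     -- enable streaming read
)
"""
```

The DDL would be submitted via `table_env.execute_sql(flink_ddl)` in a PyFlink session with the Hudi Flink bundle on the classpath.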
Write Client Configs
- Layout Configs
- Clean Configs
- Memory Configurations
- Archival Configs
- Metadata Configs
- Consistency Guard Configurations
- FileSystem Guard Configurations
- Write Configurations
- Metastore Configs
- Key Generator Options
- Storage Configs
- Compaction Configs
- File System View Storage Configurations
- Clustering Configs
- Common Configurations
- Bootstrap Configs
- Commit Callback Configs
- Lock Configs
- Index Configs
Metastore and Catalog Sync Configs
- Common Metadata Sync Configs
- Global Hive Sync Configs
- DataHub Sync Configs
- BigQuery Sync Configs
- Hive Sync Configs
Metrics Configs
- Metrics Configurations for Datadog reporter
- Metrics Configurations for Amazon CloudWatch
- Metrics Configurations
- Metrics Configurations for Jmx
- Metrics Configurations for Prometheus
- Metrics Configurations for Graphite
Record Payload Config
- Payload Configurations
Config | Required | Default | Description |
---|---|---|---|
hoodie.compaction.payload.class | N | org.apache.hudi.common.model.OverwriteWithLatestAvroPayload | |
hoodie.payload.event.time.field | N | ts | |
hoodie.payload.ordering.field | N | ts |
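A short sketch of how these payload options fit together: with the default payload class, the record carrying the larger ordering-field value wins when records share a key. The field name `ts` below is the default from the table above:

```python
# Sketch only: payload options from the table above.
# With OverwriteWithLatestAvroPayload, on a key collision the record with
# the larger ordering field (here `ts`) overwrites the older one.
payload_opts = {
    "hoodie.compaction.payload.class":
        "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload",
    "hoodie.payload.ordering.field": "ts",
    "hoodie.payload.event.time.field": "ts",
}
```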
Kafka Connect Configs
- Kafka Sink Connect Configurations
Config | Required | Default | Description |
---|---|---|---|
hadoop.conf.dir | Y | N/A | |
hadoop.home | Y | N/A | |
bootstrap.servers | N | bootstrap.servers | bootstrap.servers of the Kafka cluster |
hoodie.kafka.control.topic | N | hudi-control-topic | |
hoodie.meta.sync.classes | N | org.apache.hudi.hive.HiveSyncTool | |
hoodie.meta.sync.enable | N | false | |
hoodie.schemaprovider.class | N | org.apache.hudi.schema.FilebasedSchemaProvider | |
hoodie.kafka.coordinator.write.timeout.secs | N | 300 | |
hoodie.kafka.compaction.async.enable | N | true |
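As a usage sketch, the sink-connector options above would land in the connector's properties map alongside standard Kafka Connect fields. Paths and addresses below are hypothetical placeholders:

```python
# Sketch only: property map for the Hudi Kafka sink connector,
# built from the table above. Paths/addresses are hypothetical.
sink_props = {
    "hadoop.conf.dir": "/etc/hadoop/conf",                  # required
    "bootstrap.servers": "localhost:9092",                  # Kafka cluster to consume from
    "hoodie.kafka.control.topic": "hudi-control-topic",     # default control topic
    "hoodie.kafka.coordinator.write.timeout.secs": "300",   # default
    "hoodie.kafka.compaction.async.enable": "true",         # default
    "hoodie.meta.sync.enable": "false",
    "hoodie.meta.sync.classes": "org.apache.hudi.hive.HiveSyncTool",
}
```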
Amazon Web Services Configs
Config | Required | Default | Description |
---|---|---|---|
hoodie.aws.access.key | Y | N/A | AWS access key id |
hoodie.aws.secret.key | Y | N/A | AWS secret key |
hoodie.aws.session.token | N | N/A | AWS session token |