Hudi Study Notes 5 - Hudi Configuration Analysis (1)

Date: 2023-05-08 16:01:59
Tags: hudi, field, configuration, notes, class, public, hoodie, Hudi, final

Official Hudi configuration reference: https://hudi.apache.org/docs/configurations. Reading the source code shows that the option hoodie.payload.ordering.field is deprecated and has been replaced by hoodie.datasource.write.precombine.field.

ConfigProperty

ConfigProperty references HoodieConfig: its inferFunction takes a HoodieConfig as input.

// https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigProperty.java
public class ConfigProperty<T> implements Serializable {
  private final String key; // name of the config option
  private final T defaultValue; // default value of the config option
  private final String docOnDefaultValue;
  private final String doc;
  private final Option<String> sinceVersion;
  private final Option<String> deprecatedVersion;
  private final Set<String> validValues;
  private final boolean advanced;
  private final String[] alternatives;

  // provide the ability to infer config value based on other configs
  private final Option<Function<HoodieConfig, Option<T>>> inferFunction;
}
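
The inferFunction field lets a property derive its value from other settings when it has not been set explicitly. The following is a minimal, self-contained sketch of that idea, not Hudi's implementation: InferSketch, Prop, resolve and the sketch.* keys are hypothetical names, a plain Map stands in for HoodieConfig, and java.util.Optional stands in for Hudi's Option. The resolution order (explicit value, then inferred value, then default) is an assumption made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class InferSketch {
  // Hypothetical stand-in for ConfigProperty: key, default value, optional infer function.
  static final class Prop {
    final String key;
    final String defaultValue;
    final Function<Map<String, String>, Optional<String>> inferFunction; // may be null

    Prop(String key, String defaultValue,
         Function<Map<String, String>, Optional<String>> inferFunction) {
      this.key = key;
      this.defaultValue = defaultValue;
      this.inferFunction = inferFunction;
    }
  }

  // Assumed order for this sketch: explicit value, then inferred value, then default.
  static String resolve(Map<String, String> config, Prop prop) {
    String explicit = config.get(prop.key);
    if (explicit != null) {
      return explicit;
    }
    if (prop.inferFunction != null) {
      Optional<String> inferred = prop.inferFunction.apply(config);
      if (inferred.isPresent()) {
        return inferred.get();
      }
    }
    return prop.defaultValue;
  }

  // Hypothetical property whose value is inferred from another (hypothetical) key.
  static final Prop ORDERING = new Prop(
      "sketch.ordering.field", "ts",
      cfg -> Optional.ofNullable(cfg.get("sketch.precombine.field")));

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    System.out.println(resolve(config, ORDERING)); // nothing set: default is used
    config.put("sketch.precombine.field", "update_time");
    System.out.println(resolve(config, ORDERING)); // value inferred from the other key
  }
}
```

Keeping the inference logic on the property itself means callers only ever ask for one key and do not need to know which other settings can supply its value.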

HoodieConfig

HoodieConfig is the base class of all configuration classes and provides the common get/set interface.

public class HoodieConfig implements Serializable {
}
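
To illustrate what a shared get/set interface on a base class buys, here is a small sketch under stated assumptions: ConfigBaseSketch, BaseConfig and WriteConfig are hypothetical classes backed by java.util.Properties, and their method names are chosen for illustration rather than copied from Hudi. Only the key hoodie.datasource.write.precombine.field and its default "ts" come from the source discussed in these notes.

```java
import java.util.Properties;

class ConfigBaseSketch {
  // Hypothetical miniature of a config base class: one Properties bag, shared accessors.
  static class BaseConfig {
    protected final Properties props = new Properties();

    void setValue(String key, String value) {
      props.setProperty(key, value);
    }

    String getString(String key) {
      return props.getProperty(key); // null when the key is absent
    }

    String getStringOrDefault(String key, String defaultValue) {
      return props.getProperty(key, defaultValue);
    }
  }

  // A subclass only declares its keys and defaults; the accessors are inherited.
  static class WriteConfig extends BaseConfig {
    static final String PRECOMBINE_FIELD_KEY = "hoodie.datasource.write.precombine.field";

    String precombineField() {
      return getStringOrDefault(PRECOMBINE_FIELD_KEY, "ts");
    }
  }

  public static void main(String[] args) {
    WriteConfig config = new WriteConfig();
    System.out.println(config.precombineField()); // default, nothing set yet
    config.setValue(WriteConfig.PRECOMBINE_FIELD_KEY, "update_time");
    System.out.println(config.precombineField()); // explicit value
  }
}
```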

HoodieWriteConfig

// https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
// Related documentation: https://hudi.apache.org/docs/configurations
public class HoodieWriteConfig extends HoodieConfig {
  // preCombineField option (hoodie.datasource.write.precombine.field)
  public static final ConfigProperty<String> PRECOMBINE_FIELD_NAME = ConfigProperty
    .key("hoodie.datasource.write.precombine.field")
    .defaultValue("ts") // default value
    .withDocumentation("Field used in preCombining before actual write. When two records have the same key value, "
      + "we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)");

  // Payload option (hoodie.datasource.write.payload.class)
  public static final ConfigProperty<String> WRITE_PAYLOAD_CLASS_NAME = ConfigProperty
    .key("hoodie.datasource.write.payload.class")
    .defaultValue(OverwriteWithLatestAvroPayload.class.getName())
    .markAdvanced()
    .withDocumentation("Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting. "
       + "This will render any value set for PRECOMBINE_FIELD_OPT_VAL in-effective");
}
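
The documentation string of PRECOMBINE_FIELD_NAME says that when two records share a key, the one with the largest precombine value wins. A minimal sketch of that rule follows; it is not Hudi's code. PrecombineSketch and Rec are hypothetical, the ordering field is assumed to be a long named ts, and keeping the later arrival on a tie is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PrecombineSketch {
  // Hypothetical record: a key, an ordering value (the "precombine field"), and a payload.
  static final class Rec {
    final String key;
    final long ts;
    final String value;

    Rec(String key, long ts, String value) {
      this.key = key;
      this.ts = ts;
      this.value = value;
    }
  }

  // Collapses records that share a key, keeping the one with the largest ts.
  static List<Rec> preCombine(List<Rec> incoming) {
    Map<String, Rec> latest = new LinkedHashMap<>();
    for (Rec r : incoming) {
      Rec current = latest.get(r.key);
      // On a tie this sketch keeps the later arrival (an assumption, not a Hudi guarantee).
      if (current == null || Long.compare(r.ts, current.ts) >= 0) {
        latest.put(r.key, r);
      }
    }
    return new ArrayList<>(latest.values());
  }

  public static void main(String[] args) {
    List<Rec> batch = new ArrayList<>();
    batch.add(new Rec("a", 1L, "old"));
    batch.add(new Rec("b", 5L, "b1"));
    batch.add(new Rec("a", 3L, "new"));
    batch.add(new Rec("b", 2L, "stale"));
    for (Rec r : preCombine(batch)) {
      System.out.println(r.key + " " + r.ts + " " + r.value);
    }
  }
}
```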

HoodiePayloadConfig

The implementation of HoodiePayloadConfig shows that the option hoodie.payload.ordering.field is deprecated and has been replaced by hoodie.datasource.write.precombine.field.

// https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodiePayloadConfig.java
public class HoodiePayloadConfig extends HoodieConfig {
  public static final ConfigProperty<String> EVENT_TIME_FIELD = ConfigProperty
    .key(PAYLOAD_EVENT_TIME_FIELD_PROP_KEY) // i.e. hoodie.payload.event.time.field
    .defaultValue("ts")
    .markAdvanced()
    .withDocumentation("Table column/field name to derive timestamp associated with the records. This can"
      + "be useful for e.g, determining the freshness of the table.");

  public static final ConfigProperty<String> PAYLOAD_CLASS_NAME = ConfigProperty
    .key("hoodie.compaction.payload.class")
    .defaultValue(OverwriteWithLatestAvroPayload.class.getName())
    .markAdvanced()
    .withDocumentation("This needs to be same as class used during insert/upserts. Just like writing, compaction also uses "
      + "the record payload class to merge records in the log against each other, merge again with the base file and "
      + "produce the final record to be written after compaction.");

  // hoodie.payload.ordering.field is deprecated;
  // it has been replaced by hoodie.datasource.write.precombine.field.
  /** @deprecated Use {@link HoodieWriteConfig#PRECOMBINE_FIELD_NAME} and its methods instead */
  @Deprecated
  public static final ConfigProperty<String> ORDERING_FIELD = ConfigProperty
    .key(PAYLOAD_ORDERING_FIELD_PROP_KEY) // i.e. hoodie.payload.ordering.field
    .defaultValue("ts")
    .markAdvanced()
    .withDocumentation("Table column/field name to order records that have the same key, before "
      + "merging and writing to storage.");

  /** @deprecated Use {@link #PAYLOAD_CLASS_NAME} and its methods instead */
  @Deprecated
  public static final String DEFAULT_PAYLOAD_CLASS = PAYLOAD_CLASS_NAME.defaultValue();

  /** @deprecated Use {@link #PAYLOAD_CLASS_NAME} and its methods instead */
  @Deprecated
  public static final String PAYLOAD_CLASS_PROP = PAYLOAD_CLASS_NAME.key();
}

// Shared constants used by both the payload classes and HoodiePayloadConfig
// https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePayloadProps.java
public class HoodiePayloadProps {
  public static final String PAYLOAD_ORDERING_FIELD_PROP_KEY = "hoodie.payload.ordering.field";
  public static final String PAYLOAD_EVENT_TIME_FIELD_PROP_KEY = "hoodie.payload.event.time.field";
  public static final String PAYLOAD_IS_UPDATE_RECORD_FOR_MOR = "hoodie.is.update.record.for.mor";
}

FlinkOptions

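FlinkOptions remaps some Hudi options to shorter keys. For example, it uses payload.class in place of hoodie.datasource.write.payload.class, and the longer key is still accepted as a fallback key.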

// https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
public class FlinkOptions extends HoodieConfig {
  public static final ConfigOption<String> OPERATION = ConfigOptions
    .key("write.operation")
    .stringType()
    .defaultValue(WriteOperationType.UPSERT.value()) // default value is upsert
    .withDescription("The write operation, that this write should do");

  @AdvancedConfig
  public static final ConfigOption<String> PAYLOAD_CLASS_NAME = ConfigOptions
    .key("payload.class")
    .stringType()
    .defaultValue(EventTimeAvroPayload.class.getName())
    .withFallbackKeys("write.payload.class", HoodieWriteConfig.WRITE_PAYLOAD_CLASS_NAME.key()) // the latter is hoodie.datasource.write.payload.class
    .withDescription("Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting.\n"
      + "This will render any value set for the option in-effective");

  /**
   * Flag to indicate whether to drop duplicates before insert/upsert.
   * By default false to gain extra performance.
   */
  @AdvancedConfig
  public static final ConfigOption<Boolean> PRE_COMBINE = ConfigOptions
    .key("write.precombine")
    .booleanType()
    .defaultValue(false) // default value is false
    .withDescription("Flag to indicate whether to drop duplicates before insert/upsert.\n"
      + "By default these cases will accept duplicates, to gain extra performance:\n"
      + "1) insert operation;\n"
      + "2) upsert for MOR table, the MOR table deduplicate on reading");

  public static final ConfigOption<String> RECORD_KEY_FIELD = ConfigOptions
    // RECORDKEY_FIELD_NAME is defined in hudi-common/src/main/java/org/apache/hudi/keygen/constant/KeyGeneratorOptions.java
    .key(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key()) // hoodie.datasource.write.recordkey.field
    .stringType()
    .defaultValue("uuid") // default value
    .withDescription("Record key field. Value to be used as the `recordKey` component of `HoodieKey`.\n"
      + "Actual value will be obtained by invoking .toString() on the field value. Nested fields can be specified using "
      + "the dot notation eg: `a.b.c`");
}
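
PAYLOAD_CLASS_NAME above declares payload.class as its primary key, with write.payload.class and hoodie.datasource.write.payload.class as fallback keys. The sketch below shows one way such a lookup can work: try the primary key first, then each fallback key in the order given. FallbackKeySketch and lookup are hypothetical names, the options are a plain Map, and com.example.MyPayload is a made-up class name. This illustrates the idea and is not Flink's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class FallbackKeySketch {
  static final String KEY = "payload.class";
  static final List<String> FALLBACK_KEYS =
      Arrays.asList("write.payload.class", "hoodie.datasource.write.payload.class");

  // Returns the value under the primary key, else under the first fallback key that is set.
  static Optional<String> lookup(Map<String, String> options, String key, List<String> fallbackKeys) {
    String value = options.get(key);
    if (value != null) {
      return Optional.of(value);
    }
    for (String fallback : fallbackKeys) {
      value = options.get(fallback);
      if (value != null) {
        return Optional.of(value);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    // Only the long Hudi key is set; the lookup still finds it through the fallback list.
    options.put("hoodie.datasource.write.payload.class", "com.example.MyPayload");
    System.out.println(lookup(options, KEY, FALLBACK_KEYS).orElse("<default>"));
  }
}
```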

KeyGeneratorOptions

// https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/keygen/constant/KeyGeneratorOptions.java
// Hudi maintains keys (record key + partition path) for uniquely identifying a particular record.
// This config allows developers to setup the Key generator class that will extract these out of incoming records.
public class KeyGeneratorOptions extends HoodieConfig {
  public static final ConfigProperty<String> RECORDKEY_FIELD_NAME = ConfigProperty
      .key("hoodie.datasource.write.recordkey.field")
      .noDefaultValue()
      .withDocumentation("Record key field. Value to be used as the `recordKey` component of `HoodieKey`.\n"
          + "Actual value will be obtained by invoking .toString() on the field value. Nested fields can be specified using\n"
          + "the dot notation eg: `a.b.c`");
}
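
The documentation string of RECORDKEY_FIELD_NAME mentions two behaviours: nested fields are addressed with dot notation such as a.b.c, and the key is obtained by calling toString() on the field value. A minimal sketch of both follows, assuming a record is modelled as nested java.util.Map instances; NestedFieldSketch and recordKey are hypothetical names and this is not how Hudi reads Avro or Flink rows.

```java
import java.util.HashMap;
import java.util.Map;

class NestedFieldSketch {
  // Walks a dot-separated path through nested maps and returns the leaf value as a String.
  static String recordKey(Map<String, Object> record, String fieldPath) {
    Object current = record;
    for (String part : fieldPath.split("\\.")) {
      if (!(current instanceof Map)) {
        throw new IllegalArgumentException("Cannot resolve '" + part + "' in path " + fieldPath);
      }
      current = ((Map<?, ?>) current).get(part);
      if (current == null) {
        throw new IllegalArgumentException("Field '" + part + "' is missing in path " + fieldPath);
      }
    }
    return current.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> b = new HashMap<>();
    b.put("c", 42);
    Map<String, Object> a = new HashMap<>();
    a.put("b", b);
    Map<String, Object> record = new HashMap<>();
    record.put("a", a);
    record.put("uuid", "id-1");

    System.out.println(recordKey(record, "uuid"));  // top-level field
    System.out.println(recordKey(record, "a.b.c")); // nested field, Integer rendered via toString()
  }
}
```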

From: https://www.cnblogs.com/aquester/p/17382035.html
