Environment: Hadoop 2.7.2, Hive 2.3.1, Spark 2.4.4, Flink 1.13.0
1. Download the Hudi source package
Get hudi-0.12.0.src.tgz from the Apache dist archive (https://archive.apache.org/dist/hudi/0.12.0/) and unpack it:
tar -zxf hudi-0.12.0.src.tgz -C /hadoop/app/
2. Install Maven
tar -zxf apache-maven-3.6.1.tar.gz -C /hadoop/app/
Configure the Aliyun mirror in conf/settings.xml, inside the <mirrors> element:
<mirror>
    <id>aliyunmaven</id>
    <mirrorOf>*</mirrorOf>
    <name>Aliyun public repository</name>
    <url>https://maven.aliyun.com/repository/public</url>
</mirror>
Configure the Maven environment variables:
vim /etc/profile
export MVN_HOME=/hadoop/app/apache-maven-3.6.1
export PATH=$PATH:$JAVA_HOME:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$MYSQL_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$ES_HOME/bin:$MVN_HOME/bin
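Reload the profile and verify that Maven is on the PATH:
source /etc/profile
mvn -v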
3. Run the build command in the Hudi source directory
cd /hadoop/app/hudi-0.12.0
mvn clean package -DskipTests -Dspark2.4 -Dflink1.13 -Dscala-2.11 -Dhadoop.version=2.7.2 -Pflink-bundle-shade-hive2
If the build aborts with a constructor/parameter mismatch in a Java file, edit that file directly. A commonly reported case when building Hudi 0.12.0 against Hadoop 2.x is hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieParquetDataBlock.java, where the Hadoop 2 API needs a second constructor argument: change new FSDataOutputStream(baos) to new FSDataOutputStream(baos, null).
Manually install the Kafka (Confluent) dependencies. Note: the required jars can be downloaded as part of
http://packages.confluent.io/archive/5.3/confluent-5.3.4-2.12.zip
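If you take the full Confluent archive, locate the four jars inside it first (find is used here so nothing about the archive layout has to be assumed; the zip should unpack to a confluent-5.3.4 directory):
unzip confluent-5.3.4-2.12.zip
find confluent-5.3.4 -name '*-5.3.4.jar' | grep -E 'common-config|common-utils|kafka-avro-serializer|kafka-schema-registry-client'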
Upload the jars to /hadoop/app/soft/hudi_needs_jar on the Linux server, then install each one into the local Maven repository:
cd /hadoop/app/soft/hudi_needs_jar
mvn install:install-file -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-config-5.3.4.jar
mvn install:install-file -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-utils-5.3.4.jar
mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-avro-serializer-5.3.4.jar
mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-schema-registry-client-5.3.4.jar
The build also requires pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar, which can be downloaded from my Baidu Netdisk:
Link: https://pan.baidu.com/s/1V_sZzVePTexfq4A8wI3OMQ
Extraction code: 66sw
After downloading, run the install command from the jar's directory:
cd /hadoop/app/soft/hudi_needs_jar
mvn install:install-file -DgroupId=org.pentaho -DartifactId=pentaho-aggdesigner-algorithm -Dversion=5.1.5-jhyde -Dpackaging=jar -Dfile=./pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
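With the missing jars installed, confirm they landed in the local repository (default location ~/.m2), then re-run the mvn command from step 3:
ls ~/.m2/repository/io/confluent/*/5.3.4/
ls ~/.m2/repository/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/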
4. Integrate with Spark
Copy /hadoop/app/hudi-0.12.0/packaging/hudi-spark-bundle/target/hudi-spark2.4-bundle_2.11-0.12.0.jar into Spark's jars directory:
cp /hadoop/app/hudi-0.12.0/packaging/hudi-spark-bundle/target/hudi-spark2.4-bundle_2.11-0.12.0.jar /hadoop/app/spark-2.4.4-bin-hadoop2.7/jars/
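A quick check that the bundle is in place:
ls /hadoop/app/spark-2.4.4-bin-hadoop2.7/jars/ | grep hudi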
spark-shell --master local \
  --jars hudi-spark2.4-bundle_2.11-0.12.0.jar \
  --packages org.apache.spark:spark-avro_2.11:2.4.4 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
Since the bundle is already in Spark's jars directory, the --jars flag is optional here.
To exercise the integrated environment further, refer to the cnblogs post "Spark2.4-cdh6.2.1集成hudi0.10初探" by Shydow.
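Alternatively, here is a minimal smoke test adapted from the official Hudi quickstart (the table name hudi_trips_cow and the local basePath are arbitrary illustration values; spark-avro is still pulled via --packages):
spark-shell --master local --packages org.apache.spark:spark-avro_2.11:2.4.4 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' <<'EOF'
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._

val tableName = "hudi_trips_cow"              // arbitrary test table name
val basePath  = "file:///tmp/hudi_trips_cow"  // local path; an HDFS path also works
val dataGen   = new DataGenerator

// generate 10 sample trip records and write them as a Hudi table
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.datasource.write.recordkey.field", "uuid").
  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
  option("hoodie.table.name", tableName).
  mode(Overwrite).
  save(basePath)

// read the table back; a non-empty result means the integration works
spark.read.format("hudi").load(basePath).select("uuid", "partitionpath").show(false)
EOF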