
Hudi Testing

Posted: 2024-07-22 10:53:53
Tags: Hudi, java, scala, SparkSubmit, testing, apache, org, spark

Test environment

Test cases

Case: hudi-spark-test001

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: hudi-spark-test001
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: "umr/spark:3.2.2_v2"
  imagePullPolicy: IfNotPresent
  mainClass: cc.hudi.HoodieSparkQuickstart
  mainApplicationFile: "s3a://bigdatas/jars/bigdataDemo-1.0-SNAPSHOT.jar"
  sparkVersion: "3.2.2"
  timeToLiveSeconds: 259200
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.2.2
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.2.2
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  sparkConf:
    spark.ui.port: "4045"
    spark.eventLog.enabled: "true"
    spark.eventLog.dir: "s3a://sparklogs/all"
    spark.hadoop.fs.s3a.access.key: "minio"
    spark.hadoop.fs.s3a.secret.key: "minio123"
    spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
    spark.hadoop.fs.s3a.endpoint: "http://10.19.64.205:32000"
    spark.hadoop.fs.s3a.connection.ssl.enabled: "false"
    spark.hadoop.fs.s3a.path.style.access: "true"
    spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
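In the manifest above, every sparkConf key that starts with spark.hadoop. is copied by Spark into the Hadoop Configuration with that prefix removed; this is how the fs.s3a.* settings (endpoint, path-style access, SSL off, credentials) reach the S3A connector used for the s3a:// paths. The Java sketch below only mimics that prefix-stripping rule on a plain Map so you can see which keys the S3A client ends up with. The class S3aConfSketch is hypothetical and does not call Spark or Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class S3aConfSketch {
    // Prefix that Spark strips when it copies entries into the Hadoop Configuration.
    static final String HADOOP_PREFIX = "spark.hadoop.";

    // Returns the Hadoop-side view of a sparkConf map: keys starting with
    // "spark.hadoop." are kept with the prefix removed; all other keys are dropped.
    static Map<String, String> toHadoopConf(Map<String, String> sparkConf) {
        Map<String, String> hadoop = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith(HADOOP_PREFIX)) {
                hadoop.put(e.getKey().substring(HADOOP_PREFIX.length()), e.getValue());
            }
        }
        return hadoop;
    }

    public static void main(String[] args) {
        // A subset of the sparkConf entries from the SparkApplication above.
        Map<String, String> sparkConf = new LinkedHashMap<>();
        sparkConf.put("spark.ui.port", "4045");
        sparkConf.put("spark.eventLog.dir", "s3a://sparklogs/all");
        sparkConf.put("spark.hadoop.fs.s3a.endpoint", "http://10.19.64.205:32000");
        sparkConf.put("spark.hadoop.fs.s3a.path.style.access", "true");
        sparkConf.put("spark.hadoop.fs.s3a.connection.ssl.enabled", "false");

        // Prints only the three fs.s3a.* keys, without the spark.hadoop. prefix.
        toHadoopConf(sparkConf).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Path-style access and a plain-HTTP endpoint are what a MinIO deployment like this one typically needs; with virtual-host style, S3A would try to resolve bucket names as hostnames.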

Errors:

1. Insufficient ClusterRole permissions: the ClusterRole bound to the service account has no rule for persistentvolumeclaims:
3/07/07 06:26:54 ERROR Utils: Uncaught exception in thread main
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/spark-operator/persistentvolumeclaims?labelSelector=spark-app-selector%3Dspark-a9d7e8f78bc6459c9282db57a02815d9. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. persistentvolumeclaims is forbidden: User "system:serviceaccount:spark-operator:spark-operator" cannot list resource "persistentvolumeclaims" in API group "" in the namespace "spark-operator".

Fix: add the missing rules to the ClusterRole:

# Partial example
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - persistentvolumeclaims
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - '*'
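The Forbidden message already names what is missing: the user (here the spark-operator service account in namespace spark-operator), the verb (list), the resource (persistentvolumeclaims) and the API group ("" is the core group). As a hypothetical helper, RbacForbiddenParser below extracts those fields with a regular expression and prints one rule in the same shape as the rules list above. It assumes the message wording shown in this log; other Kubernetes versions may phrase it differently. Unlike the fix above, which grants '*', it emits only the denied verb, so it may need to be repeated for each verb the driver later trips over.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RbacForbiddenParser {
    // Fields extracted from an API-server "forbidden" message; namespace is null for cluster-scoped requests.
    static final class Denial {
        final String user, verb, resource, apiGroup, namespace;

        Denial(String user, String verb, String resource, String apiGroup, String namespace) {
            this.user = user;
            this.verb = verb;
            this.resource = resource;
            this.apiGroup = apiGroup;
            this.namespace = namespace;
        }
    }

    // Matches: User "<user>" cannot <verb> resource "<resource>" in API group "<group>" [in the namespace "<ns>"]
    private static final Pattern FORBIDDEN = Pattern.compile(
        "User \"([^\"]*)\" cannot (\\S+) resource \"([^\"]*)\" in API group \"([^\"]*)\""
            + "(?: in the namespace \"([^\"]*)\")?");

    static Optional<Denial> parse(String message) {
        Matcher m = FORBIDDEN.matcher(message);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Denial(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)));
    }

    // Renders one entry for a ClusterRole/Role "rules" list that grants exactly the denied verb.
    static String toRule(Denial d) {
        return "- apiGroups:\n"
            + "  - \"" + d.apiGroup + "\"\n"
            + "  resources:\n"
            + "  - " + d.resource + "\n"
            + "  verbs:\n"
            + "  - " + d.verb + "\n";
    }

    public static void main(String[] args) {
        String msg = "persistentvolumeclaims is forbidden: User \"system:serviceaccount:spark-operator:spark-operator\""
            + " cannot list resource \"persistentvolumeclaims\" in API group \"\" in the namespace \"spark-operator\".";
        parse(msg).ifPresent(d -> {
            System.out.println("# denied for " + d.user + " in namespace " + d.namespace);
            System.out.print(toRule(d));
        });
    }
}
```

Because the denial is scoped to a single namespace, a namespaced Role plus RoleBinding would also cover it; a ClusterRole is only required if the service account must act in several namespaces.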

2. Case hudi-spark-test001 fails with:

23/07/10 01:20:41 WARN SparkSession: Cannot use org.apache.spark.sql.hudi.HoodieSparkSessionExtension to configure session extensions.
java.lang.ClassNotFoundException: org.apache.spark.sql.hudi.HoodieSparkSessionExtension
	at java.base/java.net.URLClassLoader.findClass(Unknown Source)
	at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
	at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Unknown Source)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
	at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1(SparkSession.scala:1194)
	at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1$adapted(SparkSession.scala:1192)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$applyExtensions(SparkSession.scala:1192)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:956)
	at cc.utils.HoodieExampleSparkUtils.buildSparkSession(HoodieExampleSparkUtils.java:60)
	at cc.utils.HoodieExampleSparkUtils.defaultSparkSession(HoodieExampleSparkUtils.java:53)
	at cc.hudi.HoodieSparkQuickstart.main(HoodieSparkQuickstart.java:39)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
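Spark logs this as a WARN and keeps building the session without the extension; if the Hudi jars are missing altogether, the job will typically fail later, when it first touches the hudi data source. The exception only says that org.apache.spark.sql.hudi.HoodieSparkSessionExtension is not visible to the driver's class loader. That class ships in Hudi's Spark bundle (for Spark 3.2, hudi-spark3.2-bundle_2.12), so a common remedy, not confirmed by this post, is to bake that jar into the image or list it under spec.deps.jars in the SparkApplication. A cheap check is a Class.forName probe at the start of main, before spark.sql.extensions is set. ClasspathCheck below is a hypothetical helper, not part of the post's cc.utils code.

```java
final class ClasspathCheck {
    // Returns true if the named class can be located by the given loader.
    // initialize=false means static initializers are not run, so the probe has no side effects.
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. when a dependency of the class is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        String ext = "org.apache.spark.sql.hudi.HoodieSparkSessionExtension";
        if (isLoadable(ext, loader)) {
            System.out.println(ext + " found; spark.sql.extensions can reference it");
        } else {
            System.out.println(ext + " is NOT on the classpath; add the Hudi Spark bundle jar");
        }
    }
}
```

Running this in the driver (for example at the top of HoodieSparkQuickstart.main) turns the silent WARN into an explicit, early diagnosis.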

3. Build error when compiling the Hudi source from GitHub:

[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.14.0-SNAPSHOT: The following artifacts could not be resolved: io.confluent:kafka-avro-serializer:jar:5.3.4, io.confluent:common-config:jar:5.3.4, io.confluent:common-utils:jar:5.3.4, io.confluent:kafka-schema-registry-client:jar:5.3.4: io.confluent:kafka-avro-serializer:jar:5.3.4 was not found in http://10.41.31.10:9081/repository/maven-public/ during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of chinaunicom has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :hudi-utilities_2.12
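The io.confluent artifacts are not published to Maven Central; Confluent hosts them in its own repository (https://packages.confluent.io/maven/). The error says the internal mirror at 10.41.31.10 did not have them. To check by hand, convert each groupId:artifactId:type:version coordinate from the error into the standard Maven repository path and request it from the mirror. MavenPath below is a hypothetical helper for that conversion; it assumes the file extension equals the type (true for jar) and ignores classifiers.

```java
final class MavenPath {
    // Converts "groupId:artifactId:type:version", the form Maven prints in the error,
    // into the standard Maven 2 repository layout path.
    static String toPath(String coords) {
        String[] p = coords.split(":");
        if (p.length != 4) {
            throw new IllegalArgumentException("expected groupId:artifactId:type:version, got " + coords);
        }
        String group = p[0], artifact = p[1], type = p[2], version = p[3];
        return group.replace('.', '/') + "/" + artifact + "/" + version + "/"
            + artifact + "-" + version + "." + type;
    }

    public static void main(String[] args) {
        String mirror = "http://10.41.31.10:9081/repository/maven-public/";
        String[] missing = {
            "io.confluent:kafka-avro-serializer:jar:5.3.4",
            "io.confluent:common-config:jar:5.3.4",
            "io.confluent:common-utils:jar:5.3.4",
            "io.confluent:kafka-schema-registry-client:jar:5.3.4"
        };
        // Prints one URL per missing artifact to probe on the mirror.
        for (String c : missing) {
            System.out.println(mirror + toPath(c));
        }
    }
}
```

The first line printed is http://10.41.31.10:9081/repository/maven-public/io/confluent/kafka-avro-serializer/5.3.4/kafka-avro-serializer-5.3.4.jar. If such URLs return 404, one likely fix is to have the mirror proxy Confluent's repository. Alternatively, if settings.xml declares the mirror with mirrorOf set to *, exclude the Confluent repository (for example *,!confluent, assuming its repository id is confluent). Then re-run with mvn -U, because Maven otherwise keeps reusing the cached "not found" result, as the error message itself notes.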

References:

Guide to building and using the source from GitHub:

Developer Setup | Apache Hudi

Searching the community for known issues:

Apache Hudi - ASF JIRA

From: https://www.cnblogs.com/chq3272991/p/18315612
