Symptoms
CDH version: 6.3.2
1) When running Hive on Spark from hive-cli, the following error appears:
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: Job aborted due to stage failure:
Aborting TaskSet 0.0 because task 0 (partition 0)
cannot run anywhere due to node and executor blacklist.
Most recent failure:
Lost task 0.1 in stage 0.0 (TID 1, syx-prod-bigdata-cdh-79-163, executor 2): java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:702)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:272)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:271)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:225)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
... 21 more
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:195)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
... 26 more
This happens because the Hive table data is ultimately stored with Snappy compression, and when Spark reads it, it fails to decompress the Snappy-formatted data, as the error above shows. The initial suspicion is that the local native libraries are not being loaded correctly.
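A quick way to test this suspicion on the affected node is to ask Hadoop itself whether the native libraries, including Snappy support, can be loaded. A minimal check, assuming the standard CDH parcel layout (the path may differ on other installations):
# run on the node where the task failed
hadoop checknative -a
# "hadoop: true" and "snappy: true" mean the native libs load; "false" means they are not found
ls /opt/cloudera/parcels/CDH/lib/hadoop/lib/native/
# the directory should contain libhadoop.so and libsnappy.so among other files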
2) Submitting through the beeline client
Submitting this way works without any problem.
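For reference, a typical beeline submission looks roughly like the following (the HiveServer2 host, user, and table name are placeholders, not values from this cluster):
beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" -n <user> \
        -e "select count(*) from some_table;"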
For practical reasons, however, some scripted jobs still run through hive-cli.
Troubleshooting
1) When this problem came up, some material found online suggested adding the following settings to the spark-env.sh file:
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
This change did not solve the problem.
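As an aside, Hive on Spark also accepts spark.* properties set from within the Hive session itself and forwards them to the Spark application it launches, so the same hint can be passed without editing spark-env.sh. A sketch, again assuming the CDH-default native-library path:
-- run inside hive-cli before the first Spark query of the session,
-- so the Spark client is created with these settings
set spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native;
set spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native;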
2) Thinking through the problem
In Spark, configuration generally comes from the spark-env.sh and spark-defaults.conf files. Besides environment variables, spark-defaults.conf can also define library-path settings such as spark.driver.extraLibraryPath. Following this line of thinking, check the corresponding spark-defaults.conf file:
cat /opt/cloudera/parcels/CDH/lib/spark/conf/spark-defaults.conf
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
Note: the parameter values above are the ones after my modification.
After changing the file this way and re-running the Hive SQL in hive-cli, the job still failed with the same error as at the beginning.
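A detail that matters for the next step: hive-cli looks for spark-defaults.conf on Hive's own classpath / conf directory rather than in Spark's conf directory, so it is worth checking what Spark-related files are visible there at all. A quick check, assuming the default CDH conf location:
ls -l /etc/hive/conf/ | grep -i spark
# if no spark-defaults.conf shows up here, hive-cli is probably not reading the file edited above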
3) Could it be that hive-cli never reads spark-defaults.conf and spark-env.sh at all?
Following this idea, the following commands were executed:
cp /opt/cloudera/parcels/CDH/lib/spark/conf/spark-env.sh /etc/hive/conf/
cp /opt/cloudera/parcels/CDH/lib/spark/conf/spark-defaults.conf /etc/hive/conf/
This makes Hive load the specified Spark configuration. With this in place, running the SQL again produced a different exception: the Java Virtual Machine could not be created. Part of the exception output is shown below:
Caused by: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error "Error: Could not create the Java Virtual Machine.","Error: A fatal exception has occurred. Program will exit."
at org.apache.hive.spark.client.SparkClientImpl$2.run(SparkClientImpl.java:495) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
... 1 more
ERROR : FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 0e5cca78-2a7e-4414-b5c9-c73b41b0a2c7_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error "Error: Could not create the Java Virtual Machine.","Error: A fatal exception has occurred. Program will exit."
Error: Error while processing statement: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 0e5cca78-2a7e-4414-b5c9-c73b41b0a2c7_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error "Error: Could not create the Java Virtual Machine.","Error: A fatal exception has occurred. Program will exit." (state=42000,code=30041)
This shows that the hive-cli client is now actually loading the Spark configuration.
Solution
Comment out the following lines in the /etc/hive/conf/spark-defaults.conf configuration file:
#spark.master=yarn
#spark.submit.deployMode=client
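With that change, the relevant part of the spark-defaults.conf read by hive-cli looks roughly like this (the extraLibraryPath values are the ones set earlier; the exact parcel path depends on the installation):
# /etc/hive/conf/spark-defaults.conf (relevant lines)
#spark.master=yarn
#spark.submit.deployMode=client
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native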
As for why the beeline client loads the Spark configuration correctly while hive-cli does not, I did not dig into it further. hive-cli is deprecated anyway, and under normal circumstances it would not be used to submit jobs.
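If someone did want to chase it down, comparing the two clients with verbose client-side logging would be an obvious starting point; a sketch (table name is a placeholder):
hive --hiveconf hive.root.logger=DEBUG,console -e "select count(*) from some_table;"
# the DEBUG output should show how the embedded Spark client session is configured and launched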
Source: https://www.cnblogs.com/yjt1993/p/18000206