
Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses


Background

This is a new environment built with Ambari. Running jobs on it hit quite a few problems, but the one below troubled us for a long time and was only solved today; it failed on every run, and none of the fixes found online worked. I was sure the root cause was the version pairing of Spark 2.3.1 with Hive 3.1.0: Hive 3.1.0 added many new features, such as transactions, and it has not been around long enough to mature, so problems like this can easily get out of hand.

Environment

Ambari 2.7.1 + Spark 2.3.1 + Hadoop 3.1.1 + Hive 3.1.0

Scala 2.11.8, JDK 1.8

Code

// prints fine
df.show(10, truncate = false)

df.createOrReplaceTempView("tmp_logistics_track")

// prints fine
spark.sql("select * from tmp_logistics_track").show(20, truncate = false)

// the following statement is where the error is thrown
spark.sql(
  s"""insert overwrite table ods_common.ods_common_logistics_track_d PARTITION(p_day='$day')
     |select dc_id, source_order_id, order_id, tracking_number, warehouse_code, user_id, channel_name,
     |creation_time, update_time, sync_time, last_tracking_time, tracking_change_time, next_tracking_time,
     |tms_nti_time, tms_oc_time, tms_as_time, tms_pu_time, tms_it_time, tms_od_time, tms_wpu_time, tms_rt_time,
     |tms_excp_time, tms_fd_time, tms_df_time, tms_un_time, tms_np_time from tmp_logistics_track
     |""".stripMargin)

Deployment script

#!/usr/bin/env bash
################# LOAD DATA TO HIVE ######################

# When editing this script on Windows, convert the line endings to Unix:
# set ff=unix
#SPARK_JARS_BASE_PATH=/home/isuhadoop/ark_data_bin/tag_batch/KafkaToHive/external_jar

set -x

v_proc_date=$1
#v_proc_date=$(date -d '-0 day' '+%Y%m%d')

echo "-----1: $1"
echo "-----2: $2"

# root directory used by this script
SHELL_ROOT_DIR=/home/ztsauser/limin_work/warehouse

v_exec_time=`date "+%Y%m%d%H"`

# log file
v_log_dir=${SHELL_ROOT_DIR}/logs/LogisticsTrackSourceProcess_${v_proc_date}.log

# exit if no argument was passed in
if [[ "$v_proc_date" = "" ]]
then
    echo "No argument passed in, exiting..." > ${v_log_dir}
    exit
fi

echo "Starting the job..." > ${v_log_dir}

export HADOOP_USER_NAME=hive

/usr/hdp/current/spark2-client/bin/spark-submit --class zt.dc.bigdata.bp.process.warehouse.LogisticsTrackSourceProcess \
--name LogisticsTrackSourceProcess_${v_proc_date} \
--master yarn-cluster \
--queue default \
--deploy-mode cluster \
--num-executors 5 \
--executor-cores 2 \
--executor-memory 18g \
--files ${SHELL_ROOT_DIR}/config/hive-site.xml \
--jars ${SHELL_ROOT_DIR}/jar/hadoop-distcp-3.1.1.3.0.1.0-187.jar \
${SHELL_ROOT_DIR}/jar/dc-bp-1.0-SNAPSHOT-shaded.jar ${v_proc_date} > ${v_log_dir} 2>&1

Exception

Caused by: java.io.IOException: Cannot execute DistCp process: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Details:

21/01/09 23:06:08 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:09 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:10 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:11 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:12 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:13 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:14 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:15 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:16 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:17 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:18 INFO Client: Application report for application_1610095260612_0149 (state: RUNNING)
21/01/09 23:06:19 INFO Client: Application report for application_1610095260612_0149 (state: FINISHED)
21/01/09 23:06:19 INFO Client:
     client token: N/A
     diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://ztcluster/warehouse/tablespace/managed/hive/ods_common.db/ods_common_logistics_track_d/.hive-staging_hive_2021-01-09_23-05-39_625_275694820341612468-1/-ext-10000 to destination hdfs://ztcluster/warehouse/tablespace/managed/hive/ods_common.db/ods_common_logistics_track_d/p_day=20190321;
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
    at org.apache.spark.sql.hive.HiveExternalCatalog.loadPartition(HiveExternalCatalog.scala:843)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:249)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
    at zt.dc.bigdata.bp.process.warehouse.LogisticsTrackSourceProcess$.main(LogisticsTrackSourceProcess.scala:122)
    at zt.dc.bigdata.bp.process.warehouse.LogisticsTrackSourceProcess.main(LogisticsTrackSourceProcess.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://ztcluster/warehouse/tablespace/managed/hive/ods_common.db/ods_common_logistics_track_d/.hive-staging_hive_2021-01-09_23-05-39_625_275694820341612468-1/-ext-10000 to destination hdfs://ztcluster/warehouse/tablespace/managed/hive/ods_common.db/ods_common_logistics_track_d/p_day=20190321
    at org.apache.hadoop.hive.ql.metadata.Hive.getHiveException(Hive.java:4057)
    at org.apache.hadoop.hive.ql.metadata.Hive.getHiveException(Hive.java:4012)
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4007)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:4372)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1962)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.sql.hive.client.Shim_v3_0.loadPartition(HiveShim.scala:1275)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadPartition$1.apply$mcV$sp(HiveClientImpl.scala:747)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadPartition$1.apply(HiveClientImpl.scala:745)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadPartition$1.apply(HiveClientImpl.scala:745)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:278)
    at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:216)
    at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:215)
    at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:261)
    at org.apache.spark.sql.hive.client.HiveClientImpl.loadPartition(HiveClientImpl.scala:745)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadPartition$1.apply$mcV$sp(HiveExternalCatalog.scala:855)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadPartition$1.apply(HiveExternalCatalog.scala:843)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadPartition$1.apply(HiveExternalCatalog.scala:843)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
    ... 21 more
Caused by: java.io.IOException: Cannot execute DistCp process: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.hive.shims.Hadoop23Shims.runDistCp(Hadoop23Shims.java:1151)
    at org.apache.hadoop.hive.common.FileUtils.distCp(FileUtils.java:643)
    at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:625)
    at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:600)
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:3921)
    ... 40 more
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:116)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:109)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:102)
    at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:410)
    at org.apache.hadoop.tools.DistCp.<init>(DistCp.java:116)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.runDistCp(Hadoop23Shims.java:1141)
    ... 44 more

     ApplicationMaster host: 192.168.81.58
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1610204535333
     final status: FAILED
     tracking URL: http://szch-ztn-dc-bp-pro-192-168-81-57:8088/proxy/application_1610095260612_0149/
     user: hive
Exception in thread "main" org.apache.spark.SparkException: Application application_1610095260612_0149 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1269)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1627)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Solution

Based on the fixes suggested online, I made the following attempts.

Attempt 1 (did not solve it)

I tried adding the following POM dependencies on the client side; it did not solve the problem.

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>${hadoop.version}</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>${hadoop.version}</version>
    <scope>${scopetype}</scope>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>${scopetype}</scope>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop.version}</version>
    <scope>${scopetype}</scope>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
    <scope>${scopetype}</scope>
</dependency>

I also tried mapreduce-jobclient, mapreduce and related artifacts. Note that the scope must be set to compile, not test, otherwise the error persists.
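For context on why these dependencies matter: the bottom of the stack trace shows org.apache.hadoop.mapreduce.Cluster.initialize giving up because it cannot find a MapReduce client for the configured framework name; the YARN client provider ships in hadoop-mapreduce-client-jobclient and the local one in hadoop-mapreduce-client-common. If you want to rule out a missing jar, a small probe like the following (my own sketch, not part of the original post) can be dropped into the Spark driver code:

// Hypothetical classpath probe: check whether the MapReduce client providers
// that Cluster.initialize looks for are visible to the driver at runtime.
def hasClass(name: String): Boolean =
  try { Class.forName(name); true } catch { case _: ClassNotFoundException => false }

// Provider used when mapreduce.framework.name=yarn (hadoop-mapreduce-client-jobclient)
println("YarnClientProtocolProvider on classpath: " +
  hasClass("org.apache.hadoop.mapred.YarnClientProtocolProvider"))
// Provider used when mapreduce.framework.name=local (hadoop-mapreduce-client-common)
println("LocalClientProtocolProvider on classpath: " +
  hasClass("org.apache.hadoop.mapred.LocalClientProtocolProvider"))

If either class is missing at runtime, the shaded jar and the --jars list in the deployment script are the first places to look.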

Attempt 2 (did not solve it)

Set the fs.hdfs.impl.disable.cache property in hdfs-site.xml to true.
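If editing hdfs-site.xml across the cluster is inconvenient, the same experiment can be tried per job, either with --conf spark.hadoop.fs.hdfs.impl.disable.cache=true at submit time or programmatically as in the sketch below (my own, not from the original post). It only affects this job's Hadoop configuration and may not reach every code path that creates a FileSystem:

// Sketch: disable FileSystem object caching for hdfs:// URIs for this job only.
spark.sparkContext.hadoopConfiguration.set("fs.hdfs.impl.disable.cache", "true")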

Attempt 3 (did not solve it)

Disable Hive's transaction feature.

Turn off the HDP 3.0 settings that make newly created tables ACID by default:

hive.create.as.insert.only=false
metastore.create.as.acid=false
hive.strict.managed.tables=false
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
hive.support.concurrency=false

// hive.stats.autogather must remain true, otherwise executing SQL throws an exception
hive.stats.autogather=true
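Before changing these settings, it may be worth confirming whether the target table really was created as a transactional (ACID) table. A quick check from Spark, using the table from this post (my own sketch; look for transactional=true among the table properties in the output):

// Inspect the table metadata; ACID tables carry transactional=true in their properties.
spark.sql("describe formatted ods_common.ods_common_logistics_track_d")
  .show(200, truncate = false)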

Attempt 4 (did not solve it)

Check whether the mapreduce.framework.name property in mapred-site.xml is set to yarn.

If it is not, change it to local and try again.
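Keep in mind that in yarn-cluster mode the driver runs on the cluster, so the value in the local mapred-site.xml is not necessarily what the job sees. A temporary check like the one below (my own sketch, not from the original post) prints the values actually resolved inside the running application:

// Print the MapReduce/YARN settings as resolved by the driver's Hadoop configuration.
val hadoopConf = spark.sparkContext.hadoopConfiguration
println("mapreduce.framework.name = " + hadoopConf.get("mapreduce.framework.name"))
println("yarn.resourcemanager.address = " + hadoopConf.get("yarn.resourcemanager.address"))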

Attempt 5 (solved it)

After exhausting every method I could find online, one thing came to mind: back when we were on Hive 3.0, writing data through a third-party API or a Spark client ran into problems whenever the partition had been created in advance.

So I tried declaring the partition handling explicitly in the code (drop the partition, then add it, before the insert). On the next run the problem was solved. Why this works is still unknown to me.

The code:

// drop the partition
spark.sql(s"alter table ods_common.ods_common_logistics_track_d drop if exists partition(p_day='$day')")
// add the partition
spark.sql(s"alter table ods_common.ods_common_logistics_track_d add if not exists partition (p_day='$day')")

df.show(10, truncate = false)

df.createOrReplaceTempView("tmp_logistics_track")

spark.sql("select * from tmp_logistics_track").show(20, truncate = false)

spark.sql(
  s"""insert overwrite table ods_common.ods_common_logistics_track_d PARTITION(p_day='$day')
     |select dc_id, source_order_id, order_id, tracking_number, warehouse_code, user_id, channel_name,
     |creation_time, update_time, sync_time, last_tracking_time, tracking_change_time, next_tracking_time,
     |tms_nti_time, tms_oc_time, tms_as_time, tms_pu_time, tms_it_time, tms_od_time, tms_wpu_time, tms_rt_time,
     |tms_excp_time, tms_fd_time, tms_df_time, tms_un_time, tms_np_time from tmp_logistics_track
     |""".stripMargin)
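If the same pattern is needed for more tables, it can be wrapped in a small helper. The sketch below is hypothetical (the name and signature are mine, not from the original code) but follows the same drop-partition / add-partition / insert-overwrite sequence:

import org.apache.spark.sql.SparkSession

// Hypothetical helper: recreate the target partition explicitly before overwriting it,
// mirroring the workaround above. `selectSql` is the SELECT that produces the rows.
def overwriteDayPartition(spark: SparkSession, table: String, day: String, selectSql: String): Unit = {
  spark.sql(s"alter table $table drop if exists partition(p_day='$day')")
  spark.sql(s"alter table $table add if not exists partition (p_day='$day')")
  spark.sql(
    s"""insert overwrite table $table PARTITION(p_day='$day')
       |$selectSql
       |""".stripMargin)
}

It would be called with the same SELECT statement used above passed as selectSql.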

 

 

This article is reproduced from: https://blog.csdn.net/arlanhon/article/details/112480999?spm=1001.2101.3001.6650.6&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EESLANDING%7Edefault-6-112480999-blog-98077988.pc_relevant_landingrelevant&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EESLANDING%7Edefault-6-112480999-blog-98077988.pc_relevant_landingrelevant&utm_relevant_index=7

From: https://www.cnblogs.com/nizuimeiabc1/p/16845259.html
