Context: the file had already been put onto HDFS, yet the job still failed. Thinking it through: why, of the two file reads, does only the second one report that the file does not exist? Looking at the code, the two reads use different mechanisms. The first goes through the SparkContext (sc.textFile), while the second uses java.io.File. So only paths read through the Spark-created sc object resolve correctly; a plain File is opened against the local filesystem of whatever node the code runs on, which on YARN is not where the HDFS file lives. Rewriting the second read with that in mind, the job ran successfully.
JavaRDD<String> linesRDD2 = sc.textFile("src/main/resources/santi/bad_words.txt");
//JavaRDD<String> linesRDD2 = sc.textFile("/tmp/bad_words.txt");
// Path path = Paths.get("src/main/resources/santi/santiquanji_liucixin.txt");
// byte[] bytes = Files.readAllBytes(path);
// String text = new String(bytes, Charset.defaultCharset());
// System.out.println(text);
// ArrayList<String> bad_words = new ArrayList<>();
List<String> bad_words = linesRDD2.collect();
sc.parallelize(bad_words);
//File file = new File("src/main/resources/santi/bad_words.txt");
/*File file = new File("hdfs://hadoop:9000/user/hadoop/bad_words.txt");
// convert the byte stream to a character stream
InputStreamReader inputStreamReader = new InputStreamReader(new FileInputStream(file),
"utf-8");
// wrap the character stream in a buffer
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
String str = null;
while ((str = bufferedReader.readLine()) != null) {
bad_words.add(str);
}*/
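The root cause above can be shown without Spark at all: java.io.File does not understand URI schemes, so a string like "hdfs://hadoop:9000/..." is treated as a literal local path and FileInputStream fails with "file not found". By contrast, sc.textFile hands the path to Hadoop's FileSystem layer, which dispatches on the scheme (or on fs.defaultFS when no scheme is given). A minimal JDK-only sketch of that difference (the HDFS URL is the one from the snippet above; no such local file is expected to exist):

```java
import java.io.File;
import java.net.URI;

public class HdfsPathDemo {
    // java.io.File ignores any URI scheme and treats the whole string
    // as a path on the local filesystem.
    static boolean localFileExists(String path) {
        return new File(path).exists();
    }

    // java.net.URI exposes the scheme that Hadoop's FileSystem API
    // (used by sc.textFile) would dispatch on.
    static String scheme(String path) {
        return URI.create(path).getScheme();
    }

    public static void main(String[] args) {
        String p = "hdfs://hadoop:9000/user/hadoop/bad_words.txt";
        // Opening this with File/FileInputStream looks for a local file
        // literally named "hdfs:/hadoop:9000/..." and fails.
        System.out.println(localFileExists(p));
        // Hadoop's FileSystem would see scheme "hdfs" and route the read
        // to the NameNode instead.
        System.out.println(scheme(p));
    }
}
```

This is why only the second read failed: the sc.textFile call went through the scheme-aware FileSystem layer, while the File-based read silently degraded to a local-disk lookup on the YARN container.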
From: https://blog.51cto.com/u_15345945/6438077