
6. Hadoop MapReduce


6.1 Edit WordCount.java

Create a wordcount test directory: mkdir -p ~/wordcount/input, then cd ~/wordcount (these paths match the ones used in the steps below).

Edit WordCount.java, e.g.: gedit WordCount.java

Enter the following code. It is the WordCount example from the official MapReduce tutorial, which you can consult at https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every whitespace-separated token in the input.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts collected for each word.
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: args[0] is the HDFS input path, args[1] the output path,
  // which must not exist yet.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Reusing the reducer as a combiner is safe here because addition is
    // associative and commutative, so map-side partial sums do not change the result.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
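
To make the data flow concrete: for an input line hello world hello, the mapper emits (hello, 1), (world, 1), (hello, 1); after the shuffle the reducer receives hello -> [1, 1] and world -> [1], and writes hello 2 and world 1.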

You can now see the created WordCount.java file in the directory.

6.2 Compile WordCount.java

Edit the ~/.bashrc file: gedit ~/.bashrc (sudo is unnecessary for a file in your own home directory)

Add the following lines (this assumes JAVA_HOME is already set):

export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
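
The first line puts the JDK tools on the PATH; the second makes tools.jar, which contains the Java compiler, visible to the hadoop command used in the next step.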

Apply the changes: source ~/.bashrc

Compile the program: hadoop com.sun.tools.javac.Main WordCount.java -Xlint:deprecation

Package the classes into wc.jar: jar cf wc.jar WordCount*.class
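
To check what went into the archive, jar tf wc.jar lists its contents.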

6.3 Create the Test Text File

Start all the virtual machines and start the Hadoop Multi-Node Cluster (e.g. with start-all.sh).

Create the input directory in HDFS: hadoop fs -mkdir -p /user/hduser/wordcount/input

Switch to the local directory containing the test file LICENSE.txt: cd ~/wordcount/input

Upload the file to HDFS: hadoop fs -copyFromLocal LICENSE.txt /user/hduser/wordcount/input

Verify the upload: hadoop fs -ls /user/hduser/wordcount/input
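
For reference, the same directory creation and upload can be done programmatically through the HDFS FileSystem API. The sketch below is only illustrative (the class name UploadInput and the relative LICENSE.txt path are assumptions, not part of this tutorial); the hadoop fs commands above remain the simpler route.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadInput {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml/hdfs-site.xml from the classpath, so this
    // talks to the same cluster as the hadoop fs commands.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path input = new Path("/user/hduser/wordcount/input");
    fs.mkdirs(input);                                     // hadoop fs -mkdir -p
    fs.copyFromLocalFile(new Path("LICENSE.txt"), input); // hadoop fs -copyFromLocal
    fs.close();
  }
}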

6.4 Run WordCount

Switch to the directory containing the jar: cd ~/wordcount

Run the WordCount program:

hadoop jar wc.jar WordCount /user/hduser/wordcount/input/LICENSE.txt /user/hduser/wordcount/output
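
The job reads the given input file and writes its results to /user/hduser/wordcount/output; this output directory must not exist before the run.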

View the results (each output line is a word and its count, separated by a tab):

hadoop fs -cat /user/hduser/wordcount/output/part-r-00000 | more
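
The output can also be read from Java through the same FileSystem API. Below is a minimal sketch, assuming the hypothetical class name PrintOutput and the part-r-00000 file name produced by a single reducer:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrintOutput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path part = new Path("/user/hduser/wordcount/output/part-r-00000");
    // Each line is "word<TAB>count", as written by the default TextOutputFormat.
    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(fs.open(part)))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
    fs.close();
  }
}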

Delete the output directory so the job can be run again: hadoop fs -rm -r /user/hduser/wordcount/output
