
Operating Hadoop from Java: MapReduce

Posted: 2022-12-10 17:11:06  Views: 42
Tags: java hadoop MapReduce job io org apache import

1. Mapper file: WordCountMapper.java

package com.hdfs;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Text outK = new Text();
    private final IntWritable outV = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context) throws IOException, InterruptedException {
        // Split the line on single spaces and emit (word, 1) for each token
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            outK.set(word);
            context.write(outK, outV);
        }
    }
}
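The mapper emits the pair (word, 1) for every space-separated token; the sum happens later in the reducer. That splitting logic can be mirrored in plain Java without the Hadoop runtime (MapSketch is an illustrative name, not part of the project):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

public class MapSketch {
    // Mirrors WordCountMapper.map: split on single spaces, emit (word, 1) per token
    static List<Entry<String, Integer>> map(String line) {
        List<Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            out.add(new SimpleEntry<>(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Duplicate words produce duplicate pairs; the reducer resolves them
        System.out.println(map("hello world hello"));
        // [hello=1, world=1, hello=1]
    }
}
```

Note that the real mapper reuses the two Writable fields (outK, outV) across calls instead of allocating per record, which matters at Hadoop scale.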

2. Reducer file: WordCountReducer.java

package com.hdfs;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable outV = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
        // values holds one IntWritable(1) per occurrence of this word; sum them
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // Write (word, total count)
        outV.set(sum);
        context.write(key, outV);
    }
}
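Between map and reduce, the framework shuffles and groups the pairs by key, so reduce receives each word once with all of its 1s. Both steps can be sketched together in plain Java (ReduceSketch is a hypothetical name for illustration):

```java
import java.util.Map;
import java.util.TreeMap;

public class ReduceSketch {
    // The framework groups map output by key; reduce then sums the 1s.
    // This sketch folds grouping and summing into one pass over the tokens.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like reducer output
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new String[]{"hello", "world", "hello"}));
        // {hello=2, world=1}
    }
}
```

Because this reducer just sums, it could also be registered as a combiner (job.setCombinerClass) to pre-aggregate on the map side.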

3. Driver file: WordCountDriver.java

package com.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // 1. Create the job instance
        Configuration config = new Configuration();
        Job job = Job.getInstance(config);
        // 2. Set the jar so the runtime can locate this driver class
        job.setJarByClass(WordCountDriver.class);
        // 3. Set the Mapper and Reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // 4. Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("/home/navy/Desktop/work/hadoop/HadoopOperate/input"));
        FileOutputFormat.setOutputPath(job, new Path("/home/navy/Desktop/work/hadoop/HadoopOperate/output"));
        // 7. Submit the job and wait for completion
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
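One practical caveat: the job refuses to start if the output directory already exists (checkOutputSpecs throws FileAlreadyExistsException), so when re-running locally the directory must be cleared first. A JDK-only sketch of that cleanup (CleanOutput is a hypothetical helper, not part of the project):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanOutput {
    // Recursively delete a directory if it exists (children before parents,
    // hence the reverse-order sort of the walked paths).
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) return;
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        // Same local output path the driver above writes to
        deleteRecursively(Paths.get("/home/navy/Desktop/work/hadoop/HadoopOperate/output"));
    }
}
```

On a real cluster one would instead use Hadoop's FileSystem API, so the delete goes through HDFS rather than the local filesystem.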

4. pom.xml file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>HdfsOperate</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.2.4</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.13.2</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>2.0.5</version>
        </dependency>
    </dependencies>

</project>

5. Project structure

(The original screenshot is omitted. With the standard Maven layout, the project looks like this:)

HdfsOperate/
├── pom.xml
└── src/main/java/com/hdfs/
    ├── WordCountMapper.java
    ├── WordCountReducer.java
    └── WordCountDriver.java

6. Run the main method in WordCountDriver.java

(The original run screenshot is omitted. On success the process exits with code 0, and the output directory contains a part-r-00000 file of tab-separated word counts plus an empty _SUCCESS marker.)

From: https://www.cnblogs.com/navysummer/p/16971889.html
